DM的主备集群部署模式下,备集群可以作为热备提供只读服务,在主集群发生故障的情况下可以切换为新主,降低服务不可用时间,以下内容为主备集群的主动切换和故障情况下的被动切换过程分析。
SQL> select * from v$instance;
行号 NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID DSC_SEQNO DSC_ROLE
---------- ---------- ------------- --------------- -------------- -------------------------- ------------------- ------------------- ------- ------- ----------- ----------- --------
BUILD_VERSION BUILD_TIME
---------------------------------- --------------------
1 GRP1_RT_01 GRP1_RT_01 1 kylin10-57-102 DM Database Server x64 V8 DB Version: 0x7000c 2025-10-11 12:06:10 OPEN PRIMARY 453331 0 NULL
1-4-6-2024.12.25-255012-20119-ENT Dec 27 2024 06:02:42
主动切换一般是为了对主库做一些维护操作,为了保证服务可用,就需要把备库切换为新主库保证服务不会中断。
主动切换可以使用普通监视器在前台运行时执行命令行指令,主要有两个步骤:
以下使用普通监视器对主备集群进行手工切换的输出日志:
login
用户名:SYSDBA
密码:
[monitor] 2025-10-11 16:01:59: 登录监视器成功!
switchover GRP1.GRP1_RT_02
此操作需谨慎, 将会导致主库发生切换, 是否继续使用GRP1.GRP1_RT_02执行SWITCHOVER操作(YES/NO/Y/N)?
y
[monitor] 2025-10-11 16:02:32: 开始切换实例GRP1_RT_02
[monitor] 2025-10-11 16:02:32: 通知守护进程GRP1_RT_01切换SWITCHOVER状态
[monitor] 2025-10-11 16:02:32: 守护进程(GRP1_RT_01)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2025-10-11 16:02:32: 切换守护进程GRP1_RT_01为SWITCHOVER状态成功
[monitor] 2025-10-11 16:02:32: 通知守护进程GRP1_RT_02切换SWITCHOVER状态
[monitor] 2025-10-11 16:02:32: 守护进程(GRP1_RT_02)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2025-10-11 16:02:32: 切换守护进程GRP1_RT_02为SWITCHOVER状态成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01开始执行ALTER DATABASE STANDBY语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01执行ALTER DATABASE STANDBY语句成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_02执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-10-11 16:02:32: 通知实例GRP1_RT_02修改所有归档状态无效
[monitor] 2025-10-11 16:02:32: 修改所有实例归档为无效状态成功
[monitor] 2025-10-11 16:02:32: 实例GRP1_RT_01开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_01执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_02开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_02执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_01开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_01执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2025-10-11 16:02:33: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2025-10-11 16:02:33: 通知守护进程GRP1_RT_01切换OPEN状态
[monitor] 2025-10-11 16:02:33: 守护进程(GRP1_RT_01)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2025-10-11 16:02:33: 切换守护进程GRP1_RT_01为OPEN状态成功
[monitor] 2025-10-11 16:02:33: 通知守护进程GRP1_RT_02切换OPEN状态
[monitor] 2025-10-11 16:02:33: 守护进程(GRP1_RT_02)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2025-10-11 16:02:34: 切换守护进程GRP1_RT_02为OPEN状态成功
[monitor] 2025-10-11 16:02:34: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-10-11 16:02:34: 清理守护进程(GRP1_RT_01)请求成功
2025-10-11 16:02:34
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 453331 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT DETACHED
192.168.57.103 52142 2025-10-11 16:02:34 GLOBAL VALID OPEN GRP1_RT_02 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID FALSE
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.57.103 5236 OK GRP1_RT_02 OPEN PRIMARY 0 0 REALTIME VALID 4988 68955 4988 68955 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT DETACHED
192.168.57.102 52141 2025-10-11 16:02:34 GLOBAL VALID OPEN GRP1_RT_01 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID FALSE
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.57.102 5236 OK GRP1_RT_01 OPEN STANDBY 0 0 REALTIME INVALID 4986 68864 4986 68864 NONE
DATABASE(GRP1_RT_01) APPLY INFO FROM (GRP1_RT_02), REDOS_PARALLEL_NUM (1), WAIT_APPLY[FALSE]:
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[4986, 4986, 4986], (RLSN, SLSN, KLSN)[68864, 68864, 68864], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (68864)
#================================================================================#
[monitor] 2025-10-11 16:02:34: 清理守护进程(GRP1_RT_02)请求成功
[monitor] 2025-10-11 16:02:34: 实例GRP1_RT_02切换成功
[monitor] 2025-10-11 16:02:36: 守护进程(GRP1_RT_02)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-10-11 16:02:36 RECOVERY OK GRP1_RT_02 OPEN PRIMARY VALID 19 68955 68955
[monitor] 2025-10-11 16:02:41: 守护进程(GRP1_RT_02)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-10-11 16:02:41 OPEN OK GRP1_RT_02 OPEN PRIMARY VALID 19 68955 68955
从以上日志中可以清晰地看到,将GRP1_RT_02切换为新主时,监视器主要做了这些操作:
从以上过程可以看到,主动切换主备的过程中,监视器会主动向主备库和主备守护进程发送一系列命令,先将所有实例状态转换为配置状态,再做归档配置,并且会对守护进程进行切换为切换状态,经过后续的一系列操作后最后会清理主备库守护进程的清理,主备才算切换完成。
切换过程中,原主库GRP1_RT_01的守护进程切换日志如下:
2025-10-11 16:02:31.013 [INFO] dmwatcher P0000025576 T0000000000000025615 dw2_group_set_mon_info org info, mid:210076262, cmd:0, dseq:0, code:100, status:0, mon_ip:, rarch_status:0, processed:0, sql:
2025-10-11 16:02:31.013 [INFO] dmwatcher P0000025576 T0000000000000025615 dw2_group_set_mon_info new info, mid:210076262, cmd:0, dseq:2357559919, code:0, status:1, mon_ip:::ffff:192.168.57.103, rarch_status:0, processed:0, sql:
2025-10-11 16:02:31.013 [INFO] dmwatcher P0000025576 T0000000000000025615 dw2_group_process_set_mid, old mid:210076262, new mid:210076262, dseq:2357559919
2025-10-11 16:02:31.127 [INFO] dmwatcher P0000025576 T0000000000000025615 dw2_group_set_mon_info org info, mid:210076262, cmd:0, dseq:2357559919, code:0, status:1, mon_ip:::ffff:192.168.57.103, rarch_status:0, processed:0, sql:
......(省略n行)
2025-10-11 16:02:36.769 [INFO] dmwatcher P0000025576 T0000000000000025580 Local(GRP1_RT_01) ohis is included in remote(GRP1_RT_02) ohis, and apply info matched, l_m_s_i(6), return act(ACT_KEEP)!
2025-10-11 16:02:37.266 [INFO] dmwatcher P0000025576 T0000000000000025580 Local apply info, p_db_magic:431410808, n_apply_ep:1, pkg_seq_arr:[4986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], apply_lsn_arr:[68864, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]!
2025-10-11 16:02:37.266 [INFO] dmwatcher P0000025576 T0000000000000025580 Remote apply info in ctl, ctl_guid:GRP1_RT_02_19, p_db_magic:431410808, n_apply_ep:1, pkg_seq_arr:[4986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], apply_lsn_arr:[68864, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]!
2025-10-11 16:02:37.266 [INFO] dmwatcher P0000025576 T0000000000000025580 Local(GRP1_RT_01) ohis is included in remote(GRP1_RT_02) ohis, and apply info matched, l_m_s_i(6), return act(ACT_KEEP)!
2025-10-11 16:02:39.775 [INFO] dmwatcher P0000025576 T0000000000000025591 Local db apply info changed, old info[p_db_magic:431410808, n_apply_ep:1], new info to set[p_db_magic:1810686012, n_apply_ep:1]!
2025-10-11 16:02:39.775 [INFO] dmwatcher P0000025576 T0000000000000025591 dw2_process_ohis_info, n_ohis changed from 18 to 19, from_ep:GRP1_RT_01, ep_seqno:0, hpc_master:0
2025-10-11 16:02:39.775 [INFO] dmwatcher P0000025576 T0000000000000025591 dw2_process_ohis_info local apply info, p_db_magic:1810686012, n_apply_ep:1, n_fill_ep:1 pkg_seq_arr:[4988, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], apply_lsn_arr:[68955, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]!
2025-10-11 16:02:41.095 [INFO] dmwatcher P0000025576 T0000000000000025747 远程实例的模式、状态或者归档状态发生变化,原状态是:
2025-10-11 16:02:41.095 [INFO] dmwatcher P0000025576 T0000000000000025747 Instance: 守护进程状态(RECOVERY) 实例状态(OK) 实例名(GRP1_RT_02) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(19) FLSN(68955) CLSN(68955) SLSN(68955) SSLSN(68955)
2025-10-11 16:02:41.152 [INFO] dmwatcher P0000025576 T0000000000000025747 远程实例的模式、状态或者归档状态发生变化,新状态是:
2025-10-11 16:02:41.215 [INFO] dmwatcher P0000025576 T0000000000000025747 Instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(GRP1_RT_02) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(19) FLSN(68955) CLSN(68955) SLSN(68955) SSLSN(68955)
主要步骤如下:
新主库GRP1_RT_02的守护进程切换日志如下:
2025-10-11 16:02:31.008 [INFO] dmwatcher P0000010386 T0000000000000010471 dw2_group_set_mon_info org info, mid:210076262, cmd:0, dseq:0, code:100, status:0, mon_ip:, rarch_status:0, processed:0, sql:
2025-10-11 16:02:31.008 [INFO] dmwatcher P0000010386 T0000000000000010471 dw2_group_set_mon_info new info, mid:210076262, cmd:0, dseq:2357559919, code:0, status:1, mon_ip:::ffff:192.168.57.103, rarch_status:0, processed:0, sql:
2025-10-11 16:02:31.008 [INFO] dmwatcher P0000010386 T0000000000000010471 dw2_group_process_set_mid, old mid:210076262, new mid:210076262, dseq:2357559919
2025-10-11 16:02:31.123 [INFO] dmwatcher P0000010386 T0000000000000010556 远程实例的模式、状态或者归档状态发生变化,原状态是:
2025-10-11 16:02:31.123 [INFO] dmwatcher P0000010386 T0000000000000010556 Instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(GRP1_RT_01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(18) FLSN(68864) CLSN(68864) SLSN(68864) SSLSN(68864)
2025-10-11 16:02:31.184 [INFO] dmwatcher P0000010386 T0000000000000010556 远程实例的模式、状态或者归档状态发生变化,新状态是:
2025-10-11 16:02:31.246 [INFO] dmwatcher P0000010386 T0000000000000010556 Instance: 守护进程状态(LOGIN CHECK) 实例状态(OK) 实例名(GRP1_RT_01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(18) FLSN(68864) CLSN(68864) SLSN(68864) SSLSN(68864)
......(省略n行)
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 dw2_clear_ep_cmd_info_low, clear ep(GRP1_RT_02) cmd info, and reset curr_ep to NULL.
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 notify ep(GRP1_RT_02) set dw_stat to NONE success!
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 dw2_clear_ep_cmd_info_low, clear ep(GRP1_RT_02) cmd info.
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 Clear all ep g_dw_status finished, Recovery finished!
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 switch sub_state to sub_stat_start!
2025-10-11 16:02:41.089 [INFO] dmwatcher P0000010386 T0000000000000010390 设置GRP1守护进程为OPEN(SUB:STARTUP)状态
原主库GRP1_RT_01的切换日志如下:
2025-10-11 16:02:31.189 [INFO] database P0000025410 T0000000000000025590 utsk_cmd_add, cmd info: cmd=217, dseq=1760155593, name_in=, begin_lsn=-1!
2025-10-11 16:02:31.190 [INFO] database P0000025410 T0000000000000025590 utsk_set_global_dw_stat, begin, msg_dseq:1760155593
2025-10-11 16:02:31.190 [INFO] database P0000025410 T0000000000000025590 set g_dw_stat from NONE to MON_LOGIN_CHECK success, g_dw_recover_stop is 0
2025-10-11 16:02:31.190 [INFO] database P0000025410 T0000000000000025590 utsk_set_global_dw_stat, finished, msg_dseq:1760155593, set code:0
2025-10-11 16:02:31.316 [INFO] database P0000025410 T0000000000000025590 utsk_cmd_add, cmd info: cmd=13, dseq=1760155594, name_in=, begin_lsn=-1!
2025-10-11 16:02:31.316 [INFO] database P0000025410 T0000000000000025442 utsk_cmd_exec, cmd:13, sys_status:OPEN, dseq:1760155594
2025-10-11 16:02:32.136 [INFO] database P0000025410 T0000000000000025590 utsk_cmd_add, cmd info: cmd=217, dseq=1760155595, name_in=, begin_lsn=-1!
......(省略n行)
2025-10-11 16:02:40.056 [INFO] database P0000025410 T0000000000000025451 mtsk_process_get_lsn_arr_info, mal_post to site(1) (msg_len:326, msg_send code:0), apply lsn filled code:0
2025-10-11 16:02:40.071 [INFO] database P0000025410 T0000000000000025440 rapply_rlog_consecutive_check_for_async, EP[0]'s rpkg->max_lsn[68955] == apply_lsn[68955], ignore this pkg!
2025-10-11 16:02:40.075 [INFO] database P0000025410 T0000000000000025488 Receive mal cmd from ep(GRP1_RT_02) to set local dw_status flag
2025-10-11 16:02:40.075 [INFO] database P0000025410 T0000000000000025488 set g_dw_stat from DW_RECOVERY to NONE success, g_dw_recover_stop is 0
2025-10-11 16:02:41.093 [INFO] database P0000025410 T0000000000000025488 Receive mal cmd from ep(GRP1_RT_02) to set local dw_status flag
2025-10-11 16:02:41.093 [INFO] database P0000025410 T0000000000000025488 set g_dw_stat from NONE to NONE success, g_dw_recover_stop is 0
新主库GRP1_RT_02的切换日志如下:
2025-10-11 16:02:32.429 [INFO] database P0000010219 T0000000000000010401 utsk_cmd_add, received sql exec cmd:1, dseq:1760155593, sql:SP_SET_GLOBAL_DW_STATUS(0, 6)
2025-10-11 16:02:32.429 [INFO] database P0000010219 T0000000000000010401 utsk_cmd_add, cmd info: cmd=1, dseq=1760155593, name_in=, begin_lsn=-1!
2025-10-11 16:02:32.429 [INFO] database P0000010219 T0000000000000010401 set g_dw_stat from NONE to MON_SWITCHOVER success, g_dw_recover_stop is 0
2025-10-11 16:02:32.429 [INFO] database P0000010219 T0000000000000010401 utsk_dw_sql exec SP_SET_GLOBAL_DW_STATUS(0, 6), code=0
2025-10-11 16:02:32.549 [INFO] database P0000010219 T0000000000000010401 utsk_cmd_add, received sql exec cmd:1, dseq:1760155594, sql:SP_APPLY_KEEP_PKG()
2025-10-11 16:02:32.549 [INFO] database P0000010219 T0000000000000010401 utsk_cmd_add, cmd info: cmd=1, dseq=1760155594, name_in=, begin_lsn=-1!
2025-10-11 16:02:32.549 [INFO] database P0000010219 T0000000000000010401 utsk_dw_sql, before enqueue, len:0
2025-10-11 16:02:32.549 [INFO] database P0000010219 T0000000000000010401 utsk_dw_sql, after enqueue, len:1
......(省略n行)
2025-10-11 16:02:33.063 [INFO] database P0000010219 T0000000000000010259 hpc_dw_apply_info_check_make, p_db_magic(431410808), n_apply_ep(1), apply_pkg_seq([4986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), apply_lsn_arr([68864, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), db_magic(1810686012), n_ep(1), apply_info_lsn(68864), pkg_seq_arr([4986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), file_lsn([68864, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), startup_flag(1)
2025-10-11 16:02:33.066 [INFO] database P0000010219 T0000000000000010259 ohis_ctl_add success, ctl info: rguid:GRP1_RT_02_19, sys_mode:PRIMARY, p_iname:GRP1_RT_01, c_iname:GRP1_RT_02, p_db_magic:431410808, c_db_magic:1810686012, n_ep:1, pkg_seqno:[4986], lsn_arr:[68864]
......(省略n行)
2025-10-11 16:18:38.606 [INFO] database P0000010219 T0000000000000010253 mtsk_apply_lsn_info_fill, n_rapply=1, apply_lsn_arr[15]=0!
2025-10-11 16:18:38.606 [INFO] database P0000010219 T0000000000000010253 mtsk_process_get_lsn_arr_info, mal_post to site(0) (msg_len:326, msg_send code:0), apply lsn filled code:0
2025-10-11 16:18:38.619 [INFO] database P0000010219 T0000000000000010251 rapply_rlog_consecutive_check_for_async, EP[0]'s rpkg->max_lsn[69034] == apply_lsn[69034], ignore this pkg!
2025-10-11 16:18:38.622 [INFO] database P0000010219 T0000000000000010295 Receive mal cmd from ep(GRP1_RT_01) to set local dw_status flag
2025-10-11 16:18:38.622 [INFO] database P0000010219 T0000000000000010295 set g_dw_stat from DW_RECOVERY to NONE success, g_dw_recover_stop is 0
2025-10-11 16:18:39.627 [INFO] database P0000010219 T0000000000000010295 Receive mal cmd from ep(GRP1_RT_01) to set local dw_status flag
2025-10-11 16:18:39.628 [INFO] database P0000010219 T0000000000000010295 set g_dw_stat from NONE to NONE success, g_dw_recover_stop is 0
被动切换可以有多种方式模拟主库故障,例如宕机、磁盘故障、网络故障等。
以下测试过程用网卡故障模拟主库的网络故障。
[GRP1_RT_02] 2025-10-28 22:35:04.980 [INFO] dmwatcher P0000001623 T0000000000000001630 没有收到远程守护进程(GRP1_RT_01)消息,原状态为(OPEN),距上次接收消息时间间隔(11)s, 设置远程守护进程为ERROR状态
[GRP1_RT_02] 2025-10-28 22:35:04.980 [INFO] dmwatcher P0000001623 T0000000000000001630 Local instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(GRP1_RT_02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) PO
CNT(30) FLSN(73347) CLSN(73347) SLSN(73347) SSLSN(73347)
[GRP1_RT_02] 2025-10-28 22:35:05.030 [INFO] dmwatcher P0000001623 T0000000000000001630 Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(GRP1_RT_01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(3
0) FLSN(73347) CLSN(73347) SLSN(73347) SSLSN(73347)
[monitor] 2025-10-28 22:35:05: dwmon_check_watch_error_low, receive dmwatcher(GRP1_RT_01) msg timeout(far_err_time:10), close port(ip:192.168.57.102, port:52141, vio:5, n_fixed:1)!
[monitor] 2025-10-28 22:35:05: dwmon tcp port vio(5) close, inst_name:GRP1_RT_01, ip:192.168.57.102, port:52141, n_fixed:1.
[monitor] 2025-10-28 22:35:05: <RECEIVE TIMEOUT GRP1_RT_01>
[monitor] 2025-10-28 22:35:05: 接收守护进程(GRP1_RT_01)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-10-28 22:34:54 ERROR OK GRP1_RT_01 OPEN PRIMARY VALID 30 73347 73347
[monitor] 2025-10-28 22:35:05: </RECEIVE TIMEOUT GRP1_RT_01>
[monitor] 2025-10-28 22:35:05: dwmon_tcp_recv failed, close port, vio:0, mid:1784258418, errno:0, code:-6007
[monitor] 2025-10-28 22:35:05: dwmon tcp port vio(0) close, inst_name:, ip:192.168.57.102, port:52141, n_fixed:1.
[monitor] 2025-10-28 22:35:05: dwmon_inst_can_takeover_by_prim_info, check_pstat = 1, n_pinfo = 1, but ok_cnt = 0, return FALSE!
[monitor] 2025-10-28 22:35:05: [!!! 实例GRP1_RT_01[PRIMARY, OPEN, ISTAT_SAME:TRUE]仍然处于活动状态,或者模式/状态或LSN值不符合被其他实例自动接管条件 !!!]
[monitor] 2025-10-28 22:35:05: [!!! Check primary[GRP1_RT_01] auto_failover is TRUE in watcher error, but find no standby instance can do takeover !!!]
[monitor] 2025-10-28 22:35:07: dwmon_inst_can_takeover_by_prim_info, primary name: GRP1_RT_01, instance name: GRP1_RT_02, n_pinfo = 1, ok_cnt = 1, return TRUE!
[monitor] 2025-10-28 22:35:07: [!!! 实例GRP1_RT_01[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障,实例GRP1_RT_02[STANDBY, OPEN, ISTAT_SAME:TRUE]符合自动接管条件 !!!]
[monitor] 2025-10-28 22:35:07: dwmon_grp_auto_takeover_info_get, get tkov info of GRP1 success, n_inst = 1.
[monitor] 2025-10-28 22:35:07: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2025-10-28 22:35:07: <AUTO TAKEOVER GRP1_RT_02>
[monitor] 2025-10-28 22:35:07: 通知组(GRP1)当前活动的守护进程设置MID
[monitor] 2025-10-28 22:35:07: Begin to wait site(GRP1_RT_02) complete...
[monitor] 2025-10-28 22:35:07: [!!! dwmon_tcp_msg_send, get master tcp port failed, send cmd msg(cmd:5, name_sendto:GRP1) to dmwatcher() !!!]
[monitor] 2025-10-28 22:35:07: Wait site(GRP1_RT_02) finished, code=0!
[monitor] 2025-10-28 22:35:07: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor] 2025-10-28 22:35:07: 开始使用实例GRP1_RT_02接管
[monitor] 2025-10-28 22:35:07: 通知守护进程GRP1_RT_02切换TAKEOVER状态
[monitor] 2025-10-28 22:35:07: 守护进程(GRP1_RT_02)状态切换 [OPEN-->TAKEOVER]
[monitor] 2025-10-28 22:35:07: 切换守护进程GRP1_RT_02为TAKEOVER状态成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-10-28 22:35:07: Begin to wait site(GRP1_RT_02) complete...
[monitor] 2025-10-28 22:35:07: Wait site(GRP1_RT_02) finished, code=0!
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-10-28 22:35:07: 通知实例GRP1_RT_02修改所有归档状态无效
[monitor] 2025-10-28 22:35:07: 修改所有实例归档为无效状态成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-10-28 22:35:07: ohis_inst_info_copy_low, inst(GRP1_RT_02) apply info changed, old info[p_db_magic:431410808, n_apply_ep:1], new info to set[p_db_magic:1810686012, n_apply_ep:0]!
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-10-28 22:35:07: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2025-10-28 22:35:08: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2025-10-28 22:35:08: 通知守护进程GRP1_RT_02切换OPEN状态
[monitor] 2025-10-28 22:35:08: 守护进程(GRP1_RT_02)状态切换 [TAKEOVER-->OPEN]
[monitor] 2025-10-28 22:35:08: 切换守护进程GRP1_RT_02为OPEN状态成功
[monitor] 2025-10-28 22:35:08: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-10-28 22:35:08: Notify instance(GRP1_RT_02) to clear monitor info and wait complete!
[monitor] 2025-10-28 22:35:08: Begin to wait site(GRP1_RT_02) complete...
[monitor] 2025-10-28 22:35:08: dwmon_cmd_msg_send_low failed, get tcp_port failed(inst_name:, ip:192.168.57.102, port:52141)!
[monitor] 2025-10-28 22:35:08: [!!! dwmon_tcp_msg_send to master tcp_port failed, code:-6010, (inst_name:, ip:192.168.57.102, port:52141, vio:0) !!!]
[monitor] 2025-10-28 22:35:08: 清理守护进程(GRP1_RT_02)请求成功
[monitor] 2025-10-28 22:35:08: Wait site(GRP1_RT_02) finished, code=0!
[monitor] 2025-10-28 22:35:08: 使用实例GRP1_RT_02接管成功
[monitor] 2025-10-28 22:35:08: </AUTO TAKEOVER GRP1_RT_02>
[monitor] 2025-10-28 22:35:08: 组(GRP1)使用实例GRP1_RT_02自动接管成功
监视器主要执行过程如下:
整个切换过程耗时4秒左右
2025-10-28 22:35:05.318 [INFO] dmwatcher P0000001621 T0000000000000001626 没有收到远程守护进程(GRP1_RT_02)消息,原状态为(OPEN),距上次接收消息时间间隔(11)s, 设置远程守护进程为ERROR状态
2025-10-28 22:35:05.318 [INFO] dmwatcher P0000001621 T0000000000000001626 Local instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(GRP1_RT_01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(30) FLSN(73347) CLSN(73347) SLSN(73347) SSLSN(73347)
2025-10-28 22:35:05.369 [INFO] dmwatcher P0000001621 T0000000000000001626 Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(GRP1_RT_02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(30) FLSN(73347) CLSN(73347) SLSN(73347) SSLSN(73347)
2025-10-28 22:35:05.419 [WARNING] dmwatcher P0000001621 T0000000000000001626 Tcp port from dmwatcher vio(6) close, ip:192.168.57.103, port:52142, mid(-1), is confirm(FALSE), from name:GRP1_RT_02
2025-10-28 22:35:05.420 [INFO] dmwatcher P0000001621 T0000000000000001626 dw2_tcp_port_close_low, reset report_err to TRUE, ip:192.168.57.103, port:52142!
2025-10-28 22:35:05.420 [INFO] dmwatcher P0000001621 T0000000000000002005 dw2_group_tcp_recv_thread, receive tcp msg failed, close tcp port, vio:0, mid:-1, from_flag:0, from_name:GRP1_RT_02, dw_closed:1, ip:192.168.57.103, port:52142, errno:0, error:Failure occurs in data_recv_inet_once, code(0) len(8128, 0), code:-6007
2025-10-28 22:35:05.420 [INFO] dmwatcher P0000001621 T0000000000000002005 dw2_tcp_port_close_low, reset report_err to TRUE, ip:192.168.57.103, port:52142!
2025-10-28 22:35:05.420 [INFO] dmwatcher P0000001621 T0000000000000002004 dw2_tcp_port_close_low, reset report_err to TRUE, ip:192.168.57.103, port:52142!
2025-10-28 22:35:11.212 [ERROR] dmwatcher P0000001621 T0000000000000001628 Can't connect to DM server on '192.168.57.103' port(52142) errno(115)
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000001626 Tcp msg send over (10)s, close tcp port(from_flag: 1, vio: 7)
2025-10-28 22:35:35.338 [WARNING] dmwatcher P0000001621 T0000000000000001626 Tcp port from dmmonitor vio(7) close, ip:::ffff:192.168.57.103, port:50950, mid(1784258418), is confirm(TRUE), from name:dmmonitor
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000001626 dw2_tcp_port_close_low, reset report_err to TRUE, ip:, port:0!
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002007 dw2_group_send_tcp_msg failed, from flag:1, from name:dmmonitor, close tcp port
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002007 dw2_tcp_port_close_low, reset report_err to TRUE, ip:, port:0!
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002007 dw2_group_info_send, send to port(from flag:1, from name:dmmonitor, ip:::ffff:192.168.57.103) over 5s, code:-6007!
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002007 dw2_group_tcp_send_thread, code:-6007, from flag:1, from name:dmmonitor, ip:::ffff:192.168.57.103
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002007 dw2_tcp_port_close_low, reset report_err to TRUE, ip:, port:0!
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002008 dw2_group_tcp_recv_thread, receive tcp msg failed, close tcp port, vio:0, mid:-1, from_flag:1, from_name:dmmonitor, dw_closed:1, ip:::ffff:192.168.57.103, port:50950, errno:0, error:Failure occurs in data_recv_inet_once, code(0) len(8128, 0), code:-6007
2025-10-28 22:35:35.338 [INFO] dmwatcher P0000001621 T0000000000000002008 dw2_tcp_port_close_low, reset report_err to TRUE, ip:, port:0!
原主库守护进程将自己的状态置为ERROR状态,并不断尝试连接备库实例
2025-10-28 22:35:04.980 [INFO] dmwatcher P0000001623 T0000000000000001630 没有收到远程守护进程(GRP1_RT_01)消息,原状态为(OPEN),距上次接收消息时间间隔(11)s, 设置远程守护进程为ERROR状态
2025-10-28 22:35:04.980 [INFO] dmwatcher P0000001623 T0000000000000001630 Local instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(GRP1_RT_02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(30) FLSN(73347)
CLSN(73347) SLSN(73347) SSLSN(73347)
2025-10-28 22:35:05.030 [INFO] dmwatcher P0000001623 T0000000000000001630 Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(GRP1_RT_01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(30) FLSN(73347) CLSN
(73347) SLSN(73347) SSLSN(73347)
2025-10-28 22:35:05.081 [WARNING] dmwatcher P0000001623 T0000000000000001630 Tcp port from dmwatcher vio(4) close, ip:::ffff:192.168.57.102, port:55472, mid(0), is confirm(FALSE), from name:GRP1_RT_01
2025-10-28 22:35:05.081 [INFO] dmwatcher P0000001623 T0000000000000001630 dw2_tcp_port_close_low, reset report_err to TRUE, ip:, port:0!
2025-10-28 22:35:05.081 [INFO] dmwatcher P0000001623 T0000000000000001643 dw2_group_tcp_recv_thread, receive tcp msg failed, close tcp port, vio:0, mid:-1, from_flag:0, from_name:GRP1_RT_01, dw_closed:1, ip:::
......(省略n行)
dseq:3931742076, code:100, status:1, mon_ip:::ffff:192.168.57.103, rarch_status:0, processed:0, sql:
2025-10-28 22:35:08.018 [INFO] dmwatcher P0000001623 T0000000000000001630 收到监视器通知守护进程切换OPEN状态命令
2025-10-28 22:35:08.018 [INFO] dmwatcher P0000001623 T0000000000000001630 After monitor execute SWITCHOVER/TAKEOVER/OPEN INSTANCE command, reset all standby instance recover info
2025-10-28 22:35:08.018 [INFO] dmwatcher P0000001623 T0000000000000001630 switch sub_state to sub_stat_start!
2025-10-28 22:35:08.018 [INFO] dmwatcher P0000001623 T0000000000000001630 设置GRP1守护进程为OPEN(SUB:STARTUP)状态
2025-10-28 22:35:08.069 [INFO] dmwatcher P0000001623 T0000000000000001630 dw2_set_mon_info_code_low, set code:0, current dseq:3931742076, last_code:100, mid:1784258418!
2025-10-28 22:35:08.127 [INFO] dmwatcher P0000001623 T0000000000000001891 dw2_group_reset_mon_info, in dw2 process mon clear, from mid:1784258418, old mid:1784258418, new mid:0
2025-10-28 22:35:08.127 [INFO] dmwatcher P0000001623 T0000000000000001891 收到监视器CLEAR命令
2025-10-28 22:35:08.127 [INFO] dmwatcher P0000001623 T0000000000000001891 监视器CLEAR命令执行成功
主要步骤如下:
2025-10-28 22:35:09.941 [INFO] database P0000001613 T0000000000000001948 mal_site_link_check->mal_recv_force_wait failed, code:-6010!, wait_time:10
2025-10-28 22:35:09.941 [WARNING] database P0000001613 T0000000000000001948 [!!!DSC INFO!!!]Network connect to IP address 192.168.57.103 has been disconnected.
2025-10-28 22:35:09.941 [WARNING] database P0000001613 T0000000000000001948 [!!!DSC INFO!!!]Site(0) or site(1) interface with IP address no longer running or dsc_ep service on mal_site[192.168.57.103:61142] has broken.It cause network disconnection.
2025-10-28 22:35:10.941 [INFO] database P0000001613 T0000000000000001948 mal_site_link_check->mal_recv_force_wait failed, code:-6010!, wait_time:1
2025-10-28 22:35:10.941 [WARNING] database P0000001613 T0000000000000001948 [!!!DSC INFO!!!]Network connect to IP address 192.168.57.103 has been disconnected.
2025-10-28 22:35:10.941 [WARNING] database P0000001613 T0000000000000001948 [!!!DSC INFO!!!]Site(0) or site(1) interface with IP address no longer running or dsc_ep service on mal_site[192.168.57.103:61142] has broken.It cause network disconnection.
2025-10-28 22:35:10.941 [INFO] database P0000001613 T0000000000000001948 mal_site_link_check from mal_site(0) to mal_site(1) failed, call mal_site_port_close!
2025-10-28 22:35:10.941 [WARNING] database P0000001613 T0000000000000001948 MAL mgr thrd site(0) lost connect to site(1), ctl_handle(53), data_handle(54), dsc_handle(0)
2025-10-28 22:35:10.941 [WARNING] database P0000001613 T0000000000000001942 mal_site_tsk_check site(0) connect lost to site(1), socket handle = 0, mal sys status = 0, try get port again
2025-10-28 22:35:10.941 [INFO] database P0000001613 T0000000000000001942 send CMD_MAL_LINK_CHECK(350): (mal_id:12044914, stmt_id:65676, mppexec_id:0, pln_op_id:65535, org_site :0, src_site:0, dest_site:1, build_time:-1)
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001943 mal_site_letter_recv code=-6007, errno=0, site(0) recv from site(1) failed, socket handle = 0
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001943 MAL receive site(0) lost connect to site(1), ctl_handle(0), data_handle(0), dsc_handle(0)
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001944 mal_site_letter_recv code=-6007, errno=0, site(0) recv from site(1) failed, socket handle = 0
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001943 site(0) ctl_link mal_site_letter_recv from site(1) failed, socket handle = 0, mal sys status is 0, try to get mal_port again
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001944 MAL receive site(0) lost connect to site(1), ctl_handle(0), data_handle(0), dsc_handle(0)
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001944 site(0) data_link mal_site_letter_recv from site(1) failed, socket handle = 0, mal sys status is 0, try to get mal_port again
2025-10-28 22:35:10.942 [WARNING] database P0000001613 T0000000000000001941 mal_site_tsk_check site(0) connect lost to site(1), socket handle = 0, mal sys status = 0, try get port again
2025-10-28 22:35:10.942 [INFO] database P0000001613 T0000000000000001941 send CMD_MAL_LINK_CHECK(350): (mal_id:12044914, stmt_id:65677, mppexec_id:0, pln_op_id:65535, org_site :0, src_site:0, dest_site:1, build_time:175)
2025-10-28 22:35:10.942 [INFO] database P0000001613 T0000000000000001941 send CMD_MAL_LINK_CHECK(350): (mal_id:14205394, stmt_id:65679, mppexec_id:0, pln_op_id:65535, org_site :0, src_site:0, dest_site:1, build_time:-1)
2025-10-28 22:35:10.942 [INFO] database P0000001613 T0000000000000001941 send CMD_MAL_LINK_CHECK(350): (mal_id:12044914, stmt_id:65680, mppexec_id:0, pln_op_id:65535, org_site :0, src_site:0, dest_site:1, build_time:138)
2025-10-28 22:35:26.059 [ERROR] database P0000001613 T0000000000000001948 Can't connect to DM server on '192.168.57.103' port(61142) errno(115)
主要步骤如下:
2025-10-28 22:35:05.955 [INFO] database P0000001619 T0000000000000001959 mal_site_link_check->mal_recv_force_wait failed, code:-6010!, wait_time:10
2025-10-28 22:35:05.955 [WARNING] database P0000001619 T0000000000000001959 [!!!DSC INFO!!!]Network connect to IP address 192.168.57.102 has been disconnected.
2025-10-28 22:35:05.955 [WARNING] database P0000001619 T0000000000000001959 [!!!DSC INFO!!!]Site(1) or site(0) interface with IP address no longer running or dsc_ep service on mal_site[192.168.57.102:61141] ha
s broken.It cause network disconnection.
2025-10-28 22:35:06.955 [INFO] database P0000001619 T0000000000000001959 mal_site_link_check->mal_recv_force_wait failed, code:-6010!, wait_time:1
2025-10-28 22:35:06.955 [WARNING] database P0000001619 T0000000000000001959 [!!!DSC INFO!!!]Network connect to IP address 192.168.57.102 has been disconnected.
2025-10-28 22:35:06.956 [WARNING] database P0000001619 T0000000000000001959 [!!!DSC INFO!!!]Site(1) or site(0) interface with IP address no longer running or dsc_ep service on mal_site[192.168.57.102:61141] ha
s broken.It cause network disconnection.
2025-10-28 22:35:06.956 [INFO] database P0000001619 T0000000000000001959 mal_site_link_check from mal_site(1) to mal_site(0) failed, call mal_site_port_close!
2025-10-28 22:35:06.956 [WARNING] database P0000001619 T0000000000000001959 MAL mgr thrd site(1) lost connect to site(0), ctl_handle(8), data_handle(9), dsc_handle(0)
......(省略n行)
2025-10-28 22:35:07.921 [INFO] database P0000001619 T0000000000000001693 altdb open pseg recv end, code = 0!
2025-10-28 22:35:07.921 [INFO] database P0000001619 T0000000000000001693 nsvr_process_before_open begin.
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 nsvr_process_before_open success.
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 altdb open rlog_flush_notify_ex start!
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 altdb open rlog_flush_notify_ex end!
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 alter database open success!
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 utsk_process_dw_sql_exec, prepare sql:ALTER DATABASE OPEN FORCE, code:0, dseq:1761661753
2025-10-28 22:35:07.937 [INFO] database P0000001619 T0000000000000001693 hpc_clear_active_trx adjust n_crash_active_trx from 0 to 0.
2025-10-28 22:35:07.941 [INFO] database P0000001619 T0000000000000001977 purge over, total 0 trx, 0 pages purged
2025-10-28 22:35:07.941 [INFO] database P0000001619 T0000000000000001977 n_roll_trx:0, n_purged_trx:0, n_active_trx:0, n_cmt_trx:0, n_pre_cmt_trx:0, n_to_release_trx:0
2025-10-28 22:35:07.966 [INFO] database P0000001619 T0000000000000002011 utsk_cmd_add, received sql exec cmd:1, dseq:1761661754, sql:SP_SET_GLOBAL_DW_STATUS(7, 0)
2025-10-28 22:35:07.966 [INFO] database P0000001619 T0000000000000002011 utsk_cmd_add, cmd info: cmd=1, dseq=1761661754, name_in=, begin_lsn=-1!
2025-10-28 22:35:07.966 [INFO] database P0000001619 T0000000000000002011 set g_dw_stat from MON_TAKEOVER to NONE success, g_dw_recover_stop is 0
2025-10-28 22:35:07.966 [INFO] database P0000001619 T0000000000000002011 utsk_dw_sql exec SP_SET_GLOBAL_DW_STATUS(7, 0), code=0
2025-10-28 22:35:08.118 [INFO] database P0000001619 T0000000000000001689 trx4_min_tid_collect set min_active_id_opt, min_active_id: 158450, first_tid: 158445
主要步骤如下:
主备模式下的监视器还有很多高可用功能需要深入去了解,例如探测主库备库是否为在线状态机制、网络不稳定状态下的主备切换过程、如何防止脑裂问题等,需要在实际使用过程中去探究深层次的原理进行后续实践问题解决。
文章
阅读量
获赞
