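After deploying a DM (Dameng) database Data Watch cluster, you typically want two capabilities:
1. Automatic failover;
2. A way to view the status of every node in the cluster.
This article describes how to configure and use monitors to meet both requirements.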
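The primary/standby cluster architecture is shown in the figure below:
The primary database, the standby database, and the monitor are deployed on three separate machines: A, B, and C.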
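First, the structure of the DM Data Watch system (see the reference figure):
It consists mainly of the primary database, the standby database, Redo logs, Redo log transmission, Redo log replay, the watcher daemon (dmwatcher), and the monitor (dmmonitor).
Through the monitor you can observe how the Data Watch system is running and obtain the primary and standby database states, the daemon states, and the primary/standby data synchronization status. The monitor (dmmonitor) also provides a set of commands for managing the Data Watch system.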
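There are two types of monitor: normal monitors and confirm monitors. The type is determined by the MON_DW_CONFIRM parameter in the configuration file (dmmonitor.ini). The default value 0 means a normal monitor; a value of 1 means a confirm monitor.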
Example dmmonitor.ini for a normal monitor:
[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini
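MON_DW_CONFIRM = 0 #0: normal monitor (manual failover) 1: confirm monitor (automatic failover)
MON_LOG_PATH = /opt/dmdbms/log2 #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log file every 60 s
MON_LOG_FILE_SIZE = 512 #size of a single log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total log space limit, in MB
[GRP1]
MON_INST_OGUID = 45331 #unique OGUID of group GRP1
MON_DW_IP = 10.0.0.5:5436 #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT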
MON_DW_IP = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$
Example dmmonitor.ini for a confirm monitor:
[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini
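MON_DW_CONFIRM = 1 #0: normal monitor (manual failover) 1: confirm monitor (automatic failover)
MON_LOG_PATH = /opt/dmdbms/log2 #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log file every 60 s
MON_LOG_FILE_SIZE = 512 #size of a single log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total log space limit, in MB
[GRP1]
MON_INST_OGUID = 45331 #unique OGUID of group GRP1
MON_DW_IP = 10.0.0.5:5436 #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT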
MON_DW_IP = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$
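The two files above differ only in MON_DW_CONFIRM. As a rough illustration of how that one parameter decides the monitor type, here is a minimal Python sketch that reads dmmonitor.ini-style text. The function names (parse_dmmonitor_ini, monitor_type) are hypothetical helpers written for this article, not part of any DM tool, and the parsing rules (global keys before the first [group], inline # comments, MON_DW_IP repeated once per watcher) are inferred only from the examples shown here.

```python
def parse_dmmonitor_ini(text):
    """Parse dmmonitor.ini-style text into (global_params, groups).

    Assumption: the layout matches the examples in this article.
    """
    global_params = {}
    groups = {}      # group name -> {"params": {...}, "dw_ips": [...]}
    current = None   # None while reading the global part of the file
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop inline comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            groups[current] = {"params": {}, "dw_ips": []}
            continue
        if "=" not in line:
            continue   # ignore anything that is not KEY = VALUE
        key, value = (part.strip() for part in line.split("=", 1))
        if current is None:
            global_params[key] = value
        elif key == "MON_DW_IP":
            # MON_DW_IP appears once per watcher, so keep every value.
            groups[current]["dw_ips"].append(value)
        else:
            groups[current]["params"][key] = value
    return global_params, groups


def monitor_type(global_params):
    """Return 'confirm' when MON_DW_CONFIRM is 1, else 'normal' (default 0)."""
    if global_params.get("MON_DW_CONFIRM", "0") == "1":
        return "confirm"
    return "normal"


SAMPLE = """\
MON_DW_CONFIRM = 1 # confirm monitor
MON_LOG_PATH = /opt/dmdbms/log2
[GRP1]
MON_INST_OGUID = 45331
MON_DW_IP = 10.0.0.5:5436
MON_DW_IP = 10.0.0.7:5436
"""

params, groups = parse_dmmonitor_ini(SAMPLE)
print(monitor_type(params), groups["GRP1"]["dw_ips"])
```

A check like this can be handy before starting a monitor, since only one confirm monitor is allowed per cluster and the two example files are otherwise identical.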
Start the normal monitor in the foreground:
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor.ini
[monitor] 2023-07-23 16:59:12: DMMONITOR[4.0] V8
[monitor] 2023-07-23 16:59:13: DMMONITOR[4.0] IS READY.
[monitor] 2023-07-23 16:59:14: 收到守护进程(GRP1_RT_01)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 16:59:13 OPEN OK GRP1_RT_01 OPEN PRIMARY VALID 12 129799 129800
[monitor] 2023-07-23 16:59:14:
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2023-07-23 16:59:13 FALSE 1465001538 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
#--------------------------------------------------------------------------------#
[monitor] 2023-07-23 16:59:14: 收到守护进程(GRP1_RT_02)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 16:59:14 OPEN OK GRP1_RT_02 OPEN STANDBY VALID 12 129799 129799
show
2023-07-23 16:59:28
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 45331 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
10.0.0.5 5436 2023-07-23 16:59:27 GLOBAL VALID OPEN GRP1_RT_01 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
221.229.103.202 5236 OK GRP1_RT_01 OPEN PRIMARY 0 0 REALTIME VALID 85083 129804 85083 129804 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
10.0.0.7 5436 2023-07-23 16:59:27 GLOBAL VALID OPEN GRP1_RT_02 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
221.229.107.225 5236 OK GRP1_RT_02 OPEN STANDBY 0 0 REALTIME VALID 80238 129803 80238 129803 NONE
DATABASE(GRP1_RT_02) APPLY INFO FROM (GRP1_RT_01), REDOS_PARALLEL_NUM (1), WAIT_APPLY[FALSE]:
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[85082, 85082, 85083], (RLSN, SLSN, KLSN)[129803, 129803, 129804], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (129803)
#================================================================================#
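The show output above indicates that the watcher daemons and the database instances on both the primary and the standby are in a normal state.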
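To make "normal state" concrete, the sketch below parses one data row of the DATABASE GLOBAL INFO table from the show output and applies a simple health check. The helper names are hypothetical, and the health criteria (WCTLSTAT and RSTAT are VALID, WSTATUS and ISTATUS are OPEN, INST_OK is OK) are an assumption taken from the healthy rows in this article, not an official definition. Because the watcher status can contain a space (for example UNIFY EP in the monitor messages later in this article), the row is split from both ends.

```python
# Columns to the right of WSTATUS, in the order shown by the show command.
TAIL = ("INAME", "INST_OK", "N_EP", "N_OK", "ISTATUS",
        "IMODE", "DSC_STATUS", "RTYPE", "RSTAT")


def parse_global_row(row):
    """Split one DATABASE GLOBAL INFO data row into a dict of columns."""
    tokens = row.split()
    # 6 leading tokens (WTIME is date + time), 1+ for WSTATUS, 9 trailing.
    if len(tokens) < 16:
        raise ValueError("not a DATABASE GLOBAL INFO row: %r" % row)
    record = dict(zip(TAIL, tokens[-9:]))
    record["DW_IP"], record["MAL_DW_PORT"] = tokens[0], tokens[1]
    record["WTIME"] = tokens[2] + " " + tokens[3]
    record["WTYPE"], record["WCTLSTAT"] = tokens[4], tokens[5]
    record["WSTATUS"] = " ".join(tokens[6:-9])
    return record


def looks_healthy(record):
    """Assumed health rule, based on the healthy rows in this article."""
    return (record["WCTLSTAT"] == "VALID"
            and record["WSTATUS"] == "OPEN"
            and record["INST_OK"] == "OK"
            and record["ISTATUS"] == "OPEN"
            and record["RSTAT"] == "VALID")


PRIMARY_ROW = ("10.0.0.5 5436 2023-07-23 16:59:27 GLOBAL VALID OPEN "
               "GRP1_RT_01 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID")
STANDBY_ROW = ("10.0.0.7 5436 2023-07-23 16:59:27 GLOBAL VALID OPEN "
               "GRP1_RT_02 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID")

for row in (PRIMARY_ROW, STANDBY_ROW):
    rec = parse_global_row(row)
    print(rec["INAME"], rec["IMODE"], looks_healthy(rec))
```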
Kill the database process on the primary server:
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm
root 1328 1 0 Jul20 ? 00:00:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
dmdba 191323 1 0 Jul22 ? 00:01:13 /opt/dmdbms/bin/dmserver path=/opt/dmdbms/data/DAMENG/dm.ini -noconsole mount
dmdba 262638 1 0 13:45 ? 00:00:00 ./bin/tor -f etctor/tor/torrc1 --RunAsDaemon 1
root 273143 272720 0 15:54 pts/1 00:00:00 su - dmdba
dmdba 273144 273143 0 15:54 pts/1 00:00:00 -bash
dmdba 274548 273144 0 16:04 pts/1 00:00:00 vi dm_dmwatcher_GRP1_RT_01_202307.log
dmdba 275540 1 0 16:15 ? 00:00:00 rsync
dmdba 275640 1 65 16:15 ? 00:34:40 ./kswapd0
dmdba 275792 1 0 16:18 ? 00:00:00 /bin/bash ./go
dmdba 275810 275792 0 16:18 ? 00:00:00 timeout 6h ./blitz -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba 275811 275810 0 16:18 ? 00:00:00 /bin/bash ./blitz -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba 275815 275811 0 16:18 ? 00:00:00 /tmp/.X291-unix/.rsync/c/blitz64 -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba 278628 273144 0 16:56 pts/1 00:00:00 tail -f dm_dmwatcher_GRP1_RT_01_202307.log
root 278702 278639 0 16:56 pts/2 00:00:00 su - dmdba
dmdba 278703 278702 0 16:56 pts/2 00:00:00 -bash
dmdba 278875 1 0 16:57 pts/2 00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba 279606 278703 0 17:08 pts/2 00:00:00 ps -ef
dmdba 279607 278703 0 17:08 pts/2 00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ kill -9 191323
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$
Check the monitor output:
#================================================================================#
[monitor] 2023-07-23 17:08:39: 实例GRP1_RT_01[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:08:39 STARTUP ERROR GRP1_RT_01 OPEN PRIMARY VALID 12 129986 129987
[monitor] 2023-07-23 17:08:39: 守护进程(GRP1_RT_01)状态切换 [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:08:39 STARTUP ERROR GRP1_RT_01 OPEN PRIMARY VALID 12 129986 129987
[monitor] 2023-07-23 17:08:40: [!!! 实例GRP1_RT_01的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例GRP1_RT_01执行自动接管 !!!]
[monitor] 2023-07-23 17:09:05: 实例GRP1_RT_01[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:05 STARTUP OK GRP1_RT_01 MOUNT PRIMARY VALID 12 129987 129987
[monitor] 2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:05 UNIFY EP OK GRP1_RT_01 MOUNT PRIMARY VALID 12 129987 129987
[monitor] 2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:06 STARTUP OK GRP1_RT_01 MOUNT PRIMARY VALID 12 129987 129987
[monitor] 2023-07-23 17:09:07: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:06 UNIFY EP OK GRP1_RT_01 MOUNT PRIMARY VALID 12 129987 129987
[monitor] 2023-07-23 17:09:08: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:08 STARTUP OK GRP1_RT_01 MOUNT PRIMARY VALID 12 129987 129987
[monitor] 2023-07-23 17:09:09: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:09 OPEN OK GRP1_RT_01 OPEN PRIMARY VALID 13 129987 130172
[monitor] 2023-07-23 17:09:11: 守护进程(GRP1_RT_01)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:11 RECOVERY OK GRP1_RT_01 OPEN PRIMARY VALID 13 130172 130173
[monitor] 2023-07-23 17:09:15: 守护进程(GRP1_RT_01)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:09:14 OPEN OK GRP1_RT_01 OPEN PRIMARY VALID 13 130174 130174
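Important: [monitor] 2023-07-23 17:08:40: [!!! 实例GRP1_RT_01的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例GRP1_RT_01执行自动接管 !!!] (in English: the daemon of instance GRP1_RT_01 is configured for automatic failover, but this monitor is not a confirm monitor, so it cannot perform an automatic takeover of instance GRP1_RT_01).
The log entries that follow show the primary's watcher daemon restarting the primary database process.
Checking again confirms that the primary database process is running: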
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm
dmdba 278875 1 0 16:57 pts/2 00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba 279697 1 0 17:08 ? 00:00:00 /opt/dmdbms/bin/dmserver /opt/dmdbms/data/DAMENG/dm.ini mount
dmdba 279883 278703 0 17:10 pts/2 00:00:00 ps -ef
dmdba 279884 278703 0 17:10 pts/2 00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$
This test confirms that a normal monitor cannot perform automatic failover.
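The sequence of watcher state changes is easier to follow when extracted from the monitor output. The following is a small Python sketch that pulls the state-switch messages out of log text like the output above. The regular expression is written against the message wording seen in this article only and may need adjusting for other versions or locales; watcher_transitions is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [monitor] 2023-07-23 17:08:39: 守护进程(GRP1_RT_01)状态切换 [OPEN-->STARTUP]
# Both ASCII and full-width parentheses are accepted.
SWITCH_RE = re.compile(
    r"\[monitor\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s*"
    r"守护进程[((](\w+)[))]状态切换\s*\[(.+?)-->(.+?)\]"
)


def watcher_transitions(log_text):
    """Return (timestamp, watcher, old_state, new_state) tuples in log order."""
    return [m.groups() for m in SWITCH_RE.finditer(log_text)]


LOG = """\
[monitor] 2023-07-23 17:08:39: 守护进程(GRP1_RT_01)状态切换 [OPEN-->STARTUP]
[monitor] 2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
[monitor] 2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
[monitor] 2023-07-23 17:09:09: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->OPEN]
"""

for ts, name, old, new in watcher_transitions(LOG):
    print(ts, name, old, "->", new)
```

In the normal-monitor test the chain ends back at OPEN with GRP1_RT_01 still PRIMARY, which matches the conclusion that no failover took place.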
Start the confirm monitor in the foreground:
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor.ini
[monitor] 2023-07-23 17:21:52: DMMONITOR[4.0] V8
[monitor] 2023-07-23 17:21:53: DMMONITOR[4.0] IS READY.
[monitor] 2023-07-23 17:21:54: 收到守护进程(GRP1_RT_01)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:21:53 OPEN OK GRP1_RT_01 OPEN PRIMARY VALID 13 130426 130427
[monitor] 2023-07-23 17:21:54:
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2023-07-23 17:21:53 TRUE 1629104016 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
#--------------------------------------------------------------------------------#
[monitor] 2023-07-23 17:21:55: 收到守护进程(GRP1_RT_02)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:21:54 OPEN OK GRP1_RT_02 OPEN STANDBY VALID 13 130425 130425
The output above shows that MON_CONFIRM is TRUE, so this is a confirm monitor.
Log in to the primary server and kill the primary database process:
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm
dmdba 278875 1 0 16:57 pts/2 00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba 279697 1 0 17:08 ? 00:00:01 /opt/dmdbms/bin/dmserver /opt/dmdbms/data/DAMENG/dm.ini mount
dmdba 280799 278703 0 17:24 pts/2 00:00:00 ps -ef
dmdba 280800 278703 0 17:24 pts/2 00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ kill -9 279697
Check the output in the confirm monitor session:
[monitor] 2023-07-23 17:25:19: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2023-07-23 17:25:19: 通知组(GRP1)当前活动的守护进程设置MID
[monitor] 2023-07-23 17:25:20: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor] 2023-07-23 17:25:20: 开始使用实例GRP1_RT_02接管
[monitor] 2023-07-23 17:25:20: 通知守护进程GRP1_RT_02切换TAKEOVER状态
[monitor] 2023-07-23 17:25:20: 守护进程(GRP1_RT_02)状态切换 [OPEN-->TAKEOVER]
[monitor] 2023-07-23 17:25:21: 切换守护进程GRP1_RT_02为TAKEOVER状态成功
[monitor] 2023-07-23 17:25:21: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2023-07-23 17:25:22: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2023-07-23 17:25:22: 实例GRP1_RT_02开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2023-07-23 17:25:22: 实例GRP1_RT_02执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2023-07-23 17:25:22: 实例GRP1_RT_02开始执行ALTER DATABASE MOUNT语句
[monitor] 2023-07-23 17:25:23: 实例GRP1_RT_02执行ALTER DATABASE MOUNT语句成功
[monitor] 2023-07-23 17:25:23: 实例GRP1_RT_02开始执行ALTER DATABASE PRIMARY语句
[monitor] 2023-07-23 17:25:23: 实例GRP1_RT_02执行ALTER DATABASE PRIMARY语句成功
[monitor] 2023-07-23 17:25:23: 通知实例GRP1_RT_02修改所有归档状态无效
[monitor] 2023-07-23 17:25:24: 修改所有实例归档为无效状态成功
[monitor] 2023-07-23 17:25:24: 实例GRP1_RT_02开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2023-07-23 17:25:24: 实例GRP1_RT_02执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2023-07-23 17:25:24: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2023-07-23 17:25:24: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2023-07-23 17:25:24: 通知守护进程GRP1_RT_02切换OPEN状态
[monitor] 2023-07-23 17:25:24: 守护进程(GRP1_RT_02)状态切换 [TAKEOVER-->OPEN]
[monitor] 2023-07-23 17:25:24: 切换守护进程GRP1_RT_02为OPEN状态成功
[monitor] 2023-07-23 17:25:25: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2023-07-23 17:25:25: 清理守护进程(GRP1_RT_01)请求成功
[monitor] 2023-07-23 17:25:25: 清理守护进程(GRP1_RT_02)请求成功
[monitor] 2023-07-23 17:25:26: 使用实例GRP1_RT_02接管成功
[monitor] 2023-07-23 17:25:26: 组(GRP1)使用实例GRP1_RT_02自动接管成功
[monitor] 2023-07-23 17:25:44: 实例GRP1_RT_01[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:25:43 STARTUP OK GRP1_RT_01 MOUNT PRIMARY VALID 13 130493 130493
[monitor] 2023-07-23 17:25:45: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:25:44 UNIFY EP OK GRP1_RT_01 MOUNT PRIMARY VALID 13 130493 130493
[monitor] 2023-07-23 17:25:45: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:25:45 STARTUP OK GRP1_RT_01 MOUNT STANDBY INVALID 13 130493 130493
[monitor] 2023-07-23 17:25:46: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:25:45 UNIFY EP OK GRP1_RT_01 MOUNT STANDBY INVALID 13 130493 130493
[monitor] 2023-07-23 17:25:46: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:25:46 OPEN OK GRP1_RT_01 OPEN STANDBY INVALID 13 130493 130493
[monitor] 2023-07-23 17:25:52: 守护进程(GRP1_RT_02)状态切换 [OPEN-->RECOVERY]
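The output shows that the monitor detected the PRIMARY instance failure and started an automatic takeover of group GRP1. GRP1_RT_02 was then switched to primary and GRP1_RT_01 was set to standby (the killed database process on the old primary is restarted by its own watcher daemon).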
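To see how long the takeover took, the timestamps of the first and last takeover messages can be compared. The sketch below does this in Python under the assumption that the log uses the exact wording shown above: a start message containing 开始对组 and 执行自动接管, and an end message containing 自动接管成功. The helper name takeover_seconds is hypothetical.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"\[monitor\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s*(.*)"
)


def takeover_seconds(log_text):
    """Seconds from 'start automatic takeover' to 'automatic takeover succeeded'.

    Returns None if either message is missing from the log text.
    """
    start = end = None
    for m in LINE_RE.finditer(log_text):
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(2)
        if start is None:
            # Require both phrases so the normal monitor's "cannot perform
            # automatic takeover" warning is not mistaken for a start.
            if "开始对组" in msg and "执行自动接管" in msg:
                start = ts
        elif "自动接管成功" in msg:
            end = ts
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


LOG = """\
[monitor] 2023-07-23 17:25:19: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2023-07-23 17:25:20: 开始使用实例GRP1_RT_02接管
[monitor] 2023-07-23 17:25:26: 使用实例GRP1_RT_02接管成功
[monitor] 2023-07-23 17:25:26: 组(GRP1)使用实例GRP1_RT_02自动接管成功
"""

print(takeover_seconds(LOG))
```

For the timestamps in this test (17:25:19 to 17:25:26) the takeover itself took 7 seconds. Note that this excludes the time the monitor needed to detect the failure after the kill.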
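To get automatic failover for a Data Watch cluster, you must configure a confirm monitor and run it in the background (only one confirm monitor can be started).
To view the cluster status, additionally configure a normal monitor and start it in the foreground.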
Configure the confirm monitor and start it in the background:
[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini
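MON_DW_CONFIRM = 1 #0: normal monitor (manual failover) 1: confirm monitor (automatic failover)
MON_LOG_PATH = /opt/dmdbms/log2 #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log file every 60 s
MON_LOG_FILE_SIZE = 512 #size of a single log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total log space limit, in MB
[GRP1]
MON_INST_OGUID = 45331 #unique OGUID of group GRP1
MON_DW_IP = 10.0.0.5:5436 #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT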
MON_DW_IP = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ history |grep start
17 /opt/dmdbms/bin/DmMonitorServiceMonitor start
20 /opt/dmdbms/bin/DmMonitorServiceMonitor start
23 /opt/dmdbms/bin/DmMonitorServiceMonitor start
71 /opt/dmdbms/bin/DmMonitorServiceMonitor start
90 /opt/dmdbms/bin/DmMonitorServiceMonitor start
98 /opt/dmdbms/bin/DmMonitorServiceMonitor start
131 DmMonitorServiceMonitor start
174 history |grep start
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/DmMonitorServiceMonitor start
Starting DmMonitorServiceMonitor: [ OK ]
[dmdba@OwumVYU4IUuZaxxP bin]$
Configure the normal monitor and start it in the foreground:
[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor_fq.ini
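MON_DW_CONFIRM = 0 #0: normal monitor (manual failover) 1: confirm monitor (automatic failover)
MON_LOG_PATH = /opt/dmdbms/log2 #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log file every 60 s
MON_LOG_FILE_SIZE = 512 #size of a single log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total log space limit, in MB
[GRP1]
MON_INST_OGUID = 45331 #unique OGUID of group GRP1
MON_DW_IP = 10.0.0.5:5436 #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT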
MON_DW_IP = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor_fq.ini
[monitor] 2023-07-23 17:36:57: DMMONITOR[4.0] V8
[monitor] 2023-07-23 17:36:58: DMMONITOR[4.0] IS READY.
[monitor] 2023-07-23 17:37:00: 收到守护进程(GRP1_RT_02)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:36:59 OPEN OK GRP1_RT_02 OPEN PRIMARY VALID 14 130909 130910
[monitor] 2023-07-23 17:37:00:
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_02), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2023-07-23 17:36:58 FALSE 1321516634 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
2023-07-23 17:35:09 TRUE 891870923 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
#--------------------------------------------------------------------------------#
[monitor] 2023-07-23 17:37:00: 收到守护进程(GRP1_RT_01)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2023-07-23 17:36:59 OPEN OK GRP1_RT_01 OPEN STANDBY VALID 14 130908 130908
show monitor
2023-07-23 17:37:15
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2023-07-23 17:36:58 FALSE 1321516634 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
2023-07-23 17:35:09 TRUE 891870923 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
#--------------------------------------------------------------------------------#
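Running the show monitor command on the normal monitor, as above, lists two monitors: one confirm monitor and one normal monitor.
Using a confirm monitor together with a normal monitor in this way provides both automatic failover and a view of the cluster status.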
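Since only one confirm monitor may be running, it can be useful to check the show monitor output automatically. The following Python sketch parses the data rows of that output and counts the monitors with MON_CONFIRM set to TRUE. The helper names are hypothetical, and the row layout (date, time, MON_CONFIRM, MID, MON_IP, MON_VERSION) is taken from the output shown in this article.

```python
def parse_monitor_rows(text):
    """Extract monitor rows from 'show monitor' output as a list of dicts."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows look like: date time TRUE|FALSE MID MON_IP MON_VERSION...
        if (len(tokens) >= 6
                and tokens[2] in ("TRUE", "FALSE")
                and tokens[3].isdigit()):
            rows.append({
                "DW_CONN_TIME": tokens[0] + " " + tokens[1],
                "MON_CONFIRM": tokens[2] == "TRUE",
                "MID": int(tokens[3]),
                "MON_IP": tokens[4],
                "MON_VERSION": " ".join(tokens[5:]),
            })
    return rows


def count_confirm_monitors(rows):
    """Number of connected monitors that are confirm monitors."""
    return sum(1 for r in rows if r["MON_CONFIRM"])


OUTPUT = """\
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2023-07-23 17:36:58 FALSE 1321516634 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
2023-07-23 17:35:09 TRUE 891870923 ::ffff:10.0.0.6 DMMONITOR[4.0] V8
"""

rows = parse_monitor_rows(OUTPUT)
print(len(rows), "monitors,", count_confirm_monitors(rows), "confirm")
```

With the output from this article the result is two monitors, exactly one of which is a confirm monitor, which is the intended setup.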
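Summary
This article first stated two requirements for a deployed DM Data Watch cluster;
it then covered the cluster architecture and layout, the difference between normal and confirm monitors, and tests of whether each type can perform automatic failover;
finally, it gave a solution that meets both of the original requirements.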