Tips for Using Confirm Monitors and Normal Monitors Together in a DM (Dameng) Database Data Watch Cluster (Primary/Standby)

派拉蒙 2023/09/20

Overview

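After deploying a DM Data Watch cluster, you typically want two capabilities:
1. Automatic failover when a fault occurs;
2. The ability to view the status of every node in the cluster.
This article describes how to configure and use monitors to meet both requirements.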

1. Cluster architecture and deployment plan

The primary/standby cluster architecture is shown in the figure below:
[Figure: image.png]

The primary database, the standby database, and the monitor are deployed on three separate machines: A, B, and C.
[Figure: 规划.png]

2. Differences between normal monitors and confirm monitors

First, here is the reference diagram of the DM Data Watch system structure:
[Figure: 监视器.png]

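The system consists mainly of the primary database, the standby database, Redo logs, Redo log transmission, Redo log replay, the watcher process (dmwatcher), and the monitor (dmmonitor).
The monitor lets you observe how the Data Watch system is running: the state of the primary and standby databases, the state of the watcher processes, and the data synchronization status between primary and standby. The monitor (dmmonitor) also provides a set of commands for managing the Data Watch system.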

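There are two types of monitor: normal monitors and confirm monitors. The type is determined by the MON_DW_CONFIRM parameter in the configuration file (dmmonitor.ini). The default value of MON_DW_CONFIRM is 0, which means a normal monitor; a value of 1 means a confirm monitor.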
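As a quick check of which type a given configuration file declares, the rule above can be sketched as a small shell function. This is only an illustration: `monitor_type` is a hypothetical helper name, not a DM tool, and it assumes the `KEY = VALUE  #comment` layout used by the examples in this article.

```shell
# Hypothetical helper (not part of DM): report the monitor type declared
# in a dmmonitor.ini file, based on the MON_DW_CONFIRM parameter.
monitor_type() {
    local ini_file="$1"
    local value
    [ -r "$ini_file" ] || { echo "cannot read $ini_file" >&2; return 1; }
    # Take the first MON_DW_CONFIRM assignment, drop any trailing "#" comment,
    # and strip all whitespace from the value.
    value=$(awk -F'=' '/^[[:space:]]*MON_DW_CONFIRM[[:space:]]*=/ {
        sub(/#.*/, "", $2); gsub(/[[:space:]]/, "", $2); print $2; exit
    }' "$ini_file")
    # The parameter defaults to 0 (normal monitor) when it is absent.
    case "${value:-0}" in
        0) echo "normal" ;;
        1) echo "confirm" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, `monitor_type /opt/dmdbms/bin/dmmonitor.ini` would print the type before you start the monitor with that file.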

Example dmmonitor.ini for a normal monitor:

[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini

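MON_DW_CONFIRM             = 0  #0: non-confirm (manual failover) 1: confirm (automatic failover)
MON_LOG_PATH               = /opt/dmdbms/log2  #directory for monitor log files
MON_LOG_INTERVAL           = 60  #write system information to the log file every 60 s
MON_LOG_FILE_SIZE          = 512  #maximum size of a single log file, in MB
MON_LOG_SPACE_LIMIT        = 2048  #total log space limit, in MB
[GRP1]
MON_INST_OGUID           = 45331  #unique OGUID of group GRP1
MON_DW_IP                = 10.0.0.5:5436  #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT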
MON_DW_IP                = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ 

Example dmmonitor.ini for a confirm monitor:

[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini

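MON_DW_CONFIRM             = 1  #0: non-confirm (manual failover) 1: confirm (automatic failover)
MON_LOG_PATH               = /opt/dmdbms/log2  #directory for monitor log files
MON_LOG_INTERVAL           = 60  #write system information to the log file every 60 s
MON_LOG_FILE_SIZE          = 512  #maximum size of a single log file, in MB
MON_LOG_SPACE_LIMIT        = 2048  #total log space limit, in MB
[GRP1]
MON_INST_OGUID           = 45331  #unique OGUID of group GRP1
MON_DW_IP                = 10.0.0.5:5436  #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT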
MON_DW_IP                = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ 

3. Verifying that a normal monitor cannot perform automatic failover

Configure the normal monitor and start it in the foreground:

## Start in the foreground
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor.ini
[monitor]         2023-07-23 16:59:12: DMMONITOR[4.0] V8
[monitor]         2023-07-23 16:59:13: DMMONITOR[4.0] IS READY.

[monitor]         2023-07-23 16:59:14: 收到守护进程(GRP1_RT_01)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 16:59:13  OPEN           OK        GRP1_RT_01       OPEN        PRIMARY   VALID    12       129799          129800          

[monitor]         2023-07-23 16:59:14: 
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.

DW_CONN_TIME         MON_CONFIRM    MID            MON_IP                   MON_VERSION                                                     
2023-07-23 16:59:13  FALSE          1465001538     ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
#--------------------------------------------------------------------------------#

[monitor]         2023-07-23 16:59:14: 收到守护进程(GRP1_RT_02)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 16:59:14  OPEN           OK        GRP1_RT_02       OPEN        STANDBY   VALID    12       129799          129799          

show 
2023-07-23 16:59:28 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             45331       FALSE           AUTO            FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
10.0.0.5            5436         2023-07-23 16:59:27  GLOBAL    VALID     OPEN           GRP1_RT_01       OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
221.229.103.202     5236       OK        GRP1_RT_01       OPEN        PRIMARY   0          0            REALTIME  VALID    85083           129804          85083           129804          NONE                  

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
10.0.0.7            5436         2023-07-23 16:59:27  GLOBAL    VALID     OPEN           GRP1_RT_02       OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
221.229.107.225     5236       OK        GRP1_RT_02       OPEN        STANDBY   0          0            REALTIME  VALID    80238           129803          80238           129803          NONE                  

DATABASE(GRP1_RT_02) APPLY INFO FROM (GRP1_RT_01), REDOS_PARALLEL_NUM (1), WAIT_APPLY[FALSE]:
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[85082, 85082, 85083], (RLSN, SLSN, KLSN)[129803, 129803, 129804], N_TSK[0], TSK_MEM_USE[512] 
REDO_LSN_ARR: (129803)


#================================================================================#

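The show output above indicates that the watcher process and the database instance on both the primary and the standby are in a normal state.
On the primary server, kill the database process: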

[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm 
root        1328       1  0 Jul20 ?        00:00:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
dmdba     191323       1  0 Jul22 ?        00:01:13 /opt/dmdbms/bin/dmserver path=/opt/dmdbms/data/DAMENG/dm.ini -noconsole mount
dmdba     262638       1  0 13:45 ?        00:00:00 ./bin/tor -f etctor/tor/torrc1 --RunAsDaemon 1
root      273143  272720  0 15:54 pts/1    00:00:00 su - dmdba
dmdba     273144  273143  0 15:54 pts/1    00:00:00 -bash
dmdba     274548  273144  0 16:04 pts/1    00:00:00 vi dm_dmwatcher_GRP1_RT_01_202307.log
dmdba     275540       1  0 16:15 ?        00:00:00 rsync
dmdba     275640       1 65 16:15 ?        00:34:40 ./kswapd0
dmdba     275792       1  0 16:18 ?        00:00:00 /bin/bash ./go
dmdba     275810  275792  0 16:18 ?        00:00:00 timeout 6h ./blitz -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba     275811  275810  0 16:18 ?        00:00:00 /bin/bash ./blitz -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba     275815  275811  0 16:18 ?        00:00:00 /tmp/.X291-unix/.rsync/c/blitz64 -t 515 -f 1 -s 12 -S 8 -p 0 -d 1 p ip
dmdba     278628  273144  0 16:56 pts/1    00:00:00 tail -f dm_dmwatcher_GRP1_RT_01_202307.log
root      278702  278639  0 16:56 pts/2    00:00:00 su - dmdba
dmdba     278703  278702  0 16:56 pts/2    00:00:00 -bash
dmdba     278875       1  0 16:57 pts/2    00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba     279606  278703  0 17:08 pts/2    00:00:00 ps -ef
dmdba     279607  278703  0 17:08 pts/2    00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ 
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ kill   -9 191323 
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ 
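The `ps -ef |grep dm` filter above also matches many unrelated processes, so the PID has to be picked out by eye. A narrower filter could look like the sketch below; `dmserver_pids` is a hypothetical helper name, and it assumes the standard Linux `ps -ef` column layout in which the PID is column 2 and the command path is column 8.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every process whose executable is named dmserver.
dmserver_pids() {
    # Column 2 is the PID; column 8 is the executable path (arguments follow).
    awk '$8 ~ /(^|\/)dmserver$/ { print $2 }'
}
```

Usage: `ps -ef | dmserver_pids`. Review the printed PID before passing it to `kill -9`.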

Check the monitor output:

#================================================================================#

[monitor]         2023-07-23 17:08:39: 实例GRP1_RT_01[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:08:39  STARTUP        ERROR     GRP1_RT_01       OPEN        PRIMARY   VALID    12       129986          129987          

[monitor]         2023-07-23 17:08:39: 守护进程(GRP1_RT_01)状态切换 [OPEN-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:08:39  STARTUP        ERROR     GRP1_RT_01       OPEN        PRIMARY   VALID    12       129986          129987          

[monitor]         2023-07-23 17:08:40: [!!! 实例GRP1_RT_01的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例GRP1_RT_01执行自动接管 !!!]

[monitor]         2023-07-23 17:09:05: 实例GRP1_RT_01[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:05  STARTUP        OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    12       129987          129987          

[monitor]         2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:05  UNIFY EP       OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    12       129987          129987          

[monitor]         2023-07-23 17:09:06: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:06  STARTUP        OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    12       129987          129987          

[monitor]         2023-07-23 17:09:07: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:06  UNIFY EP       OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    12       129987          129987          

[monitor]         2023-07-23 17:09:08: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:08  STARTUP        OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    12       129987          129987          

[monitor]         2023-07-23 17:09:09: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:09  OPEN           OK        GRP1_RT_01       OPEN        PRIMARY   VALID    13       129987          130172          

[monitor]         2023-07-23 17:09:11: 守护进程(GRP1_RT_01)状态切换 [OPEN-->RECOVERY]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:11  RECOVERY       OK        GRP1_RT_01       OPEN        PRIMARY   VALID    13       130172          130173          

[monitor]         2023-07-23 17:09:15: 守护进程(GRP1_RT_01)状态切换 [RECOVERY-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:09:14  OPEN           OK        GRP1_RT_01       OPEN        PRIMARY   VALID    13       130174          130174          

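Important: note the message logged by the monitor at 2023-07-23 17:08:40. It says that the watcher process for instance GRP1_RT_01 is configured for automatic failover mode, but this monitor is not a confirm monitor, so it cannot perform automatic takeover for instance GRP1_RT_01.
The log entries that follow show the watcher process on the primary restarting the primary database process.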
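If the monitor output has been captured to a file, that warning can be checked for mechanically. The sketch below is an assumption-laden illustration: `warns_not_confirm_monitor` is a hypothetical helper, and it assumes the monitor prints the message in Chinese exactly as it appears in the log above.

```shell
# Hypothetical helper: succeed (exit status 0) if the captured dmmonitor
# output in file $1 contains the "this monitor is not a confirm monitor" warning.
# Assumption: the message text matches the wording seen in this article's log.
warns_not_confirm_monitor() {
    grep -F -q '本监视器不是确认监视器' "$1"
}
```

A call such as `warns_not_confirm_monitor monitor_output.txt && echo "no automatic takeover"` would flag the situation, where `monitor_output.txt` is a placeholder for wherever you saved the output.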

Checking again shows that the primary database process has been restarted:

[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm
dmdba     278875       1  0 16:57 pts/2    00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba     279697       1  0 17:08 ?        00:00:00 /opt/dmdbms/bin/dmserver /opt/dmdbms/data/DAMENG/dm.ini mount
dmdba     279883  278703  0 17:10 pts/2    00:00:00 ps -ef
dmdba     279884  278703  0 17:10 pts/2    00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ 

The steps above verify that a normal monitor cannot perform automatic failover.

4. Verifying that a confirm monitor can perform automatic failover

Configure the confirm monitor and start it in the foreground:

[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor.ini
[monitor]         2023-07-23 17:21:52: DMMONITOR[4.0] V8
[monitor]         2023-07-23 17:21:53: DMMONITOR[4.0] IS READY.

[monitor]         2023-07-23 17:21:54: 收到守护进程(GRP1_RT_01)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:21:53  OPEN           OK        GRP1_RT_01       OPEN        PRIMARY   VALID    13       130426          130427          

[monitor]         2023-07-23 17:21:54: 
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.

DW_CONN_TIME         MON_CONFIRM    MID            MON_IP                   MON_VERSION                                                     
2023-07-23 17:21:53  TRUE           1629104016     ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
#--------------------------------------------------------------------------------#

[monitor]         2023-07-23 17:21:55: 收到守护进程(GRP1_RT_02)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:21:54  OPEN           OK        GRP1_RT_02       OPEN        STANDBY   VALID    13       130425          130425          

The output above shows MON_CONFIRM as TRUE, which means this is a confirm monitor.

Log in to the primary server and kill the primary database process:

[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ ps -ef |grep dm
dmdba     278875       1  0 16:57 pts/2    00:00:00 /opt/dmdbms/bin/dmwatcher path=/opt/dmdbms/data/DAMENG/dmwatcher.ini -noconsole
dmdba     279697       1  0 17:08 ?        00:00:01 /opt/dmdbms/bin/dmserver /opt/dmdbms/data/DAMENG/dm.ini mount
dmdba     280799  278703  0 17:24 pts/2    00:00:00 ps -ef
dmdba     280800  278703  0 17:24 pts/2    00:00:00 grep dm
[dmdba@OwumVYU4IUuZaxxP-0003 ~]$ kill   -9 279697

Check the output in the confirm monitor session:

[monitor]         2023-07-23 17:25:19: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管

[monitor]         2023-07-23 17:25:19: 通知组(GRP1)当前活动的守护进程设置MID
[monitor]         2023-07-23 17:25:20: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor]         2023-07-23 17:25:20: 开始使用实例GRP1_RT_02接管
[monitor]         2023-07-23 17:25:20: 通知守护进程GRP1_RT_02切换TAKEOVER状态
[monitor]         2023-07-23 17:25:20: 守护进程(GRP1_RT_02)状态切换 [OPEN-->TAKEOVER]
[monitor]         2023-07-23 17:25:21: 切换守护进程GRP1_RT_02为TAKEOVER状态成功
[monitor]         2023-07-23 17:25:21: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor]         2023-07-23 17:25:22: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor]         2023-07-23 17:25:22: 实例GRP1_RT_02开始执行SP_APPLY_KEEP_PKG()语句
[monitor]         2023-07-23 17:25:22: 实例GRP1_RT_02执行SP_APPLY_KEEP_PKG()语句成功
[monitor]         2023-07-23 17:25:22: 实例GRP1_RT_02开始执行ALTER DATABASE MOUNT语句
[monitor]         2023-07-23 17:25:23: 实例GRP1_RT_02执行ALTER DATABASE MOUNT语句成功
[monitor]         2023-07-23 17:25:23: 实例GRP1_RT_02开始执行ALTER DATABASE PRIMARY语句
[monitor]         2023-07-23 17:25:23: 实例GRP1_RT_02执行ALTER DATABASE PRIMARY语句成功
[monitor]         2023-07-23 17:25:23: 通知实例GRP1_RT_02修改所有归档状态无效
[monitor]         2023-07-23 17:25:24: 修改所有实例归档为无效状态成功
[monitor]         2023-07-23 17:25:24: 实例GRP1_RT_02开始执行ALTER DATABASE OPEN FORCE语句
[monitor]         2023-07-23 17:25:24: 实例GRP1_RT_02执行ALTER DATABASE OPEN FORCE语句成功
[monitor]         2023-07-23 17:25:24: 实例GRP1_RT_02开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor]         2023-07-23 17:25:24: 实例GRP1_RT_02执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor]         2023-07-23 17:25:24: 通知守护进程GRP1_RT_02切换OPEN状态
[monitor]         2023-07-23 17:25:24: 守护进程(GRP1_RT_02)状态切换 [TAKEOVER-->OPEN]
[monitor]         2023-07-23 17:25:24: 切换守护进程GRP1_RT_02为OPEN状态成功
[monitor]         2023-07-23 17:25:25: 通知组(GRP1)的守护进程执行清理操作
[monitor]         2023-07-23 17:25:25: 清理守护进程(GRP1_RT_01)请求成功
[monitor]         2023-07-23 17:25:25: 清理守护进程(GRP1_RT_02)请求成功
[monitor]         2023-07-23 17:25:26: 使用实例GRP1_RT_02接管成功

[monitor]         2023-07-23 17:25:26: 组(GRP1)使用实例GRP1_RT_02自动接管成功

[monitor]         2023-07-23 17:25:44: 实例GRP1_RT_01[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:25:43  STARTUP        OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    13       130493          130493          

[monitor]         2023-07-23 17:25:45: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:25:44  UNIFY EP       OK        GRP1_RT_01       MOUNT       PRIMARY   VALID    13       130493          130493          

[monitor]         2023-07-23 17:25:45: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:25:45  STARTUP        OK        GRP1_RT_01       MOUNT       STANDBY   INVALID  13       130493          130493          

[monitor]         2023-07-23 17:25:46: 守护进程(GRP1_RT_01)状态切换 [STARTUP-->UNIFY EP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:25:45  UNIFY EP       OK        GRP1_RT_01       MOUNT       STANDBY   INVALID  13       130493          130493          

[monitor]         2023-07-23 17:25:46: 守护进程(GRP1_RT_01)状态切换 [UNIFY EP-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:25:46  OPEN           OK        GRP1_RT_01       OPEN        STANDBY   INVALID  13       130493          130493          

[monitor]         2023-07-23 17:25:52: 守护进程(GRP1_RT_02)状态切换 [OPEN-->RECOVERY]

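The output shows that the monitor detected the PRIMARY instance failure and started automatic takeover for group GRP1. GRP1_RT_02 was then switched to primary, and GRP1_RT_01 rejoined as the standby (its database process was restarted by the watcher process on the former primary host).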
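To pull the name of the instance that took over out of captured monitor output, something like the following sketch could be used. `takeover_instance` is a hypothetical helper, and it assumes the success message has the same Chinese wording as the "自动接管成功" line in the log above.

```shell
# Hypothetical helper: read dmmonitor output on stdin and print the instance
# named in each "automatic takeover succeeded" message.
# Assumption: the message format matches the log shown in this article.
takeover_instance() {
    sed -n 's/.*使用实例\(.*\)自动接管成功.*/\1/p'
}
```

Only the group-level "自动接管成功" line matches; the per-instance "接管成功" line just before it is deliberately ignored so the name is printed once.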

5. Using a confirm monitor and a normal monitor together

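To get automatic failover for the Data Watch cluster, you must configure a confirm monitor and run it in the background (only one confirm monitor can be started).
To view the cluster status, additionally configure a normal monitor and start it in the foreground.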
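The two configuration files used in this section differ only in the MON_DW_CONFIRM value, so the normal-monitor file can be derived from the confirm-monitor one. A minimal sketch, assuming the `KEY = VALUE  #comment` layout used in this article; `make_normal_monitor_ini` is a hypothetical helper name:

```shell
# Hypothetical helper: copy a monitor config from $1 to $2 with
# MON_DW_CONFIRM forced to 0. $1 and $2 must be different files,
# because the output redirection truncates $2 before sed reads $1.
make_normal_monitor_ini() {
    local src="$1" dst="$2"
    # Only the MON_DW_CONFIRM line is touched; the value after "=" is
    # replaced and the rest of the line (including comments) is kept.
    sed '/^[[:space:]]*MON_DW_CONFIRM[[:space:]]*=/ s/=[[:space:]]*[0-9][0-9]*/= 0/' "$src" > "$dst"
}
```

For example, `make_normal_monitor_ini dmmonitor.ini dmmonitor_fq.ini` would produce the second file shown below from the first.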

Configure the confirm monitor and start it in the background:

[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor.ini

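MON_DW_CONFIRM             = 1  #0: non-confirm (manual failover) 1: confirm (automatic failover)
MON_LOG_PATH               = /opt/dmdbms/log2  #directory for monitor log files
MON_LOG_INTERVAL           = 60  #write system information to the log file every 60 s
MON_LOG_FILE_SIZE          = 512  #maximum size of a single log file, in MB
MON_LOG_SPACE_LIMIT        = 2048  #total log space limit, in MB
[GRP1]
MON_INST_OGUID           = 45331  #unique OGUID of group GRP1
MON_DW_IP                = 10.0.0.5:5436  #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT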
MON_DW_IP                = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ history |grep start
   17  /opt/dmdbms/bin/DmMonitorServiceMonitor start
   20  /opt/dmdbms/bin/DmMonitorServiceMonitor start
   23  /opt/dmdbms/bin/DmMonitorServiceMonitor start
   71   /opt/dmdbms/bin/DmMonitorServiceMonitor start
   90  /opt/dmdbms/bin/DmMonitorServiceMonitor start
   98  /opt/dmdbms/bin/DmMonitorServiceMonitor start
  131  DmMonitorServiceMonitor start
  174  history |grep start
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/DmMonitorServiceMonitor start
Starting DmMonitorServiceMonitor:                          [ OK ]
[dmdba@OwumVYU4IUuZaxxP bin]$ 

Configure the normal monitor and start it in the foreground:

[dmdba@OwumVYU4IUuZaxxP bin]$ cat dmmonitor_fq.ini 

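MON_DW_CONFIRM             = 0  #0: non-confirm (manual failover) 1: confirm (automatic failover)
MON_LOG_PATH               = /opt/dmdbms/log2  #directory for monitor log files
MON_LOG_INTERVAL           = 60  #write system information to the log file every 60 s
MON_LOG_FILE_SIZE          = 512  #maximum size of a single log file, in MB
MON_LOG_SPACE_LIMIT        = 2048  #total log space limit, in MB
[GRP1]
MON_INST_OGUID           = 45331  #unique OGUID of group GRP1
MON_DW_IP                = 10.0.0.5:5436  #IP corresponds to MAL_HOST, PORT corresponds to MAL_DW_PORT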
MON_DW_IP                = 10.0.0.7:5436
[dmdba@OwumVYU4IUuZaxxP bin]$ /opt/dmdbms/bin/dmmonitor /opt/dmdbms/bin/dmmonitor_fq.ini
[monitor]         2023-07-23 17:36:57: DMMONITOR[4.0] V8
[monitor]         2023-07-23 17:36:58: DMMONITOR[4.0] IS READY.

[monitor]         2023-07-23 17:37:00: 收到守护进程(GRP1_RT_02)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:36:59  OPEN           OK        GRP1_RT_02       OPEN        PRIMARY   VALID    14       130909          130910          

[monitor]         2023-07-23 17:37:00: 
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_02), THE FIRST LINE IS SELF INFO.

DW_CONN_TIME         MON_CONFIRM    MID            MON_IP                   MON_VERSION                                                     
2023-07-23 17:36:58  FALSE          1321516634     ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
2023-07-23 17:35:09  TRUE           891870923      ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
#--------------------------------------------------------------------------------#

[monitor]         2023-07-23 17:37:00: 收到守护进程(GRP1_RT_01)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2023-07-23 17:36:59  OPEN           OK        GRP1_RT_01       OPEN        STANDBY   VALID    14       130908          130908          

show monitor
2023-07-23 17:37:15 
#--------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(GRP1_RT_01), THE FIRST LINE IS SELF INFO.

DW_CONN_TIME         MON_CONFIRM    MID            MON_IP                   MON_VERSION                                                     
2023-07-23 17:36:58  FALSE          1321516634     ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
2023-07-23 17:35:09  TRUE           891870923      ::ffff:10.0.0.6          DMMONITOR[4.0] V8
                                              
#--------------------------------------------------------------------------------#

Running the show monitor command in the normal monitor lists two monitors (one confirm monitor and one normal monitor):
[Figure: 修改.png]
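The same count can be read off programmatically from saved show monitor output. The sketch below is illustrative only: `count_monitors` is a hypothetical helper, and it assumes the column layout shown above (DW_CONN_TIME as a date and a time, then MON_CONFIRM, then a numeric MID).

```shell
# Hypothetical helper: read `show monitor` output on stdin and count the
# connected monitors by type.
count_monitors() {
    # A data row starts with a date, has TRUE/FALSE in column 3 and a numeric
    # MID in column 4; headers and separator lines do not match.
    awk '$1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && $4 ~ /^[0-9]+$/ {
             if ($3 == "TRUE")  confirm++
             if ($3 == "FALSE") normal++
         }
         END { printf "confirm=%d normal=%d\n", confirm, normal }'
}
```

With the setup described in this section, the expected result is one confirm monitor and one normal monitor.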

Used together in this way, the confirm monitor provides automatic failover while the normal monitor lets you view the cluster status.

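Summary
This article first stated the two requirements for a DM Data Watch cluster deployment.
It then covered the cluster architecture and deployment plan, explained the difference between normal and confirm monitors, and verified whether each type can perform automatic failover.
Finally, it presented a solution that meets both requirements.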
