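This document describes how to change the heartbeat and service IP addresses of a DM8 cluster made up of a two-node DSC and a single-instance real-time standby. Every step was verified in a real test.
The test environment is a two-node DSC cluster with one real-time standby. The heartbeat and service IPs of every server in the cluster are changed, and the single-node standby is reconfigured to use only its service IP.
The IP changes are as follows: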
| Host | Service IP (before) | Heartbeat IP (before) | Service IP (after) | Heartbeat IP (after) | Notes |
|---|---|---|---|---|---|
| dsc01 | 192.168.59.142 | 192.168.48.142 | 192.168.59.132 | 192.168.48.132 | Database node 1 |
| dsc02 | 192.168.59.143 | 192.168.48.143 | 192.168.59.133 | 192.168.48.133 | Database node 2 |
| dw01 | 192.168.59.144 | 192.168.48.144 | 192.168.59.134 | none (service IP only) | Real-time standby / monitor |
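The mapping in the table can also be applied mechanically when the configuration files are edited. The following is a minimal sketch, not part of the tested procedure: `IP_MAP`, `build_sed_script`, and `rewrite_ips` are hypothetical names, and the sketch assumes a `sed` that supports `-E`. The old heartbeat address of dw01 (192.168.48.144) has no direct replacement in the table, so it is deliberately left unmapped and has to be handled by hand.

```shell
# Old -> new address pairs from the table above, one pair per line.
# 192.168.48.144 (old dw01 heartbeat) is intentionally not listed.
IP_MAP='192.168.59.142 192.168.59.132
192.168.48.142 192.168.48.132
192.168.59.143 192.168.59.133
192.168.48.143 192.168.48.133
192.168.59.144 192.168.59.134'

# build_sed_script: print one sed substitution per pair in IP_MAP.
# Dots are escaped so they match literally; the guards on both sides keep
# an address from matching inside a longer one (e.g. 192.168.59.1420).
build_sed_script() {
  printf '%s\n' "$IP_MAP" | while read -r old new; do
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    printf 's/(^|[^0-9.])%s([^0-9.]|$)/\\1%s\\2/g\n' "$old_re" "$new"
  done
}

# rewrite_ips: filter stdin to stdout, replacing every mapped old address.
rewrite_ips() {
  script=$(build_sed_script)
  sed -E "$script"
}
```

A cautious way to use it is to write the result to a new file and review the difference before replacing anything, for example `rewrite_ips < dmmal.ini > dmmal.ini.new` followed by `diff dmmal.ini dmmal.ini.new`.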
Procedure:
Step 1: Stop the standby watcher (dw01 node)
# systemctl stop DmWatcherServiceWatcher
Step 2: Stop the watchers of the DSC primary (dsc01 and dsc02 nodes)
# systemctl stop DmWatcherServiceWatcher
Step 3: Log in to the DSC monitor (monitor node)
$ dmcssm /dmsoft/dsc_config/dmcssm.ini
Step 4: Stop the DSC DMSERVER service
ep stop grp_dsc
Step 5: Stop the DSC DMASM service
ep stop grp_asm
Step 6: Stop the CSS service (dsc01 and dsc02 nodes)
# systemctl stop DmCSSServiceCSS
Step 7: Stop the standby instance (dw01 node)
# systemctl stop DmServiceDW01
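Notes:
1. Before stopping the cluster, check that its current state is normal and that the cluster logs contain no errors, so that unrelated problems do not interfere with this change.
2. Stop the cluster components in the order given to avoid triggering a switchover.
3. In a production environment, take a full database backup before changing the IPs so that the cluster can be recovered if the operation fails.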
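Change the IP addresses in each server's network interface configuration and restart the network service, checking the configuration files carefully while editing. If a server has to be rebooted, first disable start-on-boot for the cluster services so that they do not fail to start automatically because of the IP change.
Back up the configuration files before modifying them so that the configuration can be rolled back if the change fails.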
Procedure:
Step 1: Back up the DSC cluster configuration files (dsc01 and dsc02 nodes)
$ cd /dmsoft
$ cp -r dsc_config dsc_config_20251106_bak
Step 2: Back up the standby configuration file (dw01 node)
$ cd /dmdata/DAMENG
$ cp dmmal.ini dmmal.ini.20251106.bak
Step 3: Log in to dmasmcmd and export dmdcr_cfg.ini (on dsc01)
$ /dmsoft/dmdbms/bin/dmasmcmd
ASM> export dcrdisk '/dev/dm/asm-dmdcr' TO '/dmsoft/dsc_config/dmdcr_cfg.ini'
Step 4: Edit dmdcr_cfg.ini and change the IP addresses (dsc01 node)
$ vim /dmsoft/dsc_config/dmdcr_cfg.ini
DCR_N_GRP = 3 # number of groups in the cluster, range: 1~16
DCR_VTD_PATH = /dev/dm/asm-dmvote # disk designated as the voting disk
DCR_OGUID = 251024 # message identifier; only one per group
[GRP] # start a new group
DCR_GRP_TYPE = CSS # group type (CSS/ASM/DB)
DCR_GRP_NAME = GRP_CSS # group name
DCR_GRP_N_EP = 2 # number of nodes in the group
DCR_GRP_DSKCHK_CNT = 60 # disk heartbeat fault-tolerance time, in seconds
[GRP_CSS]
DCR_EP_NAME = CSS0 # CSS node name
DCR_EP_HOST = 192.168.48.132 # heartbeat address
DCR_EP_PORT = 11286 # CSS port
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 192.168.48.133
DCR_EP_PORT = 11286
[GRP]
DCR_GRP_TYPE = ASM
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_ASM]
DCR_EP_NAME = ASM0 # ASM node name; must match MAL_INST_NAME in dmasvrmal.ini
DCR_EP_SHM_KEY = 42424 # shared memory key
DCR_EP_SHM_SIZE = 1024 # shared memory size
DCR_EP_HOST = 192.168.48.132 # heartbeat address
DCR_EP_PORT = 11276 # ASM port
DCR_EP_ASM_LOAD_PATH = /dev/dm
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 42425
DCR_EP_SHM_SIZE = 1024
DCR_EP_HOST = 192.168.48.133
DCR_EP_PORT = 11276
DCR_EP_ASM_LOAD_PATH = /dev/dm
[GRP]
DCR_GRP_TYPE = DB
DCR_GRP_NAME = GRP_DSC
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_DSC]
DCR_EP_NAME = DSC0 # instance name; must match INSTANCE_NAME in dm.ini
DCR_EP_SEQNO = 0 # sequence number within the group; must be unique
DCR_EP_PORT = 5136 # instance port; must match PORT_NUM in dm.ini
[GRP_DSC]
DCR_EP_NAME = DSC1
DCR_EP_SEQNO = 1
DCR_EP_PORT = 5136
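Because this file is later written back to the DCR disk, it can be worth checking its internal consistency after editing. The sketch below is an illustration only: `check_dcr_groups` is a hypothetical helper, and it assumes the layout shown above, where each group is declared in a `[GRP]` section and each member appears in a section named after `DCR_GRP_NAME`. It compares `DCR_N_GRP` with the number of `[GRP]` sections and each `DCR_GRP_N_EP` with the number of member sections.

```shell
# check_dcr_groups FILE
# Exit status 0 when the declared counts match the sections present,
# 1 otherwise (one message is printed per mismatch).
check_dcr_groups() {
  awk '
    { sub(/#.*/, "") }                       # strip trailing comments
    /^[ \t]*\[/ {                            # section header: [GRP] or [GRP_CSS]
      sub(/^[ \t]*\[/, ""); sub(/\].*$/, "")
      if ($0 == "GRP") n_grp++; else members[$0]++
      next
    }
    /=/ {
      key = $0; sub(/[ \t]*=.*$/, "", key); sub(/^[ \t]+/, "", key)
      val = $0; sub(/^[^=]*=[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
      if (key == "DCR_N_GRP") declared = val
      else if (key == "DCR_GRP_NAME") name[n_grp] = val
      else if (key == "DCR_GRP_N_EP") n_ep[n_grp] = val
    }
    END {
      bad = 0
      if (declared + 0 != n_grp + 0) {
        printf "DCR_N_GRP = %s, but %d [GRP] sections found\n", declared, n_grp
        bad = 1
      }
      for (i = 1; i <= n_grp; i++) {
        if (n_ep[i] + 0 != members[name[i]] + 0) {
          printf "%s: DCR_GRP_N_EP = %s, but %d member sections found\n", name[i], n_ep[i], members[name[i]]
          bad = 1
        }
      }
      exit bad
    }
  ' "$1"
}
```

For the file in this step the call would be `check_dcr_groups /dmsoft/dsc_config/dmdcr_cfg.ini`. The check only looks at counts; it does not validate addresses or ports.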
Step 5: Edit dmasvrmal.ini and change the IP addresses (dsc01 and dsc02 nodes)
$ vim /dmsoft/dsc_config/dmasvrmal.ini
[MAL_INST0]
MAL_INST_NAME = ASM0
MAL_HOST = 192.168.48.132 # heartbeat address
MAL_PORT = 11266 # MAL listening port
[MAL_INST1]
MAL_INST_NAME = ASM1
MAL_HOST = 192.168.48.133
MAL_PORT = 11266
Step 6: Edit dmmal.ini and change the IP addresses (dsc01, dsc02, and dw01 nodes)
$ vim dmmal.ini
MAL_CHECK_INTERVAL = 30 # MAL link check interval
MAL_CONN_FAIL_INTERVAL = 10 # time after which a MAL link is considered broken
[MAL_INST1]
MAL_INST_NAME = DSC0
MAL_HOST = 192.168.48.132
MAL_PORT = 61141
MAL_INST_HOST = 192.168.59.132
MAL_INST_PORT = 5136
MAL_DW_PORT = 52141
MAL_INST_DW_PORT = 33141
[MAL_INST2]
MAL_INST_NAME = DSC1
MAL_HOST = 192.168.48.133
MAL_PORT = 61141
MAL_INST_HOST = 192.168.59.133
MAL_INST_PORT = 5136
MAL_DW_PORT = 52141
MAL_INST_DW_PORT = 33141
[MAL_INST3]
MAL_INST_NAME = DW01
MAL_HOST = 192.168.59.134
MAL_PORT = 61141
MAL_INST_HOST = 192.168.59.134
MAL_INST_PORT = 5136
MAL_DW_PORT = 52141
MAL_INST_DW_PORT = 33141
Step 7: Edit the DSC cluster monitor configuration dmcssm.ini and change the IP addresses (DSC monitor node)
$ vim /dmsoft/dsc_config/dmcssm.ini
CSSM_OGUID = 251024
CSSM_CSS_IP = 192.168.48.132:11286
CSSM_CSS_IP = 192.168.48.133:11286
CSSM_LOG_PATH = /dmlog/mlog
CSSM_LOG_FILE_SIZE = 512
CSSM_LOG_SPACE_LIMIT = 2048
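Step 8: Edit the primary/standby cluster monitor configuration dmmonitor.ini and change the IP addresses (dw01 node)
$ vim /dmsoft/dmmon/dmmonitor.ini
MON_DW_CONFIRM = 0
MON_LOG_PATH = /dmlog/mlog # directory for monitor log files
MON_LOG_INTERVAL = 60 # write system information to the log every 60 s
MON_LOG_FILE_SIZE = 32 # maximum size of each log file: 32 MB
MON_LOG_SPACE_LIMIT = 1024 # limit on total log space; 0 means unlimited
[GRP1]
MON_INST_OGUID = 453332 # unique OGUID of group GRP1
# Connection information from the monitor to the watchers of group GRP1, in "IP:PORT" form
# IP corresponds to MAL_HOST in dmmal.ini; PORT corresponds to MAL_DW_PORT in dmmal.ini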
MON_DW_IP = 192.168.48.132:52141/192.168.48.133:52141
MON_DW_IP = 192.168.59.134:52141
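Once all files have been edited, a quick search for leftover old addresses can catch an entry that was missed. This is a minimal sketch under stated assumptions: `OLD_IP_RE` and `find_stale_ips` are hypothetical names, and the pattern covers exactly the six pre-change addresses from the IP table (192.168.59.142 to 144 and 192.168.48.142 to 144).

```shell
# Matches the six old addresses from the IP table and nothing else.
OLD_IP_RE='192\.168\.(59|48)\.14[234]'

# find_stale_ips FILE...
# Prints file:line:text for every line that still contains an old address.
# Returns 0 when nothing is left, 1 when something was found, 2 if grep failed.
find_stale_ips() {
  # /dev/null is passed as an extra file so grep always prints file names.
  grep -nE "(^|[^0-9.])${OLD_IP_RE}([^0-9.]|\$)" "$@" /dev/null && return 1
  [ "$?" -eq 1 ] || return 2
  return 0
}
```

On a DSC node this could be run as `find_stale_ips /dmsoft/dsc_config/*.ini`, and on dw01 against `/dmdata/DAMENG/dmmal.ini` and `/dmsoft/dmmon/dmmonitor.ini`.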
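Notes:
1. Back up dmdcr_cfg.ini before modifying it, and check that the exported configuration matches the current state of the cluster, in case the cluster configuration files were changed earlier without updating the DCR disk.
2. dmmal.ini must be identical on all nodes. It is recommended to edit it on one node and then copy it to the others.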
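The second note can be verified with a byte-for-byte comparison. The sketch below assumes the dmmal.ini copies from the other nodes have already been fetched to the local machine (for example with `scp`); `all_identical` is a hypothetical helper name.

```shell
# all_identical FILE...
# Succeeds only if every file has exactly the same content as the first one.
all_identical() {
  reference=$1
  shift
  for candidate in "$@"; do
    if ! cmp -s "$reference" "$candidate"; then
      echo "differs from $reference: $candidate" >&2
      return 1
    fi
  done
  return 0
}
```

A call might look like `all_identical dmmal.ini /tmp/dmmal.dsc02.ini /tmp/dmmal.dw01.ini`, where the `/tmp` paths are placeholders for wherever the remote copies were saved.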
Procedure:
Step 1: Log in to dmasmcmd and initialize the DCR disk (on dsc01)
$ /dmsoft/dmdbms/bin/dmasmcmd
ASM> init dcrdisk '/dev/dm/asm-dmdcr' from '/dmsoft/dsc_config/dmdcr_cfg.ini' identified by 'Dameng_123'
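Start the services in this order: DMCSS, DMASM, and DMSERVER of the DSC cluster, DMSERVER of the standby, DMWATCHER on the DSC nodes, and finally DMWATCHER on the standby. (In this test the DSC cluster was configured so that CSS automatically brings up DMASM and DMSERVER, so only the DMCSS service had to be started. Without that configuration, DMASM and DMSERVER must be started manually on each node.)
Procedure:
Step 1: Start the CSS service (dsc01 and dsc02 nodes; if automatic startup is configured, skip steps 2, 3, and 4, wait for CSS to bring up DMASM and DMSERVER, and then continue with step 5)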
# systemctl start DmCSSServiceCSS
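Step 2: Log in to the DSC monitor (monitor node)
$ dmcssm /dmsoft/dsc_config/dmcssm.ini
Step 3: Start the DSC DMASM service
ep startup grp_asm
Step 4: Start the DSC DMSERVER service
ep startup grp_dsc
Step 5: Start the standby instance (dw01 node)
# systemctl start DmServiceDW01
Step 6: Start the watchers of the DSC primary (dsc01 and dsc02 nodes)
# systemctl start DmWatcherServiceWatcher
Step 7: Start the standby watcher (dw01 node)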
# systemctl start DmWatcherServiceWatcher
After the cluster is started, check the cluster logs for errors and log in to the monitors to check the cluster status.
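The log check can be narrowed to files written since the restart. The following is only a sketch: `scan_logs` is a hypothetical helper, the keywords `error` and `fail` are an assumption about what a problem looks like in the logs rather than a documented DM log format, and `find -mmin` is a GNU/BSD extension.

```shell
# scan_logs DIR [MINUTES]
# Prints file:line:text for lines containing error-like keywords in files
# under DIR that were modified within the last MINUTES minutes (default 30).
scan_logs() {
  dir=$1
  minutes=${2:-30}
  # /dev/null is passed as an extra file so grep always prints file names.
  find "$dir" -type f -mmin "-$minutes" \
    -exec grep -niE 'error|fail' {} /dev/null \;
}
```

With the log directory configured earlier in this document, the call would be `scan_logs /dmlog/mlog`. Empty output means no matching line was found, not that the cluster is healthy, so the monitor checks are still needed.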
Procedure:
Step 1: Log in to the DSC monitor (monitor node)
$ dmcssm /dmsoft/dsc_config/dmcssm.ini
Step 2: Show the DSC cluster status
show
monitor current time:2025-11-06 11:37:13, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================
DSC_MODE = FULL
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid pid ts
2025-11-06 11:37:13 CSS0 0 11286 Control Node OPEN WORKING OK TRUE 4661352 7193 4663226
2025-11-06 11:37:13 CSS1 1 11286 Normal Node OPEN WORKING OK TRUE 4661929 6099 4663799
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid pid ts
2025-11-06 11:37:13 ASM0 0 11276 Control Node OPEN WORKING OK TRUE 4694251 7275 4696056
2025-11-06 11:37:13 ASM1 1 11276 Normal Node OPEN WORKING OK TRUE 4697609 6170 4699404
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid pid ts
2025-11-06 11:37:13 DSC0 0 5136 Control Node OPEN WORKING OK TRUE 10147718 8026 10147972
2025-11-06 11:37:13 DSC1 1 5136 Normal Node OPEN WORKING OK TRUE 10153529 6919 10153774
==================================================================================================================
Procedure:
Step 1: Log in to the data watch cluster monitor (dw01 node)
$ dmmonitor /dmsoft/dmmon/dmmonitor.ini
[monitor] 2025-11-06 12:51:39: DMMONITOR[4.0] V8
[monitor] 2025-11-06 12:51:39: DMMONITOR[4.0] IS READY.
[monitor] 2025-11-06 12:51:39:
#-----------------------------------------------------------------------------------------------#
GET MONITOR CONNECT INFO FROM DMWATCHER(DSC0), THE FIRST LINE IS SELF INFO.
DW_CONN_TIME MON_CONFIRM MID MON_IP MON_VERSION
2025-11-06 12:51:38 FALSE 1573726430 ::ffff:192.168.48.134 DMMONITOR[4.0] V8
#-----------------------------------------------------------------------------------------------#
[monitor] 2025-11-06 12:51:39: 收到守护进程(DSC1)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-06 12:51:38 STARTUP OK DSC0 OPEN PRIMARY VALID 11 48326 48326
[monitor] 2025-11-06 12:51:39: 收到守护进程(DW01)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-06 12:51:39 OPEN OK DW01 OPEN STANDBY VALID 11 48326 48326
[monitor] 2025-11-06 12:51:39: 收到守护进程(DSC0)消息
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-06 12:51:38 OPEN OK DSC0 OPEN PRIMARY VALID 11 48326 48326
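Remarks:
1. The IP change was tested twice. In the first test the data watch cluster status was normal. In the second test the message from watcher (DSC1) showed an incorrect instance status, and the cause was not identified.
2. The abnormal DSC1 status is only a display issue: the status in the DMCSSM monitor was normal, and the data synchronization test passed.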
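The status rows in the monitor output can also be compared automatically. The sketch below assumes the output has been saved to a text file and that status rows have the eleven whitespace-separated fields shown above (WTIME takes two). `check_lsn` is a hypothetical helper. It reports each instance once and fails if an instance is not OPEN or if its CLSN differs from the primary's. On a busy system the standby can legitimately lag for a moment, so a mismatch is a prompt to look again, not proof of a fault.

```shell
# check_lsn: read saved dmmonitor output on stdin.
# Fields of a status row: date time WSTATUS INST_OK INAME ISTATUS IMODE
# RSTAT N_OPEN FLSN CLSN. Other lines are ignored.
check_lsn() {
  awk '
    NF == 11 && ($7 == "PRIMARY" || $7 == "STANDBY") {
      mode[$5] = $7; status[$5] = $6; clsn[$5] = $11   # last row per instance wins
      if ($7 == "PRIMARY") primary_clsn = $11
    }
    END {
      bad = 0
      for (inst in clsn) {
        printf "%s %s %s CLSN=%s\n", inst, mode[inst], status[inst], clsn[inst]
        if (status[inst] != "OPEN" || clsn[inst] != primary_clsn) bad = 1
      }
      exit bad
    }
  '
}
```

If the output was saved as `monitor.out` (a placeholder name), `check_lsn < monitor.out` prints one line per instance and returns a non-zero status on a mismatch.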
Procedure:
Step 1: Insert test data into the database on the dsc01 node
$ disql SYSDBA:5136
SQL> create table t_test1106(id int);
SQL> insert into t_test1106 values(1106);
SQL> commit;
Step 2: Query the dw01 standby to check that the data has been synchronized
$ disql SYSDBA:5136
SQL> select * from t_test1106;
行号 id
---------- -----------
1 1106
已用时间: 7.992(毫秒). 执行号:1.
