DM Database Primary/Standby Synchronization and Failover Testing

codePanda 2025/12/31

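DM Data Watch is an integrated high-availability, high-performance database solution and DM's preferred option for off-site disaster recovery. With Data Watch deployed, data is protected against corruption and loss in extreme situations such as hardware failure (e.g. a damaged disk) or natural disasters (earthquake, fire). Database service can also be restored quickly enough to meet a requirement for uninterrupted service.

Compared with conventional backup and restore, Data Watch brings service back much faster. As data volumes grow, recovering by restore can take several hours or longer. Data Watch, by contrast, is largely unaffected by data size: once a failure has been detected, it can promote the standby to primary within seconds.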

1. Test Environment

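DM version: DM8, in primary/standby mode with real-time synchronization (the monitor reports the archive type as REALTIME).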

Role (initial)  Host     Instance name  IP              Port   Data Watch name (INAME in monitor output)
Primary         panda01  SCM01          192.168.66.201  15236  GRP1_SCM_P
Standby         panda02  SCM02          192.168.66.202  25236  GRP1_SCM_S

2. Verifying Primary/Standby Synchronization

On the primary

Create a tablespace and a user on the primary, then create a table and insert some rows:

-- Create the tablespace and user (size is in MB)
create tablespace DATA datafile '/dm8/dmdata/SCM01/DATA01.DBF' size 200 AUTOEXTEND OFF;
create user panda identified by "dm_OPS_123" DEFAULT tablespace DATA;
grant dba to panda;
-- Create the business table panda.testdw, insert 4 rows and commit
[dmdba@panda01 SCM01]$ /dm8/dmdbms/bin/disql panda/dm_OPS_123@192.168.66.201:15236
服务器[192.168.66.201:15236]:处于主库打开状态
登录使用时间 : 4.349(ms)
create table panda.testdw (id int,time TIMESTAMP DEFAULT SYSDATE);
insert into panda.testdw (id) values(1);
insert into panda.testdw (id) values(2);
insert into panda.testdw (id) values(3);
insert into panda.testdw (id) values(4);
commit;

The primary accepts writes, as expected.

Checking the standby

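After logging in to the standby, three things can be seen:

- creating a tablespace fails;
- inserting rows fails;
- the rows written on the primary can be queried normally.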

[dmdba@panda02 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.202:25236
服务器[192.168.66.202:25236]:处于备库打开状态
登录使用时间 : 4.538(ms)
disql V8
SQL> create tablespace DATA datafile '/dm8/dmdata/SCM01/DATA01.DBF' size 200 AUTOEXTEND OFF;
create tablespace DATA datafile '/dm8/dmdata/SCM01/DATA01.DBF' size 200 AUTOEXTEND OFF;
[-710]:试图在STANDBY模式下,修改用户库.
已用时间: 4.917(毫秒). 执行号:0.
SQL> select path from v$datafile;
行号       PATH
---------- ----------------------------
1          /dm8/dmdata/SCM01/DATA01.DBF
2          /dm8/dmdata/SCM02/MAIN.DBF
3          /dm8/dmdata/SCM02/ROLL.DBF
4          /dm8/dmdata/SCM02/TEMP.DBF
5          /dm8/dmdata/SCM02/SYSTEM.DBF
已用时间: 15.518(毫秒). 执行号:102.
SQL> select * from panda.testdw;
行号       ID          TIME
---------- ----------- --------------------------
1          1           2025-11-04 15:40:57.000000
2          2           2025-11-04 15:42:22.000000
3          3           2025-11-04 15:42:22.000000
4          4           2025-11-04 15:42:22.000000
已用时间: 0.392(毫秒). 执行号:101.
SQL> insert into panda.testdw (id) values(1);
insert into panda.testdw (id) values(1);
[-710]:试图在STANDBY模式下,修改用户库.
已用时间: 15.965(毫秒). 执行号:0.

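All four rows arrived on the standby, so real-time replication from primary to standby works. **The standby rejects every write** with error [-710], "attempted to modify a user database in STANDBY mode", which is exactly how a primary/standby pair should behave.

The disql login banner shows each node's role. 处于备库打开状态 means "open as standby", and the primary shows 处于主库打开状态, "open as primary".

Note also that v$datafile on the standby already lists /dm8/dmdata/SCM01/DATA01.DBF. The tablespace created on the primary was replicated with the primary's file path.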
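Because a standby rejects writes with [-710], a client needs to find whichever node is currently the primary. The sketch below tries the endpoints from the environment table in order.

It assumes a caller-supplied `probe(host, port)` function that returns the node's mode string ('PRIMARY', 'STANDBY', ...) or None when the node is unreachable. Both `probe` and `find_primary` are hypothetical names, not a DM API. In practice, DM's client-side service configuration (dm_svc.conf) is the usual way to handle this, so check its documentation before relying on a hand-rolled loop.

```python
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]

# Endpoints from the test environment table (roles change during the tests).
ENDPOINTS: Sequence[Endpoint] = [("192.168.66.201", 15236), ("192.168.66.202", 25236)]


def find_primary(endpoints: Sequence[Endpoint],
                 probe: Callable[[str, int], Optional[str]]) -> Optional[Endpoint]:
    """Return the first endpoint whose probe reports PRIMARY, else None.

    `probe` is a hypothetical caller-supplied function: it connects to
    host:port and returns the database mode ("PRIMARY", "STANDBY", ...)
    or None when the node cannot be reached.
    """
    for host, port in endpoints:
        try:
            mode = probe(host, port)
        except OSError:
            # A network-level failure just means "try the next node".
            continue
        if mode == "PRIMARY":
            return (host, port)
    return None


if __name__ == "__main__":
    # Roles as they were after the switchover in section 3.
    roles = {ENDPOINTS[0]: "STANDBY", ENDPOINTS[1]: "PRIMARY"}
    print(find_primary(ENDPOINTS, lambda h, p: roles[(h, p)]))
```

Trying nodes in a fixed order keeps the sketch simple. A real client would also retry after a short delay, because during a takeover neither node reports PRIMARY for a moment.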

3. Verifying Manual Switchover

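This scenario applies when the primary is healthy and the role change is planned, for example for maintenance or a cutover.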

The relevant monitor commands are:

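-- Primary: panda01 SCM01 192.168.66.201:15236
-- Standby: panda02 SCM02 192.168.66.202:25236
choose switchover GRP1               -- primary healthy: list instances that can be switched to primary
switchover GRP1.<instance_name>      -- primary healthy: switch the given instance of the given group to primary
choose takeover GRP1                 -- primary failed: list instances that can take over as primary
takeover GRP1.<instance_name>        -- primary failed: the given instance of the given group takes over as primary
choose takeover force GRP1           -- forced takeover: list candidate instances
takeover force GRP1.<instance_name>  -- forced takeover: force the given instance of the given group to become primary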
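The six commands above follow one rule:

- use switchover when the primary is healthy;
- use takeover when the primary has failed;
- use takeover force only as a last resort.

The sketch below encodes that rule. Given a group, a target instance and the primary's health, it returns the `choose ...` listing command and the action command to type into the monitor. The helper name `monitor_commands` is hypothetical.

```python
from typing import Tuple


def monitor_commands(group: str, instance: str, primary_healthy: bool,
                     force: bool = False) -> Tuple[str, str]:
    """Return (listing command, action command) for the DM monitor.

    Mirrors the command list above: switchover needs a healthy primary,
    takeover is for a failed primary, and `force` selects the forced
    variant of takeover.
    """
    if primary_healthy:
        if force:
            raise ValueError("'force' only applies to takeover")
        verb = "switchover"
    else:
        verb = "takeover force" if force else "takeover"
    return f"choose {verb} {group}", f"{verb} {group}.{instance}"


if __name__ == "__main__":
    # The planned switchover performed in this section.
    for cmd in monitor_commands("GRP1", "GRP1_SCM_S", primary_healthy=True):
        print(cmd)
```

The monitor also asks for a YES/NO confirmation before it actually switches, as the run below shows.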

Switchover run

In the DM monitor (dmmonitor), first log in as SYSDBA, then run the switchover commands:

login
用户名:sysdba
密码:
[monitor] 2025-11-04 16:18:04: 登录监视器成功!
choose switchover grp1
Can choose one of the following instances to do switchover:
1: GRP1_SCM_S
switchover grp1.GRP1_SCM_S
此操作需谨慎, 将会导致主库发生切换, 是否继续使用GRP1.GRP1_SCM_S执行SWITCHOVER操作(YES/NO/Y/N)? Y
[monitor] 2025-11-04 16:18:59: 开始切换实例GRP1_SCM_S
[monitor] 2025-11-04 16:18:59: 通知守护进程GRP1_SCM_P切换SWITCHOVER状态
[monitor] 2025-11-04 16:18:59: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2025-11-04 16:19:00: 切换守护进程GRP1_SCM_P为SWITCHOVER状态成功
[monitor] 2025-11-04 16:19:00: 通知守护进程GRP1_SCM_S切换SWITCHOVER状态
[monitor] 2025-11-04 16:19:00: 守护进程(GRP1_SCM_S)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2025-11-04 16:19:00: 切换守护进程GRP1_SCM_S为SWITCHOVER状态成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P开始执行ALTER DATABASE STANDBY语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行ALTER DATABASE STANDBY语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-11-04 16:19:00: 通知实例GRP1_SCM_S修改所有归档状态无效
[monitor] 2025-11-04 16:19:00: 修改所有实例归档为无效状态成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2025-11-04 16:19:00: 通知守护进程GRP1_SCM_P切换OPEN状态
[monitor] 2025-11-04 16:19:00: 守护进程(GRP1_SCM_P)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2025-11-04 16:19:01: 切换守护进程GRP1_SCM_P为OPEN状态成功
[monitor] 2025-11-04 16:19:01: 通知守护进程GRP1_SCM_S切换OPEN状态
[monitor] 2025-11-04 16:19:01: 守护进程(GRP1_SCM_S)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2025-11-04 16:19:01: 切换守护进程GRP1_SCM_S为OPEN状态成功
[monitor] 2025-11-04 16:19:01: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-11-04 16:19:01: 清理守护进程(GRP1_SCM_P)请求成功
[monitor] 2025-11-04 16:19:01: 清理守护进程(GRP1_SCM_S)请求成功
[monitor] 2025-11-04 16:19:01: 实例GRP1_SCM_S切换成功
2025-11-04 16:19:01
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 25114 TRUE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT DETACHED
192.168.66.102 25238 2025-11-04 16:19:00 GLOBAL VALID OPEN GRP1_SCM_S OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID FALSE
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.66.202 25236 OK GRP1_SCM_S OPEN PRIMARY 0 0 REALTIME VALID 3230 46481 3230 46482 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT DETACHED
192.168.66.101 15238 2025-11-04 16:19:01 GLOBAL VALID OPEN GRP1_SCM_P OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID FALSE
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.66.201 15236 OK GRP1_SCM_P OPEN STANDBY 0 0 REALTIME INVALID 3228 46405 3228 46405 NONE
DATABASE(GRP1_SCM_P) APPLY INFO FROM (GRP1_SCM_S), REDOS_PARALLEL_NUM (1), WAIT_APPLY[FALSE]:
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[3228, 3228, 3228], (RLSN, SLSN, KLSN)[46405, 46405, 46405], N_TSK[0], TSK_MEM_USE[0]
#================================================================================#
[monitor] 2025-11-04 16:19:04: 守护进程(GRP1_SCM_S)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:19:03 RECOVERY OK GRP1_SCM_S OPEN PRIMARY VALID 3 46482 46482
[monitor] 2025-11-04 16:19:07: 守护进程(GRP1_SCM_S)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:19:06 OPEN OK GRP1_SCM_S OPEN PRIMARY VALID 3 46482 46482
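The log follows a fixed order:

1. both watchers enter SWITCHOVER;
2. both instances are mounted;
3. the old primary is demoted (ALTER DATABASE STANDBY);
4. the old standby is promoted (ALTER DATABASE PRIMARY);
5. both instances are reopened.

The sketch below checks that ordering and measures the switchover time from a saved copy of the monitor output. It assumes only the message shapes seen above: 开始切换实例X ("start switching to instance X"), 实例X执行Y语句成功 ("instance X executed statement Y successfully") and 实例X切换成功 ("switchover to X succeeded"). The helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

EVENT = re.compile(r"\[monitor\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)")


def parse_events(log: str) -> List[Tuple[datetime, str]]:
    """Extract (timestamp, message) pairs from '[monitor] ...' lines."""
    events = []
    for line in log.splitlines():
        m = EVENT.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def find(events: List[Tuple[datetime, str]], needle: str) -> int:
    """Index of the first event whose message contains `needle`, or -1."""
    for i, (_, msg) in enumerate(events):
        if needle in msg:
            return i
    return -1


def check_switchover(log: str, old_primary: str, new_primary: str) -> float:
    """Verify the step order of a switchover and return its duration in seconds."""
    events = parse_events(log)
    steps = {
        "start": find(events, f"开始切换实例{new_primary}"),
        "demote": find(events, f"实例{old_primary}执行ALTER DATABASE STANDBY语句成功"),
        "promote": find(events, f"实例{new_primary}执行ALTER DATABASE PRIMARY语句成功"),
        "done": find(events, f"实例{new_primary}切换成功"),
    }
    missing = [name for name, idx in steps.items() if idx < 0]
    if missing:
        raise ValueError(f"steps not found in log: {missing}")
    if not steps["start"] < steps["demote"] < steps["promote"] < steps["done"]:
        raise ValueError("switchover steps are out of the expected order")
    # Measured from "start switching" to "switchover succeeded", so the
    # earlier login line does not inflate the duration.
    return (events[steps["done"]][0] - events[steps["start"]][0]).total_seconds()


# Abridged from the monitor output above.
SAMPLE = """\
[monitor] 2025-11-04 16:18:04: 登录监视器成功!
[monitor] 2025-11-04 16:18:59: 开始切换实例GRP1_SCM_S
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_P执行ALTER DATABASE STANDBY语句成功
[monitor] 2025-11-04 16:19:00: 实例GRP1_SCM_S执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-11-04 16:19:01: 实例GRP1_SCM_S切换成功
"""
```

Demoting before promoting is what keeps two primaries from coexisting during a planned switch. A violation of that order is therefore worth flagging, not just the elapsed time.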

Post-switchover check

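The old primary was demoted to STANDBY and the old standby was promoted to PRIMARY, with the monitor driving every step automatically.

The switchover took about two seconds (16:18:59 to 16:19:01). Both instances were briefly in MOUNT state during that window, so clients see a short pause rather than no interruption at all.

The apply information for GRP1_SCM_P shows WAIT_APPLY[FALSE] and N_TSK[0], with RLSN, SLSN and KLSN all at 46405, so nothing was left to apply. Logging in again confirms that the former primary is now a standby and rejects writes: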

[dmdba@panda01 SCM01]$ /dm8/dmdbms/bin/disql panda/dm_OPS_123@192.168.66.201:15236
服务器[192.168.66.201:15236]:处于备库打开状态
登录使用时间 : 6.168(ms)
disql V8
SQL> insert into panda.testdw (id) values(5);
insert into panda.testdw (id) values(5);
[-710]:试图在STANDBY模式下,修改用户库.
已用时间: 3.520(毫秒). 执行号:0.

4. Simulated HA Failure Scenarios

Scenario 1: the primary loses power / reboots

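Simulated with: reboot (an orderly OS reboot, so it approximates a power cut rather than reproducing one exactly)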

Watch the monitor output:

-- After the switchover above, the roles are now:
-- Standby: panda01 SCM01 192.168.66.201:15236
-- Primary: panda02 SCM02 192.168.66.202:25236
-- Use reboot on panda02 to simulate the primary losing power and restarting
[root@panda02 ~]# reboot
-- Monitor output
[monitor] 2025-11-04 16:26:31: 接收守护进程(GRP1_SCM_S)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:09 ERROR OK GRP1_SCM_S OPEN PRIMARY VALID 3 46489 46489
[monitor] 2025-11-04 16:26:31: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2025-11-04 16:26:31: 通知组(GRP1)当前活动的守护进程设置MID
[monitor] 2025-11-04 16:26:31: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor] 2025-11-04 16:26:31: 开始使用实例GRP1_SCM_P接管
[monitor] 2025-11-04 16:26:31: 通知守护进程GRP1_SCM_P切换TAKEOVER状态
[monitor] 2025-11-04 16:26:31: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->TAKEOVER]
[monitor] 2025-11-04 16:26:31: 切换守护进程GRP1_SCM_P为TAKEOVER状态成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-11-04 16:26:31: 通知实例GRP1_SCM_P修改所有归档状态无效
[monitor] 2025-11-04 16:26:31: 修改所有实例归档为无效状态成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2025-11-04 16:26:31: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2025-11-04 16:26:31: 通知守护进程GRP1_SCM_P切换OPEN状态
[monitor] 2025-11-04 16:26:31: 守护进程(GRP1_SCM_P)状态切换 [TAKEOVER-->OPEN]
[monitor] 2025-11-04 16:26:31: 切换守护进程GRP1_SCM_P为OPEN状态成功
[monitor] 2025-11-04 16:26:31: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-11-04 16:26:31: 清理守护进程(GRP1_SCM_P)请求成功
[monitor] 2025-11-04 16:26:31: 使用实例GRP1_SCM_P接管成功
[monitor] 2025-11-04 16:26:31: 组(GRP1)使用实例GRP1_SCM_P自动接管成功

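The DM monitor triggered TAKEOVER automatically once it stopped receiving messages from the primary's watcher. It promoted the standby to primary so that applications can keep writing.

The takeover itself completed within the same second (16:26:31). Detection took longer: the last status the monitor held for GRP1_SCM_S was stamped 16:26:09, about 22 seconds before the timeout fired. That window is governed by the Data Watch timeout settings rather than by the switch itself.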
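The two phases, detection and takeover, can be separated from a saved monitor log.

The sketch below rests on two assumptions. First, the status row printed right after the 消息超时 ("message timeout") line holds the last status the monitor received. Second, 自动接管成功 ("automatic takeover succeeded") marks the end of the takeover. Both are readings of the output above, and the function names are hypothetical.

Status rows need care when parsing. WTIME spans two tokens and WSTATUS can be two tokens ("UNIFY EP"), so the fixed columns are read from the right.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"
EVENT = re.compile(r"\[monitor\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)")
TAIL = ["INST_OK", "INAME", "ISTATUS", "IMODE", "RSTAT", "N_OPEN", "FLSN", "CLSN"]


def parse_status_row(row: str) -> Dict[str, str]:
    """Split a 'WTIME WSTATUS ... CLSN' data row into named columns."""
    tokens = row.split()
    rec = dict(zip(TAIL, tokens[-8:]))
    rec["WTIME"] = " ".join(tokens[:2])
    rec["WSTATUS"] = " ".join(tokens[2:-8])
    return rec


def failover_timing(log: str) -> Tuple[float, float]:
    """Return (detection delay, takeover duration) in seconds."""
    lines: List[str] = [l.strip() for l in log.splitlines() if l.strip()]
    last_report = timeout_at = done_at = None
    for i, line in enumerate(lines):
        m = EVENT.match(line)
        if not m:
            continue
        ts, msg = datetime.strptime(m.group(1), FMT), m.group(2)
        if timeout_at is None and "消息超时" in msg:
            # Next line is the column header, the one after it the last status.
            timeout_at = ts
            last_report = datetime.strptime(parse_status_row(lines[i + 2])["WTIME"], FMT)
        elif done_at is None and "自动接管成功" in msg:
            done_at = ts
    if last_report is None or done_at is None:
        raise ValueError("timeout or takeover-success event not found")
    return ((timeout_at - last_report).total_seconds(),
            (done_at - timeout_at).total_seconds())


# Abridged from the scenario 1 monitor output above.
SAMPLE = """\
[monitor] 2025-11-04 16:26:31: 接收守护进程(GRP1_SCM_S)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:09 ERROR OK GRP1_SCM_S OPEN PRIMARY VALID 3 46489 46489
[monitor] 2025-11-04 16:26:31: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2025-11-04 16:26:31: 组(GRP1)使用实例GRP1_SCM_P自动接管成功
"""
```

The function keys on the watcher-timeout message, so it applies to failures where a whole host or its network disappears. A crash of only the database process is reported differently.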

-- Log in to the former standby: it is now the primary (the switch took seconds)
[dmdba@panda01 SCM01]$ /dm8/dmdbms/bin/disql panda/dm_OPS_123@192.168.66.201:15236
服务器[192.168.66.201:15236]:处于主库打开状态
登录使用时间 : 4.724(ms)
disql V8
-- panda02 came back after the reboot and rejoined as a standby
Last login: Tue Nov 4 15:26:59 2025 from 192.168.66.11
[root@panda02 ~]# su - dmdba
上一次登录:二 11月 4 15:29:37 CST 2025pts/1 上
[dmdba@panda02 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.202:25236
服务器[192.168.66.202:25236]:处于备库打开状态
登录使用时间 : 7.300(ms)
disql V8
-- Monitor output during the rejoin
[monitor] 2025-11-04 16:26:58: 守护进程(GRP1_SCM_S)状态切换 [NONE-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:57 STARTUP OK GRP1_SCM_S MOUNT PRIMARY VALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 UNIFY EP OK GRP1_SCM_S MOUNT PRIMARY VALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 STARTUP OK GRP1_SCM_S MOUNT STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 UNIFY EP OK GRP1_SCM_S MOUNT STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 STARTUP OK GRP1_SCM_S OPEN STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 OPEN OK GRP1_SCM_S OPEN STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:59 RECOVERY OK GRP1_SCM_P OPEN PRIMARY VALID 4 46561 46561
[monitor] 2025-11-04 16:27:01: 守护进程(GRP1_SCM_P)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:27:01 OPEN OK GRP1_SCM_P OPEN PRIMARY VALID 4 46561 46561
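The status rows trace the rejoin path:

1. the rebooted node first comes up MOUNTed and still marked PRIMARY;
2. the UNIFY EP step turns it into a STANDBY;
3. only then is it opened.

The sketch below extracts that (WSTATUS, ISTATUS, IMODE) trail for one instance from saved monitor output and checks that it ended as an open standby. The function names are hypothetical, and the sample is an abridged copy of the log above.

```python
from typing import List, Tuple

HEADER = "WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN"


def state_trail(log: str, instance: str) -> List[Tuple[str, str, str]]:
    """Collect (WSTATUS, ISTATUS, IMODE) rows for `instance`, in log order."""
    # Normalize runs of spaces so the header comparison is layout-independent.
    lines = [" ".join(l.split()) for l in log.splitlines()]
    trail = []
    for prev, row in zip(lines, lines[1:]):
        if prev != HEADER:
            continue
        tokens = row.split()
        # The last eight columns are fixed; WSTATUS sits between WTIME and INST_OK.
        _inst_ok, iname, istatus, imode = tokens[-8:-4]
        if iname == instance:
            trail.append((" ".join(tokens[2:-8]), istatus, imode))
    return trail


def rejoined_as_standby(trail: List[Tuple[str, str, str]]) -> bool:
    """True if the instance started out as PRIMARY and ended OPEN as STANDBY."""
    return bool(trail) and trail[0][2] == "PRIMARY" and trail[-1] == ("OPEN", "OPEN", "STANDBY")


SAMPLE = """\
[monitor] 2025-11-04 16:26:58: 守护进程(GRP1_SCM_S)状态切换 [NONE-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:57 STARTUP OK GRP1_SCM_S MOUNT PRIMARY VALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 UNIFY EP OK GRP1_SCM_S MOUNT PRIMARY VALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 STARTUP OK GRP1_SCM_S MOUNT STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:58 OPEN OK GRP1_SCM_S OPEN STANDBY INVALID 3 46516 46516
[monitor] 2025-11-04 16:26:59: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:26:59 RECOVERY OK GRP1_SCM_P OPEN PRIMARY VALID 4 46561 46561
"""
```

Checking the first row matters as much as the last. A node that still says PRIMARY on startup is exactly the case where a wrong decision could produce two primaries.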

Scenario 2: the primary loses network connectivity

Simulated with: systemctl stop network, run on panda01, which had become the primary again in scenario 1.

-- Cut the network on panda01, currently the primary
[root@panda01 ~]# systemctl stop network
-- 60 s later, the standby has taken over as primary
[dmdba@panda02 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.202:25236
服务器[192.168.66.202:25236]:处于主库打开状态
登录使用时间 : 5.273(ms)
disql V8
-- Monitor output
[monitor] 2025-11-04 16:30:56: 接收守护进程(GRP1_SCM_P)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:30:35 ERROR OK GRP1_SCM_P OPEN PRIMARY VALID 4 46594 46594
[monitor] 2025-11-04 16:30:56: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2025-11-04 16:30:56: 通知组(GRP1)当前活动的守护进程设置MID
[monitor] 2025-11-04 16:30:56: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor] 2025-11-04 16:30:56: 开始使用实例GRP1_SCM_S接管
[monitor] 2025-11-04 16:30:56: 通知守护进程GRP1_SCM_S切换TAKEOVER状态
[monitor] 2025-11-04 16:30:56: 守护进程(GRP1_SCM_S)状态切换 [OPEN-->TAKEOVER]
[monitor] 2025-11-04 16:30:56: 切换守护进程GRP1_SCM_S为TAKEOVER状态成功
[monitor] 2025-11-04 16:30:56: 实例GRP1_SCM_S开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-11-04 16:30:57: 通知实例GRP1_SCM_S修改所有归档状态无效
[monitor] 2025-11-04 16:30:57: 修改所有实例归档为无效状态成功
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2025-11-04 16:30:57: 实例GRP1_SCM_S执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2025-11-04 16:30:57: 通知守护进程GRP1_SCM_S切换OPEN状态
[monitor] 2025-11-04 16:30:57: 守护进程(GRP1_SCM_S)状态切换 [TAKEOVER-->OPEN]
[monitor] 2025-11-04 16:30:57: 切换守护进程GRP1_SCM_S为OPEN状态成功
[monitor] 2025-11-04 16:30:57: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-11-04 16:30:57: 清理守护进程(GRP1_SCM_S)请求成功
[monitor] 2025-11-04 16:30:57: 使用实例GRP1_SCM_S接管成功
[monitor] 2025-11-04 16:30:57: 组(GRP1)使用实例GRP1_SCM_S自动接管成功
-- Restore the network on panda01
systemctl start network
[monitor] 2025-11-04 16:34:06: 实例GRP1_SCM_P[PRIMARY, SUSPEND, ISTAT_SAME:TRUE]故障
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:06 STARTUP ERROR GRP1_SCM_P SUSPEND PRIMARY VALID 4 46594 46594
[monitor] 2025-11-04 16:34:06: 守护进程(GRP1_SCM_P)状态切换 [NONE-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:06 STARTUP ERROR GRP1_SCM_P SUSPEND PRIMARY VALID 4 46594 46594
[monitor] 2025-11-04 16:34:21: 实例GRP1_SCM_P[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:21 STARTUP OK GRP1_SCM_P MOUNT PRIMARY VALID 4 46594 46594
[monitor] 2025-11-04 16:34:34: 守护进程(GRP1_SCM_P)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 UNIFY EP OK GRP1_SCM_P MOUNT PRIMARY VALID 4 46594 46594
[monitor] 2025-11-04 16:34:34: 守护进程(GRP1_SCM_P)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 STARTUP OK GRP1_SCM_P MOUNT STANDBY INVALID 4 46594 46594
[monitor] 2025-11-04 16:34:34: 守护进程(GRP1_SCM_P)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 UNIFY EP OK GRP1_SCM_P MOUNT STANDBY INVALID 4 46594 46594
[monitor] 2025-11-04 16:34:34: 守护进程(GRP1_SCM_P)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 STARTUP OK GRP1_SCM_P OPEN STANDBY INVALID 4 46594 46594
[monitor] 2025-11-04 16:34:34: 守护进程(GRP1_SCM_P)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 OPEN OK GRP1_SCM_P OPEN STANDBY INVALID 4 46594 46594
[monitor] 2025-11-04 16:34:35: 守护进程(GRP1_SCM_S)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:34 RECOVERY OK GRP1_SCM_S OPEN PRIMARY VALID 5 46688 46688
[monitor] 2025-11-04 16:34:36: 守护进程(GRP1_SCM_S)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:34:35 OPEN OK GRP1_SCM_S OPEN PRIMARY VALID 5 46688 46688

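Observed behavior: after the primary's network was cut, the monitor timed out about 21 seconds after the last status it had received from GRP1_SCM_P (16:30:35 to 16:30:56). The standby had taken over as primary by 16:30:57. The "60 s" in the transcript is when the author checked, not how long detection took.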

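After the network was restored, the monitor saw the old primary still marked PRIMARY but in SUSPEND status (16:34:06). Data Watch then moved it to MOUNT, converted it to STANDBY and reopened it by 16:34:34, so it rejoined as a standby without manual intervention. The new primary's watcher went briefly through RECOVERY to bring it up to date.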

-- panda01 has automatically rejoined as a standby
Last login: Tue Nov 4 15:29:17 2025 from 192.168.66.11
[root@panda01 ~]# su - dmdba
上一次登录:二 11月 4 16:35:55 CST 2025pts/2 上
[dmdba@panda01 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.201:15236
服务器[192.168.66.201:15236]:处于备库打开状态
登录使用时间 : 5.475(ms)
disql V8

Scenario 3: the primary database process crashes (standing in for hardware faults, a hung system or excessive load)

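Like scenario 1, this makes the primary unavailable. Here, however, only the dmserver process dies while the host and its watcher keep running. Simulated with: kill -9 <dmserver_pid>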

-- Use kill -9 to simulate an abnormal crash of the primary (now panda02)
[dmdba@panda02 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.202:25236
服务器[192.168.66.202:25236]:处于主库打开状态
登录使用时间 : 4.625(ms)
disql V8
SQL> exit
[dmdba@panda02 ~]$ ps -ef|grep dms
dmdba 9704 1 0 16:26 ? 00:00:04 /dm8/dmdbms/bin/dmserver path=/dm8/dmdata/SCM02/dm.ini -noconsole mount
dmdba 10645 10478 0 16:37 pts/0 00:00:00 grep --color=auto dms
[dmdba@panda02 ~]$ kill -9 9704
[dmdba@panda02 ~]$ ps -ef|grep dms
dmdba 10671 10478 0 16:37 pts/0 00:00:00 grep --color=auto dms
-- Monitor output
[monitor] 2025-11-04 16:37:31: 实例GRP1_SCM_S[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:30 STARTUP ERROR GRP1_SCM_S OPEN PRIMARY VALID 5 46733 46733
[monitor] 2025-11-04 16:37:31: 守护进程(GRP1_SCM_S)状态切换 [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:30 STARTUP ERROR GRP1_SCM_S OPEN PRIMARY VALID 5 46733 46733
[monitor] 2025-11-04 16:37:31: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管
[monitor] 2025-11-04 16:37:31: 通知组(GRP1)当前活动的守护进程设置MID
[monitor] 2025-11-04 16:37:31: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor] 2025-11-04 16:37:31: 开始使用实例GRP1_SCM_P接管
[monitor] 2025-11-04 16:37:31: 通知守护进程GRP1_SCM_P切换TAKEOVER状态
[monitor] 2025-11-04 16:37:31: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->TAKEOVER]
[monitor] 2025-11-04 16:37:31: 切换守护进程GRP1_SCM_P为TAKEOVER状态成功
[monitor] 2025-11-04 16:37:31: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P开始执行ALTER DATABASE MOUNT语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行ALTER DATABASE MOUNT语句成功
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P开始执行ALTER DATABASE PRIMARY语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行ALTER DATABASE PRIMARY语句成功
[monitor] 2025-11-04 16:37:32: 通知实例GRP1_SCM_P修改所有归档状态无效
[monitor] 2025-11-04 16:37:32: 修改所有实例归档为无效状态成功
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2025-11-04 16:37:32: 实例GRP1_SCM_P执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2025-11-04 16:37:32: 通知守护进程GRP1_SCM_P切换OPEN状态
[monitor] 2025-11-04 16:37:32: 守护进程(GRP1_SCM_P)状态切换 [TAKEOVER-->OPEN]
[monitor] 2025-11-04 16:37:32: 切换守护进程GRP1_SCM_P为OPEN状态成功
[monitor] 2025-11-04 16:37:32: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2025-11-04 16:37:32: 清理守护进程(GRP1_SCM_P)请求成功
[monitor] 2025-11-04 16:37:32: 清理守护进程(GRP1_SCM_S)请求成功
[monitor] 2025-11-04 16:37:32: 使用实例GRP1_SCM_P接管成功
[monitor] 2025-11-04 16:37:32: 组(GRP1)使用实例GRP1_SCM_P自动接管成功
[monitor] 2025-11-04 16:37:53: 实例GRP1_SCM_S[PRIMARY, MOUNT, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:52 STARTUP OK GRP1_SCM_S MOUNT PRIMARY VALID 5 46733 46733
[monitor] 2025-11-04 16:37:53: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:52 UNIFY EP OK GRP1_SCM_S MOUNT PRIMARY VALID 5 46733 46733
[monitor] 2025-11-04 16:37:53: 守护进程(GRP1_SCM_S)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:53 STARTUP OK GRP1_SCM_S MOUNT STANDBY INVALID 5 46733 46733
[monitor] 2025-11-04 16:37:53: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:53 UNIFY EP OK GRP1_SCM_S MOUNT STANDBY INVALID 5 46733 46733
[monitor] 2025-11-04 16:37:54: 守护进程(GRP1_SCM_S)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:53 STARTUP OK GRP1_SCM_S OPEN STANDBY INVALID 5 46733 46733
[monitor] 2025-11-04 16:37:54: 守护进程(GRP1_SCM_S)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:53 OPEN OK GRP1_SCM_S OPEN STANDBY INVALID 5 46733 46733
[monitor] 2025-11-04 16:37:54: 守护进程(GRP1_SCM_P)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:54 RECOVERY OK GRP1_SCM_P OPEN PRIMARY VALID 6 46785 46785
[monitor] 2025-11-04 16:37:56: 守护进程(GRP1_SCM_P)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2025-11-04 16:37:56 OPEN OK GRP1_SCM_P OPEN PRIMARY VALID 6 46785 46785
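Finding the PID with ps -ef|grep dms works, but it also matches the grep process itself. On a host running several instances, it does not say which dmserver belongs to which dm.ini.

The sketch below parses ps -ef output, keeps only dmserver processes and can filter on the path= argument. The function name is hypothetical. On Linux, pgrep -f dmserver is the shell-level equivalent.

```python
from typing import List, Optional


def dmserver_pids(ps_output: str, ini_path: Optional[str] = None) -> List[int]:
    """Return PIDs of dmserver processes found in `ps -ef` output.

    Skips the grep line and the ps header, and, when `ini_path` is given,
    keeps only the process started with path=<ini_path>.
    """
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        argv = fields[7].split()
        if not argv[0].endswith("dmserver"):
            continue
        if ini_path is not None and f"path={ini_path}" not in argv:
            continue
        pids.append(int(fields[1]))
    return pids


# The ps output captured before the kill above.
SAMPLE_PS = """\
dmdba 9704 1 0 16:26 ? 00:00:04 /dm8/dmdbms/bin/dmserver path=/dm8/dmdata/SCM02/dm.ini -noconsole mount
dmdba 10645 10478 0 16:37 pts/0 00:00:00 grep --color=auto dms
"""
```

Matching on the executable name and the path= argument is stricter than a substring grep. That matters before running something as destructive as kill -9.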

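DM's behavior: the PRIMARY failure was recognized immediately. The watcher on panda02 was still running and reported the instance fault at 16:37:31. The standby then took over automatically and was open as PRIMARY by 16:37:32.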
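Comparing scenarios 1 and 2 with scenario 3 explains the difference in reaction time.

- When the whole host or its network disappears, the monitor only notices that it has stopped hearing from that host's watcher: 接收守护进程(X)消息超时, "message from watcher X timed out". It then waits out a timeout, about 20 seconds here.
- When only dmserver dies, the still-running watcher reports the instance fault straight away: 实例X[...]故障, "instance X failed".

That reading is an interpretation of the logs in this article, not a statement from DM's documentation. The sketch below classifies the first trigger found in a saved monitor log; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Trigger messages seen in this article's monitor logs:
#   接收守护进程(X)消息超时 -> the monitor stopped hearing from watcher X
#   实例X[...]故障          -> watcher reports that instance X has failed
TIMEOUT = re.compile(r"接收守护进程\(([A-Za-z0-9_]+)\)消息超时")
FAULT = re.compile(r"实例([A-Za-z0-9_]+)\[[^\]]*\]故障")


def first_trigger(log: str) -> Optional[Tuple[str, str]]:
    """Return ('watcher_timeout' | 'instance_fault', name) for the first trigger, else None."""
    for line in log.splitlines():
        m = TIMEOUT.search(line)
        if m:
            return ("watcher_timeout", m.group(1))
        m = FAULT.search(line)
        if m:
            return ("instance_fault", m.group(1))
    return None
```

Instance names are matched with an explicit ASCII class rather than \w. In Python 3, \w also matches Chinese characters and would run into the surrounding message text.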

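The killed instance was restarted automatically (START_TIME 16:37:49) and rejoined as a standby. The author then ran some transactions on the new primary, committed at 16:41:02. All of them were present on the rejoined standby, so replication had resumed with no manual steps.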

-- Log in to the former standby (panda01, now the primary) and run some transactions
truncate table panda.testdw;
insert into panda.testdw (id) values(100);
create table panda.hang (id int,time TIMESTAMP DEFAULT SYSDATE);
insert into panda.hang (id) values(1);
insert into panda.hang (id) values(2);
insert into panda.hang (id) values(3);
insert into panda.hang (id) values(4);
insert into panda.hang (id) values(5);
insert into panda.hang (id) values(6);
insert into panda.hang (id) values(7);
insert into panda.hang (id) values(8);
insert into panda.hang (id) values(9);
commit;
-- Went to start the old primary on panda02 (GRP1_SCM_S) again: it had already been restarted automatically, and its data is in sync
[dmdba@panda02 ~]$ /dm8/dmdbms/bin/disql SYSDBA/dm_OPS_123@192.168.66.202:25236
服务器[192.168.66.202:25236]:处于备库打开状态
登录使用时间 : 5.622(ms)
disql V8
SQL>
SQL> select START_TIME from v$instance;
行号       START_TIME
---------- -------------------
1          2025-11-04 16:37:49
已用时间: 5.317(毫秒). 执行号:2.
SQL> select * from panda.testdw;
行号       ID          TIME
---------- ----------- --------------------------
1          100         2025-11-04 16:41:02.000000
已用时间: 4.554(毫秒). 执行号:3.
SQL> select * from panda.hang;
行号       ID          TIME
---------- ----------- --------------------------
1          1           2025-11-04 16:41:02.000000
2          2           2025-11-04 16:41:02.000000
3          3           2025-11-04 16:41:02.000000
4          4           2025-11-04 16:41:02.000000
5          5           2025-11-04 16:41:02.000000
6          6           2025-11-04 16:41:02.000000
7          7           2025-11-04 16:41:02.000000
8          8           2025-11-04 16:41:02.000000
9          9           2025-11-04 16:41:02.000000
9 rows got
已用时间: 3.715(毫秒). 执行号:4.
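The consistency check above is done by eye. The same queries can be run against both nodes through any client; how the rows are fetched is left open here.

Once both result sets are in hand, a small comparison makes the outcome explicit. The hypothetical `diff_rows` below keys rows on one column and reports rows missing on the standby, extra rows and rows that differ. It assumes that key column is unique: panda.hang has no primary key, and its ids just happen to be distinct in this test.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple  # one result row, e.g. (id, time)


def diff_rows(primary: Sequence[Row], standby: Sequence[Row],
              key: int = 0) -> Dict[str, List[Hashable]]:
    """Compare two result sets by the column at index `key`."""
    p = {row[key]: row for row in primary}
    s = {row[key]: row for row in standby}
    return {
        "missing_on_standby": sorted(k for k in p if k not in s),
        "extra_on_standby": sorted(k for k in s if k not in p),
        "different": sorted(k for k in p.keys() & s.keys() if p[k] != s[k]),
    }


# Rows as returned for panda.hang in the transcript above (ids 1-9).
HANG = [(i, "2025-11-04 16:41:02.000000") for i in range(1, 10)]

if __name__ == "__main__":
    print(diff_rows(HANG, HANG))
```

Reporting the three categories separately shows whether the standby is lagging (rows missing) or diverged (rows extra or different), which call for different responses.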

5. Conclusions

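These tests show that the DM primary/standby architecture provides:

- real-time data synchronization;
- write protection on the standby (the primary takes writes, the standby serves reads);
- smooth planned switchover;
- automatic failover (takeover) across several failure types: power loss/reboot, network loss and process crash.

In this environment the takeover itself finished within about a second each time. What differed was detection: roughly 20 seconds when the whole host or its network disappeared, versus immediately when only the dmserver process died.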
