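Data Watch cluster concepts
DM Data Watch is an integrated high-reliability solution that meets users' requirements for data safety and high availability. It mainly addresses long database outages caused by hardware failures, natural disasters and similar events, and keeps the database service running without interruption.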
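Main functions: monitoring database status, sending status information, monitoring messages from other watchers, receiving monitor messages, starting and running the primary and standbys, handling standby failures, handling standby exceptions, handling primary failures, and repairing after failures.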
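Automatic failover: when the primary fails, the confirm monitor automatically selects a standby and switches it to primary so that it can continue serving clients. Automatic failover mode requires exactly one confirm monitor to be configured.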
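Manual failover: the user decides, based on the actual situation, to switch a standby to primary with a monitor command. Until the user intervenes, the standby can keep providing read-only service, but any operation that modifies data in non-temporary tables will fail.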
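Role of the monitor: monitoring the Data Watch status, confirming status information (when it is configured as the confirm monitor for automatic failover), managing the Data Watch status, and issuing automatic takeover commands.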
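Monitor types:
Monitoring mode (dmmonitor.ini: MON_DW_CONFIRM=0)
Confirm mode (dmmonitor.ini: MON_DW_CONFIRM=1)
Difference: confirm mode has every function of monitoring mode plus two more, status confirmation and automatic takeover.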
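Status confirmation: when the primary's watcher detects that a standby has failed, it first asks the monitor to confirm that the standby has really failed. Only then does it start the failure-handling process, which invalidates the archive to that standby. This avoids split brain. Status confirmation applies only to Data Watch systems with automatic failover. When certain conditions are met, the primary's watcher switches to the Confirm state. Depending on the scenario, it then decides whether to switch to the Failover state and start failure handling.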
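Automatic takeover: in automatic failover mode, the confirm monitor may detect that the primary has failed. It then chooses a standby to take over, based on the primary and standby LSNs, archive status, MAL link status and other information it has received, and switches that standby to primary.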
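The test environment simulates production with one primary, one standby and one disaster-recovery (DR) standby. The monitor is deployed on the DR node.
Role | IP address | Instance | OS | Notes |
---|---|---|---|---|
Primary | 192.168.20.31 (internal) | DM1 | CentOS 7 | |
Standby | 192.168.20.32 (internal) | DM2 | CentOS 7 | |
DR standby / monitor | 192.168.20.33 (internal) | DM3 | CentOS 7 | Monitor co-located with the DR standby |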
实例名 | OGUID | PORT_NUM | MAL_INST_DW_PORT | MAL_HOST | MAL_PORT | MAL_DW_PORT |
---|---|---|---|---|---|---|
DM1 | 45331 | 5236 | 5237 | 192.168.20.31 | 5238 | 5239 |
DM2 | 45331 | 5236 | 5237 | 192.168.20.32 | 5238 | 5239 |
DM3 | 45331 | 5236 | 5237 | 192.168.20.33 | 5238 | 5239 |
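The interactive installer can be scripted roughly as follows (run as dmdba). The answers are fed to the installer prompts in order: language, key file, time zone, install type, install directory and the final confirmations. Check them against the prompts of your installer version.
./DMInstall.bin -i <<EOF
1
n
y
21
1
/home/dmdba/dmdbms
y
y
EOF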
Initialize the database on each node (as dmdba).
On node DM1:
cd /home/dmdba/dmdbms/bin
./dminit path=/home/dmdba/dmdbms/data/ PAGE_SIZE=32 EXTENT_SIZE=32 CASE_SENSITIVE=Y DB_NAME=DM1 INSTANCE_NAME=DM1 PORT_NUM=5236 LOG_SIZE=256 BUFFER=512 SYSDBA_PWD=Dameng123 SYSAUDITOR_PWD=Dameng123
On node DM2:
cd /home/dmdba/dmdbms/bin
./dminit path=/home/dmdba/dmdbms/data/ PAGE_SIZE=32 EXTENT_SIZE=32 CASE_SENSITIVE=Y DB_NAME=DM2 INSTANCE_NAME=DM2 PORT_NUM=5236 LOG_SIZE=256 BUFFER=512 SYSDBA_PWD=Dameng123 SYSAUDITOR_PWD=Dameng123
On node DM3:
cd /home/dmdba/dmdbms/bin
./dminit path=/home/dmdba/dmdbms/data/ PAGE_SIZE=32 EXTENT_SIZE=32 CASE_SENSITIVE=Y DB_NAME=DM3 INSTANCE_NAME=DM3 PORT_NUM=5236 LOG_SIZE=256 BUFFER=512 SYSDBA_PWD=Dameng123 SYSAUDITOR_PWD=Dameng123
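The three dminit calls differ only in DB_NAME and INSTANCE_NAME. Printing them from one template keeps the names from drifting apart. This is a minimal sketch: the helper name `dminit_cmd` is hypothetical, and it only prints the commands without running dminit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dminit command for one instance.
# Only DB_NAME and INSTANCE_NAME differ between the three nodes; the other
# options are copied from the commands above. Nothing is executed.
DM_HOME=/home/dmdba/dmdbms

dminit_cmd() {
  local inst="$1"
  printf '%s/bin/dminit path=%s/data/ PAGE_SIZE=32 EXTENT_SIZE=32 CASE_SENSITIVE=Y DB_NAME=%s INSTANCE_NAME=%s PORT_NUM=5236 LOG_SIZE=256 BUFFER=512 SYSDBA_PWD=Dameng123 SYSAUDITOR_PWD=Dameng123\n' \
    "$DM_HOME" "$DM_HOME" "$inst" "$inst"
}

# Print the command for each node; run each line on the node it names.
for inst in DM1 DM2 DM3; do
  echo "# on node $inst:"
  dminit_cmd "$inst"
done
```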
#DM1: register the instance as a service (as root); -p DMSERVER gives the service name DmServiceDMSERVER
su root
cd /home/dmdba/dmdbms/script/root
sh dm_service_installer.sh -t dmserver -dm_ini /home/dmdba/dmdbms/data/DM1/dm.ini -p DMSERVER
#DM2
su root
cd /home/dmdba/dmdbms/script/root
sh dm_service_installer.sh -t dmserver -dm_ini /home/dmdba/dmdbms/data/DM2/dm.ini -p DMSERVER
#DM3
su root
cd /home/dmdba/dmdbms/script/root
sh dm_service_installer.sh -t dmserver -dm_ini /home/dmdba/dmdbms/data/DM3/dm.ini -p DMSERVER
#as dmdba on every node: add these lines to ~/.bash_profile, then reload it with "source ~/.bash_profile"
vi ~/.bash_profile
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/dmdba/dmdbms/bin"
export DM_HOME="/home/dmdba/dmdbms"
export PATH=$PATH:$DM_HOME/bin
DmServiceDMSERVER start   #on DM1
DmServiceDMSERVER start   #on DM2
DmServiceDMSERVER start   #on DM3
Switch the database to archive mode:
[dmdba@localhost ~]$ disql sysdba
password:
Server[LOCALHOST:5236]:mode is normal, state is open
login used time : 5.758(ms)
disql V8
SQL> alter database mount ;
executed successfully
used time: 157.325(ms). Execute id is 0.
SQL> alter database archivelog;
executed successfully
used time: 34.430(ms). Execute id is 0.
SQL> alter database add archivelog 'DEST=/home/dmdba/dmdbms/arch,type=local,file_size=100,space_limit=1024';
executed successfully
used time: 2.368(ms). Execute id is 0.
SQL>
SQL> alter database open ;
executed successfully
used time: 25.258(ms). Execute id is 0.
SQL>
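Then edit dmarch.ini on each node. The local archive directory below replaces the /home/dmdba/dmdbms/arch destination added above. Each node also sends real-time archive to the other two instances.
vi /home/dmdba/dmdbms/data/DM1/dmarch.ini
#DaMeng Database Archive Configuration file
#0: high performance, 1: transactional consistency
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
#local archive
ARCH_TYPE = LOCAL
#local archive directory
ARCH_DEST = /home/dmdba/dmdbms/data/DM1/arch
#archive file size and total archive space limit, in MB
ARCH_FILE_SIZE = 100
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 2
ARCH_HANG_FLAG = 1
[ARCHIVE_REALTIME1]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM2
[ARCHIVE_REALTIME2]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM3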
vi /home/dmdba/dmdbms/data/DM2/dmarch.ini
#DaMeng Database Archive Configuration file
#0: high performance, 1: transactional consistency
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
#local archive
ARCH_TYPE = LOCAL
#local archive directory
ARCH_DEST = /home/dmdba/dmdbms/data/DM2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
ARCH_FLUSH_BUF_SIZE = 2
ARCH_HANG_FLAG = 1
[ARCHIVE_REALTIME1]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM1
[ARCHIVE_REALTIME2]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM3
vi /home/dmdba/dmdbms/data/DM3/dmarch.ini
#DaMeng Database Archive Configuration file
#0: high performance, 1: transactional consistency
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
#local archive
ARCH_TYPE = LOCAL
#local archive directory (DM3's own directory)
ARCH_DEST = /home/dmdba/dmdbms/data/DM3/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
ARCH_FLUSH_BUF_SIZE = 2
ARCH_HANG_FLAG = 1
[ARCHIVE_REALTIME1]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM1
[ARCHIVE_REALTIME2]
#real-time archive
ARCH_TYPE = REALTIME
#target instance of the real-time archive
ARCH_DEST = DM2
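Every node's dmarch.ini follows the same pattern: a local archive in its own directory, plus real-time archive to the other two instances. Generating the file per node avoids copy-paste slips, such as a local ARCH_DEST that points at another instance's directory. This is a minimal sketch. `gen_dmarch_ini` is a hypothetical name, and the sketch uses the DM2/DM3 sizes (1024 / 51200 MB) for every node, unlike DM1's 100 / 1024 above.

```shell
#!/usr/bin/env bash
# Hypothetical generator for dmarch.ini. Each node archives locally to its
# own data/<INSTANCE>/arch directory and sends real-time archive to every
# other instance of the group. Sizes are in MB.
DM_DATA=/home/dmdba/dmdbms/data
ALL_INSTS="DM1 DM2 DM3"

gen_dmarch_ini() {
  local self="$1" n=0 inst
  cat <<EOF
#DaMeng Database Archive Configuration file
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = $DM_DATA/$self/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
ARCH_FLUSH_BUF_SIZE = 2
ARCH_HANG_FLAG = 1
EOF
  for inst in $ALL_INSTS; do
    # Skip the local instance: a node does not ship real-time archive to itself.
    [ "$inst" = "$self" ] && continue
    n=$((n + 1))
    printf '[ARCHIVE_REALTIME%d]\nARCH_TYPE = REALTIME\nARCH_DEST = %s\n' "$n" "$inst"
  done
}

# Example, on DM3:
# gen_dmarch_ini DM3 > "$DM_DATA/DM3/dmarch.ini"
```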
Create a test table with one row on DM1:
SQL> create table t1(id int,name varchar(20));
executed successfully
used time: 21.249(ms). Execute id is 66601.
SQL> insert into t1 values(1,'a') ;
affect rows 1
used time: 4.601(ms). Execute id is 66602.
SQL> commit ;
executed successfully
used time: 1.122(ms). Execute id is 66603.
Edit dm.ini on all three nodes and set the following parameters:
vi /home/dmdba/dmdbms/data/DM1/dm.ini
vi /home/dmdba/dmdbms/data/DM2/dm.ini
vi /home/dmdba/dmdbms/data/DM3/dm.ini
MAL_INI = 1 #dmmal.ini
ARCH_INI = 1 #dmarch.ini
#data watch
ALTER_MODE_STATUS = 0 #Whether to permit database user to alter database mode
ENABLE_OFFLINE_TS = 2 #Whether tablespace can be offline
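The dm.ini edits can also be applied without an editor. Below is a minimal sketch of a hypothetical `set_ini_param` helper. It assumes GNU sed (the CentOS 7 default) and parameter names without regex metacharacters, and it drops any trailing comment on the line it replaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "KEY = VALUE" in an ini-style file such as dm.ini.
# An existing "KEY = ..." line (leading spaces allowed) is replaced, and its
# trailing comment dropped; if KEY is absent, the line is appended.
set_ini_param() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Example, run on each node against its own dm.ini:
# for kv in "MAL_INI 1" "ARCH_INI 1" "ALTER_MODE_STATUS 0" "ENABLE_OFFLINE_TS 2"; do
#   set_ini_param /home/dmdba/dmdbms/data/DM1/dm.ini $kv
# done
```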
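Create dmmal.ini with identical content on all three nodes:
vi /home/dmdba/dmdbms/data/DM1/dmmal.ini
vi /home/dmdba/dmdbms/data/DM2/dmmal.ini
vi /home/dmdba/dmdbms/data/DM3/dmmal.ini
MAL_CHECK_INTERVAL = 5 #MAL link check interval, in seconds
MAL_CONN_FAIL_INTERVAL = 5 #time after which a MAL link is considered broken, in seconds
MAL_TEMP_PATH = /home/dmdba/dmdbms/data/malpath/ #temporary file directory
MAL_BUF_SIZE = 256 #size of a single MAL buffer, in MB
MAL_SYS_BUF_SIZE = 2048 #total MAL memory limit, in MB
MAL_COMPRESS_LEVEL = 0 #MAL message compression level; 0 = no compression
[MAL_INST1]
MAL_INST_NAME = DM1 #instance name; must match INSTANCE_NAME in dm.ini
MAL_HOST = 192.168.20.31 #IP address on which the MAL system listens for TCP connections (internal network)
MAL_PORT = 5238 #port on which the MAL system listens for TCP connections
MAL_INST_HOST = 192.168.30.31 #IP address on which the instance serves clients
MAL_INST_PORT = 5236 #instance service port; must match PORT_NUM in dm.ini
MAL_DW_PORT = 5239 #port on which the instance's watcher listens for TCP connections
MAL_INST_DW_PORT = 5237 #port on which the instance listens for TCP connections from watchers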
[MAL_INST2]
MAL_INST_NAME = DM2
MAL_HOST = 192.168.20.32
MAL_PORT = 5238
MAL_INST_HOST = 192.168.30.32
MAL_INST_PORT = 5236
MAL_DW_PORT = 5239
MAL_INST_DW_PORT = 5237
[MAL_INST3]
MAL_INST_NAME = DM3
MAL_HOST = 192.168.20.33
MAL_PORT = 5238
MAL_INST_HOST = 192.168.30.33
MAL_INST_PORT = 5236
MAL_DW_PORT = 5239
MAL_INST_DW_PORT = 5237
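Because dmmal.ini must be identical on all three nodes, one option is to generate it once from a node table and copy it around. The sketch below uses a hypothetical `gen_dmmal_ini` function. The node list comes from the tables above: 192.168.20.x for MAL traffic and 192.168.30.x for client service.

```shell
#!/usr/bin/env bash
# Hypothetical generator for dmmal.ini. The file must be identical on all
# nodes, so it is built once from a single node table.
# Each line: instance name : MAL (internal) IP : client service IP
NODES="DM1:192.168.20.31:192.168.30.31
DM2:192.168.20.32:192.168.30.32
DM3:192.168.20.33:192.168.30.33"

gen_dmmal_ini() {
  local i=0 name mal_ip inst_ip
  cat <<'EOF'
MAL_CHECK_INTERVAL = 5
MAL_CONN_FAIL_INTERVAL = 5
MAL_TEMP_PATH = /home/dmdba/dmdbms/data/malpath/
MAL_BUF_SIZE = 256
MAL_SYS_BUF_SIZE = 2048
MAL_COMPRESS_LEVEL = 0
EOF
  # One [MAL_INSTn] section per node; the ports are the same on every node.
  printf '%s\n' "$NODES" | while IFS=: read -r name mal_ip inst_ip; do
    i=$((i + 1))
    cat <<EOF
[MAL_INST$i]
MAL_INST_NAME = $name
MAL_HOST = $mal_ip
MAL_PORT = 5238
MAL_INST_HOST = $inst_ip
MAL_INST_PORT = 5236
MAL_DW_PORT = 5239
MAL_INST_DW_PORT = 5237
EOF
  done
}

# Example: generate once, then copy into each node's data directory.
# gen_dmmal_ini > /home/dmdba/dmdbms/data/DM1/dmmal.ini
```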
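Create dmwatcher.ini on all three nodes; only INST_INI differs between nodes:
vi /home/dmdba/dmdbms/data/DM1/dmwatcher.ini
vi /home/dmdba/dmdbms/data/DM2/dmwatcher.ini
vi /home/dmdba/dmdbms/data/DM3/dmwatcher.ini
[GRP1]
DW_TYPE = GLOBAL #global watcher type
DW_MODE = AUTO #MANUAL: manual failover, AUTO: automatic failover
DW_ERROR_TIME = 10 #time after which a remote watcher is considered failed
INST_ERROR_TIME = 10 #time after which the local instance is considered failed
INST_RECOVER_TIME = 60 #interval at which the primary's watcher starts recovery
INST_OGUID = 45331 #OGUID unique to this Data Watch group
INST_INI = /home/dmdba/dmdbms/data/DM1/dm.ini #path to the local dm.ini; use /home/dmdba/dmdbms/data/DM2/dm.ini on DM2 and /home/dmdba/dmdbms/data/DM3/dm.ini on DM3
INST_AUTO_RESTART = 1 #enable automatic restart of the instance
INST_STARTUP_CMD = /home/dmdba/dmdbms/bin/dmserver #start the instance from the command line
RLOG_SEND_THRESHOLD = 0 #time threshold for the primary to send logs to a standby; 0 (default) = disabled
RLOG_APPLY_THRESHOLD = 0 #time threshold for a standby to replay logs; 0 (default) = disabled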
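Only INST_INI changes between the three dmwatcher.ini files. The sketch below uses a hypothetical `gen_dmwatcher_ini` function that fills in INST_INI from the instance name, with the other values copied from the file above.

```shell
#!/usr/bin/env bash
# Hypothetical generator for dmwatcher.ini. Every node uses the same [GRP1]
# settings except INST_INI, which must point at the local instance's dm.ini.
DM_HOME=/home/dmdba/dmdbms

gen_dmwatcher_ini() {
  local self="$1"
  cat <<EOF
[GRP1]
DW_TYPE = GLOBAL
DW_MODE = AUTO
DW_ERROR_TIME = 10
INST_ERROR_TIME = 10
INST_RECOVER_TIME = 60
INST_OGUID = 45331
INST_INI = $DM_HOME/data/$self/dm.ini
INST_AUTO_RESTART = 1
INST_STARTUP_CMD = $DM_HOME/bin/dmserver
RLOG_SEND_THRESHOLD = 0
RLOG_APPLY_THRESHOLD = 0
EOF
}

# Example, on DM2:
# gen_dmwatcher_ini DM2 > "$DM_HOME/data/DM2/dmwatcher.ini"
```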
On DM1, take a full backup, check it, and copy it to the two standbys:
cd /home/dmdba/dmdbms/bin
disql sysdba/Dameng123 -E "backup database backupset '/home/dmdba/dmdbms/data/DM1/bak/full_dmbackup';"
disql sysdba/Dameng123 -E 'select * from v$backupset;'
scp -r /home/dmdba/dmdbms/data/DM1/bak/full_dmbackup root@192.168.20.32:/home/dmdba/dmdbms/data/DM2/bak
scp -r /home/dmdba/dmdbms/data/DM1/bak/full_dmbackup root@192.168.20.33:/home/dmdba/dmdbms/data/DM3/bak
Stop the instances on DM2 and DM3 before restoring:
DmServiceDMSERVER stop   #on DM2
DmServiceDMSERVER stop   #on DM3
#DM2: start dmrman and enter the three statements at the RMAN> prompt
cd /home/dmdba/dmdbms/bin
./dmrman
restore database '/home/dmdba/dmdbms/data/DM2/dm.ini' from backupset '/home/dmdba/dmdbms/data/DM2/bak/full_dmbackup';
recover database '/home/dmdba/dmdbms/data/DM2/dm.ini' from backupset '/home/dmdba/dmdbms/data/DM2/bak/full_dmbackup';
recover database '/home/dmdba/dmdbms/data/DM2/dm.ini' update db_magic;
#DM3: same steps with DM3's paths
cd /home/dmdba/dmdbms/bin
./dmrman
restore database '/home/dmdba/dmdbms/data/DM3/dm.ini' from backupset '/home/dmdba/dmdbms/data/DM3/bak/full_dmbackup';
recover database '/home/dmdba/dmdbms/data/DM3/dm.ini' from backupset '/home/dmdba/dmdbms/data/DM3/bak/full_dmbackup';
recover database '/home/dmdba/dmdbms/data/DM3/dm.ini' update db_magic;
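The interactive dmrman session can also be written as one command line per statement. This sketch only prints the commands. It assumes dmrman's CTLSTMT= option, and `restore_cmds` is a hypothetical name. Review the printed lines before running them on the standby, with its instance stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the restore / recover / update db_magic steps for
# one standby as non-interactive dmrman command lines (assumes CTLSTMT=).
# Nothing is executed.
DM_HOME=/home/dmdba/dmdbms

restore_cmds() {
  local inst="$1" q="'" stmt
  local ini="$DM_HOME/data/$inst/dm.ini"
  local bak="$DM_HOME/data/$inst/bak/full_dmbackup"
  for stmt in \
    "RESTORE DATABASE ${q}${ini}${q} FROM BACKUPSET ${q}${bak}${q}" \
    "RECOVER DATABASE ${q}${ini}${q} FROM BACKUPSET ${q}${bak}${q}" \
    "RECOVER DATABASE ${q}${ini}${q} UPDATE DB_MAGIC"; do
    printf '%s/bin/dmrman CTLSTMT="%s"\n' "$DM_HOME" "$stmt"
  done
}

# Example: restore_cmds DM3
```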
#on DM1, as root: register the instance (started in mount mode) and the watcher as services
su - root
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DM1 -dm_ini /home/dmdba/dmdbms/data/DM1/dm.ini -m mount
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -p Watcher -watcher_ini /home/dmdba/dmdbms/data/DM1/dmwatcher.ini
#on DM2, as root
su - root
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DM2 -dm_ini /home/dmdba/dmdbms/data/DM2/dm.ini -m mount
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -p Watcher -watcher_ini /home/dmdba/dmdbms/data/DM2/dmwatcher.ini
#on DM3, as root
su - root
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DM3 -dm_ini /home/dmdba/dmdbms/data/DM3/dm.ini -m mount
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -p Watcher -watcher_ini /home/dmdba/dmdbms/data/DM3/dmwatcher.ini
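#on DM1: the instance started earlier by DmServiceDMSERVER is still open; stop it, then start it in mount mode and make it the primary
DmServiceDMSERVER stop
/home/dmdba/dmdbms/bin/DmServiceDM1 start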
/home/dmdba/dmdbms/bin/disql SYSDBA/Dameng123
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
SP_SET_OGUID(45331);
ALTER DATABASE PRIMARY;
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
/home/dmdba/dmdbms/bin/DmServiceDM2 start
/home/dmdba/dmdbms/bin/disql SYSDBA/Dameng123
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
SP_SET_OGUID(45331);
ALTER DATABASE STANDBY;
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
/home/dmdba/dmdbms/bin/DmServiceDM3 start
/home/dmdba/dmdbms/bin/disql SYSDBA/Dameng123
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
SP_SET_OGUID(45331);
ALTER DATABASE STANDBY;
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
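The per-node SQL differs only in the ALTER DATABASE target: DM1 becomes PRIMARY, while DM2 and DM3 become STANDBY. The sketch below uses a hypothetical `role_sql` helper that prints the statements for a given role and refuses anything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that turns a mount-mode instance into the
# primary or a standby of the group. The OGUID defaults to 45331.
role_sql() {
  local role="$1" oguid="${2:-45331}"
  case "$role" in
    PRIMARY|STANDBY) ;;
    *) echo "role must be PRIMARY or STANDBY" >&2; return 1 ;;
  esac
  cat <<EOF
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
SP_SET_OGUID($oguid);
ALTER DATABASE $role;
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
EOF
}
```

For example, run `role_sql PRIMARY > /tmp/dm1_role.sql` on DM1, then execute it with `start /tmp/dm1_role.sql` inside disql. The file path is only an example.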
Verify the configuration on each node:
select arch_mode from v$database;
select arch_name, arch_type, arch_dest from v$dm_arch_ini;
select oguid from v$instance;
select * from v$dm_mal_ini;
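On DM3, create the confirm-monitor configuration:
vi /home/dmdba/dmdbms/data/DM3/dmmonitor.ini
MON_DW_CONFIRM = 1 #confirm monitor (0 = non-confirm monitor)
MON_LOG_PATH = /home/dmdba/dmdbms/log #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log every 60 seconds
MON_LOG_FILE_SIZE = 512 #maximum size of one log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total space limit for the log files, in MB (0 = unlimited)
[GRP1]
MON_INST_OGUID = 45331 #OGUID unique to group GRP1
MON_DW_IP = 192.168.20.31:5239 #watcher of cluster node 1 (MAL_HOST:MAL_DW_PORT)
MON_DW_IP = 192.168.20.32:5239 #watcher of cluster node 2
MON_DW_IP = 192.168.20.33:5239 #watcher of cluster node 3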
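Also create a non-confirm monitor configuration for interactive use:
vi /home/dmdba/dmdbms/data/DM3/dmmonitor_manual.ini
MON_DW_CONFIRM = 0 #non-confirm monitor (1 = confirm monitor)
MON_LOG_PATH = /home/dmdba/dmdbms/log #directory for monitor log files
MON_LOG_INTERVAL = 60 #write system information to the log every 60 seconds
MON_LOG_FILE_SIZE = 512 #maximum size of one log file, in MB
MON_LOG_SPACE_LIMIT = 2048 #total space limit for the log files, in MB (0 = unlimited)
[GRP1]
MON_INST_OGUID = 45331 #OGUID unique to group GRP1
MON_DW_IP = 192.168.20.31:5239 #watcher of cluster node 1 (MAL_HOST:MAL_DW_PORT)
MON_DW_IP = 192.168.20.32:5239 #watcher of cluster node 2
MON_DW_IP = 192.168.20.33:5239 #watcher of cluster node 3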
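The two monitor files differ only in MON_DW_CONFIRM. The sketch below uses a hypothetical `gen_dmmonitor_ini` function. An automatic-failover group allows exactly one confirm monitor, so only one process should run with the MON_DW_CONFIRM = 1 file.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the monitor files on DM3: pass 1 for the confirm
# monitor (dmmonitor.ini) or 0 for the non-confirm one (dmmonitor_manual.ini).
DW_ADDRS="192.168.20.31:5239 192.168.20.32:5239 192.168.20.33:5239"

gen_dmmonitor_ini() {
  local confirm="$1" addr
  cat <<EOF
MON_DW_CONFIRM = $confirm
MON_LOG_PATH = /home/dmdba/dmdbms/log
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 512
MON_LOG_SPACE_LIMIT = 2048
[GRP1]
MON_INST_OGUID = 45331
EOF
  # One MON_DW_IP line per watcher (MAL_HOST:MAL_DW_PORT).
  for addr in $DW_ADDRS; do
    echo "MON_DW_IP = $addr"
  done
}

# gen_dmmonitor_ini 1 > /home/dmdba/dmdbms/data/DM3/dmmonitor.ini
# gen_dmmonitor_ini 0 > /home/dmdba/dmdbms/data/DM3/dmmonitor_manual.ini
```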
Start the watcher on all three nodes and check the listening ports:
/home/dmdba/dmdbms/bin/DmWatcherServiceWatcher start
netstat -anop|grep -E '5236|5237|5238|5239'
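A small filter can report which of the expected ports have a listener, so the netstat output does not have to be read by eye. The sketch below uses a hypothetical `check_listen_ports` function. It reads `ss -ltn` or `netstat -ltn` output on stdin (both print the local address in the 4th column) and returns non-zero if any port is missing. It does not report which process owns a port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` or `netstat -ltn` output on stdin and
# report whether each port given as an argument has a listener.
check_listen_ports() {
  local listening port missing=0
  # Keep only local addresses ending in ":PORT" and extract the port number.
  listening="$(awk '$4 ~ /:[0-9]+$/ { n = split($4, a, ":"); print a[n] }' | sort -u)"
  for port in "$@"; do
    if printf '%s\n' "$listening" | grep -qx "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: NOT listening"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a node after the watcher has started the instance:
# ss -ltn | check_listen_ports 5236 5237 5238 5239
```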
On DM3, register the confirm monitor as a service (as root), then start it as dmdba:
/home/dmdba/dmdbms/script/root/dm_service_installer.sh -t dmmonitor -p Monitor -monitor_ini /home/dmdba/dmdbms/data/DM3/dmmonitor.ini
su - dmdba
/home/dmdba/dmdbms/bin/DmMonitorServiceMonitor start
Use the non-confirm monitor to view the cluster status:
/home/dmdba/dmdbms/bin/dmmonitor /home/dmdba/dmdbms/data/DM3/dmmonitor_manual.ini
Insert a row on the primary DM1 and query it:
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 22.142(ms)
disql V8
SQL> insert into t1 values(2,'3') ;
affect rows 1
used time: 5.680(ms). Execute id is 802.
SQL> commit ;
executed successfully
used time: 2.741(ms). Execute id is 803.
SQL> select * from t1;
LINEID ID NAME
---------- ----------- ----
1 1 a
2 2 3
used time: 2.168(ms). Execute id is 804
Query the table on a standby:
[dmdba@localhost dmdbms]$ disql sysdba/Dameng123
Server[LOCALHOST:5236]:mode is standby, state is open
login used time : 14.245(ms)
disql V8
SQL> select * from t1;
LINEID ID NAME
---------- ----------- ----
1 1 a
2 2 3
used time: 12.207(ms). Execute id is 1.
SQL>
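Client connection configuration in /etc/dm_svc.conf on the application hosts:
##global section
#time zone offset in minutes (480 = UTC+08:00)
TIME_ZONE=(480)
LANGUAGE=(cn)
#service name DMHA and the client service addresses of the three instances
DMHA=(192.168.30.31:5236,192.168.30.32:5236,192.168.30.33:5236)
##service section
[DMHA]
#number of switch attempts between instances, and the interval between attempts in ms
SWITCH_TIMES=(3)
SWITCH_INTERVAL=(100)
#1: connect to the primary only
LOGIN_MODE=(1)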
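Every client host needs the same dm_svc.conf. The sketch below uses a hypothetical `gen_dm_svc_conf` function that builds the file from a service name and the instance addresses. Applications and disql then connect through the service name, for example `disql SYSDBA/Dameng123@DMHA`, and with LOGIN_MODE=(1) the client only logs in to the primary.

```shell
#!/usr/bin/env bash
# Hypothetical generator for dm_svc.conf.
# Arguments: service name, then one host:port entry per instance.
gen_dm_svc_conf() {
  local svc="$1"; shift
  local list
  # Join the host:port entries with commas.
  list="$(printf '%s,' "$@")"
  list="${list%,}"
  cat <<EOF
TIME_ZONE=(480)
LANGUAGE=(cn)
$svc=($list)
[$svc]
SWITCH_TIMES=(3)
SWITCH_INTERVAL=(100)
LOGIN_MODE=(1)
EOF
}

# Example (as root on a client host):
# gen_dm_svc_conf DMHA 192.168.30.31:5236 192.168.30.32:5236 192.168.30.33:5236 > /etc/dm_svc.conf
```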
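Standby log-replay parameters in dm.ini:
Parameter | Default | Type | Description |
---|---|---|---|
REDOS_PARALLEL_NUM | 1 | Static | Number of threads used to replay redo logs; valid values 1-64 |
REDOS_BUF_SIZE | 1024 | Static | Memory limit for redo logs waiting to be replayed. When the buffered logs exceed it, further logs are held back from the replay queue until memory is released. |
REDOS_BUF_NUM | 4096 | Static | Limit on the number of redo log buffers allowed to pile up while waiting for replay. |
REDOS_MAX_DELAY | 1800 | Static | Time limit for the standby's pending replay buffers; if it is exceeded, replay is considered abnormal. |
Tune these parameters for the actual server, preferably with an automatic optimization script.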
Stop the cluster: the monitor first, then the watcher on every node, then the database instances.
DmMonitorServiceMonitor stop   #on DM3
DmWatcherServiceWatcher stop   #on DM1, DM2 and DM3
DmServiceDMSERVER stop         #on DM1, DM2 and DM3
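Start the cluster: start the watcher on each node, then start the monitor on DM3. With INST_AUTO_RESTART = 1, each watcher starts its local instance in mount mode, as the ps output shows.
[dmdba@localhost tmp]$ DmWatcherServiceWatcher start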
Starting DmWatcherServiceWatcher: [ OK ]
[dmdba@localhost tmp]$ ps -ef|grep dmse
dmdba 2962 1 6 10:27 ? 00:00:00 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdbms/data/DM3/dm.ini mount
dmdba 3111 2380 0 10:27 pts/0 00:00:00 grep --color=auto dmse
[dmdba@localhost ~]$ DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher: [ OK ]
[dmdba@localhost ~]$ ps -ef|grep dms
dmdba 4015 1 10 10:28 ? 00:00:00 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdbms/data/DM1/dm.ini mount
dmdba 4146 3057 0 10:28 pts/0 00:00:00 grep --color=auto dms
[dmdba@localhost tmp]$ DmMonitorServiceMonitor start
Starting DmMonitorServiceMonitor: [ OK ]
[dmdba@localhost tmp]$
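The stop and start sequences can be written down as a dry-run plan. The sketch below uses a hypothetical `cluster_plan` function that only prints `node: command` lines. Stopping the standbys before the primary is an assumed order, not something the steps above require. The sketch uses the mount-mode services DmServiceDM1..DM3 registered earlier rather than DmServiceDMSERVER.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run plan for stopping or starting the cluster.
# Prints "node: command" lines only; nothing is executed.
PRIMARY=DM1
STANDBYS="DM2 DM3"
MONITOR_NODE=DM3

cluster_plan() {
  local n
  case "$1" in
    stop)
      # Monitor first, then watchers (so they do not restart instances),
      # then the instances, standbys before the primary (assumed order).
      echo "$MONITOR_NODE: DmMonitorServiceMonitor stop"
      for n in $STANDBYS $PRIMARY; do echo "$n: DmWatcherServiceWatcher stop"; done
      for n in $STANDBYS $PRIMARY; do echo "$n: DmService$n stop"; done
      ;;
    start)
      # Each watcher starts its local instance (INST_AUTO_RESTART = 1).
      for n in $PRIMARY $STANDBYS; do echo "$n: DmWatcherServiceWatcher start"; done
      echo "$MONITOR_NODE: DmMonitorServiceMonitor start"
      ;;
    *) echo "usage: cluster_plan stop|start" >&2; return 1 ;;
  esac
}
```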
Open the non-confirm monitor and show the global status:
dmmonitor /home/dmdba/dmdbms/data/DM3/dmmonitor_manual.ini
show global info
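After rebooting the primary server, check the cluster.
List the instances that can be switched over or can take over:
choose switchover grp1
choose takeover grp1
List the instances that can be switched over again:
choose switchover grp1
Log in to the monitor and switch DM1 to primary:
login
switchover grp1.DM1
show global info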
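Monitor output in the normal state
Monitor output after shutting down the standby DM2 server
Insert data on the primary:
insert into t1 values(3,'dm2');
commit;
Check the archive status:
SQL> select * from v$arch_status;
Monitor log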
Check the monitor after restarting the standby DM2 server
Query the data on the DM2 server:
select * from t1;
The data has been synchronized.
Check the cluster status:
The cluster has recovered automatically.
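Check the current state with the monitor
Shut down the primary server (as root):
init 0
Check the monitor log
Check the status of all servers:
DM1's WSTATUS has changed to ERROR.
DM2 has been promoted to primary.
Insert data on DM2:
insert into t1 values(111,'dm2insert1');
commit;
select * from t1;
Query the data on node DM3
Automatic failover is complete; start node DM1.
Monitor log
Check the status of all nodes
Query the data
The cluster has recovered.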
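Stop the confirm monitor:
[dmdba@localhost ~]$ DmMonitorServiceMonitor stop
Stopping DmMonitorServiceMonitor:
[ OK ]
[dmdba@localhost ~]$
Stop the watcher on the primary:
DmWatcherServiceWatcher stop
Stop the primary database instance:
[dmdba@localhost root]$ DmServiceDMSERVER stop
Check the cluster status from the monitor
List the instances that can take over:
choose takeover grp1
choose takeover force grp1
Use DM1 to take over the service
Check the cluster status
Start the DM2 database instance and its watcher:
DmServiceDM2 start
DmWatcherServiceWatcher start
Monitor log during startup
Check the cluster information:
show global info
The cluster has recovered.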