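A DMDSC cluster supports dynamic node extension; each extension adds one node to the existing cluster. Dynamic extension requires that every node of the current DMDSC cluster is in the OK state, and that every dmserver instance is in the OPEN state and can be accessed normally.
Note: while a node is being added, no operation should modify the database status or mode.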
[monitor] 2022-06-16 13:44:57: CSS MONITOR V8
[monitor] 2022-06-16 13:45:17: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 13:45:17: Wait CSS Control Node choosed...
[monitor] 2022-06-16 13:45:18: Wait CSS Control Node choosed succeed.
show
monitor current time:2022-06-16 13:45:21, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 CSS0 0 5336 Normal Node OPEN WORKING OK TRUE 13419125 13421771
2022-06-16 13:45:20 CSS1 1 5337 Control Node OPEN WORKING OK TRUE 13451753 13454404
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 ASM0 0 5436 Control Node OPEN WORKING OK TRUE 13432204 13434808
2022-06-16 13:45:20 ASM1 1 5437 Normal Node OPEN WORKING OK TRUE 13465137 13467745
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 14261853 14263514
2022-06-16 13:45:20 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 14300882 14302532
==================================================================================================================
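The OK/OPEN precondition can be checked mechanically against the `show` output rather than by eye. The following is a minimal sketch, assuming the endpoint rows always have the column layout shown in this article (`css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts`); the names `EP_ROW`, `parse_endpoints` and `not_ready` are hypothetical helpers, not part of any DM tool.

```python
import re

# One endpoint row of the `show` output, for example:
# 2022-06-16 13:45:20 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 14261853 14263514
EP_ROW = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<name>\S+)\s+(?P<seqno>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<mode>Control|Normal) Node\s+"
    r"(?P<inst_status>\S+)\s+(?P<vtd_status>\S+)\s+"
    r"(?P<is_ok>\S+)\s+(?P<active>\S+)\s+\d+\s+\d+\s*$"
)


def parse_endpoints(show_output):
    """Return one dict per endpoint row found in the monitor output."""
    rows = []
    for line in show_output.splitlines():
        match = EP_ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def not_ready(show_output):
    """Names of endpoints that are not OPEN, OK and active."""
    return [
        ep["name"]
        for ep in parse_endpoints(show_output)
        if (ep["inst_status"], ep["is_ok"], ep["active"]) != ("OPEN", "OK", "TRUE")
    ]


sample = """\
2022-06-16 13:45:20 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 14261853 14263514
2022-06-16 13:45:20 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 14300882 14302532
"""
print(not_ready(sample))
```

An empty result means every parsed endpoint is ready. Since an output with no endpoint rows also yields an empty list, a caller should additionally confirm that `parse_endpoints` found the expected number of rows.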
Elapsed time: 2.891 (ms). Execution ID: 313.
[dmdba@dmdsc01 ~]$ dmasmcmd
DMASMCMD V8
ASM>export dcrdisk '/dev/raw/raw1' to '/home/dmdba/dmdcr_cfg_bak3.ini'
ASMCMD export DCRDISK success.
Used time: 4.969(ms).
Server [LOCALHOST:5236]: in normal open state
Login time: 5.016 (ms)
disql V8
SQL> alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 2048,'+DMLOG/log/dsc2_log02.log' size 2048;
Statement executed
Elapsed time: 504.634 (ms). Execution ID: 300.
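The `alter database add node logfile` statement follows a regular pattern, so it can be generated instead of typed by hand. This is only a sketch: the function name `add_node_logfile_sql` is hypothetical, and the file naming (`dsc<seqno>_log01.log`, `dsc<seqno>_log02.log` under `+DMLOG/log`) is assumed from the statement shown here.

```python
def add_node_logfile_sql(node_seqno, size_mb=2048, count=2, log_dir="+DMLOG/log"):
    """Build the ADD NODE LOGFILE statement for the node being added."""
    parts = [
        f"'{log_dir}/dsc{node_seqno}_log{i:02d}.log' size {size_mb}"
        for i in range(1, count + 1)
    ]
    return "alter database add node logfile " + ",".join(parts) + ";"


print(add_node_logfile_sql(2))
```

For `node_seqno=2` the result is the same statement that was executed in disql above.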
-- Check whether the log files were added successfully
[dmdba@dmdsc02 ~]$ dmasmtool dcr_ini=/dm8/dsc/config/dmdcr.ini
DMASMTOOL V8
ASM>ls
+
disk groups total [4]......
NO.1 name: DMLOG
NO.2 name: DMDATA
NO.3 name: VOTE
NO.4 name: DCR
Used time: 4.446(ms).
ASM>ls -r +DMLOG
+DMLOG:
dir : log
+DMLOG/log:
file : dsc0_log01.log
file : dsc0_log02.log
file : dsc1_log01.log
file : dsc1_log02.log
file : dsc2_log01.log
file : dsc2_log02.log
Used time: 32.808(ms).
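The listing check can also be scripted. A minimal sketch, assuming the `ls -r` output uses `file : <name>` lines as shown above and that the new node's log files follow the `dsc<seqno>_logNN.log` naming used in this article; `listed_files` and `missing_logs` are hypothetical helper names.

```python
def listed_files(ls_output):
    """Collect the names from `file : <name>` lines of an `ls -r` listing."""
    names = []
    for line in ls_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "file":
            names.append(value.strip())
    return names


def missing_logs(ls_output, node_seqno, count=2):
    """Expected log files of the given node that are absent from the listing."""
    expected = [f"dsc{node_seqno}_log{i:02d}.log" for i in range(1, count + 1)]
    present = set(listed_files(ls_output))
    return [name for name in expected if name not in present]


listing = """\
+DMLOG:
dir : log
+DMLOG/log:
file : dsc0_log01.log
file : dsc0_log02.log
file : dsc1_log01.log
file : dsc1_log02.log
file : dsc2_log01.log
file : dsc2_log02.log
"""
print(missing_logs(listing, 2))
```

An empty list means both log files for the new node are present in the disk group.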
dmdba@192.168.10.102's password:
dminit20220607111325.log 100% 1157 1.1KB/s 00:00
sqllog.ini 100% 481 0.5KB/s 00:00
dmwatcher.ini 100% 892 0.9KB/s 00:00
dmmal.ini 100% 841 0.8KB/s 00:00
dmarch.ini 100% 395 0.4KB/s 00:00
dm.ini
-- Node dsc2
[dmdba@dmdw01 dsc2_config]$ vi dmarch.ini
ARCH_LOCAL_SHARE = 1
[ARCHIVE_LOCAL]
ARCH_TYPE = LOCAL
ARCH_DEST = +DMDATA/DSC2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
[ARCHIVE_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = +DMDATA/DSC0/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
[ARCHIVE_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = +DMDATA/DSC1/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
-- On nodes dsc0 and dsc1, add the archive path for DSC2
[dmdba@dmdsc02 dsc1_config]$ vi dmarch.ini
[ARCHIVE_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC2
ARCH_INCOMING_PATH = +DMDATA/DSC2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
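Each node archives locally and declares every other node as a remote archive, so the file for any node can be derived from the node list. The sketch below reproduces the dsc2 file shown above; `dmarch_ini` is a hypothetical helper, and the fixed values (`ARCH_FILE_SIZE`, `ARCH_SPACE_LIMIT`, the `+DMDATA/<node>/arch` layout) are simply copied from this article rather than recommended settings.

```python
def dmarch_ini(node, all_nodes, arch_root="+DMDATA"):
    """Render dmarch.ini for one node: one local archive plus one remote per peer."""
    lines = [
        "ARCH_LOCAL_SHARE = 1",
        "[ARCHIVE_LOCAL]",
        "ARCH_TYPE = LOCAL",
        f"ARCH_DEST = {arch_root}/{node}/arch",
        "ARCH_FILE_SIZE = 1024",
        "ARCH_SPACE_LIMIT = 51200",
    ]
    peers = [name for name in all_nodes if name != node]
    for index, peer in enumerate(peers, start=1):
        lines += [
            f"[ARCHIVE_REMOTE{index}]",
            "ARCH_TYPE = REMOTE",
            f"ARCH_DEST = {peer}",
            f"ARCH_INCOMING_PATH = {arch_root}/{peer}/arch",
            "ARCH_FILE_SIZE = 1024",
            "ARCH_SPACE_LIMIT = 51200",
        ]
    return "\n".join(lines) + "\n"


print(dmarch_ini("DSC2", ["DSC0", "DSC1", "DSC2"]))
```

Calling it for `DSC0` or `DSC1` with the same three-node list yields a file whose `[ARCHIVE_REMOTE2]` section points at `DSC2`, which is the section added on the existing nodes.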
-- The configuration is identical on nodes dsc0, dsc1, and dsc2
[dmdba@dmdsc02 dsc1_config]$ vi dmmal.ini
#DaMeng Database Mail Configuration file
#this is comments
MAL_CHECK_INTERVAL = 30
MAL_COMBIN_BUF_SIZE = 0
MAL_SEND_THRESHOLD = 2048
MAL_CONN_FAIL_INTERVAL = 10
MAL_LOGIN_TIMEOUT = 15
MAL_BUF_SIZE = 100
MAL_SYS_BUF_SIZE = 0
MAL_VPOOL_SIZE = 128
MAL_COMPRESS_LEVEL = 0
MAL_TEMP_PATH =
[MAL_INST0]
MAL_INST_NAME = DSC0
MAL_HOST = 192.168.10.100
MAL_PORT = 5736
MAL_INST_HOST = 192.168.2.100
MAL_INST_PORT = 5236
MAL_DW_PORT = 5836
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5936
[MAL_INST1]
MAL_INST_NAME = DSC1
MAL_HOST = 192.168.10.101
MAL_PORT = 5737
MAL_INST_HOST = 192.168.2.101
MAL_INST_PORT = 5236
MAL_DW_PORT = 5837
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5937
[MAL_INST2]
MAL_INST_NAME = DSC2
MAL_HOST = 192.168.10.102
MAL_PORT = 5738
MAL_INST_HOST = 192.168.2.102
MAL_INST_PORT = 5236
MAL_DW_PORT = 5838
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5938
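A copy-and-paste slip when adding the `[MAL_INST2]` block (a repeated instance name or a repeated host and port pair) is easy to miss. The following sketch scans a dmmal.ini text for such clashes; `parse_sections` and `mal_problems` are hypothetical helpers, and the parser is a deliberately simple one that only handles the `key = value`, `[section]` and `#` comment syntax seen in this file.

```python
def parse_sections(text):
    """Parse INI-like text into a list of (section_name, {key: value}) pairs.

    Keys before the first section header are stored under the name "".
    """
    sections = [("", {})]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def mal_problems(text):
    """Report duplicate instance names and duplicate MAL_HOST:MAL_PORT pairs."""
    problems = []
    names, endpoints = set(), set()
    for section, values in parse_sections(text):
        if not section.startswith("MAL_INST"):
            continue
        name = values.get("MAL_INST_NAME")
        endpoint = (values.get("MAL_HOST"), values.get("MAL_PORT"))
        if name in names:
            problems.append(f"duplicate instance name {name}")
        if endpoint in endpoints:
            problems.append(f"duplicate MAL endpoint {endpoint[0]}:{endpoint[1]}")
        names.add(name)
        endpoints.add(endpoint)
    return problems


sample = """\
#DaMeng Database Mail Configuration file
MAL_CHECK_INTERVAL = 30
MAL_TEMP_PATH =
[MAL_INST0]
MAL_INST_NAME = DSC0
MAL_HOST = 192.168.10.100
MAL_PORT = 5736
[MAL_INST1]
MAL_INST_NAME = DSC1
MAL_HOST = 192.168.10.101
MAL_PORT = 5737
[MAL_INST2]
MAL_INST_NAME = DSC2
MAL_HOST = 192.168.10.102
MAL_PORT = 5738
"""
print(mal_problems(sample))
```

Because the file must be identical on all three nodes, the same check can be run once and the file then copied, rather than edited separately on each host.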
-- dmdcr.ini on the new node (DSC2)
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /dm8/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 2
DMDCR_AUTO_OPEN_CHECK = 90
DMDCR_ASM_RESTART_INTERVAL = 30 # interval after which CSS treats ASM as failed and restarts it
DMDCR_ASM_STARTUP_CMD = /dm8/bin/dmasmsvr dcr_ini=/dm8/dsc/config/dmdcr.ini
DMDCR_DB_RESTART_INTERVAL = 60 # interval after which CSS treats DSC as failed and restarts it
DMDCR_DB_STARTUP_CMD = /dm8/bin/dmserver path=/dm8/dsc/config/dsc2_config/dm.ini dcr_ini=/dm8/dsc/config/dmdcr.ini
-- dmasvrmal.ini (the file referenced by DMDCR_MAL_PATH)
[MAL_INST1]
MAL_INST_NAME = ASM0
MAL_HOST = 192.168.10.100 # heartbeat address
MAL_PORT = 5636 # MAL listening port
[MAL_INST2]
MAL_INST_NAME = ASM1
MAL_HOST = 192.168.10.101
MAL_PORT = 5637
[MAL_INST3]
MAL_INST_NAME = ASM2
MAL_HOST = 192.168.10.102
MAL_PORT = 5638
-- Edit dmdcr_cfg_bak3.ini
[dmdba@dmdsc01 ~]$ vi dmdcr_cfg_bak3.ini
# the file is auto-created by system, self edit is invalid!
#DCR HDR
DCR_N_GRP = 3
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = ASM
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = DB
DCR_GRP_NAME = GRP_DSC
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP_CSS]
DCR_EP_NAME = CSS0
DCR_EP_HOST = 192.168.10.100
DCR_EP_PORT = 5336
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 192.168.10.101
DCR_EP_PORT = 5337
[GRP_CSS]
DCR_EP_NAME = CSS2
DCR_EP_HOST = 192.168.10.102
DCR_EP_PORT = 5338
[GRP_ASM]
DCR_EP_NAME = ASM0
DCR_EP_SHM_KEY = 93360
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.100
DCR_EP_PORT = 5436
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 93361
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.101
DCR_EP_PORT = 5437
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_ASM]
DCR_EP_NAME = ASM2
DCR_EP_SHM_KEY = 93362
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.102
DCR_EP_PORT = 5438
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_DSC]
DCR_EP_NAME = DSC0
DCR_EP_SEQNO = 0
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5536
[GRP_DSC]
DCR_EP_NAME = DSC1
DCR_EP_SEQNO = 1
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5537
[GRP_DSC]
DCR_EP_NAME = DSC2
DCR_EP_SEQNO = 2
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5538
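Before feeding the edited file to `extend dcrdisk`, it may be worth confirming that each group's `DCR_GRP_N_EP` and `DCR_GRP_EP_ARR` agree with the number of endpoint sections actually present. The sketch below does that; `parse_sections` and `group_problems` are hypothetical helpers, and the rule that `DCR_GRP_EP_ARR` lists exactly `0..N-1` is an assumption drawn from this example, not a documented requirement.

```python
def parse_sections(text):
    """Parse INI-like text into (section_name, {key: value}) pairs.

    Repeated section names such as [GRP] are kept as separate entries.
    """
    sections = [("", {})]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def group_problems(text):
    """Compare each [GRP] declaration with the endpoint sections of that group."""
    sections = parse_sections(text)
    problems = []
    for name, values in sections:
        if name != "GRP":
            continue
        group = values["DCR_GRP_NAME"]
        declared = int(values["DCR_GRP_N_EP"])
        raw_arr = values["DCR_GRP_EP_ARR"].strip("{}")
        ep_arr = [int(item) for item in raw_arr.split(",") if item.strip()]
        members = sum(1 for section, _ in sections if section == group)
        if declared != members:
            problems.append(f"{group}: DCR_GRP_N_EP={declared} but {members} EP sections")
        if ep_arr != list(range(declared)):
            problems.append(f"{group}: DCR_GRP_EP_ARR={ep_arr} does not list 0..{declared - 1}")
    return problems


sample = """\
DCR_N_GRP = 1
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
[GRP_CSS]
DCR_EP_NAME = CSS0
[GRP_CSS]
DCR_EP_NAME = CSS1
[GRP_CSS]
DCR_EP_NAME = CSS2
"""
print(group_problems(sample))
```

A typical mistake this catches is adding the new `[GRP_CSS]`, `[GRP_ASM]` or `[GRP_DSC]` endpoint section while leaving the group's count at its old value.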
[dmdba@dmdsc01 ~]$ dmasmcmd
DMASMCMD V8
ASM>extend dcrdisk '/dev/raw/raw1' from '/home/dmdba/dmdcr_cfg_bak3.ini'
ASMCMD extend node for dcr disk success.
ASMCMD extend node for vote disk success.
Used time: 102.029(ms).
[dmdba@dmdsc01 bin]$ dmcssm dmcssm.ini
[monitor] 2022-06-16 13:44:57: CSS MONITOR V8
[monitor] 2022-06-16 13:45:17: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 13:45:17: Wait CSS Control Node choosed...
[monitor] 2022-06-16 13:45:18: Wait CSS Control Node choosed succeed.
show
monitor current time:2022-06-16 13:45:21, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 CSS0 0 5336 Normal Node OPEN WORKING OK TRUE 13419125 13421771
2022-06-16 13:45:20 CSS1 1 5337 Control Node OPEN WORKING OK TRUE 13451753 13454404
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 ASM0 0 5436 Control Node OPEN WORKING OK TRUE 13432204 13434808
2022-06-16 13:45:20 ASM1 1 5437 Normal Node OPEN WORKING OK TRUE 13465137 13467745
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 13:45:20 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 14261853 14263514
2022-06-16 13:45:20 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 14300882 14302532
==================================================================================================================
extend node
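[monitor] 2022-06-16 13:45:28: Executing the extend-node action
[monitor] 2022-06-16 13:45:31: Notifying the currently active CSS instances to perform cleanup
[monitor] 2022-06-16 13:45:31: Cleanup request for CSS(0) succeeded
[monitor] 2022-06-16 13:45:32: Cleanup request for CSS(1) succeeded
[monitor] 2022-06-16 13:45:32: Command EXTENT NODE executed successfully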
[dmdba@dmdw01 ~]$ cd /dm8/bin
[dmdba@dmdw01 bin]$ ./dmcss dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-06-16 17:16:42:869] [CSS]: Set EP CSS0[0] as the control node
[dmdba@dmdsc01 ~]$ cd /dm8/bin
[dmdba@dmdsc01 bin]$ vi dmcssm.ini
CSSM_OGUID = 45331
CSSM_CSS_IP = 192.168.10.100:5336
CSSM_CSS_IP = 192.168.10.101:5337
CSSM_CSS_IP = 192.168.10.102:5338
CSSM_LOG_PATH = ../log
CSSM_LOG_FILE_SIZE = 512
CSSM_LOG_SPACE_LIMIT = 2048
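dmcssm.ini repeats the `CSSM_CSS_IP` key once per CSS, so a generic INI parser that expects unique keys is a poor fit for checking it. A small hand-written reader is enough to confirm that the monitor now lists all three CSS endpoints; `css_endpoints` is a hypothetical helper name.

```python
def css_endpoints(text):
    """Return every CSSM_CSS_IP value as a (host, port) tuple, in file order."""
    endpoints = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CSSM_CSS_IP":
            host, _, port = value.strip().rpartition(":")
            endpoints.append((host, int(port)))
    return endpoints


sample = """\
CSSM_OGUID = 45331
CSSM_CSS_IP = 192.168.10.100:5336
CSSM_CSS_IP = 192.168.10.101:5337
CSSM_CSS_IP = 192.168.10.102:5338
CSSM_LOG_PATH = ../log
"""
print(css_endpoints(sample))
```

If the new node's entry is missing, the monitor can still connect to the existing CSS instances, so the omission may go unnoticed until one of them is down.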
[dmdba@dmdsc01 bin]$ dmcssm dmcssm.ini
[monitor] 2022-06-16 17:13:14: CSS MONITOR V8
[monitor] 2022-06-16 17:13:34: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 17:13:34: Wait CSS Control Node choosed...
[monitor] 2022-06-16 17:13:35: Wait CSS Control Node choosed succeed.
[CSS1] [2022-06-16 17:13:38:290] [CSS]: Restarting the local ASM instance, command: [/dm8/bin/dmasmsvr dcr_ini=/dm8/dsc/config/dmdcr.ini]
show
show
monitor current time:2022-06-16 17:20:11, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 17:20:10 CSS0 0 5336 Control Node OPEN WORKING OK TRUE 18144846 18145294
2022-06-16 17:20:10 CSS1 1 5337 Normal Node OPEN WORKING OK TRUE 18185244 18185672
2022-06-16 17:20:10 CSS2 2 5338 Normal Node OPEN WORKING OK TRUE 116040 116253
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 17:20:10 ASM0 0 5436 Control Node OPEN WORKING OK TRUE 18158229 18158634
2022-06-16 17:20:10 ASM1 1 5437 Normal Node OPEN WORKING OK TRUE 18198309 18198695
2022-06-16 17:20:10 ASM2 2 5438 Normal Node OPEN WORKING OK TRUE 129433 129602
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-06-16 17:20:10 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 18916013 18916316
2022-06-16 17:20:10 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 18951417 18951721
2022-06-16 17:20:10 DSC2 2 5236 Normal Node OPEN WORKING OK TRUE 144464 144602
==================================================================================================================
[root@dmdw01 root]# ./dm_service_installer.sh -t dmcss -dcr_ini /dm8/dsc/config/dmdcr.ini -p CSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmCSSServiceCSS.service to /usr/lib/systemd/system/DmCSSServiceCSS.service.
Service (DmCSSServiceCSS) created
-- Create the ASM service
[root@dmdw01 root]# ./dm_service_installer.sh -t dmasmsvr -dcr_ini /dm8/dsc/config/dmdcr.ini -y DmCSSServiceCSS.service -p ASM
Created symlink from /etc/systemd/system/multi-user.target.wants/DmASMSvrServiceASM.service to /usr/lib/systemd/system/DmASMSvrServiceASM.service.
Service (DmASMSvrServiceASM) created
-- Create the dmserver service
[root@dmdw01 root]# ./dm_service_installer.sh -t dmserver -dm_ini /dm8/dsc/config/dsc2_config/dm.ini -dcr_ini /dm8/dsc/config/dmdcr.ini -y DmASMSvrServiceASM.service -p DSC
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC.service to /usr/lib/systemd/system/DmServiceDSC.service.
Service (DmServiceDSC) created
[root@dmdw01 root]#
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 os_sema2_create_low, exist other server is running, sema_value:2, after dec:1, errno:10!
2022-06-16 15:10:18.660 [INFO] database P0000003969 T0000000000000003969 Create semaphore for path[+DMDATA/data/dsc//dev/raw/raw1] failed, it is being startup by other process!
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 instance DSC2 is running.
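When the instance log is long, pulling out only the `[FATAL]` entries makes errors like the ones above easier to spot. This is a minimal sketch, assuming every log line has the `timestamp [LEVEL] source Ppid Ttid message` shape seen in these three lines; `LOG_LINE` and `fatal_messages` are hypothetical names.

```python
import re

# Matches lines such as:
# 2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 instance DSC2 is running.
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<level>[A-Z]+)\]\s+(?P<source>\S+)\s+"
    r"(?P<pid>P\d+)\s+(?P<tid>T\d+)\s+(?P<message>.*)$"
)


def fatal_messages(log_text):
    """Return the message part of every FATAL line, in log order."""
    messages = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match["level"] == "FATAL":
            messages.append(match["message"])
    return messages


sample = """\
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 os_sema2_create_low, exist other server is running, sema_value:2, after dec:1, errno:10!
2022-06-16 15:10:18.660 [INFO] database P0000003969 T0000000000000003969 Create semaphore for path[+DMDATA/data/dsc//dev/raw/raw1] failed, it is being startup by other process!
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 instance DSC2 is running.
"""
for message in fatal_messages(sample):
    print(message)
```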
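Troubleshooting approach:
-- Work through the following checklist first
1. Before extending, the user must make sure that all dmcss/dmasmsvr/dmserver nodes are OK and active;
2. Only one node can be added per extension; once it completes, further nodes can be added;
3. No operation that modifies instance status or mode may occur during the extension;
4. If a dmcss/dmasmsvr/dmserver instance fails during the extension, the extension fails;
5. Operator mistakes during the extension (for example, not updating dmmal.ini or dmasvrmal.ini, or not adding the log files) cause the extension to fail;
6. After running the extend node command, check the log files to confirm whether the extension succeeded;
7. A failed extension may leave the cluster in an abnormal state; in that case, exit all dmcss/dmasmsvr/dmserver processes and re-initialize the DCR disk.
Solution: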
-- Stop the DMSERVER, ASM, and CSS services on all nodes
[dmdba@dmdsc01 bin]$ DmServiceDSC stop && DmASMSvrServiceASM stop && DmCSSServiceCSS stop
Stopping DmServiceDSC: [ OK ]
Stopping DmASMSvrServiceASM: [ OK ]
Stopping DmCSSServiceCSS: [ OK ]
[dmdba@dmdsc01 bin]$
-- Clear the err_ep_arr information
[dmdba@dmdsc01 bin]$ dmasmcmd
DMASMCMD V8
ASM>clear dcrdisk err_ep_arr '/dev/raw/raw1' 'GRP_DSC'
Used time: 00:00:14.530.
ASM>
-- Restart the CSS service on all nodes
[dmdba@dmdsc01 bin]$ DmCSSServiceCSS start
Starting DmCSSServiceCSS: [ OK ]