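This document walks through the practical procedure for dynamically adding a node to a DMASM-based DSC cluster. Note that DSC clusters built on DMASM mirroring do not currently support dynamic node extension.
A DSC cluster supports dynamic extension, but each extension can add only one node to the existing cluster. Before extending, make sure that every node in the cluster is in the OK state and that every database instance is in the OPEN state and can be accessed normally.
In real projects, most DSC deployments start as a two-node cluster. Later, as the number of users or the business volume grows, a third node has to be added to expand the cluster while keeping the impact on the existing business small. This is where DSC dynamic node extension is used.
Note: during the later stages of the extension, adding the third node causes the databases on the existing nodes to be suspended, and they cannot serve requests. This phase is short; the actual recovery time depends on the environment and the data volume.
The following describes how to extend a two-node DSC cluster to three nodes, covering the overall flow and the individual steps. Setting up the initial environment is omitted.
The existing DMDSC cluster has the instances DSC0 and DSC1; one more node, DSC2, is added on top of it.
On an existing cluster machine (dsc01 in this example), use the DMASMCMD tool to export a backup named dmdcr_cfg_bak.ini with the following command:
export dcrdisk '/dev/asmdisk/dmdcr01' to '/home/dmdba/dmdcr_cfg_bak.ini'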
[dmdba@dsc01:~]$ dmasmcmd
ASM>
ASM>export dcrdisk '/dev/asmdisk/dmdcr01' to '/home/dmdba/dmdcr_cfg_bak.ini'
ASMCMD export DCRDISK success.
Used time: 32.348(ms).
ASM>exit
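1) Log in to any node with DIsql and add the log files:
At least two log files are required. The paths must be in ASM file format, and the size can follow that of the two active nodes.
alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 256, '+DMLOG/log/dsc2_log02.log' size 256;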
SQL>alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 256, '+DMLOG/log/dsc2_log02.log' size 256;
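The statement above follows a regular pattern, so it can be generated rather than typed by hand. The following is a minimal sketch, assuming the dscN_logNN.log naming used in this article; the function name add_node_logfile_sql and its parameters are hypothetical helpers, not part of any DM tool.

```python
def add_node_logfile_sql(node_seqno, size_mb=256, count=2, log_dir="+DMLOG/log"):
    """Build the ALTER DATABASE statement that adds log files for one new node."""
    # The article requires at least two log files per node.
    if count < 2:
        raise ValueError("a new node needs at least two log files")
    # The article requires ASM paths, which start with '+'.
    if not log_dir.startswith("+"):
        raise ValueError("log paths must be ASM paths (starting with '+')")
    files = [
        f"'{log_dir}/dsc{node_seqno}_log{i:02d}.log' size {size_mb}"
        for i in range(1, count + 1)
    ]
    return "alter database add node logfile " + ", ".join(files) + ";"


# Statement for the third node (sequence number 2).
print(add_node_logfile_sql(2))
```

The generated text can then be pasted into a DIsql session; the size should still be chosen to match the existing nodes.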
2) Convert the control file
Every DM database has a control file named dm.ctl. It is a binary file that records the essential initial information of the database.
Use the dmctlcvt tool to convert dm.ctl into the text file dmctl.txt. The message convert ctl to txt success! shown below means the operation succeeded.
dmctlcvt TYPE=1 SRC='+DMDATA/data/DSC/dm.ctl' DEST=/home/dmdba/dmctl.txt DCR_INI=/dmdata/DSC/dmdcr.ini
[dmdba@dsc01:/dmdata/DSC]$ dmctlcvt TYPE=1 SRC='+DMDATA/data/DSC/dm.ctl' DEST=/home/dmdba/dmctl.txt DCR_INI=/dmdata/DSC/dmdcr.ini
DMCTLCVT V8
convert ctl to txt success!
[dmdba@dsc01:/dmdata/DSC]$
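At this point you can open dmctl.txt in vi and confirm that the log file information of the new node has been added to dm.ctl.
3) Log in to the ASM file system with the dmasmtool tool; the new node's log files are visible there as well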
dmasmtool DCR_INI=/dmdata/DSC/dmdcr.ini
[dmdba@dsc01:/dmdata/DSC]$ dmasmtool DCR_INI=/dmdata/DSC/dmdcr.ini
DMASMTOOL V8
ASM>ls +DMLOG/log
file : dsc0_log01.log
file : dsc0_log02.log
file : dsc1_log01.log
file : dsc1_log02.log
file : dsc2_log01.log
file : dsc2_log02.log
total count 6.
Used time: 6.879(ms).
ASM>
Copy the /dmdata/DSC/dsc0_config directory from machine 192.168.25.95 to the same directory on machine 192.168.25.93, and rename it to /dmdata/DSC/dsc2_config.
[dmdba@dm93 DSC]$ ls -l
total 0
drwxrwxr-x 4 dmdba dinstall 132 Dec 28 11:18 dsc0_config
[dmdba@dm93 DSC]$ mv dsc0_config/ dsc2_config/
[dmdba@dm93 DSC]$
Modify the configuration files under the dsc2_config folder:
CONFIG_PATH = /dmdata/DSC/dsc2_config
instance_name = DSC2
# Archive configuration on DSC node 1
[dmdba@dsc01:~]$ cat /dmdata/DSC/dsc0_config/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
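ARCH_LOCAL_SHARE = 1 # whether the local archive is shared with remote nodes: 0 = not shared, 1 = shared; default 0
[ARCHIVE_LOCAL1] # local archive configuration
ARCH_TYPE = LOCAL # archive type
ARCH_DEST = +DMARCH/log/dsc0/arch # archive path
ARCH_FILE_SIZE = 1024 # size of a single archive file in MB; default 1024 MB
ARCH_SPACE_LIMIT = 1024 # archive space limit; when exceeded, the system automatically deletes the oldest local files
ARCH_FLUSH_BUF_SIZE = 0 # archive merge-flush buffer size in MB; default 0
ARCH_HANG_FLAG = 1 # whether the system hangs when a local archive write fails: 0 = no, 1 = yes; default 1
[ARCH_REMOTE1] # remote archive configuration
ARCH_TYPE = REMOTE # archive type
ARCH_DEST = DSC1 # instance name of the remote archive target
ARCH_INCOMING_PATH = +DMARCH/log/dsc1/arch # actual path on this node where the remote archive is stored
ARCH_FILE_SIZE = 1024 # size of a single archive file in MB; default 1024 MB
ARCH_SPACE_LIMIT = 1024 # archive space limit; when exceeded, the system automatically deletes the oldest local files
ARCH_FLUSH_BUF_SIZE = 0 # archive merge-flush buffer size in MB; default 0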
[ARCH_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC2
ARCH_INCOMING_PATH = +DMARCH/log/dsc2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[dmdba@dsc01:~]$
# Archive configuration on DSC node 2
[dmdba@dsc02:/dmdata/DSC/dsc1_config]$ cat dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
ARCH_LOCAL_SHARE=1
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = +DMARCH/log/dsc1/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
ARCH_HANG_FLAG = 1
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = +DMARCH/log/dsc0/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[ARCH_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC2
ARCH_INCOMING_PATH = +DMARCH/log/dsc2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[dmdba@dsc02:/dmdata/DSC/dsc1_config]$
# Archive configuration on DSC node 3
[dmdba@dm93 dsc2_config]$ cat dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
ARCH_LOCAL_SHARE = 1
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = +DMARCH/log/dsc2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
ARCH_HANG_FLAG = 1
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = +DMARCH/log/dsc0/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[ARCH_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = +DMARCH/log/dsc1/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[dmdba@dm93 dsc2_config]$
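The three dmarch.ini files above share one pattern: each node archives locally to its own directory and lists every other node as a remote archive. The following minimal sketch renders that pattern for any node; it assumes the dscN directory naming and the fixed sizes used in this article, and render_dmarch_ini is a hypothetical helper name.

```python
def render_dmarch_ini(local_seqno, n_nodes, arch_root="+DMARCH/log"):
    """Render a dmarch.ini for node `local_seqno` in an `n_nodes` cluster."""
    lines = [
        "ARCH_LOCAL_SHARE = 1",
        "[ARCHIVE_LOCAL1]",
        "ARCH_TYPE = LOCAL",
        f"ARCH_DEST = {arch_root}/dsc{local_seqno}/arch",
        "ARCH_FILE_SIZE = 1024",
        "ARCH_SPACE_LIMIT = 1024",
        "ARCH_FLUSH_BUF_SIZE = 0",
        "ARCH_HANG_FLAG = 1",
    ]
    # One remote section per other node, numbered from 1.
    remotes = [i for i in range(n_nodes) if i != local_seqno]
    for idx, remote in enumerate(remotes, start=1):
        lines += [
            f"[ARCH_REMOTE{idx}]",
            "ARCH_TYPE = REMOTE",
            f"ARCH_DEST = DSC{remote}",
            f"ARCH_INCOMING_PATH = {arch_root}/dsc{remote}/arch",
            "ARCH_FILE_SIZE = 1024",
            "ARCH_SPACE_LIMIT = 1024",
            "ARCH_FLUSH_BUF_SIZE = 0",
        ]
    return "\n".join(lines) + "\n"


# Archive configuration for the new node DSC2 in a three-node cluster.
print(render_dmarch_ini(2, 3))
```

Rendering the file for seqno 0, 1, and 2 with n_nodes=3 gives the same section layout as the three listings above, apart from the comments.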
On the new node DSC2, in the /dmdata/DSC directory, edit dmdcr.ini: set DMDCR_SEQNO to 2, and change the dm.ini path and the dmdcr.ini path to those of the current node.
[dmdba@dm93 DSC]$ cat dmdcr.ini
DMDCR_PATH = /dev/asmdisk/dmdcr01
DMDCR_MAL_PATH = /dmdata/DSC/dmasvrmal.ini
DMDCR_SEQNO = 2
DMDCR_ASM_RESTART_INTERVAL = 0
DMDCR_ASM_STARTUP_CMD = /dm/dmdbms/bin/dmasmsvr dcr_ini=/dmdata/DSC/dmdcr.ini
DMDCR_DB_RESTART_INTERVAL = 0
DMDCR_DB_STARTUP_CMD = /dm/dmdbms/bin/dmserver path=/dmdata/DSC/dsc2_config/dm.ini dcr_ini=/dmdata/DSC/dmdcr.ini
[dmdba@dm93 DSC]$
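Because this file is usually copied from another node, stale values are easy to miss. The sketch below is one possible pre-flight check, assuming the dscN_config directory naming from this article; parse_flat_ini and check_dmdcr_ini are hypothetical helper names, and the check only covers the two settings discussed here.

```python
def parse_flat_ini(text):
    """Parse KEY = VALUE lines; values may themselves contain '='."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip().upper()] = value.strip()
    return params


def check_dmdcr_ini(text, expected_seqno):
    """Return a list of problems found in a node's dmdcr.ini."""
    params = parse_flat_ini(text)
    problems = []
    if params.get("DMDCR_SEQNO") != str(expected_seqno):
        problems.append(
            f"DMDCR_SEQNO is {params.get('DMDCR_SEQNO')}, expected {expected_seqno}"
        )
    # Assumption: the instance config lives in dsc<seqno>_config, as in this article.
    expected_dir = f"dsc{expected_seqno}_config"
    if expected_dir not in params.get("DMDCR_DB_STARTUP_CMD", ""):
        problems.append(f"DMDCR_DB_STARTUP_CMD does not reference {expected_dir}")
    return problems


copied = """
DMDCR_SEQNO = 0
DMDCR_DB_STARTUP_CMD = /dm/dmdbms/bin/dmserver path=/dmdata/DSC/dsc0_config/dm.ini dcr_ini=/dmdata/DSC/dmdcr.ini
"""
for problem in check_dmdcr_ini(copied, expected_seqno=2):
    print(problem)
```

An empty result means both settings match the expected node; any reported problem should be fixed before the extend node command is run.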
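Edit the dmasvrmal.ini file of the current environment directly and add the new node's information. Every node that uses DMASM must be configured, and the content must be exactly the same on all of them. Then copy the updated dmasvrmal.ini to the /dmdata/DSC directory on node 192.168.25.93.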
[root@dsc02:/dmdata/DSC]# cat dmasvrmal.ini
[MAL_INST1]
MAL_INST_NAME = ASM0
MAL_HOST = 192.168.25.95
MAL_PORT = 8888
[MAL_INST2]
MAL_INST_NAME = ASM1
MAL_HOST = 192.168.25.96
MAL_PORT = 8888
[MAL_INST3]
MAL_INST_NAME = ASM2
MAL_HOST = 192.168.25.93
MAL_PORT = 8888
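Edit the dmmal.ini of all three dmserver instances directly and add the new node's information. All nodes must be configured with the same content; save the file in each node's own dsc_config directory.
[root@dsc02:/dmdata/DSC/dsc1_config]# cat dmmal.ini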
[mal_inst0]
mal_inst_name = DSC0
mal_host = 192.168.25.95
mal_port = 9340
[mal_inst1]
mal_inst_name = DSC1
mal_host = 192.168.25.96
mal_port = 9340
[mal_inst2]
mal_inst_name = DSC2
mal_host = 192.168.25.93
mal_port = 9340
[root@dsc02:/dmdata/DSC/dsc1_config]#
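Since both MAL files must be identical on every node, it can help to compare the copies after distributing them. The following is a minimal sketch that works on file contents already collected from each node (how they are collected is left out); digest and nodes_out_of_sync are hypothetical helper names.

```python
import hashlib


def digest(text):
    """Hash the file content, ignoring blank lines and surrounding whitespace."""
    normalized = "\n".join(
        line.strip() for line in text.splitlines() if line.strip()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def nodes_out_of_sync(copies):
    """copies maps node name -> file content; return nodes that differ from the first."""
    items = list(copies.items())
    if not items:
        return []
    reference = digest(items[0][1])
    return [node for node, text in items[1:] if digest(text) != reference]


two_nodes = "[mal_inst0]\nmal_inst_name = DSC0\n[mal_inst1]\nmal_inst_name = DSC1\n"
three_nodes = two_nodes + "[mal_inst2]\nmal_inst_name = DSC2\n"
# dsc02 still holds the old two-node file in this made-up example.
print(nodes_out_of_sync({"dsc01": three_nodes, "dsc02": two_nodes, "dm93": three_nodes}))
```

The comparison is deliberately tolerant of whitespace differences only; any difference in section names, hosts, or ports is reported.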
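Back up the exported dmdcr_cfg_bak.ini file, then edit it. The CSS, ASMSVR, and DB group information all need to be adjusted.
Original parameters:
DCR_GRP_N_EP = 2
DCR_GRP_EP_ARR = {0,1}
Adjusted parameters:
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
In addition, add one endpoint entry to each group. Make sure that DCR_EP_SHM_KEY values and port numbers do not conflict. Each new entry must be placed after the existing entries of its own group: in [GRP_CSS], CSS2 goes after CSS1; in [GRP_ASM], ASM2 goes after ASM1; and DSC2 goes after DSC1. For reference: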
[dmdba@dsc01:~]$ cat dmdcr_cfg_bak.ini
# the file is auto-created by system, self edit is invalid!
#DCR HDR
DCR_N_GRP = 3
DCR_VTD_PATH = /dev/asmdisk/dmvote01
DCR_OGUID = 417046
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = ASM
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = DB
DCR_GRP_NAME = GRP_DSC
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP_CSS]
DCR_EP_NAME = CSS0
DCR_EP_HOST = 192.168.25.95
DCR_EP_PORT = 12345
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 192.168.25.96
DCR_EP_PORT = 12345
[GRP_CSS]
DCR_EP_NAME = CSS2
DCR_EP_HOST = 192.168.25.93
DCR_EP_PORT = 12345
[GRP_ASM]
DCR_EP_NAME = ASM0
DCR_EP_SHM_KEY = 93360
DCR_EP_SHM_SIZE = 20
DCR_EP_HOST = 192.168.25.95
DCR_EP_PORT = 12346
DCR_EP_ASM_LOAD_PATH = /dev/asmdisk
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 93361
DCR_EP_SHM_SIZE = 20
DCR_EP_HOST = 192.168.25.96
DCR_EP_PORT = 12346
DCR_EP_ASM_LOAD_PATH = /dev/asmdisk
[GRP_ASM]
DCR_EP_NAME = ASM2
DCR_EP_SHM_KEY = 93362
DCR_EP_SHM_SIZE = 20
DCR_EP_HOST = 192.168.25.93
DCR_EP_PORT = 12346
DCR_EP_ASM_LOAD_PATH = /dev/asmdisk
[GRP_DSC]
DCR_EP_NAME = DSC0
DCR_EP_SEQNO = 0
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 12347
[GRP_DSC]
DCR_EP_NAME = DSC1
DCR_EP_SEQNO = 1
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 12347
[GRP_DSC]
DCR_EP_NAME = DSC2
DCR_EP_SEQNO = 2
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 12347
[dmdba@dsc01:~]$
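A mistake in this file can make the extension fail, so a consistency check before running extend node may be worthwhile. The sketch below assumes the layout shown above (repeated [GRP] sections followed by one section per endpoint named after its group); parse_sections and check_dcr_cfg are hypothetical helper names. Flagging any repeated DCR_EP_SHM_KEY is a conservative choice made for this sketch.

```python
def parse_sections(text):
    """Parse an INI-like file that repeats section names, keeping order."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_dcr_cfg(text):
    """Return a list of inconsistencies between group headers and endpoint sections."""
    sections = parse_sections(text)
    problems = []
    for name, params in sections:
        if name != "GRP":
            continue
        group = params.get("DCR_GRP_NAME", "?")
        n_ep = int(params.get("DCR_GRP_N_EP", "0"))
        raw_arr = params.get("DCR_GRP_EP_ARR", "{}").strip("{}")
        arr = [x for x in raw_arr.split(",") if x.strip()]
        n_sections = sum(1 for other, _ in sections if other == group)
        if len(arr) != n_ep:
            problems.append(f"{group}: DCR_GRP_EP_ARR has {len(arr)} entries, DCR_GRP_N_EP is {n_ep}")
        if n_sections != n_ep:
            problems.append(f"{group}: {n_sections} endpoint sections, DCR_GRP_N_EP is {n_ep}")
    keys = [p["DCR_EP_SHM_KEY"] for _, p in sections if "DCR_EP_SHM_KEY" in p]
    if len(keys) != len(set(keys)):
        problems.append("duplicate DCR_EP_SHM_KEY values")
    return problems


# Made-up fragment: the group header says three endpoints, only two are listed.
sample = """
[GRP]
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
[GRP_ASM]
DCR_EP_NAME = ASM0
DCR_EP_SHM_KEY = 93360
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 93361
"""
for problem in check_dcr_cfg(sample):
    print(problem)
```

The standard configparser module is not used here because it rejects duplicate section names by default, and this file repeats [GRP] and the per-group section names.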
In the dmcssm console, type the extend node command and press Enter. For reference:
dmcssm ini_path=/dmdata/DSC/dmcssm.ini
[dmdba@dsc01:/dmdata/DSC]$ dmcssm ini_path=/dmdata/DSC/dmcssm.ini
[monitor] 2022-12-28 11:43:29: CSS MONITOR V8
[monitor] 2022-12-28 11:43:29: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-12-28 11:43:29: Wait CSS Control Node choosed...
[monitor] 2022-12-28 11:43:30: Wait CSS Control Node choosed succeed.
# Type extend node
extend node
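The program notifies all instances (CSS/ASMSVR/dmserver) to update their information. Running the SHOW command in the CSS console now shows the new node. At this stage the new ASMSVR and dmserver appear as error nodes, and the program notifies ASMSVR/dmserver to update their MAL information.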
extend node
[monitor] 2022-12-28 11:44:12: Extend node
[monitor] 2022-12-28 11:44:17: Notify current active CSS to do clear
[monitor] 2022-12-28 11:44:18: Clean request of CSS(0) success
[monitor] 2022-12-28 11:44:19: Clean request of CSS(1) success
[monitor] 2022-12-28 11:44:19: Command EXTENT NODE execute success
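Start the dmcss and dmasmsvr programs on the DSC2 node.
Start the new dmcss manually, with dcr_ini pointing to the new dmdcr.ini file:
# Start in the foreground
#./dmcss DCR_INI=/dmdata/DSC/dmdcr.ini
Start through the service: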
[dmdba@dm93 bin]$ ./DmCSSServiceCss start
Starting DmCSSServiceCss:
Start the new dmasmsvr manually, with dcr_ini pointing to the new dmdcr.ini file. dmasmsvr starts up through the failure-rejoin procedure:
./dmasmsvr DCR_INI=/dmdata/DSC/dmdcr.ini
[dmdba@dm93 bin]$ ./DmASMSvrServiceAsmsvr start
Starting DmASMSvrServiceAsmsvr: [ OK ]
[dmdba@dm93 bin]$
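If DMCSS is configured to bring up dmasmsvr automatically, you can wait for DMCSS to start the dmasmsvr program; a manual start is not needed.
If DMCSS is configured to bring up dmserver automatically, you can wait for DMCSS to start the instance; a manual start is not needed.
If a manual start is required, follow the steps below:
DSC2 node:
Start in the foreground:
./dmserver /dmdata/DSC/dsc2_config/dm.ini dcr_ini=/dmdata/DSC/dmdcr.ini
Start through the service:
[dmdba@dm93 bin]$ ./DmServiceDSC start
Starting DmServiceDSC: connnect dmasmtool successfully. [ OK ]
[dmdba@dm93 bin]$
Use the dmcss monitor to check the state of the database. The show command displays the information of the whole cluster; check whether the inst_status field of the CSS, ASM, and DB groups is OPEN. A state like the following means the whole cluster is healthy: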
dmcssm ini_path=/dmdata/DSC/dmcssm.ini
show
monitor current time:2022-12-28 12:00:10, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = FALSE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = FALSE
[DSC1] auto restart = TRUE
[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = FALSE
[DSC2] auto restart = FALSE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-12-28 12:00:10 CSS0 0 12345 Control Node OPEN WORKING OK TRUE 3743470 3756916
2022-12-28 12:00:10 CSS1 1 12345 Normal Node OPEN WORKING OK TRUE 3763149 3776576
2022-12-28 12:00:10 CSS2 2 12345 Normal Node OPEN WORKING OK TRUE 1257086 1257980
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-12-28 12:00:10 ASM0 0 12346 Control Node OPEN WORKING OK TRUE 3758249 3771661
2022-12-28 12:00:10 ASM1 1 12346 Normal Node OPEN WORKING OK TRUE 3777929 3791321
2022-12-28 12:00:10 ASM2 2 12346 Normal Node OPEN WORKING OK TRUE 1265568 1266435
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-12-28 12:00:10 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 17615205 17615544
2022-12-28 12:00:10 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 17642592 17642931
2022-12-28 12:00:10 DSC2 2 5236 Normal Node OPEN
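Reading this output by eye gets tedious on larger clusters. The sketch below scans captured show output and lists endpoints whose inst_status, vtd_status, and is_ok columns are not OPEN, WORKING, and OK. It assumes the whitespace-separated column layout shown above with a two-word mode ending in "Node"; unhealthy_endpoints is a hypothetical helper name. A row that is cut off before those columns is reported as well, since its health cannot be confirmed.

```python
import re

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def unhealthy_endpoints(show_output):
    """Return names of endpoints that are not OPEN / WORKING / OK."""
    bad = []
    for line in show_output.splitlines():
        tokens = line.split()
        # Endpoint rows: date, time, name, seqno, port, "<mode> Node", status columns.
        if len(tokens) < 7 or not DATE.fullmatch(tokens[0]) or tokens[6] != "Node":
            continue
        if tokens[7:10] != ["OPEN", "WORKING", "OK"]:
            bad.append(tokens[2])
    return bad


captured = """
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-12-28 12:00:10 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 17615205 17615544
2022-12-28 12:00:10 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 17642592 17642931
"""
print(unhealthy_endpoints(captured))
```

An empty list means every endpoint row that was parsed is healthy by these three columns; header and separator lines are ignored.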
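Note that if dynamic node extension fails because of a configuration file error, the only option is to stop all instances and re-initialize the DCR disk. This does not affect dmserver data.
Node startup failure when using the ep command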
ep startup GRP_DSC
[monitor] 2022-12-28 21:29:23: Notify CSS(seqno:0) execute EP STARTUP(DSC0)
[monitor] 2022-12-28 21:29:32: Notify CSS(seqno:0) EP STARTUP(DSC0) success
[monitor] 2022-12-28 21:29:32: Notify CSS(seqno:1) execute EP STARTUP(DSC1)
[monitor] 2022-12-28 21:29:42: Notify CSS(seqno:1) EP STARTUP(DSC1) success
[monitor] 2022-12-28 21:29:42: Notify CSS(seqno:2) execute EP STARTUP(DSC2)
[monitor] 2022-12-28 21:30:43: Monitor wait over (60)s, ep(DSC2) still not STARTUP success, command execute failed
[monitor] 2022-12-28 21:30:43: Notify CSS(seqno:2) EP STARTUP(DSC2) failed
[monitor] 2022-12-28 21:30:43: Notify current active CSS to do clear
[monitor] 2022-12-28 21:30:43: Clean request of CSS(0) success
[monitor] 2022-12-28 21:30:44: Clean request of CSS(1) success
[monitor] 2022-12-28 21:30:44: Clean request of CSS(2) success
[monitor] 2022-12-28 21:30:44: Command EP STARTUP GRP_DSC execute failed
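The CSS log showed that the startup path was wrong. Checking the configuration files revealed that dmdcr.ini had not been configured correctly when the node was extended, which caused the startup failure. The cause was that the configuration had been copied from another node: DMDCR_SEQNO was never changed to 2 and still held the original value, and the path in DMDCR_DB_STARTUP_CMD still pointed to the original node. After extend node was executed, the node could not start normally.
[dmdba@dm93 DSC]$ cat dmdcr.ini
DMDCR_PATH = /dev/asmdisk/dmdcr01
DMDCR_MAL_PATH = /dmdata/DSC/dmasvrmal.ini
DMDCR_SEQNO = 0
DMDCR_ASM_RESTART_INTERVAL = 0
DMDCR_ASM_STARTUP_CMD = /dm/dmdbms/bin/dmasmsvr dcr_ini=/dmdata/DSC/dmdcr.ini
DMDCR_DB_RESTART_INTERVAL = 0
DMDCR_DB_STARTUP_CMD = /dm/dmdbms/bin/dmserver path=/dmdata/DSC/dsc0_config/dm.ini dcr_ini=/dmdata/DSC/dmdcr.ini
Manually correcting dmdcr.ini on the new node to the content below and restarting the DSC cluster did not resolve the problem.
DMDCR_PATH = /dev/asmdisk/dmdcr01
DMDCR_MAL_PATH = /dmdata/DSC/dmasvrmal.ini
DMDCR_SEQNO = 2
DMDCR_ASM_RESTART_INTERVAL = 0
DMDCR_ASM_STARTUP_CMD = /dm/dmdbms/bin/dmasmsvr dcr_ini=/dmdata/DSC/dmdcr.ini
DMDCR_DB_RESTART_INTERVAL = 0
DMDCR_DB_STARTUP_CMD = /dm/dmdbms/bin/dmserver path=/dmdata/DSC/dsc2_config/dm.ini dcr_ini=/dmdata/DSC/dmdcr.ini
The problem was resolved by stopping all database instances, re-initializing the DCR disk, and then starting the DSC cluster, after which the status check was normal. Take special care when doing this in a production environment.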