Case Analysis: Source-Side DMHS Dumps Core Immediately on Startup

干饭王 2023/11/30
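1. Synchronization direction
Source: Oracle 11.2.0.4 RAC --sync--> Target: DM V8.1.3.62_pack16 DSC
2. Synchronization software versions
DMHS version on the DM side: V4.3.24-Build(2023.11.04-143714trunc)_D64_2311
DMHS version on the Oracle side: V4.3.14-Build(2023.07.05-133910trunc)_D64_2306_sp9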
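3. Symptoms

After running normally for some time, the source-side DMHS stopped abnormally and could not be started again.

When the synchronization service was started in foreground mode, it dumped core immediately after startup: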

CPT[INFO]: node 2 search seq 357524 at arch
CPT[INFO]: node 2 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755:0)
CPT[INFO]: nodeid:2 - id:2 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
CPT[INFO]: nodeid:2 - id:0 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
CPT[INFO]: nodeid:2 - id:3 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
CPT[INFO]: nodeid:2 - id:1 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
CPT[INFO]: nodeid:1 - id:0 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
CPT[INFO]: nodeid:1 - id:1 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
CPT[INFO]: nodeid:1 - id:3 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
CPT[INFO]: nodeid:1 - id:2 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
CPT[INFO]: node 1 close to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0)
CPT[INFO]: node 1 search seq 673356 at arch
CPT[INFO]: node 1 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605:0)
CPT[INFO]: nodeid:1 - id:2 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
CPT[INFO]: nodeid:1 - id:3 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
CPT[INFO]: nodeid:1 - id:1 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
CPT[INFO]: nodeid:1 - id:0 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=32] [si_ptr=0x20] [si_addr=0x38]
kpedbg_dmp_stack()+362<-kpeDbgCrash()+192<-kpeDbgSignalHandler()+119<-skgesig_sigactionHandler()+218<-__sighandler()<-redo_parse_get_data_piece()+38<-redo_cfa_parse()+194<-redo_change_parse()+685<-ora_parse_record()+1905<-ora_cpt_parse()+2521<-start_thread()+209
Segmentation fault (core dumped)
[oracle@db01 bin_cpt_172]$ 
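
The ORA-24550 line in the output above carries the raw signal details. As a minimal sketch (not part of DMHS or Oracle tooling), the bracketed fields can be pulled out with a regular expression. The helper name `parse_signal_fields` and the "address below one page" rule of thumb are assumptions made for illustration.

```python
import re

# The ORA-24550 line copied from the console output above.
line = ("ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] "
        "[si_int=32] [si_ptr=0x20] [si_addr=0x38]")

def parse_signal_fields(text):
    """Return the bracketed key=value fields as a dict of integers."""
    fields = {}
    for key, value in re.findall(r"\[(\w+)=(0x[0-9a-fA-F]+|\d+)\]", text):
        # Base 0 lets int() accept both decimal and 0x-prefixed hex.
        fields[key] = int(value, 0)
    return fields

info = parse_signal_fields(line)

# On Linux x86_64, signal 11 is SIGSEGV and si_code 1 is SEGV_MAPERR
# (the faulting address is not mapped).
is_segv = info["si_signo"] == 11

# Assumption (heuristic only): a fault address smaller than one 4 KiB page
# often comes from reading a field through a NULL pointer.
looks_like_null_deref = is_segv and info["si_addr"] < 4096

print(info)
print("SIGSEGV:", is_segv, "possible NULL dereference:", looks_like_null_deref)
```

Here the fault address is 0x38, a small offset from zero, so the heuristic flags a possible NULL pointer access. This is only a hint; the core file analysis is what identifies the failing frame.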

The corresponding DMHS log is shown below. Because DMHS had already dumped core, the log has no further output:

2023-11-30 14:56:44 CPT[INFO]: node 2 search seq 357523 at arch
2023-11-30 14:56:44 CPT[INFO]: node 2 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0)
2023-11-30 14:56:44 CPT[INFO]: nodeid:2 - id:2 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425, handle:0, block size:512, block count:365170
2023-11-30 14:56:44 CPT[INFO]: nodeid:2 - id:3 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425, handle:0, block size:512, block count:365170
2023-11-30 14:56:44 CPT[INFO]: nodeid:2 - id:1 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425, handle:0, block size:512, block count:365170
2023-11-30 14:56:44 CPT[INFO]: nodeid:2 - id:0 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425, handle:0, block size:512, block count:365170
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:0 close +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:1 close +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:2 close +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:3 close +ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0
2023-11-30 14:56:52 CPT[INFO]: node 2 close to log file(+ARCH/db/archivelog/2023_11_29/thread_2_seq_357523.1696.1154204425:0)
2023-11-30 14:56:52 CPT[INFO]: node 2 search seq 357524 at arch
2023-11-30 14:56:52 CPT[INFO]: node 2 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755:0)
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:1 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:0 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:2 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
2023-11-30 14:56:52 CPT[INFO]: nodeid:2 - id:3 open redo log +ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755, handle:0, block size:512, block count:411434
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:2 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:3 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:0 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:1 close +ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0
2023-11-30 14:56:57 CPT[INFO]: node 1 close to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0)
2023-11-30 14:56:57 CPT[INFO]: node 1 search seq 673356 at arch
2023-11-30 14:56:57 CPT[INFO]: node 1 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605:0)
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:1 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:2 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:0 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
2023-11-30 14:56:57 CPT[INFO]: nodeid:1 - id:3 open redo log +ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605, handle:0, block size:512, block count:96460
===== The log stops here; nothing more is written
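
When the log simply stops, it helps to know which archived redo log each RAC node had just switched to, since that is where parsing was under way at the time of the crash. The following is a small sketch that scans log text for the "switch to log file" lines and keeps the last one per node. The function name `last_switch_per_node` is hypothetical, and the sample contains only three lines copied from the log above.

```python
import re

# Three lines copied from the DMHS log above (the real log has many more).
log_tail = """\
2023-11-30 14:56:52 CPT[INFO]: node 2 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_2_seq_357524.5254.1154204755:0)
2023-11-30 14:56:57 CPT[INFO]: node 1 close to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673355.3330.1154202605:0)
2023-11-30 14:56:57 CPT[INFO]: node 1 switch to log file(+ARCH/db/archivelog/2023_11_29/thread_1_seq_673356.4662.1154202605:0)
"""

# Captures: node id, full archive log path, and the sequence number in the path.
SWITCH = re.compile(
    r"node (\d+) switch to log file\((\S+?thread_\d+_seq_(\d+)\.\S+?):\d+\)"
)

def last_switch_per_node(text):
    """Map node id -> (sequence, path) of the last log file switched to."""
    latest = {}
    for line in text.splitlines():
        m = SWITCH.search(line)
        if m:
            latest[int(m.group(1))] = (int(m.group(3)), m.group(2))
    return latest

for node, (seq, path) in sorted(last_switch_per_node(log_tail).items()):
    print(f"node {node}: seq {seq} ({path})")
```

For this log the result is sequence 673356 for node 1 and sequence 357524 for node 2, which matches the last files opened before the process died.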

The Oracle database was checked at the same time, and it was running normally:

[oracle@db01 log]$ ps -ef |grep pmon
grid      50963      1  0 13:09 ?        00:00:03 asm_pmon_+ASM1
oracle    78482      1  0 14:36 ?        00:00:02 ora_pmon_db1
oracle   137163 120407  0 14:59 pts/5    00:00:00 grep pmon
[oracle@db01 log]$ tnsping db
TNS Ping Utility for Linux: Version 11.2.0.4.0 - Production on 30-NOV-2023 15:00:10
Copyright (c) 1997, 2013, Oracle.  All rights reserved.
Used parameter files:
/oracle/app/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = db-scan)(PORT = 1523)) (ADDRESS = (PROTOCOL = TCP)(HOST = *.*.*.172)(PORT = 1523)) (ADDRESS = (PROTOCOL = TCP)(HOST = *.*.*.174)(PORT = 1523)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db)))
OK (0 msec)
[oracle@db01 log]$ tnsping db 10
TNS Ping Utility for Linux: Version 11.2.0.4.0 - Production on 30-NOV-2023 15:00:14
Copyright (c) 1997, 2013, Oracle.  All rights reserved.
Used parameter files:
/oracle/app/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = db-scan)(PORT = 1523)) (ADDRESS = (PROTOCOL = TCP)(HOST = *.*.*.172)(PORT = 1523)) (ADDRESS = (PROTOCOL = TCP)(HOST = *.*.*.174)(PORT = 1523)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db)))
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (0 msec)
OK (10 msec)
[oracle@db01 log]$


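The corresponding core file was then analyzed. The backtrace shows that the key frame, #5, is where "redo_parse_get_data_piece" faulted. After discussion with the technical experts, it was largely confirmed that the problem is related to redo parsing.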

[oracle@db01 bin_cpt_172]$ gdb dmhs_server  core.118608
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Core was generated by '/oracle/dmhs/bin_cpt_172/dmhs_server /oracle/dmhs/bin_cpt_172/dmhs.hs -noconsol'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003a8c632495 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64 libaio-0.3.107-10.el6.x86_64 numactl-2.0.7-8.el6.x86_64
(gdb) bt
#0  0x0000003a8c632495 in raise () from /lib64/libc.so.6
#1  0x00007f8bf68fa476 in skgesigOSCrash () from /oracle/app/oracle/product/11.2.0/db_1/lib/libclntsh.so.11.1
#2  0x00007f8bf6bb70c5 in kpeDbgSignalHandler () from /oracle/app/oracle/product/11.2.0/db_1/lib/libclntsh.so.11.1
#3  0x00007f8bf68fa686 in skgesig_sigactionHandler () from /oracle/app/oracle/product/11.2.0/db_1/lib/libclntsh.so.11.1
#4  <signal handler called>
#5  0x00007f8bfc31c854 in redo_parse_get_data_piece (cur_table=0x0, bdba=4120393684, slot=0, is_redo_lst=1) at /opt/qza/src/cpt_ora/ora_redo.c:3661
#6  0x00007f8bfc3134a8 in redo_cfa_parse (ora_parse=0x7f8bf49e2838, change_vector=0x7f8be7b0caf8) at /opt/qza/src/cpt_ora/ora_redo.c:367
#7  0x00007f8bfc31c26d in redo_change_parse (ora_parse=0x7f8bf49e2838, change_vector=0x7f8be7b0caf8, is_rollback=1) at /opt/qza/src/cpt_ora/ora_redo.c:3464
#8  0x00007f8bfc2dca56 in ora_parse_record (ora_parse=0x7f8bf49e2838, record=0x7f8be7a042b0, racid=1) at /opt/qza/src/cpt_ora/ora_parse.c:918
#9  0x00007f8bfc3008b8 in ora_cpt_parse (_mgr_cpt=0xa588a8) at /opt/qza/src/cpt_ora/ora_pub.c:1155
#10 0x0000003a8ca07aa1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003a8c6e8bcd in clone () from /lib64/libc.so.6
(gdb)
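
Reading the backtrace by hand works for a single core file. For repeated triage, the first frame below "<signal handler called>" can be extracted automatically. The sketch below is illustrative only: `faulting_frame` is a hypothetical helper, the sample holds four frames copied from the backtrace above, and treating an argument printed as 0x0 as a NULL pointer is a heuristic.

```python
import re

# Four frames copied from the gdb backtrace above.
backtrace = """\
#3  0x00007f8bf68fa686 in skgesig_sigactionHandler () from /oracle/app/oracle/product/11.2.0/db_1/lib/libclntsh.so.11.1
#4  <signal handler called>
#5  0x00007f8bfc31c854 in redo_parse_get_data_piece (cur_table=0x0, bdba=4120393684, slot=0, is_redo_lst=1) at /opt/qza/src/cpt_ora/ora_redo.c:3661
#6  0x00007f8bfc3134a8 in redo_cfa_parse (ora_parse=0x7f8bf49e2838, change_vector=0x7f8be7b0caf8) at /opt/qza/src/cpt_ora/ora_redo.c:367
"""

# Frame number, function name, argument list, optional "at file:line".
FRAME = re.compile(
    r"^#(\d+)\s+0x[0-9a-f]+ in (\w+) \((.*?)\)(?: at (\S+):(\d+))?"
)

def faulting_frame(bt):
    """Return details of the frame that follows '<signal handler called>'."""
    lines = bt.splitlines()
    for i, text in enumerate(lines[:-1]):
        if "<signal handler called>" in text:
            m = FRAME.match(lines[i + 1])
            if m is None:
                return None
            args = dict(
                a.split("=", 1) for a in m.group(3).split(", ") if "=" in a
            )
            return {
                "frame": int(m.group(1)),
                "function": m.group(2),
                "args": args,
                "file": m.group(4),
                "line": int(m.group(5)) if m.group(5) else None,
            }
    return None

frame = faulting_frame(backtrace)
# Heuristic: gdb prints pointers in hex, so a value of 0x0 suggests NULL.
null_args = [name for name, value in frame["args"].items() if value == "0x0"]
print(frame["function"], frame["file"], frame["line"], "NULL args:", null_args)
```

For this core file the helper reports frame #5, redo_parse_get_data_piece at ora_redo.c:3661, with cur_table passed as 0x0. Together with si_addr=0x38 in the ORA-24550 line, this is consistent with a field being read through a NULL cur_table pointer, although only the DMHS source can confirm that.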


4. Solution
Upgrading the Oracle-side DMHS to the same version as the DM side, V4.3.24, resolved the failure completely, and DMHS started normally.

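5. Summary
In a DMHS synchronization environment, make sure the source and target sides run the same DMHS version at deployment time. In this case, standardizing on V4.3.24 on both sides reduces the chance of this kind of problem.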
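
A version check like this can be scripted before deployment. The sketch below compares the two version strings listed in section 2. The function names are hypothetical, and comparing only the leading V<major>.<minor>.<patch> part while ignoring the build date and suffix is an assumption, not a documented DMHS rule.

```python
import re

VERSION = re.compile(r"V(\d+)\.(\d+)\.(\d+)")

def base_version(version_string):
    """Extract (major, minor, patch) from a DMHS version string."""
    m = VERSION.search(version_string)
    if m is None:
        raise ValueError(f"no version found in {version_string!r}")
    return tuple(int(part) for part in m.groups())

def versions_match(source, target):
    """True when both sides report the same major.minor.patch."""
    return base_version(source) == base_version(target)

# Version strings as listed in this article.
dm_side = "V4.3.24-Build(2023.11.04-143714trunc)_D64_2311"
oracle_side = "V4.3.14-Build(2023.07.05-133910trunc)_D64_2306_sp9"

if versions_match(oracle_side, dm_side):
    print("DMHS versions match")
else:
    print("DMHS version mismatch:",
          base_version(oracle_side), "vs", base_version(dm_side))
```

With the versions from this case the check reports a mismatch between (4, 3, 14) and (4, 3, 24), which is the condition the upgrade removed.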
