
Production environment, Wed 2023-08-02: the database crashed twice today, urgent

祢真伟大 2023/08/02 1161 5

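To speed things up, please provide the following information when asking a question. Clearly described problems get priority.
[DM version]: V8 1-1-190-21.03.12-136419-SEC
[Operating system]: Kylin V10
[CPU]: ARM architecture
[Problem description]*: Single-instance deployment. The database crashed twice today (logs are attached; the database log itself reports no errors).
A core file was generated in production, but it is truncated.
Operating system log:
The operating system killed the database process, and it appears to be memory-related. Can the specific cause be determined from this log?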

Aug  2 14:26:08 oadb kernel: [18654779.419773] {360}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
Aug  2 14:26:08 oadb kernel: [18654779.428884] {360}[Hardware Error]: event severity: recoverable
Aug  2 14:26:08 oadb kernel: [18654779.435388] {360}[Hardware Error]:  Error 0, type: recoverable
Aug  2 14:26:08 oadb kernel: [18654779.441889] {360}[Hardware Error]:   section_type: ARM processor error
Aug  2 14:26:08 oadb kernel: [18654779.449081] {360}[Hardware Error]:   MIDR: 0x00000000481fd010
Aug  2 14:26:08 oadb kernel: [18654779.455497] {360}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x00000000811d0300
Aug  2 14:26:08 oadb kernel: [18654779.465023] {360}[Hardware Error]:   error affinity level: 0
Aug  2 14:26:08 oadb kernel: [18654779.471351] {360}[Hardware Error]:   running state: 0x1
Aug  2 14:26:08 oadb kernel: [18654779.477249] {360}[Hardware Error]:   Power State Coordination Interface state: 0
Aug  2 14:26:08 oadb kernel: [18654779.485305] {360}[Hardware Error]:   Error info structure 0:
Aug  2 14:26:08 oadb kernel: [18654779.491632] {360}[Hardware Error]:   num errors: 1
Aug  2 14:26:08 oadb kernel: [18654779.497097] {360}[Hardware Error]:    error_type: 0, cache error
Aug  2 14:26:08 oadb kernel: [18654779.503770] {360}[Hardware Error]:    error_info: 0x0000000024400014
Aug  2 14:26:08 oadb kernel: [18654779.510788] {360}[Hardware Error]:     cache level: 1
Aug  2 14:26:08 oadb kernel: [18654779.516514] {360}[Hardware Error]:     the error has been corrected
Aug  2 14:26:08 oadb kernel: [18654779.523445] {360}[Hardware Error]:    virtual fault address: 0x0000000000000000
Aug  2 14:26:08 oadb kernel: [18654779.531416] {360}[Hardware Error]:    physical fault address: 0x00000024738712a0
Aug  2 14:26:08 oadb kernel: [18654779.539474] {360}[Hardware Error]:   Vendor specific error info has 16 bytes:
Aug  2 14:26:08 oadb kernel: [18654779.547274] {360}[Hardware Error]:    00000000: 00000000 00000000 00000000 00000000  ................
Aug  2 14:26:08 oadb kernel: [18654779.557155] Uncorrected hardware memory error in user-access at 0000000055108590
Aug  2 14:26:08 oadb kernel: [18654779.703431] {361}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
Aug  2 14:26:08 oadb kernel: [18654779.703456] Memory failure: 0x247387: Killing dmserver:1977295 due to hardware memory corruption
Aug  2 14:26:08 oadb kernel: [18654779.712532] {361}[Hardware Error]: event severity: recoverable
Aug  2 14:26:08 oadb kernel: [18654779.712533] {361}[Hardware Error]:  Error 0, type: recoverable
Aug  2 14:26:08 oadb kernel: [18654779.712534] {361}[Hardware Error]:   section_type: memory error
Aug  2 14:26:08 oadb kernel: [18654779.712535] {361}[Hardware Error]:   error_status: 0x0000000000000000
Aug  2 14:26:08 oadb kernel: [18654779.712535] {361}[Hardware Error]:   physical_address: 0x0000002473871280
Aug  2 14:26:08 oadb kernel: [18654779.712536] {361}[Hardware Error]:   physical_address_mask: 0x0000000000000000
Aug  2 14:26:08 oadb kernel: [18654779.712541] {361}[Hardware Error]:   node: 0 card: 3 module: 0 rank: 0 bank: 5 device: 1 row: 16320 column: 560 bit_position: 0 requestor_id: 0x0000000000000000 responder_id: 0x0000000000000000 
Aug  2 14:26:08 oadb kernel: [18654779.721993] Memory failure: 0x247387: recovery action for dirty LRU page: Recovered
Aug  2 14:26:08 oadb kernel: [18654779.728493] {361}[Hardware Error]:   error_type: 17, unknown
Aug  2 14:26:08 oadb audit[1977295]: ANOM_ABEND auid=0 uid=12345 gid=12349 ses=40806 pid=1977295 comm="dm_sql_thd" exe="/dmdbms/bin/dmserver" sig=7 res=1
Aug  2 14:26:08 oadb systemd[1]: Started Process Core Dump (PID 1981819/UID 0).
Aug  2 14:26:08 oadb audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@5-1981819-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug  2 14:26:08 oadb kernel: [18654779.796548] EDAC MC0: 1 UE reserved error (17) on unknown label (node:0 card:3 module:0 rank:0 bank:5 row:16320 col:560 bit_pos:0 page:0x247387 offset:0x1280 grain:-1 - status(0x0000000000000000): reserved requestorID: 0x0000000000000000 responderID: 0x0000000000000000 targetID: 0x0000000000000000)
Aug  2 14:26:08 oadb kernel: [18654779.823539] Memory failure: 0x247387: already hardware poisoned
Aug  2 14:26:09 oadb systemd-coredump[1981820]: Core file was truncated to 2147483648 bytes.
Aug  2 14:26:09 oadb kernel: [18654780.956532] EDAC MC0: 1 UE reserved error (17) on unknown label (node:0 card:6 module:0 rank:0 bank:11 row:27654 col:536 bit_pos:0 page:0x20d80d offset:0x3c0 grain:-1 - status(0x0000000000000000): reserved requestorID: 0x0000000000000000 responderID: 0x0000000000000000 targetID: 0x0000000000000000)
Aug  2 14:26:09 oadb kernel: [18654780.983525] Memory failure: 0x20d80d: already hardware poisoned
Aug  2 14:26:11 oadb systemd-coredump[1981820]: Process 1977295 (dmserver) of user 12345 dumped core.#012#012Stack trace of thread 1981237:#012#0  0x00000000005081b8 bdta3_cpy_str (/dmdbms/bin/dmserver)#012#1  0x00000000005081b4 bdta3_cpy_str (/dmdbms/bin/dmserver)
Aug  2 14:26:11 oadb systemd[1]: systemd-coredump@5-1981819-0.service: Succeeded.
Aug  2 14:26:11 oadb audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@5-1981819-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug  2 14:26:46 oadb esfdaemon[998483]: 0
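
To make the relevant events easier to see, the kill and EDAC lines can be pulled out of the OS log with a short script. This is only a minimal sketch, assuming the log is plain text in the syslog format shown above. The function name `summarize_memory_errors` and both regular expressions are illustrative and not part of any DM or kernel tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Memory failure: 0x247387: Killing dmserver:1977295 due to hardware memory corruption
KILL_RE = re.compile(
    r"Memory failure: (0x[0-9a-f]+): Killing (\S+):(\d+) due to hardware memory corruption"
)

# Matches EDAC uncorrected-error (UE) lines and captures the controller,
# node, card, and page fields as they appear in the log above.
EDAC_UE_RE = re.compile(
    r"EDAC (MC\d+): \d+ UE .*?\(node:(\d+) card:(\d+) .*?page:(0x[0-9a-f]+)"
)


def summarize_memory_errors(lines):
    """Return (kills, ue_by_location) extracted from syslog lines.

    kills: list of dicts with the poisoned page, process name, and PID.
    ue_by_location: Counter keyed by (controller, node, card).
    """
    kills = []
    ue_by_location = Counter()
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            kills.append(
                {"page": m.group(1), "process": m.group(2), "pid": int(m.group(3))}
            )
        m = EDAC_UE_RE.search(line)
        if m:
            key = (m.group(1), int(m.group(2)), int(m.group(3)))
            ue_by_location[key] += 1
    return kills, ue_by_location
```

Run against the excerpt above, this would report one kill of dmserver (PID 1977295) on page 0x247387, plus one UE event each at card 3 and card 6 on MC0. How a "card" number maps to a physical DIMM slot is platform-specific and is not something the script can determine.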

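Database configuration file dm.ini
dm.ini
Two database log segments from the crashes, plus the two OS log segments for the corresponding time periods
dm_sys_log.txt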

Answers: 0
No answers yet