客户监控报警某系统cpu使用率较高,top查看 dmserver进程占用达到9000%,96线程cpu服务器,反馈该系统是一个子系统,压力不应该这么大,
迅速进库查看当前活动会话情况并导出execl留作备份,会话情况见下截图
Top查看主机资源CPU中sys占比较高,user未分配到,存在热点表的访问,perf查看系统热点函数spin_lock占用了CPU,通常是sql并发访问同一数据情况,且sql执行速度慢造成,
解决方法就是尽可能优化sql,系统上线已久,只有查看sql能否进一步优化,因主库资源占用较高,将SQL拿到备库进行调试
相关sql结构如下,
select ahqc,kssj,jssj,laaymc,cbrmc,hytcy,sjymc,
(select wm_concat(xm) from C where js='1011250002' and c.ajbs=a.ajbs) as dlr,cbrbm
from A join B on a.ajbs=b.ajbs where a.kssj<=now() and a.jssj>=now() and a.dm='M58'
and a.ftbm='13' order by a.yjkssj desc LIMIT 0,10;
Sql结构分析:
1、该sql存在标量子查询和join,驱动表关联次数需要翻倍,目前执行时间500ms左右
2、存在1对多的情形,用wm_concat合并多行数据为一行
3、最终返回只有一行数据
表数据量统计
–A 340w
–B 534w
–C 216w
执行计划如下:
1 #NSET2: [200, 5, 524]
2 #PIPE2: [200, 5, 524]
3 #PRJT2: [99, 5, 524]; exp_num(9), is_atom(FALSE)
4 #SORT3: [99, 5, 524]; key_num(1), is_distinct(FALSE), top_flag(1), is_adaptive(0)
5 #NEST LOOP INDEX JOIN2: [98, 5, 524]
6 #SLCT2: [97, 5, 188]; (a.ftbm = '13' AND a.kssj <= var1 AND a.jssj >= var1)
7 #BLKUP2: [97, 78416, 188]; INDEX13406957170222226(a)
8 #SSEK2: [97, 78416, 188]; scan_type(ASC), INDEX13406957170222226(A as a), scan_range[('M58',min),('M58',max))
9 #BLKUP2: [1, 1, 0]; INDEX33569753(b)
10 #SSEK2: [1, 1, 0]; scan_type(ASC), INDEX33569753(B as b), scan_range[a.ajbs,a.ajbs]
11 #SPL2: [100, 5, 188]; key_num(2), spool_num(0), is_atom(FALSE), has_variable(0)
12 #PRJT2: [100, 5, 188]; exp_num(3), is_atom(FALSE)
13 #HAGR2: [100, 5, 188]; grp_num(1), sfun_num(3); slave_empty(0) keys(a.ROWID)
14 #INDEX JOIN LEFT JOIN2: [99, 856, 188] join condition(c.js = 1011250002) ret_null(0)
15 #SLCT2: [97, 5, 188]; (a.ftbm = '13' AND a.kssj <= var2 AND a.jssj >= var2)
16 #BLKUP2: [97, 78416, 188]; INDEX13406957170222226(a)
17 #SSEK2: [97, 78416, 188]; scan_type(ASC), INDEX13406957170222226(A as a), scan_range[('M58',min),('M58',max))
18 #BLKUP2: [1, 169, 0]; idx_C_ajbs(c)
19 #SSEK2: [1, 169, 0]; scan_type(ASC), idx_C_ajbs(C as c),
-通过执行计划观察到A回表次数达到 78416,且存在子查询,共计两回,成本较大,那么主要问题就是减少回表行数,即减少驱动表结果集。
观察sql过滤条件都是a表过滤,且从执行计划
#SLCT2: [97, 5, 188]; (a.ftbm = ‘13’ AND a.kssj <= var2 AND a.jssj >= var2)观察条件组合过滤性较好,可以考虑建立组合索引,
组合索引需要等值条件在前,结合谓词条件a.dm=‘M58’,建立组合索引(dm,ftbm,kssj),过滤后数据只有几条,再回表代价较低
create index idx_dm_ftbm on A(dm,ftbm,kssj)
建立组合索引后执行计划
#NSET2: [4, 5, 524]
2 #PIPE2: [4, 5, 524]
3 #PRJT2: [1, 5, 524]; exp_num(9), is_atom(FALSE)
4 #SORT3: [1, 5, 524]; key_num(1), is_distinct(FALSE), top_flag(1), is_adaptive(0)
5 #NEST LOOP INDEX JOIN2: [1, 5, 524]
6 #SLCT2: [1, 5, 188]; a.jssj >= var1
7 #BLKUP2: [1, 94, 188]; idx_kssj_jssj(a)
8 #SSEK2: [1, 94, 188]; scan_type(ASC), idx_kssj_jssj(t_sp_ftsy as a), scan_range(('M58','13',null2),('M58','13',exp11)]
9 #BLKUP2: [1, 1, 0]; INDEX33569753(b)
10 #SSEK2: [1, 1, 0]; scan_type(ASC), INDEX33569753(t_sp_ajjbxx as b), scan_range[a.ajbs,a.ajbs]
11 #SPL2: [3, 5, 188]; key_num(2), spool_num(0), is_atom(FALSE), has_variable(0)
12 #PRJT2: [3, 5, 188]; exp_num(3), is_atom(FALSE)
13 #HAGR2: [3, 5, 188]; grp_num(1), sfun_num(3); slave_empty(0) keys(a.ROWID)
14 #INDEX JOIN LEFT JOIN2: [2, 859, 188] join condition(c.js = 1011250002) ret_null(0)
15 #SLCT2: [1, 5, 188]; a.jssj >= var2
16 #BLKUP2: [1, 94, 188]; idx_kssj_jssj(a)
17 #SSEK2: [1, 94, 188]; scan_type(ASC), idx_kssj_jssj(t_sp_ftsy as a), scan_range(('M58','13',null2),('M58','13',exp11)]
18 #BLKUP2: [1, 169, 0]; idx_t_sp_sscyr_qtry_ajbs(c)
19 #SSEK2: [1, 169, 0]; scan_type(ASC), idx_t_sp_sscyr_qtry_ajbs(t_sp_sscyr_qtry as c), scan_range[a.ajbs,a.ajbs]
观察BLKUP2,回表由原来的78416降至94,执行速度也降低至15ms。索引建立后,CPU使用率下降,且活动会话无慢sql,当然也可以考虑改写sql为子查询 为left join 使得驱动表只访问一次,这里情景改写前后速度变化不大,且现速度已达到要求,决定不再改动SQL。
至此,处理完此次性能问题。
总结:多表关联查询数据返回少,结合谓词条件观察执行计划是否最优,重点关注回表的代价,降低此代价是优化关键,即减少驱动表结果集,如果驱动表返回结果集过大,将难以优化。对于高并发且SQL返回数据只有几条或者1条情况下,要尽可能将执行速度控制再几十ms内。
文章
阅读量
获赞