注册
SQL优化案例——left+inner join的旋转功能
专栏/技术分享/ 文章详情 /

SQL优化案例——left+inner join的旋转功能

PYZ 2025/08/01 203 2 0
摘要

问题SQL

SQL以及执行计划如下:

select DISTINCT f.id, f.code, ... b.business_id from file_base_info f left join file_attributes a on f.id = a.file_id left join file_chunk c on f.id = c.file_id left join file_invoice i on f.id = i.file_id left join file_properties p on f.id = p.file_id inner join file_business_binding b on f.id = b.file_id where (f.is_chunk = 0 or (f.is_chunk = 1 and f.chunk_completed = 1)) and b.dr = 0 and b.business_type = 'hrcloud' and b.business_id in ('e180efc7-f502-48f4-a2a5-14419bbed808') order by f.utime desc; 1 #NSET2: [233, 25, 968] 2 #PRJT2: [233, 25, 968]; exp_num(59), is_atom(FALSE) 3 #SORT3: [233, 25, 968]; key_num(59), partition_key_num(0), is_distinct(TRUE), top_flag(0), is_adaptive(0) 4 #UNION FOR OR2: [232, 150, 968]; key_num(0), outer_join(-) 5 #HASH2 INNER JOIN: [196, 25, 968]; KEY_NUM(1); KEY(B.FILE_ID=F.ID) KEY_NULL_EQU(0) 6 #SSEK2: [1, 1, 160]; scan_type(ASC), IDX_B_BUSINESS(FILE_BUSINESS_BINDING as B), scan_range[('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',min),('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',max)), is_global(0) 7 #HASH LEFT JOIN2: [179, 100000, 808]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=P.FILE_ID) 8 #HASH LEFT JOIN2: [142, 20000, 808]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=I.FILE_ID) 9 #HASH LEFT JOIN2: [99, 20000, 808]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=C.FILE_ID) 10 #HASH LEFT JOIN2: [67, 20000, 808]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=A.FILE_ID) 11 #HASH RIGHT SEMI JOIN2: [42, 4000, 808]; n_keys(1) KEY(DMTEMPVIEW_889214179.colname=F.IS_CHUNK) KEY_NULL_EQU(0) 12 #CONST VALUE LIST: [1, 2, 4]; row_num(2), col_num(1) 13 #SLCT2: [28, 80000, 808]; F.IS_CHUNK = 0 SLCT_PUSHDOWN(TRUE) 14 #CSCN2: [28, 100000, 808]; INDEX33556270(FILE_BASE_INFO as F) NEED_SLCT(TRUE); btr_scan(1) 15 #CSCN2: [14, 100000, 204]; INDEX33556272(FILE_ATTRIBUTES as A); btr_scan(1) 16 #CSCN2: [14, 100000, 216]; INDEX33556274(FILE_CHUNK as C); btr_scan(1) 17 #CSCN2: [22, 100000, 588]; INDEX33556276(FILE_INVOICE as I); btr_scan(1) 18 #CSCN2: [17, 100000, 364]; INDEX33556278(FILE_PROPERTIES as P); btr_scan(1) 19 #NEST LOOP INDEX JOIN2: [34, 125, 968] 20 #INDEX JOIN LEFT JOIN2: [34, 125, 808]; ret_null(0) 21 #INDEX JOIN LEFT JOIN2: [32, 25, 808]; ret_null(0) 22 #INDEX JOIN LEFT JOIN2: [30, 25, 808]; ret_null(0) 23 #INDEX JOIN LEFT JOIN2: [29, 25, 808]; ret_null(0) 24 #HASH RIGHT SEMI JOIN2: [28, 5, 808]; n_keys(1) KEY(DMTEMPVIEW_889214181.colname=F.IS_CHUNK) KEY_NULL_EQU(0) 25 #CONST VALUE LIST: [1, 2, 4]; row_num(2), col_num(1) 26 #SLCT2: [28, 100, 808]; (F.IS_CHUNK = 1 AND F.CHUNK_COMPLETED = 1 AND var3) SLCT_PUSHDOWN(TRUE) 27 #CSCN2: [28, 100000, 808]; INDEX33556270(FILE_BASE_INFO as F) NEED_SLCT(TRUE); btr_scan(1) 28 #BLKUP2: [1, 5, 48]; IDX_A_FILEID(A) 29 #SSEK2: [1, 5, 48]; scan_type(ASC), IDX_A_FILEID(FILE_ATTRIBUTES as A), scan_range[F.ID,F.ID], is_global(0) 30 #BLKUP2: [1, 1, 48]; IDX_C_FILEID(C) 31 #SSEK2: [1, 1, 48]; scan_type(ASC), IDX_C_FILEID(FILE_CHUNK as C), scan_range[F.ID,F.ID], is_global(0) 32 #BLKUP2: [1, 1, 48]; IDX_I_FILEID(I) 33 #SSEK2: [1, 1, 48]; scan_type(ASC), IDX_I_FILEID(FILE_INVOICE as I), scan_range[F.ID,F.ID], is_global(0) 34 #BLKUP2: [1, 5, 48]; IDX_P_FILEID(P) 35 #SSEK2: [1, 5, 48]; scan_type(ASC), IDX_P_FILEID(FILE_PROPERTIES as P), scan_range[F.ID,F.ID], is_global(0) 36 #SSEK2: [1, 1, 148]; scan_type(ASC), IDX_B_BUSINESS(FILE_BUSINESS_BINDING as B), scan_range[('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',F.ID),('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',F.ID)], is_global(0)

SQL分析

这条SQL关联了六张表,其中file_base_info、file_attributes、file_chunk、file_invoice、file_properties均为大表,实际数据量在200w以上,file_business_binding表的business_id谓词条件过滤性很高,现场最后整体结果集两条,耗时大于1min。优化前执行计划的访问路径是由b表作为左孩子节点与f、a、c、i、p表hash左关联后的结果集进行hash内连接,再与or的另一分支进行or展开,最后sort排序返回结果,其中主要代价在于右孩子节点的全表扫描和hash关联,导致效率低下,如下图所示:
image.png

优化思路

首先由于计划中存在or的展开,那么这里先通过hint调整/*+ optimizer_or_nbexp(2) */将or表达式合并得到如下执行计划,虽然合并or表达式可以简化执行计划,但连接方式仍然是hash join,开销依然会很大

1 #NSET2: [204, 25, 944] 2 #PRJT2: [204, 25, 944]; exp_num(59), is_atom(FALSE) 3 #SORT3: [204, 25, 944]; key_num(59), partition_key_num(0), is_distinct(TRUE), top_flag(0), is_adaptive(0) 4 #HASH2 INNER JOIN: [203, 25, 944]; KEY_NUM(1); KEY(B.FILE_ID=F.ID) KEY_NULL_EQU(0) 5 #SSEK2: [1, 1, 148]; scan_type(ASC), IDX_B_BUSINESS(FILE_BUSINESS_BINDING as B), scan_range[('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',min),('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',max)), is_global(0) 6 #HASH LEFT JOIN2: [182, 125000, 796]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=P.FILE_ID) 7 #HASH LEFT JOIN2: [144, 25000, 796]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=I.FILE_ID) 8 #HASH LEFT JOIN2: [100, 25000, 796]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=C.FILE_ID) 9 #HASH LEFT JOIN2: [67, 25000, 796]; key_num(1), partition_keys_num(0), ret_null(0), mix(0), KEY(F.ID=A.FILE_ID) 10 #HASH RIGHT SEMI JOIN2: [41, 5000, 796]; n_keys(1) KEY(DMTEMPVIEW_889214194.colname=F.IS_CHUNK) KEY_NULL_EQU(0) 11 #CONST VALUE LIST: [1, 2, 4]; row_num(2), col_num(1) 12 #SLCT2: [27, 81000, 796]; (F.IS_CHUNK = 0 OR (F.IS_CHUNK = 1 AND F.CHUNK_COMPLETED = 1)) SLCT_PUSHDOWN(TRUE) 13 #CSCN2: [27, 100000, 796]; INDEX33556270(FILE_BASE_INFO as F) NEED_SLCT(TRUE); btr_scan(1) 14 #CSCN2: [14, 100000, 192]; INDEX33556272(FILE_ATTRIBUTES as A); btr_scan(1) 15 #CSCN2: [14, 100000, 204]; INDEX33556274(FILE_CHUNK as C); btr_scan(1) 16 #CSCN2: [22, 100000, 576]; INDEX33556276(FILE_INVOICE as I); btr_scan(1) 17 #CSCN2: [17, 100000, 352]; INDEX33556278(FILE_PROPERTIES as P); btr_scan(1)

由于b表的传参条件business_id过滤性很高,并且f表的关联字段id为主键,其他表的关联字段file_id上也存在索引,因此最优的SQL访问路径应该是由b作为驱动表,条件过滤后得到一个小结果集,再通过连接条件inner join file_business_binding b on f.id = b.file_id将连接条件推入来降低f表的结果集,最后f再与其他表进行嵌套循环索引连接,因此这里通过enable_hash_join(0)调整连接方式为嵌套循环,执行计划如下:

1 #NSET2: [4834, 125000, 948] 2 #PRJT2: [4834, 125000, 948]; exp_num(59), is_atom(FALSE) 3 #SORT3: [4834, 125000, 948]; key_num(59), partition_key_num(0), is_distinct(TRUE), top_flag(0), is_adaptive(0) 4 #NEST LOOP INDEX JOIN2: [4791, 125000, 948] 5 #INDEX JOIN LEFT JOIN2: [2697, 125000, 800]; ret_null(0) 6 #INDEX JOIN LEFT JOIN2: [1802, 25000, 800]; ret_null(0) 7 #INDEX JOIN LEFT JOIN2: [1637, 25000, 800]; ret_null(0) 8 #INDEX JOIN LEFT JOIN2: [1474, 25000, 800]; ret_null(0) 9 #SLCT2: [1438, 5000, 800]; F.IS_CHUNK = DMTEMPVIEW_889214198.colname 10 #NEST LOOP INNER JOIN2: [1438, 5000, 800] 11 #CONST VALUE LIST: [1, 2, 4]; row_num(2), col_num(1) 12 #SLCT2: [27, 81000, 796]; (F.IS_CHUNK = 0 OR (F.IS_CHUNK = 1 AND F.CHUNK_COMPLETED = 1)) SLCT_PUSHDOWN(TRUE) 13 #CSCN2: [27, 100000, 796]; INDEX33556270(FILE_BASE_INFO as F) NEED_SLCT(TRUE); btr_scan(1) 14 #BLKUP2: [35, 5, 48]; IDX_A_FILEID(A) 15 #SSEK2: [35, 5, 48]; scan_type(ASC), IDX_A_FILEID(FILE_ATTRIBUTES as A), scan_range[F.ID,F.ID], is_global(0) 16 #BLKUP2: [163, 1, 48]; IDX_C_FILEID(C) 17 #SSEK2: [163, 1, 48]; scan_type(ASC), IDX_C_FILEID(FILE_CHUNK as C), scan_range[F.ID,F.ID], is_global(0) 18 #BLKUP2: [165, 1, 48]; IDX_I_FILEID(I) 19 #SSEK2: [165, 1, 48]; scan_type(ASC), IDX_I_FILEID(FILE_INVOICE as I), scan_range[F.ID,F.ID], is_global(0) 20 #BLKUP2: [895, 5, 48]; IDX_P_FILEID(P) 21 #SSEK2: [895, 5, 48]; scan_type(ASC), IDX_P_FILEID(FILE_PROPERTIES as P), scan_range[F.ID,F.ID], is_global(0) 22 #SSEK2: [558, 1, 148]; scan_type(ASC), IDX_B_BUSINESS(FILE_BUSINESS_BINDING as B), scan_range[('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',F.ID),('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',F.ID)], is_global(0)

上面的执行计划虽然已经是嵌套循环,但循环顺序并不是由b表的小结果集来驱动右侧的大结果集,而是选择了f表作为驱动表,计划中f表需要先全表扫描再一层层进行嵌套,嵌套循环次数就是驱动表行数,整体计划需要循环上百万次效率仍然很差,这里是由于参数OUTER_CVT_INNER_PULL_UP_COND_FLAG默认为11没有开启旋转功能,即当LEFT与INNER JOIN操作符同时存在时,开启连接旋转功能可以让INNER JOIN提前下推。

OUTER_CVT_INNER_PULL_UP_COND_FLAG具体参数释义如下:

动态参数,默认值11,控制外连接转化内连接的优化 1:当外连接转化为内连接时,打开连接条件; 2:转化时探测 OR 布尔表达式是否能优化。即探测 OR 的每个分支中是否隐式包含 IS NOT NULL 的条件,若是则也转为内连接; 4:当过滤条件中包含函数时,禁止外连接转换为内连接; 8:当外连接包含层次查询时,禁止转换为内连接; 16:开启旋转功能。即指 LEFT 与 INNER JOIN(哈希内连接、归并内连接、嵌套内连接和索引内连接)操作符同时存在时,让 INNER JOIN 操作符先做的优化。 支持使用上述有效值的组合值,如 3 表示同时进行 1 和 2 的优化

最终通过组合hint /*+ optimizer_or_nbexp(2) enable_hash_join(0) outer_cvt_inner_pull_up_cond_flag(27) */调整执行计划来进行优化,如下计划也可以看到b表已经作为驱动表来进行嵌套循环,最后SQL整体耗时在10ms左右。

1 #NSET2: [2, 25, 944] 2 #PRJT2: [2, 25, 944]; exp_num(59), is_atom(FALSE) 3 #SORT3: [2, 25, 944]; key_num(59), partition_key_num(0), is_distinct(TRUE), top_flag(0), is_adaptive(0) 4 #INDEX JOIN LEFT JOIN2: [1, 25, 944]; ret_null(0) 5 #INDEX JOIN LEFT JOIN2: [1, 5, 944]; ret_null(0) 6 #INDEX JOIN LEFT JOIN2: [1, 5, 944]; ret_null(0) 7 #INDEX JOIN LEFT JOIN2: [1, 5, 944]; ret_null(0) 8 #HASH RIGHT SEMI JOIN2: [1, 1, 944]; n_keys(1) KEY(DMTEMPVIEW_889214203.colname=F.IS_CHUNK) KEY_NULL_EQU(0) 9 #CONST VALUE LIST: [1, 2, 4]; row_num(2), col_num(1) 10 #SLCT2: [1, 1, 944]; (F.IS_CHUNK = 0 OR (F.IS_CHUNK = 1 AND F.CHUNK_COMPLETED = 1)) 11 #NEST LOOP INDEX JOIN2: [1, 1, 944] 12 #SSEK2: [1, 1, 148]; scan_type(ASC), IDX_B_BUSINESS(FILE_BUSINESS_BINDING as B), scan_range[('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',min),('hrcloud',0,'e180efc7-f502-48f4-a2a5-14419bbed808',max)), is_global(0) 13 #BLKUP2: [1, 1, 48]; INDEX33556271(F) 14 #SSEK2: [1, 1, 48]; scan_type(ASC), INDEX33556271(FILE_BASE_INFO as F), scan_range[B.FILE_ID,B.FILE_ID], is_global(0) 15 #BLKUP2: [1, 5, 48]; IDX_A_FILEID(A) 16 #SSEK2: [1, 5, 48]; scan_type(ASC), IDX_A_FILEID(FILE_ATTRIBUTES as A), scan_range[F.ID,F.ID], is_global(0) 17 #BLKUP2: [1, 1, 48]; IDX_C_FILEID(C) 18 #SSEK2: [1, 1, 48]; scan_type(ASC), IDX_C_FILEID(FILE_CHUNK as C), scan_range[F.ID,F.ID], is_global(0) 19 #BLKUP2: [1, 1, 48]; IDX_I_FILEID(I) 20 #SSEK2: [1, 1, 48]; scan_type(ASC), IDX_I_FILEID(FILE_INVOICE as I), scan_range[F.ID,F.ID], is_global(0) 21 #BLKUP2: [1, 5, 48]; IDX_P_FILEID(P) 22 #SSEK2: [1, 5, 48]; scan_type(ASC), IDX_P_FILEID(FILE_PROPERTIES as P), scan_range[F.ID,F.ID], is_global(0)
评论
后发表回复

作者

文章

阅读量

获赞

扫一扫
联系客服