注册
or关联改写反思
专栏/小周历险记/ 文章详情 /

or关联改写反思

啊小周 2024/07/08 1311 8 0
摘要 -or关联与union all之间有一定的联系,我们在做优化改写时,可以考虑这种方式。

or条件简单解释

or是一种逻辑运算,or条件成立的情况如下:
image.png
or运算相当于只要满足一个条件为真,它就返回真。在运算中与union all计算有相似之处
image.png
只要一个集合里面有数据,那么意味着union all结果集就有数据。
因此在优化中经常用此作为改写替换的方法。

or关联分析

构造例子

drop table t0705;
create table t0705 (id int primary key,c1 varchar2(20),c2 varchar2(20));
insert into t0705 select level,'A'||level,'B'||level from dual connect by level<=500000;
commit;
drop table t07051;
create table t07051 (id int primary key,c1 varchar2(20),c2 varchar2(20));

insert into t07051 select level,'A'||level,'B'||level from dual connect by level<=500000;
commit;
insert into t07051 select level+500000,'A'||level,'B'||level from dual connect by level<=500000;
commit;

drop table t07052;
create table t07052 (id int primary key,c1 varchar2(20),c2 varchar2(20));

insert into t07052 select level,'A'||level,'B'||level from dual connect by level<=10000;
commit;
insert into t07052 select level+10000,'B'||level,'B'||level from dual connect by level<=900000;
commit;

create index IDX_DM_t07051 on t07051(c1);
create index IDX_DM_t07052 on t07052(c1);
create index IDX_DM_T0705_c1 on t0705(c1);
create index IDX_DM_T0705_c2 on t0705(c2);

dbms_stats.gather_table_stats(USER,’T0705’,null,100);
dbms_stats.gather_table_stats(USER,’T07051’,null,100);
dbms_stats.gather_table_stats(USER,’T07052’,null,100);


分析

语句1

select /*+OPTIMIZER_OR_NBEXP(0) */count(*) from (
select distinct t2.id,t3.c1 from t0705 t1,t07051 t2,t07052 t3 where (t1.c1=t2.c1 or t1.c2=t2.c1)
and (t1.c1=t3.c1 or t1.c2=t3.c1) and t2.id=t3.id
) t

计划如下:
image.png
该语句主要是因为t2和t3与t1关联是or关联的,在or条件拆分的情况下这里表扫描次数4次。
下面是把or整体做/+OPTIMIZER_OR_NBEXP(2)/的情况:
image.png
计划的第6行这里是把or整体做,相当于三个表关联后再做or运算,在语句中t2和t3它由关联条件t2.id=t3.id,但是t1和t3的关联条件是or它已经合并起来做,那这里意味着t1和t3没有直接关联的条件了,所以做成笛卡尔积了。
上面无论怎么做,似乎并不高效。这里其实看到or,我们第一反应想到的是union all计算。
我们可以把重复计算的做成temp,计划也会做成HEAP TABLE从而减少表扫描,or条件关联相当于分为四种情况。
改写方案如下:

with temp as
     (select c1,c2 from t0705
     ) 
     , 
     temp2 as 
     (select t1.c1 t1c1,t1.id, 
             t2.c1 t2c1
        from t07051 t1,
             t07052 t2
       where t1.id=t2.id 
     )
select count(*) 
  from
  (select distinct id,t2c1 from (select t2.id,t2c1 from temp t,temp2 t2 where t.c1=t1c1 and t.c2=t2c1
 union ALL 
 select t2.id,t2c1 from temp t,temp2 t2 where t.c2=t1c1 and t.c1=t2c1
 union ALL 
  select t2.id,t2c1 from temp t,temp2 t2 where t.c1=t1c1 and t.c1=t2c1
 union ALL
 select t2.id,t2c1 from temp t,temp2 t2 where t.c2=t1c1 and t.c2=t2c1) AA
 
) t

计划如下:
image.png

语句2

select count(*) from t07051 t1,t07052 t2 where exists (select 1 from t0705 t where (t.c1=t1.c1 or t.c2=t1.c1) and (t.c1=t2.c1 or t.c2=t2.c1))
and t1.id=t2.id;

计划如下:
image.png
与语句1同样的道理

with temp as
     (select c1,c2 from t0705
     ) 
     , 
     temp2 as 
     (select t1.c1 t1c1, 
             t2.c1 t2c1
        from t07051 t1,
             t07052 t2
       where t1.id=t2.id 
     )
select count(*) 
  from temp2 t 
 where exists (select 1 from temp t where t.c1=t1c1 and t.c2=t2c1
 union ALL
 select 1 from temp t where t.c2=t1c1 and t.c1=t2c1
 union ALL
 select 1 from temp t where t.c1=t1c1 and t.c1=t2c1
 union ALL
 select 1 from temp t where t.c2=t1c1 and t.c2=t2c1
 );

计划如下:
image.png

上面改写后的语句基本上提升1倍以上的效率

总结

or关联无论是把or拆分还是整体做,并不高效,在优化中我们一般是把or用union all去做等价替代。这里有几点要注意
1、下面情况不等价

create table t1 (id int primary key,c1 varchar2(20),c2 varchar2(20));
insert into t1 values(1,'AA','BB');
insert into t1 values(2,'CC','AA');

commit;
drop table if exists t2;
create table t2 (id int primary key,c1 varchar2(20),c2 varchar2(20));
insert into t2 values(1,'AA','BB');
insert into t2 values(2,'AB','BA');
insert into t2 values(3,'BB','AA');
commit;
drop table if exists t3;
create table t3 (id int primary key,c1 varchar2(20),c2 varchar2(20));
insert into t3 values(1,'AA','BB');
insert into t3 values(2,'AB','BA');
insert into t3 values(3,'BB','AA');
commit;

select t1.c1 as t1c1,t2.c1 as t2c1,t3.c1 as t3c1 from t1,t2,t3 where t2.id=t3.id and (t1.c1=t2.c1 or t1.c2=t2.c1)  
AA	AA	AA
AA	BB	BB
CC	AA	AA
不等价于
select t1.c1 as t1c1,t2.c1 as t2c1,t3.c1 as t3c1 from (select c1 from t1
union all
select c2  from t1
) t1
,t2,t3 where t2.id=t3.id and t1.c1=t2.c1
AA	AA	AA
BB	BB	BB
AA	AA	AA
这种情况是要显示t1.c1情况的,下面的c1是union all是把t1.c1和t1.c2合并了,并不是实际意义上的c1了,因此这里需要加一列把真正的c1列查询出来
所以改写:
select t1c1 as t1c1,t2.c1 as t2c1,t3.c1 as t3c1 from (select c1,c1 as t1c1 from t1
union all
select c2,c1 as t1c1  from t1
) t1
,t2,t3 where t2.id=t3.id and t1.c1=t2.c1
AA	AA	AA
AA	BB	BB
CC	AA	AA

有时or改写成union all需要注意结果要的是什么,另外就是union all它不做去重,这一点与or有点区别。
2、or运算优先级比and的低,如果要优先计算,记得加()。
3、改写方案最终是不要受到参数的影响。

评论
后发表回复

作者

文章

阅读量

获赞

扫一扫
联系客服