排序（ORDER BY）是经常需要用到的操作，本文通过几个具体的案例来分析这类排序SQL的优化方式

--构建测试表
create table t1 as select * from dba_objects;
create table t2 as select * from dba_objects where 1=2;
insert into t2 select * from t1 sample(50);
commit;
call dbms_stats.gather_table_stats('HR','T1',null,100,true,'for all columns size auto');
call dbms_stats.gather_table_stats('HR','T2',null,100,true,'for all columns size auto');

案例一

先来一个简单案例
已知存在如下索引，为什么Oracle可以走INDEX FULL SCAN将排序优化掉，达梦却还是CSCN+SORT？
create index idx_t1_object_id_asc on t1(object_id);

--Oracle执行计划
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	6txuq9zxxphx8, child number 0
-------------------------------------
select * from t1 where object_id is not null order by object_id desc

Plan hash value: 431440989

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation		    | Name		   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |			   |	  1 |	     |	     |	1583 (100)|	     |	87544 |00:00:00.11 |   13145 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1		   |	  1 |  87544 |	8463K|	1583   (1)| 00:00:01 |	87544 |00:00:00.11 |   13145 |
|*  2 |   INDEX FULL SCAN DESCENDING| IDX_T1_OBJECT_ID_ASC |	  1 |  87544 |	     |	 196   (1)| 00:00:01 |	87544 |00:00:00.05 |	6019 |
----------------------------------------------------------------------------------------------------------------------------------------------

--DM执行计划
select * from t1 where object_id is not null order by object_id desc;

1   #NSET2: [423, 1000000, 591] 
2     #PRJT2: [423, 1000000, 591]; exp_num(16), is_atom(FALSE) 
3       #SORT3: [423, 1000000, 591]; key_num(1), is_distinct(FALSE), top_flag(0), is_adaptive(0)
4         #SLCT2: [235, 1000000, 591]; NOT(T1.OBJECT_ID IS NULL)
5           #CSCN2: [235, 1000000, 591]; INDEX33633875(T1)

针对这个问题需要明确两点
（1）达梦索引里的排序字段，需要按SQL里的排序顺序来创建对应顺序的索引；
（2）优化器需要根据估算BLKUP的行数来判断是否使用TOP_ORDER进行优化；

那么按照上述条件做出以下调整
1、创建object_id desc顺序的索引

create index idx_t1_object_id_desc on t1(object_id desc);

2、开启top_order_opt_flag=1，即当语句内含有TOP+ORDER，且ORDER BY列属于索引前导列时，TOP_ORDER_OPT_FLAG=1可以根据ORDER BY列对应的基表信息，通过最优的索引进行优化，并且通过hint stat调整T1的估算值，降低索引回表计算的代价。

再看执行计划发现就已经可以利用索引来优化排序了

select /*+ stat(t1,5) top_order_opt_flag(1) */ * from t1 where object_id is not null order by object_id desc;

1   #NSET2: [1, 300, 591] 
2     #PRJT2: [1, 300, 591]; exp_num(16), is_atom(FALSE) 
3       #SLCT2: [1, 300, 591]; NOT(T1.OBJECT_ID IS NULL)
4         #BLKUP2: [1, 300, 591]; IDX_T1_OBJECT_ID_DESC(T1)
5           #SSCN: [1, 300, 591]; IDX_T1_OBJECT_ID_DESC(T1)

案例二：排序+分页

--以下排序分页的SQL写法达梦均支持
select * from T1 where status='VALID' order by last_ddl_time limit 10;
或
select top 10 * from T1 where status='VALID' order by last_ddl_time;
或
select * from (select * from t1 where status='VALID' order by last_ddl_time) where rownum<=10;

1   #NSET2: [295, 10->10, 591] 
2     #PRJT2: [295, 10->10, 591]; exp_num(16), is_atom(FALSE) 
3       #SORT3: [295, 10->10, 591]; key_num(1), is_distinct(FALSE), is_adaptive(0), MEM_USED(39753KB), DISK_USED(0KB)
4         #SLCT2: [235, 316387->682281, 591]; T1.STATUS = 'VALID'
5           #CSCN2: [235, 1000000->1000000, 591]; INDEX33633875(T1)

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        8162	    logical reads
        0	    physical reads
        0	    redo size
        2469	    bytes sent to client
        135	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        412	    exec time(ms)

类似这种写法的SQL其实存在两种情况，即先用SSEK过滤之后再排序，还是先按索引顺序输出后再做SLCT，因此也需要对应不同的优化方法。

情况一：先SSEK再SORT

条件过滤性好的情况，我们可以创建条件过滤的索引来降低结果集，而少量的结果即使进行排序代价也会很低，或者创建组合索引(status,last_ddl_time)，直接SSEK+TOPN消除SORT能起到很好的优化效果。

索引1：先过滤不排序，案例中的过滤性并不好，全表数据量1000000按status条件过滤出682281行，逻辑读2047337，需要1.5s，走索引定位的效率是较差的。

create or replace index idx_t1_status on t1(status);

1   #NSET2: [1391, 10->10, 591] 
2     #PRJT2: [1391, 10->10, 591]; exp_num(16), is_atom(FALSE) 
3       #SORT3: [1391, 10->10, 591]; key_num(1), is_distinct(FALSE), is_adaptive(0), MEM_USED(39753KB), DISK_USED(0KB)
4         #BLKUP2: [1263, 682281->682281, 591]; IDX_T1_STATUS(T1)
5           #SSEK2: [1263, 682281->682281, 591]; scan_type(ASC), IDX_T1_STATUS(T1), scan_range['VALID','VALID']

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        2047337	    logical reads
        0	    physical reads
        0	    redo size
        2469	    bytes sent to client
        165	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        1570	    exec time(ms)

索引2：组合索引过滤+排序，可以消除SORT来降低执行时间。

create or replace index idx_t1_status on t1(status,last_ddl_time);

1   #NSET2: [1393, 10->10, 591] 
2     #PRJT2: [1393, 10->10, 591]; exp_num(16), is_atom(FALSE) 
3       #TOPN2: [1393, 10->10, 591]; 
4         #BLKUP2: [1265, 682281->300, 591]; IDX_T1_STATUS(T1)
5           #SSEK2: [1265, 682281->300, 591]; scan_type(ASC), IDX_T1_STATUS(T1), scan_range[('VALID',min),('VALID',max))

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        995	    logical reads
        0	    physical reads
        0	    redo size
        2469	    bytes sent to client
        165	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        4	    exec time(ms)

情况二：先SSCN顺序输出再SLCT

由于索引扫描是"顺序读"，因此当条件过滤性差的时候SSEK的效率可能反而不如CSCN或SSCN，此时检索到的数据量大，还要对结果集进行排序代价就会很高，所以这种情况我们可以利用索引的有序性，创建排序字段的索引，先提前顺序输出，再SLCT过滤。

优化方法：
1、创建排序字段的索引

create or replace index idx_t1_last_ddl_time_desc on t1(last_ddl_time);

2、top_order_opt_flag=1，采用索引排序优化，优化后可见逻辑读和时间都有较大提升。

select /*+ top_order_opt_flag(1)*/ * from T1 where object_type='TABLE' order by last_ddl_time limit 10;

1   #NSET2: [1, 10->10, 591] 
2     #PRJT2: [1, 10->10, 591]; exp_num(16), is_atom(FALSE) 
3       #TOPN2: [1, 10->10, 591]; 
4         #SLCT2: [1, 300->29, 591]; T1.OBJECT_TYPE = 'TABLE'
5           #BLKUP2: [1, 300->300, 591]; IDX_T1_LAST_DDL_TIME_DESC(T1)
6             #SSCN: [1, 300->300, 591]; IDX_T1_LAST_DDL_TIME_DESC(T1)

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        902	    logical reads
        0	    physical reads
        0	    redo size
        2447	    bytes sent to client
        168	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        2	    exec time(ms)

案例三：多表关联排序——嵌套循环

--优化前：
select 
     t1.owner,
     t1.object_name,
     t1.object_id,
     t1.object_type,
     t1.created,
     t1.status
from t1
inner join t2 on t1.object_id = t2.object_id
where t1.object_name='T2'
  and t1.status = 'VALID'
order by t1.last_ddl_time desc;

1   #NSET2: [256, 1, 278] 
2     #PRJT2: [256, 1, 278]; exp_num(6), is_atom(FALSE) 
3       #SORT3: [256, 1, 278]; key_num(1), is_distinct(FALSE), top_flag(0), is_adaptive(0)
4         #HASH2 INNER JOIN: [255, 1, 278];  KEY_NUM(1); KEY(T1.OBJECT_ID=T2.OBJECT_ID) KEY_NULL_EQU(0)
5           #SLCT2: [165, 1, 248]; (T1.OBJECT_NAME = 'T1' AND T1.STATUS = 'VALID')
6             #CSCN2: [165, 1000000, 248]; INDEX33633875(T1)
7           #CSCN2: [55, 499994, 30]; INDEX33633869(T2)

先看当前的执行计划，由T1先进行过滤，过滤后的结果与T2进行HASH关联，最后通过t1.last_ddl_time进行结果集排序。

那么这种情况要想优化排序，就需要如下两个索引
索引1：首先需要判断T1的谓词条件的整体过滤性，如果过滤性好的话，就可以先通过索引扫描定位数据，并且将排序字段加入组合索引列，使T1结果集作为驱动表，只要保证驱动表结果有序，就能使整体结果有序，从而优化SORT；

create index idx_t1_01 on t1(object_name,status,last_ddl_time desc);

索引2：T2作为被驱动表，被驱动表的关联字段上需要存在索引，才能让这条SQL形成索引嵌套循环的执行计划；

create index idx_t2_01 on t2(object_id);

建完索引后的执行计划，可以看到已经没有SORT操作符

1   #NSET2: [1, 1, 278] 
2     #PRJT2: [1, 1, 278]; exp_num(6), is_atom(FALSE) 
3       #NEST LOOP INDEX JOIN2: [1, 1, 278] 
4         #BLKUP2: [1, 1, 248]; IDX_T1_01(T1)
5           #SSEK2: [1, 1, 248]; scan_type(ASC), IDX_T1_01(T1), scan_range[('T1','VALID',min),('T1','VALID',max))
6         #SSEK2: [1, 1, 30]; scan_type(ASC), IDX_T2_01(T2), scan_range[T1.OBJECT_ID,T1.OBJECT_ID]

案例四：多表关联排序——HASH关联

--优化前：
select
       t1.owner,
       t1.object_name,
       t1.object_id,
       t1.object_type,
       t1.status,
       t1.created,
       t1.last_ddl_time
  from t1
 where t1.object_id in (select t2.object_id
                          from t2
                         where t2.object_type in ('TABLE','INDEX')
                       )
 order by t1.last_ddl_time desc limit 10;

1   #NSET2: [378, 10->10, 260] 
2     #PRJT2: [378, 10->10, 260]; exp_num(8), is_atom(FALSE) 
3       #SORT3: [378, 10->10, 260]; key_num(1), is_distinct(FALSE), is_adaptive(0), MEM_USED(37017KB), DISK_USED(0KB)
4         #HASH RIGHT SEMI JOIN2: [368, 100482->100848, 260]; key_num(1), MEM_USED(23214KB), DISK_USED(0KB) KEY(T2.OBJECT_ID=T1.OBJECT_ID) KEY_NULL_EQU(0)
5           #HASH2 INNER JOIN: [97, 99986->99986, 126];  KEY_NUM(1), MEM_USED(224KB), DISK_USED(0KB) KEY(DMTEMPVIEW_889195273.colname=T2.OBJECT_TYPE) KEY_NULL_EQU(0)
6             #CONST VALUE LIST: [1, 2->2, 48]; row_num(2), col_num(1), 
7             #CSCN2: [60, 499994->499994, 78]; INDEX33633869(T2)
8           #CSCN2: [158, 1000000->1000000, 260]; INDEX33633875(T1)

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        4968	    logical reads
        0	    physical reads
        0	    redo size
        1357	    bytes sent to client
        438	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        417	    exec time(ms)

这个案例中，是以T2为条件过滤，将结果集传递给T1，最终按T1的last_ddl_time desc来做分页查询，显然整体结果集是比较多的，那这种情况是否能处理排序？

其实HASH连接时SORT也是可以消除的，要求是右表有序，因为HASH连接的过程是将左表数据生成HASH表，右表取值往HASH表上探测匹配，最后输出结果，这种情况下，输出结果的顺序就是HASH连接中右表的顺序。
那么按照这个思路，我们希望的执行计划就是：T2(左CSCN) + T1(右SSCN)->HASH JOIN->TOPN，这样输出的结果也可以消除排序。

优化方法：
1、创建T1 LAST_DDL_TIME的倒序索引

create or replace index idx_t1_last_ddl_time_desc on t1(last_ddl_time desc);

2、通过Hint NO_SEMI_GEN_CROSS不倾向半连接转换为等价的内连接，和HI_RIGHT_ORDER_FLAG=1使HASH右孩子顺序输出。

--优化后：
select /*+ NO_SEMI_GEN_CROSS HI_RIGHT_ORDER_FLAG(1) */
       t1.owner,
       t1.object_name,
       t1.object_id,
       t1.object_type,
       t1.status,
       t1.created,
       t1.last_ddl_time
  from t1
 where t1.object_id in (select t2.object_id
                          from t2
                         where t2.object_type in ('TABLE','INDEX')
                       )
 order by t1.last_ddl_time desc limit 10;

1   #NSET2: [391, 10->10, 260] 
2     #PRJT2: [391, 10->10, 260]; exp_num(8), is_atom(FALSE) 
3       #TOPN2: [391, 10->10, 260]; 
4         #HASH RIGHT SEMI JOIN2: [388, 25072->34, 260]; key_num(1), MEM_USED(92436KB), DISK_USED(0KB); join condition(T2.OBJECT_TYPE IN LIST)  KEY(T2.OBJECT_ID=T1.OBJECT_ID) KEY_NULL_EQU(0)
5           #CSCN2: [60, 499994->499994, 78]; INDEX33633869(T2)
6           #BLKUP2: [158, 1000000->300, 260]; IDX_T1_LAST_DDL_TIME_DESC(T1)
7             #SSCN: [158, 1000000->300, 260]; IDX_T1_LAST_DDL_TIME_DESC(T1)

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        2531	    logical reads
        0	    physical reads
        0	    redo size
        1357	    bytes sent to client
        486	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        147	    exec time(ms)

通过对比优化后的执行计划，可以看出优化后原本第三步的SORT变为TOPN，逻辑读和时间也都有所降低。

案例五：MERGE合并

--优化前：
select obj_owner,
       obj_id,
       obj_name,
       obj_type,
       ddl_time
    from (select t1.owner obj_owner,
                 t1.object_id obj_id,
                 t1.object_name obj_name,
                 t1.object_type obj_type,
                 t1.last_ddl_time ddl_time
            from t1
       left join t2 on t1.object_id = t2.object_id
           where t2.object_type = 'TABLE'
             and t2.status='VALID'
       union all
          select t1.owner obj_owner,
                 t1.object_id obj_id,
                 t1.object_name obj_name,
                 t1.object_type obj_type,
                 t1.last_ddl_time ddl_time
            from t1
       left join t2 on t1.object_id = t2.object_id
           where t2.object_type = 'INDEX'
             and t2.status='VALID' ) t
order by t.ddl_time
limit 10;

1   #NSET2: [616, 10->10, 313] 
2     #PRJT2: [616, 10->10, 313]; exp_num(5), is_atom(FALSE) 
3       #PRJT2: [616, 10->10, 313]; exp_num(5), is_atom(FALSE) 
4         #SORT3: [616, 10->10, 313]; key_num(1), is_distinct(FALSE), is_adaptive(0), MEM_USED(33345KB), DISK_USED(0KB)
5           #UNION ALL: [613, 31744->32025, 313]
6             #PRJT2: [303, 15934->16088, 313]; exp_num(5), is_atom(FALSE) 
7               #HASH2 INNER JOIN: [303, 15934->16088, 313];  KEY_NUM(1), MEM_USED(15022KB), DISK_USED(0KB) KEY(T2.OBJECT_ID=T1.OBJECT_ID) KEY_NULL_EQU(0)
8                 #SLCT2: [70, 15855->15922, 126]; (T2.OBJECT_TYPE = 'TABLE' AND T2.STATUS = 'VALID')
9                   #CSCN2: [70, 499994->499994, 126]; INDEX33633869(T2)
10                #CSCN2: [143, 1000000->1000000, 187]; INDEX33633875(T1)
11            #PRJT2: [303, 15809->15937, 313]; exp_num(5), is_atom(FALSE) 
12              #HASH2 INNER JOIN: [303, 15809->15937, 313];  KEY_NUM(1), MEM_USED(15022KB), DISK_USED(0KB) KEY(T2.OBJECT_ID=T1.OBJECT_ID) KEY_NULL_EQU(0)
13                #SLCT2: [70, 15731->15798, 126]; (T2.OBJECT_TYPE = 'INDEX' AND T2.STATUS = 'VALID')
14                  #CSCN2: [70, 499994->499994, 126]; INDEX33633869(T2)
15                #CSCN2: [143, 1000000->1000000, 187]; INDEX33633875(T1)

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        9873	    logical reads
        0	    physical reads
        0	    redo size
        937	    bytes sent to client
        898	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        1269	    exec time(ms)

通过上面的执行计划，可以看出这条SQL是想分别以两个T2 union分支的条件过滤，实际场景中过滤性也是较差的，之后关联T1，最后按整体时间进行排序，这里我们就可以结合案例二和案例三，来利用索引的特性进行优化。

优化方法：
1、分别创建T1以排序字段为前导列的索引和T2关联字段为前导列的索引

create or replace index idx_t1_01 on t1(last_ddl_time,owner,object_id,object_name,object_type);
create or replace index idx_t2_01 on t2(object_id,object_type,status);

2、Hint top_order_opt_flag=1，优先采用索引优化

--优化后：
select /*+ top_order_opt_flag(1) */
       obj_owner,
       obj_id,
       obj_name,
       obj_type,
       ddl_time
    from (select t1.owner obj_owner,
                 t1.object_id obj_id,
                 t1.object_name obj_name,
                 t1.object_type obj_type,
                 t1.last_ddl_time ddl_time
            from t1
       left join t2 on t1.object_id = t2.object_id
           where t2.object_type = 'TABLE'
             and t2.status='VALID'
       union all
          select t1.owner obj_owner,
                 t1.object_id obj_id,
                 t1.object_name obj_name,
                 t1.object_type obj_type,
                 t1.last_ddl_time ddl_time
            from t1
       left join t2 on t1.object_id = t2.object_id
           where t2.object_type = 'INDEX'
             and t2.status='VALID' ) t
order by t.ddl_time
limit 10;

1   #NSET2: [5, 10->10, 313] 
2     #PRJT2: [5, 10->10, 313]; exp_num(5), is_atom(FALSE) 
3       #PRJT2: [5, 10->10, 313]; exp_num(5), is_atom(FALSE) 
4         #TOPN2: [5, 10->10, 313]; 
5           #UNION ALL(MERGE): [5, 600->300, 313] merge_type(A)
6             #PRJT2: [2, 300->152, 313]; exp_num(5), is_atom(FALSE) 
7               #NEST LOOP INDEX JOIN2: [2, 300->152, 313] 
8                 #SSCN: [1, 300->9900, 187]; IDX_T1_01(T1)
9                 #SSEK2: [1, 1->152, 126]; scan_type(ASC), IDX_T2_01(T2), scan_range[(T1.OBJECT_ID,'TABLE','VALID'),(T1.OBJECT_ID,'TABLE','VALID')]
10            #PRJT2: [2, 300->149, 313]; exp_num(5), is_atom(FALSE) 
11              #NEST LOOP INDEX JOIN2: [2, 300->149, 313] 
12                #SSCN: [1, 300->9600, 187]; IDX_T1_01(T1)
13                #SSEK2: [1, 1->149, 126]; scan_type(ASC), IDX_T2_01(T2), scan_range[(T1.OBJECT_ID,'INDEX','VALID'),(T1.OBJECT_ID,'INDEX','VALID')]

Statistics
-----------------------------------------------------------------
        0	    data pages changed
        0	    undo pages changed
        38740	    logical reads
        0	    physical reads
        0	    redo size
        931	    bytes sent to client
        934	    bytes received from client
        1	    roundtrips to/from client
        0	    sorts (memory)
        0	    sorts (disk)
        10	    rows processed
        0	    io wait time(ms)
        150	    exec time(ms)

查看执行计划可见T1与T2先是嵌套循环，两个UNION分支通过共同的索引进行UNION ALL(MERGE)消除了上层的SORT，以驱动表T1(SSCN)顺序输出，提前返回前N条LIMIT记录，需要注意的一点是如果不是UNION ALL而是UNION或外层存在DISTINCT，则SORT是消除不掉的，因为没法保证去重后的结果有序。

总结：

1、通过索引优化排序时，需要按照排序顺序来创建对应顺序的索引；
2、TOP_ORDER_OPT_FLAG老版本默认为0不开启优化，新版本中将会默认开启；
3、排序+分页时，需要根据条件的过滤性来判断先索引定位数据后排序，还是先顺序输出后过滤；
4、嵌套循环需要保证驱动表有序，HASH连接需要保证右表的顺序输出；

SQL优化案例——排序优化

案例一