1. 概述

1.1. 背景

find_in_set是MySQL特有的函数，用于查找一个值是否存在于一个逗号分隔的字符串列表中。其效率主要取决于字符串列表的长度和查询的复杂度。对于较短的字符串列表，find_in_set函数的性能可能是可以接受的。然而，当字符串列表很长或查询非常频繁时，该函数的性能可能会受到影响，因为它需要对整个字符串列表进行遍历比较。
某单位某某系统从MySQL移植到DM8，其中有一个作业每个小时运行1次，作业过程多次调用find_in_set函数。监控显示该作业执行缓慢且运行期间数据库内存使用率大幅上涨，经分析为该自定义函数导致。本文基于该案例进行模拟和验证，并提出了可行的优化解决方案。

1.2. FIND_IN_SET函数优化

为减少适配兼容性问题，DM8于22年10月以后的版本支持了该函数。DM8新版本内置的FIND_IN_SET在底层进行了优化，其性能比自定义函数大幅提升。本案例用户DM8版本是21年的，还没有支持该函数。
使用集合类型等价改写取代FIND_IN_SET，实验数据显示性能有巨幅提升。下图展示和总结了对FIND_IN_SET的2种等价优化方法的实验数据对比效果：
find_in_set优化路径对比图.png

2. 测试验证

2.1. 实验数据验证

2.1.1. 表DDL

实验基于业务代码进行了大幅简化模拟，以下是表相关DDL，代码创建了2张表t_app2/t_app2_policy，分别有1000/250万记录，t_app2每个app_id对应有10个记录：

-- 1. 表 
create table t_app2 (app_id int not null, appname varchar(55), policy_id DECIMAL(22,0), appver varchar(10), created datetime, is_deleted smallint);
create table t_app2_policy (user_id int not null, apppolicyname varchar(355), dept_id int not null, app_id int, policy_id DECIMAL(22,0), created datetime);

-- 2. 数据
-- 1000
insert into t_app2
select level%100, 'APP'||level%100, dbms_random.value(1000000000000000000,2000000000000000000), 'v'||level, now()-dbms_random.value(1,90), decode(level%50,0,1,0)
from dual connect by level <= 1000;
commit;

-- 2500000
insert into t_app2_policy
select level%2000, repeat('P',125)||level%50, 10000+level%50, level%100, dbms_random.value(1000000000000000000,2000000000000000000), now()-dbms_random.value(1,90)
from dual connect by level <= 2500000;
commit;

-- 3. 索引
create index idx_t_app_policy_policy_id on t_app2_policy(policy_id) parallel 8;
create index idx_t_app_app_id on t_app2(app_id) ;

2.1.2. 自定义find_in_set

以下代码是自定义的find_in_set函数：

-- 4. find_in_set 函数
create function find_in_set(oldStr clob, childstr varchar2) return number is
        str          clob;
        currentIndex pls_integer;
        startIndex   pls_integer;
        endIndex     pls_integer;
        vn_sign varchar2(6) := ',';

        type str_type is table of varchar2(64) index by binary_integer;
        arr    str_type;
        Result number := 0;
    begin
        -- 空字符串
        if oldStr is null or oldStr = '0' then
            return(0);
        end if;

        str := oldStr||vn_sign;

        currentIndex := 0;
        startIndex   := 0;

        loop
            currentIndex := currentIndex + 1;
            endIndex     := instr(str, vn_sign, 1, currentIndex);
            if (endIndex <= 0) then
                exit;
            end if;

            arr(currentIndex) := trim(substr(str, startIndex + 1, endIndex - startIndex - 1));
            startIndex := endIndex;
        end loop;

        --取最后一个字符串:
        arr(currentIndex) := substr(str, startIndex + 1, length(str));

        --去掉重复出现的字符串:
        for i in 1 .. currentIndex - 1 loop
            if arr(i) = childstr then
                result := 1;
            end if;
        end loop;

        return(Result);
    end find_in_set;

2.1.3. 2个对比过程DDL

以下代码定义了2个过程，其中p_get_app_policy3是对业务代码的简化，代码简要的逻辑可以概况为：从表t_app2依据app_id获取对应的policy_id列拼接成字符串，然后利用这个长字符串对其它N个表（这里简化后仅保留1个）结合函数find_in_set判断policy_id是否存在于字符串中并插入到临时表（简化后的代码直接改成了查询）；其中p_get_app_policy3_turn2利用嵌套表集合类型实现了同样的逻辑：

-- 5. 过程
-- 简化模拟
create or replace procedure p_get_app_policy3 (v_app_id in number, v_errcode out number, v_errmsg out varchar2)
as
    l_apppolicyids clob;
begin
    v_errcode := 0;
    v_errmsg := '成功';
    if v_app_id is null then
        return;
    end if;
    
    select wm_concat(policy_id) into l_apppolicyids from t_app2 where app_id = v_app_id and is_deleted = 0;
    if l_apppolicyids is null or l_apppolicyids = '' then
        return;
		dbms_output.put_line('Nothing to do.');
    end if;
    
    select distinct a.user_id from t_app2_policy a where sysdba.find_in_set(l_apppolicyids, a.policy_id) > 0;
    
    exception
        when others then
        v_errcode := '-1';
        v_errmsg := 'dddd';
		dbms_output.put_line('errcode='||v_errcode||' errmsg='||v_errmsg);
end;
/

-- 优化后的过程-使用嵌套表
create or replace procedure p_get_app_policy3_turn2 (v_app_id in number, v_errcode out number, v_errmsg out varchar2)
as
    cursor cs for select policy_id from t_app2 where app_id = v_app_id and is_deleted = 0 and policy_id is not null;
    type t_policy_ids is table of t_app2.POLICY_ID%TYPE;
	l_t_policy_ids t_policy_ids;
begin
    v_errcode := 0;
    v_errmsg := '成功';
    if v_app_id is null then
        return;
		dbms_output.put_line('Nothing to do.');
    end if;
    
	open cs;
    fetch cs bulk collect into l_t_policy_ids;
    close cs;
    if l_t_policy_ids.count == 0 then
        return;
    end if;
    
    select distinct a.user_id 
	from t_app2_policy a
	where a.policy_id in (select * from table(l_t_policy_ids));
    
    exception
        when others then
        v_errcode := '-1';
        v_errmsg := 'dddd';
		dbms_output.put_line('errcode='||v_errcode||' errmsg='||v_errmsg);
end;
/

2.1.4. 对比测试

以下代码在disql工具调用2个过程传入相同的值18，并查询执行统计信息用于比对：

-- 6. 测试
set time on;
set feed on;
set serveroutput on

alter system set 'ENABLE_MONITOR'=1;
alter session set 'MONITOR_SQL_EXEC'=1;

call p_get_app_policy3(18, null, null);

et(?); -- 以上过程的执行号
select sql_id, sql_txt, exec_time, tab_scan_cnt, logic_read_cnt, phy_read_cnt, io_wait_time, max_mem_used from v$sql_stat_history where sql_txt like 'call p_get_app_policy3(18%';

call p_get_app_policy3_turn2(18, null, null);
et(?);-- 以上过程的执行号
select sql_id, sql_txt, exec_time, tab_scan_cnt, logic_read_cnt, phy_read_cnt, io_wait_time, max_mem_used from v$sql_stat_history where sql_txt like 'call p_get_app_policy3_turn2(18%';

2.1.5. 执行结果-未优化

过程p_get_app_policy3的执行统计信息如下：

-- 7. 结果 p_get_app_policy3
行号     OP     TIME(US)             PERCENT RANK                 SEQ         N_ENTER     MEM_USED(KB)         DISK_USED(KB)        HASH_USED_CELLS
---------- ------ -------------------- ------- -------------------- ----------- ----------- -------------------- -------------------- --------------------
           HASH_CONFLICT        DHASH3_USED_CELLS DHASH3_CONFLICT HASH_SAME_VALUE
           -------------------- ----------------- --------------- --------------------
1          PRJT2  2                    0%      13                   2           2           0                    0                    0
           0                    NULL              NULL            0

2          PRJT2  2                    0%      13                   2           4           0                    0                    0
           0                    NULL              NULL            0

3          DLCK   3                    0%      12                   0           2           0                    0                    0
           0                    NULL              NULL            0

4          DIST   4                    0%      11                   3           2           0                    0                    0
           0                    NULL              NULL            0

5          SLTIN2 6                    0%      10                   1           3           0                    0                    0
           0                    NULL              NULL            0

6          DLCK   8                    0%      9                    0           2           0                    0                    0
           0                    NULL              NULL            0

7          SLCT2  14                   0%      8                    5           4           0                    0                    0
           0                    NULL              NULL            0

8          NSET2  16                   0%      7                    1           2           0                    0                    0
           0                    NULL              NULL            0

9          AAGR2  28                   0%      6                    3           4           0                    0                    0
           0                    NULL              NULL            0

10         PRJT2  43                   0%      5                    4           4           0                    0                    0
           0                    NULL              NULL            0

11         SSEK2  53                   0%      4                    7           2           0                    0                    0
           0                    NULL              NULL            0

12         BLKUP2 72                   0%      3                    6           4           0                    0                    0
           0                    NULL              NULL            0

13         CSCN2  336419               0.2%    2                    5           8335        0                    0                    0
           0                    NULL              NULL            0

14         SLCT2  164971127            99.8%   1                    4           8336        0                    0                    0
           0                    NULL              NULL            0


14 rows got


行号     SQL_ID      SQL_TXT                                           EXEC_TIME            TAB_SCAN_CNT         LOGIC_READ_CNT       PHY_READ_CNT
---------- ----------- ------------------------------------------------- -------------------- -------------------- -------------------- --------------------
           IO_WAIT_TIME         MAX_MEM_USED
           -------------------- --------------------
1          1907        call p_get_app_policy3(18, null, null); 165425               1                    14148                0
           0                    188246208


1 rows got

以上显示，未经优化的过程p_get_app_policy3总耗时165425毫秒，其中99.8%的耗时在执行计划的SLCT2步骤（即find_in_set作为谓词筛选），1次表全扫描，逻辑读14148次，物理读0次，IO等待0毫秒，内存消耗188246208KB。

2.1.6. 执行结果-优化

过程p_get_app_policy3_turn2的执行统计信息如下：

-- 7. 结果 p_get_app_policy3_turn2
行号     OP     TIME(US)             PERCENT RANK                 SEQ         N_ENTER     MEM_USED(KB)         DISK_USED(KB)        HASH_USED_CELLS
---------- ------ -------------------- ------- -------------------- ----------- ----------- -------------------- -------------------- --------------------
           HASH_CONFLICT        DHASH3_USED_CELLS DHASH3_CONFLICT HASH_SAME_VALUE
           -------------------- ----------------- --------------- --------------------
1          DIST   1                    0.33%   14                   3           2           0                    0                    0
           0                    NULL              NULL            0

2          PRJT2  2                    0.66%   12                   2           4           0                    0                    0
           0                    NULL              NULL            0

3          IJI2   2                    0.66%   12                   4           13          0                    0                    0
           0                    NULL              NULL            0

4          DLCK   4                    1.33%   11                   0           2           0                    0                    0
           0                    NULL              NULL            0

5          NSET2  5                    1.66%   9                    1           2           0                    0                    0
           0                    NULL              NULL            0

6          ASCN   5                    1.66%   9                    7           2           0                    0                    0
           0                    NULL              NULL            0

7          BLKUP2 6                    1.99%   8                    8           20          0                    0                    0
           0                    NULL              NULL            0

8          DLCK   8                    2.66%   7                    0           2           0                    0                    0
           0                    NULL              NULL            0

9          SLCT2  11                   3.65%   6                    3           4           0                    0                    0
           0                    NULL              NULL            0

10         NSET2  21                   6.98%   5                    1           3           0                    0                    0
           0                    NULL              NULL            0

11         DIST   22                   7.31%   4                    5           4           7                    0                    10
           0                    NULL              NULL            0

12         SSEK2  46                   15.28%  3                    5           2           0                    0                    0
           0                    NULL              NULL            0

13         BLKUP2 60                   19.93%  2                    4           4           0                    0                    0
           0                    NULL              NULL            0

14         SSEK2  108                  35.88%  1                    9           10          0                    0                    0
           0                    NULL              NULL            0


14 rows got


行号     SQL_ID      SQL_TXT                                       EXEC_TIME            TAB_SCAN_CNT         LOGIC_READ_CNT       PHY_READ_CNT
---------- ----------- --------------------------------------------- -------------------- -------------------- -------------------- --------------------
           IO_WAIT_TIME         MAX_MEM_USED
           -------------------- --------------------
1          302         call p_get_app_policy3_turn2(18, null, null); 2                    0                    124                  0
           0                    483264


1 rows got

以上显示，优化后的过程p_get_app_policy3总耗时2毫秒，其中35.88%的耗时在执行计划的SSEK2步骤（即二级索引回表），0次表全扫描，逻辑读124次，物理读0次，IO等待0毫秒，累计内存消耗483264KB。

2.1.7. 效果比对

整理以上执行结果如下，注意这里补充了优化方式1（p_get_app_policy3过程引用的find_in_set改为DM8内置函数，方式2即以上改写的代码过程）：
对比图1.png

以上表格显示，使用新版本数据库自带FIND_IN_SET函数性能提升约120倍，使用嵌套表集合类型等价改写优化性能提升约8万倍，2种优化不仅性能大幅提升，内存消耗也大幅减少。但因为模拟的2个表没有数据交集，所以真实的性能提升效果以下文的业务数据验证为准。

2.2. 业务数据验证

业务数据导到测试环境的简化过程对比测试结果如下，因为用户环境版本没有支持系统定义的find_in_set，所以只有2个对比项：
对比图2.png

基于业务数据，优化后的过程性能提升超过200倍，耗时仅303毫秒。

3. 总结

对自定义函数常见的优化方法是改写为确定性函数，改写后数据库可以在批量执行中复用函数调用值以提升性能。但改写为确定性函数有一定的前提，引用了表数据或者有调用系统时间之类的不确定性函数的不支持。本案例的函数可以改为确定性函数，但为缩减篇幅没有提供相应测试结果，实测结果没有提升。
除了本文描述的优化方法以外，还有其它非等价优化方法取代FIND_IN_SET也能实现类似功能且能大幅提升性能，但前提是对数据的规范性有要求。譬如，业务保证处理的数据中绝对不会含有分隔符如“,”或者保证数据长度固定，则可以直接使用INSTR函数替代，如INSTR(?||’,’,A.COLNAME||’,’)>0。
升级数据库版本到更新的可以使用系统内置的find_in_set函数，性能有大幅提升，但是该方法存在2大阻力：

版本升级的人力成本大，需要对新版本做好全面的功能和性能测试，还需要升级变更（数据量大、版本跨度大的数据库版本升级耗时高）；
即便版本升级了，该场景下以函数作为单一的谓词筛选数据依然存在无法利用二级索引的性能问题，本例表总量仅250万，如果数据量达到上亿级，全扫描的代价将无法接受。

使用嵌套表集合类型等价改写取代find_in_set函数可以实现大幅的性能提升。虽然对业务过程代码造成了入侵，但这样改造带来的收益是值得的，何况改写以后的代码符合SQL92/SQL05标准，代码可移植性得到加强。另外，该方法还能轻松应对FIND_IN_SET函数的复杂版本（指定匹配个数或者分隔符），对数据提前做好聚合写入嵌套表集合即可，基于这一点又能一定程度上实现优化改进（避免在长字符串中多次查找）。

彻底优化FIND_IN_SET函数‌