注册
【与达梦同行】tpc-h基准测试在达梦数据库dm8的进行
技术分享/ 文章详情 /

【与达梦同行】tpc-h基准测试在达梦数据库dm8的进行

亨利 2022/12/13 2307 3 1

导读

读者可以从本文有以下收获

  • 什么是达梦的列式表
  • dm8如何批量导入数据
  • 进行tpc-h表注意什么
  • tpc-h的测试过程
  • 达梦dm8支持复杂SQL语句查询。

tpc-h测试是什么

**TPC-H是一套针对数据库决策支持能力的测试基准。**通过模拟数据库中与业务相关的复杂查询和并行的数据修改操作考察数据库的综合处理能力,获取数据库操作的响应时间和每小时执行的查询数指标(QphH@Size)。
TPC-H基准模型涉及22条复杂的select查询流语句和2条带有insert和delete程序段的更新流语句。测试分为Power测试和Throughout测试两种类型,Power测试是随机执行22条查询流中的一条测试流和2条更新流中的一条测试流,考核指标为QppH@Size;Throughout测试执行的是多条查询流和一条更新流对数据库的混合操作,考核指标是QthH@Size,Power测试和Throughout测试通过数理方式合成的结果为TPC-H基准测试中最关键的一项指标:每小时数据库查询数(QphH@Size),是QppH@Size和QthH@Size结果的乘积的1/2次方。

创建tpc-h 相关的8个表

官方文档

https://blog.csdn.net/weixin_44472170/article/details/119844920

Huge File System(检查HFS)是达梦数据库实现的,针对海量数据进行分析的一种高效、简单的列存储机制。列存储表(也称为HUGE表)就是建立在HFS存储机制上的一种表。

HUGE表是建立在自己特有的表空间HTS(HUGE TABLESPACE,即HUGE表空间)上的。最多可创建32767个HUGE表空间,其相关信息存储在动态视图V$HUGE_TABLESPACE中。

这个表空间与普通的表空间不同。普通的表空间,数据是通过段、簇、页来管理的,并且以固定大小(4K、8K、16K、32K)的页面为管理单位;而HUGE表空间是通过HFS存储机制来管理的,它相当于一个文件系统。创建一个HTS,其实就是创建一个空的文件目录(系统中有一个默认HTS,目录名为HMAIN)。在创建一个HUGE表并插入数据时,数据库会在指定的HTS表空间目录下创建一系列的目录及文件。

笔者认为tpc-h的测试更适合建立huge表,从真实的业务场景出发,未来的8个需要不停的增加数据,所以建表采用without DELTA。

建表


SQL> CREATE HUGE TABLESPACE TPCH PATH '/dm8/data/DAMENG/tpch';
executed successfully
used time: 87.388(ms). Execute id is 2134302.



create huge table CUSTOMER        
(        
C_CUSTKEY                 int not null,                
C_NAME                    varchar(25) not null,    
C_ADDRESS                 varchar(40) not null,     
C_NATIONKEY               int not null,              
C_PHONE                   char(15) not null,        
C_ACCTBAL                 float not null,    
C_MKTSEGMENT             char(10) not null,        
C_COMMENT                 varchar(117) not null,    
primary key (C_CUSTKEY)
)STORAGE(WITH DELTA) tablespace TPCH;
       
        
create huge table LINEITEM        
(        
L_ORDERKEY               int not null,       
L_PARTKEY                int not null,     
L_SUPPKEY                int not null,    
L_LINENUMBER             int not null,    
L_QUANTITY               float not null,    
L_EXTENDEDPRICE          float not null,    
L_DISCOUNT               float not null,    
L_TAX                    float not null,    
L_RETURNFLAG             char(1) not null,    
L_LINESTATUS             char(1) not null,     
L_SHIPDATE               date not null,    
L_COMMITDATE             date not null,    
L_RECEIPTDATE            date not null,    
L_SHIPINSTRUCT           char(25) not null,    
L_SHIPMODE               char(10) not null,     
L_COMMENT                varchar(44) not null,    
primary key(L_ORDERKEY , L_LINENUMBER)        
)STORAGE(WITH DELTA) tablespace TPCH;        
        
create huge table NATION        
(        
N_NATIONKEY              int not null,             
N_NAME                   char(25) not null,        
N_REGIONKEY              int not null,             
N_COMMENT                varchar(152) not null,    
primary key (N_NATIONKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;        
        
create huge table ORDERS        
(        
O_ORDERKEY               int not null,     
O_CUSTKEY                int not null,    
O_ORDERSTATUS            char(1) not null,     
O_TOTALPRICE             float not null,    
O_ORDERDATE              date not null,    
O_ORDERPRIORITY          char(15) not null,    
O_CLERK                  char(15) not null,    
O_SHIPPRIORITY           integer not null,    
O_COMMENT                varchar(79) not null,    
primary key(O_ORDERKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;       
        
create huge table part        
(        
P_PARTKEY        int not null,
P_NAME                    varchar(55) not null,    
P_MFGR                    char(25) not null,    
P_BRAND                   char(10) not null,    
P_TYPE                    varchar(25) not null,    
P_SIZE                    int not null,    
P_CONTAINER               char(10) not null,    
P_RETAILPRICE             float not null,    
P_COMMENT                 varchar(23) not null,    
primary key (P_PARTKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;       
        
create huge table PARTSUPP        
(        
PS_PARTKEY                int not null,                    
PS_SUPPKEY                int not null,                    
PS_AVAILQTY               int not null,                    
PS_SUPPLYCOST             float not null,    
PS_COMMENT                varchar(199) not null,    
 primary key (PS_PARTKEY , PS_SUPPKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;        
        
create huge table REGION        
(        
R_REGIONKEY              int not null,               
R_NAME                   char(25) not null,          
R_COMMENT                varchar(152) not null,     
primary key (R_REGIONKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;        
        
create huge table SUPPLIER        
(        
S_SUPPKEY                 int not null,                   
S_NAME                    char(25) not null,             
S_ADDRESS                 varchar(40) not null,          
S_NATIONKEY               int not null,                   
S_PHONE                   char(15) not null,             
S_ACCTBAL                 float not null,    
S_COMMENT                 varchar(101) not null,        
primary key (S_SUPPKEY)        
)STORAGE(WITH DELTA) tablespace TPCH;  


定入控制配置文件

dm8批量导入数据的方法首先需要定义配置文件,然后再运行导入命令并指定配置文件。

[root@dm8 data]# cat customer.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 1500000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/customer.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_customer.bad'
APPEND
INTO TABLE CUSTOMER
FIELDS ','
(
C_CUSTKEY                ,
C_NAME                   ,
C_ADDRESS                ,
C_NATIONKEY              ,
C_PHONE                  ,
C_ACCTBAL                ,
C_MKTSEGMENT               ,
C_COMMENT
)


[root@dm8 data]# cat lineitem.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 60000000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/lineitem.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_lineitem.bad'
APPEND
INTO TABLE LINEITEM
FIELDS ','
(
L_ORDERKEY               ,
L_PARTKEY                ,
L_SUPPKEY                ,
L_LINENUMBER             ,
L_QUANTITY               ,
L_EXTENDEDPRICE          ,
L_DISCOUNT               ,
L_TAX                    ,
L_RETURNFLAG             ,
L_LINESTATUS             ,
L_SHIPDATE              date format 'yyyy-mm-dd',
L_COMMITDATE            date format 'yyyy-mm-dd',
L_RECEIPTDATE           date format 'yyyy-mm-dd',
L_SHIPINSTRUCT           ,
L_SHIPMODE               ,
L_COMMENT
)


[root@dm8 data]# cat nation.ctrl
OPTIONS (
rows = 50000
skip = 0
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/nation.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_nation.bad'
APPEND
INTO TABLE NATION
FIELDS ','
(
N_NATIONKEY           ,
N_NAME                  ,
N_REGIONKEY             ,
N_COMMENT
)



[root@dm8 data]# cat orders.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 15000000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/orders.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_orders.bad'
APPEND
INTO TABLE ORDERS
FIELDS ','
(
O_ORDERKEY               ,
O_CUSTKEY                ,
O_ORDERSTATUS            ,
O_TOTALPRICE             ,
O_ORDERDATE             date format 'yyyy-mm-dd' ,
O_ORDERPRIORITY          ,
O_CLERK                  ,
O_SHIPPRIORITY           ,
O_COMMENT
)



[root@dm8 data]# cat part.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 2000000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/part.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_part.bad'
APPEND
INTO TABLE PART
FIELDS ','
(
P_PARTKEY               ,
P_NAME                   ,
P_MFGR                   ,
P_BRAND                  ,
P_TYPE                   ,
P_SIZE                   ,
P_CONTAINER              ,
P_RETAILPRICE            ,
P_COMMENT
)


[root@dm8 data]# cat partsupp.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 8000000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/partsupp.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_partsupp.bad'
APPEND
INTO TABLE PARTSUPP
FIELDS ','
(
PS_PARTKEY               ,
PS_SUPPKEY               ,
PS_AVAILQTY              ,
PS_SUPPLYCOST            ,
PS_COMMENT
)


[root@dm8 data]# cat region.ctrl
OPTIONS (
rows = 50000
skip = 0
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/region.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_region.bad'
APPEND
INTO TABLE REGION
FIELDS ','
(
R_REGIONKEY             ,
R_NAME                  ,
R_COMMENT
)



[root@dm8 data]# cat supplier.ctrl
OPTIONS (
rows = 50000
skip = 0
errors = 100000
)
LOAD DATA
INFILE '/home/dmdba/dmdbms/bin/small/supplier.csv'
BADFILE '/home/dmdba/dmdbms/bin/small/dfldr_supplier.bad'
APPEND
INTO TABLE SUPPLIER
FIELDS ','
(
S_SUPPKEY                ,
S_NAME                   ,
S_ADDRESS                ,
S_NATIONKEY              ,
S_PHONE                  ,
S_ACCTBAL                ,
S_COMMENT
)



执行dm8的导入程序

执行数据导入程序

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/customer.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/lineitem.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/nation.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/orders.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/partsupp.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/part.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/region.ctrl\'

./dmfldr userid=henley2/Dameng12345:5236 control=\'/home/dmdba/dmdbms/bin/small/supplier.ctrl\'

收集统计信息

笔者按照前人的路上摘录如下,真实客观的测试不需要收集统计信息,即使这样提升了也是会有水分在里面, 我认为这一步可免,最好是在其它地方调优。

--删除表上所有列的统计信息
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'CUSTOMER');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'LINEITEM');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'NATION');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'ORDERS');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'PART');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'PARTSUPP');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'REGION');
CALL SP_TAB_COL_STAT_DEINIT ('HENLEY2', 'SUPPLIER');
--删除表的统计信息
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'CUSTOMER');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'LINEITEM');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'NATION');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'ORDERS');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'PART');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'PARTSUPP');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'REGION');
CALL SP_TAB_STAT_DEINIT ('HENLEY2', 'SUPPLIER');
--更新统计信息
SP_TAB_STAT_INIT('HENLEY2','REGION');
SP_TAB_STAT_INIT('HENLEY2','NATION');
SP_TAB_STAT_INIT('HENLEY2','PART');
SP_TAB_STAT_INIT('HENLEY2','PARTSUPP');
SP_TAB_STAT_INIT('HENLEY2','SUPPLIER');
SP_TAB_STAT_INIT('HENLEY2','CUSTOMER');
SP_TAB_STAT_INIT('HENLEY2','ORDERS');
SP_TAB_STAT_INIT('HENLEY2','LINEITEM');

STAT 100 ON REGION(R_NAME) ;
STAT 100 ON REGION(R_REGIONKEY) ;
STAT 100 ON NATION(N_NAME) ;
STAT 100 ON NATION(N_NATIONKEY) ;
STAT 100 ON NATION(N_REGIONKEY) ;
STAT 100 ON SUPPLIER(S_SUPPKEY) ;
STAT 100 ON SUPPLIER(S_NATIONKEY) ;
STAT 100 ON SUPPLIER(S_COMMENT) ;

STAT 100 ON PART(P_SIZE);
STAT 100 ON PART(P_BRAND);
STAT 100 ON PART(P_TYPE);
STAT 100 ON PART(P_NAME);
STAT 100 ON PART(P_PARTKEY);
STAT 100 ON PART(P_CONTAINER);

STAT 100 ON PARTSUPP(PS_SUPPKEY);
STAT 100 ON PARTSUPP(PS_PARTKEY);

STAT 100 ON ORDERS(O_ORDERKEY);
STAT 100 ON ORDERS(O_ORDERDATE);
STAT 100 ON ORDERS(O_ORDERSTATUS);
STAT 100 ON ORDERS(O_ORDERPRIORITY);
STAT 100 ON ORDERS(O_CUSTKEY);
STAT 100 ON ORDERS(O_COMMENT);

STAT 100 ON LINEITEM(L_SUPPKEY);
STAT 100 ON LINEITEM(L_PARTKEY);
STAT 100 ON LINEITEM(L_ORDERKEY);
STAT 100 ON LINEITEM(L_SHIPDATE);
STAT 100 ON LINEITEM(L_SHIPMODE);
STAT 100 ON LINEITEM(L_COMMITDATE);
STAT 100 ON LINEITEM(L_RECEIPTDATE);
STAT 100 ON LINEITEM(L_RETURNFLAG);
STAT 100 ON LINEITEM(L_LINESTATUS);
STAT 100 ON LINEITEM(L_QUANTITY);
STAT 100 ON LINEITEM(L_SHIPINSTRUCT);

STAT 100 ON CUSTOMER(C_CUSTKEY);
STAT 100 ON CUSTOMER(C_MKTSEGMENT);
STAT 100 ON CUSTOMER(C_NATIONKEY);
STAT 100 ON CUSTOMER(C_ACCTBAL);



依次执行22条tpc-h语句

经测试,22条语句都支持,下面摘录一条SQL语句,22条语句是能运行了,但是调优的地方还是有很多的。

[root@hdp1 bin]# ./login5236_henley.sh 

Server[LOCALHOST:5236]:mode is normal, state is open
login used time : 7.486(ms)
disql V8
SQL>  select
           cntrycode,
           count(*) as numcust,
        sum(c_acctbal) as totacctbal
from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal
                 from customer
           where substring(c_phone from 1 for 2) in ('13','31','23','29','30','18','17')
                and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00
                                             and substring (c_phone from 1 for 2) in ('13','31','23','29','30','18','17')
                                  )
                  and not exists ( select * from orders where o_custkey = c_custkey )
       ) as custsale
  group by cntrycode
  order by cntrycode;

LINEID     CNTRYCODE NUMCUST              TOTACCTBAL               
---------- --------- -------------------- -------------------------
1          17        670                  4.994718700000005E+06
2          29        711                  5.277740329999998E+06
3          30        692                  5.148542949999996E+06
4          31        724                  5.414669359999999E+06

used time: 55.335(ms). Execute id is 2135600.


评论
后发表回复

作者

文章

阅读量

获赞

扫一扫
联系客服