
[Walking with Dameng] Adapting the AsiaInfo Big Data Infrastructure Platform to the Domestic Database DM8: the Hive Chapter

小曼mm · 2022/12/30

1. Background: Xinchuang and IT Localization

  Xinchuang (信创) is short for the "information technology application innovation" industry. For many years, the underlying IT standards, architectures, and ecosystems used in China were largely defined by foreign IT giants, which introduced numerous security risks. China therefore needs to gradually build its own underlying IT architecture and standards and form an open ecosystem of its own; this is the core of the Xinchuang industry, and the ecosystem it involves is enormous.

  Viewed along the industry chain, Xinchuang consists of four main parts: basic hardware, basic software, application software, and information security. In China's Xinchuang industry today, CPU chips, servers, operating systems, databases, and middleware are the most important links in that chain, and the CPU and operating system form the foundation of the whole industry.

  China's CPU industry chain is gradually maturing. ARM instruction-set licensees represented by Huawei and Phytium have built up long-term independent R&D capability, and as Kunpeng, Hygon, Phytium, and other chips have entered the market, the scale of the Xinchuang market keeps growing. The main domestic CPU vendors currently include Huawei, Phytium, Hygon, and Zhaoxin.

  Domestic operating systems: Kylin (麒麟), UOS (统信), openEuler (欧拉), Anolis (龙蜥), and others.

  Domestic databases: Dameng (达梦), Kingbase (人大金仓), GBase (南大通用), GreatDB (万里开源), and others.

2. Overview of the AsiaInfo AISWare DIF Platform

The AsiaInfo Data Infrastructure platform (AISWare DIF) provides a complete Hadoop technology ecosystem, developed independently on top of the Apache releases and fully under our own control. For enterprise big data it offers lakehouse, data fabric, and cloud-edge collaboration capabilities, with unified management, operations, and monitoring, and it also provides professional big data operations and consulting services to help customers build an efficient, secure, cloud-native big data infrastructure platform.

3. Localization Status of the AISWare DIF Platform

 Embracing Xinchuang: full localization support from hardware chips through operating systems to middleware.

  • Supports CPUs in the ARM ecosystem, including Kunpeng and Phytium;
  • Supports domestic x86-architecture CPUs, including Hygon and Zhaoxin;
  • Supports domestic operating systems, including openEuler, Kylin, UOS, and BC-Linux;
  • Supports mixed deployment across different chips, operating systems, and server lines.

 The data infrastructure platform is continuously being adapted to domestic CPUs and operating systems and has obtained multiple certifications.

4. Adapting to the Domestic Database (Dameng)

  HMS (Hive Metastore) is currently the mainstream metadata store for the big data ecosystem, but the community HMS only supports databases such as MySQL/MariaDB, Oracle, and PostgreSQL. Against the background of Xinchuang and IT localization, AsiaInfo took the lead and successfully migrated the HMS data store to the Dameng (DM) database.

The following describes the problems we ran into while migrating HMS to the DM database and how we solved them.

4.1. The Road to Adapting Hive to DM8

First, the database configuration of the Ambari plugin was reworked to support the DM database. The modified files are listed below:

ambari-server/src/main/resources/stacks/DIF/3.0/services/HIVE/package/scripts/params.py
ambari-server/src/main/resources/stacks/DIF/3.0/services/HIVE/package/scripts/hive_service.py
ambari-server/src/main/resources/stacks/service_advisor.py
ambari-server/src/main/resources/custom_actions/scripts/check_host.py

In addition, the Hive source code needs to load the DM driver. The modified file is:

jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcStorageHandler.java
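
The exact change to JdbcStorageHandler is internal to the platform; as a rough illustration of what "loading the DM driver" amounts to, the sketch below assumes the DM8 JDBC driver class dm.jdbc.driver.DmDriver is on the Hive classpath (the helper class and method names are hypothetical):

// Illustrative sketch only, not the actual platform diff.
public final class DmDriverLoader {
  private DmDriverLoader() {}

  public static void ensureDmDriverLoaded() {
    try {
      // Loading the class registers the driver with java.sql.DriverManager,
      // so jdbc:dm:// connection URLs can be resolved.
      Class.forName("dm.jdbc.driver.DmDriver");
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("DM JDBC driver not found on the classpath", e);
    }
  }
}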

With the above changes in place, redeploy Ambari and install Hive. The installation succeeds, but errors appear during the initialization phase at startup; see 4.1.1 and 4.1.2.

4.1.1. Failure to create the sys database

The error is as follows:

org.apache.hadoop.hive.metastore.api.MetaException: Unrecognized database product name <OTHER>
        at org.apache.hadoop.hive.metastore.tools.SQLGenerator.addForUpdateClause(SQLGenerator.java:137) ~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.ObjectStore.lockForUpdate(ObjectStore.java:9676) ~[hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:9748) [hive-exec-3.1.2.jar:3.1.2]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:123) [hive-exec-3.1.2.jar:3.1.2]
        at com.sun.proxy.$Proxy36.addNotificationEvent(Unknown Source) [?:?]
        at org.apache.hive.hcatalog.listener.DbNotificationListener.process(DbNotificationListener.java:743) [hive-hcatalog-server-extensions-3.1.2.jar:3.1.2]
        at org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateDatabase(DbNotificationListener.java:350) [hive-hcatalog-server-extensions-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$22.notify(MetaStoreListenerNotifier.java:80) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:267) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1374) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1424) [hive-exec-3.1.2.jar:3.1.2]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:201) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:136) [hive-exec-3.1.2.jar:3.1.2]
        at com.sun.proxy.$Proxy40.create_database(Unknown Source) [?:?]
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:827) [hive-exec-3.1.2.jar:3.1.2]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) [hive-exec-3.1.2.jar:3.1.2]
        at com.sun.proxy.$Proxy41.createDatabase(Unknown Source) [?:?]
        at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:477) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:4828) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:393) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.2.jar:3.1.2]
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) [hive-service-3.1.2.jar:3.1.2]
        at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) [hive-service-3.1.2.jar:3.1.2]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) [hive-service-3.1.2.jar:3.1.2]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_271]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_271]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) [hadoop-common-3.2.2.jar:?]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) [hive-service-3.1.2.jar:3.1.2]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_271]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_271]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_271]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_271]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_271]

Problem analysis:

In the determineDatabaseProduct method of standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java, DM is resolved to dbProduct = OTHER:

if (productName.contains("derby")) {
      return DERBY;
    } else if (productName.contains("microsoft sql server")) {
      return SQLSERVER;
    } else if (productName.contains("mysql")) {
      return MYSQL;
    } else if (productName.contains("oracle")) {
      return ORACLE;
    } else if (productName.contains("postgresql")) {
      return POSTGRES;
    } else {
      return OTHER; // DM falls through to here; its productName is "DM DBMS"
    }

As a result, when the addForUpdateClause method of standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/tools/SQLGenerator.java is called, DM enters the default branch of the switch statement and the exception "Unrecognized database product name <OTHER>" is thrown:

switch (dbProduct) {
    case DERBY:
      return selectStatement;
    case MYSQL:
    case ORACLE:
    case POSTGRES:
      return selectStatement + " for update";
    case SQLSERVER:
      String modifier = " with (updlock)";
      int wherePos = selectStatement.toUpperCase().indexOf(" WHERE ");
      if (wherePos < 0) {
        return selectStatement + modifier;
      }
      return selectStatement.substring(0, wherePos) + modifier +
          selectStatement.substring(wherePos, selectStatement.length());
    default:
      String msg = "Unrecognized database product name <" + dbProduct + ">";
      LOG.error(msg);
      throw new MetaException(msg);
    }

Solution:

Modify the determineDatabaseProduct method in standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java:

[Screenshot: modified determineDatabaseProduct method]
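
The shape of the change is roughly as follows: add a DM constant to the enum and map the product name reported by the DM8 driver to it. This is a sketch rather than the actual patch; the constant name DM is an assumption.

public enum DatabaseProduct {
  DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, DM, OTHER;

  public static DatabaseProduct determineDatabaseProduct(String productName) {
    if (productName == null) {
      return OTHER;
    }
    productName = productName.toLowerCase();
    if (productName.contains("derby")) {
      return DERBY;
    } else if (productName.contains("microsoft sql server")) {
      return SQLSERVER;
    } else if (productName.contains("mysql")) {
      return MYSQL;
    } else if (productName.contains("oracle")) {
      return ORACLE;
    } else if (productName.contains("dm dbms")) {
      return DM; // the DM8 JDBC driver reports the product name "DM DBMS"
    } else if (productName.contains("postgresql")) {
      return POSTGRES;
    } else {
      return OTHER;
    }
  }
}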

Modify the addForUpdateClause method in standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/tools/SQLGenerator.java:

[Screenshot: modified addForUpdateClause method]
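
Again as a sketch rather than the actual patch, and under the assumption that DM8 accepts the Oracle-style SELECT ... FOR UPDATE locking syntax, DM can simply share the existing "for update" branch:

public String addForUpdateClause(String selectStatement) throws MetaException {
    switch (dbProduct) {
    case DERBY:
      return selectStatement;
    case MYSQL:
    case ORACLE:
    case POSTGRES:
    case DM: // assumption: DM8 supports "SELECT ... FOR UPDATE"
      return selectStatement + " for update";
    case SQLSERVER:
      String modifier = " with (updlock)";
      int wherePos = selectStatement.toUpperCase().indexOf(" WHERE ");
      if (wherePos < 0) {
        return selectStatement + modifier;
      }
      return selectStatement.substring(0, wherePos) + modifier + selectStatement.substring(wherePos);
    default:
      String msg = "Unrecognized database product name <" + dbProduct + ">";
      LOG.error(msg);
      throw new MetaException(msg);
    }
}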

4.1.2. Table creation failure

The error is as follows:

0: jdbc:hive2://> CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) )
INFO  : Compiling command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b): CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b); Time taken: 0.297 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b): CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) )
INFO  : Completed executing command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b); Time taken: 24.618 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) ) (state=08S01,code=1)
Closing: 0: jdbc:hive2://?hive.conf.restricted.list=;hive.security.authorization.sqlstd.confwhitelist=.*;hive.security.authorization.sqlstd.confwhitelist.append=.*;hive.security.authorization.enabled=false;hive.metastore.uris=;hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory;hive.support.concurrency=false;hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:639)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:600)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1608)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Schema script failed, errorcode 2
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1291)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1269)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:633)
        ... 8 more
*** schemaTool failed ***

Problem analysis:
Creating a table inserts rows into the metastore system tables. From the error above we can see that the failure happens while executing this insert statement:

INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) 

Here COMMENT is itself a reserved word, but it is not quoted, so the statement fails.

Looking at the source code of datanucleus-rdbms.jar: with the DM database, DataNucleus goes through org/datanucleus/store/rdbms/adapter/BaseDatastoreAdapter.java, and that adapter's keyword list does not contain COMMENT, so the column name is emitted without quotes.
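
As a quick cross-check of this analysis, one can ask the DM JDBC driver which extra SQL keywords it reports via the standard java.sql.DatabaseMetaData.getSQLKeywords() call; if COMMENT showed up there and were merged into the adapter's keyword set, DataNucleus would already be quoting it. A small diagnostic sketch (host, port, and credentials are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class DmKeywordCheck {
  public static void main(String[] args) throws Exception {
    Class.forName("dm.jdbc.driver.DmDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:dm://dm-host:5236", "SYSDBA", "SYSDBA")) {
      DatabaseMetaData md = conn.getMetaData();
      // Comma-separated list of keywords beyond the SQL standard, as reported by the driver
      String keywords = md.getSQLKeywords();
      System.out.println("Driver-reported keywords: " + keywords);
      System.out.println("Reports COMMENT: " + keywords.toUpperCase().contains("COMMENT"));
    }
  }
}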

Solution:
Add COMMENT as a keyword in org/datanucleus/store/rdbms/adapter/SQLConstants.java:

public static final String NONRESERVED_WORDS =
        "ADA," +
        "C,CATALOG_NAME,CHARACTER_SET_CATALOG,CHARACTER_SET_NAME," +
        "CHARACTER_SET_SCHEMA,CLASS_ORIGIN,COBOL,COLLATION_CATALOG," +
        "COLLATION_NAME,COLLATION_SCHEMA,COMMENT,COLUMN_NAME,COMMAND_FUNCTION,COMMITTED," +
        "CONDITION_NUMBER,CONNECTION_NAME,CONSTRAINT_CATALOG,CONSTRAINT_NAME," +
        "CONSTRAINT_SCHEMA,CURSOR_NAME," +
        "DATA,DATETIME_INTERVAL_CODE,DATETIME_INTERVAL_PRECISION,DYNAMIC_FUNCTION," +
        "FORTRAN," +
        "LENGTH," +
        "MESSAGE_LENGTH,MESSAGE_OCTET_LENGTH,MESSAGE_TEXT,MORE,MUMPS," +
        "NAME,NULLABLE,NUMBER," +
        "PASCAL,PLI," +
        "REPEATABLE,RETURNED_LENGTH,RETURNED_OCTET_LENGTH,RETURNED_SQLSTATE," +
        "ROW_COUNT," +
        "SCALE,SCHEMA_NAME,SERIALIZABLE,SERVER_NAME,SUBCLASS_ORIGIN," +
        "TABLE_NAME,TYPE," +
        "UNCOMMITTED,UNNAMED";

4.1.3. Test installation of Hive

When installing Hive, configure the DM database as the metastore backend; the connection test succeeds.
The installation completes and the service starts up normally.

4.1.4. Verification

4.1.4.1. Via the beeline client

Connecting to Hive from beeline succeeds. From beeline, listing databases, listing tables, inserting rows into a table, and querying table data all work as expected.
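
The same checks can also be scripted against HiveServer2 through the Hive JDBC driver instead of the interactive beeline shell. A minimal smoke test of that shape (the host and the table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSmokeTest {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-server:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Create, populate, and read back a throwaway table to exercise HMS on DM8
      stmt.execute("CREATE TABLE IF NOT EXISTS dm_smoke_test (id INT, name STRING)");
      stmt.execute("INSERT INTO dm_smoke_test VALUES (1, 'dm8')");
      try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM dm_smoke_test")) {
        while (rs.next()) {
          System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
        }
      }
    }
  }
}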

4.1.4.2. Via disql

Querying through disql confirms that the Hive metastore tables now exist in DM; there are 75 metastore tables in total.
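
The same count can also be reproduced programmatically through the DM JDBC driver rather than disql. A sketch, assuming the metastore schema is named HIVE and using DM's Oracle-compatible ALL_TABLES dictionary view (connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DmMetastoreTableCount {
  public static void main(String[] args) throws Exception {
    Class.forName("dm.jdbc.driver.DmDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:dm://dm-host:5236", "SYSDBA", "SYSDBA");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM ALL_TABLES WHERE OWNER = 'HIVE'")) {
      if (rs.next()) {
        System.out.println("Hive metastore tables in DM: " + rs.getInt(1));
      }
    }
  }
}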
