Xinchuang (信创) is short for the "information technology application innovation" industry. For many years, China's underlying IT standards, architectures, and ecosystems were largely defined by foreign IT giants, which created numerous security risks. The core of the Xinchuang industry is therefore to gradually build our own underlying IT architectures and standards and to form an open ecosystem of our own. The Xinchuang ecosystem is enormous.
From the perspective of the industry chain, it consists of four parts: basic hardware, basic software, application software, and information security. Within China's Xinchuang industry today, CPU chips, servers, operating systems, databases, and middleware are the most important links in the chain, with the CPU and the operating system forming the foundation of the whole industry.
China's CPU industry chain is steadily maturing. ARM instruction-set licensees represented by Huawei and Phytium have long-standing independent R&D capabilities, and as Kunpeng, Hygon, and Phytium processors have entered the market one after another, the Xinchuang market has kept growing. The main domestic CPU vendors currently include Huawei, Phytium (飞腾), Hygon (海光), and Zhaoxin (兆芯).
Domestic operating systems: Kylin (麒麟), UOS (统信), openEuler (欧拉), Anolis OS (龙蜥), and others.
Domestic databases: Dameng (达梦), Kingbase (人大金仓), GBase (南大通用), GreatDB (万里开源), and others.
The AsiaInfo Data Infrastructure platform (AISWare DIF) provides a complete Hadoop ecosystem, independently developed on top of the Apache releases and fully under our own control. It offers lakehouse, data fabric, and cloud-edge collaboration capabilities for enterprise big data, provides unified management, operations, and monitoring, and is backed by professional big data operations and consulting services, helping customers build an efficient, secure, cloud-native big data infrastructure platform.
Embracing Xinchuang: full domestic support from hardware chips to operating systems to middleware.
The data infrastructure platform is continuously being adapted to domestic CPUs and operating systems and has obtained multiple certifications.
The mainstream metadata store in the big data ecosystem is HMS (Hive Metastore). However, the community HMS currently only supports databases such as MySQL/MariaDB, Oracle, and PostgreSQL. Against the backdrop of Xinchuang localization, AsiaInfo took the lead in successfully migrating the HMS backing store to the Dameng (DM) database.
The following describes the problems we encountered while migrating HMS to the DM database and how we solved them.
First, the database configuration of the Ambari plugin was adapted for the DM database. The modified files are listed below:
ambari-server/src/main/resources/stacks/DIF/3.0/services/HIVE/package/scripts/params.py
ambari-server/src/main/resources/stacks/DIF/3.0/services/HIVE/package/scripts/hive_service.py
ambari-server/src/main/resources/stacks/service_advisor.py
ambari-server/src/main/resources/custom_actions/scripts/check_host.py
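All of these configurations ultimately point the platform at the DM JDBC driver. For reference, a minimal standalone connectivity check is sketched below; the driver class dm.jdbc.driver.DmDriver, the jdbc:dm:// URL format, and the default port 5236 come from the DM JDBC driver, while the host, user, and password are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;

public class DmConnectivityCheck {
  public static void main(String[] args) throws Exception {
    // DM's JDBC driver class; the jar name (e.g. DmJdbcDriver18.jar) depends on the DM release.
    Class.forName("dm.jdbc.driver.DmDriver");
    // 5236 is DM's default port; the host, user, and password below are placeholders.
    String url = "jdbc:dm://dm-host:5236";
    try (Connection conn = DriverManager.getConnection(url, "HIVE", "hive_password")) {
      // Prints the product name reported by the driver ("DM DBMS"), which matters for the error analyzed later.
      System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductName());
    }
  }
}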
At the same time, the Hive source code needs to load the DM driver. The modified file is listed below:
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcStorageHandler.java
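The exact change depends on how JdbcStorageHandler resolves JDBC drivers in this Hive version; conceptually, the DM driver class must be added wherever the handler maps a database product to a driver class. The helper below is a hypothetical sketch of that mapping, not the actual patch; only the driver class names are real:

// Hypothetical helper illustrating the kind of mapping JdbcStorageHandler needs:
// resolve a JDBC driver class from the database product name, now including DM.
private static String resolveDriverClass(String databaseProductName) {
  String product = databaseProductName.toLowerCase();
  if (product.contains("mysql")) {
    return "com.mysql.jdbc.Driver";
  } else if (product.contains("oracle")) {
    return "oracle.jdbc.OracleDriver";
  } else if (product.contains("postgresql")) {
    return "org.postgresql.Driver";
  } else if (product.contains("dm dbms")) {
    // Newly added branch for the DM database.
    return "dm.jdbc.driver.DmDriver";
  }
  throw new IllegalArgumentException("Unsupported database: " + databaseProductName);
}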
After completing the modifications above, redeploy Ambari and install Hive. The installation succeeds, but errors occur during the initialization phase at startup; see 4.1.1 and 4.1.2 below.
The first error (4.1.1) is as follows:
org.apache.hadoop.hive.metastore.api.MetaException: Unrecognized database product name <OTHER>
at org.apache.hadoop.hive.metastore.tools.SQLGenerator.addForUpdateClause(SQLGenerator.java:137) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore.lockForUpdate(ObjectStore.java:9676) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:9748) [hive-exec-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:123) [hive-exec-3.1.2.jar:3.1.2]
at com.sun.proxy.$Proxy36.addNotificationEvent(Unknown Source) [?:?]
at org.apache.hive.hcatalog.listener.DbNotificationListener.process(DbNotificationListener.java:743) [hive-hcatalog-server-extensions-3.1.2.jar:3.1.2]
at org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateDatabase(DbNotificationListener.java:350) [hive-hcatalog-server-extensions-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$22.notify(MetaStoreListenerNotifier.java:80) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:267) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1374) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1424) [hive-exec-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:201) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:136) [hive-exec-3.1.2.jar:3.1.2]
at com.sun.proxy.$Proxy40.create_database(Unknown Source) [?:?]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:827) [hive-exec-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) [hive-exec-3.1.2.jar:3.1.2]
at com.sun.proxy.$Proxy41.createDatabase(Unknown Source) [?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:477) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:4828) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:393) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) [hive-service-3.1.2.jar:3.1.2]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_271]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_271]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) [hadoop-common-3.2.2.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) [hive-service-3.1.2.jar:3.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_271]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_271]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_271]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_271]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_271]
Problem analysis:
In the determineDatabaseProduct method of standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java, the DM database is resolved to dbProduct = OTHER:
if (productName.contains("derby")) {
  return DERBY;
} else if (productName.contains("microsoft sql server")) {
  return SQLSERVER;
} else if (productName.contains("mysql")) {
  return MYSQL;
} else if (productName.contains("oracle")) {
  return ORACLE;
} else if (productName.contains("postgresql")) {
  return POSTGRES;
} else {
  return OTHER; // DM ends up here; the driver reports productName "DM DBMS"
}
Therefore, when the addForUpdateClause method of standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/tools/SQLGenerator.java is called, the DM database falls into the default branch of the switch statement and the exception Unrecognized database product name <OTHER> is thrown:
switch (dbProduct) {
  case DERBY:
    return selectStatement;
  case MYSQL:
  case ORACLE:
  case POSTGRES:
    return selectStatement + " for update";
  case SQLSERVER:
    String modifier = " with (updlock)";
    int wherePos = selectStatement.toUpperCase().indexOf(" WHERE ");
    if (wherePos < 0) {
      return selectStatement + modifier;
    }
    return selectStatement.substring(0, wherePos) + modifier +
        selectStatement.substring(wherePos, selectStatement.length());
  default:
    String msg = "Unrecognized database product name <" + dbProduct + ">";
    LOG.error(msg);
    throw new MetaException(msg);
}
Solution:
Modify the determineDatabaseProduct method in standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java so that DM is recognized.
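A sketch of the change: it adds a DM constant to the DatabaseProduct enum and a branch that matches the product name "DM DBMS" reported by the DM driver (the exact placement may differ from the patch we actually applied):

// Add DM to the enum constants, e.g. DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, DM, OTHER.
// Then, in determineDatabaseProduct, recognize the DM product name before falling through to OTHER:
} else if (productName.contains("oracle")) {
  return ORACLE;
} else if (productName.contains("dm dbms")) { // newly added branch for DM
  return DM;
} else if (productName.contains("postgresql")) {
  return POSTGRES;
} else {
  return OTHER;
}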
Modify the addForUpdateClause method in standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/tools/SQLGenerator.java so that DM gets a FOR UPDATE clause.
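Since DM accepts Oracle-style SELECT ... FOR UPDATE syntax, one straightforward approach is to let the DM case share the existing "for update" branch. A sketch of the modified switch, with only the DM case being new:

switch (dbProduct) {
  case DERBY:
    return selectStatement;
  case MYSQL:
  case ORACLE:
  case POSTGRES:
  case DM: // newly added: DM supports "select ... for update"
    return selectStatement + " for update";
  case SQLSERVER:
    String modifier = " with (updlock)";
    int wherePos = selectStatement.toUpperCase().indexOf(" WHERE ");
    if (wherePos < 0) {
      return selectStatement + modifier;
    }
    return selectStatement.substring(0, wherePos) + modifier +
        selectStatement.substring(wherePos);
  default:
    String msg = "Unrecognized database product name <" + dbProduct + ">";
    LOG.error(msg);
    throw new MetaException(msg);
}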
The second error (4.1.2) is as follows:
0: jdbc:hive2://> CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) )
INFO : Compiling command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b): CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b); Time taken: 0.297 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b): CREATE EXTERNAL TABLE IF NOT EXISTS `BUCKETING_COLS` ( `SD_ID` bigint, `BUCKET_COL_NAME` string, `INTEGER_IDX` int, CONSTRAINT `SYS_PK_BUCKETING_COLS` PRIMARY KEY (`SD_ID`,`INTEGER_IDX`) DISABLE ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT \"SD_ID\", \"BUCKET_COL_NAME\", \"INTEGER_IDX\" FROM \"BUCKETING_COLS\"" )
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) )
INFO : Completed executing command(queryId=ocdp_20221209141545_895fd38e-fab1-4526-bd93-add24c1bcb3b); Time taken: 24.618 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Add request failed : INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) ) (state=08S01,code=1)
Closing: 0: jdbc:hive2://?hive.conf.restricted.list=;hive.security.authorization.sqlstd.confwhitelist=.*;hive.security.authorization.sqlstd.confwhitelist.append=.*;hive.security.authorization.enabled=false;hive.metastore.uris=;hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory;hive.support.concurrency=false;hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:639)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:600)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1608)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Schema script failed, errorcode 2
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1291)
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:1269)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:633)
... 8 more
*** schemaTool failed ***
Problem analysis:
Creating a table inserts rows into the metastore system tables. The error above shows that the failure occurs while executing the following insert statement:
INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?)
Here COMMENT is a reserved word in DM but is not quoted, so the statement fails.
Inspecting the source code of datanucleus-rdbms.jar shows that the DM database is handled by org/datanucleus/store/rdbms/adapter/BaseDatastoreAdapter.java, and BaseDatastoreAdapter.java does not declare COMMENT as a keyword.
Solution:
Add COMMENT as a keyword in org/datanucleus/store/rdbms/adapter/SQLConstants.java:
public static final String NONRESERVED_WORDS =
"ADA," +
"C,CATALOG_NAME,CHARACTER_SET_CATALOG,CHARACTER_SET_NAME," +
"CHARACTER_SET_SCHEMA,CLASS_ORIGIN,COBOL,COLLATION_CATALOG," +
"COLLATION_NAME,COLLATION_SCHEMA,COMMENT,COLUMN_NAME,COMMAND_FUNCTION,COMMITTED," +
"CONDITION_NUMBER,CONNECTION_NAME,CONSTRAINT_CATALOG,CONSTRAINT_NAME," +
"CONSTRAINT_SCHEMA,CURSOR_NAME," +
"DATA,DATETIME_INTERVAL_CODE,DATETIME_INTERVAL_PRECISION,DYNAMIC_FUNCTION," +
"FORTRAN," +
"LENGTH," +
"MESSAGE_LENGTH,MESSAGE_OCTET_LENGTH,MESSAGE_TEXT,MORE,MUMPS," +
"NAME,NULLABLE,NUMBER," +
"PASCAL,PLI," +
"REPEATABLE,RETURNED_LENGTH,RETURNED_OCTET_LENGTH,RETURNED_SQLSTATE," +
"ROW_COUNT," +
"SCALE,SCHEMA_NAME,SERIALIZABLE,SERVER_NAME,SUBCLASS_ORIGIN," +
"TABLE_NAME,TYPE," +
"UNCOMMITTED,UNNAMED";
When installing Hive, the DM database is configured and the connection test succeeds. After installation and startup complete, the deployment is verified: connecting to Hive with beeline succeeds, and showing databases, showing tables, inserting data into a table, and querying table data all work as expected. Querying with disql confirms that the Hive metadata tables have been created in DM, 75 tables in total.
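The beeline checks above can also be reproduced programmatically through Hive's JDBC driver. Below is a minimal smoke-test sketch, assuming HiveServer2 is reachable at hive-host:10000 (the host, port, credentials, and table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSmokeTest {
  public static void main(String[] args) throws Exception {
    // Hive's JDBC driver, shipped with hive-jdbc.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Same checks as the beeline verification: list databases, create a table,
      // insert a row, and read it back. All of this exercises HMS backed by DM.
      try (ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
      stmt.execute("CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING)");
      stmt.execute("INSERT INTO smoke_test VALUES (1, 'dm-migration-check')");
      try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM smoke_test")) {
        while (rs.next()) {
          System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
        }
      }
    }
  }
}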