Integrating Atlas with Hive

Published: 2024-01-08 13:36:08  Author: 粒子先生

Edit atlas-application.properties

Add the following properties, one per line:
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
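
As a rough sketch of this step, the same properties can be set from the shell instead of by hand. `set_prop` is a hypothetical helper (not part of Atlas), and `ATLAS_PROPS` is an assumed variable name; the snippet writes to a scratch file unless `ATLAS_PROPS` points at your real atlas-application.properties.

```shell
# set_prop FILE KEY VALUE: drop any existing line for KEY, then append KEY=VALUE.
# Hypothetical helper for illustration; assumes no spaces around '='.
set_prop() {
  file=$1; key=$2; value=$3
  touch "$file"
  # Keep every line whose key (the text before the first '=') differs from KEY.
  awk -F= -v k="$key" '$1 != k' "$file" > "$file.tmp"
  printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Assumption: ATLAS_PROPS names the file to edit; defaults to a scratch file.
ATLAS_PROPS="${ATLAS_PROPS:-$(mktemp)}"

set_prop "$ATLAS_PROPS" atlas.hook.hive.synchronous false
set_prop "$ATLAS_PROPS" atlas.hook.hive.numRetries 3
set_prop "$ATLAS_PROPS" atlas.hook.hive.queueSize 10000
set_prop "$ATLAS_PROPS" atlas.cluster.name primary

# Show the resulting hook settings.
grep -E '^atlas\.(hook\.hive|cluster)\.' "$ATLAS_PROPS"
```

Because the helper removes an existing line for the key before appending, running it twice does not leave duplicate entries.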

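With an embedded installation (Atlas running its bundled Kafka and ZooKeeper), replace localhost with an IP address or domain name; otherwise external clients such as the Hive hook cannot reach Kafka.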

atlas.kafka.zookeeper.connect=172.31.6.205:9026
atlas.kafka.bootstrap.servers=172.31.6.205:9027

Increase the Kafka and ZooKeeper timeouts as appropriate.
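
A minimal sketch of both adjustments, run against a scratch copy so it can be tried safely. The sample file contents, the `ATLAS_HOST` variable, and the timeout values (60000 ms and 30000 ms) are illustrative assumptions, not recommendations from the original post. The two `timeout.ms` keys are assumed to be the ZooKeeper timeout settings already present in your atlas-application.properties, so check the key names in your own file first.

```shell
# Assumptions: ATLAS_HOST is the address other nodes use to reach Atlas;
# the timeout values are placeholders to tune for your environment.
ATLAS_HOST="${ATLAS_HOST:-172.31.6.205}"
SESSION_TIMEOUT_MS=60000
CONNECTION_TIMEOUT_MS=30000

workdir="$(mktemp -d)"
conf="$workdir/atlas-application.properties"

# Stand-in for the shipped file; on a real install, copy your own file here instead.
cat > "$conf" <<'EOF'
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
EOF

# Rewrite the host on the two connection lines and the value on the two timeout lines.
sed -e "s/^\(atlas\.kafka\.zookeeper\.connect=\)localhost:/\1${ATLAS_HOST}:/" \
    -e "s/^\(atlas\.kafka\.bootstrap\.servers=\)localhost:/\1${ATLAS_HOST}:/" \
    -e "s/^\(atlas\.kafka\.zookeeper\.session\.timeout\.ms=\).*/\1${SESSION_TIMEOUT_MS}/" \
    -e "s/^\(atlas\.kafka\.zookeeper\.connection\.timeout\.ms=\).*/\1${CONNECTION_TIMEOUT_MS}/" \
    "$conf" > "$conf.new"
mv "$conf.new" "$conf"

cat "$conf"
```

Review the printed result before copying it over the real configuration file.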

Add the configuration file to the plugin jar

cd /opt/atlas/apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-bin/apache-atlas-2.1.0/conf
zip -u /opt/atlas/apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-hive-hook/apache-atlas-hive-hook-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar atlas-application.properties
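
To check that the properties file actually ended up inside the jar, something like the following could be used. `jar_has_props` is a hypothetical helper name and `PLUGIN_JAR` an assumed variable; the sketch relies on `unzip` being installed.

```shell
# Assumption: PLUGIN_JAR defaults to the jar path used in this post.
PLUGIN_JAR="${PLUGIN_JAR:-/opt/atlas/apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-hive-hook/apache-atlas-hive-hook-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar}"

# jar_has_props JAR: succeeds if atlas-application.properties sits at the jar root.
# Hypothetical helper for illustration.
jar_has_props() {
  # unzip -l lists the archive entries; the leading space in the pattern
  # rejects entries in subdirectories such as conf/atlas-application.properties.
  [ -f "$1" ] && unzip -l "$1" 2>/dev/null | grep -q ' atlas-application\.properties$'
}

if jar_has_props "$PLUGIN_JAR"; then
  echo "OK: atlas-application.properties found in $PLUGIN_JAR"
else
  echo "Not found: atlas-application.properties is missing from $PLUGIN_JAR (or the jar does not exist)"
fi
```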

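Copy atlas-application.properties and the /opt/atlas/apache-atlas-sources-2.1.0/distro/target/apache-atlas-2.1.0-hive-hook/apache-atlas-hive-hook-2.1.0 directory to the node where Hive is installed (the steps below assume the directory is placed under /opt/module).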

Set the environment variable in hive-env.sh

vim hive-env.sh
export HIVE_AUX_JARS_PATH=/opt/module/apache-atlas-hive-hook-2.1.0/hook/hive

Add the following configuration to hive-site.xml

<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
</property>
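
Before restarting, a quick check that the hook is registered can catch a misplaced edit. This is only a sketch: `has_atlas_hook` is a hypothetical helper, and the default `HIVE_SITE` location (under /opt/module/hive) is an assumption about where Hive is installed.

```shell
# Assumption: hive-site.xml lives under $HIVE_HOME/conf; adjust HIVE_SITE if not.
HIVE_SITE="${HIVE_SITE:-${HIVE_HOME:-/opt/module/hive}/conf/hive-site.xml}"

# has_atlas_hook FILE: succeeds if hive.exec.post.hooks lists the Atlas HiveHook.
# Hypothetical helper for illustration; a plain text match, not a real XML parser.
has_atlas_hook() {
  [ -r "$1" ] || return 1
  # Join all lines first so <name> and <value> may sit on separate lines.
  tr -d '\n' < "$1" |
    grep -q '<name>hive\.exec\.post\.hooks</name>[[:space:]]*<value>[^<]*org\.apache\.atlas\.hive\.hook\.HiveHook'
}

if has_atlas_hook "$HIVE_SITE"; then
  echo "OK: Atlas HiveHook is configured in $HIVE_SITE"
else
  echo "Atlas HiveHook not found in $HIVE_SITE"
fi
```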

Restart Hive

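With the configuration above, newly created Hive metadata is synchronized to Atlas in real time. Metadata that already exists must be imported manually once by running the following script: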

/opt/module/apache-atlas-hive-hook-2.1.0/hook-bin/import-hive.sh
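
Run without arguments, the script imports every database. The Atlas 2.x script also accepts `-d <database pattern>` to limit the import, and it prompts for Atlas credentials when it starts. A hedged sketch of a scoped import follows; `run_import` is a hypothetical wrapper, and `HIVE_DB` with its value `default` is only an example name.

```shell
# Assumptions: IMPORT_SH is the script path used in this post; HIVE_DB is an example database.
IMPORT_SH="${IMPORT_SH:-/opt/module/apache-atlas-hive-hook-2.1.0/hook-bin/import-hive.sh}"
HIVE_DB="${HIVE_DB:-default}"

# run_import SCRIPT DATABASE: run the import for one database, or return 127
# if the script is missing. Hypothetical wrapper for illustration.
run_import() {
  script=$1; db=$2
  if [ ! -x "$script" ]; then
    echo "import-hive.sh not found or not executable: $script" >&2
    return 127
  fi
  # -d restricts the import to databases matching the given pattern.
  "$script" -d "$db"
}

run_import "$IMPORT_SH" "$HIVE_DB" || echo "Import did not complete; check the script path and the Hive environment."
```

Importing one database first is a cheap way to confirm connectivity and credentials before a full import.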