Hudi Study Notes 4 - Hudi Configuration: Spark Configs

Published: 2023-05-08 15:51:00 Author: -见

Spark Datasource Configs

  • Read configs
| Config | Required | Default | Description |
| --- | --- | --- | --- |
| as.of.instant | Y | N/A | Added in 0.9.0. The instant to query as of for a time travel query. Two formats are accepted: yyyyMMddHHmmss and yyyy-MM-dd HH:mm:ss. If not specified, the latest snapshot is queried. |
| hoodie.file.index.enable | N | true | |
| hoodie.schema.on.read.enable | N | false | |
| hoodie.datasource.streaming.startOffset | N | earliest | |
| hoodie.datasource.write.precombine.field | N | ts | |
| hoodie.datasource.read.begin.instanttime | Y | N/A | |
| hoodie.datasource.read.end.instanttime | Y | N/A | |
| hoodie.datasource.read.paths | Y | N/A | |
| hoodie.datasource.merge.type | N | payload_combine | |
| hoodie.datasource.query.incremental.format | N | latest_state | |
| hoodie.datasource.query.type | N | snapshot | |
| hoodie.datasource.read.extract.partition.values.from.path | N | false | |
| hoodie.datasource.read.file.index.listing.mode | N | lazy | |
| hoodie.datasource.read.file.index.listing.partition-path-prefix.analysis.enabled | N | true | |
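
To show how the read options fit together, here is a minimal sketch in Python that builds option dictionaries for a time travel query and an incremental query. The helper names (`time_travel_options`, `incremental_options`, `read_hudi`) are made up for illustration, and `read_hudi` assumes an existing SparkSession with the Hudi Spark bundle on its classpath; it is not called here.

```python
from datetime import datetime


def time_travel_options(instant):
    """Snapshot-query options that read the table as of `instant`."""
    return {
        "hoodie.datasource.query.type": "snapshot",
        # Formatted as yyyyMMddHHmmss, one of the two accepted formats.
        "as.of.instant": instant.strftime("%Y%m%d%H%M%S"),
    }


def incremental_options(begin_instant, end_instant):
    """Incremental-query options bounded by two instant times (strings)."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
        "hoodie.datasource.read.end.instanttime": end_instant,
    }


def read_hudi(spark, base_path, options):
    """Hypothetical helper: `spark` is assumed to be a SparkSession
    that already has the Hudi bundle available."""
    return spark.read.format("hudi").options(**options).load(base_path)
```

For example, `read_hudi(spark, base_path, time_travel_options(datetime(2023, 5, 8, 15, 51, 0)))` would request the table state as of that instant, where `base_path` is whatever path the table was written to.
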
  • Write configs
| Config | Required | Default | Description |
| --- | --- | --- | --- |
| hoodie.datasource.hive_sync.mode | Y | N/A | |
| hoodie.datasource.write.partitionpath.field | Y | N/A | |
| hoodie.datasource.write.precombine.field | N | ts | |
| hoodie.datasource.write.recordkey.field | Y | N/A | |
| hoodie.datasource.write.table.type | N | COPY_ON_WRITE | |
| hoodie.datasource.write.insert.drop.duplicates | N | false | If set to true, all duplicate records are filtered out of the incoming DataFrame during insert operations. |
| hoodie.sql.insert.mode | N | upsert | |
| hoodie.sql.bulk.insert.enable | N | false | |
| hoodie.datasource.write.table.name | Y | N/A | |
| hoodie.datasource.write.operation | N | upsert | |
| hoodie.datasource.write.payload.class | N | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload | |
| hoodie.datasource.write.partitionpath.urlencode | N | false | |
| hoodie.datasource.hive_sync.partition_fields | N | N/A | |
| hoodie.datasource.hive_sync.auto_create_database | N | true | Automatically create the database if it does not exist. |
| hoodie.datasource.hive_sync.database | N | default | |
| hoodie.datasource.hive_sync.table | N | unknown | |
| hoodie.datasource.hive_sync.username | N | hive | |
| hoodie.datasource.hive_sync.password | N | hive | |
| hoodie.datasource.hive_sync.enable | N | false | |
| hoodie.datasource.hive_sync.ignore_exceptions | N | false | |
| hoodie.datasource.hive_sync.use_jdbc | N | true | |
| hoodie.datasource.hive_sync.jdbcurl | N | jdbc:hive2://localhost:10000 | Hive JDBC URL (HiveServer2) |
| hoodie.datasource.hive_sync.metastore.uris | N | thrift://localhost:9083 | Hive metastore URI |
| hoodie.datasource.hive_sync.base_file_format | N | PARQUET | |
| hoodie.datasource.hive_sync.support_timestamp | N | false | |
| hoodie.datasource.meta.sync.enable | N | false | |
| hoodie.clustering.inline | N | false | |
| hoodie.datasource.write.partitions.to.delete | Y | N/A | Comma-separated list of partitions to delete. The asterisk wildcard is supported. |
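
As a rough illustration of how the write options combine, the sketch below builds an upsert configuration and an optional Hive sync configuration as plain Python dictionaries. The helper names and the sample table and field names are assumptions for illustration only. `hoodie.table.name` is not listed in these notes; it is included because the Hudi quick start sets it on writes. The `hms` sync mode value and the `write_hudi` helper (which expects an existing Spark DataFrame) are likewise assumptions about the environment.

```python
def upsert_options(table_name, record_key, partition_path, precombine="ts"):
    """Options for an upsert into a COPY_ON_WRITE table."""
    return {
        # Not listed in these notes; set by the Hudi quick start on writes.
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_path,
        "hoodie.datasource.write.precombine.field": precombine,
    }


def hive_sync_options(database, table, partition_fields,
                      metastore_uri="thrift://localhost:9083"):
    """Options that sync the table to a Hive metastore (mode `hms` assumed)."""
    return {
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.mode": "hms",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": partition_fields,
        "hoodie.datasource.hive_sync.metastore.uris": metastore_uri,
    }


def write_hudi(df, base_path, options, mode="append"):
    """Hypothetical helper: `df` is assumed to be a Spark DataFrame."""
    df.write.format("hudi").options(**options).mode(mode).save(base_path)


# Merge the two dictionaries into one set of writer options.
options = {
    **upsert_options("trips", "uuid", "region"),
    **hive_sync_options("default", "trips", "region"),
}
```

Keeping the write options and the sync options in separate helpers makes it easy to turn Hive sync off by simply leaving the second dictionary out.
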
  • PreCommit Validator configs
| Config | Required | Default | Description |
| --- | --- | --- | --- |
| hoodie.precommit.validators | N | | |
| hoodie.precommit.validators.equality.sql.queries | N | | |
| hoodie.precommit.validators.inequality.sql.queries | N | | |
| hoodie.precommit.validators.single.value.sql.queries | N | | |
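
The notes give no descriptions for the validator options, so the following is only a sketch based on my understanding of the Hudi documentation: `hoodie.precommit.validators` takes validator class names, and the single-value option takes queries of the form `query#expected`, separated by `;`, with a table-name placeholder that Hudi substitutes. The validator class name, the placeholder, the delimiters, and the sample column `uuid` should all be checked against the Hudi version in use.

```python
# Assumed class name and placeholder; verify against your Hudi version.
SINGLE_RESULT_VALIDATOR = (
    "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator"
)
TABLE_PLACEHOLDER = "<TABLE_NAME>"

# Hypothetical check: no record may have a null key column `uuid`.
null_key_check = (
    f"select count(*) from {TABLE_PLACEHOLDER} where uuid is null",
    0,
)


def single_value_validator_options(checks):
    """Build validator options from (query, expected_value) pairs."""
    # Each pair becomes "query#expected"; pairs are joined with ';'.
    queries = ";".join(f"{query}#{expected}" for query, expected in checks)
    return {
        "hoodie.precommit.validators": SINGLE_RESULT_VALIDATOR,
        "hoodie.precommit.validators.single.value.sql.queries": queries,
    }


validator_options = single_value_validator_options([null_key_check])
```

The resulting dictionary would be passed along with the other write options, so that the commit fails if the query result differs from the expected value.
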