Hudi Study Notes 2 - Hudi Configuration: Flink SQL Options

Published 2023-05-08 15:45:58 Author: -见
Option Required Default Description
path Y N/A Base path of the Hudi table. It is created if it does not exist; otherwise it must be a successfully initialized Hudi table
read.end-commit Y N/A
read.start-commit Y N/A
read.tasks Y N/A
write.tasks Y N/A
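write.partition.format Y N/A Partition path format; only takes effect when write.datetime.partitioning is true. There are two defaults: 1. yyyyMMddHH, when the partition field type is TIMESTAMP(3) WITHOUT TIME ZONE, LONG, FLOAT, DOUBLE, or DECIMAL; 2. yyyyMMdd, when the partition field type is DATE or INT.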
write.bucket_assign.tasks Y N/A
archive.max_commits N 50
archive.min_commits N 40
cdc.enabled N false
changelog.enabled N false
clean.async.enabled N true
clean.policy N KEEP_LATEST_COMMITS Cleaning policy. Allowed values: KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_BY_HOURS
clean.retain_commits N 30
clean.retain_file_versions N 5
clean.retain_hours N 24
clustering.async.enabled N false
clustering.delta_commits N 4
clustering.plan.partition.filter.mode N NONE Allowed values: NONE, RECENT_DAYS, SELECTED_PARTITIONS, DAY_ROLLING
clustering.plan.strategy.class N org.apache.hudi.client.clustering.plan.strategy.FlinkSizeBasedClusteringPlanStrategy
clustering.tasks Y N/A
clustering.schedule.enabled N false
compaction.async.enabled N true
compaction.delta_commits N 5
compaction.delta_seconds N 3600
compaction.max_memory N 100
compaction.schedule.enabled N true
compaction.target_io N 512000
compaction.timeout.seconds N 1200
compaction.trigger.strategy N num_commits Allowed values: num_commits, time_elapsed, num_or_time
hive_sync.conf.dir Y N/A
hive_sync.table_properties Y N/A
hive_sync.assume_date_partitioning N false Assume partitions use the yyyy/mm/dd format
hive_sync.auto_create_db N true Automatically create the database if it does not exist
hive_sync.db N default
hive_sync.table N unknown
hive_sync.table.strategy N ALL
hive_sync.enabled N false
hive_sync.file_format N PARQUET
hive_sync.jdbc_url N jdbc:hive2://localhost:10000
hive_sync.metastore.uris N '' Hive Metastore URIs
hive_sync.mode N HMS
hive_sync.partition_fields N ''
hive_sync.password N hive
hive_sync.support_timestamp N true
hive_sync.use_jdbc N true
hive_sync.username N hive
hoodie.bucket.index.hash.field N ''
hoodie.bucket.index.num.buckets N 4
hoodie.datasource.merge.type N payload_combine
hoodie.datasource.query.type N snapshot
hoodie.datasource.write.hive_style_partitioning N false
hoodie.datasource.write.keygenerator.type N SIMPLE
hoodie.datasource.write.partitionpath.field N ''
hoodie.datasource.write.recordkey.field N uuid
hoodie.datasource.write.partitionpath.urlencode N false
hoodie.database.name Y N/A
hoodie.table.name Y N/A
hoodie.datasource.write.keygenerator.class Y N/A
index.bootstrap.enabled N false
index.global.enabled N true
index.partition.regex N .*
index.state.ttl N 0.0
index.type N FLINK_STATE Allowed values: BUCKET, FLINK_STATE, BLOOM, GLOBAL_BLOOM, GLOBAL_SIMPLE, HBASE, INMEMORY, SIMPLE; the default is FLINK_STATE. For details see https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java or https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
metadata.enabled N false
metadata.compaction.delta_commits N 10
partition.default_name N __HIVE_DEFAULT_PARTITION__
payload.class N org.apache.hudi.common.model.EventTimeAvroPayload
precombine.field N ts
read.streaming.enabled N false
read.streaming.skip_compaction N false
read.streaming.skip_clustering N false
read.utc-timezone N true
record.merger.impls N org.apache.hudi.common.model.HoodieAvroRecordMerger
record.merger.strategy N eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
table.type N COPY_ON_WRITE Table type. Allowed values: COPY_ON_WRITE or MERGE_ON_READ
write.batch.size N 256.0
write.commit.ack.timeout N -1
write.ignore.failed N false
write.insert.cluster N false
write.log.max.size N 1024
write.log_block.size N 128
write.merge.max_memory N 100 Unit: MB
write.operation N upsert Write operation type. Allowed values include insert, upsert, and bulk_insert; the default is upsert
write.precombine N false Whether to drop duplicate records before insert and upsert
write.parquet.block.size N 120
write.rate.limit N 0 Number of records written per second. The default 0 means no limit
write.retry.interval.ms N 2000
write.retry.times N 3
write.sort.memory N 128 Unit: MB
write.task.max.size N 1024.0 Unit: MB
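
To show how the options in the table are actually supplied, here is a minimal Flink SQL sketch: options go into the WITH clause of the table definition, and read options can also be set per query through an OPTIONS hint. The table name hudi_orders, its columns, the file:///tmp path, and the start instant are hypothetical and chosen only for illustration. 'connector' = 'hudi' does not appear in the table above; it is the key Flink SQL uses to select the Hudi connector. Options listed with Y but no default (for example read.tasks and write.tasks) are left unset here, on the assumption that the connector falls back to its own parallelism settings; verify this against your Hudi version.

```sql
-- Hypothetical table: the name, columns, and path are illustrative only.
CREATE TABLE hudi_orders (
  uuid   STRING,
  amount DOUBLE,
  ts     TIMESTAMP(3),
  dt     STRING,
  PRIMARY KEY (uuid) NOT ENFORCED          -- used as the Hudi record key
)
PARTITIONED BY (dt)
WITH (
  'connector' = 'hudi',                          -- selects the Hudi connector
  'path' = 'file:///tmp/hudi/hudi_orders',       -- base path of the table
  'table.type' = 'MERGE_ON_READ',                -- overrides the COPY_ON_WRITE default
  'precombine.field' = 'ts',                     -- same as the default, shown for clarity
  'write.operation' = 'upsert',                  -- same as the default
  'compaction.async.enabled' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '5',
  'clean.policy' = 'KEEP_LATEST_COMMITS',
  'clean.retain_commits' = '30'
);

-- Streaming read with per-query overrides via a dynamic table options hint.
-- The start instant is a made-up example in yyyyMMddHHmmss form.
SELECT *
FROM hudi_orders
/*+ OPTIONS(
  'read.streaming.enabled' = 'true',
  'read.start-commit' = '20230508000000'
) */;
```

All option values are written as quoted strings, including numbers and booleans. Putting write-side settings in the DDL and read-side settings in the hint keeps one table definition usable for both batch and streaming queries, though the split is a matter of preference rather than a requirement.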