OLAP Series, Part 5: Backup Methods for the Analytical Database ClickHouse

Published 2023-07-24 18:53:42  Author: 杨梅冲

1. Common Backup Methods

1.1 Backup methods

Backup method        Characteristics
Physical file copy   Copy the data files directly; data writes must be stopped for the duration of the backup
dump import/export   Flexible, but slow
Snapshot table       Create a _bak copy of the table
FREEZE               Per-partition backup of a table (partitioned or not); restore by ATTACH-ing the partitions
FETCH                Per-partition backup for ReplicatedMergeTree tables; restore by ATTACH-ing the partitions
Metadata backup      Back up the CREATE DATABASE / CREATE TABLE statements
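The FREEZE and FETCH rows above correspond to ALTER TABLE commands. A minimal sketch (the table name reuses test_log.test_local from later in this article; the partition id and ZooKeeper path are made-up examples):

```sql
-- Freeze one partition: ClickHouse hard-links its parts into
-- ${datadir}/shadow/<N>/ without blocking reads or writes.
ALTER TABLE test_log.test_local FREEZE PARTITION 202307;

-- For Replicated*MergeTree tables, FETCH downloads a partition from
-- another replica into the table's detached/ directory:
ALTER TABLE test_log.test_local
    FETCH PARTITION 202307 FROM '/clickhouse/tables/01/test_local';
```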


1.2 Recovery methods

Recovery method   Characteristics
attach            Load backed-up partition files back into a table; the files must first be placed in the table's detached directory
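The attach flow can be sketched as follows (database/table names follow the examples later in this article; the partition id and part name are hypothetical):

```sql
-- 1. Copy the backed-up part directories into
--    ${datadir}/data/test_log/test_local/detached/
-- 2. Re-attach them, either a whole partition:
ALTER TABLE test_log.test_local ATTACH PARTITION 202307;
-- or a single part by its directory name:
ALTER TABLE test_log.test_local ATTACH PART 'all_1_1_0';
```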

2. Cold Backup

Copying the physical files also achieves a backup, but note that data writes must be avoided while the copy is in progress.

1. A physical-file backup has two parts:
(1) Table data: under ${datadir}/data you can back up a single database, table, or partition directory.
(2) Metadata: under ${datadir}/metadata you can back up the CREATE statements for a database or table.

2. Backing up the database files
Copy the complete ClickHouse data directory; its path is the value of the <path> setting in config.xml.
Restoring is just as simple: replace the data directory with the backed-up copy.
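The copy-and-restore steps above can be sketched as a small script (a sketch with illustrative paths; stop the server first so no parts change mid-copy):

```shell
#!/bin/bash
# Hedged sketch of the cold backup described above.
# Run with the server stopped:  systemctl stop clickhouse-server

cold_backup() {
  local data_dir="$1"   # value of <path> in config.xml, e.g. /data/clickhouse/clickhouse
  local backup_dir="$2" # destination, e.g. /backup/clickhouse_20230724
  mkdir -p "$backup_dir"
  # -a preserves ownership and permissions, which ClickHouse expects on restore
  cp -a "$data_dir/data" "$data_dir/metadata" "$backup_dir"/
}

# Restore is the reverse: copy the two directories back and fix ownership:
# cp -a "$backup_dir/data" "$backup_dir/metadata" "$data_dir"/
# chown -R clickhouse:clickhouse "$data_dir"
# systemctl start clickhouse-server
```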

3. Hot Backup

 :) show databases;

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
│ test_log           │
└────────────────────┘
 :) use test_log;
 :) show tables;

┌─name───────┐
│ test_all   │
│ test_local │
│ user_all   │
└────────────┘
 :) select * from test_all;
# Data on server 192.168.12.91: test_local
┌─id─┬─name─────┐
│  1 │ zhangsan │
│  2 │ lisi     │
│  7 │ yw       │
│  8 │ xc       │
└────┴──────────┘
# Data on server 192.168.12.90: test_local
┌─id─┬─name───┐
│  3 │ wangm  │
│  4 │ lijing │
│  9 │ cx     │
│ 10 │ aa     │
│ 13 │ kkkk   │
└────┴────────┘
# Data on server 192.168.12.88: test_local
┌─id─┬─name──────┐
│  5 │ zhangquan │
│  6 │ lihua     │
│ 11 │ bb        │
│ 12 │ acca      │
└────┴───────────┘

3.1 Data Import/Export

3.1.1 Export the chosen data to a TSV file with a query. This backup method is flexible, and the data can be restored through any client tool.

Export to a file:

# Export:
[root@ ~]# clickhouse-client -h 192.168.12.90 --port 9000 --user=default --password='<password>' -d test_log --query="select * from test_local" > test_local.tsv

# Import:
cat test_local.tsv | clickhouse-client --host 192.168.12.91 --port 9000 --user=default --password='<password>' -d test_log --query='INSERT INTO test_local FORMAT TSV'
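clickhouse-client can also drive the export and import from SQL itself, as a sketch (the file name is an example, and FROM INFILE needs a reasonably recent client version):

```sql
-- Run from the client against 192.168.12.90:
SELECT * FROM test_log.test_local
INTO OUTFILE 'test_local.tsv' FORMAT TSV;

-- Run from the client against 192.168.12.91:
INSERT INTO test_log.test_local
FROM INFILE 'test_local.tsv' FORMAT TSV;
```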

4. Backup Tool: clickhouse-backup

Download: https://github.com/Altinity/clickhouse-backup/releases

Its main features:
1. Easily create and restore backups of all or specific tables
2. Efficient storage of multiple backups on the filesystem
3. Upload and download with streaming compression
4. Support for incremental backups on remote storage
5. Works with AWS, Azure, GCS, Tencent COS and FTP

# Install
tar -xzvf clickhouse-backup-linux-amd64.tar.gz

mv build/linux/amd64/clickhouse-backup /usr/bin/
mkdir /etc/clickhouse-backup
chown -R clickhouse:clickhouse /etc/clickhouse-backup

Configuration:

# Print the default configuration
clickhouse-backup default-config

vim /etc/clickhouse-backup/config.xml
general:
  remote_storage: none  # default is none; set to sftp to upload backups to a remote server over SFTP
  disable_progress_bar: false
  backups_to_keep_local: 7  # number of local backups to keep; older ones beyond 7 are deleted automatically (default 0 = keep all)
  backups_to_keep_remote: 7 # number of remote backups to keep
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: default
  password: "PeQLg45tJcxZk"
  host: 192.168.12.91
  port: 9000
  data_path: "/data/clickhouse/clickhouse/" # ClickHouse data directory
  skip_tables: # databases/tables to exclude from backups
  - system.*
  - default.*
  - INFORMATION_SCHEMA.*
  - information_schema.*
  timeout: 5m
  freeze_by_part: false
# Create a backup
[root@ backup]# clickhouse-backup create my_backup --config /etc/clickhouse-backup/config.xml 
2023/07/24 18:00:32.259794  info clickhouse connection prepared: tcp://192.168.12.91:9000 run ping logger=clickhouse
2023/07/24 18:00:32.260590  info clickhouse connection open: tcp://192.168.12.91:9000 logger=clickhouse
2023/07/24 18:00:32.260619  info SELECT metadata_path FROM system.tables WHERE database = 'system' AND metadata_path!='' LIMIT 1; logger=clickhouse
2023/07/24 18:00:32.263036  info SELECT name, engine FROM system.databases WHERE name NOT IN ('system','INFORMATION_SCHEMA','information_schema','_temporary_and_external_tables','default') logger=clickhouse
2023/07/24 18:00:32.264230  info SHOW CREATE DATABASE `pet_battle_dev` logger=clickhouse
2023/07/24 18:00:32.264811  info SHOW CREATE DATABASE `pet_battle_test` logger=clickhouse
2023/07/24 18:00:32.265394  info SHOW CREATE DATABASE `test02` logger=clickhouse
2023/07/24 18:00:32.266068  info SELECT name, count(*) as is_present FROM system.settings WHERE name IN (?, ?) GROUP BY name with args [show_table_uuid_in_table_create_query_if_not_nil display_secrets_in_show_and_select] logger=clickhouse
2023/07/24 18:00:32.267830  info SELECT name FROM system.databases WHERE engine IN ('MySQL','PostgreSQL','MaterializedPostgreSQL') logger=clickhouse
2023/07/24 18:00:32.271753  info    SELECT     countIf(name='data_path') is_data_path_present,     countIf(name='data_paths') is_data_paths_present,     countIf(name='uuid') is_uuid_present,     countIf(name='create_table_query') is_create_table_query_present,     countIf(name='total_bytes') is_total_bytes_present    FROM system.columns WHERE database='system' AND table='tables'   logger=clickhouse
2023/07/24 18:00:32.273981  info SELECT database, name, engine , data_paths , uuid , create_table_query , coalesce(total_bytes, 0) AS total_bytes   FROM system.tables WHERE is_temporary = 0 ORDER BY total_bytes DESC SETTINGS show_table_uuid_in_table_create_query_if_not_nil=1 logger=clickhouse
2023/07/24 18:00:32.289898  info SELECT metadata_path FROM system.tables WHERE database = 'system' AND metadata_path!='' LIMIT 1; logger=clickhouse
2023/07/24 18:00:32.291693  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='pet_battle_dev' AND table='test_tbl' GROUP BY database, table logger=clickhouse
2023/07/24 18:00:32.293367  info SELECT count() as cnt FROM system.columns WHERE database='system' AND table='functions' AND name='create_query' SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2023/07/24 18:00:32.294880  info SELECT name, create_query FROM system.functions WHERE create_query!='' logger=clickhouse
2023/07/24 18:00:32.296301  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2023/07/24 18:00:32.297124  info SELECT count() is_disk_type_present FROM system.columns WHERE database='system' AND table='disks' AND name='type' logger=clickhouse
2023/07/24 18:00:32.298539  info SELECT path, any(name) AS name, any(type) AS type FROM system.disks GROUP BY path logger=clickhouse
2023/07/24 18:00:32.299913  info SELECT count() is_parts_column_present FROM system.tables WHERE database='system' AND name='parts_columns' logger=clickhouse
2023/07/24 18:00:32.301299  info SELECT column, groupUniqArray(type) AS uniq_types FROM system.parts_columns WHERE active AND database=? AND table=? GROUP BY column HAVING length(uniq_types) > 1 with args [test02 test_local] logger=clickhouse
2023/07/24 18:00:32.303138  info ALTER TABLE `test02`.`test_local` FREEZE WITH NAME '2aac621edb25476096bd0b6abd3d0d51'; logger=clickhouse
2023/07/24 18:00:32.306048  info ALTER TABLE `test02`.`test_local` UNFREEZE WITH NAME '2aac621edb25476096bd0b6abd3d0d51' logger=clickhouse
2023/07/24 18:00:32.306607  info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [test02 test_local] logger=clickhouse
2023/07/24 18:00:32.308310  info done                      backup=my_backup logger=backuper operation=create table=test02.test_local
2023/07/24 18:00:32.308351  info SELECT column, groupUniqArray(type) AS uniq_types FROM system.parts_columns WHERE active AND database=? AND table=? GROUP BY column HAVING length(uniq_types) > 1 with args [pet_battle_dev test_tbl] logger=clickhouse
2023/07/24 18:00:32.310104  info ALTER TABLE `pet_battle_dev`.`test_tbl` FREEZE WITH NAME '3a4112e17f994030b5c1b929fcbc3398'; logger=clickhouse
2023/07/24 18:00:32.311419  info ALTER TABLE `pet_battle_dev`.`test_tbl` UNFREEZE WITH NAME '3a4112e17f994030b5c1b929fcbc3398' logger=clickhouse
2023/07/24 18:00:32.311824  info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [pet_battle_dev test_tbl] logger=clickhouse
2023/07/24 18:00:32.313312  info done                      backup=my_backup logger=backuper operation=create table=pet_battle_dev.test_tbl
2023/07/24 18:00:32.313472  warn supports only schema backup backup=my_backup engine=Distributed logger=backuper operation=create table=test02.test_all
2023/07/24 18:00:32.313500  info SELECT mutation_id, command FROM system.mutations WHERE is_done=0 AND database=? AND table=? with args [test02 test_all] logger=clickhouse
2023/07/24 18:00:32.314993  info done                      backup=my_backup logger=backuper operation=create table=test02.test_all
2023/07/24 18:00:32.315022  info SELECT value FROM `system`.`build_options` where name='VERSION_DESCRIBE' logger=clickhouse
2023/07/24 18:00:32.316068  info done                      backup=my_backup duration=56ms logger=backuper operation=create

# create takes a backup name; if omitted, the name defaults to the UTC timestamp, e.g. 2023-07-24T10-21-13
clickhouse-backup create --config /etc/clickhouse-backup/config.xml

# By default, backups are written to the backup directory under the data directory
/data/clickhouse/clickhouse/backup/

# List existing backups
clickhouse-backup --config /etc/clickhouse-backup/config.xml list

# Restore a backup
clickhouse-backup restore my_backup --config /etc/clickhouse-backup/config.xml

# List the tables that can be backed up
clickhouse-backup --config /etc/clickhouse-backup/config.xml tables
# Scheduled backup script

#!/bin/bash
BACKUP_NAME="backup_$(date +%Y%m%d%H)"

/usr/bin/clickhouse-backup create $BACKUP_NAME --config /etc/clickhouse-backup/config.xml
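As a variation on the script above, local retention can also be handled in the script itself (a sketch; directory names and the crontab line are examples, and clickhouse-backup can do the same pruning on its own via backups_to_keep_local):

```shell
#!/bin/bash
# Sketch: daily clickhouse-backup run with simple local retention.
BACKUP_NAME="backup_$(date +%Y%m%d%H)"
KEEP=7

# /usr/bin/clickhouse-backup create "$BACKUP_NAME" --config /etc/clickhouse-backup/config.xml

# Remove local backups beyond the newest $KEEP (sorted by modification time)
prune_old() {
  local dir="$1"
  ls -1t "$dir" | tail -n +$((KEEP + 1)) | while read -r old; do
    rm -rf "${dir:?}/${old}"
  done
}
# prune_old /data/clickhouse/clickhouse/backup

# Example crontab entry (runs daily at 02:00; script path is hypothetical):
# 0 2 * * * /usr/local/bin/ch_backup.sh >> /var/log/ch_backup.log 2>&1
```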