elasticdump

Published: 2023-07-20 21:27:54  Author: 普里莫

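Installing elasticdump on Linux

Commands are run on host elk01; for its environment, see the elk01 machine in the ELK setup guide.

# 1. Install elasticdump with npm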
[root@elk01 ~]# npm install elasticdump
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated s3signed@0.1.0: This module is no longer maintained. It is provided as is.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated querystring@0.2.0: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142

added 129 packages in 6s


# 2. Add the elasticdump bin directory to PATH
[root@elk01 bin]# pwd
/root/node_modules/elasticdump/bin
[root@elk01 bin]# ll
total 28
-rwxr-xr-x 1 root root  4560 Jul 20 17:11 elasticdump
-rwxr-xr-x 1 root root 18586 Jul 20 17:11 multielasticdump
[root@elk01 bin]# vim /etc/profile.d/esdump.sh
PATH="/root/node_modules/elasticdump/bin:$PATH"

[root@elk01 bin]# source /etc/profile
PATH=/app/node/bin:/root/node_modules/elasticdump/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
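
The PATH line above is prepended every time the profile script is sourced, so repeated sourcing keeps growing PATH. A minimal sketch of an idempotent variant, assuming the install directory used in this post; the function name path_prepend is made up for illustration:

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed, so sourcing the profile script twice is harmless.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /root/node_modules/elasticdump/bin
export PATH

# Show where the shell finds the tool (prints nothing if it is not installed).
command -v elasticdump || true
```

Placed in /etc/profile.d/esdump.sh instead of the plain assignment, this should give the same result on first login.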

elasticdump options

elasticdump
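--input=<input>: the source to read from, either an Elasticsearch URL (optionally ending in the index name) or a file path. Required.
--output=<output>: the destination to write to, either an Elasticsearch URL or a file path. Required.
--input-index=<index> / --output-index=<index>: the index to read from or write to, as an alternative to putting it in the URL.
--type=<type>: what to export, for example data, mapping, settings, analyzer, or template. Defaults to data.
--searchBody=<query>: a query body that restricts which documents are exported.
--scrollTime=<time>: how long Elasticsearch keeps each scroll context alive. Defaults to 10m.
--limit=<n>: how many objects are moved per batch, not a cap on the total. Defaults to 100; --limit=10000 moves 10000 documents per batch.
--size=<n>: how many objects to export in total. Defaults to -1 (no limit).
--concurrency=<n>: the maximum number of concurrent requests. Defaults to 1.
--overwrite: overwrite the output file if it already exists. Off by default.
--debug: show the Elasticsearch requests being made.
--quiet: suppress all messages except errors.
--help: show the help text.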
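
To show how several options combine in one call, here is a small sketch of a filtered export. It assumes the --searchBody and --limit flags as documented in the elasticdump README; the wrapper name export_index, the batch size of 1000, and the DRY_RUN switch are illustrative choices, not part of elasticdump:

```shell
# Hypothetical wrapper: export one index to a file, optionally filtered by a query.
# With DRY_RUN=1 the command is printed instead of executed.
# (Plain sh has no local variables, so the names below are global.)
export_index() {
    es_url=$1       # e.g. http://10.0.0.81:9200
    index=$2        # e.g. 666
    out_file=$3     # e.g. /tmp/666.json
    query=${4-}     # optional search body as a JSON string
    [ -n "$query" ] || query='{"query":{"match_all":{}}}'

    # Rebuild the positional parameters as the command line to run.
    set -- elasticdump \
        --input="$es_url/$index" \
        --output="$out_file" \
        --type=data \
        --limit=1000 \
        --searchBody="$query"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed without shell quoting, so this is for inspection only.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

DRY_RUN=1 export_index http://10.0.0.81:9200 666 /tmp/666.json
```

Dropping DRY_RUN=1 runs the export for real; a fourth argument replaces the match_all query.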

Export

# Export the contents of index 666 to a file
[root@elk01 ~]# elasticdump --input=http://10.0.0.81:9200/666 --output=/tmp/666.txt
Thu, 20 Jul 2023 09:24:44 GMT | starting dump
(node:1331) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Thu, 20 Jul 2023 09:24:44 GMT | got 3 objects from source elasticsearch (offset: 0)
Thu, 20 Jul 2023 09:24:44 GMT | sent 3 objects to destination file, wrote 3
Thu, 20 Jul 2023 09:24:44 GMT | got 0 objects from source elasticsearch (offset: 3)
Thu, 20 Jul 2023 09:24:44 GMT | Total Writes: 3
Thu, 20 Jul 2023 09:24:44 GMT | dump complete

[root@elk01 ~]# ll /tmp/
total 4
-rw-r--r-- 1 root          root          333 Jul 20 17:24 666.txt
[root@elk01 ~]# cat /tmp/666.txt 
{"_index":"666","_type":"search","_id":"AYlthusg7SvMLS3628g0","_score":1,"_source":{"query":{"match_all":{}}}}
{"_index":"666","_type":"search","_id":"AYlthyuh7SvMLS3628g1","_score":1,"_source":{"query":{"match_all":{}}}}
{"_index":"666","_type":"search","_id":"AYlthy3p7SvMLS3628g2","_score":1,"_source":{"query":{"match_all":{}}}}
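
The exported file holds one JSON document per line, so it can be sanity-checked with standard text tools. A rough sketch, assuming only grep, sed, and sort are available; the helper name dump_summary is hypothetical:

```shell
# Hypothetical helper: summarise an elasticdump data file,
# which holds one JSON document per line.
dump_summary() {
    file=$1
    # Every non-empty line is one exported document.
    docs=$(grep -c . "$file" || true)
    printf 'documents: %s\n' "$docs"
    # Extract the "_index" value from each line and list the distinct names.
    sed -n 's/.*"_index":"\([^"]*\)".*/\1/p' "$file" | sort -u | sed 's/^/index: /'
}

# Demo on a copy of the three lines shown above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
{"_index":"666","_type":"search","_id":"AYlthusg7SvMLS3628g0","_score":1,"_source":{"query":{"match_all":{}}}}
{"_index":"666","_type":"search","_id":"AYlthyuh7SvMLS3628g1","_score":1,"_source":{"query":{"match_all":{}}}}
{"_index":"666","_type":"search","_id":"AYlthy3p7SvMLS3628g2","_score":1,"_source":{"query":{"match_all":{}}}}
EOF
dump_summary "$demo"    # prints "documents: 3" then "index: 666"
rm -f "$demo"
```

The document count should equal the "Total Writes" figure in the elasticdump log.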

Import

  • On import, the target index can keep its original name or be given a different one.
[root@elk01 tmp]# elasticdump --input=/tmp/666.txt --output=http://10.0.0.81:9200/tomcat_66
Thu, 20 Jul 2023 09:48:02 GMT | starting dump
(node:1453) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Thu, 20 Jul 2023 09:48:02 GMT | got 3 objects from source file (offset: 0)
Thu, 20 Jul 2023 09:48:03 GMT | sent 3 objects to destination elasticsearch, wrote 3
Thu, 20 Jul 2023 09:48:03 GMT | got 0 objects from source file (offset: 3)
Thu, 20 Jul 2023 09:48:03 GMT | Total Writes: 3
Thu, 20 Jul 2023 09:48:03 GMT | dump complete

Copying an index from one Elasticsearch instance to another

# Copy the data; mapping and analyzer are copied the same way by changing --type
./elasticdump --input=http://ip1:9200/test_index1 --output=http://ip2:9200/test_index2 --type=data --limit=50000
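
Because each --type needs its own run, a full copy can be sketched as a loop. The order analyzer, mapping, data follows the example in the elasticdump README; the function name copy_index and the DRY_RUN switch are assumptions added for illustration:

```shell
# Hypothetical helper: copy one index between clusters, one type per run.
# Set DRY_RUN=1 to print the commands instead of running them.
copy_index() {
    src=$1   # e.g. http://ip1:9200/test_index1
    dst=$2   # e.g. http://ip2:9200/test_index2
    # analyzer and mapping go first so documents land in a configured index.
    for t in analyzer mapping data; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "elasticdump --input=$src --output=$dst --type=$t"
        else
            elasticdump --input="$src" --output="$dst" --type="$t" || return 1
        fi
    done
}

DRY_RUN=1 copy_index http://ip1:9200/test_index1 http://ip2:9200/test_index2
```

The loop stops at the first failing step, so data is never loaded into an index whose mapping failed to copy.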

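Using multielasticdump

Options

multielasticdump: a tool that backs up and restores several indices at once by forking one elasticdump process per index.
--direction=dump: run a backup (dump) operation.
--match='^ttt-.*$': a regular expression matched against index names; here only indices whose names start with "ttt-" are backed up.
--input=http://127.0.0.1:9200: the address of the Elasticsearch node to read from.
--ignoreType='mapping,settings,template': skip the mapping, settings, and template of each index.
--output=/tmp/es_backup: the directory the backup files are written to.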
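
Before running a dump it can help to check which index names a --match pattern would select. The sketch below uses grep -E as a stand-in; multielasticdump is a Node.js tool and presumably evaluates the pattern as a JavaScript regex, so only simple patterns like the ones in this post can be expected to behave the same. The helper name index_matches and the third sample name are made up:

```shell
# Hypothetical helper: succeed if index name $2 matches pattern $1.
# grep -E approximates the regex engine multielasticdump uses.
index_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

for name in ttt-2023.07.13 tomcat_log-2023.07.20 my-ttt-index; do
    if index_matches '^ttt-.*$' "$name"; then
        echo "$name: would be dumped"
    else
        echo "$name: skipped"
    fi
done
```

Only ttt-2023.07.13 passes here, because the pattern is anchored with ^ and requires the literal prefix "ttt-".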

Export

[root@elk01 tmp]# multielasticdump --direction=dump --match='^ttt-.*$' --input=http://127.0.0.1:9200 --ignoreType='mapping,settings,template'  --output=/tmp/es_backup
Thu, 20 Jul 2023 11:54:43 GMT | We are performing : dump
Thu, 20 Jul 2023 11:54:43 GMT | options: {"debug":true,"parallel":1,"match":"^ttt-.*$","order":"asc","input":"http://127.0.0.1:9200","output":"/tmp/es_backup","scrollId":null,"scrollTime":"10m","scroll-with-post":false,"timeout":null,"limit":100,"offset":0,"size":-1,"direction":"dump","support-big-int":false,"big-int-fields":"","ignoreAnalyzer":true,"ignoreChildError":false,"ignoreData":false,"ignoreMapping":true,"ignoreSettings":true,"ignoreTemplate":true,"ignoreAlias":true,"ignoreIndex":true,"ignoreType":"mapping,settings,template","includeType":null,"interval":1000,"delete":false,"prefix":"","suffix":"","transform":null,"headers":null,"searchBody":null,"searchWithTemplate":null,"cert":null,"key":null,"pass":null,"ca":null,"tlsAuth":false,"input-cert":null,"input-key":null,"input-pass":null,"input-ca":null,"output-cert":null,"output-key":null,"output-pass":null,"output-ca":null,"httpAuthFile":null,"concurrency":1,"carryoverConcurrencyCount":true,"intervalCap":5,"concurrencyInterval":5000,"overwrite":false,"fsCompress":false,"awsChain":false,"awsAccessKeyId":null,"awsSecretAccessKey":null,"awsIniFileProfile":null,"awsService":null,"awsRegion":null,"awsUrlRegex":null,"s3AccessKeyId":null,"s3SecretAccessKey":null,"s3Region":null,"s3Endpoint":null,"s3SSLEnabled":true,"s3ForcePathStyle":false,"s3Compress":false,"s3ServerSideEncryption":null,"s3SSEKMSKeyId":null,"s3ACL":null,"quiet":false}
Thu, 20 Jul 2023 11:54:43 GMT [debug] | GET /_aliases
(node:1747) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Thu, 20 Jul 2023 11:54:43 GMT [debug] | GET /_aliases -> 200 OK
Thu, 20 Jul 2023 11:54:43 GMT | dumping http://127.0.0.1:9200/ttt-2023.07.13 to /tmp/es_backup/ttt-2023.07.13.json
Thu, 20 Jul 2023 11:54:43 GMT [debug] | fork: /root/node_modules/elasticdump/bin/elasticdump --type=data,--input=http://127.0.0.1:9200/ttt-2023.07.13,--output=/tmp/es_backup/ttt-2023.07.13.json,--scrollId=null,--scrollTime=10m,--limit=100,--offset=0,--size=-1,--searchBody=null,--searchWithTemplate=null,--support-big-int=false,--big-int-fields=,--headers=null,--cert=null,--key=null,--pass=null,--ca=null,--tlsAuth=false,--input-cert=null,--input-key=null,--input-pass=null,--input-ca=null,--output-cert=null,--output-key=null,--output-pass=null,--output-ca=null,--httpAuthFile=null,--concurrency=1,--carryoverConcurrencyCount=true,--intervalCap=5,--concurrencyInterval=5000,--overwrite=false,--fsCompress=false,--awsChain=false,--awsAccessKeyId=null,--awsSecretAccessKey=null,--awsIniFileProfile=null,--awsService=null,--awsRegion=null,--awsUrlRegex=null,--s3AccessKeyId=null,--s3SecretAccessKey=null,--s3Region=null,--s3Endpoint=null,--s3SSLEnabled=true,--s3ForcePathStyle=false,--s3Compress=false,--s3ServerSideEncryption=null,--s3SSEKMSKeyId=null,--s3ACL=null,--quiet=false,--prefix=,--suffix=,--scroll-with-post=false
Thu, 20 Jul 2023 11:54:44 GMT | starting dump
(node:1754) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Thu, 20 Jul 2023 11:54:44 GMT | got 1 objects from source elasticsearch (offset: 0)
Thu, 20 Jul 2023 11:54:44 GMT | sent 1 objects to destination file, wrote 1
Thu, 20 Jul 2023 11:54:44 GMT | got 0 objects from source elasticsearch (offset: 1)
Thu, 20 Jul 2023 11:54:44 GMT | Total Writes: 1
Thu, 20 Jul 2023 11:54:44 GMT | dump complete
Thu, 20 Jul 2023 11:54:44 GMT |  dumping all done 
Thu, 20 Jul 2023 11:54:44 GMT |  bye 

[root@elk01 ~]# ll /tmp/es_backup/
total 4
-rw-r--r-- 1 root root 122 Jul 20 19:54 ttt-2023.07.13.json

Backing up ES indices and all their types to a directory

multielasticdump --direction=dump --match='^.*$' --input=http://127.0.0.1:9200 --output=/tmp/666

--ignoreType='mapping,settings,template'

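With --ignoreType='mapping,settings,template', multielasticdump skips the mapping, settings, and template files of each index. Without the option, as in the run below, all of them are written next to the data file.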

multielasticdump --direction=dump --match='^tomcat_log-.*$' --input=http://127.0.0.1:9200 --output=/tmp/es_backup
[root@elk01 es_backup]# ll
total 16
-rw-r--r-- 1 root root 4067 Jul 20 20:25 tomcat_log-2023.07.20.json
-rw-r--r-- 1 root root 1305 Jul 20 20:26 tomcat_log-2023.07.20.mapping.json
-rw-r--r-- 1 root root  115 Jul 20 20:26 tomcat_log-2023.07.20.settings.json
-rw-r--r-- 1 root root 1034 Jul 20 20:26 tomcat_log-2023.07.20.template.json

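tomcat_log-2023.07.20.json: the data file, containing the actual documents to be loaded into Elasticsearch.
tomcat_log-2023.07.20.mapping.json: the mapping file, defining how the index's fields are mapped, including field types and analyzers.
tomcat_log-2023.07.20.settings.json: the settings file, containing index-level configuration such as the number of shards and replicas.
tomcat_log-2023.07.20.template.json: the template file, defining the template used when the index is created.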
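
The naming scheme above makes it straightforward to tell which index and which kind of content a backup file belongs to. A small sketch, assuming only the four suffixes listed here; the helper name classify_backup_file is hypothetical, and an index whose own name ends in ".mapping" or similar would be misclassified:

```shell
# Hypothetical helper: split a backup file name into "<index> <kind>".
classify_backup_file() {
    base=${1##*/}          # drop any directory part
    base=${base%.json}     # drop the .json extension
    case "$base" in
        *.mapping)  echo "${base%.mapping} mapping" ;;
        *.settings) echo "${base%.settings} settings" ;;
        *.template) echo "${base%.template} template" ;;
        *)          echo "$base data" ;;
    esac
}

for f in tomcat_log-2023.07.20.json tomcat_log-2023.07.20.mapping.json; do
    classify_backup_file "$f"
done
# prints:
#   tomcat_log-2023.07.20 data
#   tomcat_log-2023.07.20 mapping
```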

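Restoring multiple indices with multielasticdump

multielasticdump --direction=load --input=/tmp/es_backup --output=http://127.0.0.1:9200
multielasticdump: a tool that backs up and restores several indices at once by forking one elasticdump process per index.
--direction=load: run a restore (load) operation.
--input=/tmp/es_backup: the directory containing the backup files.
--output=http://127.0.0.1:9200: the address of the Elasticsearch node to write to.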

Adding a scheduled job

# Back up the logs from seven days ago to /tmp/es_backup
vim /root/es.sh
#!/bin/bash
datetime=$(date '+%Y.%m.%d' -d '-7 day')
/root/node_modules/elasticdump/bin/multielasticdump --direction=dump --match="^.*-${datetime}$" --input=http://127.0.0.1:9200 --output=/tmp/es_backup

# Add it to crontab so the backup runs every day at 02:00
[root@elk01 ~]# crontab -e
00 02 * * * /usr/bin/sh /root/es.sh
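
In the script above the dots of the date end up in the --match pattern unescaped, where a regex treats each one as "any character". That is harmless for typical index names, but a stricter pattern can be built as sketched below; the helper name pattern_for_date is an assumption for illustration:

```shell
# Hypothetical helper: build the --match pattern for one day's indices.
# Dots are escaped so "2023.07.13" cannot match something like "2023x07x13".
pattern_for_date() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf '^.*-%s$\n' "$escaped"
}

pattern_for_date 2023.07.13    # prints ^.*-2023\.07\.13$
```

In es.sh this would be used as --match="$(pattern_for_date "$datetime")". Separately, cron usually runs jobs with a minimal PATH, so if node lives in a non-standard directory such as /app/node/bin, the script may need to extend PATH itself before calling multielasticdump.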