Repairing incomplete Ceph PGs

Published 2023-04-24 17:18:07, author: XU-NING

1. List the incomplete PGs

ceph health detail | grep incomplete

pg 2.ef is incomplete, acting [10,9,4]

pg 2.a9 is incomplete, acting [10,4,3]

pg 2.a7 is incomplete, acting [10,0,4]

pg 3.99 is incomplete, acting [10,1,4]
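With several incomplete PGs it helps to turn this output into a worklist. The sketch below is a hypothetical helper (the name list_incomplete_pgs is ours, not part of Ceph) that assumes health lines look exactly like the ones above and prints each PG ID, its primary OSD (the first OSD in the acting set) and the full acting set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ceph health detail`-style text on stdin and print
# "<pgid> <primary-osd> <acting-set>" for every incomplete PG.
# Assumes lines of the form "pg 2.ef is incomplete, acting [10,9,4]";
# trailing text after the closing bracket is ignored.
list_incomplete_pgs() {
  sed -n 's/^[[:space:]]*pg \([0-9]*\.[0-9a-f]*\) is incomplete, acting \[\([0-9,]*\)\].*/\1 \2/p' |
    while read -r pgid acting; do
      # The primary OSD is the first entry of the acting set.
      echo "$pgid ${acting%%,*} $acting"
    done
}
```

Usage: ceph health detail | list_incomplete_pgs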

2. Save the PG query output

ceph pg 2.ef query > 2.ef.query
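To keep a record for every affected PG, not just 2.ef, the same command can be looped. This is a minimal sketch; save_pg_queries is a hypothetical name and it simply writes <pgid>.query into the current directory, as in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save `ceph pg <pgid> query` for each PG ID passed as an
# argument into <pgid>.query in the current directory.
save_pg_queries() {
  local pgid
  for pgid in "$@"; do
    if ceph pg "$pgid" query > "$pgid.query"; then
      echo "saved $pgid.query"
    else
      echo "query failed for $pgid" >&2
      return 1
    fi
  done
}
```

Usage: save_pg_queries 2.ef 2.a9 2.a7 3.99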

3. Check the PG size on each OSD in the acting set

du -sh /var/lib/ceph/osd/ceph-<id>/current/<pg.id>_head

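If the PG is 0 bytes on every OSD in the acting set, look up probing_osds in the ceph pg 2.ef query output to find other OSDs that may still hold a copy, and check the PG size on those OSDs as well.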
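To compare copies on one host, a small loop over the OSD data directories can rank them by size. This sketch assumes the FileStore layout used throughout this post (<base>/ceph-<id>/current/<pgid>_head, base defaulting to /var/lib/ceph/osd); pg_sizes is a hypothetical name. It has to be run on each node that hosts a candidate OSD.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size in KiB of <pgid>_head under every OSD
# data directory on this host, largest first.
# Assumes the FileStore layout <base>/ceph-<id>/current/<pgid>_head.
pg_sizes() {
  local pgid=$1 base=${2:-/var/lib/ceph/osd} dir
  for dir in "$base"/ceph-*/current/"$pgid"_head; do
    [ -d "$dir" ] || continue   # skip when the glob matched nothing
    du -sk "$dir"
  done | sort -rn
}
```

Usage: pg_sizes 2.ef. The first line is the largest local copy, which is the usual candidate for the export in step 4.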

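4. Export the problematic PG from the OSD that holds complete data. "Complete" is only a manual judgment here: pick the copy with the most data. The example below assumes the copy on osd.10 is complete. ceph-objectstore-tool needs exclusive access to the OSD's store, so that OSD must be stopped before exporting.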

General form:

ceph-objectstore-tool --data-path <osd-data-path> --journal-path <osd-data-path>/journal --pgid <pg.id> --op export --file <pg.id>.export

For PG 2.ef on osd.10:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10/ --journal-path /var/lib/ceph/osd/ceph-10/journal --pgid 2.ef --op export --file 2.ef.export
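Because running ceph-objectstore-tool against a live OSD is a common mistake, the export can be wrapped with a guard. This is a sketch under assumptions: export_pg is a hypothetical name, OSDs run as systemd units named ceph-osd@<id>, and the OSD uses the FileStore layout with the journal at <data-path>/journal, as in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: export <pgid> from osd.<id> to <pgid>.export.
# Assumes a systemd unit ceph-osd@<id> and a FileStore OSD with its journal
# at <base>/ceph-<id>/journal; base defaults to /var/lib/ceph/osd.
export_pg() {
  local osd=$1 pgid=$2 base=${3:-/var/lib/ceph/osd}
  local data=$base/ceph-$osd
  # ceph-objectstore-tool needs exclusive access, so refuse if the OSD runs.
  if systemctl is-active --quiet "ceph-osd@$osd"; then
    echo "osd.$osd is still running; stop it first" >&2
    return 1
  fi
  ceph-objectstore-tool --data-path "$data" --journal-path "$data/journal" \
    --pgid "$pgid" --op export --file "$pgid.export"
}
```

Usage: export_pg 10 2.ef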

5. Set the cluster flags

ceph osd set noout

ceph osd set pause

ceph osd set norebalance

ceph osd set norecover
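The flags can also be set in one call sequence that stops at the first failure. A minimal sketch; set_repair_flags is a hypothetical name and the flag list is exactly the one above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set the maintenance flags from step 5, stopping at the
# first command that fails.
set_repair_flags() {
  local flag
  for flag in noout pause norebalance norecover; do
    ceph osd set "$flag" || return 1
  done
}
```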

6. Stop the affected OSDs
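On systemd-based deployments this is usually systemctl stop ceph-osd@<id>. The sketch below assumes that unit naming (other deployment tools name their units differently) and checks that each unit is really inactive before moving on; stop_osds is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop each given OSD and confirm its unit is inactive.
# Assumes systemd units named ceph-osd@<id>.
stop_osds() {
  local id
  for id in "$@"; do
    systemctl stop "ceph-osd@$id" || return 1
    if systemctl is-active --quiet "ceph-osd@$id"; then
      echo "osd.$id is still active" >&2
      return 1
    fi
  done
}
```

Usage: stop_osds 10 9 4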

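7. Move the problematic PG directories out of the affected OSDs. For example, the 2.ef directory on osd.9 is 0 bytes; move out every such problem directory.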

mv /var/lib/ceph/osd/ceph-9/current/2.ef_* /home/admin/
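When several OSDs hold a bad copy of the same PG, moving them all into one directory can collide (each OSD has its own 2.ef_head). The sketch below keeps a separate backup subdirectory per OSD instead of deleting anything. It assumes the FileStore current/ layout used above; move_pg_dirs and the osd.<id> subdirectory naming are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move every <pgid>_* entry out of osd.<id>'s FileStore
# current/ directory into <backup>/osd.<id>/ (one subdirectory per OSD so
# copies from different OSDs do not collide). base defaults to /var/lib/ceph/osd.
move_pg_dirs() {
  local osd=$1 pgid=$2 backup=$3 base=${4:-/var/lib/ceph/osd}
  local dest=$backup/osd.$osd dir moved=0
  mkdir -p "$dest" || return 1
  for dir in "$base/ceph-$osd/current/$pgid"_*; do
    [ -e "$dir" ] || continue   # glob matched nothing
    mv "$dir" "$dest/" || return 1
    moved=$((moved + 1))
  done
  echo "moved $moved entries for pg $pgid from osd.$osd to $dest"
}
```

Usage: move_pg_dirs 9 2.ef /home/admin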

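8. Import the PG exported in step 4 into each OSD whose copy was moved out in step 7 (the commands below use PG 2.88 and osd.4 as an example; substitute your own PG ID and OSD)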

scp 2.88.export <target-node>:/root/

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --journal-path /var/lib/ceph/osd/ceph-4/journal --pgid 2.88 --op import --file 2.88.export
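The import only makes sense once the OSD is stopped and step 7 has removed the old PG directories. This sketch checks both before calling the tool; import_pg is a hypothetical name, and it assumes systemd units named ceph-osd@<id> and the same FileStore layout as the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: import an exported PG file into osd.<id>.
# Assumes a systemd unit ceph-osd@<id> and a FileStore OSD
# (<base>/ceph-<id>/current, journal at <base>/ceph-<id>/journal).
import_pg() {
  local osd=$1 pgid=$2 file=$3 base=${4:-/var/lib/ceph/osd}
  local data=$base/ceph-$osd
  # The OSD must be stopped while ceph-objectstore-tool works on its store.
  if systemctl is-active --quiet "ceph-osd@$osd"; then
    echo "osd.$osd is still running; stop it first" >&2
    return 1
  fi
  # Step 7 should already have moved the old <pgid>_* directories away.
  if compgen -G "$data/current/${pgid}_*" > /dev/null; then
    echo "osd.$osd still has $pgid directories; move them out first" >&2
    return 1
  fi
  ceph-objectstore-tool --data-path "$data" --journal-path "$data/journal" \
    --pgid "$pgid" --op import --file "$file"
}
```

Usage: import_pg 4 2.88 /root/2.88.export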

9. Mark the PG complete (on the primary OSD, i.e. the first OSD in the acting set)

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10/ --journal-path /var/lib/ceph/osd/ceph-10/journal --pgid 2.ef --op mark-complete

10. Start the OSDs and clear the flags
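A possible way to script this last step, assuming systemd units named ceph-osd@<id>: start every OSD that was stopped, then unset the four flags set in step 5. finish_repair is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the given OSDs (systemd units ceph-osd@<id>),
# then clear the flags set in step 5.
finish_repair() {
  local id flag
  for id in "$@"; do
    systemctl start "ceph-osd@$id" || return 1
  done
  for flag in norecover norebalance pause noout; do
    ceph osd unset "$flag" || return 1
  done
}
```

Usage: finish_repair 10 9 4, then check ceph health detail again to confirm the PGs are no longer incomplete.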