KingbaseES V8R6 集群运维案例 -- 集群备份到nfs共享存储初始化错误

发布时间 2023-09-20 14:20:26作者: KINGBASE研究院

案例说明:
在主备库建立nfs共享存储的文件系统,作为sys_rman备份的repo-path,在备库作为repo-path节点执行备份,出现数据库连接到'5432端口的错误',数据库实际的服务端口为54321。

适用版本:
KingbaseES V8R6

节点信息:

[kingbase@node102 bin]$ cat /etc/hosts
192.168.1.101   node101  # primary
192.168.1.102   node102  # standby
192.168.1.103   node103  # nfs server

一、配置NFS共享server

[root@node103 ~]# service nfs start
Redirecting to /bin/systemctl start  nfs.service

# 查看共享存储
[root@node103 ~]# cat /etc/exports
/shdata 192.168.1.101(rw,no_root_squash) 192.168.1.102(rw,no_root_squash)

[root@node103 ~]# exportfs -v
/shdata         192.168.1.101(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/shdata         192.168.1.102(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)

[root@node103 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
......
/dev/mapper/myvg-mylv     20G  223M   20G   2% /shdata

# 共享文件系统属主和权限
[root@node103 ~]# id kingbase
uid=200(kingbase) gid=1001(kingbase) groups=1001(kingbase)

[root@node103 ~]# ls -lhd /shdata
drwxr-x--- 7 kingbase kingbase 4.0K Jan 12 17:43 /shdata

二、nfs client挂载共享文件系统

1、主库配置

[root@node101 ~]# showmount -e 192.168.1.103
Export list for 192.168.1.103:
/shdata 192.168.1.102,192.168.1.101

[root@node101 ~]# mkdir -p /data/kingbase/bk
[root@node101 ~]# chown kingbase.kingbase /data/kingbase/bk
[root@node101 ~]# mount 192.168.1.103:/shdata /data/kingbase/bk
[root@node101 ~]# ls -lhd /data/kingbase/bk
drwxr-x--- 2 kingbase kingbase 6 Apr 13 11:28 /data/kingbase/bk

2、备库配置

[root@node102 ~]#  showmount -e 192.168.1.103
Export list for 192.168.1.103:
/shdata 192.168.1.102,192.168.1.101

[root@node102 ~]# mkdir -p /data/kingbase/bk
[root@node102 ~]# mount 192.168.1.103:/shdata /data/kingbase/bk

[root@node102 ~]# ls -lhd /data/kingbase/bk
drwxr-x--- 2 kingbase kingbase 6 Apr 13 11:28 /data/kingbase/bk
[root@node102 ~]# ls -lh /data/kingbase/bk
total 0

三、集群节点信息

[kingbase@node102 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                     
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 4        | 0 bytes | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | primary | * running |          | default  | 100      | 5        |         | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

四、配置sys_backup.conf
在备库执行备份,并将备份存储到nfs共享的文件系统下:

[kingbase@node102 bin]$ cat sys_backup.conf |grep -v ^$|grep -v ^#
_target_db_style="cluster"
_one_db_ip="192.168.1.101"
_repo_ip="192.168.1.102"
_stanza_name="kingbase"
_os_user_name="kingbase"
_repo_path="/data/kingbase/bk/c7_bk"
_repo_retention_full_count=5
_crond_full_days=7
_crond_diff_days=0
_crond_incr_days=1
_crond_full_hour=2
_crond_diff_hour=3
_crond_incr_hour=4
_band_width=0
_os_ip_cmd="/sbin/ip"
_os_rm_cmd="/bin/rm"
_os_sed_cmd="/bin/sed"
_os_grep_cmd="/bin/grep"
_single_data_dir="/home/kingbase/cluster/R6HA/ha7/kingbase/data"
_single_bin_dir="/home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/bin"
_single_db_user="system"
_single_db_port="54321"
_use_scmd=off
_start_fast=y
_compress_type=none
_non_archived_space=1024

五、执行sys_backup.sh init

[kingbase@node102 bin]$ ./sys_backup.sh init
# pre-condition: check the non-archived WAL files
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
ERROR: create stanza failed, check log file /home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/log/sys_rman_stanza-create.log

备份故障日志:

[kingbase@node102 bin]$ cat /home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/log/sys_rman_stanza-create.log
2023-04-13 11:41:50.915 P00   INFO: stanza-create command begin 2.27: --band-width=0 --config=/data/kingbase/bk/c7_bk/sys_rman.conf --exec-id=27638-be67f3b4 --log-level-console=info --log-level-file=info --log-path=/home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/log --log-subprocess --kb1-path=/home/kingbase/cluster/R6HA/ha7/kingbase/data --repo1-host=192.168.1.102 --repo1-host-config=/data/kingbase/bk/c7_bk/sys_rman.conf --repo1-host-user=kingbase --repo1-path=/data/kingbase/bk/c7_bk --stanza=kingbase
WARN: unable to check kb-1: [DbConnectError] unable to connect to 'dbname='test' port=5432': could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.KINGBASE.5432"?
ERROR: [056]: unable to find primary cluster - cannot proceed
2023-04-13 11:41:50.915 P00   INFO: stanza-create command end: aborted with exception [056]

六、解决方案

方案1:

Tips:
由于主备非repo节点,不存储备份,只存储sys_rman.conf文件,不占用磁盘空间,可以考虑在本地节点建立存储目录。
1、umount主库的nfs共享文件系统

[root@node101 ~]# umount /data/kingbase/bk
[root@node101 ~]# ls -lhd /data/kingbase/bk
drwxr-xr-x 3 kingbase kingbase 17 Mar 29 11:59 /data/kingbase/bk

2、执行备份初始化

[kingbase@node102 bin]$ ./sys_backup.sh init
# pre-condition: check the non-archived WAL files
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
# create stanza and check...DONE
# initial first full backup...(maybe several minutes)
# initial first full backup...DONE
# Initial sys_rman OK.
'sys_backup.sh start' should be executed when need back-rest feature.
'sys_backup.sh start' will add CRONTAB items.
Or you can manual backup once with user-guide.

方案2:

1)nfs server共享文件系统创建子目录

[kingbase@node103 ~]$ mkdir -p /shdata/bk{1,2}
[kingbase@node103 ~]$ ls -lh /shdata
total 0
drwxrwxr-x 2 kingbase kingbase  6 Apr 13 16:12 bk1
drwxrwxr-x 2 kingbase kingbase  6 Apr 13 16:12 bk2

2)nfs client挂载共享子目录

主库:

[root@node101 ~]# mount 192.168.1.103:/shdata/bk1 /data/kingbase/bk
[root@node101 ~]# df -h
.......
192.168.1.103:/shdata/bk1   20G  193M   20G   1% /data/kingbase/bk

备库:

[root@node102 ~]# mount 192.168.1.103:/shdata/bk2 /data/kingbase/bk
[root@node102 ~]# df -h
Filesystem                 Size  Used Avail Use% Mounted on
......
192.168.1.103:/shdata/bk2   20G   32M   20G   1% /data/kingbase/bk

3)备库执行sys_backup.sh init

[kingbase@node102 bin]$ ./sys_backup.sh init
# pre-condition: check the non-archived WAL files
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
# create stanza and check...DONE
# initial first full backup...(maybe several minutes)
# initial first full backup...DONE
# Initial sys_rman OK.
'sys_backup.sh start' should be executed when need back-rest feature.
'sys_backup.sh start' will add CRONTAB items.
Or you can manual backup once with user-guide.
You have mail in /var/spool/mail/kingbase

4)查看备份信息

Tips:
备库作为repo,存储了所有备份,主库只是存储了sys_rman.conf。

备库:

[root@node102 ~]# ls -lh /data/kingbase/bk
total 0
drwxrwxr-x 4 kingbase kingbase 53 Apr 13 16:15 c7_bk
[root@node102 ~]# ls -lh /data/kingbase/bk/c7_bk/
total 4.0K
drwxr-x--- 3 kingbase kingbase  21 Apr 13 16:15 archive
drwxr-x--- 3 kingbase kingbase  21 Apr 13 16:15 backup
-rw-rw-r-- 1 kingbase kingbase 850 Apr 13 16:15 sys_rman.conf

主库:

[root@node101 ~]#   ls -lh /data/kingbase/bk
total 0
drwxrwxr-x 2 kingbase kingbase 26 Apr 13 16:15 c7_bk
[root@node101 ~]#   ls -lh /data/kingbase/bk/c7_bk
total 4.0K
-rw-rw-r-- 1 kingbase kingbase 451 Apr 13 16:15 sys_rman.conf

七、总结
在集群环境下,如果备库作为repo-path以‘cluster’模式执行备份,将同时会在主备库都建立repo-path对应的目录,在主库的repo-path中写入sys_rman.conf,在备库的repo-path中存储备份和写入sys_rman.conf;
如果主备库使用是nfs共享存储,将在写入数据时出现问题,对nfs共享存储备份的支持还需要进一步支持和细化。