KingbaseES V8R6 Cluster Operations Series -- Switching Communication from ssh to sys_securecmdd

Published: 2023-06-06 15:38:02  Author: KINGBASE研究院

I. Applicability:

This document applies to KingbaseES V008R006.

II. About SYS_SECURECMDD:

sys_securecmdd is a tool bundled with the KingbaseES cluster. When monitoring and managing the cluster, commands are executed securely through sys_securecmdd instead of the ssh service.

sys_securecmdd mainly consists of the following files (a sample client invocation follows the list):

Server side:
    sys_securecmdd     - server daemon; listens on port 8890 by default and accepts client connections.
    sys_secureftp      - invoked by the server; used to receive files.
    sys_HAscmdd.sh     - script that manages the server.
Client:
    sys_securecmd      - client; used to connect to the server.
Key files:
    accept_hosts       - trust file for passwordless access.
    key_file           - private key file.
Other files:
    securecmdd_config  - server configuration file.
    securecmd_config   - client configuration file.
    sys_HAscmdd.conf   - configuration file for the management script.
    securecmdd.service - service template file; the server can use it to register as a service.
Library dependencies:
    libcrypto.so.10    - built against openssl; the library files used at build time are bundled so the tool runs in different environments.
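
For reference, the client is invoked much like ssh. A minimal sketch (the user and host names are illustrative; the -p option is the same one used later in scmd_options, and 8890 is the default port):

# Run a command on a remote node through sys_securecmdd instead of ssh
# (adjust the port if you change scmd_port)
./sys_securecmd -p 8890 kingbase@node2 hostname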

sys_HAscmdd.conf is the configuration file of sys_securecmdd. Its parameters are described below; an illustrative example of the file follows.

start_method  - how the sys_securecmdd process is started and kept highly available.
                systemd: default for general-purpose-machine clusters; starts sys_securecmdd through the service facility.
                crontab: default for dedicated-machine clusters; starts sys_securecmdd periodically through the crond service.
                Allowed values: crontab, systemd. Default: crontab.
scmd_port     - port that the sys_securecmdd process listens on. After changing it, re-initialize with the sys_HAscmdd.sh script.
                Type: INT. Default: 8890.
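
For illustration, a sys_HAscmdd.conf using the defaults might look like the following. The key=value form matches the scmd_port example used later in this document; the comments are added here for explanation, and the exact contents of the shipped file may differ:

# Illustrative sys_HAscmdd.conf contents
# start_method: crontab or systemd
start_method=crontab
# scmd_port: re-run sys_HAscmdd.sh init after changing this
scmd_port=8890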

III. Installing and Deploying the SYS_SECURECMDD Service:

Do not stop the database while installing and deploying the SYS_SECURECMDD service.

1. Deploy the SYS_SECURECMDD service:

1.1 Check whether the server firewall is enabled:

Perform this on all nodes:

systemctl status firewalld.service
# If the Active state is running, the firewall is enabled.
Active: active (running)

# If the firewall is enabled, add the corresponding rules
# 10046 is the port reserved for the sys_securecmdd service
firewall-cmd --permanent --add-port=10046/tcp
firewall-cmd --permanent --add-port=10046/udp
firewall-cmd --reload

# After adding the rules, use the following command to confirm they took effect
# If the added ports appear in the output, the rules are in effect
firewall-cmd --list-ports
54321/tcp 54321/udp 10046/tcp 10046/udp

1.2 Upload securecmdd.zip to all cluster nodes:

# Default path of the zip package
../V008R006C007B0012/ClientTools/guitools/DeployTools/zip/Lin64/

$ ls -l
total 2260
drwxrwxr-x.  3 kes_v8r6c7b12 kes_v8r6c7b12      83 Feb 27 16:04 cluster
-rw-r--r--.  1 kes_v8r6c7b12 kes_v8r6c7b12 2115099 Mar  1 14:22 securecmdd.zip

# scp securecmdd.zip to the node2 node
$ scp securecmdd.zip kes_v8r6c7b12@node2:~
The authenticity of host 'node2 (192.168.10.43)' can't be established.
securecmdd.zip                             100% 2066KB  14.8MB/s   00:00    

1.3 Unpack securecmdd.zip and install securecmdd:

Perform the following on all cluster nodes:

# Unpack the securecmdd.zip package
$ unzip securecmdd.zip 
Archive:  securecmdd.zip
   creating: securecmdd/
   creating: securecmdd/lib/
  inflating: securecmdd/lib/libcrypto.so.10  
  inflating: securecmdd/lib/libssl.so.10  
   creating: securecmdd/bin/
  inflating: securecmdd/bin/sys_securecmd  
  inflating: securecmdd/bin/sys_secureftp  
  inflating: securecmdd/bin/sys_HAscmdd.sh  
  inflating: securecmdd/bin/sys_securecmdd  
   creating: securecmdd/share/
  inflating: securecmdd/share/sys_HAscmdd.conf  
  inflating: securecmdd/share/key_file  
  inflating: securecmdd/share/securecmdd_config  
  inflating: securecmdd/share/securecmdd.service  
  inflating: securecmdd/share/securecmd_config  
  inflating: securecmdd/share/accept_hosts

# Edit sys_HAscmdd.conf to change the default port from 8890 to 10046
# The sys_HAscmdd.conf configuration file is in the securecmdd/share directory
vi sys_HAscmdd.conf
scmd_port=10046
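
If you prefer a non-interactive edit (for example in a deployment script), a sed call like the following can replace the vi step. It assumes the file already contains a scmd_port line and that you are in the directory where the zip was unpacked:

# Replace the scmd_port value in place (assumes an existing scmd_port line)
sed -i 's/^scmd_port=.*/scmd_port=10046/' securecmdd/share/sys_HAscmdd.conf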

# Run sys_HAscmdd.sh init to initialize
# If the following error appears, switch to the root user and run it again
$ sys_HAscmdd.sh init
Only execute by root, current user is kes_v8r6c7b12

# Successful init
 ./sys_HAscmdd.sh init
successfully initialized the sys_securecmdd, please use "./sys_HAscmdd.sh start" to start the sys_securecmdd

# Start with ./sys_HAscmdd.sh start
 ./sys_HAscmdd.sh start
Created symlink /etc/systemd/system/multi-user.target.wants/securecmdd.service → /etc/systemd/system/securecmdd.service.

# Check that the service started correctly
 systemctl status securecmdd
● securecmdd.service - KingbaseES - sys_securecmdd daemon
   Loaded: loaded (/etc/systemd/system/securecmdd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-03-01 14:55:36 CST; 12s ago
 Main PID: 39535 (sys_securecmdd)
    Tasks: 1 (limit: 12498)
   Memory: 668.0K
   CGroup: /system.slice/securecmdd.service
           └─39535 sys_securecmdd: /home/kes_v8r6c7b12/securecmdd/bin/sys_securecmdd -f /etc/.kes/securecmdd_config [listener] 0 of 128-256 startups

Mar 01 14:55:36 node1 systemd[1]: Started KingbaseES - sys_securecmdd daemon.

# Test that connections work
./sys_securecmd kes_v8r6c7b12@192.168.10.43 date
Wed Mar  1 15:02:29 CST 2023
./sys_securecmd kes_v8r6c7b12@192.168.10.40 date
Wed Mar  1 15:02:38 CST 2023

Once the tests pass, the installation of securecmdd is complete.
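
Optionally, you can also confirm that the daemon is listening on the configured port (10046 in this example). A quick check, assuming the ss utility is available:

# Verify sys_securecmdd is listening on the configured port
ss -tlnp | grep 10046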

After sys_securecmdd is installed, a .es directory is created in the home directories of the root and kingbase (database) users, containing the following files (the files in the .es directory should not be modified):

key_file is the private key file of the sys_securecmdd service.

accept_hosts is the key file of the sys_securecmdd service (used for mutual trust between cluster nodes).

# root user's home directory
[root@node2 ~]# ls -l .es/
total 8
-rw------- 1 root root  381 Mar  3 14:07 accept_hosts
-rw------- 1 root root 1675 Mar  3 14:07 key_file

# database user's home directory
[root@node2 ~]# ls -l /home/kes_v8r6c7b12/.es/
total 8
-rw------- 1 kes_v8r6c7b12 kes_v8r6c7b12  381 Mar  3 14:07 accept_hosts
-rw------- 1 kes_v8r6c7b12 kes_v8r6c7b12 1675 Mar  3 14:07 key_file
[root@node2 ~]# 

Modifying the accept_hosts file breaks mutual trust between the cluster nodes. To recover after trust is broken:

Perform the following on all nodes:

# Stop the sys_securecmdd service
./sys_HAscmdd.sh stop
# Re-initialize the sys_securecmdd service
./sys_HAscmdd.sh init
# Start the sys_securecmdd service
./sys_HAscmdd.sh start

# Test node connectivity
[root@node2 bin]# ./sys_securecmd root@node1 date
Fri Mar  3 14:07:33 CST 2023
[root@node2 bin]# ./sys_securecmd root@node2 date
Fri Mar  3 14:07:36 CST 2023
[root@node2 bin]# ./sys_securecmd root@192.168.10.40 date
Fri Mar  3 14:07:45 CST 2023
[root@node2 bin]# ./sys_securecmd root@192.168.10.43 date
Fri Mar  3 14:07:48 CST 2023

IV. Switching the Database Cluster to SYS_SECURECMDD Communication

1. Modify the repmgr.conf configuration file to use SYS_SECURECMDD communication:

Perform the following on all cluster nodes:

# In repmgr.conf, change use_scmd=off to use_scmd=on
# use_scmd=off: do not use SYS_SECURECMDD; communicate through the system SSH. use_scmd=on: use SYS_SECURECMDD
use_scmd=on
# Change the port in scmd_options to match scmd_port=10046 in sys_HAscmdd.conf
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 10046'
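
Before restarting, it can be worth confirming both parameters on every node. A simple check (the repmgr.conf path below is taken from the restart log in the next step and may differ in your environment):

# Confirm the switch-related settings in repmgr.conf on each node
grep -E '^(use_scmd|scmd_options)' /home/kes_v8r6c7b12/cluster/kingbase/etc/repmgr.conf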

2. After the changes are complete, restart the database cluster with sys_monitor.sh.

[kes_v8r6c7b12@node1 ~]$ sys_monitor.sh restart
2023-03-03 14:19:38 Ready to stop all DB ...
Service process "node_export" was killed at process 9578
Service process "postgres_ex" was killed at process 9579
Service process "node_export" was killed at process 1344
Service process "postgres_ex" was killed at process 1345
2023-03-03 14:19:42 begin to stop repmgrd on "[192.168.10.40]".
2023-03-03 14:19:42 repmgrd on "[192.168.10.40]" stop success.
2023-03-03 14:19:42 begin to stop repmgrd on "[192.168.10.43]".
2023-03-03 14:19:43 repmgrd on "[192.168.10.43]" stop success.
2023-03-03 14:19:43 begin to stop DB on "[192.168.10.43]".
waiting for server to shut down.... done
server stopped
2023-03-03 14:19:43 DB on "[192.168.10.43]" stop success.
2023-03-03 14:19:43 begin to stop DB on "[192.168.10.40]".
waiting for server to shut down.... done
server stopped
2023-03-03 14:19:43 DB on "[192.168.10.40]" stop success.
2023-03-03 14:19:44 Done.
2023-03-03 14:19:44 Ready to start all DB ...
2023-03-03 14:19:44 begin to start DB on "[192.168.10.40]".
waiting for server to start.... done
server started
2023-03-03 14:19:44 execute to start DB on "[192.168.10.40]" success, connect to check it.
2023-03-03 14:19:45 DB on "[192.168.10.40]" start success.
2023-03-03 14:19:45 Try to ping trusted_servers on host 192.168.10.40 ...
2023-03-03 14:19:48 Try to ping trusted_servers on host 192.168.10.43 ...
2023-03-03 14:19:50 begin to start DB on "[192.168.10.43]".
waiting for server to start.... done
server started
2023-03-03 14:19:51 execute to start DB on "[192.168.10.43]" success, connect to check it.
2023-03-03 14:19:52 DB on "[192.168.10.43]" start success.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.10.40 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.10.43 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2023-03-03 14:19:52 The primary DB is started.
2023-03-03 14:19:52 begin to start repmgrd on "[192.168.10.40]".
[2023-03-03 14:19:53] [NOTICE] using provided configuration file "/home/kes_v8r6c7b12/cluster/kingbase/etc/repmgr.conf"
[2023-03-03 14:19:53] [NOTICE] redirecting logging output to "/home/kes_v8r6c7b12/cluster/kingbase/log/hamgr.log"

2023-03-03 14:19:54 repmgrd on "[192.168.10.40]" start success.
2023-03-03 14:19:54 begin to start repmgrd on "[192.168.10.43]".
[2023-03-03 14:19:55] [NOTICE] using provided configuration file "/home/kes_v8r6c7b12/cluster/kingbase/etc/repmgr.conf"
[2023-03-03 14:19:55] [NOTICE] redirecting logging output to "/home/kes_v8r6c7b12/cluster/kingbase/log/hamgr.log"

2023-03-03 14:19:56 repmgrd on "[192.168.10.43]" start success.
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 10436 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 5932  | no      | 1 second(s) ago    
[2023-03-03 14:19:58] [NOTICE] redirecting logging output to "/home/kes_v8r6c7b12/cluster/kingbase/log/kbha.log"

[2023-03-03 14:20:00] [NOTICE] redirecting logging output to "/home/kes_v8r6c7b12/cluster/kingbase/log/kbha.log"

2023-03-03 14:20:01 Done.
[kes_v8r6c7b12@node1 ~]$ 

This completes the switch of the cluster communication.

V. Verifying that the Switch Succeeded:

Use a KingbaseES backup to verify which communication service the cluster is using:

Run the following command and watch the backup output:

sh -x sys_backup.sh init
# If output like the following appears, the cluster communication service was switched successfully
sys_securecmd -q -n -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey
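
As an additional rough check, you can watch for sys_securecmd client processes while cluster management commands (such as the backup above) are running:

# sys_securecmd processes appearing here indicate the cluster tools are using sys_securecmdd
ps -ef | grep '[s]ys_securecmd'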