postgresql13.6流复制主从同步

发布时间 2023-12-22 14:51:02作者: xuege

安装好PG之后,如果给线上业务使用,至少得弄个主从,否则主库挂了,就可能导致业务长时间受影响。下面来看看怎么配置PG主从。

环境描述:

  • PG版本:13.6
  • 主库IP:172.16.103.225
  • 从库IP:172.16.103.226
  • 数据目录:/postgresql/pgdata

搭建主从(流复制)步骤:

  1. 在主库创建同步用户,并授权, (最好提前做好主备切换的权限配置,提前规划好pg_hba.conf, 以便后面主备切换)
  2. 在从库(下面也称standby)上安装好PG软件,安装跟主库一样(保持目录一致性),删除掉pgdata(数据目录)和archive(归档目录)下的文件,注意这里是操作从库的,不要干了主库!!
  3. 在从库上使用pg_basebackup命令去搭建同步
  4. 启动从库

具体操作命令

  1. 主库上创建用户,授权
postgres=# CREATE ROLE replica login replication encrypted password 'repl_6534';
CREATE ROLE
  1. 配置pg_hba.conf, 允许从库访问
$ vim pg_hba.conf 
host    replication     all             172.16.103.225/32         md5
host    replication     all             172.16.103.226/32         md5
postgres=# select pg_reload_conf();    # 重载一下配置文件
  1. 从库上执行pg_basebackup去搭建同步
$ pg_basebackup -h 172.16.103.225 -U replica -p 5432 -F p   -X s -v -P -R -D /postgresql/pgdata
Password: 
pg_basebackup: initiating base backup, waiting for checkpoint to complete
WARNING:  skipping special file "./.s.PGSQL.5432"
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/1B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_28111"
WARNING:  skipping special file "./.s.PGSQL.5432"pgdata/backup_label    )
32338/32338 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 0/1B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
  1. 启动从库
$ pg_ctl start -l /postgresql/pg_log/pg.log 
waiting for server to start.... done
server started

查看数据库同步信息

1. 操作系统上进程

主库上会多一个walsender的进程

$ ps -ef | grep postgres | grep send
pgsql     7854 18709  0 11:19 ?        00:00:00 postgres: walsender replica 172.16.103.226(12836) streaming 0/1C0003E0

从库上会有一个walreceiver的进程

$ ps -ef | grep postgres
pgsql    30471     1  0 11:19 ?        00:00:00 /postgresql/pg13/bin/postgres
pgsql    30472 30471  0 11:19 ?        00:00:00 postgres: startup recovering 00000001000000000000001C
pgsql    30473 30471  0 11:19 ?        00:00:00 postgres: checkpointer 
pgsql    30474 30471  0 11:19 ?        00:00:00 postgres: background writer 
pgsql    30475 30471  0 11:19 ?        00:00:00 postgres: stats collector 
pgsql    30476 30471  0 11:19 ?        00:00:00 postgres: walreceiver 

2. 数据库表中的信息

postgres=# SELECT pg_is_in_recovery();
主库上返回f, 从库上返回t。

在主库上,查询表pg_stat_replication 可以看到从库同步的状态

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
pid              | 7854
usesysid         | 16436
usename          | replica
application_name | walreceiver
client_addr      | 172.16.103.226
client_hostname  | 
client_port      | 12836
backend_start    | 2023-12-22 11:19:53.781303+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/1C0003E0
write_lsn        | 0/1C0003E0
flush_lsn        | 0/1C0003E0
replay_lsn       | 0/1C0003E0
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2023-12-22 11:24:24.29878+08

从库上可以查看 pg_stat_wal_receiver表查看同步状态

postgres=# select * from pg_stat_wal_receiver;
pid                   | 30476
status                | streaming
receive_start_lsn     | 0/1C000000
receive_start_tli     | 1
written_lsn           | 0/1C0003E0
flushed_lsn           | 0/1C000000
received_tli          | 1
last_msg_send_time    | 2023-12-22 11:32:25.206129+08
last_msg_receipt_time | 2023-12-22 11:32:25.206617+08
latest_end_lsn        | 0/1C0003E0
latest_end_time       | 2023-12-22 11:19:53.783906+08
slot_name             | 
sender_host           | 172.16.103.225
sender_port           | 5432
conninfo              | user=replica password=******** channel_binding=disable dbname=replication host=172.16.103.225 port=5432 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

需要注意的是,并不是说只有主库的pg_stat_replication有信息,从库一定没有,假如是一个级联的环境的话,那从库(它是某个从库的主库)也是有输出的。
pg_stat_replication表的几个字段的解释如下:

  • sent_lsn: 发送到standby的最后一个wal位置(还未write)。
  • write_lsn: 写到standby操作系统上的最后一个wal位置(还未flush)。
  • flush_lsn: 刷新到standby磁盘上的最后一个wal位置(到达disk)。
  • replay_lsn: standby上重放的最后的事务日志位置(此时用户可见)。
select pg_current_wal_lsn(); 

是当前的wal_lsn, 也就是可以说主备延迟的字节数可以用pg_current_wal_lsn() - replay_lsn 得到。

主备切换步骤

  1. 关闭主库
  2. 从库上执行 pg_ctl promote 提升为主库
  3. 应用配置重连新主库(原备库)【也可以提前规划好VIP, 把VIP切换到新主库】
  4. 原主库上创建standby.signal文件, 配置postgresql.auto.conf文件, 以同步新主库
  5. 启动原主库,原主库变成了新从库
    注意:异步情况下,如果主库宕机,切换从库的话,可能会有数据丢失。

主备切换具体操作命令

  1. 模拟主库宕机,直接停掉, 主库执行
$ pg_ctl stop -m immediate
waiting for server to shut down.... done
server stopped

此时,从库上执行select * from pg_stat_wal_receiver; 的话,已经没有输出。 日志会有同步报错信息。

  1. 从库上执行
$ pg_ctl promote  
waiting for server to promote.... done
server promoted

查看数据库日志的话,可以看到以下信息

2023-12-22 11:45:44.835 CST [30472] LOG:  received promote request
2023-12-22 11:45:44.835 CST [30472] LOG:  redo done at 0/1C0003A8
2023-12-22 11:45:44.837 CST [30472] LOG:  selected new timeline ID: 2
2023-12-22 11:45:44.883 CST [30472] LOG:  archive recovery complete
2023-12-22 11:45:44.889 CST [30471] LOG:  database system is ready to accept connections

原从库上的standby.signal文件也已经没了。
3. 原主库上创建standby.signal文件,把它变成新主库的从库

$ cd /postgresql/pgdata            # 这个是我DB的数据目录,上面所说的postgresql.conf/postgresql.auto.conf/pg_hba.conf 均在该目录下
$ touch standby.signal
  1. 参考原来的从库的postgresql.auto.conf文件,配置老主库从新从库
$ vim postgresql.auto.conf 
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=repl_6534 channel_binding=disable host=172.16.103.226 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
  1. 启动原主库(新从库)
$ pg_ctl start
$ psql
psql (13.6)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
pid                   | 5776
status                | streaming
receive_start_lsn     | 0/1C000000
receive_start_tli     | 2
written_lsn           | 0/1C0026C8
flushed_lsn           | 0/1C000000
received_tli          | 2
last_msg_send_time    | 2023-12-22 14:21:05.444004+08
last_msg_receipt_time | 2023-12-22 14:21:05.444451+08
latest_end_lsn        | 0/1C0026C8
latest_end_time       | 2023-12-22 14:20:35.378922+08
slot_name             | 
sender_host           | 172.16.103.226
sender_port           | 5432
conninfo              | user=replica password=******** channel_binding=disable dbname=replication host=172.16.103.226 port=5432 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

postgres=#