搭建pg-ha集群-526互联

[1] https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#5_Minutes_to_Simple_Replication
[2] https://wiki.postgresql.org/wiki/Hot_Standby
[3] postgresql96 internal remark, HP
[4] https://www.postgresql.org/docs/13/runtime-config-replication.html, 配置全在这

hot standby

本节参考[2].

primary:

initdb -D /mnt/sdb/pg13data/12345

# vi postgresql.conf
port = 12345
wal_level = hot_standby # In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted but mapped to replica.  -- [PostgreSQL: Documentation: 14: 20.5. Write Ahead Log](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL)
archive_mode = on
archive_command = 'cp -i %p /mnt/sdb/pg13data/12345-archive/%f' # -i, 如果真有重复会怎样todo
max_wal_senders = 3

mkdir /mnt/sdb/pg13data/12345-archive
pg_ctl -D /mnt/sdb/pg13data/12345 -l /mnt/sdb/pg13data/12345.log start

standby, 在同一机器:

# https://www.postgresql.org/docs/13/app-pgbasebackup.html
pg_basebackup -D /mnt/sdb/pg13data/12345-standby -w -R --wal-method=stream --dbname="host=127.0.0.1 user=wang port=12345"

# vi postgresql.conf
hot_standby = on
port = 12346
# standby_mode = 'on' # The standby_mode setting has been removed. A standby.signal file in the data directory is used instead. See Standby Server Operation for details. -- [PostgreSQL: Documentation: 13: O.1. recovery.conf file merged into postgresql.conf](https://www.postgresql.org/docs/13/recovery-config.html)
restore_command = 'cp -i /mnt/sdb/pg13data/12345-archive/%f %p'

pg_ctl -D /mnt/sdb/pg13data/12345-standby/ -l /mnt/sdb/pg13data/12345-standby.log start

psql -p 12346 postgres
postgres=# select datid,pid,application_name,client_addr,wait_event_type,wait_event,state,query from pg_stat_activity where application_name <> 'psql';
 datid |  pid  | application_name | client_addr | wait_event_type |    wait_event     | state | query 
-------+-------+------------------+-------------+-----------------+-------------------+-------+-------
       | 15927 |                  |             | Activity        | RecoveryWalStream |       | 
       | 15932 |                  |             | Activity        | BgWriterHibernate |       | 
       | 15930 |                  |             | Activity        | CheckpointerMain  |       | 
       | 15937 |                  |             | Activity        | WalReceiverMain   |       | 
(4 rows)

standby log:

2022-02-09 17:50:59.764 CST [14490] LOG:  database system is shut down
2022-02-09 17:51:06.168 CST [15314] LOG:  starting PostgreSQL 13.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-02-09 17:51:06.169 CST [15314] LOG:  listening on IPv6 address "::1", port 12346
2022-02-09 17:51:06.169 CST [15314] LOG:  listening on IPv4 address "127.0.0.1", port 12346
2022-02-09 17:51:06.176 CST [15314] LOG:  listening on Unix socket "/tmp/.s.PGSQL.12346"
2022-02-09 17:51:06.183 CST [15315] LOG:  database system was shut down in recovery at 2022-02-09 17:50:59 CST
cp: cannot stat ‘/mnt/sdb/pg13data/12345-archive/00000002.history’: No such file or directory
2022-02-09 17:51:06.186 CST [15315] LOG:  entering standby mode
2022-02-09 17:51:06.198 CST [15315] LOG:  restored log file "000000010000000000000004" from archive
2022-02-09 17:51:06.236 CST [15315] LOG:  consistent recovery state reached at 0/40000A0
2022-02-09 17:51:06.236 CST [15315] LOG:  redo starts at 0/40000A0
2022-02-09 17:51:06.237 CST [15314] LOG:  database system is ready to accept read only connections
2022-02-09 17:51:06.248 CST [15315] LOG:  restored log file "000000010000000000000005" from archive
cp: cannot stat ‘/mnt/sdb/pg13data/12345-archive/000000010000000000000006’: No such file or directory
2022-02-09 17:51:06.270 CST [15315] LOG:  invalid record length at 0/6000060: wanted 24, got 0
2022-02-09 17:51:06.274 CST [15326] LOG:  started streaming WAL from primary at 0/6000000 on timeline 1

cascading replication

pg_basebackup -D /mnt/sdb/pg13data/12345-standby-standby -w -R --wal-method=stream --dbname="host=127.0.0.1 user=fang port=12346"

# vi 12345-standby-standby/postgresql.conf
hot_standby = on
port = 12347
restore_command = 'cp -i /mnt/sdb/pg13data/12345-archive/%f %p'
primary_slot_name = ''

pg_ctl -D /mnt/sdb/pg13data/12345-standby-standby/ -l /mnt/sdb/pg13data/12345-standby-standby.log start

-- primary
psql -p 12345 postgres
select * from pg_stat_replication;
create table t1234(id int);
insert into t1234 values(1),(2);

-- standby12346
psql -p 12346 postgres
select * from pg_stat_replication;

-- standby of standby12346
psql -p 12347 postgres
select * from t1234;

warm standby

Hot Standby is identical to Warm Standby, except that the Standby is available to run read-only queries. This offers all of the advantages of Warm Standby, plus the ability to distribute some business workload to the Standby server(s). Hot Standby by itself requires Log Shipping.[1]

standby设置:

hot_standby = off

这个选项需要重启standby. reload不管用.
log:

2022-04-19 11:07:15.187 CST [30472] LOG:  received SIGHUP, reloading configuration files
2022-04-19 11:07:15.189 CST [30472] LOG:  parameter "hot_standby" cannot be changed without restarting the server

warm standby无法连接, 提示错误:

$ psql -p 12346 postgres
psql: error: FATAL:  the database system is starting up

测试standby不在线一段时间后, 再启动standby

先把standby停止.

primary设置, 并reload, 否则会等待standby确认:

synchronous_commit = on         # synchronization level;
synchronous_standby_names = '' # standby servers that provide sync rep

pg_ctl -D 12345 -l 12345.log reload
pgbench -p12345 -i -s10 postgres
pgbench -p12345 -c2 -j1 -T180 -P1 postgres

可以看到有wal被archive了

$ ll 12345-archive/ | wc
    402    3611   28501
$ ll 12345-archive/ | wc
    403    3620   28572
$ ll 12345-archive/ | wc
    406    3647   28785

启动standby: pg_ctl -D 12345-standby -l 12345-standby.log restart
standby.log

2022-04-19 11:28:31.250 CST [30877] LOG:  listening on Unix socket "/tmp/.s.PGSQL.12346"
2022-04-19 11:28:31.267 CST [30878] LOG:  database system was shut down in recovery at 2022-04-19 11:17:11 CST
cp: cannot stat ‘/mnt/sdb/pg13data/12345-archive/00000002.history’: No such file or directory
2022-04-19 11:28:31.283 CST [30878] LOG:  entering standby mode
2022-04-19 11:28:31.314 CST [30878] LOG:  restored log file "000000010000000100000082" from archive
2022-04-19 11:28:31.332 CST [30878] DEBUG:  got WAL segment from archive
2022-04-19 11:28:31.332 CST [30878] DEBUG:  checkpoint record is at 1/82000028
2022-04-19 11:28:31.332 CST [30878] DEBUG:  redo record is at 1/82000028; shutdown true
2022-04-19 11:28:31.333 CST [30878] DEBUG:  next transaction ID: 61904; next OID: 29111
2022-04-19 11:28:31.333 CST [30878] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2022-04-19 11:28:31.333 CST [30878] DEBUG:  oldest unfrozen transaction ID: 478, in database 1
2022-04-19 11:28:31.333 CST [30878] DEBUG:  oldest MultiXactId: 1, in database 1
2022-04-19 11:28:31.333 CST [30878] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2022-04-19 11:28:31.333 CST [30878] DEBUG:  transaction ID wrap limit is 2147484125, limited by database with OID 1
2022-04-19 11:28:31.333 CST [30878] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-04-19 11:28:31.333 CST [30878] DEBUG:  starting up replication slots
2022-04-19 11:28:31.340 CST [30878] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2022-04-19 11:28:31.341 CST [30878] LOG:  consistent recovery state reached at 1/820000A0
2022-04-19 11:28:31.341 CST [30878] LOG:  redo starts at 1/820000A0
2022-04-19 11:28:31.367 CST [30878] LOG:  restored log file "000000010000000100000083" from archive
2022-04-19 11:28:31.385 CST [30878] DEBUG:  got WAL segment from archive
2022-04-19 11:28:31.385 CST [30878] DEBUG:  transaction ID wrap limit is 2147484125, limited by database with OID 1
2022-04-19 11:28:31.385 CST [30878] CONTEXT:  WAL redo at 1/83000028 for XLOG/CHECKPOINT_SHUTDOWN: redo 1/83000028; tli 1; prev tli 1; fpw true; xid 0:61904; oid 29111; multi 1; offset 0; oldest xid 478 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
2022-04-19 11:28:31.415 CST [30878] LOG:  restored log file "000000010000000100000084" from archive
2022-04-19 11:28:31.434 CST [30878] DEBUG:  got WAL segment from archive
2022-04-19 11:28:31.434 CST [30878] DEBUG:  transaction ID wrap limit is 2147484125, limited by database with OID 1
2022-04-19 11:28:31.434 CST [30878] CONTEXT:  WAL redo at 1/84000028 for XLOG/CHECKPOINT_SHUTDOWN: redo 1/84000028; tli 1; prev tli 1; fpw true; xid 0:61904; oid 29111; multi 1; offset 0; oldest xid 478 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
2022-04-19 11:28:31.602 CST [30878] LOG:  restored log file "000000010000000100000085" from archive
2022-04-19 11:28:31.613 CST [30878] DEBUG:  got WAL segment from archive
...此处省略N个restored log file...
2022-04-19 11:28:34.070 CST [30878] LOG:  restored log file "000000010000000100000094" from archive
2022-04-19 11:28:34.098 CST [30878] DEBUG:  got WAL segment from archive
cp: cannot stat ‘/mnt/sdb/pg13data/12345-archive/000000010000000100000095’: No such file or directory
2022-04-19 11:28:34.364 CST [30906] LOG:  started streaming WAL from primary at 1/95000000 on timeline 1
2022-04-19 11:33:31.603 CST [30882] DEBUG:  performing replication slot checkpoint

restore_command, archive_cleanup_command

如果主备不在同一天机器, 应该使用scp[3,p250]等命令. 或其他方式可以让standby访问primary的archive.
primary:

-archive_command = 'cp -i %p /mnt/sdb/pg13data/12345-archive/%f'
+archive_command = 'scp %p vm101:/mnt/sdb/pg13data/12345-archive/%f'

standby:

restore_command = 'cp -i /mnt/sdb/pg13data/12345-archive/%f %p'
+archive_cleanup_command = 'pg_archivecleanup /mnt/sdb/pg13data/12345-archive %r' # command to execute at every restartpoint

standby启动后, 会先找archive, 再找pg_wal, 再启用流复制.
archive由standby来清理. 如果有多个standby, 不能这么做, 因为各个standby同步进度不一样.

pg_archivecleanup is designed to be used as an archive_cleanup_command to clean up WAL file archives when running as a standby server (see Section 26.2).
-- PostgreSQL: Documentation: 13: pg_archivecleanup

The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations.
-- PostgreSQL: Documentation: 13: 26.2. Log-Shipping Standby Servers

replication slot

https://www.postgresql.org/docs/13/view-pg-replication-slots.html
https://www.postgresql.org/docs/13/warm-standby.html#STREAMING-REPLICATION-SLOTS

primary创建slot:

postgres=# SELECT * FROM pg_create_physical_replication_slot('slot1');
 slot_name | lsn 
-----------+-----
 slot1     | 
(1 row)

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 slot1     |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |

standby配置, reload:

primary_slot_name = 'slot1'

primary:

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
 slot1     |        | physical  |        |          | f         | t      |       5254 |      |              | 1/BC890AF0  |                     | reserved   |

使用slot,在stadnby不在线时, wal不会被archive

-rw-------. 1 wang wang 16777216 Apr 20 14:35 0000000100000001000000DD
-rw-------. 1 wang wang 16777216 Apr 20 14:36 0000000100000001000000DE
-rw-------. 1 wang wang 16777216 Apr 20 14:36 0000000100000001000000DF
-rw-------. 1 wang wang 16777216 Apr 20 14:37 0000000100000001000000E0
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000E1
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000E2
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000E3
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000E4
drwx------. 2 wang wang     4096 Apr 20 14:39 archive_status
-rw-------. 1 wang wang 16777216 Apr 20 14:42 0000000100000001000000E5
$ ll -lrt 12345/pg_wal/ | wc
     39     344    2717

把standby起来后, 等待同步完成, 在primary做一次checkpoint, 就可以把wal归档:

-rw-------. 1 wang wang 16777216 Apr 20 14:29 0000000100000001000000EC
-rw-------. 1 wang wang 16777216 Apr 20 14:30 0000000100000001000000E9
-rw-------. 1 wang wang 16777216 Apr 20 14:33 0000000100000001000000E7
-rw-------. 1 wang wang 16777216 Apr 20 14:34 0000000100000001000000EB
-rw-------. 1 wang wang 16777216 Apr 20 14:34 0000000100000001000000F0
-rw-------. 1 wang wang 16777216 Apr 20 14:35 0000000100000001000000F1
-rw-------. 1 wang wang 16777216 Apr 20 14:35 0000000100000001000000EF
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000ED
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000F4
-rw-------. 1 wang wang 16777216 Apr 20 14:39 0000000100000001000000E8
-rw-------. 1 wang wang 16777216 Apr 20 15:56 0000000100000001000000F6
drwx------. 2 wang wang     4096 Apr 20 15:58 archive_status
-rw-------. 1 wang wang 16777216 Apr 20 15:58 0000000100000001000000E6
$ ll -lrt 12345/pg_wal/ | wc
     20     173    1368

restartpoint

The checkpointer process and the background writer process are active during recovery. The checkpointer process will perform restartpoints (similar to checkpoints on the primary) and the background writer process will perform normal block cleaning activities. This can include updates of the hint bit information stored on the standby server. The CHECKPOINT command is accepted during recovery, though it performs a restartpoint rather than a new checkpoint.
-- PostgreSQL: Documentation: 13: 26.5. Hot Standby

If executed during recovery, the CHECKPOINT command will force a restartpoint (see Section 29.4) rather than writing a new checkpoint.
-- PostgreSQL: Documentation: 13: CHECKPOINT

In archive recovery or standby mode, the server periodically performs restartpoints, which are similar to checkpoints in normal operation: the server forces all its state to disk, updates the pg_control file to indicate that the already-processed WAL data need not be scanned again, and then recycles any old log segment files in the pg_wal directory. Restartpoints can't be performed more frequently than checkpoints in the master because restartpoints can only be performed at checkpoint records. A restartpoint is triggered when a checkpoint record is reached if at least checkpoint_timeout seconds have passed since the last restartpoint, or if WAL size is about to exceed max_wal_size. However, because of limitations on when a restartpoint can be performed, max_wal_size is often exceeded during recovery, by up to one checkpoint cycle's worth of WAL. (max_wal_size is never a hard limit anyway, so you should always leave plenty of headroom to avoid running out of disk space.)
-- PostgreSQL: Documentation: 13: 29.4. WAL Configuration

Continuous Archiving in Standby

https://www.postgresql.org/docs/13/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY

sync and async replication

[3,p248]

If synchronous_standby_names is empty, the only meaningful settings are on and off; remote_apply, remote_write and local all provide the same local synchronization level as on.
-- PostgreSQL: Documentation: 13: 19.5. Write Ahead Log

synchronous_standby_names = '*',到底是什么意思? 任何一个standby确认了就行? 代码搜sync_standby_priority, SyncRepStandbyNames, SyncRepGetCandidateStandbys. SyncRepInitConfig-->SyncRepGetStandbyPriority; 看了代码,大致逻辑是,根据_names来确定priority,第一个是1,第二个是2,..., replication walsender启动时,会检查自己的名字在第几个,即确定自己的priority. 如果_names是'*', 那么所有walsender的priority都是1. 找不到自己的名字, 0. 解析_names还有一个专门的grammar: syncrep_gram.y.

The special entry * matches any standby name.
-- PostgreSQL: Documentation: 13: 19.6. Replication

集群postgresql-ha postgresql rainbond

集群autofailover repmgr pg