KingbaseES V8R6集群运维案例之---查看主备流复制之间的延迟

发布时间 2023-10-10 17:58:58作者: 天涯客1224

案例说明:
在主备流复制集群运维过程中,对于同步复制节点,在读写分离环境应用需要强一致性访问时,需要监控主备之间流复制的延迟。

适用版本:
KingbaseES V8R6

一、主库查询流复制延迟

1、 通过sys_stat_replication
获取sent_lsn、write_lsn、flush_lsn、replay_lsn获取备库延迟信息

1)执行语句

select usename,application_name,client_addr,state,  
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), sent_lsn) sent_delay,   
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), write_lsn) write_delay,   
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), flush_lsn) flush_delay,   
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), replay_lsn) replay_delay   
from pg_stat_replication;

2)执行结果

prod=# select usename,application_name,client_addr,state,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), sent_lsn) sent_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), write_lsn) write_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), flush_lsn) flush_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(), replay_lsn) replay_delay
prod-# from pg_stat_replication;
 usename | application_name |  client_addr  |   state   | sent_delay | write_delay | flush_delay | replay_delay
---------+------------------+---------------+-----------+------------+-------------+-------------+--------------
 esrep   | node2            | 192.168.1.202 | streaming |          0 |           0 |           0 |            0
(1 row)

2、通过slot获取延迟信息

restart_lsn        最老的WAL日志的LSN 在复制槽中
数据库会把所有复制槽里restart_lsn最小的作为最老的LSN,这个LSN往后的所有较新的日志都会保留下来。

1)查询语句

select slot_name,plugin,slot_type,database,active,                       
xmin,catalog_xmin,  
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay   
from pg_replication_slots; 

2)查询结果

prod=# select slot_name,plugin,slot_type,database,active,
prod-# xmin,catalog_xmin,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay
prod-# from pg_replication_slots;
   slot_name   | plugin | slot_type | database | active | xmin | catalog_xmin | restart_delay
---------------+--------+-----------+----------+--------+------+--------------+---------------
 repmgr_slot_2 |        | physical  |          | t      | 2130 |              |             0
(1 row)

3)查询语句

select slot_name,plugin,slot_type,database,temporary,active,active_pid,  
xmin,catalog_xmin,  
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay,   
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),confirmed_flush_lsn) flush_delay   
from pg_replication_slots; 

4)查询结果

prod=# select slot_name,plugin,slot_type,database,temporary,active,active_pid,
prod-# xmin,catalog_xmin,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay,
prod-# pg_wal_lsn_diff(pg_current_wal_insert_lsn(),confirmed_flush_lsn) flush_delay
prod-# from pg_replication_slots;
   slot_name   | plugin | slot_type | database | temporary | active | active_pid | xmin | catalog_xmin | restart_
delay | flush_delay
---------------+--------+-----------+----------+-----------+--------+------------+------+--------------+---------
------+-------------
 repmgr_slot_2 |        | physical  |          | f         | t      |      10895 | 2130 |              |
    0 |
(1 row)

二、从备库执行

1)查询语句

select pg_is_in_recovery(),pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), 
       pg_last_xact_replay_timestamp();

2)查询结果

test=# select pg_is_in_recovery(),pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
test-#        pg_last_xact_replay_timestamp();
 pg_is_in_recovery | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_replay_timestamp
-------------------+-------------------------+------------------------+-------------------------------
 t                 | 2/1D000B50              | 2/1D000B50             | 2023-10-10 11:50:16.738548+08
(1 row)

3)查询语句

select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
 pg_last_wal_replay_lsn(),pg_last_xact_replay_timestamp();

4)查询结果

test=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
test-#  pg_last_wal_replay_lsn(),pg_last_xact_replay_timestamp();
 pg_is_in_recovery | pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_re
play_timestamp
-------------------+-------------------------+-------------------------+------------------------+----------------
---------------
 t                 | f                       | 2/1D000B50              | 2/1D000B50             | 2023-10-10 11:5
0:16.738548+08
(1 row)

5)查询语句

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

6)查询结果

test=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
test-# THEN 0
test-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
test-# END AS log_delay;
 log_delay
-----------
         0
(1 row)

三、查看主备之间的延迟wal差异量

1)查看主库

prod=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 2/1D000B50
(1 row)

2)查看备库

select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), 
pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();


test=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),
test-# pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

 pg_is_in_recovery | pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_re
play_timestamp
-------------------+-------------------------+-------------------------+------------------------+----------------
---------------
 t                 | f                       | 1/1D000B50              | 1/1D000B50             | 2023-10-10 11:5
0:16.738548+08
(1 row)

3)查看lsn差异函数

pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn)
pg_xlog_location_diff (location pg_lsn, location pg_lsn)


4)获取wal日志差异
prod=# select pg_wal_lsn_diff('2/1D000B50','1/1D000B50');
 pg_wal_lsn_diff
-----------------
      4294967296
(1 row)

prod=# select round(4294967296/pow(1024,3.0),2) missing_lsn_GiB;
 missing_lsn_GiB
-----------------
            4.00
(1 row)

四、总结
可以通过脚本的方式监控主备流复制之间的延迟,第一时间获悉集群运行状态。