KingbaseES 中select for update语句引起的锁问题

发布时间 2023-09-19 19:22:46作者: KINGBASE研究院

背景

客户现场执行压测时候,发生周期性的TPS大幅下降,通过查看kwr报告发现DBcpu时间占DBtime时间很少,百分之90的DBtime花费在tuple锁等待上,等待事件类型是lock。
等待时间最多的语句是select fd_id,ctid,xmin from ... for update

含义

select for update语句目的在查询时,避免其他用户对该表进行dml等操作,造成表的不一致性。
select for update语句需要等待表锁,以及表对应行锁释放之后才能返回查询结果。

表级锁模式

常见锁模式以及应用场景:
ACCESS SHARE:select操作获取该模式锁资源,通常情况下所有只读取不修改表的查询都会获取该模式锁资源
ROW SHARE: select for update 和 select for share 命令获取该模式锁资源
ROW EXCLUSIVE: DML操作通过会获取该模式锁资源,通常情况下任何需要修改表数据的操作都会持有该模式锁资源
SHARE UPDATE EXCLUSIVE:对于 vacuum(非full)、create index concurrnetly、reindex concurrently、create statistics、alter index相关、alter table相关操作会持有该模式锁资源,该模式锁资源主要是为了范围并发对同一张表DDL变更以及vacuum操作
SHARE: create index (非concurrently)操作会持有该模式锁资源,该模式锁资源可阻止并发对表的创建索引操作
SHARE ROW EXCLUSIVE: cerate trigger、一些alter table操作会持有该模式锁资源
EXCLUSIVE: refresh materialized view concurrently操作会持有该模式锁资源
ACCESS EXCLUSIVE:lock table模式锁定模式,drop table、truncate、reindex、 cluster、vacuum full、refresh materialized view (without concurrently) 还有一些alter table、alter index操作需要获取该模式锁资源,该模式锁资源保证了持有者唯一访问表

测试

测试环境隔离级别是 read committed,数据库版本是KingbaseESV8R6C7

1.update语句阻塞select for update语句

开启一个会话,不提交
begin;
update t3 set name='qite' where id=1;

开启另一个会话执行,发现被阻塞
select id,name from t3 for update;

在另一个会话查看锁情况。
查看表t3有关的锁,这里表t3持有了tuple上的AccessExclusiveLock锁,这是最高级别的锁,阻塞一切sql语句。
pid4203持有的锁row share表示正在执行select for update语句,AccessExclusiveLock最高级别锁也是pid4203需要获取的。
pid4205持有的RowExclusiveLock锁表示执行dml语句需要获取的锁,这里可以看出pid4205会话很可能执行了update语句。
SELECT
pg_locks.pid,
a.datname,
locktype,
virtualtransaction,
transactionid,
nspname,
relname,
mode,
granted,
cast(date_trunc('second',query_start) AS timestamp) AS query_start
FROM
pg_locks
LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
LEFT OUTER JOIN pg_namespace ON (pg_namespace.oid = pg_class.relnamespace),
pg_stat_activity a
WHERE NOT pg_locks.pid = pg_backend_pid()
AND pg_locks.pid=a.pid
and pg_class.relname='t3';
 pid  | datname | locktype | virtualtransaction | transactionid | nspname | relname |        mode         | granted |     query_start
------+---------+----------+--------------------+---------------+---------+---------+---------------------+---------+---------------------
 4203 | test    | tuple    | 4/43               |               | public  | t3      | AccessExclusiveLock | t       | 2023-05-16 15:39:11
 4203 | test    | relation | 4/43               |               | public  | t3      | RowShareLock        | t       | 2023-05-16 15:39:11
 4205 | test    | relation | 5/10               |               | public  | t3      | RowExclusiveLock    | t       | 2023-05-16 15:39:08
 4205 | test    | relation | 5/10               |               | public  | t3      | AccessShareLock     | t       | 2023-05-16 15:39:08
(4 rows)

执行另一个更直观的sql语句查看锁情况,开启第三个会话执行如下sql。
SELECT blocked_locks.pid     AS blocked_pid,
      blocked_activity.usename  AS blocked_user,
      blocking_locks.pid     AS blocking_pid,
      blocking_activity.usename AS blocking_user,
      blocked_activity.query    AS blocked_statement,
      blocking_activity.query   AS current_statement_in_blocking_process
FROM  sys_catalog.sys_locks         blocked_locks
 JOIN sys_catalog.sys_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
 JOIN sys_catalog.sys_locks         blocking_locks 
     ON blocking_locks.locktype = blocked_locks.locktype
     AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
     AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
     AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
     AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
     AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
     AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
     AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
     AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
     AND blocking_locks.pid != blocked_locks.pid
 JOIN sys_catalog.sys_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
 blocked_pid | blocked_user | blocking_pid | blocking_user |         blocked_statement          | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+------------------------------------+---------------------------------------
        4203 | system       |         4205 | system        | select id,name from t3 for update; | update t3 set name='qite' where id=1;
(1 row)


在操作系统查看有关pid,4203pid中的select语句被阻塞,阻塞源头是第一个会话执行的update语句,pid是4205。
kingbas+  4203  4154  0 15:27 ?        00:00:00 kingbase: system test [local] SELECT waiting
kingbas+  4205  4154  0 15:27 ?        00:00:00 kingbase: system test [local] idle in transaction

第一个会话中的update语句commit或rollback后,事务结束后,第二个会话中的select for update语句可以执行成功,原因是blocking_pid被释放了。

2.select for update语句阻塞update语句

开启一个会话执行如下sql
begin;
select id,name from t3 for update;

开启另一个会话执行,发现被阻塞
update t3 set name='qitu' where id=1;

在另一个会话查看锁情况
pid4203对应RowExclusiveLock锁表示执行dml语句需要获取的锁,pid4205持有的RowShareLock锁说明此会话在执行select for update语句。
SELECT
pg_locks.pid,
a.datname,
locktype,
virtualtransaction,
transactionid,
nspname,
relname,
mode,
granted,
cast(date_trunc('second',query_start) AS timestamp) AS query_start
FROM
pg_locks
LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
LEFT OUTER JOIN pg_namespace ON (pg_namespace.oid = pg_class.relnamespace),
pg_stat_activity a
WHERE NOT pg_locks.pid = pg_backend_pid()
AND pg_locks.pid=a.pid
and pg_class.relname='t3';
 pid  | datname | locktype | virtualtransaction | transactionid | nspname | relname |       mode       | granted |     query_start
------+---------+----------+--------------------+---------------+---------+---------+------------------+---------+---------------------
 4203 | test    | tuple    | 4/44               |               | public  | t3      | ExclusiveLock    | t       | 2023-05-16 15:54:16
 4203 | test    | relation | 4/44               |               | public  | t3      | RowExclusiveLock | t       | 2023-05-16 15:54:16
 4203 | test    | relation | 4/44               |               | public  | t3      | AccessShareLock  | t       | 2023-05-16 15:54:16
 4205 | test    | relation | 5/11               |               | public  | t3      | RowShareLock     | t       | 2023-05-16 15:54:00
(4 rows)

执行如下sql,果然阻塞源是pig4205执行的语句select for update。
SELECT blocked_locks.pid     AS blocked_pid,
      blocked_activity.usename  AS blocked_user,
      blocking_locks.pid     AS blocking_pid,
      blocking_activity.usename AS blocking_user,
      blocked_activity.query    AS blocked_statement,
      blocking_activity.query   AS current_statement_in_blocking_process
FROM  sys_catalog.sys_locks         blocked_locks
 JOIN sys_catalog.sys_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
 JOIN sys_catalog.sys_locks         blocking_locks 
     ON blocking_locks.locktype = blocked_locks.locktype
     AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
     AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
     AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
     AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
     AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
     AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
     AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
     AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
     AND blocking_locks.pid != blocked_locks.pid
 JOIN sys_catalog.sys_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
 blocked_pid | blocked_user | blocking_pid | blocking_user |           blocked_statement           | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+---------------------------------------+---------------------------------------
        4203 | system       |         4205 | system        | update t3 set name='qitu' where id=1; | select id,name from t3 for update;
(1 row)

操作系统查看对应pid情况,pid4203执行update语句被阻塞,阻塞源是pid4205。
kingbas+  4203  4154  0 15:27 ?        00:00:00 kingbase: system test [local] UPDATE waiting
kingbas+  4205  4154  0 15:27 ?        00:00:00 kingbase: system test [local] idle in transaction

解决方法是持有锁会话结束事务(提交或回滚),随后被阻塞的update语句执行成功,或杀掉持有锁的进程。

总结

select for update几种其他用法:
select * from t for update nowait 不等待行锁释放,提示锁冲突,不返回结果。
select * from t for update wait 5 等待5秒,若行锁仍未释放,则提示锁冲突,不返回结果。
select * from t for update skip locked 查询返回查询结果,但忽略有行锁的记录。
使用FOR UPDATE WAIT”子句有几个优点:
1.防止无限期地等待被锁定的行。
2.允许应用程序中对锁的等待时间进行控制。
3.在不确定sql执行前表是否被锁,wait关键字起到关键作用。
4.若使用了skip locked,则可以越过锁定的行,而不会报错。

此案例提供了数据库中锁阻塞的排查方法,如生产环境中遇到其他类型lock重型锁阻塞,也可用以上方法分析。