percona toolkit之pt-slave-restart使用

发布时间 2023-06-07 17:50:05作者: 高&玉

介绍

pt-slave-restart是一款mysql主从异常处理工具,监控replica端SQL_THREAD是否正常,如果异常pt-slave-restart尝试跳过导致错误的事务,然后重新启动SQL_THREAD已达到修复SQL_THREAD的目的。

 

测试

MySQL主从复制source端

mysql> select * from t1;
+----+-------+
| id | name  |
+----+-------+
|  1 | name1 |
|  2 | name2 |
|  3 | name3 |
+----+-------+

 

MySQL主从复制replica端

mysql> select * from t1;
+----+-------+
| id | name  |
+----+-------+
|  1 | name1 |
|  2 | name2 |
|  3 | name3 |
+----+-------+

 

执行pt-slave-restart监控replica端

[root]# pt-slave-restart h=192.168.1.73,P=3312,u=root,p=m密码

 

MySQL主从复制source端

mysql> set sql_log_bin=0;
mysql> insert into t1 values(4,'name4');
mysql> set sql_log_bin=1;
mysql> delete from t1 where id=4;

 

MySQL主从复制replica端,此时SQL_THREAD线程报错1062报错:

mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.1.73
                  Source_User: repuser
                  Source_Port: 3311
                Connect_Retry: 60
              Source_Log_File: binlog.000045
          Read_Source_Log_Pos: 1021064795
               Relay_Log_File: 64562e05689a-relay-bin.000056
                Relay_Log_Pos: 1095227
        Relay_Source_Log_File: binlog.000045
           Replica_IO_Running: Yes
          Replica_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000045, end_log_pos 1021061873. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 1021061586
              Relay_Log_Space: 1021065568
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File: 
           Source_SSL_CA_Path: 
              Source_SSL_Cert: 
            Source_SSL_Cipher: 
               Source_SSL_Key: 
        Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1062
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000045, end_log_pos 1021061873. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Source_Server_Id: 88
                  Source_UUID: e28ee878-f92e-11ed-b68e-0242ac110003
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: 
           Source_Retry_Count: 86400
                  Source_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 230607 09:04:33
               Source_SSL_Crl: 
           Source_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Source_TLS_Version: 
       Source_public_key_path: 
        Get_Source_public_key: 0
            Network_Namespace: 

 

稍等一会儿,pt-slave-restart会自动修复replica端主从异常

[root]# pt-slave-restart h=192.168.1.73,P=3312,u=root,p=Gaoyu@029
2023-06-07T17:01:10 P=3312,h=192.168.1.73,p=...,u=root 64562e05689a-relay-bin.000056     1004491 1062 

注释:
    时间戳:2023-06-07T17:01:10
    replica端信息:P=3312,h=192.168.1.73,p=...,u=root
    relay log:64562e05689a-relay-bin.000056
    relay log位置:1004491
    主从复制最后报错码:1062

 

查看replica端主从复制信息,此时主从复制恢复正常

mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 192.168.1.73
                  Source_User: repuser
                  Source_Port: 3311
                Connect_Retry: 60
              Source_Log_File: binlog.000045
          Read_Source_Log_Pos: 1021515791
               Relay_Log_File: 64562e05689a-relay-bin.000056
                Relay_Log_Pos: 1549019
        Relay_Source_Log_File: binlog.000045
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 1021515378
              Relay_Log_Space: 1021516564
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File: 
           Source_SSL_CA_Path: 
              Source_SSL_Cert: 
            Source_SSL_Cipher: 
               Source_SSL_Key: 
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Source_Server_Id: 88
                  Source_UUID: e28ee878-f92e-11ed-b68e-0242ac110003
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Source_Retry_Count: 86400
                  Source_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Source_SSL_Crl: 
           Source_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Source_TLS_Version: 
       Source_public_key_path: 
        Get_Source_public_key: 0
            Network_Namespace: 

 

此时source端与replica端t1表数据是一致的,因为source端先insert后delete。 

 

参考:https://docs.percona.com/percona-toolkit/pt-slave-restart.html