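Background

The environment is a single-primary MGR (MySQL Group Replication) cluster made up of three MySQL 8.0.31 nodes (an odd number of nodes is recommended).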

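Healthy node:
[Screenshot: group membership as seen from a healthy node]

As shown, the healthy node cannot communicate with the failed node, and the group now contains only 2 members.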

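Failed node:
[Screenshot: group membership as seen from the failed node]

As shown, the failed node cannot communicate with the other 2 healthy nodes, and its own state is ERROR.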
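The membership views described above are normally read from performance_schema.replication_group_members. The original screenshots are not reproduced here, so the exact query behind them is an assumption; a minimal version, run on each node in turn, might look like this:

```sql
-- Assumption: the screenshots showed output similar to this query.
-- Run it on a healthy node and on the failed node and compare the rows.
SELECT MEMBER_HOST,
       MEMBER_PORT,
       MEMBER_STATE,   -- ONLINE, RECOVERING, UNREACHABLE, ERROR, OFFLINE
       MEMBER_ROLE,    -- PRIMARY or SECONDARY
       MEMBER_VERSION
FROM performance_schema.replication_group_members;
```

On a healthy node the expelled member simply disappears from the result, while the expelled node typically lists only itself, in ERROR state.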

Troubleshooting (error log of the failed node):

2023-02-19T08:35:12.689967+08:00 26 [Warning] [MY-010957] [Server] The replication timestamps have returned to normal values.
---------- This part of the log shows that it can no longer reach the other nodes ----------------
2023-02-19T11:23:59.451419+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.236:5562 has become unreachable.'
2023-02-19T11:23:59.466959+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.17.37:5562 has become unreachable.'
2023-02-19T11:23:59.467008+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.129:5562 has become unreachable.'
2023-02-19T11:23:59.467020+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xx.xx.184:5562 has become unreachable.'

------- It cannot reach a majority of the group, so all updates are blocked; the log notes that group_replication_force_members can be used to force a new group membership ------
2023-02-19T11:23:59.467218+08:00 0 [ERROR] [MY-011495] [Repl] Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.'

---------- This part of the log shows that the network recovered ----------------
2023-02-19T11:24:02.141493+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.236:5562 is reachable again.'
2023-02-19T11:24:03.219209+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.17.37:5562 is reachable again.'
2023-02-19T11:24:03.245901+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.129:5562 is reachable again.'
2023-02-19T11:24:03.245941+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xx.xx.184:5562 is reachable again.'
2023-02-19T11:24:03.245956+08:00 0 [Warning] [MY-011498] [Repl] Plugin group_replication reported: 'The member has resumed contact with a majority of the members in the group. Regular operation is restored and transactions are unblocked.'

-------- Because of the network failure, it was expelled from the MGR group and its status changed to ERROR -------
2023-02-19T11:24:06.907099+08:00 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'

-------- Some transactions will be rolled back --------------
2023-02-19T11:24:06.922962+08:00 0 [Warning] [MY-011630] [Repl] Plugin group_replication reported: 'Due to a plugin error, some transactions were unable to be certified and will now rollback.'

-------- MySQL was automatically set to read-only mode -----------
2023-02-19T11:24:06.923035+08:00 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'

-------- Sessions waiting for conflict detection (certification) failed, and the before_commit hook failed ---------
2023-02-19T11:24:06.923078+08:00 312718417 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on session 312718417'
2023-02-19T11:24:06.923081+08:00 397259854 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on session 397259854'
2023-02-19T11:24:06.923165+08:00 312718417 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2023-02-19T11:24:06.923163+08:00 397259854 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed

------- Because the node is in ERROR state, it can no longer execute transactions; the log says to check for errors and restart the plugin ------
2023-02-19T11:24:08.587817+08:00 397259854 [ERROR] [MY-011601] [Repl] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication is on ERROR state. Check for errors and restart the plugin'
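If the server has not been restarted since the incident, the same sequence can usually be pulled from performance_schema.error_log (available since MySQL 8.0.22) instead of searching the log file by hand. A minimal sketch using the error codes that appear in the log above; the list of variables in the second query is an assumption about what is worth checking, not something taken from the original incident:

```sql
-- Timeline of the Group Replication events seen in the log above.
-- Note: this table is an in-memory buffer, so old entries may already be gone.
SELECT LOGGED, PRIO, ERROR_CODE, DATA
FROM performance_schema.error_log
WHERE SUBSYSTEM = 'Repl'
  AND ERROR_CODE IN ('MY-011493',  -- member unreachable
                     'MY-011495',  -- lost majority, updates blocked
                     'MY-011494',  -- member reachable again
                     'MY-011498',  -- majority contact resumed
                     'MY-011505',  -- expelled, status changed to ERROR
                     'MY-011712')  -- server set to read only
ORDER BY LOGGED;

-- Assumption: settings that influence how quickly a member is expelled
-- and what it does afterwards. The values on this cluster are not known.
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.global_variables
WHERE VARIABLE_NAME IN ('group_replication_member_expel_timeout',
                        'group_replication_unreachable_majority_timeout',
                        'group_replication_autorejoin_tries',
                        'group_replication_exit_state_action');
```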

Restart MGR on the failed node:

mysql> stop group_replication;
Query OK, 0 rows affected (17.52 sec)

mysql> start group_replication;
Query OK, 0 rows affected (17.52 sec)

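After the restart, the node's state changed from ERROR to RECOVERING, which means it was running distributed recovery to catch up with the group. Shortly afterwards, all nodes were ONLINE.

[Screenshot: cluster status after recovery]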
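To watch the transition described above, the local member's state can be polled after START GROUP_REPLICATION. A small sketch, assuming it is run on the node that was just restarted:

```sql
-- Run on the restarted node; repeat until MEMBER_STATE shows ONLINE.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members
WHERE MEMBER_ID = @@server_uuid;

-- Then confirm that the whole group is back to three ONLINE members.
SELECT MEMBER_STATE, COUNT(*) AS member_count
FROM performance_schema.replication_group_members
GROUP BY MEMBER_STATE;
```

Filtering on @@server_uuid keeps the first query focused on the local node, so the RECOVERING to ONLINE change is easy to spot.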

Last modified: February 20, 2023