A mysql-mmm troubleshooting case
Keywords: FATAL Couldn't configure IP 'x.x.x.x' on interface 'eth1': undef
Symptoms:
Pinging the agents' virtual IPs from the mmm_monitor host shows that one of them cannot be reached.

# mmm_control show
# Warning: agent on host db3 is not reachable
db1(10.1.1.15) master/ONLINE. Roles: reader(10.1.1.23), writer(10.1.1.20)
db2(10.1.1.14) master/ONLINE. Roles: reader(10.1.1.22)
db3(10.1.1.13) slave/ONLINE. Roles: reader(10.1.1.21)
# Role writer is assigned to it's preferred host db1.

# ping 10.1.1.21
PING 10.1.1.21 (10.1.1.21) 56(84) bytes of data.
From 10.1.1.12 icmp_seq=2 Destination Host Unreachable
From 10.1.1.12 icmp_seq=3 Destination Host Unreachable
From 10.1.1.12 icmp_seq=4 Destination Host Unreachable

--- 10.1.1.21 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2998ms
, pipe 3

# ping 10.1.1.22
PING 10.1.1.22 (10.1.1.22) 56(84) bytes of data.
64 bytes from 10.1.1.22: icmp_seq=1 ttl=64 time=0.102 ms

--- 10.1.1.22 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.102/0.102/0.102/0.000 ms

On db3's physical host, 10.1.1.13:
Check whether the host holds the virtual IP 10.1.1.21. It does not: the address was never configured on this machine (eth1 carries only the real IP 10.1.1.13).

# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:3f:03:47:ce brd ff:ff:ff:ff:ff:ff
    inet 6.6.6.6/28 brd 122.225.32.143 scope global eth0
    inet6 fe80::280:3fff:fe03:47ce/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:80:3f:03:47:cf brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.13/24 brd 10.1.1.255 scope global eth1
    inet6 fe80::280:3fff:fe03:47cf/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

Check the mysql-mmm-agent log:
2011/06/02 20:07:50 INFO Changing active master to 'db1'
2011/06/02 20:07:50 FATAL Failed to change master to 'db1': undef
2011/06/02 20:07:50 FATAL Couldn't configure IP '10.1.1.21' on interface 'eth1': undef

Searching Google for the errors in the mysql-mmm-agent log turned up a way to solve the problem:
# /usr/lib/mysql-mmm/agent/configure_ip eth1 10.1.1.21
Can't locate Net/ARP.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/lib/perl5/vendor_perl/5.8.8/MMM/Agent/Helpers/Network.pm line 11.
BEGIN failed--compilation aborted at /usr/lib/perl5/vendor_perl/5.8.8/MMM/Agent/Helpers/Network.pm line 11.
Compilation failed in require at /usr/lib/perl5/vendor_perl/5.8.8/MMM/Agent/Helpers/Actions.pm line 5.
BEGIN failed--compilation aborted at /usr/lib/perl5/vendor_perl/5.8.8/MMM/Agent/Helpers/Actions.pm line 5.
Compilation failed in require at /usr/lib/mysql-mmm/agent/configure_ip line 6.
BEGIN failed--compilation aborted at /usr/lib/mysql-mmm/agent/configure_ip line 6.

So the Perl module Net::ARP (Net/ARP.pm) is not installed. Install it now:

# perl -MCPAN -e shell
cpan> install Net::ARP

Once the installation finishes, use mmm_monitor to take db3 offline and bring it back online, then test whether the virtual IP answers ping.

# mmm_control set_offline db3
OK: State of 'db3' changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!

# mmm_control set_online db3
OK: State of 'db3' changed to ONLINE. Now you can wait some time and check its new roles!

# mmm_control show
db1(10.1.1.15) master/ONLINE. Roles: reader(10.1.1.23), writer(10.1.1.20)
db2(10.1.1.14) master/ONLINE. Roles: reader(10.1.1.22)
db3(10.1.1.13) slave/ONLINE. Roles: reader(10.1.1.21)
# Role writer is assigned to it's preferred host db1.

# ping 10.1.1.21
PING 10.1.1.21 (10.1.1.21) 56(84) bytes of data.
64 bytes from 10.1.1.21: icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from 10.1.1.21: icmp_seq=2 ttl=64 time=0.079 ms

The problem is solved.
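
The manual check used in this post (list the roles, then ping each virtual IP) can be scripted so that it is easy to repeat after every change. What follows is only a minimal sketch and not part of mysql-mmm: the function names extract_role_ips and check_role_ips are made up for illustration, it assumes the output of mmm_control show looks like the listings in this post, and it assumes a Linux ping that accepts -c (count) and -W (timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with mysql-mmm.

# Read `mmm_control show` output on stdin and print one role (virtual) IP per line.
# Only reader(...) and writer(...) entries are matched, so the hosts' real IPs
# such as db1(10.1.1.15) are skipped.
extract_role_ips() {
    grep -Eo '(reader|writer)\([0-9.]+\)' | tr -d 'a-z()'
}

# Ping every role IP found on stdin once. Print OK or FAIL per address and
# return non-zero if at least one address did not answer.
check_role_ips() {
    failed=0
    for ip in $(extract_role_ips); do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}
```

On the monitor host this would be run as `mmm_control show | check_role_ips`. Because it pings the role addresses rather than the hosts' real IPs, it exercises exactly the path that was broken on db3.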
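To sum up: this problem was left behind by a careless installation. Because db3 is a pure slave, it was normally accessed through its real IP and its virtual IP was never used, and mmm_monitor did not show any sign of a fault either. The problem only surfaced when read/write splitting was being configured and the slave's virtual IP came into use. So for any architecture that is about to go into production, it is best to follow the official documentation and check every item one by one, to avoid unnecessary failures.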
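
One way to do that checking before go-live is to confirm on every agent host that the required Perl modules actually load. The sketch below is an assumption-laden illustration, not something mysql-mmm provides: check_perl_modules is a made-up name, and the only module this post confirms as required is Net::ARP, so the full list should be taken from the official installation guide.

```shell
#!/bin/sh
# Hypothetical pre-flight check for illustration; not shipped with mysql-mmm.

# Try to load each Perl module named on the command line.
# Print the modules that fail to load and return non-zero if any are missing.
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        # `perl -MName -e 1` loads the module and runs an empty program;
        # perl exits non-zero when the module cannot be located in @INC.
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "MISSING $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_perl_modules Net::ARP` run on db3 before the fix would have reported the missing module straight away, without waiting for the virtual IP to be needed.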