记一次mysql无法启动的解决过程

正文

今天开发人员反馈一个问题,就是某一台开发环境机器的mysql无法启动了,但是如果这台服务器重启的话,mysql就好使,而第二天就会出现mysql死掉然后无法启动的情况。我使用service mysqld restart,命令行反馈如下的内容:

1
2
3
[root@iZ2373j9xivZ data]# service mysqld restart
MySQL server PID file could not be found! [FAILED]
Starting MySQL........The server quit without updating PID file (/data/mysql/data/sock/mysql.pid). [FAILED]

打开错误日志看一下,里面是这么写的:

1
2
3
4
5
6
7
8
2018-03-10 00:26:24 25919 [Note] InnoDB: Using CPU crc32 instructions
2018-03-10 00:26:24 25919 [Note] InnoDB: Initializing buffer pool, size = 4.0G
InnoDB: mmap(549453824 bytes) failed; errno 12
2018-03-10 00:26:24 25919 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' init function returned error.
2018-03-10 00:26:24 25919 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2018-03-10 00:26:24 25919 [ERROR] Unknown/unsupported storage engine: INNODB
2018-03-10 00:26:24 25919 [ERROR] Aborting

InnoDB: mmap(549453824 bytes) failed; errno 12,然后我就free -m查看一下,当前服务器的内存已经不够用了。

1
2
3
4
5
[root@iZ2373j9xivZ sock]# free -m
total used free shared buffers cached
Mem: 7869 7747 121 0 16 15
-/+ buffers/cache: 7716 152
Swap: 0 0 0

那么是什么在占用这台服务器的内存?使用ps aux | sort -k4nr |head -5 这个命令查找当前占用内存最大的五个进程一看,全是php-fpm,同时也发现服务器里面运行大量的php-fpm,在征得开发人员的同意之后,重启php-fpm进程,内存空出来很多。

此时再次service mysqld restart,发现mysql的错误日志改成如下的样子了:

1
2
3
4
2018-03-12 10:53:42 28238 [Note] InnoDB: Highest supported file format is Barracuda.
2018-03-12 10:53:42 28238 [Note] InnoDB: The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!
2018-03-12 10:53:42 28238 [Note] InnoDB: Database was not shutdown normally!
2018-03-12 10:53:42 28238 [Note] InnoDB: Starting crash recovery.

这次变成了The log sequence numbers 16939991440 and 16939991440 in ibdata files do not match the log sequence number 16940121908 in the ib_logfiles!,我打开my.cnf,适当的调小了max_connectionsinnodb_buffer_pool_size,然后service mysqld restart的时候发现错误又变了:

1
2
3
4
5
6
7
8
2018-03-12 11:03:57 29190 [Note] InnoDB: 5.6.27 started; log sequence number 16940121918
2018-03-12 11:03:57 29190 [Note] Server hostname (bind-address): '*'; port: 3306
2018-03-12 11:03:57 29190 [Note] IPv6 is not available.
2018-03-12 11:03:57 29190 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2018-03-12 11:03:57 29190 [Note] Server socket created on IP: '0.0.0.0'.
2018-03-12 11:03:57 29190 [ERROR] Can't start server : Bind on unix socket: Permission denied
2018-03-12 11:03:57 29190 [ERROR] Do you already have another mysqld server running on socket: /data/mysql/data/sock/mysql.sock ?
2018-03-12 11:03:57 29190 [ERROR] Aborting

这就是文件权限问题了,我再次打开my.cnf发现里面的user填写的是mysql,那么把/data/mysql/data/sock/mysql.sock这一系列的文件的所属人都改成了mysql用户,这一次重启mysql就OK了。

为什么这个mysql会好好的突然自动死掉呢?我发现日志里面有这样的字样:InnoDB: Database was not shutdown normally!,于是我猜想很有可能是php-fpm这进程不断地增长,占用的内存太大,导致mysqllinux的内核杀死了。于是查看/var/log/message的文件,结合mysql的错误日志时间找到了如下的字样:

1
2
3
4
5
Mar 10 00:26:22 iZ2373j9xivZ kernel: Out of memory: Kill process 1883 (mysqld) score 53 or sacrifice child
Mar 10 00:26:22 iZ2373j9xivZ kernel: Killed process 1883, UID 501, (mysqld) total-vm:6849508kB, anon-rss:429368kB, file-rss:176kB
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: php-fpm cpuset=/ mems_allowed=0
Mar 10 04:11:38 iZ2373j9xivZ kernel: Pid: 4375, comm: php-fpm Not tainted 2.6.32-431.23.3.el6.x86_64 #1

证据确凿,php-fpm的无休止增长导致服务器的可用内存变小,最后内核把mysql杀死,修改php-fpm的文件之后,暂时好了点…

参考资料

http://robinchen.me/tech/2016/03/14/tech-aliyun-centos-mysql-shutdown-itself-irregularly.html
http://www.wisedream.net/2017/12/20/traps/mysql-corrupt/

感谢您请我喝咖啡~O(∩_∩)O,如果要联系请直接发我邮箱chenx1242@163.com,我会回复你的
-------------本文结束感谢您的阅读-------------