Troubleshooting a SIGKILL
17 Jul 2017 in ConfigServer, SIGKILL
Recently I was called in to investigate a problem where a PHP script stopped working after a random amount of time. It was a long-running script talking to MySQL, and it was called both from the web server (Apache via suPHP) and from the command line.
To give you some context: the script was badly written, continuing the tradition of how bad PHP can be as a programming language (of course this is mainly the programmer's fault):
ignore_user_abort(true);
// the end of time is coming
set_time_limit(0);
// who cares if the server is fucked up
error_reporting(E_ALL ^ E_DEPRECATED ^ E_NOTICE ^ E_WARNING);
// we don't care about warnings - of course *WE* introduced them
ini_set('display_errors', 'On');
// yes, display errors on the production server
mysql_query("SET NAMES = 'greek'");
mysql_query("set character_set_connection=greek");
mysql_query("set character_set_client=greek");
mysql_query('set character set greek');
mysql_query('set character_set_results greek');
mysql_query('SET wait_timeout=28800;');
// because we don't exactly know which one is working,
// just try out all possible permutations
function my_file_get_contents($url)
{
    $filename = $url;
    $handle = fopen($filename, "r");
    $contents = fread($handle, filesize($filename));
    fclose($handle);
    return $contents;
}
// is this code really that old, given that file_get_contents
// has been in PHP core since 4.3.0?
The Linux server was set up (and managed) with cPanel by some hosting company whose operators were doing things like this (excerpt from .bash_history):
wall Is any one working on the server
wall <some username>
chmod 777 /on/some/publicly/available/file-from-webserver
Add on top of that the absence of any kind of documentation about installed packages, customizations, checklists, policies, security, etc., and things get interesting pretty quickly.
Back to the problem: after running the script and finding nothing about a possible error in the Apache/PHP error logs (of course I had changed error_reporting to E_ALL), I switched to the command line to find out what was happening:
php the-offending-script
<some output>
Killed
Killed? WTF!
OK, let's look at the kernel messages: did it run out of memory? Did the OOM killer kick in? Unfortunately not, everything seemed fine.
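Checking the kernel messages for the OOM killer can be sketched roughly like this. The helper name oom_lines is made up for illustration, and the two patterns are simply the phrases the kernel usually logs when it picks a victim, so treat them as an assumption and adjust them if your kernel words things differently.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that look like OOM killer activity.
# Reads kernel log text on stdin; "|| true" keeps "no match" from being an error.
oom_lines() {
    grep -i -E 'out of memory|killed process' || true
}

# Typical usage (reading the kernel log may require root on some systems):
#   dmesg | oom_lines
```

If this prints nothing, as it did in my case, the kernel is not the one doing the killing and the culprit has to be somewhere in userland.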
Is MySQL low on connections? I increased the limit; still no luck.
I watched the memory footprint via memory_get_usage(). Nothing suspicious.
The next step was to try strace, but again no usable hint.
The script was killed with an exit code of 137 (that is 128 + 9), which means it received the SIGKILL signal. So I raised the user limits or, more precisely, disabled the limits the cPanel software had introduced.
Still the script was killed at random points.
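The 128 + 9 arithmetic is worth spelling out, because the shell reports "terminated by signal N" as exit status 128 + N. A minimal sketch, with signal_from_status being a helper name invented here:

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into the signal number
# that terminated the process, or 0 if the process exited on its own.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0
    fi
}

signal_from_status 137   # prints 9, which is SIGKILL
```

In practice you would run the script, then pass $? to the helper. SIGKILL cannot be caught or ignored by the process, so a 9 here means something else on the box decided to kill it.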
Confused, I tail'd all of /var/log/*.log, ran the script and voilà:
Jul 17 XX:XX:XX host lfd[31295]: *User Processing* PID:24845 Kill:1 User:XXX RSS:457(MB) EXE:/usr/local/bin/php CMD:php the-offending-script
This is from a file called /var/log/lfd.log, and it turned out it's part of the ConfigServer package.
It kills a process when it exceeds a memory limit, a time limit or a per-user process count limit; in this case it was the memory.
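The interesting fields in that log line (PID, Kill, User, RSS) are plain KEY:value pairs, so they are easy to pull out when you need to scan a larger log. This is only a sketch: the field layout is assumed from the single line quoted above, and lfd_field is a made-up helper name.

```shell
#!/bin/sh
# Sample line copied from the post; the layout is assumed from this one example.
line='Jul 17 XX:XX:XX host lfd[31295]: *User Processing* PID:24845 Kill:1 User:XXX RSS:457(MB) EXE:/usr/local/bin/php CMD:php the-offending-script'

# Hypothetical helper: print the value that follows "KEY:" up to the next space.
#   $1 = key name, $2 = log line
lfd_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1:\([^[:space:]]*\).*/\1/p"
}

lfd_field PID "$line"    # prints 24845
lfd_field Kill "$line"   # prints 1
lfd_field RSS "$line"    # prints 457(MB)
```

Kill:1 together with the RSS value is what gave it away: the process was terminated on purpose, and memory was the trigger.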
The fun part was that the comments in the configuration file at /etc/csf/csf.conf about "Process Tracking" include a warning against enabling this:
Warning: We don't recommend enabling this option unless absolutely necessary as it can cause unexpected problems when processes are suddenly terminated. It can also lead to system processes being terminated which could cause stability issues. It is much better to leave this option disabled and to investigate each case as it is reported when the triggers above are breached
So I just set PT_USERKILL to "0", restarted the LFD daemon via /etc/init.d/lfd restart and problem solved!
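If you prefer to script the change instead of opening an editor, something along these lines should do. It is a sketch under assumptions: it assumes the option is written as KEY = "value" on its own line, set_csf_option is a name invented here, and the example runs against a scratch file rather than the real /etc/csf/csf.conf.

```shell
#!/bin/sh
# Work on a scratch file so nothing on a real system is touched.
conf=$(mktemp)
printf '%s\n' 'PT_USERKILL = "1"' > "$conf"

# Hypothetical helper: rewrite KEY = "..." in a csf.conf-style file.
#   $1 = file, $2 = option name, $3 = new value
set_csf_option() {
    sed "s/^$2[[:space:]]*=.*/$2 = \"$3\"/" "$1" > "$1.new" && mv "$1.new" "$1"
}

set_csf_option "$conf" PT_USERKILL 0
cat "$conf"   # prints PT_USERKILL = "0"
```

The daemon still has to be restarted afterwards for the new value to be picked up.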
PS: I forgot to tell you that no email address was set in the CSF configuration to receive these warnings. How awesome is that?