If you have been a system administrator for any length of time, you have surely run into situations where a server spikes in CPU usage, memory utilization, and/or load levels. Running "top" won't always give you the answer. So how do you find the rogue processes that are chewing up system resources so you can kill them?
The following script might help. It was written for a web server, so some parts of it look specifically for httpd processes and others deal with MySQL. Depending on your server deployment, simply comment out or remove those sections and add others. Treat it as a starting point.
The one prerequisite for this version of the script is mytop, free software released under the GNU General Public License (available at https://jeremy.zawodny.com/mysql/mytop/) and a fantastic tool for checking how MySQL is performing. It is getting old, but it still works great for our purposes here. In addition, I use mutt as the mailer; you may want to change the script to simply use Linux's built-in `mail` utility. I run it from cron every hour; adjust as you see fit. Oh, and this script needs to run as root, since it reads from some protected areas of the server.
So let's get started, shall we?
First, set the script variables:
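Since the script has to run as root and is meant to be started from cron, a guard near the top can make a misconfigured run fail loudly instead of producing a half-empty report. The following is a minimal sketch and not part of the original script: the function name `require_root` and the install path in the crontab comment are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical root crontab entry (edit with `crontab -e` as root).
# The script path is an assumption; adjust it to wherever you saved the script:
#   0 * * * * /usr/local/sbin/checkSystemLoad.sh

# require_root [UID]: succeed only when the user id is 0.
# The optional argument overrides the detected id, which is handy for testing.
require_root() {
    local uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "This script must be run as root." >&2
        return 1
    fi
}
```

Calling `require_root || exit 1` before any other work would stop the script before it tries to read protected files.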
    #!/bin/bash
    #
    # Script to check system load average levels to try to determine
    # what processes are taking it overly high…
    #
    # 07Jul2010 tjones
    #
    # set environment
    dt=$(date +%d%b%Y-%X)
    # Obviously, change the following directories to where your log files actually are kept
    tmpfile='/tmp/checkSystemLoad.tmp'
    # second temp file, used for the pared-down cell phone report
    topfile='/tmp/checkSystemLoad.top'
    logfile='/tmp/checkSystemLoad.log'
    msgLog='/var/log/messages'
    mysqlLog='/var/log/mysqld.log'
    # the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
    mailstop='[email protected]'
    mailstop1='[email protected]'
    machine=$(hostname)
    # The following three are for mytop use - use a db user that has decent rights
    dbusr='username'
    dbpw='password'
    db='yourdatabasename'
    # The following is the load level to check on - 10 is really high, so you might want to lower it.
    levelToCheck=10
Next, check the load level and determine whether the script should continue:
    # Set variables from system:
    loadLevel=$(awk '{print $1}' /proc/loadavg)
    # round to an integer, because [ -gt ] only compares integers
    loadLevel=$(printf '%0.f' "$loadLevel")

    # if the load level is greater than you want, start the script process. Otherwise, exit 0

    if [ "$loadLevel" -gt "$levelToCheck" ]; then
        echo '' > "$tmpfile"
        echo '**************************************' >>"$tmpfile"
        echo "Date: $dt " >>"$tmpfile"
        echo 'Check System Load & Processes ' >>"$tmpfile"
        echo '**************************************' >>"$tmpfile"
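The load check boils down to two steps: read the first field of the load average file, then round it so that the shell's integer-only `-gt` test can be used. Below is a standalone sketch of that logic. The function names `read_load` and `load_too_high` are hypothetical, and the rounding is done inside awk rather than with the shell's `printf`, which is a small variation on the script's approach.

```shell
#!/bin/bash
# read_load [FILE]: print the 1-minute load average rounded to an integer.
# FILE defaults to /proc/loadavg; passing another file makes the function testable.
read_load() {
    local file="${1:-/proc/loadavg}"
    awk '{ printf "%.0f\n", $1 }' "$file"
}

# load_too_high LOAD THRESHOLD: succeed when LOAD is strictly greater than THRESHOLD.
load_too_high() {
    [ "$1" -gt "$2" ]
}

# Example use (threshold of 10 as in the script):
#   if load_too_high "$(read_load)" 10; then echo "load is high"; fi
```

Note that the comparison is strict, so a rounded load equal to the threshold does not trigger a report.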
Then continue with the checks, writing the results to the temp file. Add or remove items here as appropriate for your situation:
        # Get more variables from system:
        httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)

        # Show current load level:
        echo "Load Level Is: $loadLevel" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"

        # Show number of httpd processes now running (parent and children):
        echo "Number of httpd processes now: $httpdProcesses" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show process list:
        echo 'Processes now running:' >>"$tmpfile"
        ps f -ef >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show current MySQL info:
        echo 'Results from mytop:' >>"$tmpfile"
        /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"
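Every check in the report follows the same pattern: a title line, the command's output, a separator, and a blank line. If you plan to add or remove many checks, that pattern could be wrapped in a helper. This is only a sketch; the function name `add_section` is hypothetical and does not appear in the original script.

```shell
#!/bin/bash
# Report file; the default matches the path used in the script.
tmpfile="${tmpfile:-/tmp/checkSystemLoad.tmp}"

# add_section TITLE COMMAND [ARGS...]
# Appends the title, the command's output (stdout and stderr),
# a separator line and a blank line to $tmpfile.
add_section() {
    local title="$1"
    shift
    {
        echo "$title"
        "$@" 2>&1
        echo '*************************************************'
        echo ''
    } >>"$tmpfile"
}

# Example use:
#   add_section 'disk space:' /bin/df -k
```

With a helper like this, dropping a check from the report means deleting one line instead of four.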
Note that for the top command we write to two temp files. The second one is for a much smaller message sent to a cell phone. If you don't want the urgency of cell phone alerts at three in the morning, you can take this out (along with the second mail routine later in the script).
        # Show current top:
        echo 'top now shows:' >>"$tmpfile"
        echo 'top now shows:' >>"$topfile"
        /usr/bin/top -b -n1 >>"$tmpfile"
        /usr/bin/top -b -n1 >>"$topfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"
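Running top twice means the two reports show slightly different snapshots. One possible alternative, sketched here under the assumption that both file variables are set, is to run the command once and duplicate its output with `tee`. The function name `append_to_both` is hypothetical.

```shell
#!/bin/bash
# Defaults are assumptions; the script itself defines its own paths.
tmpfile="${tmpfile:-/tmp/checkSystemLoad.tmp}"
topfile="${topfile:-/tmp/checkSystemLoad.top}"

# append_to_both COMMAND [ARGS...]
# Runs the command once and appends its output to both report files:
# tee -a appends a copy to $topfile, and tee's stdout is appended to $tmpfile.
append_to_both() {
    "$@" 2>&1 | tee -a "$topfile" >>"$tmpfile"
}

# Example use:
#   append_to_both echo 'top now shows:'
#   append_to_both /usr/bin/top -b -n1
```

This keeps the full report and the cell phone report consistent with each other.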
More checks:
        # Show current connections:
        echo 'netstat now shows:' >>"$tmpfile"
        /bin/netstat -p >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Check disk space
        echo 'disk space:' >>"$tmpfile"
        /bin/df -k >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"
Then append the temp file's contents to a more permanent log file and email the results to the appropriate parties. The second mailing is the pared-down report, consisting simply of the standard "top" output:
        # Send results to log file:
        /bin/cat "$tmpfile" >>"$logfile"

        # And email results to sysadmin:
        # (the -- separates attachments from recipients; newer mutt versions require it)
        /usr/bin/mutt -s "$machine has a high load level! - $dt" -a "$mysqlLog" -a "$msgLog" -- "$mailstop" <"$tmpfile"
        /usr/bin/mutt -s "$machine has a high load level! - $dt" "$mailstop1" <"$topfile"
        echo '**************************************' >>"$logfile"
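For anyone who prefers the plain `mail` utility over mutt, the mailing step could look roughly like the sketch below. The function name `send_report` is hypothetical. The log attachments are left out on purpose, because attachment flags differ between `mail` implementations; check your system's man page before adding one.

```shell
#!/bin/bash
# send_report SUBJECT RECIPIENT BODY_FILE
# Sends the contents of BODY_FILE as the message body using mail(1).
send_report() {
    local subject="$1" recipient="$2" body_file="$3"
    mail -s "$subject" "$recipient" <"$body_file"
}

# Example use, with the variables from the script:
#   send_report "$machine has a high load level! - $dt" "$mailstop" "$tmpfile"
#   send_report "$machine has a high load level! - $dt" "$mailstop1" "$topfile"
```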
And then some housekeeping and exit:
        # And then remove the temp files:
        rm -f "$tmpfile"
        rm -f "$topfile"
    fi

    #
    exit 0
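Because the script runs as root and writes to fixed file names under /tmp, one possible hardening step is to create the temp files with `mktemp` and let a `trap` remove them on exit, even if the script is interrupted halfway. This is a sketch of that variation, not what the original script does; the file name templates are assumptions.

```shell
#!/bin/bash
# Create uniquely named temp files; stop if either cannot be created.
tmpfile=$(mktemp /tmp/checkSystemLoad.XXXXXX) || exit 1
topfile=$(mktemp /tmp/checkSystemLoad.top.XXXXXX) || exit 1

# Remove both files whenever the script exits, normally or on error.
trap 'rm -f "$tmpfile" "$topfile"' EXIT
```

With the trap in place, the explicit `rm` lines at the end of the script become optional.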
Hopefully this helps someone. The fully assembled script is:
    #!/bin/bash
    #
    # Script to check system load average levels to try to determine what processes are
    # taking it overly high…
    #
    # set environment
    dt=$(date +%d%b%Y-%X)
    # Obviously, change the following directories to where your log files actually are kept
    tmpfile='/tmp/checkSystemLoad.tmp'
    # second temp file, used for the pared-down cell phone report
    topfile='/tmp/checkSystemLoad.top'
    logfile='/tmp/checkSystemLoad.log'
    msgLog='/var/log/messages'
    mysqlLog='/var/log/mysqld.log'
    # the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
    mailstop='[email protected]'
    mailstop1='[email protected]'
    machine=$(hostname)
    # The following three are for mytop use - use a db user that has decent rights
    dbusr='username'
    dbpw='password'
    db='yourdatabasename'
    # The following is the load level to check on - 10 is really high, so you might want to lower it.
    levelToCheck=10
    # Set variables from system:
    loadLevel=$(awk '{print $1}' /proc/loadavg)
    # round to an integer, because [ -gt ] only compares integers
    loadLevel=$(printf '%0.f' "$loadLevel")

    # if the load level is greater than you want, start the script process. Otherwise, exit 0

    if [ "$loadLevel" -gt "$levelToCheck" ]; then
        echo '' > "$tmpfile"
        echo '**************************************' >>"$tmpfile"
        echo "Date: $dt " >>"$tmpfile"
        echo 'Check System Load & Processes ' >>"$tmpfile"
        echo '**************************************' >>"$tmpfile"

        # Get more variables from system:
        httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)

        # Show current load level:
        echo "Load Level Is: $loadLevel" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"

        # Show number of httpd processes now running (parent and children):
        echo "Number of httpd processes now: $httpdProcesses" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show process list:
        echo 'Processes now running:' >>"$tmpfile"
        ps f -ef >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show current MySQL info:
        echo 'Results from mytop:' >>"$tmpfile"
        /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show current top:
        echo 'top now shows:' >>"$tmpfile"
        echo 'top now shows:' >>"$topfile"
        /usr/bin/top -b -n1 >>"$tmpfile"
        /usr/bin/top -b -n1 >>"$topfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Show current connections:
        echo 'netstat now shows:' >>"$tmpfile"
        /bin/netstat -p >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Check disk space
        echo 'disk space:' >>"$tmpfile"
        /bin/df -k >>"$tmpfile"
        echo '*************************************************' >>"$tmpfile"
        echo '' >>"$tmpfile"

        # Send results to log file:
        /bin/cat "$tmpfile" >>"$logfile"

        # And email results to sysadmin:
        # (the -- separates attachments from recipients; newer mutt versions require it)
        /usr/bin/mutt -s "$machine has a high load level! - $dt" -a "$mysqlLog" -a "$msgLog" -- "$mailstop" <"$tmpfile"
        /usr/bin/mutt -s "$machine has a high load level! - $dt" "$mailstop1" <"$topfile"
        echo '**************************************' >>"$logfile"

        # And then remove the temp files:
        rm -f "$tmpfile"
        rm -f "$topfile"
    fi

    #
    exit 0