Health check script Linux

Below I present a script that checks the load average on a Linux server and sends a report if it becomes too high.
It uses the Linux command 'uptime', which prints the server's uptime as well as its load average.
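
As an illustration of that output, here is a minimal sketch that splits one line of 'uptime' output into its three load figures. The sample line and the variable names (`line`, `averages`, `load1`, `load5`, `load15`) are assumptions made for this example, and it assumes numbers are printed with a decimal point.

```shell
#!/usr/bin/env bash
# Sample line in the format uptime prints; on a live system use: line=$(uptime)
line=' 09:38:17 up 2:03, 2 users, load average: 0.55, 0.83, 0.73'

# Keep only the text after "load average: ", then drop the commas.
averages=${line##*load average: }
averages=${averages//,/}

# Split into the 1-, 5- and 15-minute figures.
read -r load1 load5 load15 <<< "$averages"
echo "1 min: $load1, 5 min: $load5, 15 min: $load15"
```

The second figure (`load5` here) is the five-minute value.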

In my script below I use not the last-minute load average but the five-minute one (the second value).
If you are not familiar with it, a load average of "1.63, 0.70, 7.89" on a single-CPU system can be read as:

– during the last minute, the system was overloaded by 63% on average (1.63 runnable processes, so 0.63 processes had to wait for a turn on the single CPU on average).

– during the last 5 minutes, the CPU was idling 30% of the time on average.

– during the last 15 minutes, the system was overloaded by 689% on average (7.89 runnable processes, so 6.89 processes had to wait for a turn on the single CPU on average).
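
The arithmetic behind that reading can be sketched as follows. The helper name `describe_load` is hypothetical. Dividing by the number of CPUs is a common rule of thumb for multi-CPU machines rather than something stated above.

```shell
#!/usr/bin/env bash
# describe_load LOAD [CPUS]: hypothetical helper for illustration only.
# Prints how a load average reads for the given number of CPUs (default 1).
describe_load() {
    local load="$1" cpus="${2:-1}"
    awk -v load="$load" -v cpus="$cpus" 'BEGIN {
        ratio = load / cpus
        if (ratio > 1)
            printf "overloaded by %.0f%%\n", (ratio - 1) * 100
        else
            printf "idle %.0f%% of the time\n", (1 - ratio) * 100
    }'
}

describe_load 1.63   # last minute
describe_load 0.70   # last 5 minutes
describe_load 7.89   # last 15 minutes
```

On Linux the load average also counts processes in uninterruptible sleep (usually waiting on disk I/O), so treat these percentages as an approximation.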

I chose the five-minute interval because sending mail every minute is too aggressive when a server is under load. A one-minute spike may also be too brief to matter and can be handled by the server without any notification.
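
A minimal sketch of such a check, under stated assumptions, could look like the block below. The names `max_loadavg`, `mail_to`, `is_above` and `check_load`, the threshold and the address are all hypothetical. It also assumes a working 'mail' command is installed.

```shell
#!/usr/bin/env bash
max_loadavg=3                  # assumed threshold, tune per server
mail_to="admin@example.com"    # hypothetical recipient

# is_above VALUE LIMIT: succeed when VALUE is numerically greater than LIMIT.
is_above() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 > limit + 0) }'
}

# check_load LOAD5: mail a short report when the 5-minute load is over the limit.
check_load() {
    local load5="$1"
    if is_above "$load5" "$max_loadavg"; then
        printf 'Load average on %s is %s (limit %s)\n' \
            "$(hostname)" "$load5" "$max_loadavg" |
            mail -s "High load on $(hostname)" "$mail_to"
    fi
}

# Typical call (not executed here), passing in the 5-minute figure:
#   check_load "$(uptime | awk -F'load average:' '{print $2}' | awk '{gsub(",",""); print $2}')"
```

The comparison goes through awk because load averages are decimal numbers and bash's own arithmetic handles integers only.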

* Note that some settings may need tuning if the command output differs on your system.

Once the script is ready you can set it up as a cron job. Mine is set to run every 5 minutes.
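
For a five-minute schedule, the crontab entry could be built as in this sketch. The script path is hypothetical, and the install command is left as a comment so that nothing changes until you run it yourself.

```shell
#!/usr/bin/env bash
script_path="/usr/local/bin/loadcheck.sh"   # hypothetical location of the script

# minute hour day-of-month month day-of-week command
cron_line="*/5 * * * * $script_path"
echo "$cron_line"

# To append it to the current user's crontab:
#   ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```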

Filed Under: Bash Scripting, Linux, Scripting, System

About the Author: Anthony Gee is an IT specialist with more than 9 years of solid working experience in the Web Hosting industry. He currently works as a server support administrator and is involved in consultative discussions about Web Hosting and server administration. One of the first writers for the Onlinehowto.net website, he now writes for the Free Tutorials community, publishing tutorials and articles for the general public as well as specific technical solutions.

Comments (5)

  1. Maori says:

    Nice script for reporting issues really, but I think there is one general mistake:

    Uptime for two days:
    17:05:04 up 2 days, 2:16, 2 users, load average: 0.00, 0.04, 0.03

    Uptime for a few hours:

    09:38:17 up 2:03, 2 users, load average: 0.55, 0.83, 0.73

    I suggest you use a delimiter to pick out the right value.

    Otherwise it is running already on my server :) and I will check back once I receive update on the tutorial.

    • Anthony Gee says:

      Thanks Maori, you are absolutely right. I have updated the script with:

      uptime | awk -F'load average:' '{print $2}' | awk '{gsub(",",""); print $2}'

      I was just creating this on a server with more than a year of uptime.
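
To see why the delimiter matters, this sketch feeds the two sample lines from this thread through the pipeline quoted in the reply. For contrast, it also reads a fixed whitespace field. The helper names and the choice of field 11 are assumptions for illustration, not a record of what the earlier version of the script did.

```shell
#!/usr/bin/env bash
# load5_by_delimiter LINE: the pipeline from the reply, wrapped in a hypothetical
# helper so it can be fed sample lines instead of live uptime output.
load5_by_delimiter() {
    printf '%s\n' "$1" | awk -F'load average:' '{print $2}' | awk '{gsub(",",""); print $2}'
}

# load5_by_position LINE: for contrast, a fixed whitespace field (assumed to be
# field 11, which only fits the "up N days" layout).
load5_by_position() {
    printf '%s\n' "$1" | awk '{gsub(",",""); print $11}'
}

days_line=' 17:05:04 up 2 days, 2:16, 2 users, load average: 0.00, 0.04, 0.03'
hours_line=' 09:38:17 up 2:03, 2 users, load average: 0.55, 0.83, 0.73'

for line in "$days_line" "$hours_line"; do
    echo "delimiter: '$(load5_by_delimiter "$line")'  position: '$(load5_by_position "$line")'"
done
```

The fixed position returns nothing for the shorter line, while splitting on "load average:" gives the five-minute figure in both cases.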

  2. Jason says:

    I’m on the hunt for something that does the exact opposite, any way you might be able to help me or point me in the direction of a script that tells me if there is no activity, for example I have a server with a Wowza script only and every so often it spirals out of control and dies, going to all “0” load average. Heck, maybe even one that is zero or too high? (a guy can dream) I’ve tried external site monitoring but it never catches it for some reason.

    Was hoping for something just like your script to let me know of zero or no load and alert via email. Thanks for reading and for the work on what you have now.

    • Anthony Gee says:

      Hi Jason,

      I think it is really easy: just modify these lines:
      […]
      max_loadavge=3
      […]
      if [[ "$loadavg" > "$max_loadavge" ]]
      […]

      To something like:

      […]
      min_loadavge=3
      […]
      if [[ "$loadavg" < "$min_loadavge" ]]
      […]

      Then the script will send you a message when the load average is lower than the value set for 'min_loadavge'.
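
One caveat when adapting those lines: inside [[ ]], the < and > operators compare strings, not numbers, so a load of 10.50 sorts before 3. The sketch below shows that effect and one way to test both a lower and an upper bound numerically through awk. The helper `load_out_of_range` and both bounds are hypothetical values chosen for the example.

```shell
#!/usr/bin/env bash
# [[ a > b ]] compares strings character by character, so "10.50" sorts before "3".
if [[ "10.50" > "3" ]]; then
    echo "string test: 10.50 is above 3"
else
    echo "string test: 10.50 is NOT above 3"
fi

# load_out_of_range LOAD MIN MAX: hypothetical helper; succeeds when LOAD is
# numerically below MIN or above MAX.
load_out_of_range() {
    awk -v v="$1" -v min="$2" -v max="$3" \
        'BEGIN { exit !(v + 0 < min + 0 || v + 0 > max + 0) }'
}

min_loadavg=0.05   # assumed lower bound
max_loadavg=3      # assumed upper bound
for load in 0.00 0.83 10.50; do
    if load_out_of_range "$load" "$min_loadavg" "$max_loadavg"; then
        echo "$load: alert"
    else
        echo "$load: ok"
    fi
done
```

A single check of this kind covers both the "zero load" and the "too high" case.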