CPU and Memory issue

What is the best way to capture CPU and RAM utilization, which I suspect to be the cause of the server reboots or job failures?

For example, assume this scenario:
An application owner says, "My jobs failed yesterday at this time."

Now, how can I get the CPU and RAM usage at that particular time? sar and vmstat are not convincing me.

Responses

Hi Arulanandam Sakthivel,

If you don't want to use sar and vmstat, the easiest way would be to create a script that collects the values you need and writes them to an output file with a timestamp.

Check out the following script, which writes the timestamp, memory usage (in percent), and load average (last minute) to the file /var/log/metrics.log.

Not my best script, but functional ;-)

#!/bin/bash
#Logfile
LOG=/var/log/metrics.log
touch $LOG

#collect all metrics and write them to file
function collect_and_write {
  get_memperc
  get_loadlastmin
  TS=$(date '+%Y-%m-%d_%H:%M:%S')
  #one output line: timestamp, memory usage in percent, 1-minute load average
  /bin/printf "%s %s %s\n" "$TS" "$MEMPERC" "$AVGLOAD"
}

#Calculate memory usage in percent (total - (free + cached + buffers))
function get_memperc {
  #squeeze repeated spaces so that cut can split on a single space
  RES=$(tr -s ' ' < /proc/meminfo)
  #total physical memory, free memory, page cache and buffers (all in kB)
  MTOTAL=$(/bin/echo "$RES" | /bin/grep "^MemTotal:" | /bin/cut -d ' ' -f 2)
  MFREE=$(/bin/echo "$RES" | /bin/grep "^MemFree:" | /bin/cut -d ' ' -f 2)
  FCACHE=$(/bin/echo "$RES" | /bin/grep "^Cached:" | /bin/cut -d ' ' -f 2)
  BUFFER=$(/bin/echo "$RES" | /bin/grep "^Buffers:" | /bin/cut -d ' ' -f 2)
  #integer arithmetic is enough for a whole-number percentage
  USED=$((MTOTAL - (MFREE + FCACHE + BUFFER)))
  MEMPERC=$((USED * 100 / MTOTAL))
}


#Get the load average of the last minute (first field of /proc/loadavg)
function get_loadlastmin {
  AVGLOAD=$(/bin/cut -d ' ' -f 1 /proc/loadavg)
}

#print the sample and append it to the logfile
collect_and_write | tee -a "$LOG"
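
The script records only one sample per run, so it would have to be run periodically (for example from cron) for an entry to exist at the time the jobs failed. Below is a minimal sketch of how the log could then be queried for a given time. It assumes the line format "timestamp memory% load" described above; the function name lookup_metrics and the script path in the crontab comment are hypothetical and not part of the script itself.

```shell
#!/bin/bash
# Assumed crontab entry to take one sample per minute (script path is hypothetical):
#   * * * * * /usr/local/bin/metrics.sh >/dev/null 2>&1

# Hypothetical helper: print all log lines whose timestamp starts with a prefix.
# Assumed line format: YYYY-MM-DD_HH:MM:SS <memory used in %> <1-minute load>
lookup_metrics() {
  local prefix="$1"                        # e.g. 2024-05-01_14:3 matches 14:30 to 14:39
  local log="${2:-/var/log/metrics.log}"   # logfile, defaults to the path used above
  # index() returns 1 only when the first field begins with the prefix
  awk -v p="$prefix" 'index($1, p) == 1' "$log"
}
```

For example, lookup_metrics 2024-05-01_14:3 would list the samples taken between 14:30 and 14:39 on that (illustrative) day, which can then be compared with the time the application owner reported.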