RHEL 5.8 64-bit, CPU at 100%


Hi All,

 

I am finding that my server is running at 100% CPU, but I cannot find any process that is consuming more than 1% of the CPU. Moreover, the iotop command shows me that two processes are working the hard disk heavily: kjournald and pdflush. Because of this, the server is unable to work properly.

 

Is there a way to fix this issue? Would disabling pdflush be a good idea?

 

Let me know ;-)

 

Bye

Responses

When a process writes data to storage, unless it specifically tells the kernel to sync it and provide confirmation that it has been written, the kernel will simply mark the page in memory as "dirty" and allow the process to continue with what it was doing.  The kernel is now free to flush the page out to disk whenever it sees fit, which saves the process from having to wait while the storage commits it to disk.
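
To make the "unless it specifically tells the kernel to sync" part concrete, here is a minimal sketch in Python 3 (the stock interpreter on RHEL 5 is much older, so treat this as illustration rather than something to paste onto that box). The helper name write_durably is made up for this example; os.fsync is the standard call that asks the kernel to commit a file's dirty pages before returning.

```python
import os

def write_durably(path, data):
    """Write bytes to path and wait until the kernel has committed them.

    Hypothetical helper for illustration. Without the fsync call the
    write would return as soon as the pages were marked dirty in memory.
    """
    with open(path, "wb") as f:
        f.write(data)          # goes to Python's buffer first
        f.flush()              # hand the data to the kernel (pages become dirty)
        os.fsync(f.fileno())   # block until the kernel has flushed this file
    return len(data)
```

A process that skips the fsync step returns immediately and leaves the flushing to pdflush, which is the normal case described above.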

 

pdflush is the part of the kernel that is responsible for flushing dirty pages in memory out to disk when certain controllable criteria are met.   The following sysctl settings control these criteria (these descriptions come from the kernel-doc package's /usr/share/doc/kernel-doc-$VERS/Documentation/filesystems/proc.txt):

 

 

dirty_writeback_centisecs
-------------------------
The pdflush writeback daemons will periodically wake up and write `old' data out to disk.  This tunable expresses the interval between those wakeups, in 100'ths of a second.

Setting this to zero disables periodic writeback altogether.

 

dirty_expire_centisecs
----------------------
This tunable is used to define when dirty data is old enough to be eligible for writeout by the pdflush daemons.  It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up.

 

dirty_background_ratio
----------------------
Contains, as a percentage of total system memory, the number of pages at which the pdflush background writeback daemon will start writing out dirty data.

 

dirty_ratio
-----------
Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
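
Before changing anything it helps to see what the box is currently using. The four tunables above are exposed as files under /proc/sys/vm, so a rough Python 3 sketch for collecting them could look like this. The function name and the base parameter are just choices for this example; base exists only so the function can be pointed at a test directory.

```python
import os

TUNABLES = (
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)

def read_dirty_tunables(base="/proc/sys/vm"):
    """Return {name: int} for each of the four tunables found under base."""
    values = {}
    for name in TUNABLES:
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a plain integer
    return values
```

Calling read_dirty_tunables() with no arguments on the affected server gives a baseline to compare against after tuning.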

 

So in other words, pdflush will wake up at a predetermined interval, and if any pages are eligible to be written out, it will flush them.  If the system reaches the dirty_background_ratio percentage of total memory as dirty, pdflush will begin writing out data in the background (what you are seeing).   If it reaches dirty_ratio, the processes generating writes have to start writing out dirty data themselves, which essentially makes their writes synchronous.
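
One way to watch this happen is to look at the Dirty and Writeback lines in /proc/meminfo while the server is struggling. The following is only a sketch in Python 3, with a function name picked for this example; it takes the file contents as text so it can be tried on a sample first.

```python
def parse_dirty_meminfo(text):
    """Extract the Dirty and Writeback values (in kB) from meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            result[key] = int(rest.split()[0])  # first field is the number of kB
    return result
```

On the server itself this would be fed with open("/proc/meminfo").read(). A Dirty figure that keeps climbing and then drains in long bursts would fit the backlog pattern described here.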

 

If you're seeing pdflush active for long periods of time, or you have a large amount of memory, it's often beneficial to lower some of these settings (except maybe dirty_ratio) so that pages are written out earlier, rather than waiting until a huge amount has backed up and pdflush has to spend all of its time catching up.  For instance, dirty_background_ratio defaults to 10 (%), so on a system with 256 GB of memory, pdflush wouldn't start writing out in the background until roughly 25 GB were dirty.  As you can imagine, 25 GB is a lot of data to write, and can take some time and eat a lot of CPU in the process.  You can also decrease dirty_writeback_centisecs to cause pdflush to wake up more frequently, and dirty_expire_centisecs to shorten the amount of time it takes for a dirty page to become eligible for writeback.  dirty_ratio defaults to 40 (%), so I will often leave it alone.  By decreasing it, you may increase the likelihood that regular I/O from your processes will be negatively impacted.
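
The arithmetic behind that example is simple enough to sketch in Python 3. This takes the "percentage of total system memory" wording at face value; the kernel's own accounting may count memory somewhat differently, so read the results as rough estimates. The function name is made up for this example.

```python
def dirty_threshold_gb(total_gb, ratio_percent):
    """Approximate amount of dirty memory (GB) at which a ratio kicks in."""
    return total_gb * ratio_percent / 100.0

# 256 GB of memory with the default dirty_background_ratio of 10
background_default = dirty_threshold_gb(256, 10)   # about 25.6 GB
# The same box with the ratio halved to 5
background_halved = dirty_threshold_gb(256, 5)     # about 12.8 GB
```

Halving the ratio halves the backlog that can build up before background writeback starts, which is the point of lowering it on large-memory systems.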

 

The best way to find the optimal settings is to test.  I usually recommend halving the relevant settings and seeing how it does in a production-like test.  You can tweak further as needed.
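
As a starting point for that kind of test, here is a small Python 3 sketch that halves a set of current values and prints them in the "vm.name = value" form used by /etc/sysctl.conf. Both function names are invented for this example, and the input numbers in the test are only sample values, not a recommendation. The floor of 1 is my own precaution, since a dirty_writeback_centisecs of zero disables periodic writeback altogether.

```python
def halve_settings(current):
    """Return each setting at half its current value, never below 1."""
    return {name: max(1, value // 2) for name, value in current.items()}

def as_sysctl_lines(settings):
    """Format settings as sysctl.conf-style lines, sorted by name."""
    return ["vm.%s = %d" % (name, settings[name]) for name in sorted(settings)]
```

The resulting lines can be reviewed by hand before being applied on a test system; nothing here writes to the running kernel.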

Let me know if you have any questions.

 

Regards,

John Ruemker, RHCA

Red Hat Software Maintenance Engineer

Online User Groups Moderator