GFS2 cluster very low read performance

Hello,

At one of my customers we are having problems backing up several GFS2 data shares. There are 2 RHEL 6.7 nodes configured as an active-active cluster with 7 GFS2 mountpoints. The mountpoints currently look like this (df -h):

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/1                     384G  291G   94G  76% /data1
/dev/mapper/2                     384G  280G  105G  73% /data2
/dev/mapper/3                     384G  265G  120G  69% /data3
/dev/mapper/4                     384G  298G   87G  78% /data4
/dev/mapper/5                     768G  549G  220G  72% /data5
/dev/mapper/6                     1.2T  1.2T   26G  98% /data6
/dev/mapper/7                     100G   92G  8.9G  92% /data7

The filesystems are mounted like this:

/dev/mapper/1 on /data1 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/2 on /data2 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/3 on /data3 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/4 on /data4 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/5 on /data5 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/6 on /data6 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)
/dev/mapper/7 on /data7 type gfs2 (rw,noatime,nodiratime,hostdata=jid=1)

The backup system is Networker and backups are taken from both nodes (at different times). Unfortunately, lately the backups keep failing: files are copied at very low speeds or get stuck completely at 0 bytes transferred. We've tried remounting the GFS2 filesystems to see if that helps, but it's the same, and we've also tried dropping the caches (echo -n 3 >/proc/sys/vm/drop_caches), which didn't help either.
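
For reference, this is roughly what we ran on each node; /data1 is just an example mountpoint, the other filesystems were handled the same way:

# remount one of the GFS2 filesystems (repeated for each /dataN)
mount -o remount,noatime,nodiratime /data1
# flush dirty data, then drop page cache, dentries and inodes
sync
echo -n 3 > /proc/sys/vm/drop_caches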

For some reason iotop showed an improvement right after remounting, but after a few seconds almost all disk reads were again attributed to the "glock_workqueue" kernel threads:

TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
 5014 be/4 root      209.29 K/s    0.00 B/s  0.00 % 31.60 % [glock_workqueue]
 5012 be/4 root      205.48 K/s    0.00 B/s  0.00 % 29.61 % [glock_workqueue]
 5013 be/4 root      144.60 K/s    0.00 B/s  0.00 % 25.92 % [glock_workqueue]
 5011 be/4 root       49.47 K/s    0.00 B/s  0.00 %  7.28 % [glock_workqueue]
32677 be/4 root        7.61 K/s    0.00 B/s  0.00 %  3.86 % save -a (...) /data5 /data5
 9603 be/4 root        7.61 K/s    0.00 B/s  0.00 %  1.21 % save -a (...) /data6 /data6
 1474 be/3 root        0.00 B/s   22.83 K/s  0.00 %  0.40 % [jbd2/dm-7-8]
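
I haven't analysed the glock state in detail yet; from what I understand of the GFS2 documentation, something along these lines should dump the glocks so we can look for contention (myclust:data5 is just a placeholder for the real clustername:fsname):

# debugfs has to be mounted to expose the GFS2 glock dump
mount -t debugfs none /sys/kernel/debug
# dump the glocks of one filesystem; "myclust:data5" is a placeholder name
cat /sys/kernel/debug/gfs2/myclust:data5/glocks | head -40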

I've also checked whether writes are degraded, but writing a 1 GB file with dd on one of the mountpoints went fine at around 25 MB/s.
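
The write test was roughly the following; the file name is made up and /data1 is just an example mountpoint:

# simple 1 GiB sequential write test on one of the GFS2 mountpoints
dd if=/dev/zero of=/data1/dd_test.bin bs=1M count=1024 conv=fsync
rm -f /data1/dd_test.bin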

Is it possible to solve this issue, or is the only solution to implement cluster-aware backup software?
