Why does system hang with "Softlockup at RIP do_gettimeofday" in Red Hat Enterprise Linux 5 virtual guest?

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5.x ( here after RHEL5.x)
  • Kernel-2.6.18.194.17.1

Issue

  • Using kvm-clock, there are so many softlockup events.
Feb 24 13:53:31 kernel:  [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
Feb 24 13:53:31 hostname kernel:  [<ffffffff8006cb8a>] do_softirq+0x2c/0x85
Feb 24 13:53:31 hostname kernel:  [<ffffffff8006b342>] default_idle+0x0/0x50
Feb 24 13:53:31 hostname kernel:  [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
Feb 24 13:53:31 hostname kernel:  <EOI>  [<ffffffff8006b36b>] default_idle+0x29/0x50
Feb 24 13:53:31 hostname kernel:  [<ffffffff8004923a>] cpu_idle+0x95/0xb8
Feb 24 13:53:31 hostname kernel:  [<ffffffff8007796b>] start_secondary+0x498/0x4a7
Feb 24 13:53:31 hostname kernel: 
Feb 24 13:53:31 hostname kernel: BUG: soft lockup - CPU#6 stuck for 13s! [java:4852]
Feb 24 13:53:31 hostname kernel: CPU 6:
Feb 24 13:53:31 hostname kernel: Modules linked in: snapapi26(PU) nfs fscache nfs_acl lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport floppy joydev i2c_piix4 ide_cd i2c_core cdrom virtio_net serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Feb 24 13:53:31 hostname kernel: Pid: 4852, comm: java Tainted: P      2.6.18-194.17.1.el5 #1
Feb 24 13:53:31 hostname kernel: RIP: 0010:[<ffffffff8006e1c3>]  [<ffffffff8006e1c3>] do_gettimeofday+0x2c/0x90
Feb 24 13:53:31 hostname kernel: RSP: 0018:ffff810112967f28  EFLAGS: 00000246
Feb 24 13:53:31 hostname kernel: RAX: 0000000000000001 RBX: ffff810112967f58 RCX: 00000101a0f97fdd
Feb 24 13:53:31 hostname kernel: RDX: 00259c18fdadabe5 RSI: 003dac289d5d008f RDI: ffff810009035b00
Feb 24 13:53:31 hostname kernel: RBP: 0000000000000200 R08: 00000000f5550c88 R09: 0000000000000018
Feb 24 13:53:31 hostname kernel: R10: 00259d1a76bbc59d R11: 0000000000000200 R12: ffff810112967f58
Feb 24 13:53:31 hostname kernel: R13: ffff81010fa4a840 R14: ffff81010fa4a840 R15: 0000000000000006
Feb 24 13:53:31 hostname kernel: FS:  0000000048a03940(0063) GS:ffff81023fc54bc0(0000) knlGS:0000000000000000
Feb 24 13:53:31 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 24 13:53:31 hostname kernel: CR2: 000000001e0f32b0 CR3: 000000021571c000 CR4: 00000000000006e0
Feb 24 13:53:31 hostname kernel: 
Feb 24 13:53:31 hostname kernel: Call Trace:
Feb 24 13:53:31 hostname kernel:  [<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
Feb 24 13:53:31 hostname kernel:  [<ffffffff8003a8a2>] sys_gettimeofday+0x1b/0x62
Feb 24 13:53:31 hostname kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
Feb 24 13:53:31 hostname kernel: 

Resolution

  • This was due to a bug which has been fixed in errata RHSA-2011-0017.
  • Upgrade the system to kernel-2.6.18-238.el5 or newer version to fix the issue.

Root Cause

  • Sometimes guests running on the Red Hat Enterprise Virtualization Hypervisor experience inconsistent time, such as the clock drifting backwards. This causes some applications to become unresponsive.

  • For more details refer to Bug 570824

Diagnostic Steps

  • Identify if the kvm-clock is being used
[root@rhel59 log]# cat dmesg |grep kvm
kvm-clock: cpu 0, msr 7eff:8043c401, boot clock
kvm-clock: cpu 0, msr 0:9003401, primary cpu clock
kvm-clock: cpu 1, msr 0:900ba81, secondary cpu clock
kvm-clock: cpu 2, msr 0:9014101, secondary cpu clock
kvm-clock: cpu 3, msr 0:901c781, secondary cpu clock
kvm-clock: cpu 4, msr 0:9024e01, secondary cpu clock
kvm-clock: cpu 5, msr 0:902d481, secondary cpu clock
kvm-clock: cpu 6, msr 0:9035b01, secondary cpu clock
kvm-clock: cpu 7, msr 0:903e181, secondary cpu clock
[root@rhel59 log]# cat dmesg |grep time.c
time.c: Using tsc for timekeeping HZ 1000
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.
time.c: Detected 2660.076 MHz processor.
  • Component
  • kvm

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments