Getting SCSI "task abort errors" on RHEL VMware guests

Environment

Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 6
VMware guest

Issue

RHEL guests running on VMware log SCSI "task abort on host X" messages, accompanied by blocked task warnings:

[24042.146557] sd 2:0:0:0: [sda] task abort on host 2, ffff881323e7fbc0
[24066.461916] sd 2:0:0:0: [sda] task abort on host 2, ffff881323e7fcc0
[24066.462232] sd 2:0:0:0: [sda] task abort on host 2, ffff881321f85c80
[24066.465951] INFO: task xfsaild/dm-0:655 blocked for more than 120 seconds.
[24066.466279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[24066.466463] xfsaild/dm-0    D ffff956d2b82b760     0   655      2 0x00000000
[24066.466467] Call Trace:
[24066.466481]  [<ffffffffa05b7e89>] schedule+0x29/0x70
[24066.466593]  [<ffffffffc042c687>] xfs_log_force+0x157/0x2e0 [xfs]
[24066.466599]  [<ffffffff9fee1190>] ? wake_up_state+0x20/0x20
[24066.466620]  [<ffffffffc0438e70>] ? xfs_trans_ail_cursor_first+0xa0/0xa0 [xfs]
[24066.466635]  [<ffffffffc0439000>] xfsaild+0x190/0x780 [xfs]
[24066.466653]  [<ffffffffc0438e70>] ? xfs_trans_ail_cursor_first+0xa0/0xa0 [xfs]
[24066.466657]  [<ffffffff9fecb641>] kthread+0xd1/0xe0
[24066.466659]  [<ffffffff9fecb570>] ? insert_kthread_work+0x40/0x40
[24066.466663]  [<ffffffffa05c51dd>] ret_from_fork_nospec_begin+0x7/0x21
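
To gauge how often aborts occur and which SCSI device they affect, the kernel log can be summarized. The following is a minimal sketch, not part of the original solution: the function name count_task_aborts is hypothetical, and it assumes the message format shown in the excerpt above ("sd H:C:T:L: [sdX] task abort on host N, ...").

```shell
# Hypothetical helper: count "task abort" messages per SCSI address.
# Reads kernel log text on stdin and prints "<H:C:T:L> <count>" lines.
count_task_aborts() {
    awk '/task abort on host/ {
        # The SCSI address is the field right after the "sd" prefix.
        for (i = 1; i < NF; i++) {
            if ($i == "sd") {
                addr = $(i + 1)
                sub(/:$/, "", addr)   # drop the trailing colon
                count[addr]++
                break
            }
        }
    }
    END { for (a in count) print a, count[a] }'
}
```

A possible invocation is `dmesg | count_task_aborts`. A count that keeps growing for the same address suggests that one virtual disk, or the datastore behind it, is the slow path.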

Root Cause

SCSI task abort messages are logged by the Linux kernel's SCSI error-handling mechanism.
They appear when an I/O request is sent to the storage stack but does not complete within the device's command timeout. In a virtualized environment, the delay usually originates below the guest.
Specifically, the RHEL guest OS expects a completion from the VMware hypervisor for every I/O command. If the hypervisor, the underlying physical SAN, or the storage array experiences high latency and fails to complete the I/O before the guest's timer expires, the guest aborts the command and attempts to recover. Tasks waiting on the stalled I/O are then reported as blocked for more than 120 seconds.

Resolution

To troubleshoot and mitigate these errors, you can increase the I/O timeout period within the guest OS. This gives the VMware hypervisor and the underlying hardware more time to process requests during periods of high congestion without triggering the error handler.
1. Check the current timeout value (in seconds):
```
# cat /sys/block/sda/device/timeout
```
2. Increase the timeout value. As a starting point, double the current value and check whether the abort errors subside. For example, if the current value is 30, increase it to 60:
```
# echo 60 > /sys/block/sda/device/timeout
```
3. Verify the change. With the higher value, the guest waits longer for the hypervisor to complete the I/O. If the task abort errors stop occurring after this change, this indicates that the root cause is transient storage latency exceeding the original guest timeout.
Note: Changes made via echo to the /sys filesystem are not persistent and will revert after a reboot.
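
When a guest has several disks, the check-and-increase steps above can be applied to each device in turn. The following is a minimal sketch under stated assumptions: disks are assumed to appear as sd* entries under /sys/block, and the function name double_scsi_timeouts and its optional directory argument are hypothetical, added only so the loop can be tried against a scratch directory first.

```shell
# Hypothetical helper: double the SCSI command timeout of every sd* disk.
# Optional argument: directory to scan instead of /sys/block (for dry runs).
double_scsi_timeouts() {
    block_dir="${1:-/sys/block}"
    for f in "$block_dir"/sd*/device/timeout; do
        # Skip unmatched globs and files that are not writable (e.g. not root).
        [ -w "$f" ] || continue
        cur=$(cat "$f")
        new=$((cur * 2))
        echo "$new" > "$f"
        echo "$f: $cur -> $new"
    done
}
```

Run as root, `double_scsi_timeouts` prints the old and new value for each disk it changes. Like the echo command above, the new values do not survive a reboot.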

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
