Host fails to activate with error "Host rhevhost.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational."

Solution Verified

Environment

  • Red Hat Enterprise Virtualization 3.5.8

  • Red Hat Enterprise Virtualization Hypervisor release 6.7 (20160219.0.el6ev)
    vdsm-4.16.35-2.el6ev.x86_64
    glusterfs-3.7.1-16.el6.x86_64

Issue

  • After a reboot, the RHEV-H host fails to activate with the following error:
Host rhevhost.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational.

Resolution

  • The gluster nodes are not Red Hat Storage. The customer will need to work with their gluster support provider to resolve the split-brain condition on the gluster volume.

Root Cause

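  • A split-brain condition on the replicated glusterfs volume. glusterfs cannot determine a good copy of the __DIRECT_IO_TEST__ file and fails operations on it with an Input/output error, which VDSM's direct IO check then receives.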

Diagnostic Steps

  • An "[Errno 5] Input/output error" is logged in the host's VDSM log (/var/log/vdsm/vdsm.log) when host activation fails:
Thread-13::DEBUG::2016-07-14 11:50:04,424::resourceManager::652::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.HsmDomainMonitorLock', Clearing records.
Thread-13::ERROR::2016-07-14 11:50:04,424::task::866::Storage.TaskManager.Task::(_setError) Task=`93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 1039, in connectStoragePool
  File "/usr/share/vdsm/storage/hsm.py", line 1104, in _connectStoragePool
  File "/usr/share/vdsm/storage/sp.py", line 637, in connect
  File "/usr/share/vdsm/storage/sp.py", line 1179, in __rebuild
  File "/usr/share/vdsm/storage/sp.py", line 1387, in setMasterDomain
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/glusterSD.py", line 32, in findDomain
  File "/usr/share/vdsm/storage/fileSD.py", line 160, in __init__
  File "/usr/share/vdsm/storage/fileSD.py", line 89, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
  File "/usr/lib/python2.6/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.6/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 5] Input/output error
Thread-13::DEBUG::2016-07-14 11:50:04,425::task::885::Storage.TaskManager.Task::(_run) Task=`93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43`::Task._run: 93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43 ('00000002-0002-0002-0002-00000000028c', 2, '6bdc67d1-4ae5-47e3-86c3-ef0916996862', 2240, {'6202bacd-28a9-4632-a28a-dd87647c353c': 'active', '124f80f5-923d-44df-9d27-93e924a706cd': 'active', '6bdc67d1-4ae5-47e3-86c3-ef0916996862': 'active', 'c3d93680-eeaf-4c91-a86e-0aa803c38495': 'active'}) {} failed - stopping task
  • The traceback in the VDSM log shows that validateFileSystemFeatures() is failing. This check touches the __DIRECT_IO_TEST__ file at the root of the storage domain mount directory, opening it with O_CREAT | O_DIRECT:
fileSD.py (/usr/share/vdsm/storage/fileSD.py)

 80 def validateFileSystemFeatures(sdUUID, mountDir):
 81     try:
 82         # Don't unlink this file, we don't have the cluster lock yet as it
 83         # requires direct IO which is what we are trying to test for. This
 84         # means that unlinking the file might cause a race. Since we don't
 85         # care what the content of the file is, just that we managed to
 86         # open it O_DIRECT.
 87         testFilePath = os.path.join(mountDir, "__DIRECT_IO_TEST__")
 88         oop.getProcessPool(sdUUID).directTouch(testFilePath)
 89     except OSError as e:            
 90         if e.errno == errno.EINVAL:
 91             log = logging.getLogger("Storage.fileSD")
 92             log.error("Underlying file system doesn't support"
 93                       "direct IO")
 94             raise se.StorageDomainTargetUnsupported()
 95 
 96         raise
outOfProcess.py (/usr/share/vdsm/storage/outOfProcess.py)

349 def directTouch(ioproc, path, mode=0o777):
350     flags = os.O_CREAT | os.O_DIRECT
351     ioproc.touch(path, flags, mode)
  • The glusterfs client log under /var/log/glusterfs shows SETATTR on /__DIRECT_IO_TEST__ failing because split-brain was observed, returning an Input/output error:
[2016-07-14 11:33:02.598174] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-gluster_rhevtest_data_01-replicate-0: Failing SETATTR on gfid 3144136d-935c-4241-be44-0f120236a7c1: split-brain observed. [Input/output error]
[2016-07-14 11:33:02.598234] W [fuse-bridge.c:1080:fuse_setattr_cbk] 0-glusterfs-fuse: 23: SETATTR() /__DIRECT_IO_TEST__ => -1 (Input/output error)
[2016-07-14 11:35:01.996149] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-gluster_rhevtest_data_01-replicate-0: Failing SETATTR on gfid 3144136d-935c-4241-be44-0f120236a7c1: split-brain observed. [Input/output error]
[2016-07-14 11:35:01.996213] W [fuse-bridge.c:1080:fuse_setattr_cbk] 0-glusterfs-fuse: 63: SETATTR() /__DIRECT_IO_TEST__ => -1 (Input/output error)
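To reproduce the check that validateFileSystemFeatures() performs without activating the host, the __DIRECT_IO_TEST__ touch can be approximated with a short Python 3 script. This is a minimal sketch, not VDSM code. It assumes the ioprocess touch opens the file with the same flags as directTouch (os.O_CREAT | os.O_DIRECT) and then updates its timestamps as touch(1) does, which would match the SETATTR seen failing in the glusterfs log. The helper names direct_touch and check_direct_io are hypothetical, and the script is Linux-only because os.O_DIRECT is.

```python
import errno
import os


def direct_touch(path, mode=0o777):
    # Same open flags as directTouch() in VDSM's outOfProcess.py.
    # os.O_DIRECT only exists on Linux.
    flags = os.O_CREAT | os.O_DIRECT
    fd = os.open(path, flags, mode)
    try:
        # Assumption: like touch(1), the ioprocess touch also updates the
        # timestamps, i.e. the SETATTR that glusterfs rejects on split-brain.
        os.utime(path, None)
    finally:
        os.close(fd)


def check_direct_io(mount_dir):
    """Classify the outcome of touching __DIRECT_IO_TEST__ in mount_dir."""
    test_path = os.path.join(mount_dir, "__DIRECT_IO_TEST__")
    try:
        # Like VDSM, leave the test file in place (no unlink).
        direct_touch(test_path)
    except OSError as e:
        if e.errno == errno.EINVAL:
            # VDSM turns this case into StorageDomainTargetUnsupported.
            return "direct IO not supported"
        if e.errno == errno.EIO:
            # The failure in this article; VDSM re-raises it.
            return "input/output error"
        raise
    return "ok"
```

Call check_direct_io() on the affected host with the storage domain's gluster mount directory. An "input/output error" result matches the [Errno 5] in the VDSM traceback, while "direct IO not supported" corresponds to the EINVAL branch in validateFileSystemFeatures().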
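When the glusterfs client log is long, the two message types quoted in the diagnostic steps (the AFR "split-brain observed" error and the FUSE "=> -1 (Input/output error)" warning) can be collected with regular expressions instead of being read by eye. This is a hypothetical helper, scan_gluster_log, written as a Python 3 sketch. It assumes the message wording shown in this article (glusterfs 3.7.1); other glusterfs versions may format these messages differently.

```python
import re

# Message formats as they appear in the glusterfs 3.7.1 client log quoted in
# this article (assumption: other versions may word them differently).
AFR_SPLIT_BRAIN = re.compile(
    r"^\[[^\]]+\] E .*? \S+: Failing (?P<fop>\w+) on gfid "
    r"(?P<gfid>[0-9a-f-]+): split-brain observed"
)
FUSE_EIO = re.compile(
    r"^\[[^\]]+\] W .*? (?P<fop>[A-Z]+)\(\) (?P<path>\S+) "
    r"=> -1 \(Input/output error\)"
)


def scan_gluster_log(lines):
    """Collect split-brain gfids and EIO-failing paths from log lines.

    Returns two dicts: gfid -> set of failed fops (from AFR errors) and
    path -> set of failed fops (from FUSE warnings). Paths containing
    spaces are not handled by this simple pattern.
    """
    gfids = {}
    paths = {}
    for line in lines:
        m = AFR_SPLIT_BRAIN.search(line)
        if m:
            gfids.setdefault(m.group("gfid"), set()).add(m.group("fop"))
            continue
        m = FUSE_EIO.search(line)
        if m:
            paths.setdefault(m.group("path"), set()).add(m.group("fop"))
    return gfids, paths
```

On the host, pass an open file object for the storage domain mount's log under /var/log/glusterfs (the exact file name depends on the mount point). For the log lines in this article, it reports gfid 3144136d-935c-4241-be44-0f120236a7c1 and path /__DIRECT_IO_TEST__, both for SETATTR, which are the details to hand to the gluster support provider.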

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
