Host fails to activate with error "Host rhevhost.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational."
Environment
- Red Hat Enterprise Virtualization 3.5.8
- Red Hat Enterprise Virtualization Hypervisor release 6.7 (20160219.0.el6ev)
- vdsm-4.16.35-2.el6ev.x86_64
- glusterfs-3.7.1-16.el6.x86_64
Issue
- A RHEV-H host was rebooted and now fails to activate with the following error:
  Host rhevhost.example.com cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational.
Resolution
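- The gluster nodes backing the storage domain are not Red Hat Storage nodes, so the split-brain has to be resolved on the gluster side. The customer will need to work with their gluster support provider to correct the problem.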
Root Cause
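- A split-brain condition on the replicated GlusterFS volume (gluster_rhevtest_data_01) that backs the storage domain, which makes operations on the __DIRECT_IO_TEST__ file fail with an Input/output error.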
Diagnostic Steps
- An "OSError: [Errno 5] Input/output error" is logged in the host's VDSM log (/var/log/vdsm/vdsm.log) when host activation fails:
Thread-13::DEBUG::2016-07-14 11:50:04,424::resourceManager::652::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.HsmDomainMonitorLock', Clearing records.
Thread-13::ERROR::2016-07-14 11:50:04,424::task::866::Storage.TaskManager.Task::(_setError) Task=`93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
File "/usr/share/vdsm/storage/hsm.py", line 1039, in connectStoragePool
File "/usr/share/vdsm/storage/hsm.py", line 1104, in _connectStoragePool
File "/usr/share/vdsm/storage/sp.py", line 637, in connect
File "/usr/share/vdsm/storage/sp.py", line 1179, in __rebuild
File "/usr/share/vdsm/storage/sp.py", line 1387, in setMasterDomain
File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
File "/usr/share/vdsm/storage/glusterSD.py", line 32, in findDomain
File "/usr/share/vdsm/storage/fileSD.py", line 160, in __init__
File "/usr/share/vdsm/storage/fileSD.py", line 89, in validateFileSystemFeatures
File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
File "/usr/lib/python2.6/site-packages/ioprocess/__init__.py", line 507, in touch
File "/usr/lib/python2.6/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 5] Input/output error
Thread-13::DEBUG::2016-07-14 11:50:04,425::task::885::Storage.TaskManager.Task::(_run) Task=`93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43`::Task._run: 93ebcc27-9ebb-494a-bd5a-8c4f3a28ab43 ('00000002-0002-0002-0002-00000000028c', 2, '6bdc67d1-4ae5-47e3-86c3-ef0916996862', 2240, {'6202bacd-28a9-4632-a28a-dd87647c353c': 'active', '124f80f5-923d-44df-9d27-93e924a706cd': 'active', '6bdc67d1-4ae5-47e3-86c3-ef0916996862': 'active', 'c3d93680-eeaf-4c91-a86e-0aa803c38495': 'active'}) {} failed - stopping task
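In a long vdsm.log only the call chain of the traceback above really matters here: it runs from connectStoragePool through setMasterDomain down to validateFileSystemFeatures and directTouch. The following is a minimal Python sketch, not a VDSM tool, that pulls each traceback's call chain and final exception out of log lines. The function name summarize_tracebacks and both regular expressions are illustrative assumptions, written only against the frame format shown above.

```python
import re
import sys

# Frame lines such as:
#   File "/usr/share/vdsm/storage/sp.py", line 637, in connect
FRAME = re.compile(r'^\s*File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# Closing exception line, e.g. "OSError: [Errno 5] Input/output error".
EXC = re.compile(r"^(?P<exc>[A-Za-z_][\w.]*): (?P<msg>.*)$")


def summarize_tracebacks(lines):
    """Return a list of (call_chain, exception_line), one per traceback."""
    results, chain, in_tb = [], [], False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Traceback (most recent call last):"):
            chain, in_tb = [], True
            continue
        if not in_tb:
            continue
        m = FRAME.match(line)
        if m:
            # Keep only the file's base name so the chain stays readable.
            name = m.group("file").rsplit("/", 1)[-1]
            chain.append("%s:%s %s" % (name, m.group("line"), m.group("func")))
            continue
        if EXC.match(line):
            results.append((chain, line))
            in_tb = False
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for chain, exc in summarize_tracebacks(f):
            if "Errno 5" in exc:
                # Print just the function names, outermost first.
                print(" -> ".join(c.split()[-1] for c in chain))
                print(exc)
```

Run against the VDSM log, it prints one chain per [Errno 5] traceback, which makes it easy to confirm that each failure ends in validateFileSystemFeatures and directTouch.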
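- The traceback shows that the failure occurs in validateFileSystemFeatures() in fileSD.py. When VDSM instantiates the gluster storage domain, it checks that the domain's mount supports direct I/O by touching the file __DIRECT_IO_TEST__ in the mount directory with O_CREAT | O_DIRECT (directTouch() in outOfProcess.py). On this host that touch fails with EIO.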
fileSD.py (lines 80-96):
    # os, errno and logging are standard library modules; oop and se are
    # VDSM's outOfProcess and storage_exception modules imported by fileSD.py.
    def validateFileSystemFeatures(sdUUID, mountDir):
        try:
            # Don't unlink this file, we don't have the cluster lock yet as it
            # requires direct IO which is what we are trying to test for. This
            # means that unlinking the file might cause a race. Since we don't
            # care what the content of the file is, just that we managed to
            # open it O_DIRECT.
            testFilePath = os.path.join(mountDir, "__DIRECT_IO_TEST__")
            oop.getProcessPool(sdUUID).directTouch(testFilePath)
        except OSError as e:
            if e.errno == errno.EINVAL:
                log = logging.getLogger("Storage.fileSD")
                log.error("Underlying file system doesn't support "
                          "direct IO")
                raise se.StorageDomainTargetUnsupported()

            # Any other errno (EIO in this case) is re-raised unchanged.
            raise
outOfProcess.py (lines 349-351):
    def directTouch(ioproc, path, mode=0o777):
        flags = os.O_CREAT | os.O_DIRECT
        ioproc.touch(path, flags, mode)
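To reproduce the check by hand on the affected host, the following Python sketch performs the same kind of direct-I/O touch against a mounted storage domain and classifies the result the way validateFileSystemFeatures() does (EINVAL means no direct I/O support, anything else is re-raised). It is an approximation, not VDSM code: it assumes that ioprocess's touch opens the file with the given flags and then updates its timestamps, and that timestamp update is what a FUSE mount turns into the SETATTR seen in the gluster logs. The names classify and direct_touch are hypothetical, and os.O_DIRECT is only available on Linux.

```python
import errno
import os
import sys

# Same file name that validateFileSystemFeatures() touches in fileSD.py.
TEST_FILE = "__DIRECT_IO_TEST__"


def classify(err_no):
    """Map an errno from the touch to the outcome VDSM would produce."""
    if err_no == errno.EINVAL:
        # fileSD.py raises StorageDomainTargetUnsupported in this case.
        return "direct I/O not supported"
    if err_no == errno.EIO:
        # The case in this article: the OSError is re-raised as [Errno 5].
        return "I/O error (check the gluster client log for split-brain)"
    return "other error: " + os.strerror(err_no)


def direct_touch(mount_dir, mode=0o777):
    """Open (creating if needed) mount_dir/__DIRECT_IO_TEST__ with
    O_CREAT | O_DIRECT, then update its timestamps.
    ASSUMPTION: this approximates ioprocess's touch."""
    path = os.path.join(mount_dir, TEST_FILE)
    try:
        fd = os.open(path, os.O_CREAT | os.O_DIRECT, mode)
        try:
            # Set atime/mtime to now; on a FUSE mount this becomes SETATTR.
            os.utime(path, None)
        finally:
            os.close(fd)
    except OSError as e:
        return classify(e.errno)
    # Like VDSM, the test file is deliberately left in place.
    return "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(direct_touch(sys.argv[1]))
```

Pass the storage domain's mount point under /rhev/data-center/mnt/glusterSD/ as the only argument. On a healthy gluster mount the expected result is "ok"; while the volume is in split-brain it should report the I/O error, matching the VDSM traceback.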
- The GlusterFS client logs in /var/log/glusterfs show a split-brain on the replicated volume and the SETATTR on /__DIRECT_IO_TEST__ failing with the same Input/output error:
[2016-07-14 11:33:02.598174] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-gluster_rhevtest_data_01-replicate-0: Failing SETATTR on gfid 3144136d-935c-4241-be44-0f120236a7c1: split-brain observed. [Input/output error]
[2016-07-14 11:33:02.598234] W [fuse-bridge.c:1080:fuse_setattr_cbk] 0-glusterfs-fuse: 23: SETATTR() /__DIRECT_IO_TEST__ => -1 (Input/output error)
[2016-07-14 11:35:01.996149] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-gluster_rhevtest_data_01-replicate-0: Failing SETATTR on gfid 3144136d-935c-4241-be44-0f120236a7c1: split-brain observed. [Input/output error]
[2016-07-14 11:35:01.996213] W [fuse-bridge.c:1080:fuse_setattr_cbk] 0-glusterfs-fuse: 63: SETATTR() /__DIRECT_IO_TEST__ => -1 (Input/output error)
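On a busy mount the gluster client log can be large. The Python sketch below collects the affected gfids and failed paths from it; its two regular expressions are assumptions written only against the message formats shown above (the AFR "split-brain observed" error and the FUSE SETATTR failure) and may need adjusting for other GlusterFS versions. The function name scan is hypothetical.

```python
import re
import sys

# AFR error, e.g. "... 0-<vol>-replicate-0: Failing SETATTR on gfid <gfid>:
# split-brain observed. [Input/output error]"
SPLIT_BRAIN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E .*? (?P<subvol>\S+-replicate-\d+): "
    r"Failing (?P<fop>\w+) on gfid (?P<gfid>[0-9a-f-]+): split-brain observed")

# FUSE warning, e.g. "... 0-glusterfs-fuse: 23: SETATTR() /__DIRECT_IO_TEST__
# => -1 (Input/output error)"
FUSE_FAIL = re.compile(
    r"^\[(?P<ts>[^\]]+)\] W .*?: \d+: (?P<fop>\w+)\(\) (?P<path>\S+) => -1 "
    r"\((?P<err>[^)]+)\)")


def scan(lines):
    """Return (split_brain, fuse_failures), each a list of field dicts."""
    split_brain, fuse_failures = [], []
    for line in lines:
        m = SPLIT_BRAIN.match(line)
        if m:
            split_brain.append(m.groupdict())
            continue
        m = FUSE_FAIL.match(line)
        if m:
            fuse_failures.append(m.groupdict())
    return split_brain, fuse_failures


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        sb, ff = scan(f)
    gfids = sorted(set(e["gfid"] for e in sb))
    print("split-brain errors: %d, gfids: %s" % (len(sb), ", ".join(gfids)))
    for e in ff:
        print("%s %s %s -> %s" % (e["ts"], e["fop"], e["path"], e["err"]))
```

The gfid list is useful when working the split-brain with gluster support; on the gluster servers themselves, "gluster volume heal <VOLNAME> info split-brain" lists the entries that are in split-brain.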
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
