Linux drive letter drift problem
Hi, before the system is restarted the storage LUN's device node is /dev/sda
, but after the restart it becomes /dev/sdc
or another letter.
We tried to use a udev strategy, creating the 58-storage.rule
rule to solve this problem, but when the system restarts it reports that udev does not support renaming kernel device nodes, e.g. "NAME=/sda%n ignored, kernel device nodes can not be renamed; please fix it in /etc/udev/rules.d/10-test.rules:2"
. We want to pin the device node to /dev/sda
, so how do we solve this problem?
The kernel version is Linux nas 3.10.0-693.el7.x86_64
, the query result of the rpm -qa | grep udev
command is libgudev1-219-42.el7_4.1.x86_64
and python-pyudev-0.15-9.el7.noarch
, and the rule is as follows:
[root@RHEL7X rules.d]# cat 58-storage.rule
SUBSYSTEMS=="scsi",SUBSYSTEM=="block",KERNELS=="2:0:0:1",NAME="/sda%n"
Thank you in advance for sharing your experience here.
Responses
sdX
SCSI device names are not expected to be consistent between reboots, or even across device add/remove events during a single uptime.
You should address the device by some other means, such as its filesystem UUID, which you can find with the blkid
command.
If you can say where you intend to use a letter like /dev/sda
, we may be able to give more specific advice.
Here is an example usage in /etc/fstab
:
$ sudo blkid | grep sda1
/dev/sda1: UUID="268a25bf-db56-4fc9-edbd-458fc9c9ab61" TYPE="ext4" PARTUUID="51f8e6a4-01"
$ grep 268a25bf /etc/fstab
UUID=268a25bf-db56-4fc9-edbd-458fc9c9ab61 /boot ext4 defaults,noatime 0 2
Why specifically do you need /dev/sda
? What application or user experience changes if this SCSI device letter is different?
If you have an application or script that depends on /dev/sdX
being consistent between boots, that is the wrong way to address the underlying storage devices.
Applications and users should not care which SCSI device letter a particular storage device gets, because those sdX
letters can change and are expected to change.
Instead, the storage should be addressed by some unique and permanent property, such as the LUN WWID or filesystem UUID.
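To make the "address it by a permanent property" advice concrete, here is a minimal sketch of resolving a persistent name back to whatever sdX letter the device got this boot. On a real system the /dev/disk/by-id entries are symlinks maintained by udev; the example below simulates the layout in a temporary directory so it is self-contained, and the wwn-0xexample name is an invented placeholder (on your system, pick a real entry from `ls -l /dev/disk/by-id`):

```shell
# Sketch: /dev/disk/by-id entries are symlinks that always point at the
# current sdX node, so resolving the symlink gives today's letter without
# ever hardcoding it. Simulated with mktemp so the example is runnable
# anywhere; "wwn-0xexample" is a placeholder name.
tmp=$(mktemp -d)
mkdir -p "$tmp/dev/disk/by-id"
touch "$tmp/dev/sdc"                                 # pretend today's letter is sdc
ln -s ../../sdc "$tmp/dev/disk/by-id/wwn-0xexample"  # the persistent alias
DISK="$tmp/dev/disk/by-id/wwn-0xexample"
readlink -f "$DISK"                                  # resolves to .../dev/sdc
rm -rf "$tmp"
```

Scripts that keep the persistent path in a variable and resolve it at run time keep working no matter how the letters shuffle between boots.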
Hi Shui,
Are you using the SAN via a single-path approach?
I would advise you to reconsider the architecture and set up device-mapper-multipath instead.
In its configuration file you can assign specific, stable names.
I would use classic SCSI device names only for a system with internal disks.
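As a sketch of the "specific names" idea, a multipaths stanza in /etc/multipath.conf can pin a human-readable alias to the LUN's WWID, so the device is always reachable as /dev/mapper/osdisk regardless of which sdX letters the underlying paths receive. The WWID below is an invented example value; obtain the real one with `/lib/udev/scsi_id -g -u /dev/sdX` or `multipath -ll`:

```
# Assumed /etc/multipath.conf fragment -- wwid and alias are examples.
multipaths {
    multipath {
        wwid  3600508b4000156d700012000000b0000
        alias osdisk
    }
}
```

After reloading multipathd, the stable node /dev/mapper/osdisk (and its partitions) can be used in fstab, scripts, and application configuration.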
I do not understand your remark about user experience.
A user should use the mount points, not the partitions, unless they write raw data (which I would expect only from an Oracle DBA or a scientist handling a near-real-time data flow).
Regards,
Jan Gerrit Kootstra
Yes, that should be possible. First gather the LUN/disk-specific unique details with the command 'udevadm info -a -n /dev/sdX', then create a rule in a custom file under the /etc/udev/rules.d/ directory that matches those attributes. A SYMLINK assignment will create a custom, unique symbolic link to the device, which can then be used as a stable reference. Please refer to this KB for more details: https://access.redhat.com/solutions/1135513
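Applied to the rule from the original question, the SYMLINK approach might look as follows. This is a sketch, not a tested rule: the KERNELS=="2:0:0:1" match is carried over from the question, and the storage-lun link name is an invented example; verify the match keys on your own system with `udevadm info -a -n /dev/sdX`:

```
# Assumed /etc/udev/rules.d/58-storage.rules -- kernel nodes cannot be
# renamed, but udev can add extra symlinks pointing at them.
SUBSYSTEMS=="scsi", SUBSYSTEM=="block", KERNELS=="2:0:0:1", SYMLINK+="storage-lun%n"
```

After `udevadm control --reload` and re-plugging (or `udevadm trigger`), the device is reachable under the stable path /dev/storage-lun* while the kernel's sdX node stays whatever it happens to be that boot.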
We started seeing this on RHEL 9. Sometimes (perhaps a <10% chance) the kernel gives the first-found disk the sdb
assignment.
Journal reports something like "sd 0:0:0:0: [sdb] Attached SCSI disk"
The inconsistency in drive-letter names is an issue when you kickstart a system that has been in operation and want to keep the data on sdb while performing a "fresh" OS install:
# Partition clearing information
clearpart --drives=sda --all
# Only these disks
ignoredisk --only-use=sda
# Disk partitioning information
part /boot --fstype="xfs" --ondisk=sda --size=1024
part pv.01 --fstype="lvmpv" --size=10000 --ondisk=sda --grow
volgroup vg_sys pv.01
logvol / --fstype="xfs" --size=4096 --name=lv_root --vgname=vg_sys
logvol /tmp --fstype="xfs" --size=512 --name=lv_tmp --vgname=vg_sys
..etc..
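One workaround (a sketch of a common pattern, not taken from this thread) is to stop hardcoding sda in the kickstart file and instead select the target disk in a %pre script, then pull the generated directives in with %include. The selection rule below (smallest disk is the OS disk) is an assumption; substitute whatever stable property distinguishes your OS disk from the data disk:

```
%pre --interpreter=/bin/bash
# Pick the OS disk by a stable property instead of by letter.
# Assumption for this sketch: the OS disk is the smallest one.
BOOT=$(lsblk -d -n -b -o NAME,SIZE | sort -k2 -n | head -1 | awk '{print $1}')
cat > /tmp/disk.ks <<EOF
clearpart --drives=${BOOT} --all
ignoredisk --only-use=${BOOT}
part /boot --fstype="xfs" --ondisk=${BOOT} --size=1024
part pv.01 --fstype="lvmpv" --size=10000 --ondisk=${BOOT} --grow
EOF
%end

# ...and in the main command section:
%include /tmp/disk.ks
```

This keeps a single kickstart file reusable across a fleet while protecting the data disk even when the letters flip.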
Also, redhat.rhel_system_roles.storage
requires sdX disk names for partitioning.
I can confirm this exact scenario (kickstarting a server while needing to be certain that /dev/sda will be the OS drive, so as to protect the data drive /dev/sdb) is happening for us too.
To state the obvious, an installed system has many proper, unique, and meaningful identifiers for each disk that are easy to find, and thus easy to use by any application that might require them. However, it is not practical to sort out these identifiers (by WWN, or by path on the PCI bus) to discover the right one to use during the kickstart process for building/rebuilding a server, where a reusable approach is preferred (one kickstart file for a whole fleet of servers, making it undesirable to hardcode a disk model, a controller model, or a PCI bus ID).
This is why people have relied, and still rely, on being able to refer to the "first disk" as /dev/sda, using it as the boot disk and then retaining this assignment for operational procedures, for the same reason of domain isolation (OS vs. data).
To give further detail on what we have observed in our case: the problem occurs at the level of sdX device-name assignment, while PCI enumeration, SCSI host adapter enumeration, and SCSI generic device (/dev/sgX) assignment remain consistent, as can be verified with lsscsi -g.
Most of the time we get:
sd 0:3:111:0: [sda] Attached SCSI disk
sd 1:3:111:0: [sdb] Attached SCSI disk
But sometimes we also see:
sd 1:3:111:0: [sda] Attached SCSI disk
sd 0:3:111:0: [sdb] Attached SCSI disk
The problem basically stems from the change highlighted in https://lore.kernel.org/lkml/59eedd28-25d4-7899-7c3c-89fe7fdd4b43@acm.org/t/
Based on feedback from other distributions that received this change, most of its advantages are reaped in desktop environments, at the cost (as this thread makes obvious) of server environments; the response of many administrators has been to simply roll their own custom kernels for the server world, since there is no other practical way to revert to the previous stable behavior.
It seems this problem has been given attention in the mainline kernel, as the documentation (https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt) now reflects the following option:
<module>.async_probe[=<bool>] [KNL]
module.async_probe=<bool>
(Also see commit https://github.com/torvalds/linux/commit/ae39e9ed964f8e450d0de410b5a757e19581dfc5)
However, the option is currently available only in a limited form, which forces asynchronous probing on a module:
<module>.async_probe [KNL]
It is currently possible to force a driver to probe its disk drives with a head start (obviously subject to a race condition by nature, but based on observation, probing immediately after an explicit driver load is about 10x faster than the normal automatic loading of additional drivers), and this provides relief for the scenario where the drive we want seen first is on a controller requiring "driver1", while the other disks are on controllers requiring "driver2".
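As a sketch of how that head start could be configured, the module parameter can be appended to the kernel command line with grubby. "driver1" below is the thread's placeholder name, not a real module; substitute the module that drives your boot controller (lspci -k or lsmod will tell you which one):

```
# Assumed example: force asynchronous probing on the boot-disk controller's
# driver so it races ahead of the others. "driver1" is a placeholder.
grubby --update-kernel=ALL --args="driver1.async_probe"
reboot
```

Because this only biases the race rather than removing it, it is relief rather than a guarantee, as the comment above notes.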
Backporting the mainline option to deactivate asynchronous probing for a given driver would help greatly, as it would enlarge the supported scenarios, making it possible to handle:
- one controller having multiple drives
- multiple controllers of the same make, each having multiple drives