Linux drive letter drift problem

Hi, before a restart the storage LUN's device name is /dev/sda, but after the restart it becomes /dev/sdc or something else.

We tried a udev approach, creating a rule file 58-storage.rule to solve this, but on restart the system reports that this udev version does not support renaming kernel device nodes, e.g. "NAME=/sda%n ignored, kernel device nodes can not be renamed; please fix it in /etc/udev/rules.d/10-test.rules:2". We want to pin the device name to /dev/sda, so how do we solve this problem?

The kernel version is Linux nas 3.10.0-693.el7.x86_64, the output of rpm -qa | grep udev is libgudev1-219-42.el7_4.1.x86_64 and python-pyudev-0.15-9.el7.noarch, and the rule is as follows:

[root@RHEL7X rules.d]# cat 58-storage.rule
SUBSYSTEMS=="scsi",SUBSYSTEM=="block",KERNELS=="2:0:0:1",NAME="/sda%n"

Thank you in advance for sharing your experience here.

Responses

sdX SCSI device names are not expected to be consistent between reboots, or even across add/remove of devices during a single uptime.

You should address the device via some other means, such as by its UUID which you can find with the blkid command.

If you can say where you intend to use a name like /dev/sda, then we may be able to give more specific advice.

Here is an example usage in /etc/fstab:

$ sudo blkid | grep sda1
/dev/sda1: UUID="268a25bf-db56-4fc9-edbd-458fc9c9ab61" TYPE="ext4" PARTUUID="51f8e6a4-01"

$ grep 268a25bf /etc/fstab
UUID=268a25bf-db56-4fc9-edbd-458fc9c9ab61 /boot          ext4    defaults,noatime   0 2

Thank you for your reply. I know this problem can be worked around by modifying /etc/fstab, but that is not optimal: although the filesystems mount, the device names themselves are still not fixed. Our storage system has both system LUNs and non-system LUNs. When the device restarts, non-system LUNs are occasionally mounted under /dev/sdc or other names, while system LUNs are mounted under /dev/sda. This is not what we want: because of upper-level user-experience and application constraints, the non-system LUN should only be mounted under /dev/sda, and sda should be divided into sda1, sda2, sda3, and sda4 for mounting to different business directories.

I restated my question on 29 April 2021 at 1:17 PM; I hope my meaning is clearer now, and I welcome you to share your experience.

Why specifically do you need /dev/sda? What application or user experience is changed by this SCSI device letter being different?

If you have some application or script which depends on /dev/sdX being consistent between boots, that is the incorrect way to address underlying storage devices.

Applications and users should not care what SCSI device letter a particular storage gets, because those sdX letters can change and are expected to change.

Instead, the storage should be addressed by some unique and permanent property, such as the LUN WWID or filesystem UUID.
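As a sketch of what that looks like in practice (the WWN below is a made-up placeholder; use whatever `ls -l /dev/disk/by-id/` shows on your system), the persistent name can be used anywhere a device path is expected, e.g. in /etc/fstab:

```
# Find the persistent identifiers that point at the current sdX name
$ ls -l /dev/disk/by-id/ | grep sdc
wwn-0x5000c500a1b2c3d4 -> ../../sdc

# Reference the WWN-based name instead of /dev/sdc1 in /etc/fstab
/dev/disk/by-id/wwn-0x5000c500a1b2c3d4-part1  /data  ext4  defaults  0 2
```

These by-id links are recreated by udev on every boot and always point at the correct sdX node, whatever letter it happens to get.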

Thank you for your reply. Our upper-level business logic, controlled by the cluster, is relatively complicated and troublesome to modify. However, based on our preliminary research and your suggestions, pinning the device name of a particular partition does not seem workable, so perhaps we will have to consider modifying the upper-level logic code after all. Thank you very much.

That sounds like a good direction.


hi Shui,

Are you using the SAN via a single-path approach?

I would advise you to reconsider the architecture and set up device-mapper-multipath.

In its configuration file you can set up specific names.
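As a sketch (the WWID below is a placeholder; take the real value from `multipath -ll` output), an alias in /etc/multipath.conf gives the LUN a stable name under /dev/mapper, independent of the sdX letters of the underlying paths:

```
# /etc/multipath.conf (fragment)
multipaths {
    multipath {
        wwid  360000000000000000e00000000010001   # placeholder, use your LUN's WWID
        alias sys_lun3
    }
}
```

After reloading multipathd, the LUN appears as /dev/mapper/sys_lun3 on every boot.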

I would only use classic SCSI device names on a system with internal disks.

I do not understand your remark about user experience.

A user should use the mount points, not the partitions, unless they write raw data (which I expect only from an Oracle DBA or a scientist using a near-realtime data flow).

Regards,

Jan Gerrit Kootstra

Sorry, maybe my wording above was a bit vague. Simply put, I want to pin SYS_LUN3 to /dev/sda (with sda divided into sda1, sda2, sda3, and sda4), instead of it being mounted as /dev/sdc or another name after the system restarts. An example of the wrong mapping is as follows:

[root@nas ~]# cd /dev/disk/by-id/
[root@nas by-id]# ll
total 0
lrwxrwxrwx 1 root root  9 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN1 -> ../../sda
lrwxrwxrwx 1 root root  9 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN2 -> ../../sdb
lrwxrwxrwx 1 root root  9 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3 -> ../../sdc
lrwxrwxrwx 1 root root 10 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part3 -> ../../sdc3
lrwxrwxrwx 1 root root 10 Apr 29 14:09 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part4 -> ../../sdc4

Ideally, the mounting result we expect should be as follows, but we don’t know how to always mount SYS_LUN3 under /dev/sda:

[root@nas ~]# cd /dev/disk/by-id/
[root@nas by-id]# ll
total 0
lrwxrwxrwx 1 root root  9 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN2 -> ../../sdc
lrwxrwxrwx 1 root root  9 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3 -> ../../sda
lrwxrwxrwx 1 root root 10 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Apr 29 15:36 scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part4 -> ../../sda4
[root@nas ~]# cat /etc/fstab
# Created by anaconda on Mon Aug 20 14:15:47 2018
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part1 /boot ext3 rw,suid,dev,exec,auto,nouser,sync 0 0
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part2 /var/log ext3 rw,suid,dev,exec,auto,nouser,sync 0 0
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part3 /nas/storage ext3 rw,suid,dev,exec,auto,nouser,sync 0 0
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_SYS_LUN3-part4 /nas/share ext3 rw,suid,dev,exec,auto,nouser,sync 0 0

We know that a UUID-style method can solve the problem of the disk failing to mount after the device name changes. But now the servers need to be clustered, and the device names on both servers need to be the same. Is there any good solution?

Yes, that should be possible. First gather the LUN/disk-specific unique details using 'udevadm info -a -n /dev/sdX', then create a rule in your custom file under /etc/udev/rules.d/ that matches those LUN/disk-specific attributes. A SYMLINK assignment would let you create a custom, unique device symlink which can then be used as a stable reference. Please refer to this KB for more details: https://access.redhat.com/solutions/1135513
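As a sketch only (the KERNELS value is taken from the rule quoted earlier in this thread; verify the matching attributes with `udevadm info -a -n /dev/sdX` on your system), a rule using SYMLINK+= instead of NAME= works on udev 219, since symlinks can still be created even though kernel device nodes cannot be renamed:

```
# /etc/udev/rules.d/58-storage.rules
# Persistent symlink for the whole disk at SCSI address 2:0:0:1
SUBSYSTEM=="block", KERNELS=="2:0:0:1", ENV{DEVTYPE}=="disk", SYMLINK+="sys_lun3"
# Persistent symlinks for its partitions (sys_lun3p1, sys_lun3p2, ...)
SUBSYSTEM=="block", KERNELS=="2:0:0:1", ENV{DEVTYPE}=="partition", SYMLINK+="sys_lun3p%n"
```

Both cluster nodes can then mount via /dev/sys_lun3p1 and so on, giving identical names on both sides regardless of which sdX letter the kernel picks.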

We started seeing this on RHEL9. Sometimes (< 10% chance?) the kernel gives the first-found disk the sdb assignment.

Journal reports something like "sd 0:0:0:0: [sdb] Attached SCSI disk"

The inconsistency in drive-letter names is an issue when you kickstart a system that has been in operation and you would like to keep the data on sdb while performing a "fresh" OS install:

# Partition clearing information
clearpart --drives=sda --all

# Only these disks
ignoredisk --only-use=sda

# Disk partitioning information
part /boot --fstype="xfs" --ondisk=sda --size=1024
part pv.01 --fstype="lvmpv" --size=10000 --ondisk=sda --grow
volgroup vg_sys pv.01
logvol /  --fstype="xfs" --size=4096 --name=lv_root --vgname=vg_sys
logvol /tmp  --fstype="xfs" --size=512 --name=lv_tmp --vgname=vg_sys
..etc..

Also, redhat.rhel_system_roles.storage requires sdX disk names for partitioning.
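One hedged workaround at kickstart time (a sketch; the size-based selection is an assumption, so adjust the lsblk filter to whatever property reliably distinguishes your OS disk) is to pick the target disk in %pre by a stable attribute instead of a letter, then %include the generated partitioning:

```
%pre
# Pick the smallest disk as the OS disk (assumption: the OS disk is the smallest),
# excluding loop (major 7) and optical (major 11) devices
osdisk=$(lsblk -dnb -e 7,11 -o NAME,SIZE -x SIZE | head -1 | awk '{print $1}')
cat > /tmp/part-include <<EOF
clearpart --drives=${osdisk} --all
ignoredisk --only-use=${osdisk}
part /boot --fstype="xfs" --ondisk=${osdisk} --size=1024
EOF
%end

%include /tmp/part-include
```

This keeps one reusable kickstart file for a whole fleet while never touching the data disk, whichever letter it is assigned.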

I can confirm this exact scenario (kickstarting a server where we need to be certain that /dev/sda will be the OS drive, to protect the data drive /dev/sdb) is happening for us too.

To state the obvious, an installed system has many proper, unique, meaningful identifiers for each disk that are easy to find, and thus easy to use in any application that needs them. However, it is not practical to sort out those identifiers (by WWN, or by path on the PCI bus) to discover the right one during the kickstart process for building or rebuilding a server, where a reusable approach is preferred (one kickstart file for a whole fleet of servers, making it undesirable to hardcode a disk model, a controller model, or a PCI bus ID).

This is why people have relied and still rely on being able to refer to the "first disk" as /dev/sda, so as to use it as the boot disk, and then retain this assignment for operational procedures, for the same reason of domain isolation (OS and data).

To give further detail on what we have observed in our case: the problem happens at the level of sdX device naming, while PCI enumeration, SCSI host adapter enumeration, and the SCSI generic devices (/dev/sgX) remain consistent, as can be verified via lsscsi -g.

Most of the time we get:

sd 0:3:111:0: [sda] Attached SCSI disk
sd 1:3:111:0: [sdb] Attached SCSI disk

But sometimes we also see:

sd 1:3:111:0: [sda] Attached SCSI disk
sd 0:3:111:0: [sdb] Attached SCSI disk

The problem basically stems from the change highlighted in https://lore.kernel.org/lkml/59eedd28-25d4-7899-7c3c-89fe7fdd4b43@acm.org/t/

Based on feedback from other distributions upon receiving this change, it seems most of the advantages are reaped in desktop environments, but, as is obvious from this thread, at the cost of server environments; the response of many administrators has been to simply roll out custom kernels for the server world, since there is no other practical way to revert to the previous stable behavior.

It seems this problem has been given attention in the mainline kernel, as the documentation (https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt) now reflects the following option:

<module>.async_probe[=<bool>] [KNL]
module.async_probe=<bool>

(Also see commit https://github.com/torvalds/linux/commit/ae39e9ed964f8e450d0de410b5a757e19581dfc5)

However, the option is currently only available in a limited form, to force asynchronous probing for a module:

<module>.async_probe [KNL]

It is currently possible to give a driver a head start in probing its disk drives (obviously still subject to a race condition by nature, but based on observation, the probing that follows driver load is about 10x faster than the normal automatic loading of additional drivers). This provides relief in the scenario where the drive we want seen first is on a controller requiring "driver1", while the other disks are on controllers requiring "driver2".
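Concretely, as a sketch ("driver1" here is a placeholder for the HBA module driving the disk that should come first), the head start can be given with dracut's early-load option on the kernel command line, so the module is loaded in the initramfs before normal automatic driver loading begins:

```
# kernel command line (e.g. in GRUB_CMDLINE_LINUX)
rd.driver.pre=driver1
```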

Backporting the mainline option to deactivate asynchronous probing for a given driver would help greatly, as it would enlarge the set of supported scenarios, making it possible to handle:

  • one controller having multiple drives

  • multiple controllers of the same make having multiple drives