[semi-resolved] Red Hat 7.x ssh fails with error "System is booting up. See pam_nologin(8) \ Authentication Failed." (but system is booted)


This is not a Red Hat solution; this is the Red Hat discussion area. I'm documenting this here because at least 3 of my systems have had this issue, and I want to record it for others who might be experiencing it (or for myself or team members when I google it).

Important: Please submit a case with Red Hat

  • And please cite this discussion in the case.

Environment

  • Red Hat Enterprise Linux 7.x Server

Note: make a comment in this discussion if you encounter this on a system that is not RHEL 7


Issue

  • ssh to a RHEL 7 server fails with the misleading error shown in the block below:
# your ip address or hostname will obviously differ
[you@yoursystem] # ssh 192.168.100.100
System is booting up. See pam_nologin(8)
Authentication failed.


  • The system you are attempting to ssh to is actually not in the boot process.

  • The user you are sshing to is not a nologin account


Resolution

  • IMPORTANT: Be certain that this system is not in the process of booting. If it is not, log into the system manually, become root, and remove the /run/nologin file.

  • Log into the system using the console of the computer (physical or virtual, or web interface for a console). Then remove the /run/nologin file.

  • Virtual system example: if it is a VMware system, use the VMware interface to log in and do this.

  • Visit the system and log into the console if it is a physical server.

  • If you have a remote management tool (Dell for example has "iDrac"), use it to gain access to your system

  • Some systems might have Red Hat Cockpit installed and running; use it if possible to get terminal access.

  • Again, once you are logged in, switch to the root account and remove the /run/nologin file

  • Additionally check for the following files and remove if they exist:

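# ls fails if any one path is missing, which would skip the rm; run them separately
ls -l /{var/run,etc,run}/nologin 2>/dev/null; rm -f /{var/run,etc,run}/nologin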
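The resolution steps above can be sketched as a small bash helper that removes the nologin files only when systemd no longer reports a boot in progress. This is a minimal sketch, not a Red Hat procedure: `boot_finished` and `remove_nologin` are hypothetical names, the optional prefix argument exists only so the removal can be tried against a scratch directory first, and treating the `running` and `degraded` states as "boot finished" is an assumption based on the states that `systemctl is-system-running` reports.

```shell
#!/bin/bash
# Sketch only: helper names and the prefix argument are assumptions for illustration.

# Return 0 if the given `systemctl is-system-running` state means boot has finished.
boot_finished() {
    case "$1" in
        running|degraded) return 0 ;;
        *) return 1 ;;
    esac
}

# Remove any nologin files found under an optional prefix; print each one removed.
remove_nologin() {
    local prefix="${1:-}"
    local f
    for f in /run/nologin /var/run/nologin /etc/nologin; do
        if [ -e "${prefix}${f}" ]; then
            rm -f -- "${prefix}${f}" && echo "removed ${prefix}${f}"
        fi
    done
}

# Usage on the affected host, as root, from the console:
#   state=$(systemctl is-system-running)
#   boot_finished "$state" && remove_nologin
```

If the reported state is `starting` or `initializing`, the system may really still be booting, and the files are left alone.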


Root Cause


I request that any Red Hatters or anyone else post anything relevant on this topic (especially if there is a source solution or article from Red Hat).

Regards

RJ

Responses

Hello! I have the same issue. Did you find anything when examining that problem?

What does 'systemctl status' report? Is the system in a 'degraded' state, with 1 or more services "failed"? ('systemctl list-units | grep failed' might be helpful here.)

I do not know exactly which of the many systemd units is responsible for removing the 'nologin' file (either /etc/nologin or /run/nologin), but hunting down that service and its dependencies would be my approach to solving the original problem.
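The hunt suggested above could start with something like the following bash sketch. `failed_units` is a hypothetical helper name, and it assumes the column layout of `systemctl list-units --plain --no-legend` (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION). According to the systemd-user-sessions.service man page, that unit removes /run/nologin once basic initialization is complete, so it is a reasonable first unit to inspect; whether it is actually involved in this issue is not established anywhere in this thread.

```shell
#!/bin/bash
# Sketch only: the helper name is an assumption for illustration.

# Read `systemctl list-units --all --plain --no-legend` output on stdin and
# print the names of units whose ACTIVE column is "failed".
failed_units() {
    awk '$3 == "failed" { print $1 }'
}

# Usage on the affected host:
#   systemctl list-units --all --plain --no-legend | failed_units
#   systemctl status systemd-user-sessions.service
#   cat /run/nologin    # shows the message that pam_nologin prints at login
```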

Andrew, I wish I had more info as to why. Other priority production things have gotten in the way of me digging further into it.

James, the systemctl status command is useful, nice tip above, thanks. I can't say there is a corresponding failed service for the issue I originally posted.

Regards

RJ

Folks, this is a bug that needs to be addressed. It was introduced in the Sept channels. See my notes to my team on this below.

"We have a serious bug that was introduced with the patching update with the Sept channels. Changing init state is very unstable.

When you’re remotely ssh-ed into a 7.7 system and drop the system to init 3 in prep to update the graphics driver, most of the time it will close out all logged-in sessions (on console and remote ssh sessions) and create a /run/nologin file. That prevents any logins, at least from remote, including as root. I did not test whether we can get in on the console and remove the file, because in this particular case the ticket was for no graphics (aka no console).

What you get after changing init state:

System is booting up. See pam_nologin(8) Authentication failed.

The system is not really booting but just sitting there in the init 5 state, or sometimes it would drop to the init 3 state. In a nutshell, changing init levels in the Sept channels is broken. This time it required a power cycle of the system to recover.

Other scenarios of this bug:

1) Change init state to 3 -> window hangs and does not change states.
2) Change init state to 3 -> changes state but kicks all sessions off except a session logged in as root, no /run/nologin file created. You’re able to ssh back in and update the driver. Note an init 5 to bring the graphics up again will kick you out again but it does return to level 5 and ssh back in again.
3) Change init state to 3 -> changes state but kicks all sessions off except a session logged in as root, /run/nologin file is created but able to remove file from root session still logged in.
4) Change init state to 3 -> kicks all sessions off including root and creates a /run/nologin file thus locking out all future logins (at least from remote).

This problem is intermittent, to the point where I have some systems at 7.7 where we can change init levels without any problems. It could be that the 7.7 systems without problems were updated to channels later than September. I’m guessing here.

The way this is supposed to work is that when you do an init 3, it only kicks off the console windows (turns off the graphics). All people that are ssh-ed in stay logged in. An init 5 brings the graphics back up, restoring the console. This was the behavior prior to patching updates (using Sept channels)."
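For anyone trying to reproduce the scenarios above: on RHEL 7, init 3 and init 5 correspond to the systemd targets multi-user.target and graphical.target. The bash sketch below maps SysV runlevels to those targets so the switch can be made with `systemctl isolate` and /run/nologin can be checked afterwards. `runlevel_to_target` is a hypothetical helper name, and nothing here is claimed to avoid the bug; it only makes the test steps explicit.

```shell
#!/bin/bash
# Sketch only: the helper name is an assumption for illustration.

# Print the systemd target that corresponds to a SysV runlevel number.
runlevel_to_target() {
    case "$1" in
        0) echo poweroff.target ;;
        1) echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5) echo graphical.target ;;
        6) echo reboot.target ;;
        *) echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Usage as root, ideally with console access available as a fallback:
#   systemctl isolate "$(runlevel_to_target 3)"
#   ls -l /run/nologin    # should report "No such file" if logins are still allowed
```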

Has there been a bug created with Red Hat for this issue yet?

Regards, Steve

Steve, It's driving us nuts too. Put in a case please. By the way, I do not work for Red Hat.

Regards

RJ

All,

I have the same issue each time I reboot my system at home. Now that I see that other people have this issue too, I will take the following actions:

  • reboot my system this weekend
  • open a support case
  • update this thread with the outcome.

Regards,

Jan Gerrit

I'm going to revisit the case we submitted. I don't believe it was resolved when I put it in.

Thanks Jan
RJ

Jan, please inform them of this discussion in your case so Red Hat can look at the others who have reported this issue here.

I reopened the case I had with this.

Thanks/Regards
RJ

My case did not get resolved. Since reinstalling with RHEL 8.1, I have not seen this issue any more.

I'm curious whether this is isolated to RHEL 7. I'm glad you didn't run into it with RHEL 8.1.