Trouble with Display Manager and NVIDIA Driver


Hello,

I am running RHEL 8 on an Intel i9 workstation with a Quadro RTX 4000 GPU.

This system has always been a little problematic, ever since I configured it to boot into runlevel 5.

Sometimes the GUI did not appear, and I had to log in remotely (via ssh) and restart GDM before GNOME came up again.

However, some time ago I removed GNOME and installed XFCE4 instead. I also configured the system to boot into runlevel 3 and got used to starting XFCE4 manually whenever I was working locally on the machine.

Recently, I tried to install LightDM as an alternative greeter, without success: I had to install it with the --allowerasing flag, and afterwards my X was broken (i.e. the screen was black).

To repair the machine, I logged in remotely again, uninstalled the X Window System, reinstalled it, and also had to reinstall nvidia-settings.

Now everything is (almost) as before: I still have no greeter (which is not important, of course), and I boot into runlevel 3, from where I can start XFCE4.

However, when I update (via dnf upgrade), I get the following error:

Problem 1: package nvidia-settings-3:440.33.01-1.el8.x86_64 requires nvidia-libXNVCtrl(x86-64) = 3:440.33.01-1.el8, but none of the providers can be installed
- cannot install both nvidia-libXNVCtrl-3:440.82-1.el8.x86_64 and nvidia-libXNVCtrl-3:440.33.01-1.el8.x86_64
- cannot install the best update candidate for package nvidia-settings-3:440.33.01-1.el8.x86_64
- cannot install the best update candidate for package nvidia-libXNVCtrl-3:440.33.01-1.el8.x86_64
Problem 2: problem with installed package nvidia-settings-3:440.33.01-1.el8.x86_64
- package nvidia-settings-3:440.33.01-1.el8.x86_64 requires nvidia-libXNVCtrl(x86-64) = 3:440.33.01-1.el8, but none of the providers can be installed
- cannot install both nvidia-libXNVCtrl-3:440.82-1.el8.x86_64 and nvidia-libXNVCtrl-3:440.33.01-1.el8.x86_64
- package nvidia-libXNVCtrl-devel-3:440.82-1.el8.x86_64 requires nvidia-libXNVCtrl = 3:440.82-1.el8, but none of the providers can be installed
- cannot install the best update candidate for package nvidia-libXNVCtrl-devel-3:440.33.01-1.el8.x86_64
- package nvidia-settings-3:440.64-1.el8.x86_64 is excluded
- package nvidia-settings-3:440.82-1.el8.x86_64 is excluded
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
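A read-only way to see where the conflicting packages come from, before changing anything, is to list the installed NVIDIA packages together with the repo each was installed from, and every version the enabled repos offer. A minimal sketch; the function name is hypothetical and the package globs are taken from the error above:

```shell
# Hypothetical helper: read-only overview of where the NVIDIA packages
# came from and which versions the enabled repos offer.
# Nothing here installs or removes anything.
show_nvidia_origins() {
    # The last column is the repo each package was installed from (prefixed with @).
    dnf list installed '*nvidia*'
    # Every version of the two conflicting packages that the enabled repos offer.
    dnf --showduplicates list available 'nvidia-settings' 'nvidia-libXNVCtrl*'
}
```

Usage: show_nvidia_origins 2>&1 | less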

What can I do to resolve this?

Honestly, I am a bit afraid of using --nobest/--allowerasing again.

Fixing the broken X took me several hours recently, and I depend heavily on this system at the moment.

Are there any safe options I could try to fix my RHEL 8 installation that will not break X again?

Any advice would be greatly appreciated!

Responses

Hi Jan,

Please tell us which repositories you installed the drivers from ... negativo17 or RPM Fusion?
As a first workaround, I suggest removing nvidia-settings and all nvidia-libXNVCtrl packages ... :)

sudo dnf remove nvidia-settings 'nvidia-libXNVCtrl*'
sudo dnf upgrade

Regards,
Christian

Thanks, Christian. I haven't tried it yet. Are you sure this is safe?

I mean, can I always log in via ssh and repair the OS by reinstalling nvidia-settings with dnf install nvidia-settings nvidia-libXNVCtrl, or with dnf history list / dnf history undo <id>?

I could not do this after dnf install lightdm --allowerasing broke my X. Probably my xorg.conf (or even more) was damaged. Anyway, it took me several hours to repair the OS by reinstalling nvidia-settings.

Now I am unsure whether I have a specific version of nvidia-settings that I should not remove, since I don't know how or where to get exactly this version again.
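Regarding getting an exact version back: rpm can record precisely what is installed right now, and dnf keeps a transaction log that can be inspected with dnf history list and dnf history info <id>. A dnf history undo <id> can only restore old versions that are still downloadable from an enabled repository, so keeping the exact list on disk helps. A minimal sketch; the function name and default output file are hypothetical:

```shell
# Hypothetical helper: save the exact versions of every installed NVIDIA
# and CUDA package to a text file before removing anything.
snapshot_nvidia_packages() {
    out="${1:-$HOME/nvidia-packages-$(date +%Y%m%d).txt}"
    # One name-epoch:version-release.arch line per package, sorted.
    {
        rpm -qa --qf '%{NAME}-%{EPOCHNUM}:%{VERSION}-%{RELEASE}.%{ARCH}\n' '*nvidia*'
        rpm -qa --qf '%{NAME}-%{EPOCHNUM}:%{VERSION}-%{RELEASE}.%{ARCH}\n' 'cuda*'
    } | sort > "$out"
    echo "saved package list to $out"
}
```

Usage: snapshot_nvidia_packages, or snapshot_nvidia_packages /path/to/list.txt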

My work currently depends substantially on this computer, so I don't want to try something which could make it unusable for hours again.

However, I will try what you suggest if you say it is safe.

I actually don't know where nvidia-settings came from. Both RPM Fusion and negativo17 seem to be enabled. Here is my output from dnf repolist:

cuda-10-2-local-10.2.89-440.33.01 3.4 MB/s | 3.5 kB 00:00
Extra Packages for Enterprise Linux Modular 8 - x86_64 19 kB/s | 34 kB 00:01
negativo17 - Multimedia 3.5 kB/s | 3.9 kB 00:01
negativo17 - Spotify 2.9 kB/s | 3.0 kB 00:01
Extra Packages for Enterprise Linux 8 - x86_64 24 kB/s | 32 kB 00:01
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs) 3.2 kB/s | 4.1 kB 00:01
Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs) 3.7 kB/s | 4.5 kB 00:01
Red Hat CodeReady Linux Builder for RHEL 8 x86_64 (RPMs) 8.6 kB/s | 4.5 kB 00:00
Remi's Modular repository for Enterprise Linux 8 - x86_64 3.4 kB/s | 3.5 kB 00:01
Safe Remi's RPM repository for Enterprise Linux 8 - x86_64 2.3 kB/s | 3.0 kB 00:01
RPM Fusion for EL 8 - Free - Updates 2.7 kB/s | 3.7 kB 00:01
RPM Fusion for EL 8 - Nonfree - Updates 2.8 kB/s | 3.7 kB 00:01
Sublime Text - x86_64 - Stable 2.3 kB/s | 2.9 kB 00:01
repo id                                   repo name                                                   status
codeready-builder-for-rhel-8-x86_64-rpms  Red Hat CodeReady Linux Builder for RHEL 8 x86_64 (RPMs)    1,873
cuda-10-2-local-10.2.89-440.33.01         cuda-10-2-local-10.2.89-440.33.01                           74
epel                                      Extra Packages for Enterprise Linux 8 - x86_64              5,334
epel-modular                              Extra Packages for Enterprise Linux Modular 8 - x86_64      0
epel-multimedia                           negativo17 - Multimedia                                     197
epel-spotify                              negativo17 - Spotify                                        1
remi-modular                              Remi's Modular repository for Enterprise Linux 8 - x86_64   16
remi-safe                                 Safe Remi's RPM repository for Enterprise Linux 8 - x86_64  2,269
rhel-8-for-x86_64-appstream-rpms          Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)    8,865
rhel-8-for-x86_64-baseos-rpms             Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)       3,801
rpmfusion-free-updates                    RPM Fusion for EL 8 - Free - Updates                        180
rpmfusion-nonfree-updates                 RPM Fusion for EL 8 - Nonfree - Updates                     49
sublime-text
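The conflict Christian points out next can be checked mechanically from dnf repolist output. A sketch, assuming (as in the listing above) that negativo17's NVIDIA packages live in the repo with id epel-multimedia and RPM Fusion's in rpmfusion-nonfree-updates; the function name is hypothetical:

```shell
# Hypothetical helper: reads `dnf repolist` output on stdin and warns when
# negativo17 (repo id epel-multimedia here) and RPM Fusion nonfree are
# enabled at the same time.
check_nvidia_repo_conflict() {
    # Repo id is the first column; some dnf versions mark repos with a
    # leading '*', so strip it.
    ids=$(awk '{ sub(/^\*/, "", $1); print $1 }')
    if printf '%s\n' "$ids" | grep -qx 'epel-multimedia' &&
       printf '%s\n' "$ids" | grep -qx 'rpmfusion-nonfree-updates'; then
        echo "conflict: negativo17 and RPM Fusion nonfree are both enabled"
        return 1
    fi
    echo "ok"
}
```

Usage: dnf repolist | check_nvidia_repo_conflict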

Hi Jan,

THAT is the culprit ... never, ever enable both repositories at the same time. :) I recommend uninstalling ALL
NVIDIA-related packages, then removing the negativo17 repository and clearing DNF and its cache. Afterwards,
reboot the machine and install the drivers, including nvidia-settings, from RPM Fusion. Here is what to do:

Remove all NVIDIA-related packages:

sudo dnf remove '*nvidia*'

Now remove the negativo17 repository.
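One reversible way to take negativo17 out of the picture, instead of deleting its .repo file, is to disable it with dnf config-manager (part of dnf-plugins-core). A sketch, assuming epel-multimedia (shown as "negativo17 - Multimedia" in the repolist above) is the negativo17 repo that carries the NVIDIA packages; the function name is hypothetical:

```shell
# Hypothetical helper: persistently disable negativo17's multimedia repo
# without deleting its .repo file, then list the disabled repos.
disable_negativo17() {
    sudo dnf config-manager --set-disabled epel-multimedia
    # epel-multimedia should now appear in this list.
    dnf repolist --disabled
}
```

To undo: sudo dnf config-manager --set-enabled epel-multimedia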

Clean (clear) DNF:

sudo dnf clean all

Clear the DNF cache:

sudo rm -r /var/cache/dnf

Reboot the system:

sudo reboot

Install the drivers:

sudo dnf upgrade
sudo dnf install akmod-nvidia nvidia-settings xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs
sudo dnf install nvidia-persistenced xorg-x11-drv-nvidia-cuda xorg-x11-drv-nvidia-cuda-libs

Reboot the system:

sudo reboot

Regards,
Christian

And this process is safe for sure?

Actually, a staff member enabled the NVIDIA repository for me, since he had to set up support for an external (Thunderbolt) GPU on this computer.

I enabled the RPM Fusion repository because I needed it to install some other software (I don't remember what, perhaps VLC).

Everything was fine with these two repos enabled until I tried to install the LightDM greeter.

Now, after repairing X and XFCE, I can work again, but I'd also like to repair dnf.

Anyway, I would rather have a computer I can work with (even if I cannot update or install new software) than a computer I cannot work with at all.

If I try out the procedure, you're suggesting and (after the second reboot) the computer comes up with a black screen again, I will probably commit suicide.

So I am asking again: is this procedure safe (i.e. reversible if it does not work)?

Hi Jan,

It is as safe "as safe can be"; I have done that "thousands of times" ... The drivers are just software packages like
any other software package. If you need or want to keep the negativo17 repository, you can
also disable it while installing or upgrading the drivers and re-enable it afterwards. :)
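Disabling a repository for a single transaction can be done with dnf's --disablerepo option, which only affects the command it is given to; the repo stays enabled for every later dnf command. A sketch, assuming epel-multimedia is the negativo17 repo id (as in the repolist earlier) and reusing the RPM Fusion package names from the procedure above; the function name is hypothetical:

```shell
# Hypothetical helper: install the RPM Fusion driver packages while
# negativo17 is ignored for this one dnf run only.
install_rpmfusion_nvidia() {
    sudo dnf --disablerepo=epel-multimedia install \
        akmod-nvidia nvidia-settings xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs
}
```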

Regards,
Christian

FYI Jan, the procedure is exactly what happens under the hood when you upgrade the drivers ... the old ones
get removed, and the new ones get installed. NVIDIA Settings is only a GUI application, which is not strictly
needed (from a technical point of view). By the way, whenever you're "in trouble" after having removed the
NVIDIA drivers, you can keep the open-source nouveau driver from taking over the display by adding nouveau.modeset=0 to the kernel boot parameters ... :)
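On RHEL 8, grubby can add such a parameter to every installed kernel's boot entry and remove it again later; for a single boot it can instead be appended to the kernel (linux) line after pressing e in the GRUB menu. A sketch with hypothetical function names:

```shell
# Hypothetical helpers around grubby (shipped with RHEL 8).
add_nouveau_modeset_off() {
    # Append nouveau.modeset=0 to the kernel command line of all boot entries.
    sudo grubby --update-kernel=ALL --args="nouveau.modeset=0"
}

remove_nouveau_modeset_off() {
    # Undo the change once the NVIDIA driver works again.
    sudo grubby --update-kernel=ALL --remove-args="nouveau.modeset=0"
}
```

The current kernel arguments can be checked with sudo grubby --info=ALL.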

Regards,
Christian

What an adrenaline rush... however, the GUI is back.

sudo dnf remove '*nvidia*'

This removed a lot:

dnf-plugin-nvidia, kmod-nvidia-latest-dkms, nvidia-driver, nvidia-driver-NvFBCOpenGL, nvidia-driver-cuda, nvidia-driver-cuda-libs, nvidia-driver-devel, nvidia-driver-libs, nvidia-kmod-common, nvidia-libXNVCtrl, nvidia-libXNVCtrl-devel, nvidia-persistenced, nvidia-settings, cuda, cuda-libs, egl-wayland, mesa-vulkan-drivers, opencl-filesystem, vulkan-loader

sudo dnf install akmod-nvidia nvidia-settings xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs
sudo dnf install nvidia-persistenced xorg-x11-drv-nvidia-cuda xorg-x11-drv-nvidia-cuda-libs

Not all of this worked:

e.g. dnf install akmod-nvidia gives me:

Problem: package akmod-nvidia-3:440.64-1.el8.x86_64 requires nvidia-kmod-common >= 3:440.64, but none of the providers can be installed, conflicting requests

dnf install xorg-x11-drv-nvidia gives me: No match for argument: xorg-x11-drv-nvidia

and dnf install cuda leads to: Transaction check error: file .. from install of nvidia-driver-cuda-libs-3:440.33.01-1.el8.x86_64 conflicts with file from package ..

However, I can log in and update again. THANKS A LOT! I don't know what this "akmod-stuff" is (or whether I need it), but if you could help me repair CUDA, I'd appreciate it.

Cheers

Jan

You're welcome, Jan! You may want to check out RPM Fusion Howto CUDA ... :)

Regards,
Christian

Hi Jan,

The "akmod-stuff" means that you don't have to reinstall the drivers after a kernel upgrade. :)
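To see whether akmods has actually produced a module for the running kernel: akmods rebuilds the kmod-nvidia package from the akmod-nvidia source package, typically when the akmods service runs at boot, and akmods --force triggers a rebuild on demand. A sketch; the function name is hypothetical:

```shell
# Hypothetical helper: check whether an NVIDIA kernel module exists for the
# running kernel, which is what akmods is supposed to build.
check_nvidia_kmod() {
    if modinfo -F version nvidia >/dev/null 2>&1; then
        echo "nvidia module for $(uname -r): $(modinfo -F version nvidia)"
    else
        echo "no nvidia module for $(uname -r); try: sudo akmods --force"
        return 1
    fi
}
```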

Regards,
Christian

Hi Jan,

If you want to learn more, it would be a good idea to check out RPM Fusion Howto NVIDIA ... :)

Regards,
Christian

Hi Christian,

thanks again,

I had a quick look at RPM Fusion Howto CUDA and found:

RHEL ->
sudo dnf config-manager --add-repo http://..cuda-rhel8.repo
sudo dnf clean all
sudo dnf install cuda

I tried this but I still get a transaction check error - I'll probably have a deeper look into the docs later.

I also checked out RPM Fusion Howto NVIDIA and successfully installed the "akmod-stuff".

Have a great Weekend!

You're welcome, Jan! Have a great weekend too. :)

Thanks.

Unfortunately, I've broken something again with the akmod installation.

Here's what I did:

sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf clean all
sudo dnf -y install cuda
sudo dnf install akmod-nvidia

As I told you, the CUDA installation did not work, but the akmod installation succeeded (after adding the CUDA repo and running dnf clean all).

However, after a reboot, the maximum (and only) screen resolution I can select in Settings -> Display is 1024x768 (on a 5K screen).

And when I try to run nvidia-settings, it says:

ERROR: Unable to load info from any available system

Any idea how to fix this?
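A resolution stuck at 1024x768 together with this nvidia-settings error often suggests that X is not running on the NVIDIA driver at all. A quick read-only check is which kernel GPU module is loaded; the X log (/var/log/Xorg.0.log, or ~/.local/share/xorg/Xorg.0.log when X was started by a normal user, depending on the setup) usually says which driver X tried. A sketch; the function name is hypothetical:

```shell
# Hypothetical helper: reads `lsmod` output on stdin and reports which
# kernel GPU driver is loaded (nvidia, nouveau, or neither).
gpu_driver_from_lsmod() {
    # Column 1 of lsmod output is the module name.
    awk '$1 == "nvidia" { nv = 1 }
         $1 == "nouveau" { nou = 1 }
         END { if (nv) print "nvidia"; else if (nou) print "nouveau"; else print "none" }'
}
```

Usage: lsmod | gpu_driver_from_lsmod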

Hi Jan,

Unfortunately I don't have much experience with CUDA, but I think I recall that the CUDA version has to match a
specific driver version, and most likely the drivers provided by RPM Fusion do not match the CUDA
version from the NVIDIA repository. You may be able to solve that by removing the currently installed drivers and
installing the original NVIDIA drivers. Please read the docs carefully, and check all NVIDIA-related config files ... :)
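To check that suspicion, one can compare the loaded driver version with the minimum a CUDA release needs. The repo id cuda-10-2-local-10.2.89-440.33.01 in this thread suggests that this CUDA 10.2 repo was packaged for driver 440.33.01; treat that number as an assumption and confirm the actual minimum in NVIDIA's CUDA release notes. A sketch using GNU sort -V; the function name is hypothetical:

```shell
# Hypothetical helper: succeeds if version $1 >= version $2, using GNU
# sort's version ordering (e.g. 440.82 sorts after 440.33.01).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage (the installed version is read from the nvidia kernel module):
version_at_least "$(modinfo -F version nvidia)" 440.33.01 && echo "driver is new enough"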

Regards,
Christian

Okay, CUDA is not that urgent for me. I'll read the docs and check whether the drivers/CUDA development kit from NVIDIA help me get nvcc etc. ready to use.

But right now, the reduced screen resolution (and the problems with nvidia-settings) are what really worries me. Can you help me get a reasonable display resolution again?

Do I need to go through the full dnf remove *nvidia* adventure again, or is there a quicker way to fix this?

Hi Jan,

No, there is no quicker way - please remove everything related to NVIDIA and CUDA, and then reinstall the drivers. :)

Regards,
Christian

Hi Christian,

I've done it again. Actually two times - unfortunately without success.

First try: I did it exactly as on Friday, but this time with a different result, probably due to this command:

sudo dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

This time, I could install akmod-nvidia, but I could not install nvidia-persistenced.

After the second reboot, things were like yesterday: an unusable screen resolution of 1024x768, and nvidia-settings tells me: Unable to load info from any available system

Second try: I wanted to disable the CUDA repos to restore what was already working on Friday. So I ran sudo dnf config-manager --disablerepo cuda-10-2-local-10.2.89-440.33.01 and sudo dnf config-manager --disablerepo cuda.

Then I went through the painful dnf remove *nvidia* again and reinstalled. But after the second reboot, the situation is still the same: an unusable screen resolution of 1024x768, and nvidia-settings tells me: Unable to load info from any available system

dnf repolist shows that disabling the CUDA repos did not work:

repo id                                   repo name                         status
codeready-builder-for-rhel-8-x86_64-rpms  Red Hat CodeReady Linux Builder   1,873
cuda                                      cuda                              164
cuda-10-2-local-10.2.89-440.33.01         cuda-10-2-local-10.2.89-440.33.0  74
*epel                                     Extra Packages for Enterprise Li  5,351
*epel-modular                             Extra Packages for Enterprise Li  0
remi-modular                              Remi's Modular repository for En  16
remi-safe                                 Safe Remi's RPM repository for E  2,269
rhel-8-for-x86_64-appstream-rpms          Red Hat Enterprise Linux 8 for x  8,865
rhel-8-for-x86_64-baseos-rpms             Red Hat Enterprise Linux 8 for x  3,801
rpmfusion-free-updates                    RPM Fusion for EL 8 - Free - Upd  180
rpmfusion-nonfree-updates                 RPM Fusion for EL 8 - Nonfree -   49
sublime-text                              Sublime Text - x86_64 - Stable    2
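A likely reason the CUDA repos are still enabled: for dnf, --disablerepo is a general option that only applies to the one command it is passed to, so dnf config-manager --disablerepo cuda runs config-manager with that repo ignored but writes nothing to the repo files. The persistent switch is config-manager's --set-disabled. A sketch using the repo ids from the listing above; the function name is hypothetical:

```shell
# Hypothetical helper: persistently disable the two CUDA repos (ids taken
# from the repolist above) and show what is still enabled.
disable_cuda_repos() {
    sudo dnf config-manager --set-disabled cuda cuda-10-2-local-10.2.89-440.33.01
    dnf repolist --enabled
}
```

To turn them back on later: sudo dnf config-manager --set-enabled cuda cuda-10-2-local-10.2.89-440.33.01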

What should I try next?

Hi Jan,

It is the same thing I told you in my first response ... conflicting repositories. Please remove the CUDA
repository and every single trace related to NVIDIA. After uninstalling everything, run this:

sudo updatedb
sudo locate cuda
sudo locate nvidia

Check carefully what gets returned and then delete the selected files ... it seems you have messed up quite a lot. :)

Regards,
Christian

I could not read your message, since I had no X for the last 2 hours.

I've done two more reinstallations of everything NVIDIA-related; then I disabled RPM Fusion and enabled negativo17 again. After another reinstallation and reboot, I could run XFCE again, and the display resolution could also be set to 4096x2160.

CUDA still won't install, but I'll take care of that next week.

Hi Jan,

That's great news from your side! Glad to read that you could solve the most important thing. :)

Regards,
Christian

Hi Jan,

What I didn't mention so far ... please note that Red Hat does not support XFCE - the only supported DE on RHEL 8 is GNOME. :)

Regards,
Christian

Okay, if there is no official support for XFCE, I will probably need to get used to not having a greeter when I start up this computer (actually, that's not a problem). I will need to solve the CUDA problem sometime in the future, but for now, my guiding principle is "don't touch a running system".

Well Jan, you may want to consider sticking with the default GNOME desktop environment - it's not "that bad" ... :)

Regards,
Christian

Today I had the opportunity to ask my assistant, who did the original installation, where he got the NVIDIA drivers from. He did not add a repository himself, but ran the .run installer script from the NVIDIA website. Do you know what source this script uses? If the drivers were not installed via dnf, can I be sure that dnf remove *nvidia* deletes everything that was installed by the NVIDIA installation script? If not, maybe this is the source of the conflicts when I try to install the latest CUDA development kit.
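On the second question: dnf remove only deletes files that belong to installed RPM packages, so anything the NVIDIA .run installer wrote directly to disk is left behind. One way to spot such leftovers is to ask rpm which package owns each file that locate reports; files owned by no package are candidates to review by hand, not to delete blindly. A sketch; the function name is hypothetical:

```shell
# Hypothetical helper: reads file paths on stdin (e.g. from `locate nvidia`)
# and prints the ones that exist but are not owned by any installed RPM.
unowned_files() {
    while IFS= read -r path; do
        # Skip stale entries from an outdated locate database.
        [ -e "$path" ] || continue
        # rpm -qf exits non-zero when no installed package owns the file.
        rpm -qf "$path" >/dev/null 2>&1 || printf '%s\n' "$path"
    done
}
```

Usage:
sudo updatedb
locate -i nvidia | unowned_files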

Hey Jan, that might be the main problem ... please remove the original NVIDIA drivers and start over from scratch. :)

sudo nvidia-installer --uninstall

sudo nano /etc/default/grub
GRUB_CMDLINE_LINUX="<remove-nvidia-entries>"

sudo rm /etc/X11/xorg.conf

sudo rm -r /home/<user>/.nv
sudo rm /home/<user>/.nvidia-settings-rc

sudo rm /var/log/nvidia-installer.log
sudo rm /var/log/nvidia-uninstall.log

sudo dracut -f /boot/initramfs-<kernel-version>.img <kernel-version>
sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

sudo reboot  
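The grub2-mkconfig output path above is the UEFI location on RHEL 8; on a machine booting in legacy BIOS mode the file is /boot/grub2/grub.cfg instead. A sketch that picks the path by checking for /sys/firmware/efi (the optional argument only exists so the function can be tested); the function name is hypothetical. Separately, dracut -f with no image or version argument rebuilds the initramfs for the running kernel:

```shell
# Hypothetical helper: print the grub.cfg path for RHEL 8 depending on
# whether the system booted via UEFI (which exposes /sys/firmware/efi).
grub_cfg_path() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo /boot/efi/EFI/redhat/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}
```

Usage:
sudo dracut -f
sudo grub2-mkconfig -o "$(grub_cfg_path)"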

Regards,
Christian

start over from scratch

... again, this will get me into trouble. Maybe I can try this in 2-3 weeks, but currently I won't find the hours required to fix X if the PC ends up in an unusable state again (no DE). I need that computer for my work now.

Any other suggestions for fixing the following?

sudo dnf upgrade

Updating Subscription Management repositories.

Last metadata expiration check: 0:21:11 ago on Wed 06 May 2020 11:08:11 AM CEST.

Error:

Problem 1: package nvidia-settings-3:440.64.00-1.el8.x86_64 requires nvidia-libXNVCtrl(x86-64) = 3:440.64.00-1.el8, but none of the providers can be installed

Problem 2: package qt5-qtwebengine-5.12.4-5.el8.1.x86_64 requires qt5-qtbase(x86-64) = 5.11.1, but none of the providers can be installed

Problem 3: problem with installed package qt5-qtwebengine-5.12.4-5.el8.1.x86_64

Are the three problems related to each other or can I fix problem 2 & 3 (Qt) independently from problem 1?

Concerning problem 1)

Can't I just update the system and leave the NVIDIA-stuff untouched?

--allowerasing will kill my DE again; what will --skip-broken and --nobest do?

--nobest sounds good to me ... will it interfere with my NVIDIA installation? (It works fine, except that I currently cannot run an update.)
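Neither option has to be tried blind: with --assumeno, dnf resolves the transaction, prints it and answers "no", so nothing changes. As dnf's own hint in the error says, --skip-broken skips uninstallable packages and --nobest lets it use candidates other than the newest; the preview shows what either would actually do to the NVIDIA packages. A sketch of a filter for that preview, assuming dnf's usual English transaction headers (Removing:, Downgrading:); the function name is hypothetical:

```shell
# Hypothetical helper: reads dnf's transaction preview on stdin and prints
# any lines announcing removals or downgrades. Exit status 0 means dnf
# planned to remove or downgrade something, so review before running it.
preview_has_removals() {
    grep -E '^(Removing|Downgrading)'
}
```

Usage: sudo dnf upgrade --nobest --assumeno 2>&1 | preview_has_removals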

Hi Jan,

Just execute sudo dnf upgrade --exclude=nvidia* --exclude=qt5* ... :)

Regards,
Christian

[user@computer ~]$ sudo dnf upgrade --exclude=nvidia* --exclude=qt5* 
[sudo] password for user: 
Updating Subscription Management repositories.
Last metadata expiration check: 0:16:44 ago on Sun 07 Jun 2020 03:34:11 PM CEST.
Error: 
 Problem 1: package qgnomeplatform-0.4-3.el8.x86_64 requires libQt5Gui.so.5(Qt_5.12.5_PRIVATE_API)(64bit), but none of the providers can be installed
  - cannot install the best update candidate for package qgnomeplatform-0.4-2.el8.x86_64
  - package qt5-qtbase-gui-5.12.5-4.el8.x86_64 is excluded
 Problem 2: package kmod-nvidia-latest-dkms-3:450.36.06-1.el8.x86_64 requires nvidia-kmod-common = 3:450.36.06, but none of the providers can be installed
  - cannot install the best update candidate for package kmod-nvidia-latest-dkms-3:440.64.00-1.el8.x86_64
  - package nvidia-kmod-common-3:450.36.06-1.el8.noarch is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
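In this run, excluding nvidia* and qt5* left two packages whose updates need the excluded ones: qgnomeplatform (its update needs a newer qt5-qtbase-gui) and kmod-nvidia-latest-dkms (it needs nvidia-kmod-common 450.36.06, and its own name does not match nvidia*). One possible next step, untested here, is to exclude those as well and preview with --assumeno first. A sketch; the function name is hypothetical and the extra package names are taken from the error above:

```shell
# Hypothetical wrapper: upgrade everything except the NVIDIA/Qt stacks and
# the two packages whose updates depend on them. Extra dnf arguments
# (e.g. --assumeno for a dry run) are passed through.
upgrade_without_nvidia_qt() {
    sudo dnf upgrade \
        --exclude='nvidia*' --exclude='kmod-nvidia*' \
        --exclude='qt5*' --exclude='qgnomeplatform' \
        "$@"
}
```

Usage: upgrade_without_nvidia_qt --assumeno to preview, then upgrade_without_nvidia_qt to apply.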