After installing NVIDIA and CUDA drivers on RHEL 7.6, boot process is very slow

Hello Everyone,

It appears something got broken in the process of installing the NVIDIA drivers on RHEL 7.6, and I'm wondering what I might be able to do to improve or investigate this. The last time I installed the drivers, on CentOS 7.5, they seemed fine, but I switched to RHEL 7.6 yesterday.

Responses

If I install the NVIDIA driver as root, I see this error: "An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries."

It doesn't matter whether I choose to install and overwrite, though: if that had succeeded, I would expect the message to disappear when I run the installer again, but it always shows up.

Hi Andrew,

You need to install libglvnd-devel before you install the NVIDIA drivers ... :)
Running sudo yum install libglvnd-devel first prevents the installer from overwriting the existing libglvnd libraries.

Regards,
Christian

Thanks Christian, I'll give that a shot!

You're welcome, Andrew! :) It should work, I've done that several times - please give us feedback.

Regards,
Christian

This round (after reimaging to the pre-CUDA install state), I was able to build and test the CUDA examples after installation. Good sign. However, when I rebooted and tried to log in, it displayed a console for a moment and then asked me to log in again.

I followed these steps, with alterations: https://access.redhat.com/solutions/1453633

I ran these commands instead of the wget instructions to get the EPEL release, and also installed libglvnd-devel so that the NVIDIA driver would not complain about it.

sudo yum install epel-release
sudo yum install libglvnd-devel

When installing the CUDA driver, I did not allow it to replace my X config. I also said yes to creating the symlinks. I installed the NVIDIA driver with CUDA.

After installing the CUDA driver, I ran nvidia-smi straight away to initialise the device files at /dev/nvidia*.

Also, on the instruction point below, it's not specified that the version placeholder should be replaced with the actual version (e.g. 10). Is it possible to omit the version altogether, though, since the CUDA install creates symlinks from /usr/local/cuda/ to /usr/local/cuda-10/? That's what I did:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

I added these changes to the /etc/environment file to make them permanent.

I tried another iteration with some success. This time I did allow the CUDA installer to update the X config. I also decided not to touch any environment variables, since I suspected they were causing problems, and I can now boot with the NVIDIA driver functioning. So the next thing I have to figure out is how to set the environment permanently and correctly.

Thanks again for any help.
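On setting the environment permanently: /etc/environment is read by pam_env as plain KEY=VALUE pairs and, as far as I know, does not expand references such as $PATH, so the export lines above can leave PATH holding a literal "$PATH" and break the login session. A common alternative is a small script under /etc/profile.d/, which login shells source. The following is a minimal sketch, not a tested recipe for this machine: the file name cuda.sh, the CUDA_HOME override and the helper prepend_path are made up for illustration, and it assumes the installer's /usr/local/cuda symlink exists.

```shell
# Sketch of a hypothetical /etc/profile.d/cuda.sh; login shells source files there.
# CUDA_HOME is an optional, illustrative override; the default assumes the
# /usr/local/cuda symlink created by the CUDA installer.
cuda_home="${CUDA_HOME:-/usr/local/cuda}"

# prepend_path VAR DIR: put DIR at the front of the colon-separated variable VAR,
# unless DIR is already listed there.
prepend_path() {
    _pp_var=$1
    _pp_dir=$2
    eval "_pp_cur=\${$_pp_var:-}"
    case ":$_pp_cur:" in
        *":$_pp_dir:"*) ;;   # already present, nothing to do
        *) eval "export $_pp_var=\"\$_pp_dir\${_pp_cur:+:\$_pp_cur}\"" ;;
    esac
}

# Only touch the environment when the toolkit is actually installed.
if [ -d "$cuda_home/bin" ]; then
    prepend_path PATH "$cuda_home/bin"
    prepend_path LD_LIBRARY_PATH "$cuda_home/lib64"
fi
```

Because the script checks for the directory first and never adds a duplicate entry, a missing or removed CUDA install should leave PATH untouched instead of breaking logins. For the libraries, a file under /etc/ld.so.conf.d/ followed by ldconfig is another option.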

Hi Andrew,

Good to read that you've got the drivers running properly ... which was the most important part.
Unfortunately I don't have enough experience with CUDA to give you the best possible instructions.
But I think you will get that figured out on your own - right by testing and trying things out ... :)

Regards,
Christian

Cheers Christian, I'll get there :)

I don't work for RHT support but thought I'd post my procedure to install CUDA and NVIDIA drivers on a RHEL 7.6 server (minimal install).

# yum -y update

# reboot 

# yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) pciutils

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum -y install https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm

# yum clean all

# yum -y install cuda

Thanks for sharing your approach, I'll keep that in mind for next time. At what stage is nouveau disabled there?

I suspect the nouveau kernel module is unloaded when yum installs cuda and the nvidia drivers. At least I did not have to manually disable or remove nouveau.
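One way to check rather than guess is to look at the loaded modules after the reboot. A small sketch, assuming a Linux /proc/modules; the helper name module_loaded is made up for illustration:

```shell
# module_loaded NAME [FILE]: succeed if NAME appears in the first column of
# FILE (default /proc/modules, one loaded kernel module per line).
module_loaded() {
    modules_file=${2:-/proc/modules}
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$modules_file" 2>/dev/null
}

# Report the state of the two drivers discussed in this thread.
for m in nouveau nvidia; do
    if module_loaded "$m"; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done
```

The same information is available interactively with lsmod | grep nouveau.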

In RHEL 7.5, after installing the EPEL repo, I can't install libglvnd-devel for some reason.

[root@workstation tmp]# sudo yum install libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
No package libglvnd-devel available.
Error: Nothing to do
[root@workstation tmp]#

Hi Andrew,

The package is available in the rhel-7-server-rpms repository. You may need to enable the Extended Update Support repository, rhel-7-server-eus-rpms, to gain access to the libglvnd-devel package for RHEL 7.5 - please check whether that works. :)

Regards,
Christian

Thanks Christian. Does this look correct to you? It appears not to be available.

[user@workstation ~]$ sudo yum install yum-utils -y
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Package yum-utils-1.1.31-46.el7_5.noarch already installed and latest version
Nothing to do
[user@workstation ~]$ sudo yum-config-manager --enable rhel-7-server-eus-rpms
Loaded plugins: langpacks, product-id, subscription-manager
[user@workstation ~]$ sudo yum install libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
No package libglvnd-devel available.
Error: Nothing to do
[user@workstation ~]$

Hi Andrew, are you using the Workstation edition? If so, please check whether it is available there too: Red Hat Package Browser ... On my server system the libglvnd-devel package is available. :)

$ sudo yum list libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Available Packages
libglvnd-devel.i686          1:1.0.1-0.8.git5baa1e5.el7         rhel-7-server-rpms
libglvnd-devel.x86_64        1:1.0.1-0.8.git5baa1e5.el7         rhel-7-server-rpms  

Regards,
Christian
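To see which repository (if any) offers the package on a given system, the repository column of the yum list output above can be pulled out with a few lines of shell. This is only a sketch: repos_for_package is a hypothetical helper, and it assumes the three-column, one-package-per-line layout shown in the listing above.

```shell
# repos_for_package NAME: read `yum list` style output on stdin and print the
# repository (last column) of every line whose first column is NAME.<arch>.
repos_for_package() {
    awk -v pkg="$1" '
        NF >= 3 {
            name = $1
            sub(/\.[^.]+$/, "", name)   # strip the trailing .arch suffix
            if (name == pkg) print $NF
        }
    ' | sort -u
}

# Example:
#   yum list libglvnd-devel 2>/dev/null | repos_for_package libglvnd-devel
```

An empty result means no enabled repository carries the package; subscription-manager repos --list-enabled shows which repositories are currently enabled.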

I'm using RHEL Workstation. A while back I was on Server because I downloaded the wrong ISO, but I'm on Workstation now.

[user@workstation ~]$ rpm -q libglvnd-devel
package libglvnd-devel is not installed
[user@workstation ~]$ sudo yum list libglvnd-devel
[sudo] password for user:
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Error: No matching Packages to list

Did you check with the Package Browser? The package might not be available for Workstation.
Workstation is only a subset of Server - can you attach a valid Server subscription, Andrew? :)

Regards,
Christian

Hi Andrew,

Did you install RHEL server 7.5 or RHEL workstation 7.5?

In the latter case you should not activate any server repositories.

Still the installation should work.

Maybe the rpm is already installed, what does the following command show?

rpm -q libglvnd-devel

Regards,

Jan Gerrit Kootstra

Hey Jan - I get this on RHEL Workstation 7.5:
[user@workstation ~]$ rpm -q libglvnd-devel
package libglvnd-devel is not installed

I had to remove the NVIDIA card completely because it would periodically freeze up. I originally installed it on a 7.5 workstation. Have you had any luck since installing the libglvnd package? I might give it another shot if that's the case.

My problem on RHEL 7.6 is that even though I blacklisted nouveau both in GRUB and in the blacklist conf file, it still boots into graphical mode instead of text (multi-user.target) mode. I will try the guide that Andrew got working.

Hi Jatin,

Execute sudo systemctl set-default multi-user.target, then reboot, and you reach runlevel 3 ("text mode") :)

Regards,
Christian

Now, after following the CUDA environment page, I am stuck at GRUB: when I boot, it's a black screen :( Please make the RPM Fusion driver work. I know it doesn't work with RHEL, only with Fedora, but please :(

Hi Jatin,

You may want to check out the original NVIDIA drivers ... probably they work better than the RPM Fusion drivers. :)

Regards,
Christian

Being the RPM Fusion project coordinator, I'm not very pleased by this answer.

RPM Fusion is a third-party community repository for RHEL and derivatives. Its purpose is to provide packages that make things easier to install than if you had to handle them by hand. If you need something, please report it to bugzilla.rpmfusion.org

Right now I can confirm that neither the RPM Fusion package nor the NVIDIA CUDA repository package enables the NVIDIA driver on RHEL 7.6 (or derivatives). This is a regression from earlier RHEL releases. On Fedora, with the same xorg-server, the driver operates fine.

Hi Nicolas,

First things first : I very much appreciate the work being done by the RPM Fusion team - thank
you for that. Secondly, no reason to be 'not very pleased': I generally recommend installing the
RPM Fusion drivers in the first place, as you can see in many posts from me in other discussions. :)

You might have missed the word probably in my response to Jatin? I agree with you that the
drivers from RPM Fusion are more convenient to install and that installing the original NVIDIA
driver requires some advanced knowledge; in particular, the correct configuration of the
/etc/X11/xorg.conf file is quite important to get the NVIDIA drivers running properly without problems.

But there are some hardware related cases where the original NVIDIA drivers work better than
pre-packaged drivers from RPM Fusion or other repositories, such as negativo17 for example.
I have long-time experience with supporting users of Linux systems to get the graphics drivers
installed and running on debian and Red Hat based systems - it's the reason for my suggestion.

Regards,
Christian

Thanks for your clarifications.

The RPM Fusion packaged driver is the same binary as distributed by NVIDIA, so there is feature parity. But indeed, for example, there is an option in the NVIDIA installer that allows you to skip installing the NVIDIA libGL, so that CUDA users on Optimus can run CUDA applications while still using the Intel GPU for the desktop. We don't have such an option, but the same thing can be achieved easily with a small post-configuration file when using libglvnd, and at the same time there is a reasonable working default.

Anyway, if you think anything is missing from the documentation, or there is something you would like us to support, please file an RFE on bugzilla.rf.org

Thanks for your understanding.

No Nicolas, all is good - nothing to complain about ... I meant what I said: the RPM Fusion team does an excellent job there. The only thing I'd like to see would be a dkms version of the NVIDIA drivers as an additional offering (an alternative installation method to akmod for "surviving" kernel upgrades).

Cheers :)
Christian

Hi Christian,

In my case I use a laptop with an NVIDIA Optimus GeForce GT 750M. I can't initialize the driver on kernel 3.10.0-957.5.1.el7.x86_64 (with bumblebee). I need to install an unsupported kernel from ELRepo. Is there any way to use my graphics card on a supported kernel?

Thanks. Igor Bajo. PS.: Sorry about English, I'm from Brazil, it's not my first language.

Hi Igor,

First of all : Absolutely no reason to apologize - you've expressed yourself fine ! :) The main reason
why you are having trouble might be the presence of bumblebee. The project hasn't been updated
for years and it is also not the recommended way to use the graphics adapter - I recommend
removing bumblebee, uninstalling all NVIDIA software, removing all configuration files and reinstalling
the drivers.

Regards,
Christian

Christian,

I will try. Will I need to remove the Intel driver? I ask because there is no option in the BIOS to disable only the integrated Intel graphics and leave the dedicated NVIDIA card working.

Thanks, Igor Bajo.

No Igor, you don't have to remove the Intel driver and you should not do it. The Intel driver is
needed to display what the NVIDIA drivers render. But what you need to do is to blacklist the
nouveau driver: add the following parameters to the GRUB_CMDLINE_LINUX line in the
/etc/default/grub file, keeping any parameters that are already there:

GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"

Cheers :)
Christian
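Since /etc/default/grub usually already has a GRUB_CMDLINE_LINUX line with other parameters, the two blacklist parameters are best appended to it rather than replacing the line. Here is a hedged sketch of doing that without adding duplicates. append_grub_arg is a made-up helper, it assumes a single double-quoted GRUB_CMDLINE_LINUX line, and it is worth trying on a copy of the file first.

```shell
# append_grub_arg FILE ARG: add ARG to the GRUB_CMDLINE_LINUX="..." line of FILE
# unless it is already there. Assumes the existing value contains no '|', '&'
# or backslash characters, which would confuse the sed replacement.
append_grub_arg() {
    _ga_file=$1
    _ga_arg=$2
    # Current value between the double quotes (empty if the line is missing).
    _ga_cur=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$_ga_file")
    case " $_ga_cur " in
        *" $_ga_arg "*) return 0 ;;   # already present
    esac
    _ga_new="${_ga_cur:+$_ga_cur }$_ga_arg"
    _ga_tmp=$(mktemp) || return 1
    # Write through a temporary file, then copy back so the original
    # file keeps its owner and permissions.
    sed "s|^GRUB_CMDLINE_LINUX=\".*\"\$|GRUB_CMDLINE_LINUX=\"$_ga_new\"|" "$_ga_file" > "$_ga_tmp" &&
        cat "$_ga_tmp" > "$_ga_file"
    rm -f "$_ga_tmp"
}

# Example (as root, after backing up the file):
#   append_grub_arg /etc/default/grub rd.driver.blacklist=nouveau
#   append_grub_arg /etc/default/grub modprobe.blacklist=nouveau
```

The change only takes effect after the GRUB configuration has been regenerated and the machine rebooted.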

Christian,

OK, I will do it.

Thanks, Igor Bajo.

You're welcome, Igor ! :)

Please do not forget to update the GRUB configuration afterwards :
sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Regards,
Christian
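The path above is the one for UEFI systems; on a machine booted in legacy BIOS mode, RHEL 7 keeps the file at /boot/grub2/grub.cfg instead. A small sketch for picking the right one, based on whether the kernel exposes /sys/firmware/efi (the helper name grub_cfg_path and its optional argument are only for illustration):

```shell
# grub_cfg_path [SYSFS_ROOT]: print the grub.cfg location for this boot mode.
# UEFI boots expose <sysfs>/firmware/efi; legacy BIOS boots do not.
grub_cfg_path() {
    sysfs_root=${1:-/sys}
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo /boot/efi/EFI/redhat/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

echo "grub.cfg for this system: $(grub_cfg_path)"

# To regenerate the configuration (as root):
#   grub2-mkconfig -o "$(grub_cfg_path)"
```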

Hi guys, I am in the same loop. I am having trouble installing the NVIDIA drivers. I get the following error: unable to load 'nvidia-drm' kernel module. What does it mean?

Need help!

Thanks everyone for the input, I'm likely to help some with NVIDIA Cuda.

I think it would be great if Red Hat recognised the need for GPUs to work out of the box. It's true for any operating system. I've heard the process is easier with Mint, and I'll probably try it out when my subscription ends for this year.

I know NVIDIA doesn't make it easy; still, a Red Hat-ordained, Ansible-based installation could eat this pain up for breakfast.

It also would mean other graphics applications can have an assumed base configuration.

Hi Andrew,

Most NVIDIA GPUs are supported out-of-the-box - Red Hat ships with the open source nouveau
drivers. The original NVIDIA drivers are proprietary software and are only needed when you want
to do some "heavy gaming" or, when you need to perform tasks demanding high performance. :)

Regards,
Christian

When I said that, I did recognise what you are saying.

But, when you buy any graphics card like a 1080 Ti, you don't expect to run it at a fraction of its performance, and you probably aren't doing it for gaming either if you are running RHEL. It's not acceptable to have to put aside a day of downtime to run it at full performance. Installing these drivers with multiple reboots is not a quick affair, and if you make a mistake, a beginner might need to reinstall again. Installation should be smooth.

Mind you, with my 1080 Ti I do need to enable custom options during installation of RHEL anyway, or I can't even see a display to install the OS.

These problems make RHEL a major pain. Despite the good support, I'm not likely to want to continue with the product, or recommend it to other professionals in the visual effects industry. I sincerely hope people up there in charge of RHEL are listening. There is much room for improvement to make this a better experience for people. I want to like it, but I cannot recommend it right now.

Hi Andrew,

As much as I understand your frustration ... please don't blame Red Hat for the situation.
NVIDIA provides next to no information about the firmware, so the developers have
to "dig in the mud" and have to build the open source drivers by trial and error research.

AMD on the other hand shows what's possible with proper collaboration. In the end, the
most important thing is that it is possible to get the proprietary NVIDIA drivers working.

The only downside (and here I agree with you) is that it needs advanced skills. But RHEL
not being a good choice generally ? Well, I don't know - me personally, I have never seen
such a reliable and stable system before and that makes it absolutely recommendable. :)

Regards,
Christian

NVIDIA doesn't make it easy, I've heard. But if best practice does indeed exist in the form of documentation, it could be extended into an implementation, in the form of something like an Ansible installer. This would validate the instructions by going one step further, and ease the burden.

Yes Andrew, but the problem is that each machine is different, one with Optimus, one without,
the other one with a modern adapter (like yours), the next with a legacy adapter and so on.

The easiest approach would be to add the RPM Fusion repository and install the drivers by
running a simple yum command, just as we do with other apps ... and: this option exists! :)

Regards,
Christian