After installing NVIDIA and CUDA drivers on RHEL 7.6, the boot process is very slow

Hello Everyone,

It appears something got broken in the process of installing the NVIDIA drivers on RHEL 7.6, and I'm wondering what I can do to investigate or improve this. The last time I installed the drivers, on CentOS 7.5, they seemed fine, but I switched to RHEL 7.6 yesterday.

Responses

If I install the NVIDIA driver as root I see this message: "An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries."

It doesn't matter if I choose to install and overwrite, though: if that had succeeded, I would expect the message to disappear when I run the installer again, but it always shows up.

Hi Andrew,

You need to install libglvnd-devel before you install the NVIDIA drivers ... :)
Running sudo yum install libglvnd-devel first prevents the installer from overwriting the existing libraries.

Regards,
Christian
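
A small pre-flight check along these lines could confirm that libglvnd-devel is present before the NVIDIA installer is started. This is only a sketch: the function names have_libglvnd_devel and preflight are made up for illustration, and it assumes an RPM-based system where rpm -q is available.

```shell
# Succeeds only when the libglvnd-devel package is installed (per the RPM database).
have_libglvnd_devel() {
    rpm -q libglvnd-devel >/dev/null 2>&1
}

# Prints a hint and fails if the package is missing, so a wrapper script can stop early.
preflight() {
    if have_libglvnd_devel; then
        echo "libglvnd-devel is installed; the NVIDIA installer can be started"
    else
        echo "libglvnd-devel is missing; run: sudo yum install libglvnd-devel" >&2
        return 1
    fi
}
```

Calling preflight before launching the installer makes the ordering explicit; it does not install anything itself.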

Thanks Christian, I'll give that a shot!

You're welcome, Andrew! :) It should work, I've done that several times - please let us know how it goes.

Regards,
Christian

This round (after reimaging to the pre-CUDA state), I was able to build and test the CUDA examples after installation. Good sign. However, when I reboot and try to log in, it displays a console for a moment and then asks me to log in again.

I followed these steps, with alterations- https://access.redhat.com/solutions/1453633

I ran these commands instead of the wget instructions to get the EPEL release, and also installed libglvnd-devel so that the NVIDIA driver would not complain about it.

sudo yum install epel-release
sudo yum install libglvnd-devel

When installing the CUDA driver, I did not allow it to replace my X config. I also said yes to creating the symlinks. I installed the NVIDIA driver with CUDA.

After installing the CUDA driver, I ran nvidia-smi straight away to initialise the files that exist at /dev/nvidia*.

Also, on the instruction point below, it's not specified that the version placeholder should be replaced with the actual version (e.g. 10). Is it possible to omit the version altogether, though, since the CUDA install creates symlinks from /usr/local/cuda/ to /usr/local/cuda-10/? That's what I did:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

I added these changes to the /etc/environment file to make them permanent.

I tried another iteration with some success: this time I did allow the CUDA installer to update the X config. I also decided not to touch any environment variables, since I suspected they were causing problems, and I can now boot with the NVIDIA driver functioning. So the next thing I have to figure out is how to set the environment permanently and correctly.

Thanks again for any help.
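
On the question of setting the environment permanently: one possible cause of the earlier login trouble is that /etc/environment is, as far as I know, read by pam_env as plain KEY=VALUE pairs without shell expansion, so a value containing $PATH would be stored literally. A common alternative is a script under /etc/profile.d/, which login shells source. The sketch below assumes a hypothetical file name such as /etc/profile.d/cuda.sh and assumes /usr/local/cuda is the symlink created by the installer; cuda_root is an illustrative variable name.

```shell
# Hypothetical contents of /etc/profile.d/cuda.sh (file name is an assumption).
# Assumes /usr/local/cuda is the symlink created by the CUDA installer.
cuda_root=/usr/local/cuda

# Prepend the CUDA bin directory only if it is not already on PATH.
case ":${PATH}:" in
    *":${cuda_root}/bin:"*) ;;
    *) PATH="${cuda_root}/bin:${PATH}" ;;
esac

# Same for the library path; avoid a trailing ':' when the variable was empty.
case ":${LD_LIBRARY_PATH:-}:" in
    *":${cuda_root}/lib64:"*) ;;
    *) LD_LIBRARY_PATH="${cuda_root}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac

export PATH LD_LIBRARY_PATH
```

Because the script is sourced by the shell, $PATH is expanded normally, and the guards keep the variables from growing when nested login shells source the file again.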

Hi Andrew,

Good to read that you've got the drivers running properly ... which was the most important part.
Unfortunately I don't have enough experience with CUDA to give you the best possible instructions.
But I think you will get that figured out on your own by testing and trying things out ... :)

Regards,
Christian

Cheers Christian, I'll get there :)

I don't work for RHT support but thought I'd post my procedure to install CUDA and the NVIDIA drivers on a RHEL 7.6 server (minimal install).

# yum -y update

# reboot 

# yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) pciutils

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum -y install https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm

# yum clean all

# yum -y install cuda
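
The reboot after the update in the procedure above is presumably there so that uname -r reports the updated kernel before kernel-devel is installed, since the NVIDIA kernel module is built against the headers of the running kernel. A check like the following could verify that the two match. It is only a sketch; kernel_devel_matches is an illustrative name, not part of any package.

```shell
# Succeeds when the kernel-devel package for the currently running kernel is installed.
kernel_devel_matches() {
    running=$(uname -r) || return 1
    if rpm -q "kernel-devel-${running}" >/dev/null 2>&1; then
        echo "kernel-devel matches running kernel ${running}"
    else
        echo "no kernel-devel for running kernel ${running}" >&2
        return 1
    fi
}
```

Running kernel_devel_matches just before yum -y install cuda would flag the case where a newer kernel was installed but not yet booted.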

Thanks for sharing your approach, I'll keep that in mind for next time. At what stage is nouveau disabled there?

I suspect the nouveau kernel module is unloaded when yum installs cuda and the nvidia drivers. At least I did not have to manually disable or remove nouveau.
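
Rather than relying on a suspicion, the state of the nouveau module can be checked after the install and a reboot. The helper below is a sketch: module_loaded is a made-up name, and it simply looks for an exact module name in the first column of /proc/modules (the optional second argument is only there so the function can be pointed at another file).

```shell
# module_loaded NAME [FILE]: succeeds if NAME is listed as a loaded kernel module.
# FILE defaults to /proc/modules; the first column of each line is the module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use (not executed here):
#   module_loaded nouveau && echo "nouveau is still loaded"
#   module_loaded nvidia  && echo "nvidia module is loaded"
```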

In RHEL 7.5, after installing the EPEL repo, I can't install libglvnd-devel for some reason.

[root@workstation tmp]# sudo yum install libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
No package libglvnd-devel available.
Error: Nothing to do
[root@workstation tmp]#

Hi Andrew,

The package is available in the rhel-7-server-rpms repository. You may need to enable the extended update support repository, rhel-7-server-eus-rpms, to gain access to the libglvnd-devel package for RHEL 7.5 - please check whether that works. :)

Regards,
Christian

Thanks Christian. Does this look correct to you? The package appears not to be available:

[user@workstation ~]$ sudo yum install yum-utils -y
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Package yum-utils-1.1.31-46.el7_5.noarch already installed and latest version
Nothing to do
[user@workstation ~]$ sudo yum-config-manager --enable rhel-7-server-eus-rpms
Loaded plugins: langpacks, product-id, subscription-manager
[user@workstation ~]$ sudo yum install libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
No package libglvnd-devel available.
Error: Nothing to do
[user@workstation ~]$

Hi Andrew, are you using the Workstation edition? If so, please check whether the package is available there too, using the Red Hat Package Browser ... on my server system the libglvnd-devel package is available. :)

$ sudo yum list libglvnd-devel
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Available Packages
libglvnd-devel.i686          1:1.0.1-0.8.git5baa1e5.el7         rhel-7-server-rpms
libglvnd-devel.x86_64        1:1.0.1-0.8.git5baa1e5.el7         rhel-7-server-rpms  

Regards,
Christian

I'm using RHEL Workstation. A while back I was on Server because I downloaded the wrong ISO, but I'm on Workstation now.

[user@workstation ~]$ rpm -q libglvnd-devel
package libglvnd-devel is not installed
[user@workstation ~]$ sudo yum list libglvnd-devel
[sudo] password for user:
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Error: No matching Packages to list

Did you check with the Package Browser? The package might not be available for Workstation.
Workstation is only a subset of Server - can you attach a valid Server subscription, Andrew? :)

Regards,
Christian

Hi Andrew,

Did you install RHEL server 7.5 or RHEL workstation 7.5?

In the latter case you should not activate any server repositories.

Still the installation should work.

Maybe the rpm is already installed, what does the following command show?

rpm -q libglvnd-devel

Regards,

Jan Gerrit Kootstra

Hey Jan - I get this on RHEL Workstation 7.5:

[user@workstation ~]$ rpm -q libglvnd-devel
package libglvnd-devel is not installed

I had to remove the NVIDIA card completely because it would periodically freeze up. I originally installed it on a 7.5 workstation. Have you had any luck since installing the libglvnd package? I might give it another shot if that's the case.

My problem on RHEL 7.6 is that even though I blacklisted nouveau both in GRUB and in the blacklist conf file, it still boots into graphical mode instead of text (multi-user.target) mode. I will try the guide that Andrew got working.

Hi Jatin,

Execute sudo systemctl set-default multi-user.target - reboot and you reach run level 3 ("text mode") :)

Regards,
Christian
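
The command above can be wrapped so that the default target is only changed when needed. This is a sketch rather than a recommended procedure; ensure_text_boot is an illustrative name, and it relies only on systemctl get-default and systemctl set-default.

```shell
# Switches the default boot target to multi-user.target unless it is already set.
# Needs root privileges for the set-default step.
ensure_text_boot() {
    current=$(systemctl get-default) || return 1
    if [ "$current" = "multi-user.target" ]; then
        echo "default target is already multi-user.target"
    else
        echo "changing default target from ${current} to multi-user.target"
        systemctl set-default multi-user.target
    fi
}
```

The change takes effect on the next boot; systemctl set-default graphical.target reverses it once the driver is working.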

Now, after following the CUDA environment page, I am stuck at GRUB: when I boot, I get a black screen :( Please make the RPM Fusion driver work. I know it only works with Fedora and not with RHEL, but please :(

Hi Jatin,

You may want to check out the original NVIDIA drivers ... probably they work better than the RPM Fusion drivers. :)

Regards,
Christian

Being the RPM Fusion project coordinator, I'm not very pleased by this answer.

RPM Fusion is a third-party community repository for RHEL and derivatives. Its purpose is to provide packages that make things easier to install than if you had to handle them by hand. If you need something, please report it at bugzilla.rpmfusion.org

Right now I can confirm that neither the RPM Fusion nor the NVIDIA CUDA repository package enables the NVIDIA driver on RHEL 7.6 (or derivatives). This is a regression from earlier RHEL releases. On Fedora with the same xorg-server, the driver operates fine.

Hi Nicolas,

First things first, I very much appreciate the work being done by the RPM Fusion team, thanks a
lot for that. Secondly, there is no reason to be "not very pleased" - I generally recommend installing
the RPM Fusion drivers first, as you can see in many of my posts in other discussions. :)

You might have missed the word "probably" in my response to Jatin? I agree with you that the
drivers from RPM Fusion are more convenient to install and that installing the original NVIDIA
driver requires some advanced knowledge; in particular, the correct configuration of the
/etc/X11/xorg.conf file is quite important to get the NVIDIA drivers running properly.

But there are some hardware-related cases where the original NVIDIA drivers work better than
pre-packaged drivers from RPM Fusion or other repositories, such as negativo17 for example.
I have long experience helping users of Linux systems get the graphics drivers installed
and running on Debian and Red Hat based systems - that is the reason for my suggestion.

Regards,
Christian

Thx for your clarifications.

The RPM Fusion packaged driver is the same binary as distributed by NVIDIA, so it has feature parity. But indeed, for example, there is an option in the NVIDIA installer that allows not installing the NVIDIA libGL, so CUDA users on Optimus can run CUDA applications while still using the Intel GPU for the desktop. We don't have such an option, but the same can be achieved easily with a small post-configuration file when using libglvnd, and at the same time there is a reasonable working default.

Anyway, if you think anything is missing from the documentation, or there is something you would like us to support, please file an RFE on bugzilla.rf.org

Thx for your understanding.

No Nicolas, all is good - nothing to complain about ... I meant what I said: the RPM Fusion team does an excellent job there. The only thing I'd like to see would be a dkms version of the NVIDIA drivers as an additional offering (an alternative installation method to akmod for "surviving" kernel upgrades).

Cheers :)
Christian