After installing NVIDIA and CUDA drivers on RHEL 7.6, the boot process is very slow

Hello Everyone,

It appears something broke in the process of installing the NVIDIA drivers on RHEL 7.6, and I'm wondering what I can do to investigate or improve this. The last time I installed the drivers, on CentOS 7.5, they seemed fine, but I switched to RHEL 7.6 yesterday.

Responses

If I install the NVIDIA driver as root, I see this error: "An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries."

It doesn't matter whether I choose to install and overwrite, though. If the overwrite had succeeded, I would expect the message to disappear when I run the installer again, but it always shows up.

Hi Andrew,

You need to install libglvnd-devel before you install the NVIDIA drivers ... :)
Running sudo yum install libglvnd-devel first prevents the installer from overwriting the existing libglvnd libraries.

Regards,
Christian
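
For anyone who wants to check this before starting the NVIDIA installer, here is a minimal sketch. The function name missing_packages is my own and not part of any tool; it only assumes that rpm -q NAME exits non-zero when NAME is not installed.

```shell
#!/bin/sh
# Print the names of the given packages that are NOT installed, one per line.
# Relies on `rpm -q NAME` returning a non-zero status for a missing package.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Example use before running the NVIDIA installer:
#   missing=$(missing_packages libglvnd-devel)
#   [ -z "$missing" ] || sudo yum -y install $missing
```

If the function prints nothing, libglvnd-devel is already present and the installer should not need to offer its own copy of libglvnd.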

Thanks Christian, I'll give that a shot!

You're welcome, Andrew! :) It should work, I've done that several times. Please give us some feedback.

Regards,
Christian

This round (after reimaging to the state before the CUDA install), I was able to build and test the CUDA examples after installation. Good sign. However, when I reboot and log in, it displays a console for a moment and then asks me to log in again.

I followed these steps, with alterations: https://access.redhat.com/solutions/1453633

I ran these commands instead of the wget instructions to get the EPEL release, and also installed libglvnd-devel so the NVIDIA driver would not complain about it.

sudo yum install epel-release
sudo yum install libglvnd-devel

When installing the CUDA driver, I did not allow it to replace my X config. I also said yes to creating the symlinks. I installed the NVIDIA driver together with CUDA.

After installing the CUDA driver, I ran nvidia-smi straight away to initialise the device files at /dev/nvidia*.

Also, on the instruction step below, it's not specified that the version in the path should be replaced with the actual version (e.g. 10). Is it possible to omit the version altogether, though, since the CUDA install creates symlinks from /usr/local/cuda/ to /usr/local/cuda-10/? That's what I did:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
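
Whether the unversioned path is safe to use depends on the symlink actually being there, which can be checked directly. A small sketch, assuming the installer created /usr/local/cuda as a symlink to the versioned directory; cuda_target is a made-up helper name.

```shell
#!/bin/sh
# Print the directory a CUDA symlink resolves to, or fail if the path is absent.
# $1 = path to check; defaults to /usr/local/cuda (the unversioned symlink).
cuda_target() {
    link=${1:-/usr/local/cuda}
    [ -e "$link" ] || return 1
    readlink -f "$link"
}

# Example use:
#   cuda_target || echo "no /usr/local/cuda symlink found" >&2
```

If this prints the versioned directory, paths built on /usr/local/cuda point at the same files as the versioned ones.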

I added these changes to the /etc/environment file to make them permanent.

I tried another iteration with some success. This time I did allow the CUDA installer to update the X config. I also decided not to touch any environment variables, since I suspected they were causing problems, and I can now boot with the NVIDIA driver functioning. So the next thing I have to figure out is how to set the environment permanently and correctly.

Thanks again for any help.
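
One possible approach, offered as a sketch rather than a confirmed fix: as far as I know, /etc/environment is read by pam_env as plain KEY=VALUE lines and not as a shell script, so $PATH written there is not expanded. That could explain the login loop, though it was not verified in this thread. A script under /etc/profile.d/ is sourced by login shells instead. The file name cuda.sh and the helper prepend_once are assumptions of mine.

```shell
# Hypothetical /etc/profile.d/cuda.sh (the file name is an assumption)

# Print list $1 with directory $2 put first, unless $2 is already in the list.
prepend_once() {
    case ":$1:" in
        *":$2:"*) printf '%s' "$1" ;;           # already present: unchanged
        *)        printf '%s' "$2${1:+:$1}" ;;  # prepend, no stray colon if empty
    esac
}

# Only touch the environment when the CUDA symlink directory exists.
if [ -d /usr/local/cuda/bin ]; then
    PATH=$(prepend_once "$PATH" /usr/local/cuda/bin)
    LD_LIBRARY_PATH=$(prepend_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
    export PATH LD_LIBRARY_PATH
fi
```

The guard on /usr/local/cuda/bin means the script does nothing on a machine where CUDA is not installed, and prepend_once avoids growing PATH each time a nested login shell sources the file.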

Hi Andrew,

Good to read that you've got the drivers running properly ... which was the most important part.
Unfortunately I don't have enough experience with CUDA to give you best possible instructions.
But I think you will get that figured out on your own - right by testing and trying things out ... :)

Regards,
Christian

Cheers Christian, I'll get there :)

I don't work for RHT support, but thought I'd post my procedure to install CUDA and the NVIDIA drivers on a RHEL 7.6 server (minimal install).

# yum -y update

# reboot 

# yum -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r) pciutils

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum -y install https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm

# yum clean all

# yum -y install cuda

Thanks for sharing your approach, I'll keep that in mind for next time. At what stage is nouveau disabled there?

I suspect the nouveau kernel module is unloaded when yum installs cuda and the nvidia drivers. At least I did not have to manually disable or remove nouveau.
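
Rather than guessing, the loaded modules can be checked after the reboot. A minimal sketch that reads /proc/modules, the same data lsmod shows; module_loaded is a made-up helper name, and the optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Succeed (exit status 0) if kernel module $1 appears in the modules list.
# $2 = file to read; defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use after rebooting:
#   module_loaded nouveau && echo "nouveau is still loaded"
#   module_loaded nvidia  && echo "nvidia is loaded"
```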
