Triumphing over the vicious cycle of NVidia-Cuda-Cudnn Installations
So this post is for all those people who have been facing the NVidia-Cuda-Cudnn pain in the.. basically facing this pain everywhere.
In this vicious cycle (if I may call it that), you install NVidia successfully and then get stuck at Cuda, so you go back to reinstalling NVidia and get stuck uninstalling it. And the worst part (as in my case): if you happen to install everything correctly and at the *latest* version, you find out during model training that TensorFlow hasn’t been upgrading itself as fast as you’ve been upgrading your system overnight.
So if you’re in the same boat, follow along.
We’ll go through the usual way (which seems to work for everybody but me, though I’m using Ubuntu 20.04 dual-booted with Windows 10, if that makes a difference!) and a not-so-usual way to get things done anyhow!
Step 1: Check if NVidia is installed (command for the same is “nvidia-smi”)
If yes, congratulations! Go straight to Step 3.
For others, check for the compatible drivers for your system with the command: “sudo ubuntu-drivers devices”
You may or may not want to go for the recommended one. Each NVidia driver is tied to a specific Cuda version, which in turn is tied to a Cudnn version, and it all ultimately boils down to what TensorFlow/PyTorch supports.
So before going for the recommended one, do check that TensorFlow/PyTorch has a version compatible with the one you’re selecting.
For instance, the driver recommended for my system is 460 (the latest of all the choices available to me), but it goes with Cuda 11.2, which doesn’t have a compatible version of TensorFlow. So I went for NVidia 455.
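If you just want the recommended driver name out of that listing, here is a minimal sketch. It assumes the output contains lines shaped like "driver   : nvidia-driver-460 - distro non-free recommended"; the function name recommended_driver is my own, not part of any tool.

```shell
# Print the package that `ubuntu-drivers devices` marks as recommended.
# Assumption: the listing has lines of the form
#   driver   : nvidia-driver-460 - distro non-free recommended
recommended_driver() {
    # Field 3 is the package name on "driver : <package> ..." lines.
    awk '/^driver/ && /recommended/ { print $3; exit }'
}

# Only query the system when the tool is actually available.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | recommended_driver
fi
```

Remember this only tells you what Ubuntu recommends; you still have to check it against your TensorFlow/PyTorch version yourself.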
Step 2: Apt install the chosen version. (the usual way!)
sudo apt update
sudo apt upgrade
sudo apt install nvidia-driver-455
If you’re lucky enough, it will be done in one go. But I kept getting unmet dependencies errors. I tried resolving them individually but kept getting stuck deeper in the loop. Worse, my Ubuntu boot loader got stuck at the NVidia installation.
Tip: If you find yourself in this situation, boot into recovery mode, select the network option and then the root command option, and execute the following command to free up the boot loader:
sudo apt-get autoremove --purge '^nvidia-.*'
Now let’s try the alternate approach:
Download the NVidia driver for your card from here [https://www.nvidia.com/en-us/drivers/unix/] and execute the command below:
sudo sh NVIDIA-Linux-x86_64-455.45.01.run
[The file name depends on which driver version you downloaded]
Now try nvidia-smi! Bingo! Stage 1 succeeded.
This gives you the clue for the second stage! Get exactly the Cuda version mentioned in the nvidia-smi summary.
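To read that version without squinting at the table, here is a small sketch. It assumes the nvidia-smi header contains a "CUDA Version: X.Y" field, as it does on my setup; the function name cuda_version_from_smi is hypothetical.

```shell
# Extract the "CUDA Version: X.Y" value from nvidia-smi output read on stdin.
# Assumption: the header line contains the text "CUDA Version: 11.1" or similar.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only call nvidia-smi when the driver is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | cuda_version_from_smi
fi
```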
Step 3: Check if Cuda is already installed with the command “nvcc --version”
If yes, and the version is the same as the required one, skip to Step 5.
If yes, but the version is different, execute the command below to remove all traces of it from your system and prepare for a fresh installation.
sudo apt-get purge --auto-remove nvidia-cuda-toolkit
Also remove the Cuda directory from /usr/local
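The decision in Step 3 can be sketched as a tiny script. It assumes nvcc prints a line like "Cuda compilation tools, release 11.1, V11.1.105"; the required value and the function name nvcc_release are placeholders of mine, so swap in the version from your own nvidia-smi summary.

```shell
required=11.1   # assumed target version; use the one from your nvidia-smi summary

# Extract the release number from `nvcc --version` output read on stdin.
# Assumption: the output has a line like
#   Cuda compilation tools, release 11.1, V11.1.105
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
    installed=$(nvcc --version | nvcc_release)
    if [ "$installed" = "$required" ]; then
        echo "Cuda $installed is already installed, skip to Step 5"
    else
        echo "Found Cuda $installed but need $required, purge and reinstall"
    fi
else
    echo "nvcc not found, continue with Step 4"
fi
```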
Step 4: Apt Install Cuda
sudo apt install nvidia-cuda-toolkit
If you’re facing similar unmet dependencies issues, get the required version from here [https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=2004&target_type=runfilelocal], select the installer type runfile (local), and execute the commands shown at the bottom of the page:
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
sudo sh cuda_11.1.0_455.23.05_linux.run
In the installer, uncheck the NVidia driver installation, since we’ve already installed it separately.
After following either approach, add the environment variables below (put them in ~/.bashrc to keep them across sessions).
export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH
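If you re-source your shell config a lot, the two exports above keep stacking the same directory onto the front of the variables. Here is a hedged sketch that only prepends when the entry is missing. The helper prepend_unique and the variable cuda_dir are names I made up for illustration, and /usr/local/cuda-11.1 assumes the same install location as above.

```shell
# Prepend a directory to a colon-separated list unless it is already in it.
prepend_unique() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;               # already present
        *) printf '%s\n' "${dir}${list:+:$list}" ;;         # add, no stray colon
    esac
}

cuda_dir=/usr/local/cuda-11.1   # assumed install location
export PATH="$(prepend_unique "$cuda_dir/bin" "$PATH")"
export LD_LIBRARY_PATH="$(prepend_unique "$cuda_dir/lib64" "${LD_LIBRARY_PATH:-}")"
```

A side benefit: when LD_LIBRARY_PATH starts out empty, this version doesn’t leave a trailing colon behind.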
Step 5: We just need to install Cudnn (the deep learning library built on top of Cuda that frameworks such as TensorFlow and PyTorch rely on)
No more choices here, one simple step:
Download Cudnn from here [https://developer.nvidia.com/cudnn]
And execute the commands below:
sudo tar -xzvf cudnn-11.1-linux-x64-v8.0.5.39.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
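To double-check that the copy worked, you can read the version back from the header. This is a sketch assuming Cudnn 8, where the version macros (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) live in cudnn_version.h; the function name cudnn_version is my own.

```shell
# Print "major.minor.patch" from a cudnn_version.h file given as $1.
# Assumption: the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL
# on plain "#define NAME VALUE" lines, as Cudnn 8 does.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    cudnn_version "$header"
fi
```

If nothing is printed, the header did not land in /usr/local/cuda/include, so recheck the cp commands above.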
And voila! You’re ready to get started on your deep learning journey! You may thank me later! :)