Installing the CUDA driver + toolkit + cuDNN on Ubuntu 20.04

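Installing the CUDA driver on Ubuntu 20.04 and Ubuntu 21.10 seems to work a little differently: it can be installed directly with sudo apt install. Back on Ubuntu 16.04 and 18.04 I always downloaded the installer from the NVIDIA website and installed it myself. As far as I remember apt could not install the driver directly at that time, so I do not know when Ubuntu started offering the driver packages through apt.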

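For the method of downloading the installer and installing it yourself, see:

Ubuntu 18.04: checking graphics card information and installing the NVIDIA driver + CUDA + cuDNN (tanmx219's blog, CSDN)

For installing directly with apt, see: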

Install or Upgrade Nvidia Drivers on Ubuntu 21.10 Impish Indri - LinuxCapable

how to properly install nvidia 470 drivers on ubuntu 20.04? - Ask Ubuntu

Roughly, those pages boil down to commands like these:

sudo apt update
sudo apt remove '^nvidia'
sudo apt autoremove

sudo apt-get purge 'nvidia*'
sudo apt-get remove --purge nvidia\*
sudo sh cuda-*.run --silent --override

sudo apt install nvidia-driver-460

Reference notes

apt-get remove packagename will remove the binaries, but not the configuration or data files of the package packagename.
apt-get purge packagename, or apt-get remove --purge packagename will remove about everything regarding the package packagename, [...] Particularly useful when you want to 'start all over' with an application because you messed up the configuration.
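The difference between remove and purge can be seen on a live system: packages that were removed but still have configuration files show up in state "rc" in the dpkg -l listing. The following is only a small sketch; the function name residual_config_packages and the two sample lines are made up for illustration.

```shell
#!/bin/sh
# List packages in dpkg state "rc": removed, but configuration files remain.
# Reads `dpkg -l` output on stdin, so it also works on saved output.
residual_config_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Demo on two made-up listing lines (illustrative only).
printf '%s\n' \
    'ii  bash             5.0-6ubuntu1  amd64  GNU Bourne Again SHell' \
    'rc  nvidia-demo-pkg  1.0-0ubuntu1  amd64  hypothetical removed package' \
    | residual_config_packages
```

On a real machine the read-only call would be dpkg -l | residual_config_packages; purging whatever it reports is a separate, deliberate step.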

With that background covered, let's install the CUDA driver and cuDNN step by step.

Pre-installation checks

(1) Check for a suitable driver version

~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000025A2sv00001043sd000013FCbc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-510 - distro non-free recommended
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
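If you want to pick out the recommended entry in a script rather than by eye, something like the following might do. It is a sketch that assumes the line layout shown in the output of ubuntu-drivers devices in this post; the function name recommended_driver is made up.

```shell
#!/bin/sh
# Print the package that `ubuntu-drivers devices` marks as recommended.
# Reads the command output on stdin.
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Demo on two lines copied from the output in this post.
printf '%s\n' \
    'driver   : nvidia-driver-510 - distro non-free recommended' \
    'driver   : nvidia-driver-470 - distro non-free' \
    | recommended_driver
```

On a real machine: ubuntu-drivers devices | recommended_driver.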

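The output of ubuntu-drivers devices shows nvidia-driver-510 as the recommended version, which is also the newest one. Check it against the official documentation: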

Release Notes :: CUDA Toolkit Documentation

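It turns out that 510 corresponds to CUDA 11.6. That version is simply too new: for many Python versions there are no supporting packages for it yet. PyTorch, which I use most, currently only supports up to CUDA 11.3, so in the end I installed the 470 driver series (CUDA 11.4) instead.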
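Table 3 from the release notes gives, for each toolkit, the minimum driver version it needs. Checking one version against such a minimum can be sketched as below. The helper name version_ge is made up, and it assumes a sort that supports -V (GNU coreutils does).

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Driver 470.86 against the minimum listed for CUDA 11.4 Update 3 (>=470.82.01).
if version_ge 470.86 470.82.01; then
    echo "driver 470.86 satisfies >=470.82.01"
else
    echo "driver 470.86 is too old for >=470.82.01"
fi
```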

Table 3. CUDA Toolkit and Corresponding Driver Versions

CUDA Toolkit                                        Linux x86_64 Driver Version   Windows x86_64 Driver Version
CUDA 11.6 Update 1                                  >=510.47.03                   >=511.65
CUDA 11.6 GA                                        >=510.39.01                   >=511.23
CUDA 11.5 Update 2                                  >=495.29.05                   >=496.13
CUDA 11.5 Update 1                                  >=495.29.05                   >=496.13
CUDA 11.5 GA                                        >=495.29.05                   >=496.04
CUDA 11.4 Update 4                                  >=470.82.01                   >=472.50
CUDA 11.4 Update 3                                  >=470.82.01                   >=472.50
CUDA 11.4 Update 2                                  >=470.57.02                   >=471.41
CUDA 11.4 Update 1                                  >=470.57.02                   >=471.41
CUDA 11.4.0 GA                                      >=470.42.01                   >=471.11
CUDA 11.3.1 Update 1                                >=465.19.01                   >=465.89
CUDA 11.3.0 GA                                      >=465.19.01                   >=465.89
CUDA 11.2.2 Update 2                                >=460.32.03                   >=461.33
CUDA 11.2.1 Update 1                                >=460.32.03                   >=461.09
CUDA 11.2.0 GA                                      >=460.27.03                   >=460.82
CUDA 11.1.1 Update 1                                >=455.32                      >=456.81
CUDA 11.1 GA                                        >=455.23                      >=456.38
CUDA 11.0.3 Update 1                                >=450.51.06                   >=451.82
CUDA 11.0.2 GA                                      >=450.51.05                   >=451.48
CUDA 11.0.1 RC                                      >=450.36.06                   >=451.22
CUDA 10.2.89                                        >=440.33                      >=441.22
CUDA 10.1 (10.1.105 general release, and updates)   >=418.39                      >=418.96
CUDA 10.0.130                                       >=410.48                      >=411.31
CUDA 9.2 (9.2.148 Update 1)                         >=396.37                      >=398.26
CUDA 9.2 (9.2.88)                                   >=396.26                      >=397.44
CUDA 9.1 (9.1.85)                                   >=390.46                      >=391.29
CUDA 9.0 (9.0.76)                                   >=384.81                      >=385.54
CUDA 8.0 (8.0.61 GA2)                               >=375.26                      >=376.51
CUDA 8.0 (8.0.44)                                   >=367.48                      >=369.30
CUDA 7.5 (7.5.16)                                   >=352.31                      >=353.66
CUDA 7.0 (7.0.28)                                   >=346.46                      >=347.62

(2) Check the hardware information

~$ lshw -c video
WARNING: you should run this program as super-user.
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller cap_list rom
       configuration: driver=nouveau latency=0
       resources: iomemory:600-5ff iomemory:610-60f irq:164 memory:78000000-78ffffff memory:6000000000-60ffffffff memory:6100000000-6101ffffff ioport:4000(size=128) memory:79000000-7907ffff
  *-display
       description: VGA compatible controller
       product: Intel Corporation
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 01
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list
       configuration: driver=i915 latency=0
       resources: iomemory:610-60f iomemory:400-3ff irq:163 memory:6106000000-6106ffffff memory:4000000000-400fffffff ioport:5000(size=64) memory:c0000-dffff memory:4010000000-4016ffffff memory:4020000000-40ffffffff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.

So this machine has two graphics devices, using the nouveau and i915 (Intel) drivers respectively.
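The driver names sit in the configuration: lines of the lshw output. Pulling them out could look like this; it is a sketch that assumes the driver=<name> field format shown in this post, and the function name display_drivers is made up.

```shell
#!/bin/sh
# Print the kernel driver bound to each display device.
# Reads `lshw -c video` output on stdin.
display_drivers() {
    grep -o 'driver=[^ ]*' | cut -d= -f2
}

# Demo on the two configuration lines from the output in this post.
printf '%s\n' \
    '       configuration: driver=nouveau latency=0' \
    '       configuration: driver=i915 latency=0' \
    | display_drivers
```

On a real machine: sudo lshw -c video | display_drivers. Seeing nouveau there means the NVIDIA card is still on the open-source driver.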

(3) Check the Linux kernel version and install the headers

For details, see this post:

https://spacevision.blog.csdn.net/article/details/123510743
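As a quick check for this step, the following sketch tests whether headers for the running kernel appear to be installed. It assumes the usual convention that /lib/modules/<release>/build points at the header tree; the function name kernel_headers_present and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Succeed when the build tree for kernel release $1 exists under $2.
# Defaults: the running kernel and /lib/modules.
kernel_headers_present() {
    release=${1:-$(uname -r)}
    modules_root=${2:-/lib/modules}
    [ -e "$modules_root/$release/build" ]
}

if kernel_headers_present; then
    echo "headers found for $(uname -r)"
else
    echo "headers missing; candidate package: linux-headers-$(uname -r)"
fi
```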

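Removing the nouveau driver

Leaving the nouveau driver in place can have unpredictable consequences, and sometimes the NVIDIA driver then fails to install at all, so I chose to remove it.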

How to remove Nouveau kernel driver (fix Nvidia install error)

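The reason is that nouveau may already be loaded before the installation starts, which prevents the NVIDIA driver from installing successfully.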

#---open a terminal---
sudo apt-get remove 'nvidia*'
sudo apt autoremove
sudo apt-get install dkms build-essential
sudo apt-get install linux-headers-$(uname -r)   # fresh (non-upgrade) install
sudo apt-get install linux-headers-generic       # upgrade install

sudo vim /etc/modprobe.d/blacklist.conf
#---save the following info into file blacklist.conf---
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
#---end of the info saved----
#---go back to the terminal---
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
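After the reboot it is worth confirming that nouveau really is gone before installing anything. A small sketch, assuming the standard lsmod listing with the module name in the first column; the function name nouveau_loaded and the sample lines are made up.

```shell
#!/bin/sh
# Succeed when a module named "nouveau" appears in `lsmod` output on stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Demo on a made-up lsmod excerpt (illustrative only).
if printf '%s\n' \
    'Module                  Size  Used by' \
    'i915                 2000000  10' \
    | nouveau_loaded
then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

On a real machine: lsmod | nouveau_loaded && echo "still loaded".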

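Installing the driver

This step is optional. If you only want the driver and nothing else, install it as described below. If you are going to install CUDA afterwards, there is no need to install the driver here, because the CUDA installer bundles a driver and can install it along with the toolkit; in that case skip this subsection.

On Ubuntu 20.04 I actually needed only two commands: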

sudo apt update && sudo apt upgrade -y
sudo ubuntu-drivers autoinstall

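During installation you are asked whether to generate a Secure Boot MOK (machine owner key); choose to generate it. You then have to enter a temporary password. Anything will do, for example Key123456, but remember it because you must type it again at reboot.

After the reboot the MOK management screen appears. Choose Enroll Key --> Key on disk or the similarly named option, which registers the key you just generated so that the kernel trusts it. Once the key is enrolled, it no longer matters much whether Secure Boot is enabled or disabled.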
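To see afterwards whether Secure Boot is actually on, the mokutil tool can be asked with mokutil --sb-state, and mokutil --list-enrolled shows the enrolled keys. The sketch below only parses the state line; it assumes mokutil is installed and that it prints "SecureBoot enabled" or "SecureBoot disabled". The function name secure_boot_enabled is made up.

```shell
#!/bin/sh
# Succeed when `mokutil --sb-state` output on stdin reports Secure Boot as enabled.
secure_boot_enabled() {
    grep -qi '^SecureBoot enabled'
}

# Demo on the assumed output format.
if printf 'SecureBoot enabled\n' | secure_boot_enabled; then
    echo "Secure Boot is on: the driver modules must be signed with an enrolled key"
fi
```

On a real machine: mokutil --sb-state | secure_boot_enabled.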

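After rebooting, run nvidia-smi. If everything is working it prints the driver and GPU information, as in the figure below.

Installing cuda-toolkit

Installing cuda-toolkit automatically

The simplest way is: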

sudo apt install nvidia-cuda-toolkit

Command to uninstall CUDA:

sudo apt-get autoremove nvidia-cuda-toolkit

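I found that with the automatic installation the CUDA version did not match the driver version, and I do not know why. Sometimes the installed version was also not the one I wanted, so I chose to install manually.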
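One likely reason for the mismatch is that nvidia-cuda-toolkit comes from Ubuntu's own archive and is versioned independently of the driver. Before installing, apt-cache policy nvidia-cuda-toolkit shows which version apt would pick. The sketch below extracts that version; the function name candidate_version is made up and the sample output is illustrative, not copied from my machine.

```shell
#!/bin/sh
# Print the candidate version from `apt-cache policy <package>` output on stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2 }'
}

# Demo on an illustrative sample of the policy output.
printf '%s\n' \
    'nvidia-cuda-toolkit:' \
    '  Installed: (none)' \
    '  Candidate: 10.1.243-3' \
    | candidate_version
```

On a real machine: apt-cache policy nvidia-cuda-toolkit | candidate_version.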

Installing cuda-toolkit manually

First check the version of the installed driver.
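The driver version is printed in the banner line of nvidia-smi. A sketch for extracting it, assuming the "Driver Version: x.y.z" banner format that appears in the nvidia-smi output later in this post; the function name driver_version_from_smi is made up.

```shell
#!/bin/sh
# Print the driver version found in the nvidia-smi banner read from stdin.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: \([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Demo on the banner line from the nvidia-smi output in this post.
printf '%s\n' \
    '| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |' \
    | driver_version_from_smi
```

On a real machine: nvidia-smi | driver_version_from_smi. Alternatively nvidia-smi --query-gpu=driver_version --format=csv,noheader prints the version directly.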

As usual, I downloaded the installer from the website and then installed it.

The download page is here:

https://developer.nvidia.com/cuda-toolkit-archive

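The site offers several kinds of packages. I picked the runfile (manual installer). Following the instructions on the site, installation looks like this, for example:

Version: 11.4.0_470.42.01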

wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run

Version: 11.4.3_470.82.01

wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run

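Pay attention to the version when downloading. My machine reported driver 470.86, but I could not find a toolkit package bundled with 470.86. All the CUDA 11.4 packages are listed below; I could only use the closest one, 470.82, which is the last in the list: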

https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda_11.4.1_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
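Picking "the closest package" can be written down as a rule. The sketch below assumes the rule is: among the runfiles, take the one whose bundled driver version is the highest that does not exceed the installed driver. That rule is my reading of the choice above, not something NVIDIA prescribes, and the names version_ge and closest_runfile are made up. It also assumes the cuda_<toolkit>_<driver>_linux.run naming seen in the URLs and a sort with -V.

```shell
#!/bin/sh
# Succeed when version $1 >= version $2 (uses `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read runfile names or URLs on stdin; print the one whose bundled driver
# version is the highest that is still <= the installed driver version $1.
closest_runfile() {
    installed=$1
    best=""
    best_ver=""
    while IFS= read -r name; do
        base=${name##*/}                                  # strip any URL path
        ver=$(printf '%s\n' "$base" | awk -F_ '{ print $3 }')
        [ -n "$ver" ] || continue
        version_ge "$installed" "$ver" || continue        # bundled driver too new
        if [ -z "$best_ver" ] || version_ge "$ver" "$best_ver"; then
            best=$name
            best_ver=$ver
        fi
    done
    printf '%s\n' "$best"
}

# Demo with the four CUDA 11.4 runfiles and the installed driver 470.86.
printf '%s\n' \
    cuda_11.4.0_470.42.01_linux.run \
    cuda_11.4.1_470.57.02_linux.run \
    cuda_11.4.2_470.57.02_linux.run \
    cuda_11.4.3_470.82.01_linux.run \
    | closest_runfile 470.86
```

For 470.86 this selects cuda_11.4.3_470.82.01_linux.run, the same package I ended up using.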

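The installer shows a selection menu. Note that if you did not install the driver earlier, you must tick the driver entry here, since that is the driver we need. I usually select everything, as shown in the figure below.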
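The same choice can be made without the menu, since the runfile accepts --silent together with --toolkit and --driver. The sketch below only builds and prints the command line (a dry run) and does not install anything; the function name runfile_command and its yes/no argument are made up.

```shell
#!/bin/sh
# Print the non-interactive install command for runfile $1.
# $2 = "yes" also selects the bundled driver; anything else installs the toolkit only.
runfile_command() {
    runfile=$1
    with_driver=${2:-no}
    cmd="sudo sh $runfile --silent --toolkit"
    if [ "$with_driver" = "yes" ]; then
        cmd="$cmd --driver"
    fi
    printf '%s\n' "$cmd"
}

# No driver installed yet, so include the bundled one.
runfile_command cuda_11.4.3_470.82.01_linux.run yes
# Driver already installed through apt, so toolkit only.
runfile_command cuda_11.4.3_470.82.01_linux.run no
```

Printing the command first makes it easy to double-check the selection before running it for real.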

When the installation finishes, it usually prints a summary telling you what still needs to be done:

~$ sudo sh cuda_11.4.3_470.82.01_linux.run
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/mc/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

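Add the environment variables as the output above says (I am not sure whether this is still required; I did not look it up and added them anyway):

gedit ~/.bashrc

Append the following to the end of ~/.bashrc. I have seen three variants online; adapt the version to yours, for example for 11.4 they look like the ones below. I always use the third. The first two, the second in particular, did not look right to me, so they are listed here only for reference.

Option 1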
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
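Option 2 (the lib64 path is missing the hyphen in cuda-11.4, and LD_LIBRARY_PATH is overwritten instead of extended)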
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda11.4/lib64

Option 3 (the trailing :$PATH keeps the existing PATH entries after the new one)
export PATH=/usr/local/cuda-11.4/bin:$PATH  
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
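The ${PATH:+:${PATH}} form in option 1 is not a typo: it expands to a colon plus the old value only when the variable is already non-empty. With the plain :$LD_LIBRARY_PATH form an empty variable leaves a trailing colon, and the dynamic linker treats an empty entry as the current directory. A small sketch of the difference; the function name prepend_dir and the /usr/lib/demo path are made up.

```shell
#!/bin/sh
# Print directory $1 prepended to the colon-separated list $2,
# without leaving a trailing colon when the list is empty.
prepend_dir() {
    dir=$1
    list=$2
    printf '%s\n' "$dir${list:+:${list}}"
}

prepend_dir /usr/local/cuda-11.4/lib64 ""              # list empty: no trailing colon
prepend_dir /usr/local/cuda-11.4/lib64 /usr/lib/demo   # list set: joined with a colon
```

In ~/.bashrc this corresponds to export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}.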

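After setting the environment variables, reload them, otherwise they do not take effect in the current shell (rebooting also works):

$ source ~/.bashrc

or, if you put the exports in /etc/profile instead:

$ source /etc/profile

Finally, check the result: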

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

~$ nvidia-smi
Tue Mar 15 21:34:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0    16W /  N/A |      0MiB /  3910MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
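The two outputs report different things: nvcc prints the version of the installed toolkit, while the CUDA Version field of nvidia-smi is the highest CUDA version the driver supports. Comparing them can be scripted roughly as follows, assuming the output formats shown above; the function names nvcc_release and smi_cuda_version are made up.

```shell
#!/bin/sh
# Print the toolkit release from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Print the driver-supported CUDA version from `nvidia-smi` output on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Demo on the lines from the outputs above.
toolkit=$(printf '%s\n' 'Cuda compilation tools, release 11.4, V11.4.152' | nvcc_release)
driver=$(printf '%s\n' '| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |' | smi_cuda_version)
echo "toolkit: $toolkit, driver supports up to: $driver"
```

On a real machine: nvcc --version | nvcc_release and nvidia-smi | smi_cuda_version. The toolkit release should not be newer than what the driver supports.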

Installing cuDNN

Installation Guide :: NVIDIA Deep Learning cuDNN Documentation

cuDNN Archive | NVIDIA Developer

Navigate to your directory containing the cuDNN tar file.

Unzip the cuDNN package.

$ tar -xvf cudnn-linux-x86_64-8.x.x.x_cudaX.Y-archive.tar.xz

Copy the following files into the CUDA toolkit directory.

$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include 
$ sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64 
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

For my own version, the steps were:

$ tar -xzvf cudnn-11.4-linux-x64-v8.2.4.15.tgz

$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
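To confirm which cuDNN ended up in the toolkit directory, the version macros can be read back from the header. This sketch assumes cuDNN 8, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn_version.h, and that this header was copied along with the others; the function name cudnn_version is made up.

```shell
#!/bin/sh
# Print major.minor.patch from a cudnn_version.h file.
# $1 defaults to the header inside the CUDA toolkit directory.
cudnn_version() {
    header=${1:-/usr/local/cuda/include/cudnn_version.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$header"
}

if [ -r /usr/local/cuda/include/cudnn_version.h ]; then
    cudnn_version
else
    echo "cudnn_version.h not found under /usr/local/cuda/include"
fi
```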

===================================================================

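The most important part of all this is getting the driver installed correctly. I will not copy the text of the referenced pages here; if you cannot reach them, see the figure below.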

