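Installing the NVIDIA (CUDA) driver on Ubuntu 20.04 and Ubuntu 21.10 seems to work differently from older releases: you can install it directly with sudo apt install. On Ubuntu 16.04 and 18.04 I always downloaded the installer from the NVIDIA website and ran it myself. apt did not seem to support this back then, so I'm not sure when Ubuntu started shipping driver packages through apt.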
For installing from a downloaded package, see:
Ubuntu 18.04: check GPU information and install the NVIDIA driver + CUDA + cuDNN (tanmx219's blog, CSDN)
For installing directly with apt, see:
Install or Upgrade Nvidia Drivers on Ubuntu 21.10 Impish Indri - LinuxCapable
how to properly install nvidia 470 drivers on ubuntu 20.04? - Ask Ubuntu
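Roughly, they boil down to commands like these:
sudo apt update
sudo apt remove '^nvidia'                 # remove packages whose names start with "nvidia"
sudo apt autoremove                       # clean up dependencies left behind
sudo apt-get purge 'nvidia*'              # purge: also delete the packages' configuration files
sudo apt-get remove --purge nvidia\*      # same as the previous line, written differently
sudo sh cuda-*.run --silent --override    # runfile route: unattended install of a downloaded CUDA package
sudo apt install nvidia-driver-460        # apt route: install a specific driver version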
Reference notes (from the Ask Ubuntu answer):
apt-get remove packagename will remove the binaries, but not the configuration or data files of the package packagename. apt-get purge packagename, or apt-get remove --purge packagename, will remove about everything regarding the package packagename, [...] Particularly useful when you want to 'start all over' with an application because you messed up the configuration.
With that background covered, let's install the driver, CUDA and cuDNN step by step.
Pre-installation checks
(1) Check which driver version is appropriate
~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000025A2sv00001043sd000013FCbc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-510 - distro non-free recommended
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
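If you want to pick the recommended package out of this output in a script, a small filter is enough. This is a minimal sketch: recommended_driver is a hypothetical helper name of mine, not part of ubuntu-drivers, and it assumes the "driver   : <package> - ... recommended" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the package that `ubuntu-drivers devices`
# marks as "recommended". The command's output is read on stdin, so the
# parsing can be tried without a GPU.
recommended_driver() {
  awk '/^[ \t]*driver[ \t]*:/ && /recommended/ {
         sub(/^[^:]*:[ \t]*/, "")   # drop the "driver   : " prefix
         print $1                    # first word is the package name
       }'
}
```

On a real machine you would run ubuntu-drivers devices | recommended_driver; for the output above it should print nvidia-driver-510.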
This shows that nvidia-driver-510 is the recommended version, and also the newest one. Check it against NVIDIA's official release notes:
Release Notes :: CUDA Toolkit Documentation
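There you can see that driver 510 corresponds to CUDA 11.6, which is very new: many Python packages do not provide builds for it yet. PyTorch, which I use most, currently only supports up to CUDA 11.3, so I actually chose the 470 driver, which pairs with CUDA 11.4. The table lists the minimum driver version each toolkit requires: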
Table 3. CUDA Toolkit and Corresponding Driver Versions
CUDA Toolkit | Linux x86_64 driver version | Windows x86_64 driver version
CUDA 11.6 Update 1 | >= 510.47.03 | >= 511.65
CUDA 11.6 GA | >= 510.39.01 | >= 511.23
CUDA 11.5 Update 2 | >= 495.29.05 | >= 496.13
CUDA 11.5 Update 1 | >= 495.29.05 | >= 496.13
CUDA 11.5 GA | >= 495.29.05 | >= 496.04
CUDA 11.4 Update 4 | >= 470.82.01 | >= 472.50
CUDA 11.4 Update 3 | >= 470.82.01 | >= 472.50
CUDA 11.4 Update 2 | >= 470.57.02 | >= 471.41
CUDA 11.4 Update 1 | >= 470.57.02 | >= 471.41
CUDA 11.4.0 GA | >= 470.42.01 | >= 471.11
CUDA 11.3.1 Update 1 | >= 465.19.01 | >= 465.89
CUDA 11.3.0 GA | >= 465.19.01 | >= 465.89
CUDA 11.2.2 Update 2 | >= 460.32.03 | >= 461.33
CUDA 11.2.1 Update 1 | >= 460.32.03 | >= 461.09
CUDA 11.2.0 GA | >= 460.27.03 | >= 460.82
CUDA 11.1.1 Update 1 | >= 455.32 | >= 456.81
CUDA 11.1 GA | >= 455.23 | >= 456.38
CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82
CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48
CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22
CUDA 10.2.89 | >= 440.33 | >= 441.22
CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96
CUDA 10.0.130 | >= 410.48 | >= 411.31
CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26
CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44
CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29
CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54
CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51
CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30
CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66
CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62

(2) Check the hardware
~$ lshw -c video
WARNING: you should run this program as super-user.
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller cap_list rom
       configuration: driver=nouveau latency=0
       resources: iomemory:600-5ff iomemory:610-60f irq:164 memory:78000000-78ffffff memory:6000000000-60ffffffff memory:6100000000-6101ffffff ioport:4000(size=128) memory:79000000-7907ffff
  *-display
       description: VGA compatible controller
       product: Intel Corporation
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 01
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list
       configuration: driver=i915 latency=0
       resources: iomemory:610-60f iomemory:400-3ff irq:163 memory:6106000000-6106ffffff memory:4000000000-400fffffff ioport:5000(size=64) memory:c0000-dffff memory:4010000000-4016ffffff memory:4020000000-40ffffffff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
So I have two graphics devices, using the nouveau and i915 (Intel) drivers respectively.
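The driver names come from the "configuration: driver=..." lines. As a minimal sketch, a hypothetical helper (video_drivers is my own name) can list them from lshw output on stdin; after the NVIDIA driver is installed, the NVIDIA entry should show nvidia instead of nouveau.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kernel driver bound to each display
# device, one per line, from `lshw -c video` output read on stdin.
video_drivers() {
  awk '/configuration:/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^driver=/) print substr($i, 8)   # text after "driver="
       }'
}
```

Usage: sudo lshw -c video | video_drivers (running as root also silences the super-user warnings).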
(3) Check the Linux kernel version and install the matching kernel headers. For details, see this post:
https://spacevision.blog.csdn.net/article/details/123510743
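A quick way to confirm the headers for the running kernel are present is to look for the /lib/modules/<release>/build link, which the linux-headers package provides and which DKMS uses when building the driver module. The helper below is a hypothetical sketch (headers_installed is my own name); the second argument only exists so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if headers for kernel release $1
# (default: the running kernel) are installed, i.e. the
# /lib/modules/<release>/build link exists and resolves.
# $2 overrides the modules root (default /lib/modules).
headers_installed() {
  local release=${1:-$(uname -r)}
  local modroot=${2:-/lib/modules}
  [ -e "$modroot/$release/build" ]   # -e is false for a dangling symlink
}
```

Usage: headers_installed || sudo apt-get install linux-headers-$(uname -r)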
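Remove (blacklist) the nouveau driver
If the nouveau driver is left in place, it can cause unpredictable problems, and sometimes the NVIDIA driver cannot be installed at all, so I chose to disable it.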
How to remove Nouveau kernel driver (fix Nvidia install error)
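The reason is that nouveau may already be loaded before the installation starts, which prevents the NVIDIA driver from installing successfully.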
#---open a terminal---
sudo apt-get remove 'nvidia*'
sudo apt autoremove
sudo apt-get install dkms build-essential
sudo apt-get install linux-headers-$(uname -r)   # headers for the currently running kernel
sudo apt-get install linux-headers-generic       # metapackage, so headers follow kernel upgrades
sudo vim /etc/modprobe.d/blacklist.conf
#---save the following lines into blacklist.conf---
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
#---end of the lines to save---
#---back in the terminal---
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
reboot
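If you prefer not to edit blacklist.conf in vim, the same lines can be appended non-interactively, and after the reboot you can confirm nouveau is gone by reading /proc/modules (the file lsmod formats). Both functions are hypothetical helpers of mine, a sketch rather than a required step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nouveau blacklist lines used above.
nouveau_blacklist() {
  printf '%s\n' \
    'blacklist nouveau' \
    'blacklist lbm-nouveau' \
    'options nouveau modeset=0' \
    'alias nouveau off' \
    'alias lbm-nouveau off'
}

# Hypothetical helper: succeed if the nouveau module is currently loaded.
# Reads /proc/modules unless another file is given.
nouveau_loaded() {
  grep -q '^nouveau ' "${1:-/proc/modules}"
}
```

Usage: nouveau_blacklist | sudo tee -a /etc/modprobe.d/blacklist.conf > /dev/null, then sudo update-initramfs -u and reboot; afterwards nouveau_loaded should fail (exit status 1).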
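Install the driver
This step is optional. If you only need the driver, follow the steps below. If you are going to install CUDA afterwards, there is no need to install the driver here, because the CUDA runfile installer bundles the driver; just skip this subsection.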
On Ubuntu 20.04 I actually used only two commands:
sudo apt update && sudo apt upgrade -y
sudo ubuntu-drivers autoinstall
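During the installation you are asked whether to generate a Secure Boot MOK (Machine Owner Key); choose to generate it. You then set a temporary password. Anything will do, for example Key123456, but remember it, because you have to enter it on the next reboot.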
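After rebooting, the MOK management screen appears. Choose the option that enrolls the key (something like Enroll Key -> Key on disk), which enrolls the key you just generated so that the signed NVIDIA kernel modules are accepted. Once this key is enrolled, it no longer matters much whether Secure Boot is enabled or disabled.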
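To see where you stand, mokutil --sb-state reports whether Secure Boot is on, and modinfo nvidia shows a signer field when the module is signed. The helpers below are hypothetical (names are mine) and assume mokutil is installed and that its output starts with "SecureBoot enabled" or "SecureBoot disabled"; they only parse text on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `mokutil --sb-state` output (stdin)
# reports Secure Boot as enabled.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Hypothetical helper: print the "signer" field from `modinfo` output on
# stdin (prints nothing if the module is not signed).
module_signer() {
  awk '/^signer:/ { sub(/^signer:[ \t]*/, ""); print }'
}
```

Usage: mokutil --sb-state | secure_boot_enabled && modinfo nvidia | module_signer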
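After the reboot, run nvidia-smi. If everything is working, it prints a status table with the GPU, the driver version and the supported CUDA version.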
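Later steps need the installed driver version, which is in the nvidia-smi header. A minimal parsing sketch (smi_driver_version is a hypothetical name of mine) that reads nvidia-smi output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver version from `nvidia-smi` output
# read on stdin, e.g. "470.82.01".
smi_driver_version() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Usage: nvidia-smi | smi_driver_version. nvidia-smi can also print the value directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader.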
The simplest way to install CUDA is:
sudo apt install nvidia-cuda-toolkit
To uninstall CUDA:
sudo apt-get autoremove nvidia-cuda-toolkit
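I found that with this automatic installation the CUDA version did not match the driver version, for reasons I don't know, and sometimes the installed version was not the one I wanted, so I chose to install manually.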
Installing cuda-toolkit manually
First, check which driver version is installed.
As before, download the package from the website and then install it.
The download page is here:
https://developer.nvidia.com/cuda-toolkit-archive
The site offers several package types; I chose the runfile (local) installer. Following the instructions on the site, the installation looks like this, for example:
Version 11.4.0_470.42.01:
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run
Version 11.4.3_470.82.01:
wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run
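All these links follow one pattern: the CUDA version appears in the directory and the file name, followed by the bundled driver version. The hypothetical helper below (cuda_runfile_url is my own name) just reproduces that pattern as observed in the links in this post; it does not query NVIDIA, so check the archive page if a combination does not download.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a local runfile URL from a CUDA version ($1)
# and the driver version bundled with it ($2), following the pattern of
# the download links in this post.
cuda_runfile_url() {
  local cuda=$1 driver=$2
  printf 'https://developer.download.nvidia.com/compute/cuda/%s/local_installers/cuda_%s_%s_linux.run\n' \
    "$cuda" "$cuda" "$driver"
}
```

Usage: url=$(cuda_runfile_url 11.4.3 470.82.01); wget "$url" && sudo sh "${url##*/}"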
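Pay attention to the versions when downloading. For example, my machine reported driver 470.86, but I could not find a toolkit package bundling 470.86. All the CUDA 11.4 packages are listed below; I used the closest one, which bundles 470.82, the last package in the list: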
https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda_11.4.1_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
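Whether a toolkit fits an installed driver is a minimum-version check against Table 3 above: 470.86 is newer than the 470.82.01 that CUDA 11.4 Update 3 requires, so that package works with my driver. A minimal sketch of the check, using GNU sort -V for dotted-version comparison; version_ge and cuda_min_driver are hypothetical names, and only a few Linux rows of the table are copied in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 >= $2
# (GNU sort -V compares the numeric fields one by one).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: minimum Linux x86_64 driver for a CUDA version.
# Only a few rows of Table 3 are included here.
cuda_min_driver() {
  case $1 in
    11.3.0|11.3.1)  echo 465.19.01 ;;
    11.4.0)         echo 470.42.01 ;;
    11.4.1|11.4.2)  echo 470.57.02 ;;
    11.4.3|11.4.4)  echo 470.82.01 ;;
    11.6.0)         echo 510.39.01 ;;
    11.6.1)         echo 510.47.03 ;;
    *)              return 1 ;;      # not in this excerpt of the table
  esac
}
```

Usage: version_ge 470.86 "$(cuda_min_driver 11.4.3)" && echo "driver is new enough"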
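The installer shows a selection menu. Note: if you did not install the driver earlier, you must select Driver here, i.e. the driver we want to install. I usually select everything.
When the installation finishes, it usually prints a summary telling you what still needs to be done: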
~$ sudo sh cuda_11.4.3_470.82.01_linux.run
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/mc/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
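Add the environment variables as the output above asks. I haven't checked whether every one of them is strictly necessary, but without the PATH entry nvcc in /usr/local/cuda-11.4/bin will not be found, so I added them: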
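gedit ~/.bashrc
Append the following to the end of ~/.bashrc (no sudo is needed to edit your own ~/.bashrc). I have seen three variants online; adjust the version number to yours, e.g. for 11.4 they look like this. I use the third. The second is wrong: /usr/local/cuda11.4 is missing the hyphen, and it overwrites LD_LIBRARY_PATH instead of extending it. The first is the form used in NVIDIA's installation guide and also works; it is kept here for reference.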
Form 1
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Form 2 (incorrect, kept for reference)
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda11.4/lib64
Form 3 (the trailing :$PATH keeps the existing entries, so the CUDA directory is prepended to PATH)
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
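After setting the environment variables, reload the file, otherwise they will not take effect in the current shell (opening a new terminal or rebooting also works):
source ~/.bashrc
(source /etc/profile only helps if you put the exports in /etc/profile instead of ~/.bashrc.)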
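Re-sourcing ~/.bashrc with Form 3 adds the CUDA directories again each time. A hypothetical alternative for ~/.bashrc (prepend_path is my own name, a sketch rather than anything NVIDIA ships) combines the empty-variable guard from Form 1 with a check that skips directories already present:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend directory $2 to the colon-separated
# variable named $1 (PATH, LD_LIBRARY_PATH, ...) unless already present.
# ${val:+:$val} avoids a trailing ":" when the variable is empty
# (an empty entry in PATH would mean the current directory).
prepend_path() {
  local var=$1 dir=$2
  local val=${!var}
  case ":$val:" in
    *":$dir:"*) ;;                                   # already there: no change
    *) printf -v "$var" '%s' "$dir${val:+:$val}"; export "$var" ;;
  esac
}
```

Usage in ~/.bashrc: prepend_path PATH /usr/local/cuda-11.4/bin and prepend_path LD_LIBRARY_PATH /usr/local/cuda-11.4/lib64.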
Finally, verify the installation:
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0
~$ nvidia-smi
Tue Mar 15 21:34:42 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 54C P0 16W / N/A | 0MiB / 3910MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
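The two outputs can be cross-checked: nvcc reports the installed toolkit release, while "CUDA Version" in the nvidia-smi header is the newest CUDA version the driver supports, so as a rough sanity check the first should not be newer than the second. This is a hypothetical sketch (function names are mine); CUDA 11's minor-version compatibility can relax the rule, so treat a failure as a hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: toolkit release from `nvcc --version` on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: "CUDA Version" from the nvidia-smi header on stdin.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if toolkit release $1 is not newer than driver CUDA version $2.
toolkit_fits_driver() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}
```

Usage: toolkit_fits_driver "$(nvcc --version | nvcc_release)" "$(nvidia-smi | smi_cuda_version)" && echo OK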
Install cuDNN
Installation Guide :: NVIDIA Deep Learning cuDNN Documentation
cuDNN Archive | NVIDIA Developer
- Navigate to your directory containing the cuDNN tar file.
- Unzip the cuDNN package.
$ tar -xvf cudnn-linux-x86_64-8.x.x.x_cudaX.Y-archive.tar.xz
- Copy the following files into the CUDA toolkit directory.
$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
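Adapted to my version, I ran the following. All cudnn*.h headers are copied because cuDNN 8 splits its headers across several files (cudnn.h includes cudnn_version.h and others), and cp -P keeps the library symlinks as symlinks:
$ tar -xzvf cudnn-11.4-linux-x64-v8.2.4.15.tgz
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*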
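To confirm which cuDNN ended up in /usr/local/cuda, read the version macros from cudnn_version.h (cuDNN 8 keeps CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL there; older releases had them in cudnn.h). A minimal sketch; cudnn_version is a hypothetical helper name of mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "major.minor.patch" from a cuDNN version
# header (default: /usr/local/cuda/include/cudnn_version.h).
cudnn_version() {
  local hdr=${1:-/usr/local/cuda/include/cudnn_version.h}
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pl = $3 }
       END { if (ma != "") print ma "." mi "." pl }' "$hdr"
}
```

For the package above, cudnn_version should print 8.2.4.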
===================================================================
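The most important part of all this is getting the driver installed correctly. I won't copy the content of the referenced articles here; if you cannot reach those sites, see the screenshots below.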