Installing the CUDA driver + toolkit + cuDNN on Ubuntu 20.04

高精度计算机视觉 Published: 2022-02-01 00:22:22

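On Ubuntu 20.04 and Ubuntu 21.10, installing the CUDA driver seems to work somewhat differently than before: you can install it directly with sudo apt install. Back when I used Ubuntu 16.04 and Ubuntu 18.04, I always downloaded the installer from the NVIDIA website and installed it by hand. As far as I remember apt installation was not supported then, so I don't know exactly when Ubuntu started offering the driver packages through apt.

For the download-and-install-it-yourself method, see: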

Ubuntu18.04查看显卡信息并安装NVDIA显卡驱动driver + Cuda + Cudnn_tanmx219的博客-CSDN博客_ubuntu查看显卡驱动

For the direct apt method, see:

Install or Upgrade Nvidia Drivers on Ubuntu 21.10 Impish Indri - LinuxCapable

how to properly install nvidia 470 drivers on ubuntu 20.04? - Ask Ubuntu

Roughly speaking, those pages boil down to commands like these:

sudo apt update
sudo apt remove '^nvidia'
sudo apt autoremove

sudo apt-get remove --purge 'nvidia*'
sudo sh cuda-*.run --silent --override

sudo apt install nvidia-driver-460

 Reference notes

  • apt-get remove packagename will remove the binaries, but not the configuration or data files of the package packagename.
  • apt-get purge packagename, or apt-get remove --purge packagename will remove about everything regarding the package packagename, [...] Particularly useful when you want to 'start all over' with an application because you messed up the configuration.

With that background out of the way, let's install the CUDA driver and cuDNN step by step.

Pre-installation checks

(1) Check for a suitable driver version

~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000025A2sv00001043sd000013FCbc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-510 - distro non-free recommended
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
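If you want to pick the recommended package out of that output in a script, a small filter is enough. This is only a sketch: the function name recommended_driver is made up for illustration, and it assumes the output keeps the "driver : <package> - ... recommended" layout shown above.

```shell
#!/bin/sh
# Print the package that `ubuntu-drivers devices` marks as "recommended".
# Reads the command's output on stdin, so it also works on saved output.
# (recommended_driver is a hypothetical helper name, not part of any tool.)
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Typical use on a real machine:
#   ubuntu-drivers devices | recommended_driver
```

Reading from stdin keeps the helper usable on a machine without an NVIDIA card, for example when checking output that someone pasted into a bug report.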

Here nvidia-driver-510 is the recommended version, and also the newest. Check it against the official documentation:

Release Notes :: CUDA Toolkit Documentation

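As the table shows, 510 corresponds to CUDA 11.6. That release is simply too new: many Python packages do not yet ship builds for it. PyTorch, which I use most, currently supports only up to CUDA 11.3, so in the end I installed the 470 series instead.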
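The driver/toolkit pairing in NVIDIA's table can also be checked mechanically. The sketch below hard-codes the minimum Linux driver for a few GA toolkit releases, copied from Table 3; the function names are hypothetical, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
#!/bin/sh
# Succeed (exit 0) if version $1 >= version $2, using GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Linux x86_64 driver for a few GA toolkits (values from Table 3).
min_driver_for_cuda() {
    case "$1" in
        11.6) echo 510.39.01 ;;
        11.5) echo 495.29.05 ;;
        11.4) echo 470.42.01 ;;
        11.3) echo 465.19.01 ;;
        *)    return 1 ;;   # toolkit not in this small lookup
    esac
}

# driver_supports_cuda DRIVER_VERSION CUDA_VERSION
# Exit 0 if the driver meets the table's minimum, 1 if not, 2 if unknown.
driver_supports_cuda() {
    min=$(min_driver_for_cuda "$2") || return 2
    version_ge "$1" "$min"
}

# Example: driver_supports_cuda 470.86 11.4 && echo "ok for CUDA 11.4"
```

The lookup is deliberately tiny; extend the case statement with further rows from the table if you need other releases.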

Table 3. CUDA Toolkit and Corresponding Driver Versions (Toolkit Driver Version)

CUDA Toolkit                                       Linux x86_64   Windows x86_64
CUDA 11.6 Update 1                                 >=510.47.03    >=511.65
CUDA 11.6 GA                                       >=510.39.01    >=511.23
CUDA 11.5 Update 2                                 >=495.29.05    >=496.13
CUDA 11.5 Update 1                                 >=495.29.05    >=496.13
CUDA 11.5 GA                                       >=495.29.05    >=496.04
CUDA 11.4 Update 4                                 >=470.82.01    >=472.50
CUDA 11.4 Update 3                                 >=470.82.01    >=472.50
CUDA 11.4 Update 2                                 >=470.57.02    >=471.41
CUDA 11.4 Update 1                                 >=470.57.02    >=471.41
CUDA 11.4.0 GA                                     >=470.42.01    >=471.11
CUDA 11.3.1 Update 1                               >=465.19.01    >=465.89
CUDA 11.3.0 GA                                     >=465.19.01    >=465.89
CUDA 11.2.2 Update 2                               >=460.32.03    >=461.33
CUDA 11.2.1 Update 1                               >=460.32.03    >=461.09
CUDA 11.2.0 GA                                     >=460.27.03    >=460.82
CUDA 11.1.1 Update 1                               >=455.32       >=456.81
CUDA 11.1 GA                                       >=455.23       >=456.38
CUDA 11.0.3 Update 1                               >=450.51.06    >=451.82
CUDA 11.0.2 GA                                     >=450.51.05    >=451.48
CUDA 11.0.1 RC                                     >=450.36.06    >=451.22
CUDA 10.2.89                                       >=440.33       >=441.22
CUDA 10.1 (10.1.105 general release, and updates)  >=418.39       >=418.96
CUDA 10.0.130                                      >=410.48       >=411.31
CUDA 9.2 (9.2.148 Update 1)                        >=396.37       >=398.26
CUDA 9.2 (9.2.88)                                  >=396.26       >=397.44
CUDA 9.1 (9.1.85)                                  >=390.46       >=391.29
CUDA 9.0 (9.0.76)                                  >=384.81       >=385.54
CUDA 8.0 (8.0.61 GA2)                              >=375.26       >=376.51
CUDA 8.0 (8.0.44)                                  >=367.48       >=369.30
CUDA 7.5 (7.5.16)                                  >=352.31       >=353.66
CUDA 7.0 (7.0.28)                                  >=346.46       >=347.62

(2) Check the hardware information

~$ lshw -c video
WARNING: you should run this program as super-user.
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller cap_list rom
       configuration: driver=nouveau latency=0
       resources: iomemory:600-5ff iomemory:610-60f irq:164 memory:78000000-78ffffff memory:6000000000-60ffffffff memory:6100000000-6101ffffff ioport:4000(size=128) memory:79000000-7907ffff
  *-display
       description: VGA compatible controller
       product: Intel Corporation
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 01
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list
       configuration: driver=i915 latency=0
       resources: iomemory:610-60f iomemory:400-3ff irq:163 memory:6106000000-6106ffffff memory:4000000000-400fffffff ioport:5000(size=64) memory:c0000-dffff memory:4010000000-4016ffffff memory:4020000000-40ffffffff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.

The machine has two graphics adapters: the NVIDIA card, currently driven by nouveau, and the Intel one, driven by i915.

(3) Check the Linux kernel version and install the headers

For details, see this post:

https://spacevision.blog.csdn.net/article/details/123510743

Removing the nouveau driver

Leaving nouveau in place can have unpredictable consequences, and sometimes the NVIDIA driver fails to install at all, so I chose to remove it.

How to remove Nouveau kernel driver (fix Nvidia install error)

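The reason is that nouveau may already be loaded before the installation starts, which prevents the NVIDIA driver from installing successfully.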

#---open a terminal---
sudo apt-get remove 'nvidia*'
sudo apt autoremove
sudo apt-get install dkms build-essential
sudo apt-get install linux-headers-$(uname -r)   # non-upgrade install: headers for the running kernel
sudo apt-get install linux-headers-generic       # upgrade install: generic headers metapackage

sudo vim /etc/modprobe.d/blacklist.conf
#---save the following info into file blacklist.conf---
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
#---end of the info saved----

#---go back to the terminal---
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
reboot
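After the reboot it is worth confirming that nouveau really is gone before going any further. A minimal sketch, assuming the usual lsmod layout with the module name in the first column; nouveau_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit 0) if a module named exactly "nouveau" appears in
# lsmod-style text on stdin; fail (exit 1) otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On a real machine:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded; re-check the blacklist and initramfs"
#   else
#       echo "nouveau is not loaded"
#   fi
```

Matching on the first column avoids false positives from other modules that merely list nouveau in their "Used by" column.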

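Installing the driver

This step is optional. If you only want the driver, install it as described below. If you are going to install CUDA afterwards, there is no need to install the driver here, because the CUDA installer bundles a driver and installs it along with the toolkit; in that case skip this section.

On Ubuntu 20.04 I actually needed only two commands: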

sudo apt update && sudo apt upgrade -y
sudo ubuntu-drivers autoinstall

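During installation you are asked whether to generate a Secure Boot MOK (machine owner key); choose to generate it. You then have to enter a temporary password. Anything will do, for example Key123456, but remember it, because you must type it again on reboot.

After the reboot the MOK screen appears. Remember to choose Enroll Key --> Key on disk or the like, which registers the key you just generated with the Linux kernel. Once the key is enrolled, it no longer matters much whether Secure Boot is enabled or disabled.

After rebooting, run nvidia-smi; if everything is working it prints the driver and GPU information.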

Installing cuda-toolkit

Installing cuda-toolkit automatically

The simplest way is:

sudo apt install nvidia-cuda-toolkit

To uninstall CUDA:

sudo apt-get autoremove nvidia-cuda-toolkit

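With the automatic install I found that the CUDA version did not match the driver version, and I don't know why. Sometimes the version installed was also not the one I wanted, so I switched to a manual install.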
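One way to see which toolkit version apt would install, before committing to it, is to read the "Candidate:" line of apt-cache policy. The helper below is a sketch (candidate_version is a hypothetical name) and assumes an English locale, since the field label is translated in other locales.

```shell
#!/bin/sh
# Print the version apt would install, given `apt-cache policy` output
# on stdin. Assumes the English "Candidate:" label.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# On a real machine (LC_ALL=C forces the English labels):
#   LC_ALL=C apt-cache policy nvidia-cuda-toolkit | candidate_version
```

If the printed version is far from the CUDA release your driver and frameworks expect, that is a good hint to use the manual installer instead.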

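Installing cuda-toolkit manually

First check the installed driver version.

As usual, I downloaded the installer from the website and ran it myself.

The download page is here:

https://developer.nvidia.com/cuda-toolkit-archive

The site offers several kinds of package. I chose the manual installer (the .run file); following the instructions on the site, installation looks like this, for example:

Version: 11.4.0_470.42.01

wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run

Version: 11.4.3_470.82.01

wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run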

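Pay attention to the version when downloading. My machine reported driver 470.86, but I could not find a toolkit package for 470.86. All the CUDA 11.4 packages are listed below; I had to settle for the closest one, 470.82, which is the last in the list: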

https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda_11.4.1_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
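The "closest package" choice can be sketched as a small script. It assumes the cuda_<toolkit>_<driver>_linux.run naming seen in the URLs above and picks the runfile whose bundled driver is the newest one that does not exceed the installed driver. Both function names are hypothetical, and "closest" here is just one reasonable reading of the rule of thumb, not an NVIDIA recommendation.

```shell
#!/bin/sh
# Split a runfile name or URL such as cuda_11.4.3_470.82.01_linux.run
# into "<toolkit version> <bundled driver version>".
runfile_versions() {
    basename "$1" |
        sed -n 's/^cuda_\([0-9.]*\)_\([0-9.]*\)_linux\.run$/\1 \2/p'
}

# closest_runfile INSTALLED_DRIVER
# Reads runfile names/URLs on stdin (one per line) and prints the one whose
# bundled driver is the newest that is <= INSTALLED_DRIVER (GNU sort -V).
closest_runfile() {
    installed=$1
    best=""
    best_drv=""
    while read -r name; do
        drv=$(runfile_versions "$name" | cut -d' ' -f2)
        [ -n "$drv" ] || continue
        # Skip runfiles that bundle a driver newer than the installed one.
        newest=$(printf '%s\n%s\n' "$drv" "$installed" | sort -V | tail -n 1)
        [ "$newest" = "$installed" ] || continue
        # Keep this runfile if its driver is at least as new as the best so far.
        if [ -z "$best_drv" ] ||
           [ "$(printf '%s\n%s\n' "$drv" "$best_drv" | sort -V | tail -n 1)" = "$drv" ]; then
            best=$name
            best_drv=$drv
        fi
    done
    if [ -n "$best" ]; then printf '%s\n' "$best"; else return 1; fi
}
```

Feeding it the four URLs above together with 470.86 selects the 11.4.3 runfile, the same package chosen by hand.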

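The installer shows a few prompts. Note that if you did not install a driver earlier, you must tick Driver here, since that is the driver we want installed. I usually select everything.

When it finishes, the installer normally prints a summary telling you what still needs to be done: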

~$ sudo sh cuda_11.4.3_470.82.01_linux.run
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/mc/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

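Following the output above, add the environment variables (I'm not sure whether this is still required; I didn't check, but I added them anyway):

gedit ~/.bashrc

Append the following to the end of ~/.bashrc. I have come across three variants online; adjust the paths to your version (for 11.4 they look like this). I always use the third. The first two, the second in particular, look questionable to me, and I list them here only as a memo: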

Variant 1
export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
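Variant 2 (note that the path below is missing the hyphen in cuda-11.4 and that any existing LD_LIBRARY_PATH is overwritten)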
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda11.4/lib64

Variant 3 (the trailing $PATH keeps the existing PATH entries)
export PATH=/usr/local/cuda-11.4/bin:$PATH  
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
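The odd-looking ${PATH:+:${PATH}} in the first variant is ordinary POSIX parameter expansion: the ":<old value>" suffix is produced only when the variable is already non-empty. The sketch below contrasts that with the plain "dir:$VAR" form of the third variant; the two helper names are made up for illustration.

```shell
#!/bin/sh
# prepend_path DIR CURRENT
# Build "DIR[:CURRENT]" the way the first variant does, via ${var:+word}:
# the colon and old value are added only if CURRENT is non-empty.
prepend_path() {
    printf '%s\n' "$1${2:+:$2}"
}

# prepend_path_plain DIR CURRENT
# The "DIR:$VAR" form of the third variant: always adds the colon,
# so an empty CURRENT leaves a trailing ":".
prepend_path_plain() {
    printf '%s\n' "$1:$2"
}

# Example:
#   prepend_path       /usr/local/cuda-11.4/lib64 "$LD_LIBRARY_PATH"
#   prepend_path_plain /usr/local/cuda-11.4/lib64 "$LD_LIBRARY_PATH"
```

The difference only shows when the variable starts out empty, which is common for LD_LIBRARY_PATH. A trailing colon leaves an empty entry, which the dynamic linker treats as the current directory; usually harmless, but the first form avoids it.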

After setting the environment variables, reload them; otherwise they do not take effect in the current shell (rebooting also works):

$ source ~/.bashrc

or:

$ source /etc/profile

Finally, verify the installation:

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

~$ nvidia-smi
Tue Mar 15 21:34:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0    16W /  N/A |      0MiB /  3910MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
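The two commands report CUDA versions that mean different things: nvcc prints the release of the installed toolkit, while the "CUDA Version" in the nvidia-smi banner is the highest CUDA version the driver supports. A rough sketch for extracting both from the text formats shown above (the helper names are hypothetical, and the patterns assume those formats do not change):

```shell
#!/bin/sh
# Extract the toolkit release (e.g. "11.4") from `nvcc --version` output.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Extract the "CUDA Version: X.Y" field from the nvidia-smi banner.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# On a real machine:
#   toolkit=$(nvcc --version | nvcc_release)
#   driver_max=$(nvidia-smi | smi_cuda_version)
#   echo "toolkit $toolkit, driver supports up to CUDA $driver_max"
```

The toolkit release should not be newer than the version the driver reports; here both are 11.4, which is what you would expect after installing the toolkit together with its bundled driver.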

Installing cuDNN

Installation Guide :: NVIDIA Deep Learning cuDNN Documentation

cuDNN Archive | NVIDIA Developer

  1. Navigate to your directory containing the cuDNN tar file.
  2. Unzip the cuDNN package.
    $ tar -xvf cudnn-linux-x86_64-8.x.x.x_cudaX.Y-archive.tar.xz
  3. Copy the following files into the CUDA toolkit directory.
    $ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include 
    $ sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64 
    $ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
    

 For my version, the steps were as follows:

$ tar -xzvf cudnn-11.4-linux-x64-v8.2.4.15.tgz

$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
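To confirm which cuDNN ended up in the toolkit directory, you can read the version macros from the headers. The sketch assumes cuDNN 8, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn_version.h, and the copy destination used above; cudnn_version is a hypothetical helper name.

```shell
#!/bin/sh
# cudnn_version [HEADER]
# Print MAJOR.MINOR.PATCHLEVEL read from a cuDNN 8 cudnn_version.h.
# HEADER defaults to the location the files were copied to above.
cudnn_version() {
    header=${1:-/usr/local/cuda/include/cudnn_version.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1   # macros not found in this file
            print major "." minor "." patch
        }
    ' "$header"
}

# On a real machine:
#   cudnn_version || echo "cudnn_version.h missing or unreadable"
```

If the file is missing, the most likely cause is that only cudnn.h was copied; cuDNN 8 splits its API across several cudnn*.h headers, and all of them need to be in the include directory.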

===================================================================

The most important part of all this is getting the driver installed correctly. I won't reproduce the referenced articles here.
