Base environment
root@server1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
root@server1:~# lspci | grep -i nvidia
34:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
35:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
36:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
37:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
9b:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
9c:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
9d:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
9e:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 80GB] (rev a1)
GPU driver installation
1. Preparation
1.1 Remove previously installed drivers
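Before reinstalling, remove any previously installed NVIDIA driver with sudo apt purge. Quote the pattern so the shell cannot expand it against file names in the current directory:
sudo apt purge 'nvidia*'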
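1.2 Disable the built-in nouveau driver
Before installing the NVIDIA driver, disable nouveau, the open-source display driver that ships with the system.
First check whether nouveau is loaded with lsmod | grep nouveau. Any output means the nouveau driver is active. No output means it is already disabled.
If it is loaded, disable it as follows:
Open the blacklist file with sudo vim /etc/modprobe.d/blacklist.conf, append these two lines at the end, and save: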
blacklist nouveau
options nouveau modeset=0
Run the following command to apply the change:
sudo update-initramfs -u # rebuild the initramfs so the blacklist applies at boot
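The two blacklist lines can also be added by a small script that is safe to re-run. This is a minimal sketch: the function name blacklist_nouveau is hypothetical, and the target file defaults to the /etc/modprobe.d/blacklist.conf used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the nouveau blacklist entries to a modprobe
# config file only if they are not already present, so re-running it
# never duplicates lines. Defaults to the file used in this guide.
blacklist_nouveau() {
  local conf="${1:-/etc/modprobe.d/blacklist.conf}"
  local line
  touch "$conf" || return 1
  # If the file does not end with a newline, add one so the new entry
  # is not glued onto the last existing line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  for line in "blacklist nouveau" "options nouveau modeset=0"; do
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
  done
}
```

Run it as root and then rebuild the initramfs, e.g. blacklist_nouveau && update-initramfs -u.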
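Reboot. Ubuntu 22.04's built-in nouveau driver is now disabled, and the NVIDIA driver can be installed.
If the machine hangs at a blinking cursor after the reboot, press ESC (or hold Shift on BIOS systems) during startup to open the GRUB menu, boot into recovery mode, and carry out the steps below from there.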
2. Install the GPU driver
2.1 Check which NVIDIA driver version the system recommends
root@server1:~# ubuntu-drivers devices
ERROR:root:aplay command not found
== /sys/devices/pci0000:30/0000:30:02.0/0000:31:00.0/0000:32:04.0/0000:37:00.0 ==
modalias : pci:v000010DEd000020B5sv000010DEsd00001533bc03sc02i00
vendor : NVIDIA Corporation
model : GA100 [A100 PCIe 80GB]
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-550-open - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-535-server-open - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-545-open - distro non-free
driver : nvidia-driver-550 - distro non-free recommended # nvidia-driver-550 is the recommended driver
driver : xserver-xorg-video-nouveau - distro free builtin
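When scripting the install, the recommended package can be read from this output instead of being copied by hand. A minimal sketch: the function name recommended_driver is hypothetical, and it assumes the "driver : <package> - ... recommended" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the first nvidia-driver-* package on a line marked "recommended".
recommended_driver() {
  awk '/recommended/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^nvidia-driver-/) { print $i; exit }
  }'
}
```

For example: pkg=$(ubuntu-drivers devices 2>/dev/null | recommended_driver); [ -n "$pkg" ] && sudo apt install "$pkg"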
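2.2 Install the recommended driver
- Using the ubuntu-drivers tool:
sudo ubuntu-drivers autoinstall
This command automatically installs the driver the system recommends.
- Manual installation:
sudo apt install nvidia-driver-550
The system must be rebooted after the driver is installed for it to take effect.
2.3 Check the NVIDIA driver and GPU information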
root@server1:~# nvidia-smi
Thu Dec 12 02:20:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe Off | 00000000:34:00.0 Off | 0 |
| N/A 34C P0 52W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe Off | 00000000:35:00.0 Off | 0 |
| N/A 37C P0 54W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100 80GB PCIe Off | 00000000:36:00.0 Off | 0 |
| N/A 36C P0 53W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100 80GB PCIe Off | 00000000:37:00.0 Off | 0 |
| N/A 36C P0 55W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100 80GB PCIe Off | 00000000:9B:00.0 Off | 0 |
| N/A 35C P0 51W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100 80GB PCIe Off | 00000000:9C:00.0 Off | 0 |
| N/A 36C P0 54W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100 80GB PCIe Off | 00000000:9D:00.0 Off | 0 |
| N/A 35C P0 51W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100 80GB PCIe Off | 00000000:9E:00.0 Off | 0 |
| N/A 34C P0 51W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
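With eight cards in this server, it is worth confirming that the driver sees every GPU that lspci lists. A sketch with hypothetical function names: it counts NVIDIA "3D controller" / "VGA compatible controller" lines from lspci and "GPU n:" lines from nvidia-smi -L, which prints one line per GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read command output on stdin.
# Count NVIDIA compute/display controllers in `lspci` output.
count_lspci_gpus() {
  grep -Ec '(3D|VGA compatible) controller: NVIDIA'
}
# Count top-level "GPU <n>:" lines in `nvidia-smi -L` output
# (indented MIG device lines are not counted).
count_smi_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}
```

For example: [ "$(lspci | count_lspci_gpus)" -eq "$(nvidia-smi -L | count_smi_gpus)" ] && echo "all GPUs visible to the driver"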
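Note: the CUDA Version shown in the top-right corner of the nvidia-smi output is the highest CUDA toolkit version the installed driver supports, not a toolkit that is actually installed.
Here the driver works with CUDA toolkits up to 12.4.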
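Because this field is only an upper bound, a script can read it and check that the toolkit it is about to install (12.4 here) is not newer. A minimal sketch: the function names are hypothetical, and it applies the simple "toolkit version not newer than the driver's CUDA Version" rule stated above rather than NVIDIA's finer-grained minor-version compatibility rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print the
# "CUDA Version" value from its banner (the driver's upper bound).
driver_max_cuda() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if toolkit version $1 is not newer than
# the driver-supported version $2 (version-aware sort, highest last).
toolkit_fits_driver() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}
```

For example: toolkit_fits_driver 12.4 "$(nvidia-smi | driver_max_cuda)" && echo "cuda-toolkit-12-4 is supported by this driver"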
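2.4 Install CUDA
Note: the driver is already installed, so only the cuda-toolkit-12-4 package is needed. Unlike the cuda metapackage, it does not pull in a driver of its own.
Set the environment variables and verify that CUDA is configured correctly: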
root@server1:~# echo 'export PATH=$PATH:/usr/local/cuda/bin' >> /etc/profile.d/cuda.sh
root@server1:~# echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> /etc/profile.d/cuda.sh
root@server1:~# echo 'export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64' >> /etc/profile.d/cuda.sh
root@server1:~# source /etc/profile.d/cuda.sh
root@server1:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
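The same three variables can be written in one step with a quoted heredoc (<<'EOF'), which keeps $PATH and the others unexpanded in the file so they are resolved at each login rather than frozen at the moment the file is written. A minimal sketch: the function name write_cuda_profile and the extra CUDA_HOME variable are assumptions, and /usr/local/cuda is assumed to point at the installed toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a profile script for the CUDA toolkit.
# The quoted 'EOF' delimiter stops the current shell from expanding
# $PATH etc., so they are evaluated when the file is sourced.
write_cuda_profile() {
  local target="${1:-/etc/profile.d/cuda.sh}"
  cat > "$target" <<'EOF'
export CUDA_HOME=/usr/local/cuda
export PATH="$PATH:$CUDA_HOME/bin"
# ${VAR:+$VAR:} avoids a leading ":" (an empty entry means the current directory)
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64"
export LIBRARY_PATH="${LIBRARY_PATH:+$LIBRARY_PATH:}$CUDA_HOME/lib64"
EOF
}
```

Run it as root with no argument to write /etc/profile.d/cuda.sh. New login shells pick it up, or source the file in the current shell before running nvcc -V.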
3. Uninstalling the NVIDIA driver
sudo apt-get purge 'nvidia-*'
sudo apt-get update
sudo apt-get autoremove
Rebooting afterwards is recommended to clear out anything left over from the uninstall.
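Before purging, it can help to see which NVIDIA packages dpkg actually knows about, including ones in the rc state (removed, configuration files left behind), which a purge also cleans up. A sketch with a hypothetical function name that filters dpkg -l output read on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print names of NVIDIA-related packages from
# `dpkg -l` output. Columns are: status, name, version, arch, description.
# "ii" = installed, "rc" = removed but config files remain.
nvidia_pkgs() {
  awk '($1 == "ii" || $1 == "rc") && $2 ~ /nvidia/ { print $2 }'
}
```

For example: dpkg -l | nvidia_pkgs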
Installing Anaconda
1. Download Anaconda
Download page: https://www.anaconda.com/download
2. Install Anaconda
root@server1:~# bash Anaconda3-2024.10-1-Linux-x86_64.sh
Welcome to Anaconda3 2024.10-1
In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>> # press ENTER
ANACONDA TERMS OF SERVICE
...... # long license text omitted; press q to stop viewing
Do you accept the license terms? [yes|no] # type yes to accept
>>> yes
Anaconda3 will now be installed into this location: # choose the install path; the directory must not already exist
/root/anaconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/root/anaconda3] >>> /anaconda
PREFIX=/anaconda
Unpacking payload ...
Installing base environment...
Downloading and Extracting Packages: # downloads and unpacks the packages
...... # output omitted
Downloading and Extracting Packages:
Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
run the following command when conda is activated:
conda config --set auto_activate_base false
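# Asks whether conda should initialize itself automatically. Answering "yes" activates the conda base environment every time a terminal is opened, so conda commands are available immediately.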
You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
no change /anaconda/condabin/conda
no change /anaconda/bin/conda
no change /anaconda/bin/conda-env
no change /anaconda/bin/activate
no change /anaconda/bin/deactivate
no change /anaconda/etc/profile.d/conda.sh
no change /anaconda/etc/fish/conf.d/conda.fish
no change /anaconda/shell/condabin/Conda.psm1
no change /anaconda/shell/condabin/conda-hook.ps1
no change /anaconda/lib/python3.12/site-packages/xontrib/conda.xsh
no change /anaconda/etc/profile.d/conda.csh
modified /root/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
Thank you for installing Anaconda3!
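For repeat installs on other servers, the interactive session above can be replaced by the installer's batch mode: -b runs without prompts (it assumes the license terms are accepted) and -p sets the install prefix. A minimal sketch: the function name install_anaconda is hypothetical, and it checks up front that the prefix does not exist, matching the note above that the directory must not be created beforehand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unattended Anaconda install.
#   $1 = installer script, e.g. Anaconda3-2024.10-1-Linux-x86_64.sh
#   $2 = install prefix (defaults to /anaconda, as chosen above)
install_anaconda() {
  local installer="$1" prefix="${2:-/anaconda}"
  if [ ! -f "$installer" ]; then
    echo "installer not found: $installer" >&2
    return 1
  fi
  # The prefix must not already exist.
  if [ -e "$prefix" ]; then
    echo "prefix already exists: $prefix" >&2
    return 1
  fi
  # -b: batch mode (no prompts), -p: install prefix
  bash "$installer" -b -p "$prefix"
}
```

Batch mode does not answer the "initialize conda?" question for you. To get the same ~/.bashrc changes as answering yes, run /anaconda/bin/conda init bash afterwards.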
3. Set environment variables
echo 'export PATH=$PATH:/anaconda/bin' >> /etc/profile.d/anaconda.sh && source /etc/profile.d/anaconda.sh
Using Anaconda
1. Create a virtual environment
conda create --name newenv python=3.10
Example:
root@server1:~# conda create --name DeepSeek-V2.5 python=3.10
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /anaconda/envs/DeepSeek-V2.5
added / updated specs:
- python=3.10
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2024.11.26 | h06a4308_0 131 KB
pip-24.2 | py310h06a4308_0 2.3 MB
python-3.10.16 | he870216_1 26.9 MB
setuptools-75.1.0 | py310h06a4308_0 1.7 MB
wheel-0.44.0 | py310h06a4308_0 109 KB
------------------------------------------------------------
Total: 31.1 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
ca-certificates pkgs/main/linux-64::ca-certificates-2024.11.26-h06a4308_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl pkgs/main/linux-64::openssl-3.0.15-h5eee18b_0
pip pkgs/main/linux-64::pip-24.2-py310h06a4308_0
python pkgs/main/linux-64::python-3.10.16-he870216_1
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-75.1.0-py310h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0
tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0
tzdata pkgs/main/noarch::tzdata-2024b-h04d1e81_0
wheel pkgs/main/linux-64::wheel-0.44.0-py310h06a4308_0
xz pkgs/main/linux-64::xz-5.4.6-h5eee18b_1
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1
Proceed ([y]/n)? y # confirm the installation
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate DeepSeek-V2.5
#
# To deactivate an active environment, use
#
# $ conda deactivate
2. List virtual environments
conda env list
Example:
root@server1:~# conda env list
# conda environments:
#
base /anaconda
DeepSeek-V2.5 /anaconda/envs/DeepSeek-V2.5
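In setup scripts, this listing can be used to create an environment only if it does not exist yet. A sketch with a hypothetical function name; it assumes the "<name> [*] <path>" layout shown above, where the first column is the environment name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if an environment named $1 appears in the first column.
conda_env_exists() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}
```

For example: conda env list | conda_env_exists DeepSeek-V2.5 || conda create -y --name DeepSeek-V2.5 python=3.10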
3. Activate a virtual environment
conda activate newenv
Example:
root@server1:~# conda activate DeepSeek-V2.5
(DeepSeek-V2.5) root@server1:~#
You may get the error CondaError: Run 'conda init' before 'conda activate':
root@server1:~# conda activate DeepSeek-V2.5
CondaError: Run 'conda init' before 'conda activate'
Fix:
source ~/.bashrc # load the conda setup added by conda init; this enters the (base) environment
conda deactivate # leave the (base) environment
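Sourcing .bashrc works because conda init added conda's shell setup there. In scripts, or in shells where conda init was not run, the same setup can be loaded directly from the /anaconda/etc/profile.d/conda.sh file the installer created, which makes conda activate available. A sketch; the function name conda_enable is hypothetical and /anaconda is the prefix chosen above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load conda's shell functions from an install
# prefix so that `conda activate` works in the current shell.
conda_enable() {
  local prefix="${1:-/anaconda}"
  local hook="$prefix/etc/profile.d/conda.sh"
  if [ ! -f "$hook" ]; then
    echo "conda hook not found: $hook" >&2
    return 1
  fi
  . "$hook"
}
```

For example, at the top of a job script: conda_enable && conda activate DeepSeek-V2.5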
4. Deactivate a virtual environment
conda deactivate
Example:
(DeepSeek-V2.5) root@server1:~# conda deactivate
root@server1:~#
5. Delete a virtual environment
conda remove --name newenv --all
Example:
root@server1:~# conda remove --name DeepSeek-V2.5 --all
Remove all packages in environment /anaconda/envs/DeepSeek-V2.5:
## Package Plan ##
environment location: /anaconda/envs/DeepSeek-V2.5
The following packages will be REMOVED:
_libgcc_mutex-0.1-main
_openmp_mutex-5.1-1_gnu
bzip2-1.0.8-h5eee18b_6
ca-certificates-2024.11.26-h06a4308_0
ld_impl_linux-64-2.40-h12ee557_0
libffi-3.4.4-h6a678d5_1
libgcc-ng-11.2.0-h1234567_1
libgomp-11.2.0-h1234567_1
libstdcxx-ng-11.2.0-h1234567_1
libuuid-1.41.5-h5eee18b_0
ncurses-6.4-h6a678d5_0
openssl-3.0.15-h5eee18b_0
pip-24.2-py310h06a4308_0
python-3.10.16-he870216_1
readline-8.2-h5eee18b_0
setuptools-75.1.0-py310h06a4308_0
sqlite-3.45.3-h5eee18b_0
tk-8.6.14-h39e8969_0
tzdata-2024b-h04d1e81_0
wheel-0.44.0-py310h06a4308_0
xz-5.4.6-h5eee18b_1
zlib-1.2.13-h5eee18b_1
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Everything found within the environment (/anaconda/envs/DeepSeek-V2.5), including any conda environment configurations and any non-conda files, will be deleted. Do you wish to continue?
# confirms deleting everything in the environment (/anaconda/envs/DeepSeek-V2.5)
(y/[n])? y
6. Show the default directories for virtual environments
conda config --show envs_dirs
Example:
root@server1:~# conda config --show envs_dirs
envs_dirs:
- /anaconda/envs
- /root/.conda/envs
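7. Change the default directory for virtual environments
conda config --add envs_dirs <new_directory_path>
The first path in the envs_dirs list is the directory Conda uses by default when it creates a new virtual environment.
To change the default, add a new directory: conda config --add puts it at the front of the list, which makes it the new default.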
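Since the first entry is the default, it can be read back from conda config --show envs_dirs to confirm that a change took effect. A sketch with a hypothetical function name, assuming the "- <path>" list layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `conda config --show envs_dirs` output on
# stdin and print the first "- <path>" entry, i.e. the default directory.
default_envs_dir() {
  awk '/^ *- / { sub(/^ *- /, ""); print; exit }'
}
```

For example, after conda config --add envs_dirs /data/conda-envs (a hypothetical path), conda config --show envs_dirs | default_envs_dir should print /data/conda-envs.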