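Problem description
On Linux, when a non-root user creates a conda virtual environment and runs pip install deepspeed inside it, the installation fails with the following error: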
(kg2rag) yuduoyi@4090:~/PycharmProjects$ pip install deepspeed
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8c/45/1ddcf8f8500d90fad9f3396bfb1295462d46a747ddcbfafcd7714d46827e/deepspeed-0.17.2.tar.gz (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 5.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
[2025-07-10 10:41:24,771] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /data/yuduoyi/.triton/autotune: No such file or directory
Warning: The cache directory for DeepSpeed Triton autotune, /data/yuduoyi/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
[2025-07-10 10:41:25,090] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
...
...
File "/tmp/pip-install-lnmi2acf/deepspeed_052f5df039484015b1e34a9fa83af1b9/op_builder/builder.py", line 51, in installed_cuda_version
raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
After installing nvidia-cuda-toolkit, a different error appears:
(kg2rag) yuduoyi@4090:~$ pip install deepspeed
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/8c/45/1ddcf8f8500d90fad9f3396bfb1295462d46a747ddcbfafcd7714d46827e/deepspeed-0.17.2.tar.gz (1.6 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
[2025-07-10 10:50:09,878] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The cache directory for DeepSpeed Triton autotune, /data/yuduoyi/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
[2025-07-10 10:50:10,502] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
...
...
FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
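Solution
- For the error `op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)`: following 是丝豆呀, install the CUDA compiler into the conda environment with `conda install -c nvidia cuda-compiler`. Afterwards, `nvcc -V` can be used to check whether the installation succeeded.
- For the error `FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'`: following 鳗小鱼's blog:
  - Run `which nvcc` to find the actual path of the compiler. Here the output is `/data/yuduoyi/anaconda3/envs/kg2rag/bin/nvcc`.
  - On Linux, point the environment variable at the root of that environment with `export CUDA_HOME=/data/yuduoyi/anaconda3/envs/kg2rag/`, then run `pip install deepspeed` again. The installation succeeds!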
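The second fix can be written so that the environment path is not hard-coded. The following is a minimal sketch, assuming nvcc sits at `<root>/bin/nvcc` as in the `which nvcc` output above; the helper name `cuda_home_from_nvcc` is made up for illustration and is not part of conda or deepspeed.

```shell
# Hypothetical helper: given the full path to nvcc (<root>/bin/nvcc),
# print <root>, which is the value CUDA_HOME should hold.
cuda_home_from_nvcc() {
    dirname "$(dirname "$1")"
}

# Look up nvcc on PATH; the result is empty if it is not installed.
nvcc_path="$(command -v nvcc || true)"

if [ -n "$nvcc_path" ]; then
    export CUDA_HOME="$(cuda_home_from_nvcc "$nvcc_path")"
    echo "CUDA_HOME=$CUDA_HOME"
else
    echo "nvcc not found on PATH; install it first, e.g. conda install -c nvidia cuda-compiler" >&2
fi
```

With the conda environment activated, run this in the current shell (or source it) and then run `pip install deepspeed` in the same shell. The `export` only lasts for that shell session, so it has to be repeated in each new one.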