
Installing DeepSpeed on Linux as a non-root user

Python LLM DeepSpeed

Problem description

On Linux, a non-root user who creates a conda virtual environment and runs pip install deepspeed inside it gets the following error:

(kg2rag) yuduoyi@4090:~/PycharmProjects$ pip install deepspeed
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8c/45/1ddcf8f8500d90fad9f3396bfb1295462d46a747ddcbfafcd7714d46827e/deepspeed-0.17.2.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 5.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
exit code: 1
  ╰─> [22 lines of output]
      [2025-07-10 10:41:24,771] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      df: /data/yuduoyi/.triton/autotune: No such file or directory
      Warning: The cache directory for DeepSpeed Triton autotune, /data/yuduoyi/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
      [2025-07-10 10:41:25,090] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      Traceback (most recent call last):
        ...
        ...
        File "/tmp/pip-install-lnmi2acf/deepspeed_052f5df039484015b1e34a9fa83af1b9/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

After installing nvidia-cuda-toolkit, a different error appears:

(kg2rag) yuduoyi@4090:~$ pip install deepspeed
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting deepspeed
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/8c/45/1ddcf8f8500d90fad9f3396bfb1295462d46a747ddcbfafcd7714d46827e/deepspeed-0.17.2.tar.gz (1.6 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
exit code: 1
  ╰─> [29 lines of output]
      [2025-07-10 10:50:09,878] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      Warning: The cache directory for DeepSpeed Triton autotune, /data/yuduoyi/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
      [2025-07-10 10:50:10,502] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      Traceback (most recent call last):
        ...
        ...
      FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
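
The path in this FileNotFoundError starts with a colon. That suggests (an inference from the log, not something the output states) that the CUDA_HOME environment variable held the value :/usr/local/cuda, as can happen when a PATH-style export line is used for it, and that /bin/nvcc was then appended to that value. Below is a minimal Python sketch for checking a CUDA_HOME value before running pip. check_cuda_home is a hypothetical helper written for this post, not part of DeepSpeed or PyTorch.

```python
import os
from pathlib import Path


def check_cuda_home(value):
    """Return a list of problems found with a CUDA_HOME value (empty if none)."""
    problems = []
    if not value:
        problems.append("CUDA_HOME is unset or empty")
        return problems
    # ":" is the path-list separator on Linux; CUDA_HOME must be one directory.
    if ":" in value:
        problems.append(
            "CUDA_HOME contains a path separator; it must be a single "
            "directory, not a PATH-style list"
        )
    # The failing call in the log looked for nvcc under <CUDA_HOME>/bin.
    if not (Path(value) / "bin" / "nvcc").is_file():
        problems.append(f"no nvcc found at {value}/bin/nvcc")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_home(os.environ.get("CUDA_HOME")):
        print(problem)
```

If the script prints nothing, the value at least points at a directory that contains bin/nvcc.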

Solution

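  1. For the op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s) error, follow 是丝豆呀's post and install the CUDA compiler into the conda environment with conda install -c nvidia cuda-compiler. Afterwards, run nvcc -V to check that the installation succeeded.
  2. For FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc', follow 鳗小鱼's blog post:
    • Run which nvcc to find the actual path. Here it prints /data/yuduoyi/anaconda3/envs/kg2rag/bin/nvcc
    • Set the environment variable with export CUDA_HOME=/data/yuduoyi/anaconda3/envs/kg2rag/ (the directory that contains bin/nvcc), then run pip install deepspeed again. The installation now succeeds!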
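
The two sub-steps above amount to "locate nvcc, then go up two directories". As a rough sketch, assuming nvcc is already on PATH inside the activated conda environment, the same derivation can be done in Python. cuda_home_from_nvcc is a hypothetical helper name, and the sketch does not resolve symlinks.

```python
import shutil
import sys
from pathlib import Path


def cuda_home_from_nvcc(nvcc_path):
    """Given a path like <root>/bin/nvcc, return <root> as a string."""
    return str(Path(nvcc_path).parent.parent)


if __name__ == "__main__":
    # shutil.which performs the same PATH lookup as the `which nvcc` command.
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH; install the compiler first", file=sys.stderr)
    else:
        # Print the export line so it can be pasted into the shell.
        print(f"export CUDA_HOME={cuda_home_from_nvcc(nvcc)}")
```

For the path shown above, cuda_home_from_nvcc("/data/yuduoyi/anaconda3/envs/kg2rag/bin/nvcc") returns /data/yuduoyi/anaconda3/envs/kg2rag. Deriving the value from the nvcc that is actually on PATH avoids hard-coding /usr/local/cuda, which a non-root user typically cannot install into.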