Detected non-NVML platform: could not load NVML: libnvidia-ml.so.1: cannot open shared object

Published: 2023-10-18 21:44:26  Author: 牛奔

Preface

When setting up https://github.com/NVIDIA/k8s-device-plugin in Kubernetes,
the following error is reported: Detected non-NVML platform: could not load NVML: libnvidia-ml.so.1: cannot open shared object

Solution

When Kubernetes uses docker as its container runtime, edit the configuration file /etc/docker/daemon.json (it usually already exists) to set nvidia-container-runtime as the default low-level runtime:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
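
Because /etc/docker/daemon.json usually already exists, it may hold other settings that should not be overwritten. The following is a minimal Python sketch of merging the runtime entry above into an existing configuration. The helper name with_nvidia_default and the "log-driver" example value are assumptions for illustration, not part of the original fix.

```python
import copy
import json

# Runtime entry copied from the daemon.json shown above.
NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def with_nvidia_default(config: dict) -> dict:
    """Return a copy of a daemon.json config with nvidia as default runtime.

    Hypothetical helper: all existing keys and runtimes are preserved,
    and the input dict is not modified.
    """
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    merged["default-runtime"] = "nvidia"
    return merged


# Example: an existing config that already carries an unrelated setting.
existing = {"log-driver": "json-file"}
print(json.dumps(with_nvidia_default(existing), indent=4))
```

In practice the input would be loaded from /etc/docker/daemon.json with json.load and the result written back with root privileges. Keeping a backup of the original file first is a sensible precaution.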

Then restart docker:

$ sudo systemctl restart docker
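
After the restart, it can help to confirm that docker actually picked up the new default runtime. The sketch below assumes the JSON form of the daemon information (as printed by docker info --format '{{json .}}') is passed in as a string, and checks its DefaultRuntime and Runtimes fields. The function name and the sample string are illustrative, and the sample is hand-written rather than captured output.

```python
import json


def default_runtime_is_nvidia(docker_info_json: str) -> bool:
    """Check that nvidia is both registered and the default runtime.

    Hypothetical helper: expects the JSON text produced by
    `docker info --format '{{json .}}'`.
    """
    info = json.loads(docker_info_json)
    runtimes = info.get("Runtimes") or {}
    return info.get("DefaultRuntime") == "nvidia" and "nvidia" in runtimes


# Hand-written, trimmed sample for illustration (not real captured output).
sample = json.dumps({
    "DefaultRuntime": "nvidia",
    "Runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime"},
        "runc": {"path": "runc"},
    },
})
print(default_runtime_is_nvidia(sample))
```

If the check returns False, the daemon.json edit was probably not applied, for example because of a JSON syntax error or because docker was not restarted.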