CentOS TensorFlow GPU Docker

Posted: 2023-10-09 22:46:50  Author: emanlee

 

Check whether nvidia-docker is installed:

locate nvidia-docker
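locate only searches the file-name database built by updatedb, so it can miss a package installed since the last database update, or list files that no longer exist. A minimal alternative sketch asks the shell directly through the POSIX command -v builtin; check_cmd is a hypothetical helper name, not part of nvidia-docker:

```shell
#!/bin/sh
# check_cmd NAME (hypothetical helper): print what NAME resolves to
# (normally its full path) and return 0; if NAME is not on PATH,
# print a message to stderr and return 1.
check_cmd() {
  if cmd_path=$(command -v "$1" 2>/dev/null); then
    printf '%s\n' "$cmd_path"
  else
    printf '%s: not found on PATH\n' "$1" >&2
    return 1
  fi
}

# Usage on the GPU host:
#   check_cmd nvidia-docker && check_cmd docker
```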

 

Download the TensorFlow Docker images

You can use several variants at once. For example, the following commands download TensorFlow release images to your machine:

docker pull tensorflow/tensorflow                     # latest stable release
docker pull tensorflow/tensorflow:devel-gpu           # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter  # latest release w/ GPU support and Jupyter
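The tags above combine a base tag (latest, nightly, devel or a version number) with optional -gpu and -jupyter variants. A sketch that assembles an image name from those pieces, assuming the gpu-before-jupyter order seen in latest-gpu-jupyter; tf_image is a hypothetical helper, and it does not check whether the resulting tag actually exists on Docker Hub:

```shell
#!/bin/sh
# tf_image BASE_TAG [gpu] [jupyter] (hypothetical helper)
# Prints "tensorflow/tensorflow:<base>[-gpu][-jupyter]".
# The variant order is fixed (gpu before jupyter), whatever order the
# arguments are given in. It does not verify that the tag exists.
tf_image() {
  if [ $# -lt 1 ]; then
    printf 'usage: tf_image BASE_TAG [gpu] [jupyter]\n' >&2
    return 1
  fi
  base=$1
  shift
  gpu=""
  jupyter=""
  for variant in "$@"; do
    case $variant in
      gpu) gpu="-gpu" ;;
      jupyter) jupyter="-jupyter" ;;
      *) printf 'unknown variant: %s\n' "$variant" >&2; return 1 ;;
    esac
  done
  printf 'tensorflow/tensorflow:%s%s%s\n' "$base" "$gpu" "$jupyter"
}

# Usage:
#   docker pull "$(tf_image latest gpu jupyter)"
```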

 

 

If you get the following error, the related service has not been started:
[root@ourui]# nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu
docker: Error response from daemon: create nvidia_driver_367.48: create nvidia_driver_367.48: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.

Running nvidia-docker itself may instead fail with a permission error:

#  nvidia-docker
/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Permission denied
/usr/bin/nvidia-docker: line 34: /usr/bin/docker: Success

 

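Solution:

Turn off SELinux:

setenforce 0

Cause: nvidia-docker is a wrapper around docker, which is installed at /usr/bin/docker and run under the root account. SELinux, the security mechanism built into the Linux system, strictly controls which programs may invoke other system programs; even under root, it does not allow one system program to call another here. Because nvidia-docker run, nvidia-docker images and similar commands call docker, the system reports a permission error.

setenforce 0 only switches SELinux to permissive mode until the next reboot; after rebooting, enforcement is turned back on.

To turn enforcement back on without rebooting, run setenforce 1.

To check the current SELinux mode, run getenforce.

To disable SELinux permanently, edit /etc/selinux/config and set:

SELINUX=disabled

Save the file, then reboot:

reboot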
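The permanent change can also be scripted. A minimal sketch, assuming the stock /etc/selinux/config layout with a single SELINUX= line; the helper names are hypothetical, a .bak copy is kept before editing, and the new mode still only takes effect after a reboot:

```shell
#!/bin/sh
# selinux_config_mode FILE (hypothetical helper): print the value of the
# SELINUX= line (enforcing, permissive or disabled).
# SELINUXTYPE= lines do not match, because "=" must follow "SELINUX" directly.
selinux_config_mode() {
  sed -n 's/^SELINUX=//p' "$1"
}

# disable_selinux_in_config FILE (hypothetical helper): copy FILE to
# FILE.bak, then rewrite the SELINUX= line as SELINUX=disabled.
# Plain sed plus redirection avoids relying on GNU-only "sed -i".
disable_selinux_in_config() {
  cp "$1" "$1.bak" || return 1
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$1.bak" > "$1"
}

# Usage (as root), followed by a reboot:
#   disable_selinux_in_config /etc/selinux/config
#   selinux_config_mode /etc/selinux/config
```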



Use the following commands to check whether the nvidia-docker service is running, and start it if it is not:
[root@ourui]# systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/usr/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
[root@ourui]# systemctl start nvidia-docker
[root@ourui]# systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/usr/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-03-27 10:39:16 CST; 2s ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 51649 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 51644 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 51643 (nvidia-docker-p)
   Memory: 13.9M
   CGroup: /system.slice/nvidia-docker.service
           └─51643 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Mar 27 10:39:16 ctum2e1302005.idc.wanda-group.net systemd[1]: Starting NVIDIA Docker plugin...
Mar 27 10:39:16 ctum2e1302005.idc.wanda-group.net systemd[1]: Started NVIDIA Docker plugin.

This step completes the basic nvidia-docker environment. Note that NVIDIA does not provide packages for the most recently released Docker versions; if you need to test the latest Docker release, you have to use a different method.

If systemctl status nvidia-docker instead reports "Unit nvidia-docker.service could not be found.", no such unit is installed: the nvidia-docker-plugin service shown above belongs to nvidia-docker 1.0, and nvidia-docker 2 no longer uses it.
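On hosts that have the nvidia-docker.service unit, the check-then-start sequence above can be wrapped in a function. A sketch assuming systemd; ensure_service_running is a hypothetical name, and the SYSTEMCTL variable exists only so the function can be exercised with a stub instead of the real systemctl. systemctl is-active --quiet exits with status 0 only when the unit is active:

```shell
#!/bin/sh
# ensure_service_running UNIT (hypothetical helper): start UNIT unless it
# is already active. SYSTEMCTL may name a replacement command (for example
# a stub in tests); it defaults to the real systemctl.
ensure_service_running() {
  ctl=${SYSTEMCTL:-systemctl}
  if "$ctl" is-active --quiet "$1"; then
    printf '%s already running\n' "$1"
  elif "$ctl" start "$1"; then
    printf '%s started\n' "$1"
  else
    printf 'failed to start %s\n' "$1" >&2
    return 1
  fi
}

# Usage (as root, nvidia-docker 1.0 hosts):
#   ensure_service_running nvidia-docker
```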

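Download the Docker images
The TensorFlow community provides a set of images on Docker Hub:
https://hub.docker.com/r/tensorflow/tensorflow/

Many Docker Hub mirrors are available in China and can be used directly, and some providers also offer registry accelerators. The following uses the Aliyun accelerator:
https://yq.aliyun.com/articles/29941
Once it is configured, you can pull the Docker images directly:
nvidia-docker pull tensorflow/tensorflow:latest-gpu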
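The accelerator works through Docker's registry-mirrors setting in /etc/docker/daemon.json. A minimal sketch that writes that file; the helper name is hypothetical, the mirror address is a placeholder (Aliyun shows a per-account address in its console), and the function refuses to overwrite an existing daemon.json, because on nvidia-docker 2 hosts that file may already register the nvidia runtime and must then be merged by hand:

```shell
#!/bin/sh
# write_registry_mirror FILE MIRROR_URL (hypothetical helper): create a
# daemon.json whose only setting is registry-mirrors. Refuses to touch an
# existing FILE so that other settings (e.g. runtimes) are not lost.
write_registry_mirror() {
  if [ -e "$1" ]; then
    printf '%s already exists; add "registry-mirrors" to it by hand\n' "$1" >&2
    return 1
  fi
  printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$2" > "$1"
}

# Usage (as root); <your-id> is a placeholder for your own accelerator address:
#   write_registry_mirror /etc/docker/daemon.json "https://<your-id>.mirror.aliyuncs.com"
#   systemctl daemon-reload
#   systemctl restart docker
```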


Start the container

[root@ourui]# nvidia-docker run -it -d -p  9999:9999 tensorflow/tensorflow:latest-gpu
69fede4460082f3e4aa847fc34ac0f58e797dc44b10d65643a70d2a1e7e4ba03
[root@ourui]# nvidia-docker logs 69fede4460082f3e4aa847fc34ac0f58e797dc44b10d65643a70d2a1e7e4ba03
[I 02:45:08.016 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W 02:45:08.031 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 02:45:08.037 NotebookApp] Serving notebooks from local directory: /notebooks
[I 02:45:08.037 NotebookApp] 0 active kernels
[I 02:45:08.037 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb
[I 02:45:08.038 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 02:45:08.038 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb
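Instead of copying the token from the log by eye, it can be extracted with standard text tools. A sketch that reads log lines on standard input and prints the first token= value, assuming the classic Notebook log format shown above with a hexadecimal token; jupyter_token is a hypothetical helper name:

```shell
#!/bin/sh
# jupyter_token (hypothetical helper): read Jupyter Notebook log lines on
# stdin and print the value of the first "token=<hex>" occurrence.
# Prints nothing if no token is found.
jupyter_token() {
  grep -o 'token=[0-9a-f]*' | head -n 1 | cut -d= -f2
}

# Usage with the container ID printed by "run -d"; 2>&1 also captures the
# log if the container writes it to stderr:
#   nvidia-docker logs <container-id> 2>&1 | jupyter_token
```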

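Test
Open in a browser:
http://ip:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb

Jupyter listens on port 8888 inside the container, while the run command above publishes host port 9999 to container port 9999, where nothing is listening. For this URL to work, start the container with -p 8888:8888 instead (or use -p 9999:8888 and change the port in the URL to 9999).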


Source: https://blog.csdn.net/xiaomin1991222/article/details/84908877

 

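[root@aaa]# firewall-cmd --query-port=9999/tcp  # check whether port 9999 is open
no
[root@aaa]# firewall-cmd --zone=public --add-port=9999/tcp --permanent  # open 9999/tcp permanently in the public zone
success
[root@aaa]# firewall-cmd --reload  # reload the firewall so the permanent rule takes effect
success
[root@aaa]# firewall-cmd --query-port=9999/tcp  # check again
yes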
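The firewall rule has to open the host side of the -p HOST:CONTAINER mapping, while the container side must be the port Jupyter listens on inside the container, which the logs above show as 8888. A sketch of hypothetical helpers that split a mapping and flag a container port other than 8888; it assumes the HOST:CONTAINER and IP:HOST:CONTAINER forms, without a /tcp suffix or an IPv6 address:

```shell
#!/bin/sh
# container_port MAPPING (hypothetical): "9999:8888" -> 8888
container_port() {
  printf '%s\n' "${1##*:}"
}

# host_port MAPPING (hypothetical): "9999:8888" -> 9999,
# "127.0.0.1:9999:8888" -> 9999
host_port() {
  rest=${1%:*}
  printf '%s\n' "${rest##*:}"
}

# check_jupyter_mapping MAPPING (hypothetical): fail if the container side
# is not 8888, otherwise print the host port that firewalld must open.
check_jupyter_mapping() {
  if [ "$(container_port "$1")" != 8888 ]; then
    printf 'warning: Jupyter listens on 8888 in the container, not %s\n' \
      "$(container_port "$1")" >&2
    return 1
  fi
  printf 'open %s/tcp in firewalld\n' "$(host_port "$1")"
}

# Usage:
#   check_jupyter_mapping 9999:8888
#   check_jupyter_mapping 9999:9999   # prints a warning and returns 1
```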

 

[root@]# nvidia-docker run -it -d -p  9999:9999 tensorflow/tensorflow:latest-gpu
b953cdfddbff648292f99eeb114bc0fd4315aa6e1a3442ad71312db51ca01bef
(base) [root@]# nvidia-docker logs b953cdfddbff648292f99eeb114bc0fd4315aa6e1a3442ad71312db51ca01bef
[I 14:17:37.662 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:17:37.763 NotebookApp] Serving notebooks from local directory: /notebooks
[I 14:17:37.764 NotebookApp] The Jupyter Notebook is running at:
[I 14:17:37.764 NotebookApp] http://(b953cdfddbff or 127.0.0.1):8888/?token=21d2cd061a147b15df933e7517c71c77f0ca2efaaee60014
[I 14:17:37.764 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:17:37.765 NotebookApp]
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://(b953cdfddbff or 127.0.0.1):8888/?token=21d2cd061a147b15df933e7517c71c77f0ca2efaaee60014

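Check the nvidia-docker version. nvidia-docker passes the command through to docker, so this prints the version of the underlying Docker:

# nvidia-docker -v
Docker version 19.03.12, build 48a66213fe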
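Docker 19.03 and later can also expose GPUs directly with docker run --gpus all when the NVIDIA Container Toolkit is installed, which makes the nvidia-docker wrapper optional. A sketch that parses a "Docker version X.Y..." line and reports whether it is at least 19.03; supports_gpus_flag is a hypothetical helper, and it checks only the version, not whether the toolkit is present:

```shell
#!/bin/sh
# supports_gpus_flag VERSION_LINE (hypothetical helper): return 0 if a
# "Docker version X.Y..." line reports Docker 19.03 or newer, 1 otherwise
# (including when the line cannot be parsed). It does not check for the
# NVIDIA Container Toolkit.
supports_gpus_flag() {
  ver=$(printf '%s\n' "$1" |
    sed -n 's/^Docker version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  [ -n "$ver" ] || return 1
  major=${ver% *}
  minor=${ver#* }
  # test(1) compares integers in decimal, so "03" is read as 3.
  [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }
}

# Usage:
#   supports_gpus_flag "$(docker -v)" && echo "docker run --gpus all is available"
```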

 

After the Docker container starts, you can view the Jupyter page by visiting:
http://${HOST_IP}:8888/

[Screenshot: Jupyter home page (jupyter_home.png)]

 

# docker run -it --rm tensorflow/tensorflow:latest-devel-py3 python -c "import tensorflow as tf;"
Unable to find image 'tensorflow/tensorflow:latest-devel-py3' locally
latest-devel-py3: Pulling from tensorflow/tensorflow
5bed26d33875: Pull complete
f11b29a9c730: Pull complete
930bda195c84: Pull complete
78bf9a5ad49e: Pull complete
4031529457c8: Pull complete
967a60cbd045: Pull complete
12e364189879: Pull complete
b52627e00e36: Pull complete
987834e14f2a: Pull complete
2ab521fd4d38: Pull complete
68db855b5bf6: Pull complete
731ea1aef3b5: Pull complete
91e84ec4856c: Pull complete
3f8b6feb1b39: Pull complete
bf92243594c8: Pull complete
Digest: sha256:3d308272e6045de92423908e47776317a11d05ae41d7dfd399d7cca59adc8910
Status: Downloaded newer image for tensorflow/tensorflow:latest-devel-py3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
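ModuleNotFoundError: No module named 'tensorflow'

The devel images are TensorFlow development (build) environments that include the source code; as this output shows, the tensorflow Python package itself is not installed in latest-devel-py3. To run TensorFlow directly, use a non-devel image such as tensorflow/tensorflow:latest-gpu.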

 

 

References

http://t.zoukankan.com/zhouxiaosong-p-11098695.html

https://blog.csdn.net/xiaomin1991222/article/details/84908877

https://tensorflow.google.cn/install/docker?hl=zh-cn

https://www.cnblogs.com/yangyuxia/p/14693069.html

https://github.com/NVIDIA/nvidia-docker

https://www.codenong.com/5a180f61647750eb8d70/