Installing tensorflow-gpu 1.9 on Ubuntu 18.04

This guide uses Ubuntu 18.04 and installs tensorflow-gpu 1.9 together with CUDA 9.0 and the matching cuDNN version.

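Before you install

1. The machine must have an Nvidia graphics card that is not too old (a weak card is not worth the electricity); card support can be looked up on the Nvidia website. Every installation command below needs root privileges, so either switch with su root or prefix each command with sudo. Compiling and running the test code works fine as a regular user.

2. Install exactly the versions listed on the tensorflow website: tensorflow 1.9 pairs with CUDA 9.0, and CUDA 9.0 needs the cuDNN build made for it.

3. If you like to experiment, use a disk that holds no important data.

4. Preferably download the installation packages on another computer and copy them to the target machine with scp. I reinstalled Ubuntu several times, and a single download of the packages is about 2 GB.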
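The version pairing in item 2 can be written down as a small lookup, which makes it easy to check a version string before installing anything. This is only a sketch: the table name `REQUIRED` and both helper functions are made up for illustration, and the table holds just the one pairing this guide states (tensorflow 1.9 with CUDA 9.0 and cuDNN 7).

```python
# Hypothetical lookup table; only the pairing stated in this guide is filled in.
REQUIRED = {
    "1.9": {"cuda": "9.0", "cudnn_major": "7"},
}


def required_versions(tf_version):
    """Return the CUDA/cuDNN pairing for a tensorflow version, or None if unknown."""
    # Compare on major.minor only, so "1.9.0" and "1.9" find the same entry.
    key = ".".join(tf_version.split(".")[:2])
    return REQUIRED.get(key)


def is_compatible(tf_version, cuda_version):
    """True only when the table knows the tensorflow version and CUDA matches it."""
    wanted = required_versions(tf_version)
    return wanted is not None and cuda_version.startswith(wanted["cuda"])
```

For example, `is_compatible("1.9.0", "9.0.176")` accepts the toolkit used in this guide, while a CUDA 9.2 toolkit is rejected.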

Download the main installation files

1. CUDA toolkit

http://nvidia.com/cuda

# I chose the run file for Ubuntu 16.04; other variants are not guaranteed to work

cuda_9.0.176_384.81_linux.run

2. cuDNN, the deep neural network (DNN) development library; downloading it requires registering on the website

http://developer.nvidia.com/cudnn

libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb

libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb

libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
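The cuDNN package names above already encode which CUDA release they were built for, so a mismatched download can be caught before installing. The following is a rough sketch that assumes the `name_version-revision+cudaX.Y_arch.deb` naming seen in the three files listed here; the pattern `DEB_NAME` and the function `parse_cudnn_deb` are illustrative names, not part of any Nvidia tool.

```python
import re

# Pattern for names like libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
DEB_NAME = re.compile(
    r"^(?P<package>libcudnn\d+(?:-dev|-doc)?)_"
    r"(?P<cudnn>\d+(?:\.\d+)*)-\d+\+cuda(?P<cuda>\d+\.\d+)_(?P<arch>\w+)\.deb$"
)


def parse_cudnn_deb(filename):
    """Split a cuDNN .deb file name into package, cuDNN version, CUDA version and arch."""
    match = DEB_NAME.match(filename)
    if match is None:
        raise ValueError("unexpected cuDNN package name: %s" % filename)
    return match.groupdict()
```

Checking that `parse_cudnn_deb(name)["cuda"]` equals `"9.0"` for each downloaded file confirms the packages belong to the CUDA release used in this guide.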

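Prepare the environment

1. Check the driver version bundled with CUDA, 384.81 here. If the installed driver is older than this, uninstall it first; if it is the same or newer (>=), skip this step.

# Uninstalling with the run file is recommended, i.e. the Nvidia driver run file you downloaded earlier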

chmod +x *.run

./NVIDIA-Linux-x86_64-384.59.run --uninstall

# This method is not recommended; I do not know why, and I have not tried it

apt-get remove --purge 'nvidia*'

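2. Disable the built-in nouveau driver. If you have already installed an Nvidia driver before, this step can be skipped as well.

Run: vi /etc/modprobe.d/blacklist.conf

# Add two lines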

blacklist nouveau

options nouveau modeset=0

# Apply the configuration

update-initramfs -u

# Reboot; the screen resolution drops afterwards, since no graphics driver is loaded any more

reboot

# Check that it took effect

lsmod | grep nouveau

# If nothing is printed, nouveau has been disabled successfully

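3. Install the required build tools; otherwise the bundled graphics driver will not install

apt install gcc g++ make

4. CUDA 9.0 requires downgrading GCC to gcc 5, something I only found out while installing CUDA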

apt install gcc-5 g++-5

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50

update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
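The preparation steps above each end in a condition that is easy to test by machine: the driver is at least 384.81, nouveau is gone, and the default gcc is version 5. Here is a minimal sketch of those three checks as pure functions over text. The function names are invented for this example, and it assumes the inputs are an installed driver version string, the output of `lsmod`, and the output of `gcc -dumpversion`.

```python
def version_tuple(text):
    """Turn '384.81' or '5.5.0' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed, bundled="384.81"):
    """True when the installed driver is the bundled version or newer (step 1)."""
    return version_tuple(installed) >= version_tuple(bundled)


def nouveau_loaded(lsmod_output):
    """True if any line of lsmod output has nouveau as its module name (step 2)."""
    return any(line.split()[:1] == ["nouveau"] for line in lsmod_output.splitlines())


def gcc_major(dumpversion_output):
    """Major version from `gcc -dumpversion` output such as '5.5.0' or '7' (step 4)."""
    return version_tuple(dumpversion_output)[0]
```

After step 4, `gcc_major` should return 5 for the default compiler, and `nouveau_loaded` should return False once the reboot in step 2 has happened.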

Install the CUDA toolkit

1. Be sure to install the CUDA version that matches your tensorflow version; tensorflow 1.9 pairs with CUDA 9.0. This is important.

chmod +x cuda_9.0.176_384.81_linux.run

sh ./cuda_9.0.176_384.81_linux.run

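# The license text is shown first; read it if you want, or press q once you have read enough or cannot follow the terms

1]. If the installer reports a failure, check the log it points to and troubleshoot from there

2]. Log of a successful installation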

Do you accept the previously read EULA?

accept/decline/quit: accept

You are attempting to install on an unsupported configuration. Do you wish to continue?

(y)es/(n)o [ default is no ]: y

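# 384.81 here is the graphics driver version; if the driver already installed on this machine is newer, it does not need to be installed

# I chose no mainly because I had installed CUDA 9.2 earlier while working around a problem

# Normally the answer should be yes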

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?

(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?

(y)es/(n)o/(q)uit: y

Enter Toolkit Location

[ default is /usr/local/cuda-9.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?

(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?

(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location

[ default is /root ]:

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...

Missing recommended library: libGLU.so

Missing recommended library: libX11.so

Missing recommended library: libXi.so

Missing recommended library: libXmu.so

Missing recommended library: libGL.so

Installing the CUDA Samples in /root ...

Copying samples to /root/NVIDIA_CUDA-9.0_Samples now...

Finished copying samples.

===========

= Summary =

===========

Driver: Not Selected

Toolkit: Installed in /usr/local/cuda-9.0

Samples: Installed in /root, but missing recommended libraries

Please make sure that

- PATH includes /usr/local/cuda-9.0/bin

- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.

To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:

sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_7657.log

/root/NVIDIA_CUDA-9.0_Samples
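The log above reports several "Missing recommended library" lines, which are easy to overlook among the rest of the output. As a small illustration, the helper below pulls those library names out of saved installer output; `missing_libraries` is a made-up name, and it assumes the lines look exactly like the ones in this log.

```python
def missing_libraries(log_text):
    """Collect the names from 'Missing recommended library: X' lines, in order."""
    prefix = "Missing recommended library:"
    names = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            names.append(line[len(prefix):].strip())
    return names
```

Run against the log above, it would list libGLU.so, libX11.so, libXi.so, libXmu.so and libGL.so, the libraries the samples want.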

2. Set environment variables

Run: vi /etc/ld.so.conf.d/cuda.conf

# Write these two lines

/usr/local/cuda/lib64

/usr/local/cuda/extras/CUPTI/lib64

Run: vi /etc/profile

# Add two lines

export CUDA_HOME=/usr/local/cuda

export PATH=$PATH:$CUDA_HOME/bin

3. Run ldconfig so the new library paths are picked up, then reboot with the reboot command.
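After the reboot it is worth confirming that the CUDA bin directory really ended up on PATH, as the installer summary asks. A minimal sketch, assuming a Linux-style colon-separated PATH and the /usr/local/cuda symlink created during installation; `path_contains` is a hypothetical helper name.

```python
import os


def path_contains(path_value, directory):
    """True if directory is one of the colon-separated entries in path_value."""
    wanted = os.path.normpath(directory)
    # normpath removes trailing slashes, so "/x/bin/" and "/x/bin" compare equal.
    entries = [os.path.normpath(entry) for entry in path_value.split(":") if entry]
    return wanted in entries
```

In a fresh login shell, `path_contains(os.environ.get("PATH", ""), "/usr/local/cuda/bin")` should be True; `shutil.which("nvcc")` from the standard library is another quick way to see whether the compiler can be found.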

Test the installation

If nothing reports an error, the installation succeeded

cd /root/NVIDIA_CUDA-9.0_Samples/samples/1_Utilities/deviceQuery

make

./deviceQuery

# Result = PASS means success

cd ../bandwidthTest

make

./bandwidthTest

# Result = PASS means success
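Both samples signal success with a "Result = PASS" line, so the check can be scripted instead of read off the screen. The sketch below is one way to do that; `sample_passed` and `run_sample` are illustrative names, and the binary path passed to `run_sample` is whatever `make` produced in the sample directory.

```python
import subprocess


def sample_passed(output):
    """True if any line of the sample's output contains 'Result = PASS'."""
    return any("Result = PASS" in line for line in output.splitlines())


def run_sample(binary_path):
    """Run a compiled CUDA sample and report whether it printed Result = PASS."""
    completed = subprocess.run(
        [binary_path], stdout=subprocess.PIPE, universal_newlines=True
    )
    return sample_passed(completed.stdout)
```

For instance, `run_sample("./deviceQuery")` from inside the deviceQuery directory returns True only when the sample reports a pass.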

Install cuDNN

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks.

#cuDNN v7.1.4 Runtime Library for Ubuntu16.04 (Deb)

dpkg -i libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb

#cuDNN v7.1.4 Developer Library for Ubuntu16.04 (Deb)

dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb

#cuDNN v7.1.4 Code Samples and User Guide for Ubuntu16.04 (Deb)

dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb

# Hold the versions so automatic updates do not break the environment

apt-mark hold libcudnn7 libcudnn7-dev

Test

#Copy the cuDNN sample to a writable path.

$ cp -r /usr/src/cudnn_samples_v7/ $HOME

#Go to the writable path.

$ cd $HOME/cudnn_samples_v7/mnistCUDNN

#Compile the mnistCUDNN sample.

$ make clean && make

#Run the mnistCUDNN sample.

$ ./mnistCUDNN

#If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:

#Test passed!
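Besides running the sample, the installed cuDNN version can be read from the version macros in the cuDNN header. This is a sketch under assumptions: it expects the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines that cuDNN 7 headers carry, and the header location (commonly /usr/include/cudnn.h after the deb install) is a guess that may differ on your system. `cudnn_version` is a made-up helper name.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)
```

Reading the header with `open(path).read()` and passing the text in should give `(7, 1, 4)` for the packages installed in this guide.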

Install tensorflow-gpu, using python3 as the example

sudo apt-get install python3-pip python3-dev

pip3 install tensorflow-gpu==1.9.0

Test the installation

# Test code; save it as, for example, test.py

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')

sess = tf.Session()

print(sess.run(hello))

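# Run: python3 test.py

# The first run is a bit slow

# No errors, the graphics card information is printed, and b'Hello, TensorFlow!' appears: the installation works.

That is the end of this article; now back to learning Tensorflow.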
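The hello-world script only shows the graphics card in tensorflow's startup log. To ask tensorflow directly which GPUs it sees, something along these lines may help. It is a sketch: `gpu_names` and `report_gpus` are names invented here, and it assumes the TensorFlow 1.x `device_lib.list_local_devices()` call, whose entries carry `name` and `device_type` fields.

```python
def gpu_names(devices):
    """Pick the names of all entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def report_gpus():
    """Print and return the GPUs tensorflow can see (requires tensorflow installed)."""
    # Imported here so gpu_names stays usable without tensorflow.
    from tensorflow.python.client import device_lib

    names = gpu_names(device_lib.list_local_devices())
    print(names if names else "No GPU visible to tensorflow")
    return names
```

Calling `report_gpus()` in a python3 session should list at least one GPU device if the driver, CUDA and cuDNN steps all worked; an empty result suggests tensorflow fell back to the CPU.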