In this note, I detail the step-by-step instructions I followed to set up software on an NVIDIA-based “Deep Learning Box”.

I’m using Ubuntu 18.04 LTS. This is a machine I’ve dedicated to experimentation. It runs only Ubuntu Linux - no dual booting.

CUDA Installation

This machine, named tchalla, has two GPUs: a GTX 1050ti and a GTX 960:

$ nvidia-smi
Thu Jan 24 22:02:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:0F:00.0  On |                  N/A |
| 30%   29C    P8    N/A /  75W |    553MiB /  4038MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 960     Off  | 00000000:42:00.0 Off |                  N/A |
|  0%   35C    P8     8W / 160W |      1MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1047      G   /usr/lib/xorg/Xorg                           205MiB |
|    0      1325      G   /usr/bin/gnome-shell                         119MiB |
|    0      7749      G   ...uest-channel-token=14858601992556674804    51MiB |
|    0     21931      G   /home/sconde/TPL/paraview-5.6/lib/paraview   162MiB |
|    0     22128      G   gnome-control-center                          11MiB |
+-----------------------------------------------------------------------------+

As you can see, I’m running NVIDIA driver version 410.48. I’m using Lmod: A New Environment Module System. With it, I’m able to switch between different CUDA versions: 10.0 and 9.2.

$ module load cuda/
cuda/10.0  cuda/9.2

$ module load cuda/10.0
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

$ module load cuda/9.2

The following have been reloaded with a version change:
  1) cuda/10.0 => cuda/9.2

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
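
Under the hood, each cuda modulefile just adjusts a handful of environment variables. Here is a minimal sketch of what module load cuda/10.0 amounts to on tchalla - the $HOME/TPL install prefix mirrors my own layout, and the script name is a hypothetical shell stand-in for the actual Lua modulefile:

$ cat load-cuda-10.0.sh
# hypothetical shell equivalent of the cuda/10.0 modulefile
export CUDA_VERSION=10.0
export CUDA_HOME=$HOME/TPL/cuda/10.0/install
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH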

Python Installation

Using modules, I have both python2 and python3 installed on tchalla. For this setup, however, I’m running python3; after all, the countdown to python2’s end of life is already under way.
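
Selecting the interpreter is just another module load (the exact module name follows my own tree and is illustrative):

$ module load python/3.6
$ python3 --version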

cuDNN Installation

  • Sign up for an NVIDIA account (if new)
  • Download the cuDNN version matching the installed CUDA version, then unpack and install it with the small script below
$ cat build-and-install-cudnn.sh
#!/bin/bash
# note: this script should be sourced, not executed
# Assert that the cuDNN tarball is provided
#if [ $# -ne 1 ]; then
#    echo 'illegal number of parameters'
#    echo 'usage: bash build-and-install-cudnn.sh [cudnn.tgz]'
#    exit 1
#fi

. ../setup-machine/setup_functions.sh
if [ -z "$TPL_ROOT" ]; then TPL_ROOT=$HOME/TPL ; fi

this_pwd=$PWD
#cuda_root=$TPL_CUDA_ROOT
cuda_root=/home/sconde/TPL/cuda/10.0/install
cuda_version=$CUDA_VERSION
cudnn_tar=cudnn-10.0-linux-x64-v7.4.2.24.tgz
cudnn_tar_fullpath=$this_pwd/$cudnn_tar
cudnn_short_name=$(echo "$cudnn_tar" | cut -d "-" -f1,2)  # e.g. cudnn-10.0
echo $cudnn_short_name

# Unpack into cudnn-10.0/, stripping the tarball's top-level "cuda/" directory
mkdir -p $cudnn_short_name && tar zxf $cudnn_tar_fullpath -C $cudnn_short_name --strip-components 1

# Copy the cuDNN libraries and headers into the CUDA tree, preserving symlinks
cp -Pv $cudnn_short_name/lib64/* $cuda_root/lib64/
cp  -v $cudnn_short_name/include/* $cuda_root/include/
chmod a+r $cuda_root/include/cudnn.h $cuda_root/lib64/libcudnn*
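
To double check that the headers landed in the CUDA tree, the version macros in cudnn.h should report 7.4.2 for this tarball:

$ grep -E 'define CUDNN_(MAJOR|MINOR|PATCHLEVEL) ' /home/sconde/TPL/cuda/10.0/install/include/cudnn.h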

Setup FastAI

For now, I’m using conda’s virtual environments:

  • conda create -n fastai
  • source activate fastai
  • conda install -c pytorch -c fastai fastai
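
As a quick sanity check that fastai is importable from the new environment:

$ python -c 'import fastai; print(fastai.__version__)'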

Testing

$ cat test_torch_cuda.py
'''
Purpose: verify that the torch installation is good by
         checking that CUDA devices are accessible through the library.
'''
import torch

assert torch.cuda.is_available(), 'CUDA is not available to PyTorch'
print("Pytorch CUDA is Good!!")

$ python test_torch_cuda.py
Pytorch CUDA is Good!!
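
It’s also worth confirming that PyTorch sees both cards, not just one; torch.cuda.device_count() and torch.cuda.get_device_name() make this a one-liner:

$ python -c 'import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])'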

Update

Lately I’ve been having issues with this installation procedure, especially with tensorflow-datasets. Perhaps it’s due to TF 2.0.

Anyway, running PyTorch locally is pretty nifty.