In this note, I detail the step-by-step instructions I followed to set up software on an NVIDIA-based “Deep Learning Box”.
I’m using Ubuntu 18.04 LTS on a machine I’ve dedicated to experimentation; it runs only Ubuntu Linux, with no dual booting.
CUDA Installation
This machine, named tchalla, has two GPUs: a GTX 1050 Ti and a GTX 960:
$ nvidia-smi
Thu Jan 24 22:02:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:0F:00.0 On | N/A |
| 30% 29C P8 N/A / 75W | 553MiB / 4038MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 960 Off | 00000000:42:00.0 Off | N/A |
| 0% 35C P8 8W / 160W | 1MiB / 2002MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1047 G /usr/lib/xorg/Xorg 205MiB |
| 0 1325 G /usr/bin/gnome-shell 119MiB |
| 0 7749 G ...uest-channel-token=14858601992556674804 51MiB |
| 0 21931 G /home/sconde/TPL/paraview-5.6/lib/paraview 162MiB |
| 0 22128 G gnome-control-center 11MiB |
+-----------------------------------------------------------------------------+
As you can see, I’m running NVIDIA driver version 410.48. To manage toolchains, I’m using Lmod: A New Environment Module System.
With this, I’m able to experiment with different CUDA versions: 10.0 and 9.2.
$ module load cuda/
cuda/10.0 cuda/9.2
$ module load cuda/10.0
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
$ module load cuda/9.2
The following have been reloaded with a version change:
1) cuda/10.0 => cuda/9.2
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
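Under the hood, each cuda modulefile simply points the environment at a self-contained install tree under my TPL directory. As a rough sketch (the paths reflect my layout; the actual modulefile does the equivalent internally), loading cuda/10.0 amounts to something like:
# roughly what `module load cuda/10.0` does, assuming my TPL layout
export CUDA_HOME=/home/sconde/TPL/cuda/10.0/install
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH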
Python Installation
Using modules, I have both python2 and python3 installed on tchalla. For this setup, however, I’m using python3; after all, the countdown to python2’s end of life is already under way.
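Switching interpreters works the same way as with CUDA; the module name below is only illustrative (use whichever python3 module you have built):
$ module avail python
$ module load python/3.6    # hypothetical module name
$ python3 --version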
cuDNN Installation
- Sign up for an NVIDIA Developer account (if you don’t already have one)
- Download the cuDNN version that matches the installed CUDA version
$ cat build-and-install-cudnn.sh
#!/bin/bash
# note: this script is meant to be sourced
# Assert that the cuDNN tarball is provided
#if [ $# -ne 1 ]; then
#  echo 'illegal number of parameters'
#  echo 'usage: bash build-and-install-cudnn.sh [cudnn_tarball.tgz]'
#fi
. ../setup-machine/setup_functions.sh
if [ -z "$TPL_ROOT" ]; then TPL_ROOT=$HOME/TPL ; fi
this_pwd=$PWD
#cuda_root=$TPL_CUDA_ROOT
cuda_root=/home/sconde/TPL/cuda/10.0/install
cuda_version=$CUDA_VERSION
cudnn_tar=cudnn-10.0-linux-x64-v7.4.2.24.tgz
cudnn_tar_fullpath=$this_pwd/$cudnn_tar
cudnn_short_name=`echo "$cudnn_tar" | cut -d "-" -f1,2`
echo $cudnn_short_name
mkdir $cudnn_short_name && tar zxf $cudnn_tar_fullpath -C $cudnn_short_name --strip-components 1
cp -Pv $cudnn_short_name/lib64/* $cuda_root/lib64/
cp -v $cudnn_short_name/include/* $cuda_root/include
chmod a+r $cuda_root/include/cudnn.h
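As a quick sanity check after sourcing the script (using the same cuda_root as above), the version macros in the freshly copied header should match the tarball, i.e. 7.4.2:
$ grep -A 2 '#define CUDNN_MAJOR' /home/sconde/TPL/cuda/10.0/install/include/cudnn.h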
FastAI Setup
For now, I’m using a conda virtual environment:
conda create -n fastai
source activate fastai
conda install -c pytorch -c fastai fastai
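A quick one-liner confirms that the fastai package (and, transitively, pytorch) resolved inside the new environment; the version printed depends on what conda pulled:
$ python -c "import fastai; print(fastai.__version__)"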
Testing
$ cat test_torch_cuda.py
'''
Purpose: verify that the torch installation is good,
i.e. that CUDA devices are accessible from the library.
'''
import torch
assert torch.cuda.is_available(), 'something went wrong'
print("Pytorch CUDA is Good!!")
$ python test_torch_cuda.py
Pytorch CUDA is Good!!
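Since tchalla has two GPUs, it is also worth confirming that both are visible to PyTorch; a minimal follow-up check in the same environment:
$ python -c "import torch; print(torch.cuda.device_count()); print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"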
Update
Lately I’ve been having issues with this installation procedure, especially with tensorflow-datasets; perhaps it’s due to TF 2.0.
Anyway, running PyTorch locally is pretty nifty.