Cluster Nodes

Complete List of Nodes
| Name | GPU Model (primary) | # | GPU Model (secondary) | # | CPUs | Sockets | Cores/Socket | Threads/Core | Memory (GB) | TmpDisk (TB) | Arch | Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| kepler[2-3] | k80 | 8 | | | 16 | 2 | 4 | 2 | 256 | 3.6 | x86_64 | tesla,12gb |
| kepler4 | m40 | 4 | | | 16 | 2 | 4 | 2 | 256 | 3.6 | x86_64 | maxwell,24gb |
| kepler5 | v100 | 2 | m40 | 1 | 16 | 2 | 4 | 2 | 256 | 3.6 | x86_64 | volta,12gb |
| mila01 | v100 | 8 | | | 80 | 2 | 20 | 2 | 512 | 7 | x86_64 | tesla,16gb |
| mila02 | v100 | 8 | | | 80 | 2 | 20 | 2 | 512 | 7 | x86_64 | tesla,32gb |
| mila03 | v100 | 8 | | | 80 | 2 | 20 | 2 | 512 | 7 | x86_64 | tesla,32gb |
| power9[1-2] | v100 | 4 | | | 128 | 2 | 16 | 4 | 586 | 0.88 | power9 | tesla,nvlink,16gb |
| rtx[6,9] | titanrtx | 2 | | | 20 | 1 | 10 | 2 | 128 | 3.6 | x86_64 | turing,24gb |
| rtx[1-5,7-8] | titanrtx | 2 | | | 20 | 1 | 10 | 2 | 128 | 0.93 | x86_64 | turing,24gb |
| apollov[01-05] | v100 | 8 | | | 80 | 2 | 20 | 2 | 380 | 3.6 | x86_64 | tesla,nvlink,32gb |
| apollor[06-16] | rtx8000 | 8 | | | 80 | 2 | 20 | 2 | 380 | 3.6 | x86_64 | turing,48gb |
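The entries in the Features column can be used as Slurm constraints when you need a specific GPU generation or memory size. A minimal sketch, assuming standard Slurm options (the exact partitions and limits depend on the cluster configuration):

```shell
# Request one GPU on any node tagged with the "turing" feature
# (i.e. the rtx* and apollor* nodes in the table above)
srun --gres=gpu:1 --constraint=turing --pty bash

# Combine features with "&" to request a 32 GB V100 specifically
srun --gres=gpu:1 --constraint="tesla&32gb" --pty bash
```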
Special Nodes

Power9
Power9 servers use a different processor instruction set than Intel and AMD (x86_64). As such, you need to set up your environment again specifically for these nodes.
Power9 machines have 128 threads (2 processors / 16 cores / 4-way SMT)
4 x V100 SXM2 (16 GB) with NVLink
In a Power9 machine, GPUs and CPUs communicate with each other using NVLink instead of PCIe.
This allows them to exchange data quickly. More on LMS
Power9 nodes have the same software stack as the regular nodes, and all software needed to deploy your environment is included, as on a regular node.
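Because binaries built for x86_64 cannot run on Power9, a quick architecture check before reusing an environment can save some confusion. A minimal sketch (the environment name below is hypothetical):

```shell
# Print the machine architecture: Power9 nodes report "ppc64le",
# regular nodes report "x86_64", so binaries cannot be shared between them.
uname -m

# Keep a separate environment per architecture, e.g. (hypothetical name):
#   conda create -n myenv-ppc64le python=3.6
```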
AMD
Warning
As of August 20, the GPUs had to be returned to AMD. Mila will receive more samples. You can join the AMD Slack channels to get the latest information.
Mila has a few nodes equipped with MI50 GPUs.
```shell
# Reserve an AMD node interactively (one GPU, 8 CPUs)
srun --gres=gpu -c 8 --reservation=AMD --pty bash

# First-time setup of the AMD ROCm stack
conda create -n rocm python=3.6
conda activate rocm
pip install tensorflow-rocm
pip install /wheels/pytorch/torch-1.1.0a0+d8b9d32-cp36-cp36m-linux_x86_64.whl
```
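Once the ROCm builds are installed, a quick way to check that both frameworks see the MI50 GPUs is a sketch like the following; the ROCm builds of PyTorch and TensorFlow reuse the CUDA-named APIs, so the usual checks apply:

```shell
# The ROCm build of PyTorch exposes GPUs through the cuda-named API
python -c "import torch; print(torch.cuda.is_available())"

# tensorflow-rocm (TF 1.x) likewise reports GPUs through the standard check
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```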