Cluster Nodes

Complete List of Nodes

Name            Primary GPU  #  Secondary GPU  #  CPUs  Sockets  Cores/Socket  Threads/Core  Memory (GB)  TmpDisk (TB)  Arch    Features (GPU arch and memory)

KEPLER
kepler[2-3]     k80          8  -              -  16    2        4             2             256          3.6           x86_64  tesla,12gb
kepler4         m40          4  -              -  16    2        4             2             256          3.6           x86_64  maxwell,24gb
kepler5         v100         2  m40            1  16    2        4             2             256          3.6           x86_64  volta,12gb

MILA
mila01          v100         8  -              -  80    2        20            2             512          7             x86_64  tesla,16gb
mila02          v100         8  -              -  80    2        20            2             512          7             x86_64  tesla,32gb
mila03          v100         8  -              -  80    2        20            2             512          7             x86_64  tesla,32gb

POWER9
power9[1-2]     v100         4  -              -  128   2        16            4             586          0.88          power9  tesla,nvlink,16gb

TITAN RTX
rtx[6,9]        titanrtx     2  -              -  20    1        10            2             128          3.6           x86_64  turing,24gb
rtx[1-5,7-8]    titanrtx     2  -              -  20    1        10            2             128          0.93          x86_64  turing,24gb

APOLLO (new)
apollov[01-05]  v100         8  -              -  80    2        20            2             380          3.6           x86_64  tesla,nvlink,32gb
apollor[06-16]  rtx8000      8  -              -  80    2        20            2             380          3.6           x86_64  turing,48g

A dash (-) means the node has no secondary GPU model.
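The Features entries above appear to be Slurm node features, so a job can presumably be steered toward a GPU family with srun's --constraint option. The following is a minimal sketch rather than a documented recipe for this cluster: build_srun_cmd is a hypothetical helper that only prints the command instead of running it, and the assumption that feature constraints are accepted here should be checked before relying on it.

```shell
# Hypothetical helper: print (do not run) an srun command that asks for
# one GPU on a node advertising the given Slurm feature.
# Assumption: the cluster accepts --constraint with the feature names
# listed in the Features column (e.g. turing, maxwell, nvlink).
build_srun_cmd() {
    feature="$1"
    printf 'srun --gres=gpu:1 --constraint=%s --pty bash\n' "$feature"
}

# Show the command that would request a Turing-class GPU node.
build_srun_cmd turing
```

Printing the command first makes it easy to review the request before submitting it by hand.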

Special Nodes

Power9

Power9 servers use a different processor instruction set than Intel and AMD (x86_64) machines, so you need to set up your environment again specifically for those nodes.

  • Power9 machines have 128 threads (2 processors x 16 cores x 4-way SMT).

  • 4 x V100 SXM2 (16 GB) with NVLink

  • In a Power9 machine, GPUs and CPUs communicate with each other using NVLink instead of PCIe.

This allows them to exchange data quickly. More on LMS
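The thread count quoted above is the product of sockets, cores per socket, and threads per core, and the same relation holds for the CPUs column of the node list. A small sketch of that arithmetic, where logical_cpus is a hypothetical helper name introduced only for illustration:

```shell
# Logical CPUs (hardware threads) = sockets x cores per socket x threads per core.
# logical_cpus is a hypothetical helper, not a cluster tool.
logical_cpus() {
    echo $(( $1 * $2 * $3 ))
}

logical_cpus 2 16 4   # power9[1-2]: prints 128
logical_cpus 2 20 2   # mila01:      prints 80
```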

Power9 nodes have the same software stack as the regular nodes, so every software package you need to deploy your environment should be available there, as on a regular node.
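Because an environment built on x86_64 cannot be reused on Power9, one option is to keep one environment per architecture and select it from the output of uname -m, which reports ppc64le on Linux running on Power9. This is only a sketch under stated assumptions: env_for_arch and the environment names myenv-x86_64 and myenv-power9 are hypothetical and not part of the cluster setup.

```shell
# Hypothetical helper: map the machine architecture (as printed by `uname -m`)
# to the name of a per-architecture environment. The names are placeholders.
env_for_arch() {
    case "$1" in
        x86_64)  echo "myenv-x86_64" ;;
        ppc64le) echo "myenv-power9" ;;   # Linux on Power9 reports ppc64le
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Report which environment would be used on the current node.
if env_name="$(env_for_arch "$(uname -m)")"; then
    echo "would activate: $env_name"
else
    echo "no environment defined for this node"
fi
```

The selected name could then be passed to conda activate in a job script, so the same script works on both kinds of nodes.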

AMD

Warning

As of August 20, the GPUs had to be returned to AMD. Mila will get more samples. You can join the AMD Slack channels to get the latest information.

Mila has a few nodes equipped with MI50 GPUs.

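 # Request an interactive session with a GPU and 8 CPU cores in the AMD reservation
 srun --gres=gpu -c 8 --reservation=AMD --pty bash

 # First-time setup of the AMD stack
 conda create -n rocm python=3.6
 conda activate rocm

 # Install the ROCm builds of TensorFlow and PyTorch
 pip install tensorflow-rocm
 pip install /wheels/pytorch/torch-1.1.0a0+d8b9d32-cp36-cp36m-linux_x86_64.whl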