Using the BeoWulf cluster

The ICMP/LQM BeoWulf Cluster

This documentation covers the use of the « Quantum Wolf » Cluster. It is divided into the following sections:
  1. Compute nodes
  2. File-systems
  3. Batch job queues
  4. Utilities
The cluster is accessible by ssh using the address:
ssh username@lqmmaster.epfl.ch

To connect from outside the EPFL network, you must first be connected to the EPFL VPN.
The current status of the cluster and workload may be viewed at lqmmaster.epfl.ch/ganglia (requires being
on the EPFL VPN or the EPFL local network).
Jobs must be submitted to the SLURM queue using batch scripts (see below):
sbatch job.batch

1. Compute nodes

The cluster is made up of 96 compute nodes. The most important aspects are:
  1. The nodes are loosely connected through a Gigabit LAN.
  2. The nodes have no hard drives and thus « live » entirely in RAM.
  3. The nodes have access to permanent storage through an NFS server (see file-systems below).
  4. Each node has a 4-core Intel i5-3350P CPU, for 384 cores in total.
  5. The nodes qwf001 and qwf0[05-14] have 4GB of memory.
  6. The nodes qwf0[02-04] and qwf0[15-96] have 8GB of memory.
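As a quick check of the figures above, the node ranges can be expanded and counted in the shell. This is only a sketch: the helper `expand_range` is hypothetical, and the three-digit `qwfNNN` naming is inferred from the ranges in the list.

```shell
#!/bin/sh
# expand_range FIRST LAST: print node names qwfNNN for FIRST..LAST inclusive.
# (Hypothetical helper, not a cluster utility.)
expand_range() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'qwf%03d\n' "$i"
        i=$((i + 1))
    done
}

# 4GB nodes: qwf001 and qwf0[05-14]
low_mem=$(expand_range 1 1; expand_range 5 14)
# 8GB nodes: qwf0[02-04] and qwf0[15-96]
high_mem=$(expand_range 2 4; expand_range 15 96)

# Count one name per line; the arithmetic expansion strips wc's padding.
n_low=$(( $(printf '%s\n' "$low_mem" | wc -l) ))
n_high=$(( $(printf '%s\n' "$high_mem" | wc -l) ))

echo "4GB nodes:   $n_low"
echo "8GB nodes:   $n_high"
echo "total cores: $(( (n_low + n_high) * 4 ))"
```

Run as written, this reports 11 nodes with 4GB, 85 nodes with 8GB and 384 cores, which matches the totals quoted in the list.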
Calculations suitable for the Quantum Wolf Cluster must meet the following criteria:
  1. The network must be used as little as possible.
  2. I/O operations should not be performed concurrently by a large number of processes over the network.
  3. Files written to the node-local memory (in RAM) must be kept small so as not to saturate the compute nodes' memory.
  4. Large files may be written through the network on the NFS shares.
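One way to respect these criteria is to compute in the node-local scratch and touch the network only once, at the end of the job. The sketch below assumes this pattern; `run_job` is a hypothetical helper, `./my_program` and `input.dat` are placeholders, and a per-user `/data/$USER` directory is an assumption about the layout of the share.

```shell
#!/bin/sh
# run_job WORKDIR ARCHIVEDIR COMMAND [ARGS...]
# Runs COMMAND inside WORKDIR (node-local scratch), then stores the contents
# of WORKDIR in ARCHIVEDIR as a single compressed tarball, so that the NFS
# share sees one sequential write instead of many small ones.
run_job() {
    workdir=$1
    archivedir=$2
    shift 2
    mkdir -p "$workdir" "$archivedir" || return 1
    ( cd "$workdir" && "$@" ) || return 1
    tar -czf "$archivedir/results.tar.gz" -C "$workdir" . || return 1
    # Free the node's RAM once the results are on permanent storage.
    rm -rf "$workdir"
}

# Possible call from a batch script (paths and program are assumptions):
#   run_job "/scratch/$USER/job_$SLURM_JOB_ID" "/data/$USER/job_$SLURM_JOB_ID" \
#       ./my_program input.dat
```

Packing the results into one archive is a design choice of this sketch, not a requirement of the cluster; copying a few large files directly to /data would satisfy the same criteria.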

2. File-systems

Three file-systems are in use on the « Quantum Wolf » Cluster:
  1. /home: This is a permanent storage file-system hosted on lqmmaster. The nodes have access to it as an NFS share. It is meant for small files and binaries (typically your scripts, source code and programs). There is no backup.
  2. /data: This is a large permanent storage file-system hosted on lqmmaster. The nodes have access to it as an NFS share. It is meant to store the data files resulting from the calculations. There is no backup.
  3. /scratch: This is a local file-system present on both the nodes and lqmmaster. On the nodes, this storage resides in RAM and is therefore erased when a node halts or reboots. The content of lqmmaster:/scratch may be pushed to the nodes' qwf#:/scratch using « qwf-sync », while the content of the nodes' qwf#:/scratch may be retrieved to permanent storage on lqmmaster using « qwf-sync-back » (see Utilities below). This is a very fast file-system, suited to holding your input files and/or writing (small) output files.
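Because /scratch on a node shares the RAM with the running calculation, it can help to check how much space a job directory occupies before writing more into it. The following is a minimal sketch; `check_scratch` is a hypothetical helper and the limit passed to it is an arbitrary value chosen by the user, not a cluster policy.

```shell
#!/bin/sh
# check_scratch DIR LIMIT_MB
# Succeeds if DIR occupies at most LIMIT_MB megabytes, fails otherwise.
check_scratch() {
    # du -sk prints "<size in kB><TAB><path>"; keep the first field.
    used_kb=$(du -sk "$1" | cut -f1)
    limit_kb=$(( $2 * 1024 ))
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "$1 uses ${used_kb} kB, above the ${2} MB limit" >&2
        return 1
    fi
    echo "$1 uses ${used_kb} kB"
}

# Possible use inside a job (the 500 MB limit is an assumption):
#   check_scratch "/scratch/$USER" 500 || exit 1
```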
The figure below summarizes the « Quantum Wolf » file-systems:

3. Batch job queues

The cluster uses the SLURM job scheduler to distribute the computational resources among the users. Calculations must be submitted to a queue before they can run. This is done by writing a batch script whose header tells the queuing system which resources the calculation needs. The job is submitted with:
sbatch jobfile.batch

Examples of batch scripts may be found in the /cluster/jobfiles folder and are linked below.
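For orientation, a batch script header might look like the sketch below, which writes a minimal jobfile.batch. It is not one of the scripts from /cluster/jobfiles: the job name `example` and the program `./my_program` are placeholders, and the resource values are illustrative. The `#SBATCH` options used are standard sbatch options.

```shell
#!/bin/sh
# Write a minimal batch script to jobfile.batch.
# "example" and "./my_program" are placeholders, not names from the cluster.
cat > jobfile.batch <<'EOF'
#!/bin/bash
# Job name shown in the queue.
#SBATCH --job-name=example
# Partition to submit to (24-hour limit).
#SBATCH --partition=qwfall
# Stay on one node so the job does not depend on the network.
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Use the four cores of a node.
#SBATCH --cpus-per-task=4
# Wall-clock limit, hh:mm:ss.
#SBATCH --time=01:00:00
# Standard output file; %j expands to the job ID.
#SBATCH --output=example_%j.out

echo "Job $SLURM_JOB_ID running on $(hostname)"
./my_program
EOF

# Submit it from lqmmaster with:
#   sbatch jobfile.batch
```

The header lines are ordinary shell comments, so SLURM reads them while bash ignores them; SLURM stops looking for directives at the first command in the script.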
The cluster has several partitions where you can submit your job:
  1. qwfall: Contains all the nodes and has a maximum runtime of 24 hours.
  2. qwfall-long: Contains all the nodes and has a maximum runtime of 7 days.
  3. qwfhm: Contains the 85 nodes which have 8GB of memory, qwf0[02-04] and qwf0[15-96]. Maximum runtime of 24 hours.
  4. qwfhm-long: Contains the 85 nodes which have 8GB of memory, qwf0[02-04] and qwf0[15-96]. Maximum runtime of 7 days.
  5. qwflm: Contains the 11 nodes which have 4GB of memory, qwf001 and qwf0[05-14]. Maximum runtime of 24 hours.
  6. qwflm-long: Contains the 11 nodes which have 4GB of memory, qwf001 and qwf0[05-14]. Maximum runtime of 7 days.
Jobs submitted to the long partitions (qwfall-long, qwfhm-long and qwflm-long) have a lower priority than jobs submitted to the short ones. The longer a job is held in the queue, the higher its priority becomes.
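The choice of partition follows from the memory and runtime a job needs. The sketch below encodes one possible rule, assuming that a job needing more than 4GB should avoid the 4GB nodes and that anything else can go to the partitions containing all nodes. `pick_partition` is a hypothetical helper, not a cluster utility, and since the operating system and /scratch also live in RAM, the memory actually available to a job is lower than the nominal figures used here.

```shell
#!/bin/sh
# pick_partition MEM_GB HOURS
# Prints a partition name for a job needing MEM_GB gigabytes per node and
# HOURS hours of runtime (both integers), or fails if nothing fits.
pick_partition() {
    mem_gb=$1
    hours=$2
    # No node has more than 8GB and no partition runs longer than 7 days.
    if [ "$mem_gb" -gt 8 ] || [ "$hours" -gt 168 ]; then
        echo "no partition fits ${mem_gb} GB for ${hours} h" >&2
        return 1
    fi
    if [ "$mem_gb" -gt 4 ]; then base=qwfhm; else base=qwfall; fi
    if [ "$hours" -gt 24 ]; then base="$base-long"; fi
    echo "$base"
}

pick_partition 6 12     # prints qwfhm
pick_partition 2 100    # prints qwfall-long
```

The result can be passed on the command line, for example `sbatch --partition="$(pick_partition 6 12)" jobfile.batch`; options given to sbatch on the command line take precedence over those in the script header.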


4. Utilities

To ease the use of the cluster, you can use the following utilities:
  • qwf-clear: Clears the local scratch on the compute nodes.
  • qwf-count: Returns the number of available compute nodes.
  • qwf-foreach: Executes a command sequentially on each node.
  • qwf-hosts: Returns a list of the available compute nodes.
  • qwf-list: Constructs a list of compute node-names such as « qwf001.qwf,qwf002.qwf,……,qwf016.qwf ».
  • qwf-passwd: Updates your password on all the nodes. Please do not change your password using the standard Unix « passwd », as it would only change it on lqmmaster.
  • qwf-sync: Pushes the content of your lqmmaster:/scratch/$USER folder onto the compute nodes' qwf#:/scratch/$USER folders.
  • qwf-sync-back: Pulls back the contents of the compute nodes qwf#:/scratch/$USER into your current directory.
For each tool, a more complete help message with the available options is printed using the « -h » switch:
qwf-sync -h
Push the content of lqmmaster:/scratch/$USER on the nodes local qwf#:/scratch/$USER
folder with the content. This is unidirectional. Files present on lqmmaster:/scratch/$USER
that are not present or different on the nodes are copied on the nodes. File on the nodes
absent or different on lqmmaster are not pulled back on it (see qwf-sync-back).
This is equivalent of doing:

rsync -a /scratch/$USER/ $USER@qwfhost:/scratch/$USER/

for each qwf hosts « qwfhost ».

Usage: qwf-sync [OPTIONS]

Options
-h              Print this help message.
-H HOSTSLIST    Push only on the nodes from HOSTSLIST, a comma-separated list of nodes
                (see qwf-list)
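
The per-host loop described in the help message can be sketched as follows. This is not the actual qwf-sync code: `print_sync_commands` is a hypothetical function that only prints the rsync commands instead of running them, and the trailing slashes on both paths are an assumption about the intended rsync call (they make rsync copy the contents of the folder rather than the folder itself).

```shell
#!/bin/sh
# print_sync_commands HOSTSLIST
# HOSTSLIST is a comma-separated list of nodes, in the format used by the
# -H option. Prints one rsync command per host; nothing is executed.
print_sync_commands() {
    hostslist=$1
    user=${USER:-$(id -un)}
    old_ifs=$IFS
    IFS=,
    # With IFS set to a comma, the unquoted list splits into host names.
    for host in $hostslist; do
        printf 'rsync -a /scratch/%s/ %s@%s:/scratch/%s/\n' \
            "$user" "$user" "$host" "$user"
    done
    IFS=$old_ifs
}

print_sync_commands "qwf001.qwf,qwf002.qwf"
```

Restricting the list of hosts in this way corresponds to calling `qwf-sync -H qwf001.qwf,qwf002.qwf`, which pushes the scratch folder to those two nodes only.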