HPC

Introduction

This webpage provides support to researchers and to students that need to use the ICS cluster.
It includes some basic information about the hardware/software/network configuration and the scientific software installed on the machines.
It also explains how to access the infrastructure and how to run jobs on the cluster.
Then, it focuses on Paraview, which allows users to visualize large sets of data by exploiting parallel resources.
The final sections provide some hints on how to install additional software and the necessary support contact information.

Note:
We do not spam all cluster users with notification emails about the cluster.
If you would like to be notified about cluster-related updates, please subscribe here.

Cluster specification

The ICS cluster is composed of 41 compute nodes and is managed by a master node. Each node runs CentOS 8.2.2004.x86_64 and provides a wide range of scientific applications. Resource allocation and job scheduling are performed by Slurm, through which users submit jobs.

The following list includes the main hardware resources provided by the cluster:
(Note that users are not allowed to access compute nodes: they can only access the master node through SSH and then run batch jobs through Slurm.)

Login node

NODES icslogin01
CPU 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores                     
RAM 64GB DDR4 @ 2133MHz
HDD 1 x 1TB SATA 6Gb
INFINIBAND ADAPTER Intel 40Gbps QDR

 

Compute nodes

 
Fat nodes
NODES icsnode[01-04,07]
CPU 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores                       
RAM 128GB DDR4 @ 2133MHz  
Note: icsnode07 is populated with 512GB RAM
HDD 1 x 1TB SATA 6Gb
INFINIBAND ADAPTER Intel 40Gbps QDR

 

 
GPU nodes
NODES icsnode[05,06,08-16]
CPU  2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores
RAM 128GB DDR4 @ 2133MHz
Note: icsnode15 is populated with 512GB RAM
HDD 1 x 1TB SATA 6Gb
INFINIBAND ADAPTER Intel 40Gbps QDR
GPU icsnode[05-06]: 1 x NVIDIA A100-PCIe-40GB Tensor Core GPU, 40GB HBM2
GPU icsnode[08,13]: 2 x NVIDIA GeForce GTX 1080 Ti Titan, 11GB GDDR5X, 3584 CUDA cores
GPU icsnode[09-12,14-16]: 1 x NVIDIA GeForce GTX 1080 Founders Edition, 8GB GDDR5X, 2560 CUDA cores

 

 
Multi GPU nodes
NODES icsnode41 to icsnode42
CPU 2 x Intel Xeon Silver 4114 CPU @ 2.20GHz, 20 (2 x 10) cores
RAM 100GB DDR4 @ 2666MHz
HDD 1 x 1TB SATA 6Gb
INFINIBAND ADAPTER Intel 40Gbps QDR
GPU icsnode41: 2 x NVIDIA GeForce RTX 2080 Ti, 11GB GDDR6, 4352 CUDA cores
GPU icsnode42: 2 x NVIDIA GeForce GTX 1080 Ti, 11GB GDDR5X, 3584 CUDA cores

 

 
Regular nodes
NODES icsnode17 to icsnode39
CPU  2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores                       
RAM 64GB DDR4 @ 2133MHz
HDD 1 x 1TB SATA 6Gb
INFINIBAND ADAPTER Intel 40Gbps QDR

 

Network hardware

INFINIBAND           Intel True Scale Fabric 12200 Switch. 36 ports 40Gbps QDR

Network configuration

WAN Cluster accessible from the web at hpc.ics.usi.ch (SSH)
Private LAN                          Low-performance tasks, cluster management, access to nodes
Infiniband LAN High-performance tasks, jobs scheduling and synchronization, data movement

Storage hardware

24-bay storage with RAID controller
16-bay JBOD storage
16-Gb Fibre Channel controller

Disk partitioning

Scratch volume
(/scratch) 15TB in RAID 10 with SAS3 HD and one dedicated spare. It offers much faster write speed than the Data_Apps volume, so prefer it for I/O-intensive applications.
The scratch filesystem is intended for temporary storage while a job is running; move important data to other storage as soon as the job completes.
 
Data_Apps volume
(/apps and /home) 32TB in RAID 6 with SAS3 NL and one dedicated spare.

Warning:

We do not have any backup system for the cluster. Don't store any critical data on the cluster without making your own backups. We won't be responsible for any loss of data.
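
The advice above (work in /scratch during the job, then move results elsewhere) could be followed with a pattern like the sketch below. It is only an illustration: the function name stage_and_run, the directory layout under /scratch, and the placeholder "simulation" step are assumptions, not cluster conventions.

```shell
#!/bin/sh
# Hypothetical helper: run a job step in a scratch directory, copy the
# result to permanent storage, then remove the scratch directory.
# Arguments: $1 = scratch working directory, $2 = result directory.
stage_and_run() {
    work_dir=$1
    result_dir=$2
    mkdir -p "$work_dir" "$result_dir" || return 1
    # Placeholder for the real I/O-intensive program, run inside scratch.
    ( cd "$work_dir" && echo "simulation output" > result.txt ) || return 1
    # Keep what matters, then clean up the temporary files.
    cp "$work_dir/result.txt" "$result_dir/" && rm -rf "$work_dir"
}

# Possible use inside a job script (SLURM_JOB_ID is set by Slurm at runtime):
#   stage_and_run "/scratch/$USER/${SLURM_JOB_ID:-manual}" "$HOME/results"
```

Using one scratch directory per job ID keeps concurrent jobs from overwriting each other's temporary files.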

 

Disk Quotas

We have enabled disk quotas on the home directory of each user in order to allow for fair usage of shared resources. Each user has a soft limit of 90GB and a hard limit of 100GB.
Once you have reached the hard limit, you will not be able to use the resources. We recommend using the /scratch partition if you need more disk space.
If you need more disk space in your home directory, you can write to us with your requirements and, if possible, we can grant more disk space.
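
To see how close a home directory is to these limits, a small helper along the following lines may be useful. This is a sketch under assumptions: quota_status and home_usage_gb are hypothetical names, and usage is measured with du, which may differ slightly from what the quota system itself accounts for.

```shell
#!/bin/sh
# Hypothetical helper: classify a usage figure (whole GB) against the
# 90GB soft limit and 100GB hard limit described above.
quota_status() {
    used_gb=$1
    if [ "$used_gb" -ge 100 ]; then
        echo "hard limit reached"
    elif [ "$used_gb" -ge 90 ]; then
        echo "over soft limit"
    else
        echo "ok"
    fi
}

# Measure the home directory in KiB with du and convert to whole GB
# (1 GB is treated as 1024 * 1024 KiB here). This can be slow on large trees.
home_usage_gb() {
    kib=$(du -sk "$HOME" 2>/dev/null | awk '{print $1}')
    echo $(( ${kib:-0} / 1024 / 1024 ))
}

# Example: quota_status "$(home_usage_gb)"
```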

Supported applications

We provide a wide range of scientific applications, which are organized with environment modules.

Environment modules

We use environment modules to let users switch between different versions of installed programs and libraries. A module is a user interface that provides utilities for the dynamic modification of a user's environment, i.e., users do not have to manually modify their environment variables (PATH, LD_LIBRARY_PATH, ...) to access the compilers, loader, libraries, and utilities. For all applications, tools, libraries, etc., the correct environment can easily be set with, e.g., module load matlab. If several versions are installed, a specific one can be chosen with, e.g., module load matlab/2020a. The command module avail shows a list of all modules. Other important commands are:

Command Description
module avail lists all available modules (on the current system)
module list lists all currently loaded modules
module show <modname> displays information about <modname>
module load <modname> loads module <modname>
module switch <mod1> <mod2> unloads <mod1>, loads <mod2>
module rm <modname> unloads module <modname>
module purge unloads all loaded modules
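
A typical session combining these commands might look like the sketch below. It assumes the matlab/2020a module mentioned above is installed; setup_matlab_env is a hypothetical function name used only to group the steps.

```shell
#!/bin/sh
# Hypothetical wrapper around a typical module workflow.
setup_matlab_env() {
    # Start from a clean environment.
    module purge
    # Load one specific version (module name taken from the text above).
    module load matlab/2020a
    # Confirm what is now loaded.
    module list
}

# Only run where the module command is actually available.
if command -v module >/dev/null 2>&1; then
    setup_matlab_env
fi
```

Purging first avoids conflicts with modules left over from an earlier session.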

Installing New Applications

If you need any additional applications, you are welcome to install them. Necessary support and guidance will be provided for the installation process.

Access to the HPC services

  • To get an account on the cluster, please send us (Cluster Administrator) an email with your group head in cc.
  • If you already have an active account, you can connect to the cluster via SSH:

    ssh <username>@hpc.ics.usi.ch

  • We strongly recommend accessing the cluster using SSH keys. You can find information about generating SSH keys here.
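
As a rough sketch of the key-based setup, the standard OpenSSH tools ssh-keygen and ssh-copy-id can be used as follows. The user name and key file name are placeholders, the host is the SSH address given in the network configuration, and the run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders: replace CLUSTER_USER with your cluster user name.
CLUSTER_USER=${CLUSTER_USER:-your_username}
CLUSTER_HOST=hpc.ics.usi.ch
KEY_FILE=${KEY_FILE:-$HOME/.ssh/id_ed25519_ics}   # hypothetical file name

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_ssh_key() {
    # Generate an Ed25519 key pair; ssh-keygen asks for a passphrase.
    run ssh-keygen -t ed25519 -f "$KEY_FILE"
    # Append the public key to ~/.ssh/authorized_keys on the cluster.
    run ssh-copy-id -i "$KEY_FILE.pub" "$CLUSTER_USER@$CLUSTER_HOST"
    # Later logins can then use the key.
    run ssh -i "$KEY_FILE" "$CLUSTER_USER@$CLUSTER_HOST"
}

setup_ssh_key
```

Run with DRY_RUN=0 on your local machine to execute the commands instead of printing them.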

Slurm

Slurm (version 20.02.4) is the batch management system on the cluster.
Detailed tutorials, examples, man pages, etc. are available at https://slurm.schedmd.com/

Slurm partitions 

Partition name  Max runtime  # of nodes  Range                 Main features
slim            48 hours     22          icsnode[17-38]        cpu-only, 64GB RAM
gpu             48 hours     10          icsnode[05-06,08-15]  Nvidia GPUs, 128GB RAM
fat             48 hours     4           icsnode[01-04]        cpu-only, 128GB RAM
bigMem          48 hours     2           icsnode[07,15]        512GB RAM
multi_gpu       4 hours      2           icsnode[41,42]        100GB RAM, 2 Nvidia GPUs each
debug-slim      4 hours      1           icsnode39             cpu-only, 64GB RAM
debug-gpu       4 hours      1           icsnode16             Nvidia GPUs, 128GB RAM

Note: The sinfo command gives you a quick status overview.

Warning:
By default, ~16 GB of memory per node will be allocated for a job. In order to utilize resources properly, make a habit of specifying memory in your slurm job file.

Job Submission

Jobs can be submitted with the srun command or by submitting a Slurm job file using sbatch.

Some options of srun / sbatch are:

SLURM OPTIONS DESCRIPTION
-c or --cpus-per-task this option is needed for multithreaded (e.g. OpenMP) jobs; it tells SLURM to allocate N cores per task; typically N should equal the number of threads your program spawns, e.g. it should be set to the same number as OMP_NUM_THREADS
-e or --error specify a file name that will be used to store all error output (stderr); you can use %j (job id) and %N (name of first node) to automatically adapt the file name to the job; by default stderr goes to "slurm-%j.out" as well
-o or --output specify a file name that will be used to store all normal output (stdout); you can use %j (job id) and %N (name of first node) to automatically adapt the file name to the job; by default stdout goes to "slurm-%j.out"
-n or --ntasks set the number of tasks to N (default=1). This determines how many processes will be spawned by srun (for MPI jobs)
-N or --nodes set number of nodes that will be part of a job, on each node there will be ntasks-per-node processes started, if the option ntasks-per-node is not given, 1 process per node will be started
--ntasks-per-node how many tasks per allocated node to start, as stated in the line before
-p or --partition select the type of nodes where you want to execute your job
--time <hh:mm:ss> specify the maximum runtime of your job, if you just put a single number in, it will be interpreted as minutes
-J or --job-name give your job a name which is shown in the queue
--exclusive tell SLURM that only your job is allowed on the nodes allocated to this job; please be aware that you will be charged for all CPUs/cores on the node
-a or --array submit an array job
-w node1, node2,... restrict job to run on specific nodes only
-x node1, node2,... exclude specific nodes from job
--mem=MB minimum amount of real memory
--mem-per-cpu specify the memory needed per allocated CPU in MB; mem >= mem-per-cpu if mem is specified
--mincpus minimum number of logical processors (threads) per node
--reservation=name allocate resources from named reservation
--gres=list required generic resource

It might be more convenient to put the options directly in a job file that you can submit using sbatch.

The following example job files show how Slurm job files can be used:

Simple Slurm job file

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --output=simulation-m-%j.out
#SBATCH --error=simulation-m-%j.err
#SBATCH --nodes=4
#SBATCH --mem=12020
#SBATCH --ntasks-per-node=20

echo Starting Program

OpenMP Slurm job

#!/bin/bash
#SBATCH -J OpenMP_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:10:00
#SBATCH --mem=12020

# One thread per allocated core (matches --cpus-per-task)
export OMP_NUM_THREADS=20

./path/to/binary

MPI Slurm job

#!/bin/bash
#SBATCH -J MPI_job
#SBATCH --ntasks=80
#SBATCH --time=00:10:00
#SBATCH --mem=12020

mpirun ./path/to/binary

GPU Slurm job

#!/bin/bash
#SBATCH -J GPU_job
#SBATCH --ntasks=1
#SBATCH --partition=gpu
#SBATCH --time=00:10:00
#SBATCH --mem=12020
# Request one GPU on the allocated node
#SBATCH --gres=gpu:1

./path/to/binary

Please note: to use the GPUs, you must request them explicitly, e.g. with "--gres=gpu:1".
 
During runtime, the environment variable SLURM_JOB_ID will be set to the id of your job.
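
The -a/--array option listed among the srun / sbatch options has no example above. A minimal sketch of an array job file follows. Slurm sets SLURM_ARRAY_TASK_ID for each array task, much as it sets SLURM_JOB_ID; the job name, the task range, the memory value, and the input file naming scheme are assumptions chosen for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --partition=slim
# Ten independent tasks with IDs 1..10
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2000
#SBATCH --time=00:10:00
# %A is the array job ID, %a the array task ID
#SBATCH --output=array-%A_%a.out

# Default to 1 so the script can also be tried outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: one input file per task.
input_file="input_${task_id}.dat"

echo "Task ${task_id} would process ${input_file}"
```

Each array task runs this same script with a different SLURM_ARRAY_TASK_ID, so the task ID is the natural way to select per-task input.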
 

Job and Slurm Monitoring

On the command line, use squeue to watch the scheduling queue. To show only your jobs, use squeue -u $USER. The last column of the output gives the reason why a job is not running. More information about a job's parameters can be obtained with scontrol -d show job <jobid>. The possible reasons are described below:

Reason Long description
Dependency This job is waiting for a dependent job to complete.
None No reason is set for this job.
PartitionDown The partition required by this job is in a DOWN state.
PartitionNodeLimit The number of nodes required by this job is outside of its partition's current limits. Can also indicate that required nodes are DOWN or DRAINED.
PartitionTimeLimit The job’s time limit exceeds its partition’s current time limit.
Priority One or more higher priority jobs exist for this partition.
Resources The job is waiting for resources to become available.
NodeDown A node required by the job is down.
BadConstraints The job’s constraints can not be satisfied.
SystemFailure Failure of the SLURM system, a file system, the network, etc.
JobLaunchFailure The job could not be launched. This may be due to a file system problem, invalid program name, etc.
NonZeroExitCode The job was terminated with a non-zero exit code.
TimeLimit The job exhausted its time limit.
InactiveLimit The job reached the system InactiveLimit.
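
When many jobs are queued, a summary of the reasons can be easier to read than the full squeue listing. The sketch below counts pending jobs per reason; count_reasons is a hypothetical helper name, while -h (no header), -t (state filter) and the %r format field (reason) are standard squeue options.

```shell
#!/bin/sh
# Count identical lines on stdin, most frequent first.
# Intended input: one pending reason per line.
count_reasons() {
    sort | uniq -c | sort -rn
}

# On the cluster: list only the reason of each of your pending jobs.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -h -t PENDING -o "%r" | count_reasons
fi
```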

 

Killing jobs

To kill a running job, use the scancel command.
The command scancel <jobid> kills a single job and removes it from the queue.
With scancel -u $USER you can kill all of your jobs at once.

Reservations

If you would like to reserve some nodes for running your jobs, you can ask the Cluster Administrator for a reservation.

Please add the following information to your request mail:

  • start time (note: start time has to be later than the day of the request)
  • duration or end time (note: the longest jobs run 7 days)
  • account
  • node count or cpu count
  • partition

Once we have agreed on your requirements, we will send you an e-mail with your reservation name.
You can then see more information about your reservation with the following command:

scontrol show res reservation_name

If you want to use your reservation, you have to add the parameter --reservation=reservation_name either in your sbatch script or to your srun or salloc command.

Ganglia

Ganglia is an open-source, scalable distributed monitoring system for HPC servers. We have employed Ganglia to monitor the resources on the compute nodes and the login nodes. Users can use Ganglia to monitor the nodes on which their jobs are submitted to get more insight.

 
 
 
 

https://ganglia.ics.usi.ch/

 

 

Matlab R2020a

For the latest version of MATLAB, the university does not have floating licenses anymore.

Hence, in order to use the new version of Matlab R2020a, users have to generate their own licenses.

How to create licenses online for a selected ICS node?

  1. Log in to the MathWorks website at https://ch.mathworks.com/licensecenter/licenses (use your USI email)
  2. Select the license number
  3. Click on 'Install and activate'
  4. Click on 'Activate to retrieve a license'
  5. Click on 'Activate a computer'
  6. Fill in the form: Manually Activate Software on a Computer
    • Release: R2020a
    • Operating system: Linux
    • Host ID: MAC address of the icsnode (see below)
    • Computer Login Name: your username on the ICS cluster
    • Activation Label: the name of icsnode
  7. Download the license and put it in the directory
    • /home//.matlab/R2020a_licenses/
  8. Use a different file name for every license you create.
    • Example: license_icsnode17_40730481_R2020a.lic
  9. Repeat this process for all the nodes where you want to activate Matlab.

How to get the MAC address of a given icsnode?

Use salloc to log in to a node:

$ salloc -w icsnode17

Query the MAC address:

$ ifconfig eno1 | grep ether
ether 0c:c4:7a:b3:5e:60 txqueuelen 1000 (Ethernet)

Here, use 0c:c4:7a:b3:5e:60 as the MAC address of icsnode17.
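
When activating several nodes, the two manual steps (reading the MAC address and naming the license file) can be scripted. The sketch below is illustrative only: extract_mac and license_filename are hypothetical helpers, and the file name pattern simply mirrors the example given in the steps above, with its middle field assumed to be the license number.

```shell
#!/bin/sh
# Print the MAC address found on the "ether" line of ifconfig-style input.
extract_mac() {
    awk '$1 == "ether" { print $2; exit }'
}

# Build a license file name following the example pattern above.
# Arguments: $1 = node name, $2 = license number.
license_filename() {
    echo "license_${1}_${2}_R2020a.lic"
}

# On a compute node one could run, for example:
#   ifconfig eno1 | extract_mac
# Where ifconfig is not installed, Linux also exposes the address in
#   /sys/class/net/eno1/address
```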

Alternatively, contact us (Cluster Administrator) to get the list of MAC addresses of all compute nodes.

Data visualization with Paraview

ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. ParaView's default operation mode is serial. However, it can be coupled with a parallel server running on one or more nodes of our cluster.

The following steps show how to run a ParaView server on the cluster and connect to it from your local ParaView client application:

Warning:

The version of your ParaView client application must match the version of the ParaView server running on the cluster. We currently provide Paraview_5.8.1_MPI_OSMESA, so you need to download the corresponding ParaView client version from the ParaView download page.

Open a terminal and type:

ssh <username>@hpc.ics.usi.ch
module load paraview/5.8.1
salloc -N1 --ntasks-per-node=20
mpirun pvserver --force-offscreen-rendering


Note: the salloc command logs the user into the first node of the allocation.

Open a new terminal and type:

ssh -L 11111:icsnode<##>:11111 <username>@hpc.ics.usi.ch


Note: "icsnode<##>" is the first node of the allocation (see the previous step).
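
If it is unclear which node the allocation starts on, the node name can be derived inside the salloc session and turned into the tunnel command. This is a sketch under assumptions: first_node and tunnel_command are hypothetical helpers, the host name is the SSH address from the network configuration, and it relies on Slurm setting SLURM_JOB_NODELIST within the allocation.

```shell
#!/bin/sh
# Print the first host from a list with one host per line on stdin,
# as produced by "scontrol show hostnames".
first_node() {
    head -n 1
}

# Build the tunnel command. Arguments: $1 = node name, $2 = cluster user name.
tunnel_command() {
    node=$1
    user=$2
    echo "ssh -L 11111:${node}:11111 ${user}@hpc.ics.usi.ch"
}

# Inside the salloc session, SLURM_JOB_NODELIST holds the allocated nodes.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | first_node)
    echo "Run on your local machine:"
    tunnel_command "$node" "${USER:-your_username}"
fi
```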

Start Paraview client and create a connection with the following settings:

Name: paraview Server
Type: Client / Server
Host: localhost
Port: 11111

Connect to the remote ParaView server using the "paraview Server" connection.

Contact us

For support, you can contact:

Cluster Administrator