Multi-Node Jobs

Large parallel Abaqus jobs can be run across multiple nodes using MPI.

Tip

Multi-node parallelism is best-suited to Abaqus/Explicit. Abaqus/Standard (implicit) does not scale well across multiple nodes.

Job Script Template

Here is an example job submission script for a multi-node Abaqus job:

 1#!/usr/bin/bash -l
 2# 
 3#SBATCH --job-name=my_job
 4#SBATCH --nodes=2
 5#SBATCH --ntasks-per-node=28
 6#SBATCH --cpus-per-task=1 
 7#SBATCH --time=0:10:00 
 8#SBATCH --mem-per-cpu=4000M
 9#SBATCH --account=aero012345
10
11# Load modules 
12module load apps/abaqus/2018
13module load languages/Intel-OneAPI/2022.2.0              # BlueCrystal (Phase 4) and BluePebble
14
15# Unset SLURM's Global Task ID for ABAQUS's PlatformMPI to work 
16unset SLURM_GTIDS 
17
18# Get allocated nodes for Abaqus
19env_file=abaqus_v6.env 
20node_list=$(scontrol show hostname ${SLURM_NODELIST} | sort -u) 
21mp_host_list="[" 
22for host in ${node_list}; do 
23    mp_host_list="${mp_host_list}['$host', ${SLURM_CPUS_ON_NODE}]," 
24done 
25mp_host_list=$(echo ${mp_host_list} | sed -e "s/,$/]/") 
26echo "mp_host_list=${mp_host_list}"  >> ${env_file} 
27
28# Launch Abaqus 
29abaqus job=<job-name> cpus=$((SLURM_NTASKS_PER_NODE*SLURM_NNODES)) user=<usub-file> mp_mode=mpi double=both interactive

There are a number of important differences from the single-node job script example:

  • More than one node is requested

  • We request multiple tasks per node (distributed parallelism), instead of multiple CPUs per task (thread-based parallelism)

  • We have extra lines that write mp_host_list to abaqus_v6.env, to inform Abaqus's Platform MPI about the nodes that have been allocated by the job scheduler

  • We must use mp_mode=mpi for multi-node parallelism
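To make the host-list step more concrete, the following is a minimal sketch of how the mp_host_list entry is assembled. It is not part of the template: the hostnames node01 and node02, the variable cpus_on_node, and the helper build_mp_host_list are hypothetical stand-ins for the values that scontrol and SLURM_CPUS_ON_NODE supply inside a real job.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for values Slurm provides inside a real job.
node_list="node01 node02"   # assumed hostnames, for illustration only
cpus_on_node=28             # mirrors --ntasks-per-node=28 in the template

# Build a list of the form [['host', cpus],['host', cpus]].
# build_mp_host_list is an illustrative helper, not an Abaqus or Slurm command.
build_mp_host_list() {
    local cpus="$1"; shift
    local entries="" host
    for host in "$@"; do
        entries="${entries}['${host}', ${cpus}],"
    done
    # Drop the trailing comma and wrap the entries in brackets
    printf '[%s]' "${entries%,}"
}

# node_list is deliberately unquoted so that it splits into one argument per host
mp_host_list=$(build_mp_host_list "${cpus_on_node}" ${node_list})
echo "mp_host_list=${mp_host_list}"
```

For the two assumed nodes this prints mp_host_list=[['node01', 28],['node02', 28]], which has the same form as the line the template appends to abaqus_v6.env.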

How to use

Follow the same steps as described for the single-node example, except that you can scale the parallelism by changing the number of nodes (line 4).
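As a rough illustration of how the core count passed to Abaqus scales with the node count, the sketch below repeats the arithmetic from the launch line for a few values of --nodes. The helper total_cpus and the node counts shown are assumptions chosen for illustration; inside a job, Slurm sets SLURM_NTASKS_PER_NODE and SLURM_NNODES for you.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the template): tasks per node times node count.
total_cpus() {
    echo $(( $1 * $2 ))
}

ntasks_per_node=28          # mirrors --ntasks-per-node=28 in the template
for nnodes in 1 2 4; do     # example values for --nodes
    echo "nodes=${nnodes} -> cpus=$(total_cpus "${ntasks_per_node}" "${nnodes}")"
done
```

With the template's 28 tasks per node, two nodes give cpus=56 and four nodes give cpus=112. Because the launch line computes this value from the Slurm variables, only the --nodes line needs to change.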