Host Configuration In-Depth#
The hosts.cfg file defines the execution environments where your tasks run: local machines, HPC clusters, and cloud resources, along with how to access the software environments on each.
Structure Overview#
A host configuration has the following structure:
[hostname]
scheduler = slurm
scratch_dir = /scratch/$USER
[[patterns]]
# Auto-detect this host
[[queues]]
[[[normal]]]
# Queue definitions
[[envs]]
[[[myenv]]]
# Environment configurations
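The bracket count sets the nesting depth: one pair for a host, two for its subsections, three for the entries inside them. The Python sketch below illustrates that rule with a tiny stand-alone parser. It is not woom's parser; `parse_nested_sections` and `SAMPLE` are hypothetical names, and inline comments and quoted values are deliberately not handled.

```python
def parse_nested_sections(text):
    """Parse ConfigObj-style text where the bracket count gives the depth."""
    root = {}
    stack = [root]  # stack[d] is the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))
            name = line.strip("[]").strip()
            if depth > len(stack):
                raise ValueError(f"section {name!r} skips a nesting level")
            del stack[depth:]  # close deeper and sibling sections
            stack.append(stack[-1].setdefault(name, {}))
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


# Hypothetical sample mirroring the structure overview.
SAMPLE = """
[hostname]
scheduler = slurm
scratch_dir = /scratch/$USER
[[patterns]]
[[queues]]
[[[normal]]]
[[envs]]
[[[myenv]]]
"""

config = parse_nested_sections(SAMPLE)
print(config["hostname"]["scheduler"])     # slurm
print(sorted(config["hostname"]["envs"]))  # ['myenv']
```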
Host Basics#
Minimal Host#
[local]
scheduler = background
scratch_dir = /tmp
Required Fields:
Host section name (e.g., [local], [hpc_cluster])
scheduler - How to run jobs (background, slurm, pbspro)
scratch_dir - Temporary working directory
Host Selection#
Explicit Selection:
woom run --host hpc_cluster
Auto-Detection:
Configure patterns to auto-detect:
[datarmor]
scheduler = slurm
[[patterns]]
hostname = datarmor.*
When the machine's hostname matches the pattern, this host configuration is selected automatically.
Multiple Hosts#
Define different configurations for different systems:
[laptop]
scheduler = background
scratch_dir = /tmp
[workstation]
scheduler = background
scratch_dir = /scratch
[university_cluster]
scheduler = slurm
scratch_dir = /scratch/$USER
[national_supercomputer]
scheduler = pbspro
scratch_dir = /work/$USER
Schedulers#
Background Scheduler#
For local execution without a batch scheduler:
[local]
scheduler = background
scratch_dir = /tmp
Characteristics:
Jobs run as background processes
No queuing system
Immediate execution
Good for development and testing
Limited parallelism
SLURM Scheduler#
For systems using SLURM Workload Manager:
[slurm_cluster]
scheduler = slurm
scratch_dir = /scratch/$USER
[[queues]]
[[[normal]]]
partition = compute
qos = normal
account = myproject
[[[gpu]]]
partition = gpu
gres = gpu:4
account = myproject
SLURM-Specific Options:
partition - SLURM partition name
qos - Quality of service
account - Billing account
reservation - Reservation name
gres - Generic resources (like GPUs)
constraint - Node constraints
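Each of these option names is also a long option of sbatch, so a queue definition maps naturally onto `#SBATCH` directive lines. The Python sketch below shows one way that mapping could look. How woom actually writes its job scripts is not described here, so `SLURM_FLAGS` and `sbatch_directives` are hypothetical names and the output order is an assumption.

```python
# Option names from the list above; each is also a long sbatch option.
SLURM_FLAGS = ("partition", "qos", "account", "reservation", "gres", "constraint")


def sbatch_directives(queue):
    """Return '#SBATCH --key=value' lines for the options set in `queue`."""
    return [f"#SBATCH --{key}={queue[key]}" for key in SLURM_FLAGS if key in queue]


# The [[[gpu]]] queue from the example above, as a plain dict.
gpu_queue = {"partition": "gpu", "gres": "gpu:4", "account": "myproject"}

for line in sbatch_directives(gpu_queue):
    print(line)
```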
PBS Pro Scheduler#
For systems using PBS Professional:
[pbspro_cluster]
scheduler = pbspro
scratch_dir = /work/$USER
[[queues]]
[[[normal]]]
queue_name = workq
project = PROJ001
PBS-Specific Options:
queue_name - PBS queue name
project - Project code for accounting
Queue Configuration#
Queues define resource pools and policies.
Basic Queue#
[[queues]]
[[[normal]]]
partition = compute
account = myaccount
Generic vs Scheduler-Specific#
Generic Options (work across schedulers):
[[[normal]]]
# Translated automatically to scheduler syntax
Scheduler-Specific Options (SLURM example):
[[[gpu_queue]]]
partition = gpu
gres = gpu:4
qos = high
Multiple Queues#
Define different queues for different resource needs:
[[queues]]
[[[debug]]]
partition = debug
# Fast queue with limited time
[[[normal]]]
partition = compute
account = project123
[[[highmem]]]
partition = highmem
account = project123
[[[gpu]]]
partition = gpu
gres = gpu:4
account = project123
[[[long]]]
partition = compute
qos = long
account = project123
Queue Inheritance#
Avoid repetition with inheritance:
[[queues]]
[[[base]]]
account = myproject
partition = compute
[[[normal]]]
inherit = base
[[[highmem]]]
inherit = base
partition = highmem
[[[gpu]]]
inherit = base
partition = gpu
gres = gpu:4
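To make the effect of inherit concrete, the Python sketch below resolves each queue by starting from its parent's options and letting the queue's own options override them. This is an illustration of the idea under that assumption, not woom's resolution code; `resolve_queue` is a hypothetical helper.

```python
def resolve_queue(queues, name, _seen=()):
    """Return the options of queue `name` with inherited values filled in."""
    if name in _seen:
        raise ValueError(f"inheritance cycle at {name!r}")
    own = dict(queues[name])
    parent = own.pop("inherit", None)
    if parent is None:
        return own
    merged = resolve_queue(queues, parent, _seen + (name,))
    merged.update(own)  # the queue's own options win over inherited ones
    return merged


# The queues from the example above, as plain dicts.
queues = {
    "base": {"account": "myproject", "partition": "compute"},
    "normal": {"inherit": "base"},
    "highmem": {"inherit": "base", "partition": "highmem"},
    "gpu": {"inherit": "base", "partition": "gpu", "gres": "gpu:4"},
}

print(resolve_queue(queues, "gpu"))
```

Under this assumption, normal ends up identical to base, while highmem and gpu keep the shared account and replace only the partition.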
Environment Configuration#
Environments define software stacks for tasks.
No Environment#
Tasks can run without a dedicated environment:
# No [[envs]] section needed
# Tasks with env = None use default environment
Module Environment#
Load software via environment modules:
[[envs]]
[[[ocean_model]]]
modules = netcdf/4.8.1, hdf5/1.12.0, openmpi/4.1.1
[[[python_env]]]
modules = python/3.9, scipy-stack/2023a
Multiple Modules:
Modules in a comma-separated list are loaded in the order given.
Conda/Mamba Environment#
Activate conda environments:
[[envs]]
[[[analysis]]]
conda = analysis_env
[[[forecast]]]
mamba = forecast_env
Virtualenv/venv#
Activate Python virtual environments:
[[envs]]
[[[python_analysis]]]
venv = /home/user/envs/analysis
UV Virtual Environment#
Activate UV environments:
[[envs]]
[[[fast_python]]]
uv_venv = /home/user/.venv
Combined Environments#
Combine multiple environment types:
[[envs]]
[[[full_stack]]]
modules = gcc/11.2, openmpi/4.1
conda = scientific_py
exports = OMP_NUM_THREADS=8, MKL_NUM_THREADS=8
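A combined environment amounts to a short shell preamble that runs before the task. The Python sketch below renders the full_stack example into such a preamble. The ordering (modules, then conda, then exports) and the exact commands emitted are assumptions for illustration; `env_preamble` and `split_list` are hypothetical helpers, not part of woom.

```python
def split_list(value):
    """Split a comma-separated option value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def env_preamble(env):
    """Render an environment dict into shell lines (assumed ordering)."""
    lines = [f"module load {module}" for module in split_list(env.get("modules", ""))]
    if "conda" in env:
        lines.append(f"conda activate {env['conda']}")
    lines += [f"export {pair}" for pair in split_list(env.get("exports", ""))]
    return lines


# The [[[full_stack]]] environment from the example above.
full_stack = {
    "modules": "gcc/11.2, openmpi/4.1",
    "conda": "scientific_py",
    "exports": "OMP_NUM_THREADS=8, MKL_NUM_THREADS=8",
}

print("\n".join(env_preamble(full_stack)))
```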
Raw Text Environments#
For complex setup, provide raw shell commands:
[[envs]]
[[[custom]]]
raw_text = '''
source /opt/custom/setup.sh
export CUSTOM_VAR=value
module load special_software
'''
Environment Variables#
Set variables in the environment:
[[envs]]
[[[model_env]]]
modules = netcdf/4.8
exports = '''
OMP_NUM_THREADS=16
DATA_ROOT=/data/ocean
MODEL_VERSION=v2.3
'''
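The exports option appears in two shapes in this guide: a comma-separated line and a multi-line block. The Python sketch below shows one way both could be read into the same name/value mapping. It assumes values contain neither commas nor newlines; `parse_exports` is a hypothetical helper, not woom's parser.

```python
import re


def parse_exports(value):
    """Parse 'A=1, B=2' or a multi-line block of NAME=value pairs into a dict."""
    pairs = {}
    for item in re.split(r"[,\n]", value):
        item = item.strip()
        if not item:
            continue  # ignore blank lines and trailing separators
        name, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected NAME=value, got {item!r}")
        pairs[name.strip()] = val.strip()
    return pairs


inline = "OMP_NUM_THREADS=8, MKL_NUM_THREADS=8"
multiline = """
OMP_NUM_THREADS=16
DATA_ROOT=/data/ocean
MODEL_VERSION=v2.3
"""

print(parse_exports(inline))
print(parse_exports(multiline))
```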
Auto-Detection Patterns#
Configure pattern matching to automatically select hosts.
Hostname Pattern#
[university_hpc]
scheduler = slurm
[[patterns]]
hostname = login[0-9]+.hpc.university.edu
Matches: login1.hpc.university.edu, login2.hpc.university.edu, etc.
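A pattern can be checked against sample hostnames before it goes into hosts.cfg. The Python sketch below assumes patterns are regular expressions matched against the whole hostname; whether woom anchors the match this way is an assumption. It also shows that an unescaped dot matches any character, so escaping the dots makes the pattern stricter. `matches`, `PATTERN` and `ESCAPED` are hypothetical names.

```python
import re

PATTERN = r"login[0-9]+.hpc.university.edu"     # as written in the example
ESCAPED = r"login[0-9]+\.hpc\.university\.edu"  # dots matched literally


def matches(pattern, hostname):
    """Return True if the whole hostname matches the regular expression."""
    return re.fullmatch(pattern, hostname) is not None


for host in ("login1.hpc.university.edu", "login12xhpc.university.edu"):
    print(host, matches(PATTERN, host), matches(ESCAPED, host))
```

The second hostname is accepted by the unescaped pattern because the first dot matches the letter x, and rejected by the escaped one.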
Environment Variables#
[national_center]
scheduler = pbspro
[[patterns]]
env_vars = CLUSTER_NAME=national_hpc
Matches when the environment variable has the given value.
Multiple Patterns#
Combine patterns (AND logic):
[specific_cluster]
scheduler = slurm
[[patterns]]
hostname = compute.*
env_vars = SITE=facility_a
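The AND logic can be sketched in Python as follows: a host matches only if every pattern it defines is satisfied. The sketch assumes full-string regular expression matching for hostname and exact value comparison for env_vars; `host_matches` is a hypothetical helper, not woom's API.

```python
import re


def host_matches(patterns, hostname, environ):
    """Return True only if every configured pattern matches (AND logic)."""
    if "hostname" in patterns and not re.fullmatch(patterns["hostname"], hostname):
        return False
    for item in patterns.get("env_vars", "").split(","):
        item = item.strip()
        if not item:
            continue
        name, _, expected = item.partition("=")
        if environ.get(name.strip()) != expected.strip():
            return False
    return True


# The [[patterns]] section of [specific_cluster] above.
patterns = {"hostname": "compute.*", "env_vars": "SITE=facility_a"}

print(host_matches(patterns, "compute042", {"SITE": "facility_a"}))  # True
print(host_matches(patterns, "compute042", {"SITE": "facility_b"}))  # False
print(host_matches(patterns, "login1", {"SITE": "facility_a"}))      # False
```

In real use the last two arguments would come from `socket.gethostname()` and `os.environ`.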
Complete Host Examples#
Example 1: Development Laptop#
[laptop]
scheduler = background
scratch_dir = /tmp/woom
[[envs]]
[[[python]]]
conda = dev_env
Example 2: University SLURM Cluster#
[university_hpc]
scheduler = slurm
scratch_dir = /scratch/$USER
[[patterns]]
hostname = login.*.hpc.edu
[[queues]]
[[[debug]]]
partition = debug
account = course101
# 30 min limit
[[[normal]]]
partition = compute
account = research_proj
qos = normal
[[[highmem]]]
partition = highmem
account = research_proj
[[envs]]
[[[ocean_model]]]
modules = gcc/11, netcdf-fortran/4.5, openmpi/4.1
[[[python_analysis]]]
modules = python/3.10, scipy/1.9
exports = PYTHONUNBUFFERED=1
Example 3: National Supercomputer (PBS)#
[national_hpc]
scheduler = pbspro
scratch_dir = /work/$USER/scratch
[[patterns]]
hostname = login.*.national.gov
[[queues]]
[[[standard]]]
queue_name = standard
project = ATMO12345
[[[large]]]
queue_name = capability
project = ATMO12345
[[envs]]
[[[intel_mpi]]]
raw_text = '''
module purge
module load intel/2023
module load impi/2021
module load netcdf/4.9
'''
[[[analysis]]]
modules = python/3.11
venv = /work/$USER/venvs/analysis
Example 4: Multi-Site Configuration#
# Site A - SLURM
[site_a]
scheduler = slurm
scratch_dir = /scratch/$USER
[[patterns]]
hostname = login-a.*
[[queues]]
[[[normal]]]
partition = compute
account = proj_a
[[envs]]
[[[model]]]
modules = netcdf/4.8, openmpi/4.1
# Site B - PBS
[site_b]
scheduler = pbspro
scratch_dir = /work/$USER
[[patterns]]
hostname = login-b.*
[[queues]]
[[[normal]]]
queue_name = standard
project = proj_b
[[envs]]
[[[model]]]
modules = netcdf/4.7, mpt/2.25
# Local development
[local]
scheduler = background
scratch_dir = /tmp
[[envs]]
[[[model]]]
conda = ocean_dev
Advanced Features#
Host Inheritance#
Share configuration between similar hosts:
[base_slurm]
scheduler = slurm
[[queues]]
[[[normal]]]
account = myproject
[cluster_a]
inherit = base_slurm
scratch_dir = /scratch/a/$USER
[[patterns]]
hostname = login-a.*
[cluster_b]
inherit = base_slurm
scratch_dir = /scratch/b/$USER
[[patterns]]
hostname = login-b.*
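Host inheritance differs from queue inheritance in that hosts contain nested sections. The Python sketch below assumes nested sections are merged key by key rather than replaced wholesale, which this guide does not confirm. `deep_merge` and `resolve_host` are hypothetical helpers.

```python
import copy


def deep_merge(parent, child):
    """Merge `child` over `parent`, recursing into nested sections."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_host(hosts, name):
    """Return host `name` with everything from its `inherit` chain applied."""
    own = {key: value for key, value in hosts[name].items() if key != "inherit"}
    parent = hosts[name].get("inherit")
    return deep_merge(resolve_host(hosts, parent), own) if parent else own


# base_slurm and cluster_a from the example above, as plain dicts.
hosts = {
    "base_slurm": {
        "scheduler": "slurm",
        "queues": {"normal": {"account": "myproject"}},
    },
    "cluster_a": {
        "inherit": "base_slurm",
        "scratch_dir": "/scratch/a/$USER",
        "patterns": {"hostname": "login-a.*"},
    },
}

print(resolve_host(hosts, "cluster_a"))
```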
Custom Scheduler Options#
Pass extra options to the scheduler:
[[queues]]
[[[special]]]
partition = compute
# Custom SLURM options
extra_options = --constraint=ib&haswell
Parameter Overrides#
Override workflow parameters per host:
[laptop]
scratch_dir = /tmp
[[params]]
nprocs = 4 # Laptop has fewer cores
[supercomputer]
scratch_dir = /scratch/$USER
[[params]]
nprocs = 1024 # Use many cores
Access from templates:
mpirun -n {{ params.nprocs }} ./model
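The Python sketch below shows how the same template line yields a different command on each host. It uses a small regular-expression substitution as a stand-in for the real template engine, which this guide does not name; `render` and `COMMAND` are hypothetical.

```python
import re


def render(template, context):
    """Replace each {{ dotted.name }} with the value looked up in `context`."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError for an unknown name
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


COMMAND = "mpirun -n {{ params.nprocs }} ./model"

print(render(COMMAND, {"params": {"nprocs": 4}}))     # mpirun -n 4 ./model
print(render(COMMAND, {"params": {"nprocs": 1024}}))  # mpirun -n 1024 ./model
```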
Best Practices#
Use Auto-Detection: Configure patterns for automatic host selection
Organize by Purpose: Group queues logically (debug, normal, long, highmem, gpu)
Document Requirements: Comment what modules/software are needed
Test Locally First: Have a local/background host for testing
Use Inheritance: Avoid repeating common configurations
Keep Secrets Out: Don’t put passwords or keys in configuration
Environment Modules: Prefer modules over hardcoded paths
Validate Accounts: Ensure account/project codes are correct
Check Queue Limits: Know walltime and resource limits
Version Control: Track host configs in git (without secrets)
Common Patterns#
Pattern 1: Development + Production#
[dev]
scheduler = background
scratch_dir = /tmp
[prod]
scheduler = slurm
scratch_dir = /scratch/$USER
[[queues]]
[[[normal]]]
partition = compute
Pattern 2: Multi-Tier Queues#
[[queues]]
[[[debug]]]
# Fast, limited
partition = debug
[[[normal]]]
# Standard
partition = compute
[[[long]]]
# Extended time
partition = compute
qos = long
[[[highmem]]]
# More memory
partition = highmem
[[[gpu]]]
# GPU access
partition = gpu
gres = gpu:4
Pattern 3: Software Stacks#
[[envs]]
[[[gnu_stack]]]
modules = gcc/11, openmpi/4, netcdf/4.8
[[[intel_stack]]]
modules = intel/2023, impi/2021, netcdf/4.9
[[[python_stack]]]
modules = python/3.10
conda = analysis_env
Troubleshooting#
Auto-Detection Not Working#
Check:
Pattern matches actual hostname: echo $HOSTNAME
Environment variables are set: env | grep CLUSTER
No typos in pattern syntax
Only one host matches (multiple matching hosts create ambiguity)
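To check for ambiguity, list every host whose hostname pattern matches the current machine. The Python sketch below does this for a hand-copied set of patterns, assuming full-string regular expression matching. `matching_hosts` is a hypothetical helper, and the any_login entry is invented to show an overlap.

```python
import re


def matching_hosts(host_patterns, hostname):
    """Return the names of all hosts whose pattern matches `hostname`."""
    return [
        name
        for name, pattern in host_patterns.items()
        if re.fullmatch(pattern, hostname)
    ]


# Patterns copied by hand from a hosts.cfg; "any_login" is a made-up overlap.
host_patterns = {
    "site_a": "login-a.*",
    "site_b": "login-b.*",
    "any_login": "login.*",
}

found = matching_hosts(host_patterns, "login-a01")
print(found)
if len(found) > 1:
    print("ambiguous: more than one host matches")
```

Pass `socket.gethostname()` as the second argument to test against the machine you are logged in to.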
Module Load Fails#
Check:
Module exists: module avail modulename
Module dependencies loaded first
Correct module version specified
Module system initialized
Environment Not Activated#
Check:
Conda/venv path is correct
Environment exists: conda env list
Correct environment type specified (conda vs mamba vs venv vs uv_venv)
Shell initialization allows activation
Jobs Not Submitting#
Check:
Queue/partition exists: sinfo (SLURM) or qstat -q (PBS)
Account/project is valid
Resource requests within limits
User has access to queue
Migration Guide#
Moving Between Systems#
When moving a workflow to a new system:
Create new host configuration
Test with a simple task: woom run --host newhpc
Adjust paths (scratch_dir, data locations)
Update queue/partition names
Verify environment modules/software
Test full workflow
Scheduler Migration#
Moving from SLURM to PBS (or vice versa):
Change the scheduler setting
Update queue configuration (partition → queue_name, etc.)
Test submission with simple job
Adjust resource request translations if needed
Update any scheduler-specific custom options
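The key renaming in the second step can be sketched in Python as follows. Only partition → queue_name comes from this guide. Mapping account to project is an assumption based on both being described as accounting codes. `SLURM_TO_PBS` and `slurm_queue_to_pbs` are hypothetical names.

```python
# partition -> queue_name is from this guide; account -> project is assumed.
SLURM_TO_PBS = {"partition": "queue_name", "account": "project"}


def slurm_queue_to_pbs(queue):
    """Rename known keys and report the ones that need manual review."""
    converted, unmapped = {}, []
    for key, value in queue.items():
        if key in SLURM_TO_PBS:
            converted[SLURM_TO_PBS[key]] = value
        else:
            unmapped.append(key)
    return converted, unmapped


slurm_queue = {"partition": "compute", "qos": "normal", "account": "myproject"}
pbs_queue, needs_review = slurm_queue_to_pbs(slurm_queue)
print(pbs_queue)     # {'queue_name': 'compute', 'project': 'myproject'}
print(needs_review)  # ['qos']
```

Only the keys are renamed. The values (queue names, project codes) still have to be replaced with those valid on the target system.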
See Also#
Task Configuration In-Depth - Configure tasks to run on hosts
Hosts configuration specifications - Complete configuration reference
Input environment variables - Environment variables available