Task Configuration In-Depth#
Tasks are the fundamental units of work in woom. The tasks.cfg file defines what each task does, where it runs, what resources it needs, and what outputs it produces.
Structure Overview#
A task configuration consists of several sections:
[task_name]
[[content]]
commandline = echo "Hello World"
run_dir = {{ scratch_dir }}/run
env = myenv
template = job.sh
[[artifacts]]
[[[output_file]]]
path = output.txt
check = True
[[fill]]
[[[config]]]
template = model.cfg.j2
destination = {{ task_run_dir }}/model.cfg
[[submit]]
queue = normal
nnodes = 1
ncpus = 16
time = 02:00:00
blocking = True
Task Content#
The [[content]] section defines what the task executes and where.
Command Line#
The commandline specifies what to execute:
[[content]]
commandline = ./my_model input.nml
Simple Commands:
commandline = python analyze_data.py
Multiple Commands:
Use shell syntax (&&, ;, ||):
commandline = cd {{ task_run_dir }} && ./prepare.sh && ./run_model.exe
Multi-Line Commands:
commandline = '''
set -e
export DATA_DIR=/scratch/data
python preprocess.py
mpirun -n 128 ./ocean_model
python postprocess.py
'''
With Template Variables:
commandline = mpirun -n {{ params.nprocs }} ./model -i {{ cycle.begin_date }}
Run Directory#
The run_dir is where the command executes:
[[content]]
run_dir = /scratch/{{ app.name }}/{{ task_path }}
Special Values:
current - Use the current working directory
None or empty - No cd before execution
Common Patterns:
# Unique per task, cycle, and member
run_dir = {{ scratch_dir }}/{{ task_path }}
# Shared run directory
run_dir = {{ workflow_dir }}/run
# Organized by date
run_dir = /scratch/runs/{{ cycle.date }}
Available Variables:
{{ scratch_dir }} - Scratch directory from host config
{{ workflow_dir }} - Where workflow.cfg is located
{{ task_path }} - Path including app/cycle/task/member
{{ app.name }}, {{ cycle.date }}, {{ member.label }}
Environment#
The env specifies which environment configuration to use (defined in hosts.cfg):
[[content]]
env = python_env
No Environment:
env = None
Multiple Environments:
Use task inheritance:
[base_python_task]
[[content]]
env = python_env
[my_task]
inherit = base_python_task
[[content]]
commandline = python my_script.py
Template#
The template specifies which Jinja2 template renders the job script (defaults to job.sh):
[[content]]
template = custom_job.sh
Create custom templates in the templates/ directory to override the default.
Task Inheritance#
Tasks can inherit from other tasks to share configuration:
[base_model_task]
[[content]]
env = ocean_env
run_dir = {{ scratch_dir }}/{{ task_path }}
[[submit]]
queue = compute
time = 04:00:00
ncpus = 16
[run_hindcast]
inherit = base_model_task
[[content]]
commandline = ./ocean_model hindcast.nml
[run_forecast]
inherit = base_model_task
[[content]]
commandline = ./ocean_model forecast.nml
[[submit]]
time = 02:00:00 # Override with shorter time
Inheritance Rules:
Child tasks override parent values
Deeply nested sections are merged
Set value to None to unset inherited value
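The three rules above can be illustrated with a small Python sketch. It is not woom's implementation: the `merge_task` helper is hypothetical, tasks are modelled as plain nested dictionaries, and the `None` written in tasks.cfg is represented here by Python's `None`.

```python
from copy import deepcopy


def merge_task(parent, child):
    """Return the parent settings overlaid with the child settings.

    Hypothetical helper that mimics the documented inheritance rules.
    """
    merged = deepcopy(parent)
    for key, value in child.items():
        if value is None:
            # None unsets the inherited value.
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged recursively.
            merged[key] = merge_task(merged[key], value)
        else:
            # The child value overrides the parent value.
            merged[key] = deepcopy(value)
    return merged


base_model_task = {
    "content": {"env": "ocean_env", "run_dir": "{{ scratch_dir }}/{{ task_path }}"},
    "submit": {"queue": "compute", "time": "04:00:00", "ncpus": 16},
}
run_forecast = {
    "content": {"commandline": "./ocean_model forecast.nml"},
    "submit": {"time": "02:00:00", "ncpus": None},
}

resolved = merge_task(base_model_task, run_forecast)
print(resolved["submit"])
```

In this sketch the child keeps the inherited queue, replaces the walltime, and drops ncpus, while the parent dictionary is left unchanged.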
Artifacts#
Artifacts are output files that woom tracks and validates.
Basic Artifacts#
[[artifacts]]
[[[output_data]]]
path = output.nc
check = True
[[[log_file]]]
path = model.log
check = True
Fields:
path - File path (absolute or relative to run_dir)
check - If True, woom verifies file exists after task completion
callable - If True, path is a function name that generates the path
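To make the path and check fields concrete, here is a minimal Python sketch of the kind of verification they describe. The helpers `resolve_artifact` and `missing_artifacts` are hypothetical names and are not part of woom. Treating an absent check as False is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def resolve_artifact(path, run_dir):
    """Keep absolute paths as they are; join relative paths to run_dir."""
    p = Path(path)
    return p if p.is_absolute() else Path(run_dir) / p


def missing_artifacts(artifacts, run_dir):
    """Return the names of checked artifacts whose file does not exist."""
    missing = []
    for name, spec in artifacts.items():
        # Assumption: an artifact without "check" is not verified.
        if not spec.get("check", False):
            continue
        if not resolve_artifact(spec["path"], run_dir).exists():
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as run_dir:
    # Simulate a task that wrote output.nc but no model.log.
    (Path(run_dir) / "output.nc").touch()
    artifacts = {
        "output_data": {"path": "output.nc", "check": True},
        "log_file": {"path": "model.log", "check": True},
        "debug_log": {"path": "debug.log", "check": False},
    }
    result = missing_artifacts(artifacts, run_dir)
    print(result)
```

Only log_file is reported: output.nc exists, and debug.log is skipped because its check is False.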
Multiple Files#
Specify a list of files:
[[artifacts]]
[[[outputs]]]
path = file1.nc, file2.nc, file3.nc
check = True
Template Paths#
Use template variables in paths:
[[artifacts]]
[[[model_output]]]
path = {{ task_run_dir }}/output_{{ cycle.token }}.nc
check = True
[[[restart_file]]]
path = {{ task_run_dir }}/restart_{{ cycle.end_date_str }}.nc
check = True
Dynamic Paths with Callables#
For complex path generation, use a callable:
[[artifacts]]
[[[ensemble_outputs]]]
path = generate_ensemble_paths
check = True
callable = True
[[[[kwargs]]]]
base_dir = {{ task_run_dir }}
pattern = member_{:03d}.nc
Register the generator function in an extension file:
# ext/artifacts_generators.py
from woom.tasks import ARTIFACTS_GENERATORS

def generate_ensemble_paths(context, base_dir, pattern):
    """Generate paths for all ensemble members"""
    if context['member'] is None:
        return []
    paths = []
    for i in range(1, 51):  # 50 members
        paths.append(f"{base_dir}/{pattern.format(i)}")
    return paths

ARTIFACTS_GENERATORS['generate_ensemble_paths'] = generate_ensemble_paths
Optional Artifacts#
Set check = False for optional outputs:
[[artifacts]]
[[[required_output]]]
path = results.nc
check = True
[[[optional_log]]]
path = debug.log
check = False
Template Filling#
The [[fill]] section defines template files to fill before task execution.
Basic Template Filling#
[[fill]]
[[[namelist]]]
template = ocean.nml.j2
destination = {{ task_run_dir }}/ocean.nml
[[[config]]]
template = config.xml.j2
destination = {{ task_run_dir }}/config.xml
How it Works:
Template file is loaded from the templates/ directory
Rendered with current context (task, cycle, member, params)
Written to destination path
Happens automatically before task command executes
Template Example#
Create templates/ocean.nml.j2:
&time_control
start_date = "{{ cycle.begin_date_str }}"
end_date = "{{ cycle.end_date_str }}"
dt = {{ params.timestep }}
/
&grid
nx = {{ params.grid_nx }}
ny = {{ params.grid_ny }}
/
&output
output_file = "{{ task_run_dir }}/output.nc"
output_freq = {{ params.output_frequency }}
/
Configure in tasks.cfg:
[run_model]
[[content]]
commandline = ./ocean_model ocean.nml
[[fill]]
[[[namelist]]]
template = ocean.nml.j2
destination = {{ task_run_dir }}/ocean.nml
Multiple Templates#
Fill multiple configuration files:
[[fill]]
[[[main_config]]]
template = model.cfg.j2
destination = {{ task_run_dir }}/model.cfg
[[[forcing_list]]]
template = forcings.txt.j2
destination = {{ task_run_dir }}/forcings.txt
[[[submission_script]]]
template = post_process.sh.j2
destination = {{ task_run_dir }}/post_process.sh
Member-Specific Configurations#
Generate different configurations for ensemble members:
[[fill]]
[[[member_config]]]
template = ensemble_config.j2
destination = {{ task_run_dir }}/config_{{ member.label }}.cfg
Template:
member_id = {{ member.id }}
perturbation = {{ member.perturbation }}
seed = {{ member.seed }}
Submission Configuration#
The [[submit]] section controls how and where tasks execute.
Queue Selection#
[[submit]]
queue = normal
Queues are defined in hosts.cfg. Common names:
normal - Standard compute queue
high_mem - High memory nodes
gpu - GPU nodes
debug - Fast debug queue with limits
long - Extended time limit queue
Resource Requirements#
[[submit]]
nnodes = 2
ncpus = 32
ngpus = 4
memory = 128GB
pmem = 4GB
time = 06:00:00
Fields:
nnodes - Number of compute nodes
ncpus - Number of CPU cores per task
ngpus - Number of GPUs
memory - Total memory limit
pmem - Per-process memory limit
time - Walltime limit (HH:MM:SS format)
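Because time must use the HH:MM:SS format, it can help to validate values before submitting. The following Python sketch shows one way to do that. The `walltime_seconds` helper is hypothetical and is not provided by woom.

```python
import re

# Hours may exceed two digits; minutes and seconds must be 00-59.
_WALLTIME = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)$")


def walltime_seconds(value):
    """Convert an HH:MM:SS walltime string to seconds, or raise ValueError."""
    match = _WALLTIME.match(value)
    if match is None:
        raise ValueError(f"expected HH:MM:SS, got {value!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(walltime_seconds("06:00:00"))
```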
Scheduler Translation:
Woom translates these to scheduler-specific options:
- SLURM:
nnodes → --nodes=2
ncpus → --ntasks-per-node=32
time → --time=06:00:00
- PBS Pro:
nnodes=2, ncpus=32 → -l select=2:ncpus=32
time → -l walltime=06:00:00
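The mapping above can be written out as a short Python sketch. It covers only the three fields listed and is not how woom builds its submission lines. The function names `slurm_options` and `pbs_options` are hypothetical.

```python
def slurm_options(submit):
    """Build SLURM flags for the fields present in the submit settings."""
    mapping = (
        ("nnodes", "--nodes"),
        ("ncpus", "--ntasks-per-node"),
        ("time", "--time"),
    )
    return [f"{flag}={submit[key]}" for key, flag in mapping if key in submit]


def pbs_options(submit):
    """Build PBS Pro resource requests for the same settings."""
    options = []
    if "nnodes" in submit and "ncpus" in submit:
        # PBS Pro combines nodes and cores in a single select statement.
        options.append(f"-l select={submit['nnodes']}:ncpus={submit['ncpus']}")
    if "time" in submit:
        options.append(f"-l walltime={submit['time']}")
    return options


submit = {"nnodes": 2, "ncpus": 32, "time": "06:00:00"}
print(slurm_options(submit))
print(pbs_options(submit))
```

The same submit settings yield three separate flags for SLURM but a combined select statement for PBS Pro, which is why the meaning of nnodes and ncpus should be checked for each scheduler.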
Task Blocking#
[[submit]]
blocking = True
- blocking = True (default):
Task must complete before dependent tasks start
Status is tracked
Failures stop the workflow
- blocking = False:
Task runs but doesn’t block dependents
Used for monitoring, logging, non-critical tasks
Gracefully terminated when workflow completes
Example: Monitoring Task#
[monitor_progress]
[[content]]
commandline = watch -n 60 'ls -lh {{ task_run_dir }}/output*'
run_dir = {{ workflow_dir }}
[[submit]]
blocking = False # Don't wait for this
queue = debug
Email Notifications#
[[submit]]
mail = user@example.com
Sends email on task completion/failure (if scheduler supports it).
Complete Task Examples#
Example 1: Simple Python Script#
[analyze_data]
[[content]]
commandline = python analyze.py {{ cycle.date }}
run_dir = {{ workflow_dir }}/analysis
env = python_data
[[artifacts]]
[[[results]]]
path = results_{{ cycle.token }}.csv
check = True
[[submit]]
queue = normal
ncpus = 1
memory = 8GB
time = 00:30:00
Example 2: MPI Simulation#
[run_ocean_model]
[[content]]
commandline = mpirun -n {{ ncpus }} ./ocean_model ocean.nml
run_dir = {{ scratch_dir }}/{{ task_path }}
env = ocean_env
template = mpi_job.sh
[[fill]]
[[[namelist]]]
template = ocean.nml.j2
destination = {{ task_run_dir }}/ocean.nml
[[artifacts]]
[[[output]]]
path = output_{{ cycle.end_date_str }}.nc
check = True
[[[restart]]]
path = restart_{{ cycle.end_date_str }}.nc
check = True
[[submit]]
queue = compute
nnodes = 4
ncpus = 128
time = 08:00:00
memory = 256GB
Example 3: Data Download#
[download_forcing]
[[content]]
commandline = '''
wget https://data.example.com/forcing_{{ cycle.date }}.nc
mv forcing_{{ cycle.date }}.nc {{ task_run_dir }}/
'''
run_dir = {{ params.forcing_dir }}
[[artifacts]]
[[[forcing_file]]]
path = {{ params.forcing_dir }}/forcing_{{ cycle.date }}.nc
check = True
[[submit]]
queue = debug
ncpus = 1
time = 00:15:00
Example 4: Post-Processing with Ensemble#
[compute_ensemble_mean]
[[content]]
commandline = python ensemble_mean.py --input {{ task_run_dir }} --output mean.nc
run_dir = {{ scratch_dir }}/postprocess
env = python_analysis
[[fill]]
[[[file_list]]]
template = ensemble_files.txt.j2
destination = {{ task_run_dir }}/files.txt
[[artifacts]]
[[[mean_output]]]
path = mean_{{ cycle.token }}.nc
check = True
[[[std_output]]]
path = std_{{ cycle.token }}.nc
check = True
[[submit]]
queue = normal
ncpus = 8
memory = 64GB
time = 01:00:00
Example 5: Conditional Execution#
[conditional_analysis]
[[content]]
commandline = '''
if [ -f {{ task_run_dir }}/trigger.flag ]; then
python special_analysis.py
else
echo "Skipping - no trigger file"
fi
'''
run_dir = {{ workflow_dir }}/analysis
[[submit]]
queue = debug
ncpus = 1
time = 00:10:00
Skipping Tasks#
A task can be excluded from submission while remaining in the task tree. This is useful when a task has already produced its artifacts in a previous run and you want downstream tasks to reference those artifacts without re-running the task itself.
A skipped task:
is never submitted to the scheduler
is never cleaned (its submission directory and artifacts are preserved)
contributes no scheduler dependencies to downstream tasks (they run immediately)
still appears in woom show status with status SKIPPED
still appears in woom show artifacts with its artifact paths
Static Skip (in tasks.cfg)#
Set skip = True directly on a task to permanently exclude it from submission within a given configuration:
[preprocess]
skip = True
[[content]]
commandline = python preprocess.py
[[artifacts]]
[[[output]]]
path = {{ task_run_dir }}/preprocessed.nc
check = True
[run_model]
[[content]]
commandline = ./model preprocessed.nc
# can still read preprocess artifacts even though it was skipped
Combined with task inheritance, this lets you activate or deactivate tasks without restructuring the workflow:
[base_preprocess]
[[content]]
commandline = python preprocess.py
[[artifacts]]
[[[output]]]
path = {{ task_run_dir }}/preprocessed.nc
[preprocess]
inherit = base_preprocess
skip = True # disable for this experiment
Warning
When a task is skipped its artifacts must already exist on disk. If they do not, downstream tasks that depend on those files will fail at runtime.
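A pre-flight check along these lines can catch the problem before downstream tasks start. This is a Python sketch under stated assumptions: tasks are modelled as plain dictionaries with already-rendered artifact paths, and the `skipped_tasks_missing_artifacts` helper is hypothetical rather than part of woom.

```python
import tempfile
from pathlib import Path


def skipped_tasks_missing_artifacts(tasks):
    """Map each skipped task name to its artifact paths that are not on disk."""
    problems = {}
    for name, task in tasks.items():
        if not task.get("skip", False):
            continue  # only skipped tasks rely on pre-existing files
        missing = [p for p in task.get("artifacts", []) if not Path(p).exists()]
        if missing:
            problems[name] = missing
    return problems


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "preprocessed.nc"
    present.touch()  # artifact left behind by a previous run
    absent = Path(tmp) / "forcing.nc"  # never produced
    tasks = {
        "preprocess": {"skip": True, "artifacts": [str(present)]},
        "download_forcing": {"skip": True, "artifacts": [str(absent)]},
        "run_model": {"skip": False, "artifacts": [str(Path(tmp) / "output.nc")]},
    }
    problems = skipped_tasks_missing_artifacts(tasks)
    print(sorted(problems))
```

Only download_forcing is flagged: preprocess is skipped but its file exists, and run_model is not skipped, so its output is expected to be produced at run time.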
See Also#
Workflow Configuration In-Depth - skip tasks at runtime without editing tasks.cfg
Task Organization Strategies#
By Purpose#
# Setup tasks
[create_workspace]
[download_inputs]
[compile_code]
# Core computation
[run_model]
[run_diagnostics]
# Post-processing
[extract_variables]
[compute_statistics]
[create_plots]
# Finalization
[merge_outputs]
[cleanup]
By Inheritance Hierarchy#
[base_task]
[[submit]]
queue = normal
time = 01:00:00
[base_python_task]
inherit = base_task
[[content]]
env = python_env
[base_model_task]
inherit = base_task
[[content]]
env = model_env
run_dir = {{ scratch_dir }}/{{ task_path }}
[specific_task]
inherit = base_python_task
[[content]]
commandline = python specific.py
Best Practices#
Use Inheritance: Define common configurations once in base tasks
Validate Artifacts: Set check = True for critical outputs
Template Configurations: Use [[fill]] instead of hardcoding parameters
Request Appropriate Resources: Don’t over-request resources; it delays scheduling
Use Meaningful Names: Task names should describe what they do
Set Reasonable Timeouts: Add buffer but avoid excessive walltime requests
Test Locally: Use a simple host configuration to test tasks before HPC submission
Document Complex Commands: Add comments explaining non-obvious command sequences
Handle Errors: Consider exit codes and error handling in complex command sequences
Organize by Stage: Name tasks to indicate which workflow stage they belong to
Common Pitfalls#
Forgetting run_dir: Relative artifact paths need run_dir defined
Missing Environment: Tasks fail if environment doesn’t exist on host
Incorrect Resource Requests: How nnodes and ncpus are interpreted varies by scheduler
Template Syntax Errors: Test templates independently before workflow run
Artifact Path Mismatches: Ensure artifact paths match actual output locations
Blocking Loops: Non-blocking tasks should not create dependencies
Over-requesting Resources: Excessive requests delay scheduling
Troubleshooting#
Task Won’t Submit#
Check:
Queue exists in host configuration
Resource requests are valid for queue
Environment is available on host
Commandline syntax is correct
Task Fails Immediately#
Check:
run_dir exists or can be created
Command is executable
Environment loads correctly
Input files exist
Artifacts Not Found#
Check:
Artifact path is correct (absolute or relative to run_dir)
Task actually produces the file
Template variables render correctly
File permissions allow access
See Also#
Workflow Configuration In-Depth - Organize tasks into workflows
Artifacts In-Depth - Detailed artifact handling
Templating In-Depth - Template filling system
Tasks configuration specifications - Complete configuration reference