Workflow Configuration In-Depth#
The workflow configuration file (workflow.cfg) is the heart of your woom setup. It defines the overall structure, timing, and organization of your computational workflow.
Structure Overview#
A typical workflow.cfg contains these main sections:
[app]
name = my_ocean_model
conf = production
exp = exp001
[cycles]
begin_date = 2020-01-01
end_date = 2020-01-31
freq = 1D
indep = False
[ensemble]
size = 10
tasks = run_model, postprocess
[params]
# Custom parameters available in templates
[env_vars]
# Environment variables set for all tasks
[groups]
# Task groups for parallel execution
[stages]
[[prolog]]
setup = init_workspace
[[cycles]]
process = run_model, postprocess
[[epilog]]
cleanup = archive
Application Configuration#
The [app] section identifies your workflow and creates a hierarchical directory structure.
Basic Configuration#
[app]
name = croco
conf = benguela
exp = test01
This creates the path structure: croco/benguela/test01/
Fields:
name: Application name (defaults to workflow directory name if omitted)
conf: Configuration name (optional)
exp: Experiment name (optional)
Directory Impact:
The app path is used throughout woom:
Job submission directories: jobs/app_path/task_name/
Run directories can reference: {{ app.name }}/{{ app.conf }}/{{ app.exp }}
Task paths include: {app_path}/{stage}/{task_name}
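As a rough illustration of how the three [app] fields could combine into the app path, here is a minimal Python sketch. The helper name app_path and the rule of dropping omitted fields are assumptions made for illustration, not woom's actual implementation.

```python
from pathlib import PurePosixPath


def app_path(name, conf=None, exp=None):
    """Join the [app] fields into a relative path, skipping omitted ones.

    Hypothetical helper: woom's real logic may differ.
    """
    parts = [part for part in (name, conf, exp) if part]
    return PurePosixPath(*parts)


print(app_path("croco", "benguela", "test01"))  # croco/benguela/test01
print(app_path("croco"))                        # croco
```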
Practical Example#
[app]
name = ocean_model
conf = north_atlantic
exp = spinup_2020
Results in:
Submission dir: jobs/ocean_model/north_atlantic/spinup_2020/prolog/setup/
Available in templates as: {{ app.name }}, {{ app.conf }}, {{ app.exp }}
Cycles Configuration#
Cycles allow tasks to repeat for different time periods. This is essential for time-stepping models and temporal workflows.
Date-Based Cycles#
[cycles]
begin_date = 2020-01-01T00:00:00
end_date = 2020-01-05T00:00:00
freq = 6H
as_intervals = True
indep = False
Fields:
begin_date: Start date (ISO 8601 format)
end_date: End date (optional, single cycle if omitted)
freq: Frequency between cycles (pandas offset string: '1D', '6H', '1M', etc.)
ncycles: Alternative to end_date - number of cycles to run
as_intervals: If True, cycles represent intervals [begin, end); if False, point-in-time
indep: If True, cycles run in parallel (independent); if False, sequential (each waits for previous)
round: Round dates to frequency (e.g., round = 'D' rounds to midnight)
Example 1: Daily Cycles
[cycles]
begin_date = 2020-01-01
end_date = 2020-01-10
freq = 1D
as_intervals = True
Generates 9 cycles:
2020-01-01 to 2020-01-02
2020-01-02 to 2020-01-03
…
2020-01-09 to 2020-01-10
Example 2: 6-Hour Intervals
[cycles]
begin_date = 2020-01-01T00:00:00
ncycles = 8
freq = 6H
as_intervals = True
Generates:
2020-01-01T00:00:00 to 2020-01-01T06:00:00
2020-01-01T06:00:00 to 2020-01-01T12:00:00
…
2020-01-02T18:00:00 to 2020-01-03T00:00:00
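The interval arithmetic in the two examples can be checked with a small Python sketch. The function interval_cycles is hypothetical and uses the standard library's timedelta in place of pandas offset strings; it only mirrors the as_intervals = True behaviour described here and is not woom's own code.

```python
from datetime import datetime, timedelta


def interval_cycles(begin, step, end=None, ncycles=None):
    """List the [start, stop) intervals for as_intervals = True.

    Hypothetical helper: pass either end (like end_date) or ncycles.
    """
    if end is None and ncycles is None:
        raise ValueError("give either end or ncycles")
    cycles = []
    start = begin
    while True:
        stop = start + step
        if ncycles is not None and len(cycles) == ncycles:
            break
        if end is not None and stop > end:
            break
        cycles.append((start, stop))
        start = stop
    return cycles


daily = interval_cycles(datetime(2020, 1, 1), timedelta(days=1), end=datetime(2020, 1, 10))
six_hourly = interval_cycles(datetime(2020, 1, 1), timedelta(hours=6), ncycles=8)
print(len(daily))  # 9
first, last = six_hourly[-1]
print(first.isoformat(), "to", last.isoformat())
# 2020-01-02T18:00:00 to 2020-01-03T00:00:00
```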
Cycle Dependencies#
Sequential Cycles (indep=False)
[cycles]
indep = False
Each cycle waits for all tasks in the previous cycle to complete before starting. Use this for:
Time-stepping models where each step depends on the previous
Workflows where data from cycle N is needed in cycle N+1
Independent Cycles (indep=True)
[cycles]
indep = True
All cycles can run in parallel. Use this for:
Embarrassingly parallel problems
Independent ensemble members running different time periods
Post-processing different time slices
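The difference between the two modes can be pictured as a dependency map between cycle indices. The following Python sketch is an illustration only; cycle_dependencies is a hypothetical name, and woom's scheduler handles dependencies at the task level rather than through such a function.

```python
def cycle_dependencies(ncycles, indep):
    """Map each cycle index to the cycle indices it must wait for.

    Hypothetical helper illustrating the indep option.
    """
    if indep:
        # Independent cycles: nothing to wait for.
        return {i: [] for i in range(ncycles)}
    # Sequential cycles: each one waits for its predecessor.
    return {i: ([i - 1] if i > 0 else []) for i in range(ncycles)}


print(cycle_dependencies(3, indep=False))  # {0: [], 1: [0], 2: [1]}
print(cycle_dependencies(3, indep=True))   # {0: [], 1: [], 2: []}
```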
No Cycles#
[cycles]
begin_date = 2020-01-01
# No end_date, freq, or ncycles
Creates a single “cycle” with a fixed date. Tasks in the cycles stage run once.
Forecast Cycles (horizon)#
The horizon option adds a forecast window to date-based cycles (as_intervals = False).
Without it, each cycle has only a begin_date, and its end_date is None.
With horizon, each cycle’s end_date is set to begin_date + horizon, which makes
{{ cycle_end_date }} and {{ cycle_duration }} available in templates. The
directory structure is unchanged and remains anchored to begin_date only.
horizon accepts any pandas timedelta string (5D, 12h, 1W, …).
It is ignored when as_intervals = True (those cycles already have explicit end dates).
See Forecast cycles with horizon for a worked example.
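To make the effect of horizon concrete, here is a minimal Python sketch of the date arithmetic. The function forecast_cycles is hypothetical, and a standard-library timedelta stands in for the pandas timedelta string that the option actually accepts.

```python
from datetime import datetime, timedelta


def forecast_cycles(begin_dates, horizon=None):
    """Pair each cycle begin date with its end date.

    Hypothetical helper for as_intervals = False: the end date is None
    without a horizon, and begin + horizon with one.
    """
    cycles = []
    for begin in begin_dates:
        end = begin + horizon if horizon is not None else None
        cycles.append((begin, end))
    return cycles


starts = [datetime(2020, 1, 1), datetime(2020, 1, 2)]
for begin, end in forecast_cycles(starts, horizon=timedelta(days=5)):
    # The difference end - begin plays the role of the cycle duration.
    print(begin.date(), "->", end.date(), "| duration:", end - begin)
```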
Ensemble Configuration#
Ensembles allow running multiple realizations of tasks with different parameters.
Basic Ensemble#
[ensemble]
size = 50
tasks = run_model, analyze
label = member
Creates 50 members (member001 to member050) that each run the run_model and analyze tasks.
Fields:
size: Number of ensemble members (None = no ensemble)
tasks: Which tasks should be ensembled (comma-separated list)
skip: Members to skip (e.g., skip = 1,5,10)
label: Label for members (default: “member”)
Parameterized Ensembles#
Use the iters subsection to create ensembles with varying parameters:
[ensemble]
size = 4
tasks = run_model
[[iters]]
param1 = 0.1, 0.2, 0.3, 0.4
param2 = high, high, low, low
seed = 1234, 2345, 3456, 4567
Each member gets different parameter values:
Member 1: param1=0.1, param2=high, seed=1234
Member 2: param1=0.2, param2=high, seed=2345
Member 3: param1=0.3, param2=low, seed=3456
Member 4: param1=0.4, param2=low, seed=4567
Access in templates:
parameter_1 = {{ member.param1 }}
parameter_2 = {{ member.param2 }}
random_seed = {{ member.seed }}
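The mapping from the iters lists to per-member values amounts to taking the i-th entry of every list. The Python sketch below illustrates this; build_members is a hypothetical helper, and the check that every list has exactly size entries is an assumption about how such a configuration would be validated.

```python
def build_members(size, iters):
    """Return one dict of parameter values per ensemble member.

    Hypothetical helper: the i-th value of each iters list goes to the
    i-th member. Requiring len(values) == size is an assumption.
    """
    for name, values in iters.items():
        if len(values) != size:
            raise ValueError(f"{name} needs {size} values, got {len(values)}")
    return [
        {name: values[i] for name, values in iters.items()}
        for i in range(size)
    ]


iters = {
    "param1": [0.1, 0.2, 0.3, 0.4],
    "param2": ["high", "high", "low", "low"],
    "seed": [1234, 2345, 3456, 4567],
}
members = build_members(4, iters)
print(members[2])  # {'param1': 0.3, 'param2': 'low', 'seed': 3456}
```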
Practical Example#
Sensitivity analysis with different wind forcing strengths:
[ensemble]
size = 5
tasks = run_ocean
label = scenario
[[iters]]
wind_scaling = 0.8, 0.9, 1.0, 1.1, 1.2
description = weak, reduced, baseline, enhanced, strong
In your model configuration template:
! Wind forcing scale factor
wind_scale = {{ member.wind_scaling }}
! Run: {{ member.description }} winds
Workflow Stages#
Stages organize tasks into logical execution phases.
Stage Types#
Prolog
Runs once at the workflow start. Use for:
Creating directories
Downloading initial data
Compiling code
Setting up databases
[[prolog]]
preparation = setup_dirs, download_forcings
compilation = build_model
Cycles
Repeats for each cycle (if cycles are configured). Use for:
Time-stepping simulations
Iterative processing
Temporal analysis
[[cycles]]
simulation = run_model
analysis = compute_diagnostics, create_plots
Epilog
Runs once after all cycles complete. Use for:
Final analysis
Archiving results
Cleanup
Notifications
[[epilog]]
finalize = merge_outputs, create_summary
archive = backup_results
Stage Sequences and Parallelism#
Within each stage, you can define multiple sequences (substages) that run sequentially:
[[prolog]]
# Sequence 1: runs first
setup = create_dirs, copy_inputs
# Sequence 2: runs after sequence 1
prepare = compile_code, validate_inputs
# Sequence 3: runs after sequence 2
initialize = create_grid, setup_initial_conditions
Within each sequence, tasks separated by commas can run in parallel:
[[cycles]]
# These three tasks run in parallel
process = task1, task2, task3
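The resulting execution order (sequences one after another, tasks within a sequence side by side) can be sketched in Python as follows. The helper execution_steps is hypothetical, and a plain dict stands in for a parsed stage section.

```python
def execution_steps(stage):
    """Turn a stage mapping into an ordered list of parallel task sets.

    Hypothetical helper: stage maps sequence name -> comma-separated
    task names, in file order (dicts preserve insertion order).
    """
    return [
        (sequence, [task.strip() for task in tasks.split(",")])
        for sequence, tasks in stage.items()
    ]


prolog = {
    "setup": "create_dirs, copy_inputs",
    "prepare": "compile_code, validate_inputs",
    "initialize": "create_grid, setup_initial_conditions",
}
for step, (sequence, tasks) in enumerate(execution_steps(prolog), start=1):
    # Each step starts only after the previous one has finished.
    print(f"step {step} ({sequence}): run in parallel -> {tasks}")
```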
Task Groups#
Define reusable task groups in the [groups] section:
[groups]
preprocessing = clean_data, validate_data, transform_data
core_model = initialize, run_timesteps, finalize
postprocessing = extract_outputs, compute_stats
Use in stages:
[[prolog]]
prep = preprocessing
[[cycles]]
run = core_model, postprocessing
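Conceptually, a group name in a stage is replaced by the tasks it lists. The Python sketch below shows that expansion under the assumption that any name not found in [groups] is a plain task; expand_groups is a hypothetical helper, not part of woom.

```python
def expand_groups(task_names, groups):
    """Replace group names with their member tasks, keeping order.

    Hypothetical helper: names missing from groups are treated as
    plain task names (an assumption).
    """
    expanded = []
    for name in task_names:
        expanded.extend(groups.get(name, [name]))
    return expanded


groups = {
    "preprocessing": ["clean_data", "validate_data", "transform_data"],
    "core_model": ["initialize", "run_timesteps", "finalize"],
    "postprocessing": ["extract_outputs", "compute_stats"],
}
print(expand_groups(["core_model", "postprocessing"], groups))
# ['initialize', 'run_timesteps', 'finalize', 'extract_outputs', 'compute_stats']
```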
Skipping Tasks at Runtime#
You can prevent specific tasks from being submitted without editing
tasks.cfg, keeping them in the task tree so their artifact paths
remain accessible to downstream tasks.
In workflow.cfg (persisted skip list for a given experiment):
[stages]
skip = preprocess, download_forcings
[[prolog]]
setup = preprocess, download_forcings, compile_model
[[cycles]]
run = run_model
Both preprocess and download_forcings will be silently bypassed on every
woom run, but run_model can still reference their artifact paths.
On the command line (one-off override):
woom run --skip preprocess download_forcings
Multiple task names are space-separated. CLI names are merged with any names
already listed in [stages] skip, so you can combine both mechanisms.
Behaviour summary:
Skipped tasks are not submitted and their submission directory is untouched
They appear as SKIPPED (bold cyan) in woom show status
Their artifact paths are still displayed by woom show artifacts
Downstream tasks receive no scheduler dependency through a skipped slot (they can start immediately, assuming the artifacts already exist)
--force does not override the skip; remove the task from the skip list to re-enable it
Note
For a skip that is part of the task definition rather than a runtime choice,
use the skip = True option directly in tasks.cfg
(see Task Configuration In-Depth).
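The way the two runtime skip sources combine can be sketched in a few lines of Python. The function plan_submission is hypothetical; it only illustrates that the effective skip list is the union of the [stages] skip option and the names passed to --skip.

```python
def plan_submission(tasks, config_skip=(), cli_skip=()):
    """Split tasks into those to submit and those marked as skipped.

    Hypothetical helper: config_skip mirrors the [stages] skip option,
    cli_skip the names given to --skip; both are merged.
    """
    skipped = set(config_skip) | set(cli_skip)
    to_submit = [task for task in tasks if task not in skipped]
    marked_skipped = [task for task in tasks if task in skipped]
    return to_submit, marked_skipped


tasks = ["preprocess", "download_forcings", "compile_model", "run_model"]
submit, skip = plan_submission(
    tasks, config_skip=["preprocess"], cli_skip=["download_forcings"]
)
print(submit)  # ['compile_model', 'run_model']
print(skip)    # ['preprocess', 'download_forcings']
```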
Custom Parameters#
The [params] section defines custom variables available in all templates.
Global Parameters#
Flat parameters (most common):
[params]
domain = north_atlantic
resolution = 10km
grid_nx = 100
grid_ny = 200
timestep = 300
Access by flat name (underscores included):
domain: {{ params.domain }}
nx: {{ params.grid_nx }}
timestep: {{ params.timestep }}
Nested parameters (using ConfigObj subsections):
[params]
domain = north_atlantic
[[grid]]
nx = 100
ny = 200
[[paths]]
forcing_dir = /data/forcings
output_dir = /scratch/outputs
Access with dots:
domain: {{ params.domain }}
nx: {{ params.grid.nx }}
forcing: {{ params.paths.forcing_dir }}
Note
For simple workflows, use flat parameters with descriptive names (grid_nx, model_timestep).
Use nested sections when you have many related parameters to organize.
Host and Task-Specific Parameters#
Override parameters for specific hosts or tasks:
[params]
scratch_dir = /scratch/default
[[hosts]]
[[[local]]]
scratch_dir = /tmp
[[[hpc_cluster]]]
scratch_dir = /scratch/users/$USER
[[tasks]]
[[[run_model]]]
threads = 8
memory = 32GB
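One way to picture these overrides is as a layered merge of dictionaries. The Python sketch below is illustrative only: resolve_params is a hypothetical helper, and the precedence it applies (task values over host values over global values) is an assumption that the text above does not confirm.

```python
def resolve_params(params, host=None, task=None):
    """Merge global, host-specific and task-specific parameters.

    Hypothetical helper. Assumed precedence: task overrides beat host
    overrides, which beat global values.
    """
    resolved = {k: v for k, v in params.items() if k not in ("hosts", "tasks")}
    resolved.update(params.get("hosts", {}).get(host, {}))
    resolved.update(params.get("tasks", {}).get(task, {}))
    return resolved


# Plain dict standing in for the parsed [params] section.
params = {
    "scratch_dir": "/scratch/default",
    "hosts": {
        "local": {"scratch_dir": "/tmp"},
        "hpc_cluster": {"scratch_dir": "/scratch/users/$USER"},
    },
    "tasks": {"run_model": {"threads": "8", "memory": "32GB"}},
}
print(resolve_params(params, host="local", task="run_model"))
# {'scratch_dir': '/tmp', 'threads': '8', 'memory': '32GB'}
```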
Environment Variables#
Set environment variables for all tasks:
[env_vars]
OMP_NUM_THREADS = 4
MKL_NUM_THREADS = 4
PYTHONUNBUFFERED = 1
DATA_ROOT = /data/ocean
These are exported before task execution and available in templates.
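For intuition, exporting these variables before a task runs is comparable to emitting shell export statements. The Python sketch below renders such lines; export_lines is a hypothetical helper, and the job scripts that woom actually generates may look different.

```python
import shlex


def export_lines(env_vars):
    """Render shell export statements for a mapping of variables.

    Hypothetical helper: values are quoted so that spaces and special
    characters survive the shell.
    """
    return [
        f"export {name}={shlex.quote(str(value))}"
        for name, value in env_vars.items()
    ]


env_vars = {"OMP_NUM_THREADS": 4, "PYTHONUNBUFFERED": 1, "DATA_ROOT": "/data/ocean"}
print("\n".join(export_lines(env_vars)))
```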
Complete Example#
Here’s a comprehensive workflow configuration:
# Ocean model workflow
[app]
name = ocean_model
conf = tropical_pacific
exp = hindcast_2020
# Run daily cycles for January 2020
[cycles]
begin_date = 2020-01-01T00:00:00
end_date = 2020-02-01T00:00:00
freq = 1D
as_intervals = True
indep = False # Sequential - each day needs previous
# 10-member ensemble with different initial conditions
[ensemble]
size = 10
tasks = run_ocean
[[iters]]
ic_perturbation = 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10
# Custom parameters
[params]
model_timestep = 600
output_frequency = 3600
[[paths]]
forcing = /data/forcings/era5
bathymetry = /data/static/etopo1.nc
# Environment for all tasks
[env_vars]
OMP_NUM_THREADS = 8
DATA_DIR = /data/ocean
# Reusable task groups
[groups]
analysis = compute_sst, compute_currents, create_plots
# Workflow structure
[stages]
[[prolog]]
setup = create_workspace, download_bathymetry
prepare = generate_grid, compile_model
[[cycles]]
simulate = run_ocean
postprocess = analysis
[[epilog]]
finalize = merge_all_outputs
archive = backup_to_tape
Best Practices#
Start Simple: Begin with a basic workflow and add complexity (cycles, ensembles) incrementally
Use Meaningful Names: Application, configuration, and experiment names should be descriptive
Plan Your Cycles: Decide whether cycles should be independent or sequential based on your science
Organize Stages Logically: Use prolog for setup, cycles for repeated work, epilog for finalization
Document Parameters: Add comments in your configuration explaining what parameters control
Test Incrementally: Test with a single cycle before running many, and with a few members before a large ensemble
Use Groups: Define task groups for commonly repeated task sequences
Common Patterns#
Pattern 1: Simple Time-Stepping Model
[cycles]
begin_date = 2020-01-01
ncycles = 30
freq = 1D
indep = False
[stages]
[[cycles]]
run = model_timestep
Pattern 2: Embarrassingly Parallel Processing
[cycles]
begin_date = 2020-01-01
ncycles = 365
freq = 1D
indep = True # All days can process in parallel
[stages]
[[cycles]]
process = analyze_day
Pattern 3: Ensemble Forecast
[ensemble]
size = 50
tasks = run_forecast
[cycles]
begin_date = 2020-01-01
ncycles = 10
freq = 1D
indep = False
[stages]
[[cycles]]
forecast = run_forecast
Pattern 4: No Cycles, Just Stages
# No [cycles] section needed
[stages]
[[prolog]]
prepare = download, preprocess
[[epilog]]
analyze = statistics, visualize
See Also#
Task Configuration In-Depth - Configure individual tasks
Context and Variables In-Depth - Variables available in templates
Workflow configuration specifications - Complete configuration reference