Workflow Configuration In-Depth#

The workflow configuration file (workflow.cfg) is the heart of your woom setup. It defines the overall structure, timing, and organization of your computational workflow.

Structure Overview#

A typical workflow.cfg contains these main sections:

[app]
name = my_ocean_model
conf = production
exp = exp001

[cycles]
begin_date = 2020-01-01
end_date = 2020-01-31
freq = 1D
indep = False

[ensemble]
size = 10
tasks = run_model, postprocess

[params]
# Custom parameters available in templates

[env_vars]
# Environment variables set for all tasks

[groups]
# Task groups for parallel execution

[stages]
    [[prolog]]
    setup = init_workspace

    [[cycles]]
    process = run_model, postprocess

    [[epilog]]
    cleanup = archive

Application Configuration#

The [app] section identifies your workflow and creates a hierarchical directory structure.

Basic Configuration#

[app]
name = croco
conf = benguela
exp = test01

This creates the path structure: croco/benguela/test01/

Fields:

name: Application name (defaults to workflow directory name if omitted)
conf: Configuration name (optional)
exp: Experiment name (optional)

Directory Impact:

The app path is used throughout woom:

Job submission directories: jobs/app_path/task_name/
Run directories can reference: {{ app.name }}/{{ app.conf }}/{{ app.exp }}
Task paths include: {app_path}/{stage}/{task_name}
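
As a rough illustration of how the three [app] fields combine into a path, here is a minimal Python sketch. The function name app_path is hypothetical, and the handling of omitted optional fields (they are simply dropped) is an assumption, not confirmed woom behaviour.

```python
from pathlib import PurePosixPath


def app_path(name, conf=None, exp=None):
    """Join the [app] fields into a relative path.

    Assumption: optional fields that are not set are left out of the path.
    """
    parts = [part for part in (name, conf, exp) if part]
    return PurePosixPath(*parts)


print(app_path("croco", "benguela", "test01"))  # croco/benguela/test01
print(app_path("croco"))                        # croco
```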

Practical Example#

[app]
name = ocean_model
conf = north_atlantic
exp = spinup_2020

Results in:

Submission dir: jobs/ocean_model/north_atlantic/spinup_2020/prolog/setup/
Available in templates as: {{ app.name }}, {{ app.conf }}, {{ app.exp }}

Cycles Configuration#

Cycles allow tasks to repeat for different time periods. This is essential for time-stepping models and temporal workflows.

Date-Based Cycles#

[cycles]
begin_date = 2020-01-01T00:00:00
end_date = 2020-01-05T00:00:00
freq = 6H
as_intervals = True
indep = False

Fields:

begin_date: Start date (ISO 8601 format)
end_date: End date (optional; if neither end_date nor ncycles is given, a single cycle is created)
freq: Frequency between cycles (pandas offset string: ‘1D’, ‘6H’, ‘1M’, etc.)
ncycles: Alternative to end_date - number of cycles to run
as_intervals: If True, cycles represent intervals [begin, end); if False, point-in-time
indep: If True, cycles run in parallel (independent); if False, sequential (each waits for previous)
round: Round dates to frequency (e.g., round=’D’ rounds to midnight)

Example 1: Daily Cycles

[cycles]
begin_date = 2020-01-01
end_date = 2020-01-10
freq = 1D
as_intervals = True

Generates 9 cycles:

2020-01-01 to 2020-01-02
2020-01-02 to 2020-01-03
…
2020-01-09 to 2020-01-10

Example 2: 6-Hour Intervals

[cycles]
begin_date = 2020-01-01T00:00:00
ncycles = 8
freq = 6H
as_intervals = True

Generates:

2020-01-01T00:00:00 to 2020-01-01T06:00:00
2020-01-01T06:00:00 to 2020-01-01T12:00:00
…
2020-01-02T18:00:00 to 2020-01-03T00:00:00
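
The two examples above can be reproduced with a small standard-library sketch. It is not woom's implementation: make_cycles is a hypothetical helper, the frequency is passed as a timedelta rather than a pandas offset string, and it assumes that ncycles counts intervals when as_intervals is True.

```python
from datetime import datetime, timedelta


def make_cycles(begin, step, end=None, ncycles=None, as_intervals=True):
    """Return a list of (cycle_begin, cycle_end) tuples.

    Assumption: with as_intervals=True, ncycles counts intervals and
    end is the end of the last interval.
    """
    if end is None and ncycles is None:
        return [(begin, None)]  # single cycle with a fixed date
    if ncycles is not None:
        npoints = ncycles + 1 if as_intervals else ncycles
        dates = [begin + i * step for i in range(npoints)]
    else:
        dates = []
        current = begin
        while current <= end:
            dates.append(current)
            current += step
    if as_intervals:
        return list(zip(dates[:-1], dates[1:]))
    return [(date, None) for date in dates]


daily = make_cycles(datetime(2020, 1, 1), timedelta(days=1), end=datetime(2020, 1, 10))
six_hourly = make_cycles(datetime(2020, 1, 1), timedelta(hours=6), ncycles=8)
print(len(daily))       # 9
print(six_hourly[-1])   # last interval: 2020-01-02T18:00 to 2020-01-03T00:00
```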

Cycle Dependencies#

Sequential Cycles (indep=False)

[cycles]
indep = False

Each cycle waits for all tasks in the previous cycle to complete before starting. Use this for:

Time-stepping models where each step depends on the previous
Workflows where data from cycle N is needed in cycle N+1

Independent Cycles (indep=True)

[cycles]
indep = True

All cycles can run in parallel. Use this for:

Embarrassingly parallel problems
Independent ensemble members running different time periods
Post-processing different time slices
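
The difference between the two modes can be pictured as a dependency map between cycle indices. The following is only a conceptual sketch; cycle_dependencies is a hypothetical helper and does not reflect how woom passes dependencies to the scheduler.

```python
def cycle_dependencies(ncycles, indep):
    """Map each cycle index to the list of cycle indices it must wait for."""
    if indep:
        # Independent cycles: nothing to wait for, all may start at once.
        return {i: [] for i in range(ncycles)}
    # Sequential cycles: each cycle waits for the one just before it.
    return {i: ([i - 1] if i > 0 else []) for i in range(ncycles)}


print(cycle_dependencies(3, indep=False))  # {0: [], 1: [0], 2: [1]}
print(cycle_dependencies(3, indep=True))   # {0: [], 1: [], 2: []}
```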

No Cycles#

[cycles]
begin_date = 2020-01-01
# No end_date, freq, or ncycles

Creates a single “cycle” with fixed date. Tasks in the cycles stage run once.

Forecast Cycles (horizon)#

The horizon option adds a forecast window to date-based cycles when as_intervals = False. Without it, each cycle has only a begin_date, and its end_date is None. With horizon, each cycle’s end_date is set to begin_date + horizon, which makes {{ cycle_end_date }} and {{ cycle_duration }} available in templates. The directory structure is unchanged and remains anchored to begin_date only.

horizon accepts any pandas timedelta string (5D, 12h, 1W, …). It is ignored when as_intervals = True (those cycles already have explicit end dates).

See Forecast cycles with horizon for a worked example.
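
The date arithmetic behind horizon is simple enough to sketch in a few lines of Python. The helper forecast_cycle and its dictionary keys are hypothetical, and the horizon is given as a timedelta standing in for a string such as 5D.

```python
from datetime import datetime, timedelta


def forecast_cycle(begin_date, horizon=None):
    """Return the dates of one point-in-time cycle (as_intervals = False)."""
    if horizon is None:
        # Without a horizon the cycle has no end date and no duration.
        return {"begin_date": begin_date, "end_date": None, "duration": None}
    end_date = begin_date + horizon
    return {
        "begin_date": begin_date,
        "end_date": end_date,
        "duration": end_date - begin_date,
    }


cycle = forecast_cycle(datetime(2020, 1, 1), horizon=timedelta(days=5))
print(cycle["end_date"])   # 2020-01-06 00:00:00
print(cycle["duration"])   # 5 days, 0:00:00
```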

Ensemble Configuration#

Ensembles allow running multiple realizations of tasks with different parameters.

Basic Ensemble#

[ensemble]
size = 50
tasks = run_model, analyze
label = member

Creates 50 members (member001 to member050), each of which runs the run_model and analyze tasks.

Fields:

size: Number of ensemble members (None = no ensemble)
tasks: Which tasks should be ensembled (comma-separated list)
skip: Members to skip (e.g., skip = 1,5,10)
label: Label for members (default: “member”)

Parameterized Ensembles#

Use the iters subsection to create ensembles with varying parameters:

[ensemble]
size = 4
tasks = run_model

    [[iters]]
    param1 = 0.1, 0.2, 0.3, 0.4
    param2 = high, high, low, low
    seed = 1234, 2345, 3456, 4567

Each member gets different parameter values:

Member 1: param1=0.1, param2=high, seed=1234
Member 2: param1=0.2, param2=high, seed=2345
Member 3: param1=0.3, param2=low, seed=3456
Member 4: param1=0.4, param2=low, seed=4567

Access in templates:

parameter_1 = {{ member.param1 }}
parameter_2 = {{ member.param2 }}
random_seed = {{ member.seed }}
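
One way to think about [[iters]] is as a set of columns that are read row by row, one row per member. The sketch below is illustrative only: build_members and the label key are hypothetical, and raising an error when a list length differs from size is an assumption.

```python
def build_members(size, iters, label="member"):
    """Turn per-parameter value lists into one dict per ensemble member."""
    for name, values in iters.items():
        if len(values) != size:
            # Assumption: every list must provide exactly one value per member.
            raise ValueError(f"{name!r} has {len(values)} values, expected {size}")
    members = []
    for index in range(size):
        member = {name: values[index] for name, values in iters.items()}
        member["label"] = f"{label}{index + 1:03d}"  # member001, member002, ...
        members.append(member)
    return members


iters = {
    "param1": [0.1, 0.2, 0.3, 0.4],
    "param2": ["high", "high", "low", "low"],
    "seed": [1234, 2345, 3456, 4567],
}
members = build_members(4, iters)
print(members[2])
# {'param1': 0.3, 'param2': 'low', 'seed': 3456, 'label': 'member003'}
```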

Practical Example#

Sensitivity analysis with different wind forcing strengths:

[ensemble]
size = 5
tasks = run_ocean
label = scenario

    [[iters]]
    wind_scaling = 0.8, 0.9, 1.0, 1.1, 1.2
    description = weak, reduced, baseline, enhanced, strong

In your model configuration template:

! Wind forcing scale factor
wind_scale = {{ member.wind_scaling }}

! Run: {{ member.description }} winds

Workflow Stages#

Stages organize tasks into logical execution phases.

Stage Types#

Prolog

Runs once at the workflow start. Use for:

Creating directories
Downloading initial data
Compiling code
Setting up databases

[[prolog]]
preparation = setup_dirs, download_forcings
compilation = build_model

Cycles

Repeats for each cycle (if cycles are configured). Use for:

Time-stepping simulations
Iterative processing
Temporal analysis

[[cycles]]
simulation = run_model
analysis = compute_diagnostics, create_plots

Epilog

Runs once after all cycles complete. Use for:

Final analysis
Archiving results
Cleanup
Notifications

[[epilog]]
finalize = merge_outputs, create_summary
archive = backup_results

Stage Sequences and Parallelism#

Within each stage, you can define multiple sequences (substages) that run sequentially:

[[prolog]]
# Sequence 1: runs first
setup = create_dirs, copy_inputs

# Sequence 2: runs after sequence 1
prepare = compile_code, validate_inputs

# Sequence 3: runs after sequence 2
initialize = create_grid, setup_initial_conditions

Within each sequence, tasks listed on the same line and separated by commas can run in parallel:

[[cycles]]
# These three tasks run in parallel
process = task1, task2, task3
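
The ordering rules above can be sketched as a dependency map. This is a conceptual model, not woom's scheduler logic: sequence_dependencies is a hypothetical helper, and it assumes a sequence starts only once every task of the previous sequence has finished.

```python
def sequence_dependencies(stage):
    """Map each task to the tasks it waits for within one stage.

    stage is a dict of sequence name -> list of task names, in file order.
    Tasks in the same sequence do not depend on each other.
    """
    dependencies = {}
    previous_tasks = []
    for tasks in stage.values():
        for task in tasks:
            dependencies[task] = list(previous_tasks)
        previous_tasks = tasks
    return dependencies


prolog = {
    "setup": ["create_dirs", "copy_inputs"],
    "prepare": ["compile_code", "validate_inputs"],
    "initialize": ["create_grid", "setup_initial_conditions"],
}
deps = sequence_dependencies(prolog)
print(deps["create_dirs"])   # []
print(deps["compile_code"])  # ['create_dirs', 'copy_inputs']
```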

Task Groups#

Define reusable task groups in the [groups] section:

[groups]
preprocessing = clean_data, validate_data, transform_data
core_model = initialize, run_timesteps, finalize
postprocessing = extract_outputs, compute_stats

Use in stages:

[[prolog]]
prep = preprocessing

[[cycles]]
run = core_model, postprocessing
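
Conceptually, a group name in a stage line stands for the tasks it lists. The following sketch shows that substitution; expand_groups is a hypothetical helper, and it assumes groups are expanded in place and are not nested.

```python
def expand_groups(names, groups):
    """Replace group names by their member tasks, keeping the order."""
    expanded = []
    for name in names:
        # A name that is not a group is kept as a plain task name.
        expanded.extend(groups.get(name, [name]))
    return expanded


groups = {
    "preprocessing": ["clean_data", "validate_data", "transform_data"],
    "core_model": ["initialize", "run_timesteps", "finalize"],
    "postprocessing": ["extract_outputs", "compute_stats"],
}
print(expand_groups(["core_model", "postprocessing"], groups))
# ['initialize', 'run_timesteps', 'finalize', 'extract_outputs', 'compute_stats']
```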

Skipping Tasks at Runtime#

You can prevent specific tasks from being submitted without editing tasks.cfg, keeping them in the task tree so their artifact paths remain accessible to downstream tasks.

In workflow.cfg (persisted skip list for a given experiment):

[stages]
    skip = preprocess, download_forcings

    [[prolog]]
    setup = preprocess, download_forcings, compile_model

    [[cycles]]
    run = run_model

Both preprocess and download_forcings will be silently bypassed on every woom run, but run_model can still reference their artifact paths.

On the command line (one-off override):

woom run --skip preprocess download_forcings

Multiple task names are space-separated. CLI names are merged with any names already listed in [stages] skip, so you can combine both mechanisms.

Behaviour summary:

Skipped tasks are not submitted and their submission directory is untouched
They appear as SKIPPED (bold cyan) in woom show status
Their artifact paths are still displayed by woom show artifacts
Downstream tasks receive no scheduler dependency through a skipped slot (they can start immediately, assuming the artifacts already exist)
--force does not override the skip; remove the task from the skip list to re-enable it
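
The merge of the two skip sources and its effect on submission can be summarized with a short sketch. The helper names effective_skips and submission_plan are hypothetical and the status strings are only illustrative.

```python
def effective_skips(config_skip, cli_skip):
    """Union of the [stages] skip list and the names given to --skip."""
    return sorted(set(config_skip) | set(cli_skip))


def submission_plan(tasks, skips):
    """Mark each task as skipped or to be submitted, keeping it in the tree."""
    return {task: ("SKIPPED" if task in skips else "SUBMIT") for task in tasks}


skips = effective_skips(["preprocess"], ["download_forcings", "preprocess"])
plan = submission_plan(
    ["preprocess", "download_forcings", "compile_model", "run_model"], skips
)
print(skips)  # ['download_forcings', 'preprocess']
print(plan["compile_model"])  # SUBMIT
```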

Note

For a skip that is part of the task definition rather than a runtime choice, use the skip = True option directly in tasks.cfg (see Task Configuration In-Depth).

Custom Parameters#

The [params] section defines custom variables available in all templates.

Global Parameters#

Flat parameters (most common):

[params]
domain = north_atlantic
resolution = 10km
grid_nx = 100
grid_ny = 200
timestep = 300

Access with underscores:

domain: {{ params.domain }}
nx: {{ params.grid_nx }}
timestep: {{ params.timestep }}

Nested parameters (using ConfigObj subsections):

[params]
domain = north_atlantic

    [[grid]]
    nx = 100
    ny = 200

    [[paths]]
    forcing_dir = /data/forcings
    output_dir = /scratch/outputs

Access with dots:

domain: {{ params.domain }}
nx: {{ params.grid.nx }}
forcing: {{ params.paths.forcing_dir }}

Note

For simple workflows, use flat parameters with descriptive names (grid_nx, model_timestep). Use nested sections when you have many related parameters to organize.

Host and Task-Specific Parameters#

Override parameters for specific hosts or tasks:

[params]
scratch_dir = /scratch/default

    [[hosts]]
        [[[local]]]
        scratch_dir = /tmp

        [[[hpc_cluster]]]
        scratch_dir = /scratch/users/$USER

    [[tasks]]
        [[[run_model]]]
        threads = 8
        memory = 32GB
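
To make the override idea concrete, here is a sketch of how the values above might be resolved for one host and one task. The helper resolve_params is hypothetical, and the precedence it applies (global values first, then host overrides, then task overrides) is an assumption that should be checked against woom's actual behaviour.

```python
def resolve_params(params, host=None, task=None):
    """Merge global, host-specific and task-specific parameters.

    Assumption: host overrides are applied before task overrides.
    """
    resolved = {
        key: value for key, value in params.items() if key not in ("hosts", "tasks")
    }
    resolved.update(params.get("hosts", {}).get(host, {}))
    resolved.update(params.get("tasks", {}).get(task, {}))
    return resolved


params = {
    "scratch_dir": "/scratch/default",
    "hosts": {
        "local": {"scratch_dir": "/tmp"},
        "hpc_cluster": {"scratch_dir": "/scratch/users/$USER"},
    },
    "tasks": {"run_model": {"threads": 8, "memory": "32GB"}},
}
print(resolve_params(params, host="local", task="run_model"))
# {'scratch_dir': '/tmp', 'threads': 8, 'memory': '32GB'}
```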

Environment Variables#

Set environment variables for all tasks:

[env_vars]
OMP_NUM_THREADS = 4
MKL_NUM_THREADS = 4
PYTHONUNBUFFERED = 1
DATA_ROOT = /data/ocean

These are exported before task execution and available in templates.
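
For readers who want to picture what "exported before task execution" amounts to, the sketch below turns the section into shell export lines. It is not woom's code: export_lines is a hypothetical helper, and the exact form of the generated lines is an assumption.

```python
import shlex


def export_lines(env_vars):
    """Render one shell export statement per environment variable."""
    return [
        f"export {name}={shlex.quote(str(value))}"
        for name, value in env_vars.items()
    ]


env_vars = {
    "OMP_NUM_THREADS": 4,
    "MKL_NUM_THREADS": 4,
    "PYTHONUNBUFFERED": 1,
    "DATA_ROOT": "/data/ocean",
}
for line in export_lines(env_vars):
    print(line)
```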

Complete Example#

Here’s a comprehensive workflow configuration:

# Ocean model workflow
[app]
name = ocean_model
conf = tropical_pacific
exp = hindcast_2020

# Run daily cycles for January 2020
[cycles]
begin_date = 2020-01-01T00:00:00
end_date = 2020-02-01T00:00:00
freq = 1D
as_intervals = True
indep = False  # Sequential - each day needs previous

# 10-member ensemble with different initial conditions
[ensemble]
size = 10
tasks = run_ocean

    [[iters]]
    ic_perturbation = 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10

# Custom parameters
[params]
model_timestep = 600
output_frequency = 3600

    [[paths]]
    forcing = /data/forcings/era5
    bathymetry = /data/static/etopo1.nc

# Environment for all tasks
[env_vars]
OMP_NUM_THREADS = 8
DATA_DIR = /data/ocean

# Reusable task groups
[groups]
analysis = compute_sst, compute_currents, create_plots

# Workflow structure
[stages]
    [[prolog]]
    setup = create_workspace, download_bathymetry
    prepare = generate_grid, compile_model

    [[cycles]]
    simulate = run_ocean
    postprocess = analysis

    [[epilog]]
    finalize = merge_all_outputs
    archive = backup_to_tape
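
Before launching a configuration like this one, it can help to estimate how many task instances it implies. The sketch below does that under two assumptions that are not stated in this guide: ensembled tasks run once per member per cycle, and all other tasks in the cycles stage run once per cycle. The helper count_task_instances is hypothetical.

```python
def count_task_instances(stages, groups, ncycles, ensemble_size, ensemble_tasks):
    """Estimate the number of task instances per stage."""
    counts = {}
    for stage, sequences in stages.items():
        tasks = []
        for names in sequences.values():
            for name in names:
                tasks.extend(groups.get(name, [name]))  # expand task groups
        repeats = ncycles if stage == "cycles" else 1
        counts[stage] = sum(
            repeats * (ensemble_size if task in ensemble_tasks else 1)
            for task in tasks
        )
    return counts


stages = {
    "prolog": {
        "setup": ["create_workspace", "download_bathymetry"],
        "prepare": ["generate_grid", "compile_model"],
    },
    "cycles": {"simulate": ["run_ocean"], "postprocess": ["analysis"]},
    "epilog": {"finalize": ["merge_all_outputs"], "archive": ["backup_to_tape"]},
}
groups = {"analysis": ["compute_sst", "compute_currents", "create_plots"]}

# 31 daily intervals between 2020-01-01 and 2020-02-01, 10 ensemble members
counts = count_task_instances(stages, groups, 31, 10, {"run_ocean"})
print(counts)  # {'prolog': 4, 'cycles': 403, 'epilog': 2}
```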

Best Practices#

Start Simple: Begin with a basic workflow and add complexity (cycles, ensembles) incrementally
Use Meaningful Names: Application, configuration, and experiment names should be descriptive
Plan Your Cycles: Consider if cycles should be independent or sequential based on your science
Organize Stages Logically: Use prolog for setup, cycles for repeated work, epilog for finalization
Document Parameters: Add comments in your configuration explaining what parameters control
Test Incrementally: Test with a single cycle before running many, test with a few members before a large ensemble
Use Groups: Define task groups for commonly repeated task sequences

Common Patterns#

Pattern 1: Simple Time-Stepping Model

[cycles]
begin_date = 2020-01-01
ncycles = 30
freq = 1D
indep = False

[stages]
    [[cycles]]
    run = model_timestep

Pattern 2: Embarrassingly Parallel Processing

[cycles]
begin_date = 2020-01-01
ncycles = 365
freq = 1D
indep = True  # All days can process in parallel

[stages]
    [[cycles]]
    process = analyze_day

Pattern 3: Ensemble Forecast

[ensemble]
size = 50
tasks = run_forecast

[cycles]
begin_date = 2020-01-01
ncycles = 10
freq = 1D
indep = False

[stages]
    [[cycles]]
    forecast = run_forecast

Pattern 4: No Cycles, Just Stages

# No [cycles] section needed

[stages]
    [[prolog]]
    prepare = download, preprocess

    [[epilog]]
    analyze = statistics, visualize
