Artifacts In-Depth#

Artifacts are output files that your tasks produce. Woom tracks artifacts to verify task completion, provide visibility into outputs, and help with workflow debugging and data management.

What Are Artifacts?#

Artifacts represent important output files that:

Indicate successful task completion
Serve as inputs to downstream tasks
Represent final products of your workflow
Need to be validated and tracked

Examples:

Model output files (NetCDF, HDF5)
Restart/checkpoint files
Analysis results (CSV, plots)
Log files
Processed data products

Why Track Artifacts?#

Validation: Verify tasks completed successfully by checking outputs exist
Debugging: Quickly identify which files are missing
Documentation: See what files your workflow produces
Dependencies: Understand data flow between tasks
Data Management: Identify files to archive or clean up

Basic Configuration#

Simple Artifact#

[run_model]
    [[artifacts]]
        [[[output]]]
        path = output.nc
        check = True

Fields:

path: File path (absolute or relative to run_dir)
check: Whether to verify file exists after task completes (default: True)
callable: Whether path is the name of a registered generator function rather than a file path (default: False)

Multiple Artifacts#

[run_model]
    [[artifacts]]
        [[[output]]]
        path = output.nc
        check = True

        [[[restart]]]
        path = restart.nc
        check = True

        [[[log]]]
        path = model.log
        check = False  # Optional file

Absolute vs Relative Paths#

Relative Paths#

Relative paths are resolved against the task’s run_dir:

[task]
    [[content]]
    run_dir = /scratch/run

    [[artifacts]]
        [[[output]]]
        path = results/output.nc  # → /scratch/run/results/output.nc

Absolute Paths#

[[artifacts]]
    [[[shared_output]]]
    path = /data/shared/results.nc  # Absolute path
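
The relative/absolute distinction above can be sketched in a few lines of Python. This is only an illustration of the resolution rule; `resolve_artifact_path` is a hypothetical helper, not part of Woom's API, and Woom's actual implementation may differ.

```python
from pathlib import PurePosixPath


def resolve_artifact_path(run_dir: str, path: str) -> str:
    """Resolve an artifact path against run_dir (hypothetical helper)."""
    # Joining onto an absolute path discards run_dir, so absolute
    # paths pass through unchanged.
    return str(PurePosixPath(run_dir) / path)


relative = resolve_artifact_path("/scratch/run", "results/output.nc")
absolute = resolve_artifact_path("/scratch/run", "/data/shared/results.nc")
print(relative)  # /scratch/run/results/output.nc
print(absolute)  # /data/shared/results.nc
```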

Template Paths#

Use template variables:

[[artifacts]]
    [[[output]]]
    path = {{ task_run_dir }}/output_{{ cycle.token }}.nc

    [[[dated_output]]]
    path = /data/outputs/{{ cycle.date.year }}/{{ cycle.date.month }}/data.nc

Artifact Checking#

Mandatory Artifacts#

[[artifacts]]
    [[[critical_output]]]
    path = output.nc
    check = True  # Task fails if missing

If check = True and the file doesn’t exist after the task completes, Woom marks the task as failed.

Optional Artifacts#

[[artifacts]]
    [[[debug_log]]]
    path = debug.log
    check = False  # Warning only if missing

If check = False, missing files generate warnings but don’t fail the task.
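
The difference between mandatory and optional artifacts can be sketched as follows. The `check_artifacts` helper and the `(path, check)` tuple layout are assumptions made for this illustration; they are not Woom's internal representation.

```python
import os
import tempfile


def check_artifacts(artifacts):
    """Split missing artifacts into errors (check=True) and warnings (check=False).

    `artifacts` maps an artifact name to a (path, check) tuple.
    """
    errors, warnings = [], []
    for name, (path, check) in artifacts.items():
        if os.path.exists(path):
            continue
        (errors if check else warnings).append(name)
    return errors, warnings


with tempfile.TemporaryDirectory() as run_dir:
    output = os.path.join(run_dir, "output.nc")
    open(output, "w").close()  # pretend the task wrote its main output
    artifacts = {
        "critical_output": (output, True),
        "restart": (os.path.join(run_dir, "restart.nc"), True),
        "debug_log": (os.path.join(run_dir, "debug.log"), False),
    }
    errors, warnings = check_artifacts(artifacts)

print("task failed:", bool(errors), errors)
print("warnings:", warnings)
```

Here the missing `restart` file would fail the task, while the missing `debug_log` would only be reported.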

Multiple Files#

Lists of Files#

[[artifacts]]
    [[[outputs]]]
    path = output1.nc, output2.nc, output3.nc
    check = True

Woom checks each file in the list.
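
As a rough illustration of per-file checking, assume the comma-separated value is split into individual paths before each one is tested. The `split_paths` helper is hypothetical; the real parsing is done by Woom's configuration layer and may differ.

```python
import os


def split_paths(value):
    """Split a comma-separated path value into a list of clean paths."""
    return [item.strip() for item in value.split(",") if item.strip()]


paths = split_paths("output1.nc, output2.nc, output3.nc")
# One missing file is enough to make the whole artifact incomplete.
missing = [p for p in paths if not os.path.exists(p)]
print(paths)
print("missing:", missing)
```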

Wildcards (Not Supported)#

Woom doesn’t support glob patterns in paths. Use callable generators instead:

[[artifacts]]
    [[[all_outputs]]]
    path = generate_output_list
    callable = True
    check = True
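
One way to get wildcard-like behaviour is a generator that globs a directory. The sketch below is self-contained: `ARTIFACTS_GENERATORS` is a plain dict standing in for `woom.tasks.ARTIFACTS_GENERATORS`, and the `directory` and `pattern` keyword arguments are hypothetical names that would be supplied through `[[[[kwargs]]]]`.

```python
import glob
import os
import tempfile

# Stand-in for woom.tasks.ARTIFACTS_GENERATORS so the sketch runs without woom.
ARTIFACTS_GENERATORS = {}


def generate_output_list(context, directory, pattern="*.nc"):
    """Return the sorted files in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


ARTIFACTS_GENERATORS["generate_output_list"] = generate_output_list

with tempfile.TemporaryDirectory() as run_dir:
    for name in ("output1.nc", "output2.nc", "notes.txt"):
        open(os.path.join(run_dir, name), "w").close()
    found = ARTIFACTS_GENERATORS["generate_output_list"]({}, run_dir)
    names = [os.path.basename(p) for p in found]

print(names)  # ['output1.nc', 'output2.nc']
```

Note that a glob-based generator only reports files that already exist, so check = True cannot detect a missing file this way. A generator that builds the expected names gives a stronger check.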

Dynamic Artifact Paths#

Using Templates#

Cycle-Dependent:

[[artifacts]]
    [[[daily_output]]]
    path = {{ task_run_dir }}/output_{{ cycle.date_str }}.nc

Member-Dependent:

[[artifacts]]
    [[[ensemble_output]]]
    path = {{ task_run_dir }}/output_{{ member.label }}.nc

Combined:

[[artifacts]]
    [[[result]]]
    path = {{ scratch_dir }}/{{ task_path }}/result_{{ cycle.token }}_{{ member.label }}.nc

Using Callable Generators#

For complex path generation:

tasks.cfg:

[[artifacts]]
    [[[ensemble_outputs]]]
    path = generate_ensemble_outputs
    callable = True
    check = True

        [[[[kwargs]]]]
        base_dir = {{ task_run_dir }}
        prefix = member

ext/artifacts_generators.py:

from woom.tasks import ARTIFACTS_GENERATORS

def generate_ensemble_outputs(context, base_dir, prefix):
    """Generate list of ensemble output files"""
    outputs = []
    if context.get('member'):
        # Single member - return its file
        member = context['member']
        outputs.append(f"{base_dir}/{prefix}_{member.label}.nc")
    else:
        # No member context - return all expected files
        workflow = context['workflow']
        for member in workflow.members:
            outputs.append(f"{base_dir}/{prefix}_{member.label}.nc")
    return outputs

ARTIFACTS_GENERATORS['generate_ensemble_outputs'] = generate_ensemble_outputs

Advanced: Time Series#

Generate daily file list:

import pandas as pd

from woom.tasks import ARTIFACTS_GENERATORS


def generate_daily_outputs(context, output_dir, pattern):
    """Generate daily output files for cycle"""
    outputs = []
    cycle = context.get('cycle')
    if not cycle:
        # No cycle in the context: nothing to list
        return outputs

    # One file per day, from begin_date (inclusive) to end_date (exclusive)
    current_date = cycle.begin_date
    while current_date < cycle.end_date:
        filename = pattern.format(
            year=current_date.year,
            month=current_date.month,
            day=current_date.day
        )
        outputs.append(f"{output_dir}/{filename}")
        current_date += pd.Timedelta(days=1)

    return outputs

ARTIFACTS_GENERATORS['daily_outputs'] = generate_daily_outputs

Usage:

[[artifacts]]
    [[[daily_files]]]
    path = daily_outputs
    callable = True

        [[[[kwargs]]]]
        output_dir = {{ task_run_dir }}/daily
        pattern = output_{year:04d}{month:02d}{day:02d}.nc
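
To see what such a pattern expands to, the date loop can be tried outside Woom with the standard library only. `expand_daily` is a hypothetical stand-in for the generator above, using `datetime.date` instead of the cycle object.

```python
from datetime import date, timedelta


def expand_daily(pattern, begin, end):
    """Format `pattern` for each day in the half-open range [begin, end)."""
    names = []
    current = begin
    while current < end:
        names.append(
            pattern.format(year=current.year, month=current.month, day=current.day)
        )
        current += timedelta(days=1)
    return names


names = expand_daily(
    "output_{year:04d}{month:02d}{day:02d}.nc", date(2020, 1, 30), date(2020, 2, 2)
)
print(names)
# ['output_20200130.nc', 'output_20200131.nc', 'output_20200201.nc']
```

The end date is excluded, matching the `current_date < cycle.end_date` condition in the generator.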

Viewing Artifacts#

Command Line#

List all artifacts:

woom show artifacts

Filter by task:

woom show artifacts --task-name run_model

Filter by cycle:

woom show artifacts --cycle 2020-01-01

From Python#

import os

# Get all artifacts for a task
artifacts = workflow.get_task_artifacts('run_model', cycle='2020-01-01')

for name, paths in artifacts.items():
    print(f"{name}: {paths}")

# Get the paths of a specific artifact
# (assumed to be a list of paths, as the method name suggests)
output_paths = workflow.get_task_artifact_paths(
    'output',
    'run_model',
    cycle='2020-01-01'
)

# Check that each path exists
for path in output_paths:
    if os.path.exists(path):
        print(f"Output file exists: {path}")

Artifacts DataFrame#

# Get DataFrame of all artifacts
df = workflow.get_artifacts()
print(df)

# Filter
df_model = workflow.get_artifacts(task_name='run_model')

# Check existence
missing = df[~df['EXISTS?']]
print(f"Missing files: {len(missing)}")

Common Patterns#

Model Outputs#

[run_ocean_model]
    [[artifacts]]
        [[[output]]]
        path = {{ task_run_dir }}/ocean_{{ cycle.token }}.nc
        check = True

        [[[restart]]]
        path = {{ task_run_dir }}/restart_{{ cycle.end_date_str }}.nc
        check = True

        [[[diagnostics]]]
        path = {{ task_run_dir }}/diagnostics.nc
        check = False  # Optional

Analysis Results#

[compute_statistics]
    [[artifacts]]
        [[[stats]]]
        path = {{ scratch_dir }}/analysis/stats_{{ cycle.token }}.csv
        check = True

        [[[plots]]]
        path = plot_sst.png, plot_currents.png, plot_salinity.png
        check = False  # Plots are optional

Post-Processing#

[merge_outputs]
    [[artifacts]]
        [[[merged_file]]]
        path = /data/final/merged_{{ app.exp }}_{{ cycle.date.year }}.nc
        check = True

Data Download#

[download_forcing]
    [[artifacts]]
        [[[forcing_file]]]
        path = {{ params.forcing_dir }}/era5_{{ cycle.date_str }}.nc
        check = True

Ensemble Processing#

[ensemble_mean]
    [[artifacts]]
        [[[mean]]]
        path = {{ scratch_dir }}/ensemble/mean_{{ cycle.token }}.nc
        check = True

        [[[std]]]
        path = {{ scratch_dir }}/ensemble/std_{{ cycle.token }}.nc
        check = True

        [[[individual_members]]]
        path = list_ensemble_files
        callable = True
        check = False  # Don't fail if individual files missing

Artifact Best Practices#

Track Important Outputs: Define artifacts for critical files only
Use Meaningful Names: Artifact names should describe what the file contains
Set Appropriate Check Flags:
    check=True for required outputs
    check=False for optional/debug files
Use Template Variables: Make paths dynamic with cycle/member information
Organize Output Directories: Use consistent directory structures
Document Expectations: Comment what each artifact represents
Validate Paths: Test that paths are correct before running workflow
Handle Missing Gracefully: Use check=False for truly optional outputs
Consider Downstream: Think about which files downstream tasks need
Archive Strategy: Identify which artifacts to keep long-term
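
For the "Validate Paths" point, a quick sanity check on rendered paths can catch common mistakes before a run. The sketch below is an assumption-laden helper, not a Woom feature: `find_suspicious_paths` and the example paths are purely illustrative.

```python
def find_suspicious_paths(artifacts):
    """Return {name: reason} for rendered paths that look wrong."""
    problems = {}
    for name, path in artifacts.items():
        if not path.strip():
            problems[name] = "empty path"
        elif "{{" in path or "}}" in path:
            # A template marker survived rendering
            problems[name] = "unrendered template marker"
        elif "//" in path:
            # Often caused by a variable that rendered to an empty string
            problems[name] = "empty path component"
    return problems


rendered = {
    "output": "/scratch/run/output_2020.nc",
    "restart": "/scratch/run/restart_{{ cycle.token }}.nc",
    "stats": "/scratch//stats.csv",
}
problems = find_suspicious_paths(rendered)
print(problems)
```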

Troubleshooting#

Artifact Not Found#

Symptoms: Task marked as failed, “Artifact not found” message

Causes:

Task didn’t create the file
Wrong path in configuration
File created in different location
Permissions prevent access

Debug:

# Check what files task created
ls -R /path/to/run/dir

# Compare to expected artifact path
woom show artifacts --task-name my_task

# Check job output
cat jobs/*/my_task/job.out

Solutions:

Verify task command actually creates file
Check path template rendering
Ensure run_dir is set correctly
Use absolute paths if needed
Check file permissions
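
When the file was created in a different location, searching the run directory for the expected file name narrows things down quickly. `find_by_basename` is a hypothetical helper written for this sketch.

```python
import os
import tempfile


def find_by_basename(root, expected_path):
    """Return files under `root` that share the basename of `expected_path`."""
    target = os.path.basename(expected_path)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if target in filenames:
            matches.append(os.path.join(dirpath, target))
    return sorted(matches)


with tempfile.TemporaryDirectory() as run_dir:
    os.makedirs(os.path.join(run_dir, "results"))
    actual = os.path.join(run_dir, "results", "output.nc")
    open(actual, "w").close()  # where the task really wrote the file
    expected = os.path.join(run_dir, "output.nc")  # where the config points
    matches = find_by_basename(run_dir, expected)
    relative = [os.path.relpath(m, run_dir) for m in matches]

print(relative)
```

A match in a subdirectory usually means the artifact path or run_dir in the configuration needs adjusting.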

Path Template Errors#

Symptoms: Path doesn’t render correctly

Causes:

Syntax error in template
Variable undefined in context
Wrong variable used

Debug:

# Check rendered path
context = workflow.get_context(task_name='my_task', cycle='2020-01-01')
task = workflow.get_task('my_task')
task.set_context(context)
artifacts = task.render_artifacts()
print(artifacts)

Solutions:

Test template syntax
Verify variables exist in context
Use | default() for optional variables
Check for typos in variable names

Callable Not Working#

Symptoms: Artifact generator doesn’t run or errors

Causes:

Function not registered
Wrong function signature
Runtime error in function

Debug:

# Check if registered
from woom.tasks import ARTIFACTS_GENERATORS
print('my_generator' in ARTIFACTS_GENERATORS)

# Test function directly
result = ARTIFACTS_GENERATORS['my_generator'](context, **kwargs)
print(result)

Solutions:

Ensure function is registered in ARTIFACTS_GENERATORS
Check function signature matches: func(context, **kwargs)
Add error handling in generator function
Test with simple case first
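
A signature mismatch can be detected without running the generator, using `inspect.signature` from the standard library. The sketch assumes the generator is called as `func(context, **kwargs)` with `context` as the first positional parameter; `check_generator_signature` and both example generators are hypothetical.

```python
import inspect


def check_generator_signature(func, kwargs):
    """Return a list of problems found when binding func(context, **kwargs)."""
    problems = []
    signature = inspect.signature(func)
    params = list(signature.parameters)
    if not params or params[0] != "context":
        problems.append("first parameter should be 'context'")
    try:
        # An empty dict stands in for the real context
        signature.bind({}, **kwargs)
    except TypeError as exc:
        problems.append(f"cannot bind kwargs: {exc}")
    return problems


def good_generator(context, base_dir, prefix):
    return [f"{base_dir}/{prefix}.nc"]


def bad_generator(base_dir):  # forgot the context parameter
    return [base_dir]


kwargs = {"base_dir": "/scratch/run", "prefix": "member"}
good = check_generator_signature(good_generator, kwargs)
bad = check_generator_signature(bad_generator, kwargs)
print(good)
print(bad)
```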

Wrong Files Checked#

Symptoms: Task succeeds but didn’t create expected files

Causes:

check=False on critical artifacts
Wrong artifact configured
Files created with different names

Solutions:

Set check=True for required outputs
Verify artifact names match actual outputs
Use callable to list actual files created
Review task output logs

Performance Issues#

Symptoms: Artifact checking is slow

Causes:

Too many artifacts defined
Network file system latency
Large file lists from callables

Solutions:

Only track essential artifacts
Use check=False for non-critical files
Optimize callable generators
Consider aggregate artifacts (a single check on an output directory instead of one check per file)
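
When many files live in the same directory, reading the directory once and comparing names may be cheaper than testing each path separately, particularly on a network file system. This is a general technique rather than something Woom is documented to do; `missing_in_directory` is a hypothetical helper.

```python
import os
import tempfile


def missing_in_directory(directory, expected_names):
    """Return expected file names absent from `directory`, using one listing."""
    present = set(os.listdir(directory))
    return sorted(set(expected_names) - present)


with tempfile.TemporaryDirectory() as run_dir:
    for day in (1, 2, 4):
        open(os.path.join(run_dir, f"output_{day:02d}.nc"), "w").close()
    expected = [f"output_{day:02d}.nc" for day in range(1, 6)]
    missing = missing_in_directory(run_dir, expected)

print(missing)  # ['output_03.nc', 'output_05.nc']
```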

Integration with Workflow#

Artifacts as Dependencies#

Woom doesn’t automatically create task dependencies from artifacts, but you can order the stages of your workflow so that producers run before consumers:

# Stage 1: Create data
[[prolog]]
prepare = download_forcing

# Stage 2: Use data
[[cycles]]
simulate = run_model  # Uses forcing from prepare

Checking Before Run#

import os

# Verify previous task's artifacts before running
artifacts = workflow.get_task_artifacts('previous_task', cycle='2020-01-01')

for name, paths in artifacts.items():
    for path in paths:
        if not os.path.exists(path):
            raise RuntimeError(f"Required input missing: {path}")

# Now safe to run dependent task
workflow.run()

Cleanup Strategy#

# Show all artifacts with existence status
woom show artifacts > artifact_inventory.txt

# Use artifact info for cleanup decisions
# Keep final products, remove intermediate files
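
The keep/remove decision can be sketched in Python on a name-to-paths mapping shaped like the one `get_task_artifacts` is shown returning earlier. `plan_cleanup`, the artifact names, and the paths are illustrative assumptions; nothing is deleted here.

```python
def plan_cleanup(artifacts, keep_names):
    """Split artifact paths into those to keep and those safe to remove."""
    keep, remove = [], []
    for name, paths in artifacts.items():
        (keep if name in keep_names else remove).extend(paths)
    return keep, remove


# Illustrative mapping of artifact name to paths
artifacts = {
    "merged_file": ["/data/final/merged_2020.nc"],
    "restart": ["/scratch/run/restart_a.nc", "/scratch/run/restart_b.nc"],
    "debug_log": ["/scratch/run/debug.log"],
}
keep, remove = plan_cleanup(artifacts, keep_names={"merged_file"})
print("keep:", keep)
print("remove:", remove)
```

Reviewing the `remove` list before deleting anything is a sensible safeguard.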
