Artifacts In-Depth#
Artifacts are output files that your tasks produce. Woom tracks artifacts to verify task completion, provide visibility into outputs, and help with workflow debugging and data management.
What Are Artifacts?#
Artifacts represent important output files that:
Indicate successful task completion
Serve as inputs to downstream tasks
Represent final products of your workflow
Need to be validated and tracked
Examples:
Model output files (NetCDF, HDF5)
Restart/checkpoint files
Analysis results (CSV, plots)
Log files
Processed data products
Why Track Artifacts?#
Validation: Verify tasks completed successfully by checking outputs exist
Debugging: Quickly identify which files are missing
Documentation: See what files your workflow produces
Dependencies: Understand data flow between tasks
Data Management: Identify files to archive or clean up
Basic Configuration#
Simple Artifact#
[run_model]
[[artifacts]]
[[[output]]]
path = output.nc
check = True
Fields:
path: File path (absolute, or relative to run_dir)
check: Whether to verify that the file exists after the task completes (default: True)
callable: Whether path is the name of a registered generator function (default: False)
Multiple Artifacts#
[run_model]
[[artifacts]]
[[[output]]]
path = output.nc
check = True
[[[restart]]]
path = restart.nc
check = True
[[[log]]]
path = model.log
check = False # Optional file
Absolute vs Relative Paths#
Relative Paths#
Relative paths are resolved against the task’s run_dir:
[task]
[[content]]
run_dir = /scratch/run
[[artifacts]]
[[[output]]]
path = results/output.nc # → /scratch/run/results/output.nc
Absolute Paths#
[[artifacts]]
[[[shared_output]]]
path = /data/shared/results.nc # Absolute path
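The two rules above can be sketched with Python's pathlib. This only illustrates the resolution behavior described here and is not woom's implementation; `resolve_artifact_path` is a hypothetical helper. Joining with `/` returns the right-hand path unchanged when it is already absolute, so one expression covers both cases.

```python
from pathlib import PurePosixPath


def resolve_artifact_path(path, run_dir):
    """Hypothetical helper: resolve an artifact path against a task's run_dir.

    Relative paths are appended to run_dir. Absolute paths come back unchanged,
    because the `/` operator discards the left operand when the right one is absolute.
    """
    return str(PurePosixPath(run_dir) / path)


print(resolve_artifact_path("results/output.nc", "/scratch/run"))        # relative case
print(resolve_artifact_path("/data/shared/results.nc", "/scratch/run"))  # absolute case
```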
Template Paths#
Use template variables:
[[artifacts]]
[[[output]]]
path = {{ task_run_dir }}/output_{{ cycle.token }}.nc
[[[dated_output]]]
path = /data/outputs/{{ cycle.date.year }}/{{ cycle.date.month }}/data.nc
Artifact Checking#
Mandatory Artifacts#
[[artifacts]]
[[[critical_output]]]
path = output.nc
check = True # Task fails if missing
If check = True and the file doesn’t exist after the task completes, woom marks the task as failed.
Optional Artifacts#
[[artifacts]]
[[[debug_log]]]
path = debug.log
check = False # Warning only if missing
If check = False, missing files generate warnings but don’t fail the task.
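The difference between mandatory and optional artifacts can be summarized in a short sketch. It illustrates the behavior described above and is not woom's own checking code: `check_artifacts` and its input format (artifact name mapped to a `path` and an optional `check` flag, mirroring the configuration keys) are hypothetical. A single path is treated like a one-item list, so the same logic also covers lists of files.

```python
import logging
import os

logger = logging.getLogger("artifact_check")


def check_artifacts(artifacts):
    """Hypothetical sketch of the check/warn rule for artifacts.

    `artifacts` maps an artifact name to a dict with a "path" entry (one path
    or a list of paths) and an optional "check" flag (default True).
    Returns True only if every mandatory artifact exists.
    """
    ok = True
    for name, spec in artifacts.items():
        paths = spec["path"]
        if isinstance(paths, str):
            paths = [paths]  # a single path behaves like a one-item list
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing:
            continue
        if spec.get("check", True):
            # Mandatory artifact: the task would be marked as failed
            logger.error("Mandatory artifact %r missing: %s", name, missing)
            ok = False
        else:
            # Optional artifact: only a warning
            logger.warning("Optional artifact %r missing: %s", name, missing)
    return ok
```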
Multiple Files#
Lists of Files#
[[artifacts]]
[[[outputs]]]
path = output1.nc, output2.nc, output3.nc
check = True
Woom checks each file in the list.
Wildcards (Not Supported)#
Woom doesn’t support glob patterns in paths. Use callable generators instead:
[[artifacts]]
[[[all_outputs]]]
path = generate_output_list
callable = True
check = True
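A generator behind `path = generate_output_list` could look like the sketch below. It assumes, as in the ensemble example later on this page, that generators are registered in `woom.tasks.ARTIFACTS_GENERATORS` and called as `func(context, **kwargs)` returning a list of paths. The `base_dir`, `prefix`, and `n_files` keyword arguments are hypothetical and would be supplied through a `[[[[kwargs]]]]` subsection. The sketch builds the expected names explicitly instead of globbing the disk: a glob only returns files that already exist, so it could never report a missing one.

```python
import os

try:
    from woom.tasks import ARTIFACTS_GENERATORS  # registry used on this page
except ImportError:
    ARTIFACTS_GENERATORS = {}  # stand-in so the sketch runs without woom installed


def generate_output_list(context, base_dir=".", prefix="output", n_files=3):
    """Hypothetical generator: list the numbered files a task is expected to write."""
    n_files = int(n_files)  # values read from a config file may arrive as strings
    return [os.path.join(base_dir, f"{prefix}{i}.nc") for i in range(1, n_files + 1)]


ARTIFACTS_GENERATORS["generate_output_list"] = generate_output_list
```

With the defaults, this yields output1.nc, output2.nc, and output3.nc, the same files as the explicit list above.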
Dynamic Artifact Paths#
Using Templates#
Cycle-Dependent:
[[artifacts]]
[[[daily_output]]]
path = {{ task_run_dir }}/output_{{ cycle.date_str }}.nc
Member-Dependent:
[[artifacts]]
[[[ensemble_output]]]
path = {{ task_run_dir }}/output_{{ member.label }}.nc
Combined:
[[artifacts]]
[[[result]]]
path = {{ scratch_dir }}/{{ task_path }}/result_{{ cycle.token }}_{{ member.label }}.nc
Using Callable Generators#
For complex path generation:
tasks.cfg:
[[artifacts]]
[[[ensemble_outputs]]]
path = generate_ensemble_outputs
callable = True
check = True
[[[[kwargs]]]]
base_dir = {{ task_run_dir }}
prefix = member
ext/artifacts_generators.py:
from woom.tasks import ARTIFACTS_GENERATORS

def generate_ensemble_outputs(context, base_dir, prefix):
    """Generate list of ensemble output files"""
    outputs = []
    if context.get('member'):
        # Single member - return its file
        member = context['member']
        outputs.append(f"{base_dir}/{prefix}_{member.label}.nc")
    else:
        # No member context - return all expected files
        workflow = context['workflow']
        for member in workflow.members:
            outputs.append(f"{base_dir}/{prefix}_{member.label}.nc")
    return outputs

ARTIFACTS_GENERATORS['generate_ensemble_outputs'] = generate_ensemble_outputs
Advanced: Time Series#
Generate daily file list:
import pandas as pd
from woom.tasks import ARTIFACTS_GENERATORS

def generate_daily_outputs(context, output_dir, pattern):
    """Generate daily output files for cycle"""
    outputs = []
    cycle = context.get('cycle')
    if not cycle:
        return outputs
    # Step through the cycle one day at a time; the end date itself is excluded
    current_date = cycle.begin_date
    while current_date < cycle.end_date:
        filename = pattern.format(
            year=current_date.year,
            month=current_date.month,
            day=current_date.day
        )
        outputs.append(f"{output_dir}/{filename}")
        current_date += pd.Timedelta(days=1)
    return outputs

ARTIFACTS_GENERATORS['daily_outputs'] = generate_daily_outputs
Usage:
[[artifacts]]
[[[daily_files]]]
path = daily_outputs
callable = True
[[[[kwargs]]]]
output_dir = {{ task_run_dir }}/daily
pattern = output_{year:04d}{month:02d}{day:02d}.nc
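To preview the names this `pattern` produces before running the workflow, the sketch below applies the same `str.format` call to each day of a hypothetical three-day cycle, using the standard `datetime` module instead of pandas. `preview_daily_names` is a hypothetical helper; like `generate_daily_outputs`, it excludes the end date.

```python
from datetime import date, timedelta


def preview_daily_names(pattern, begin, end):
    """List the file names built for each day in the half-open range [begin, end)."""
    names = []
    current = begin
    while current < end:
        names.append(pattern.format(year=current.year, month=current.month, day=current.day))
        current += timedelta(days=1)
    return names


print(preview_daily_names("output_{year:04d}{month:02d}{day:02d}.nc", date(2020, 1, 1), date(2020, 1, 4)))
# ['output_20200101.nc', 'output_20200102.nc', 'output_20200103.nc']
```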
Viewing Artifacts#
Command Line#
List all artifacts:
woom show artifacts
Filter by task:
woom show artifacts --task-name run_model
Filter by cycle:
woom show artifacts --cycle 2020-01-01
From Python#
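import os

# Get all artifacts for a task (`workflow` is an existing woom workflow object)
artifacts = workflow.get_task_artifacts('run_model', cycle='2020-01-01')
for name, paths in artifacts.items():
    print(f"{name}: {paths}")

# Get specific artifact
output_paths = workflow.get_task_artifact_paths(
    'output',
    'run_model',
    cycle='2020-01-01'
)

# Check if the file(s) exist; accept either a single path or a list of paths
if isinstance(output_paths, str):
    output_paths = [output_paths]
if all(os.path.exists(path) for path in output_paths):
    print("Output file exists")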
Artifacts DataFrame#
# Get DataFrame of all artifacts
df = workflow.get_artifacts()
print(df)
# Filter
df_model = workflow.get_artifacts(task_name='run_model')
# Check existence
missing = df[~df['EXISTS?']]
print(f"Missing files: {len(missing)}")
Common Patterns#
Model Outputs#
[run_ocean_model]
[[artifacts]]
[[[output]]]
path = {{ task_run_dir }}/ocean_{{ cycle.token }}.nc
check = True
[[[restart]]]
path = {{ task_run_dir }}/restart_{{ cycle.end_date_str }}.nc
check = True
[[[diagnostics]]]
path = {{ task_run_dir }}/diagnostics.nc
check = False # Optional
Analysis Results#
[compute_statistics]
[[artifacts]]
[[[stats]]]
path = {{ scratch_dir }}/analysis/stats_{{ cycle.token }}.csv
check = True
[[[plots]]]
path = plot_sst.png, plot_currents.png, plot_salinity.png
check = False # Plots are optional
Post-Processing#
[merge_outputs]
[[artifacts]]
[[[merged_file]]]
path = /data/final/merged_{{ app.exp }}_{{ cycle.date.year }}.nc
check = True
Data Download#
[download_forcing]
[[artifacts]]
[[[forcing_file]]]
path = {{ params.forcing_dir }}/era5_{{ cycle.date_str }}.nc
check = True
Ensemble Processing#
[ensemble_mean]
[[artifacts]]
[[[mean]]]
path = {{ scratch_dir }}/ensemble/mean_{{ cycle.token }}.nc
check = True
[[[std]]]
path = {{ scratch_dir }}/ensemble/std_{{ cycle.token }}.nc
check = True
[[[individual_members]]]
path = list_ensemble_files
callable = True
check = False # Don't fail if individual files missing
Artifact Best Practices#
Track Important Outputs: Define artifacts for critical files only
Use Meaningful Names: Artifact names should describe what the file contains
Set Appropriate Check Flags: check = True for required outputs, check = False for optional or debug files
Use Template Variables: Make paths dynamic with cycle/member information
Organize Output Directories: Use consistent directory structures
Document Expectations: Comment what each artifact represents
Validate Paths: Test that paths are correct before running the workflow
Handle Missing Gracefully: Use check = False for truly optional outputs
Consider Downstream: Think about which files downstream tasks need
Archive Strategy: Identify which artifacts to keep long-term
Troubleshooting#
Artifact Not Found#
Symptoms: Task marked as failed, “Artifact not found” message
Causes:
Task didn’t create the file
Wrong path in configuration
File created in different location
Permissions prevent access
Debug:
# Check what files task created
ls -R /path/to/run/dir
# Compare to expected artifact path
woom show artifacts --task-name my_task
# Check job output
cat jobs/*/my_task/job.out
Solutions:
Verify that the task command actually creates the file
Check path template rendering
Ensure run_dir is set correctly
Use absolute paths if needed
Check file permissions
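When a file was created in a different location, a quick search of the run directory often reveals it. The hypothetical helper below takes the expected paths (for example, copied from the output of woom show artifacts --task-name my_task) and, for each missing one, lists files under a search root that share its base name. It uses only the standard library and does not rely on any woom API.

```python
import os


def locate_missing_artifacts(expected_paths, search_root):
    """Map each missing expected path to same-named files found under search_root.

    Existing paths are left out of the result. An empty list for a path means
    no file with that base name exists anywhere under search_root.
    """
    missing = [p for p in expected_paths if not os.path.exists(p)]
    wanted = {os.path.basename(p) for p in missing}
    found = {name: [] for name in wanted}
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for filename in filenames:
            if filename in wanted:
                found[filename].append(os.path.join(dirpath, filename))
    return {p: found[os.path.basename(p)] for p in missing}
```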
Path Template Errors#
Symptoms: Path doesn’t render correctly
Causes:
Syntax error in template
Variable undefined in context
Wrong variable used
Debug:
# Check rendered path
context = workflow.get_context(task_name='my_task', cycle='2020-01-01')
task = workflow.get_task('my_task')
task.set_context(context)
artifacts = task.render_artifacts()
print(artifacts)
Solutions:
Test template syntax
Verify variables exist in context
Use | default() for optional variables
Check for typos in variable names
Callable Not Working#
Symptoms: Artifact generator doesn’t run or errors
Causes:
Function not registered
Wrong function signature
Runtime error in function
Debug:
# Check if registered
from woom.tasks import ARTIFACTS_GENERATORS
print('my_generator' in ARTIFACTS_GENERATORS)
# Test function directly
result = ARTIFACTS_GENERATORS['my_generator'](context, **kwargs)
print(result)
Solutions:
Ensure function is registered in ARTIFACTS_GENERATORS
Check that the function signature matches func(context, **kwargs)
Add error handling in the generator function
Test with simple case first
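Because generators are plain functions, they can be tested without running the workflow by passing a hand-built context. The sketch below assumes only what this page shows: a generator receives a context supporting `.get()` and item access, and member objects expose a `label` attribute. `types.SimpleNamespace` objects stand in for woom's member and workflow objects, and the generator body reproduces the logic of the ensemble example earlier on this page.

```python
from types import SimpleNamespace


def generate_ensemble_outputs(context, base_dir, prefix):
    """Same logic as the ensemble generator example on this page."""
    if context.get("member"):
        members = [context["member"]]  # single-member context
    else:
        members = context["workflow"].members  # all expected members
    return [f"{base_dir}/{prefix}_{member.label}.nc" for member in members]


# Stand-ins for woom objects: only the attributes the generator reads are provided
m1 = SimpleNamespace(label="m01")
m2 = SimpleNamespace(label="m02")
fake_workflow = SimpleNamespace(members=[m1, m2])

print(generate_ensemble_outputs({"member": m1}, "/scratch/run", "member"))
print(generate_ensemble_outputs({"workflow": fake_workflow}, "/scratch/run", "member"))
```

If the stand-in calls return the expected paths, remaining problems are more likely in registration or in the real context than in the function itself.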
Wrong Files Checked#
Symptoms: Task succeeds but didn’t create expected files
Causes:
check = False on critical artifacts
Wrong artifact configured
Files created with different names
Solutions:
Set check = True for required outputs
Verify that artifact paths match the files the task actually produces
Use callable to list actual files created
Review task output logs
Performance Issues#
Symptoms: Artifact checking is slow
Causes:
Too many artifacts defined
Network file system latency
Large file lists from callables
Solutions:
Only track essential artifacts
Use check = False for non-critical files
Optimize callable generators
Consider aggregate artifacts (one check for directory)
Integration with Workflow#
Artifacts as Dependencies#
While woom doesn’t automatically create task dependencies based on artifacts, you can design your workflow to reflect these relationships:
# Stage 1: Create data
[[prolog]]
prepare = download_forcing
# Stage 2: Use data
[[cycles]]
simulate = run_model # Uses forcing from prepare
Checking Before Run#
import os

# Verify previous task's artifacts before running
artifacts = workflow.get_task_artifacts('previous_task', cycle='2020-01-01')
for name, paths in artifacts.items():
    for path in paths:
        if not os.path.exists(path):
            raise RuntimeError(f"Required input missing for artifact {name!r}: {path}")
# Now safe to run dependent task
workflow.run()
Cleanup Strategy#
# Show all artifacts with existence status
woom show artifacts > artifact_inventory.txt
# Use artifact info for cleanup decisions
# Keep final products, remove intermediate files
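A minimal dry-run sketch of that decision, assuming you have already collected a mapping of artifact names to paths (for example, the dictionary returned by `workflow.get_task_artifacts()` in the Python examples on this page). `KEEP` and `cleanup_candidates` are hypothetical; the function deletes nothing and only returns the existing files that belong to artifacts not marked as final products.

```python
import os

# Hypothetical policy: artifact names treated as final products to keep
KEEP = {"merged_file", "stats"}


def cleanup_candidates(artifacts, keep=KEEP):
    """Return existing files of artifacts whose names are not in `keep`.

    `artifacts` maps artifact names to a path or a list of paths. This is a
    dry run: review the returned list before removing anything.
    """
    candidates = []
    for name, paths in artifacts.items():
        if name in keep:
            continue
        if isinstance(paths, str):
            paths = [paths]
        candidates.extend(p for p in paths if os.path.exists(p))
    return candidates
```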
See Also#
Task Configuration In-Depth - Task configuration including artifacts
Tasks configuration specifications - Artifact configuration reference
woom show artifacts - Command line artifact viewing