.. _quick:

Start guide
###########

The concept
===========

Woom helps you perform tasks in isolated environments, in a given order, optionally cycling through dates and ensemble members, on your laptop or on an HPC with a scheduler.

Here are some definitions.

A **task** consists of:

     * a job script generated by the workflow
     * submission arguments, if any, for submission to a scheduler
     * a list of the jobs it depends on, which must complete successfully before the current job can start.

A **job script** is a bash file containing:

    * a line that uses :command:`trap` to catch termination signals
    * a block that declares the environment
    * a line to change the directory
    * a block of commands that do the main job
    * a block to check that the expected artifacts were created
    * an :command:`exit` command that returns any trapped signal or 0.

    .. seealso:: :ref:`templates`
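
As an illustration, the six parts listed above could be laid out as in the following hypothetical sketch. It is not actual woom output: the variable names, the paths and the commands are placeholders, and the body is written as a subshell function only so that :command:`exit` does not close an interactive shell.

.. code-block:: bash

    # Hypothetical layout of a job script (not actual woom output).
    run_dir=$(mktemp -d)    # placeholder for the task run directory

    job() (
        # 1. Trap termination signals: exit with 128 + signal number
        trap 'exit 143' TERM
        trap 'exit 130' INT

        # 2. Declare the environment (placeholder variable)
        export MY_SETTING="demo"

        # 3. Change to the run directory
        cd "$run_dir" || exit 1

        # 4. Main commands of the task
        echo "setting: $MY_SETTING" > result.txt

        # 5. Check that the expected artifacts were created
        [ -f result.txt ] || exit 1

        # 6. Exit with 0 since no signal was trapped
        exit 0
    )

    job
    echo "job exit status: $?"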

To set up your workflow:

#. Create a directory dedicated to your workflow.
#. Configure your tasks in the :file:`tasks.cfg` file, in particular their execution content and submission specifications.
#. Define the necessary environments, directories and scheduler specifications in the :file:`hosts.cfg` file.
#. Configure your workflow in the :file:`workflow.cfg` file, in particular the parameters for generating the job script, the cycle and ensemble specifications and the order in which tasks are submitted through the stages.
#. Add additional material such as the :file:`bin` and :file:`lib` directories, a :file:`ext` extension directory, or other useful files that you can access at runtime using the ``workflow_dir`` substitution parameter or the :envvar:`WOOM_WORKFLOW_DIR` environment variable.

A typical structure of the workflow directory is the following:

.. code-block:: bash

    workflow/
    ├── workflow.cfg  # mandatory
    ├── tasks.cfg     # mandatory
    ├── hosts.cfg     # mandatory
    ├── ext/          # optional, woom extensions
    │   ├── jinja_filters.py
    │   └── validator_functions.py
    ├── bin/          # optional, prepended to $PATH in the job script
    │   └── myscript.py
    └── lib/
        └── python/  # optional, prepended to $PYTHONPATH in the job script
            └── mylib.py

You can add other files to this directory and access them using the ``{{ workflow_dir }}`` template variable
in configuration files or the :envvar:`WOOM_WORKFLOW_DIR` environment variable.
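
Such a skeleton can be created by hand. The sketch below assumes a directory named ``workflow`` in the current directory, following the tree above. The configuration files it creates are empty and still have to be filled in.

.. code-block:: bash

    # Create the optional sub-directories
    mkdir -p workflow/ext workflow/bin workflow/lib/python

    # Create the three mandatory (still empty) configuration files
    touch workflow/workflow.cfg workflow/tasks.cfg workflow/hosts.cfg

    # List the result
    find workflow | sort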

Configurations
==============

Tasks with :file:`tasks.cfg`
----------------------------

This file helps you configure tasks:

* Their **content** with the environment name (declared in the :file:`hosts.cfg` file), the run directory, the shell command line(s) to be executed and the exit signals to trap.
* The files that are expected to be created by the task and that are named **artifacts**.
* Their **submission arguments** when using a scheduler, like the queue and the resources.

See the :mod:`configobj` :ref:`specifications <cfgspecs.tasks>` for this configuration.

In the following example, four tasks with arbitrary names are specified in the configuration file.
The command lines use jinja patterns such as ``{{ data_dir }}``, which are filled with entries from both the ``[params]`` section of the :file:`workflow.cfg` file and the default entries provided by the workflow (:ref:`inputs_context`).
Some of the tasks here use an environment named "prepost", that must be declared in the :file:`hosts.cfg` configuration file.

.. literalinclude:: samples/tasks.cfg
    :language: ini
    :caption: Example of :file:`tasks.cfg`

Hosts with :file:`hosts.cfg`
----------------------------

This file helps you configure hosts:

* The patterns used to guess the host from its name.
* The scheduler, where "background" means "submitted in background".
* A few commands.
* A list of environments with their name and specifications that describe environment modules and variables, or a conda environment to load.

See the :mod:`configobj` :ref:`specifications <cfgspecs.hosts>` for this configuration.

This example file declares the resources available on the datarmor host, in particular its scheduler, the scratch dir taken from the :envvar:`SCRATCH` environment variable and the name of the ``seq`` queue.
An environment called ``prepost`` is declared using environment modules and environment variables.

.. literalinclude:: samples/hosts.cfg
    :language: ini
    :caption: Example of :file:`hosts.cfg`

The **default** :file:`hosts.cfg` declares the ``local`` host, which matches any computer by default. When users provide their own hosts file, it is merged with the default file. To extend the configuration of the default host, use the ``local`` host name.

.. literalinclude:: ../woom/hosts.cfg
    :language: ini
    :caption: Default :file:`hosts.cfg`

Workflow with :file:`workflow.cfg`
----------------------------------

This file helps you configure the workflow:

* Your application specifications: name, configuration and experiment. It is optional but highly recommended.
* The way you want to cycle over dates. It is also optional.
* The specifications of your ensemble when you want to iterate over members.
* The additional configuration parameters that will be used to declare environment variables and format task command lines with jinja substitutions when generating the job scripts.
* The workflow graph through stages that defines in which order to execute the tasks as defined in :file:`tasks.cfg`.
* Groups of tasks that must be run sequentially in the workflow.

See the :mod:`configobj` :ref:`specifications <cfgspecs.workflow>` for this configuration.

In this example, we give our application a name, specify which dates to loop over and declare the ``box`` and ``data_dir`` parameters, which can be used in the :file:`tasks.cfg` file.
The ``clean_data_dir`` task is executed only once and before the looping over dates because it is called in the ``[prolog]`` stage.
Other tasks are run sequentially for each date interval, except ``fetch_data`` and ``cp_config``, which are run in parallel because they are called in the same sequence named ``fetch``.

.. literalinclude:: samples/workflow.cfg
    :language: ini
    :caption: Example of :file:`workflow.cfg`

Job script generation
=====================

The path to the job script is :file:`{{submission_dir}}/job.sh`.
The script content, which contains Jinja patterns, is first exported as a string by the :meth:`woom.tasks.Task.render_content` method and rendered with Jinja. See :ref:`start_jinja`.
The rendering is performed by :func:`woom.render.render` using a dictionary created by the :meth:`woom.workflow.Workflow.get_task_inputs` method.
See :ref:`inputs_context` for its default content.
This dictionary is specific to a given task, at a given cycle, and for a given ensemble member.

Trapped exit signals
--------------------

Trapping the signal allows the job to return an exit status other than zero in the event of an error.
The exit status is stored in :file:`{{submission_dir}}/job.status` and is interpreted by the workflow to know the status of the job.
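
The mechanism can be tried outside woom with plain bash. In this hypothetical sketch, ``STATUS_FILE`` stands in for the job status file and the job sends itself a ``SIGTERM`` to simulate being killed. The way woom actually writes the file may differ.

.. code-block:: bash

    # Stand-in for the job.status file (assumption for this demo)
    export STATUS_FILE="$(mktemp -d)/job.status"

    bash -c '
        # On SIGTERM, record the conventional status 128 + 15 and exit with it
        trap "echo 143 > \"\$STATUS_FILE\"; exit 143" TERM
        kill -TERM $$                # simulate a kill from the scheduler
        echo 0 > "$STATUS_FILE"      # only reached when no signal was trapped
    '
    job_exit=$?

    echo "exit status: $job_exit, job.status: $(cat "$STATUS_FILE")"

Both the exit status and the content of the status file are 143 here, so a caller can tell that the job was terminated rather than completed.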

Environment
-----------

The required environment is specified by name in the task configuration and is detailed in the host configuration.
It typically takes the form of environment module directives and environment variable declarations.

Run directory
-------------

It is specified in the task configuration and defaults to :file:`{{scratch_dir}}/{{task_path}}`.
You can use the ``scratch_dir`` and ``work_dir`` host configuration options, or any :ref:`other input parameter <inputs_context>`.

Command lines
-------------

The bash lines are the core of what the task does.
They are configured in the task configuration and rendered as bash lines thanks to the powerful Jinja templating system (see: :ref:`start_jinja`).

Exit
----

Any exit signal that occurs is stored in :file:`{{submission_dir}}/job.status`.
This signal is then issued by the :command:`exit` command.

Finally
-------

The standard output is saved in :file:`{{submission_dir}}/job.out` and
the standard error in :file:`{{submission_dir}}/job.err`.
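
In plain bash, this kind of separation is obtained with redirections. The sketch below only illustrates the idea, with a temporary directory standing in for the submission directory and a dummy ``job`` function. How woom or the scheduler performs the redirection is not shown here.

.. code-block:: bash

    submission_dir=$(mktemp -d)   # stand-in for the real submission directory

    # A dummy job that writes to both streams
    job() {
        echo "normal message"
        echo "error message" >&2
    }

    # Standard output goes to job.out, standard error to job.err
    job > "$submission_dir/job.out" 2> "$submission_dir/job.err"

    cat "$submission_dir/job.out" "$submission_dir/job.err"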


.. _start_jinja:

Jinja rendering
===============

Jinja is a package that allows advanced template rendering.
See its `website <https://jinja.palletsprojects.com/en/stable/>`_ for detailed explanations.
It is used to generate the job scripts using template files and parameters.

The default templates are detailed in the :ref:`templates` section.
Users can extend these templates by providing their own :file:`job.sh` and :file:`env.sh` template files in the :file:`templates/` directory of their workflow directory.

Jinja performs substitutions using a :class:`~woom.context.Context` instance, a dictionary containing the useful objects for a given task, a given cycle and a given member, as explained in the :ref:`inputs_context` section.

Template filling
================

Woom can automatically fill Jinja2 templates to generate configuration files, namelists, or scripts before task execution.

Configure this in the ``[[fill]]`` section of your task:

.. code-block:: ini

    [run_model]
        [[fill]]
            [[[namelist]]]
            template = ocean.nml.j2
            destination = {{ task_run_dir }}/ocean.nml

Templates use the same context variables as job scripts (``{{ cycle.begin_date }}``, ``{{ params.timestep }}``, etc.) and are stored in the :file:`templates/` directory.

.. seealso:: :ref:`indepth.templating` for a complete guide and :ref:`cli.woom.fill` for manual filling.

Artifacts
=========

Artifacts are files that are expected to be generated by a given task.
They have two usages:

* By default, the job fails if the artifact paths are not present at the end of the task job.
* One task can access the artifacts of any other task given the task name, the artifact name and the context (cycle, member).
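
The first usage amounts to an existence test at the end of the job. A minimal sketch follows, in which ``check_artifacts`` is a hypothetical helper and not a woom function, and the file names are placeholders.

.. code-block:: bash

    # Hypothetical helper: return non-zero if any expected path is missing
    check_artifacts() {
        missing=0
        for path in "$@"; do
            if [ ! -e "$path" ]; then
                echo "Missing artifact: $path" >&2
                missing=1
            fi
        done
        return "$missing"
    }

    run_dir=$(mktemp -d)            # placeholder for the task run directory
    touch "$run_dir/output.txt"     # pretend the task produced this file

    check_artifacts "$run_dir/output.txt" && echo "all artifacts present"
    check_artifacts "$run_dir/absent.txt" || echo "the job would fail"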

Artifacts are declared in the :file:`tasks.cfg` file as subsections of the ``[[artifacts]]`` section of a given task:

* The section name is the short name of the artifact.
* The ``paths`` option is a single path or a list of relative or absolute paths or function names, and it is always returned as a list. When it is a function name, the function must be registered in the ``artifacts_generators`` extension to generate the file paths.
* The ``check`` option tells whether all paths must be checked for existence at the end of the job.
* The ``callable`` option tells whether the paths must be interpreted as function names that generate paths.

.. code-block:: ini

    [download_clim]

        [[content]]
        ...

        [[artifacts]]
            [[[clim_file]]]
                paths={{ task_run_dir }}/clim.nc

.. warning:: Every artifact must ultimately resolve to an absolute path. You must therefore either declare the artifact with an absolute path, prefix it with a directory pattern such as ``{{ task_run_dir }}``, or provide a relative path and set the ``run_dir`` option of the task.

To refer to an artifact in a task, there are two cases:

* If the artifact belongs to the current task, use:

  .. code-block:: jinja

      ncdump -h {{ task.artifacts["clim_file"][0] }}

* If the artifact belongs to another task, use:

  .. code-block:: jinja

      ncdump -h {{ workflow.get_task_artifact_paths("clim_file", "download_clim")[0] }}

To list all artifacts, expected or generated, use the :ref:`woom_show_artifacts` command.

.. code-block:: bash

    $ woom show artifacts

Please have a look at :ref:`this example <examples.academic.artifacts>`.


Controlling and running the workflow
====================================

Run all woom commands from the workflow directory.
See the :ref:`examples` section for more illustrative examples.

.. tip:: All woom commands support the ``--help`` option

.. highlight:: bash

First, make sure that your workflow is correctly interpreted::

    $ woom show overview

Then, run your workflow in dry (fake) and debug mode::

    $ woom --logger-level debug run --dry-run

Then, run it in normal mode if everything is ok::

    $ woom run

To check the status of all jobs, especially on an HPC with a scheduler::

    $ woom show status

To kill jobs::

    $ woom kill      # all jobs
    $ woom kill 1264 # one job
    $ woom kill --task fetch_data # identified by task name

.. seealso:: :ref:`woom_main`, :ref:`woom_show`, :ref:`woom_run` and :ref:`woom_kill`