Running the Pipeline¶

Starting a pipeline run¶

The Pre-Processing Pipeline can be run from the command line using a CWL runner, e.g., cwltool or toil. Below we describe how the pipeline can be run with these runners.

$ cwltool --no-container $PREPROCESS_ROOT/workflows/pipeline.cwl input.json

$ toil-cwl-runner --no-container $PREPROCESS_ROOT/workflows/pipeline.cwl input.json

where $PREPROCESS_ROOT refers to the location where the CWL files have been installed; this environment variable must be set before running either command. The pipeline parameters are provided via a JSON file, described in the Configuring the pipeline section at the bottom of this page. Additionally, cwltool and toil come with a number of useful command-line arguments, some of which are listed below. Please refer to their respective documentation for a full overview.

Note

Do not forget to add the --no-container option when the dependencies have been installed locally on the system. Besides running the pipeline fully inside a container (as explained in the next section), this is the only supported way to execute the Pre-Processing Pipeline.
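As a minimal sketch of the local invocation described above, the following Python helper assembles the runner command and fails early when $PREPROCESS_ROOT is missing. The function name `build_run_command` and the path `/opt/preprocess` are hypothetical and only serve as illustration; the resulting argument list could be passed to `subprocess.run`.

```python
import os
import shlex


def build_run_command(runner="cwltool", input_file="input.json", env=None):
    """Return the argument list for a local (non-container) pipeline run.

    `runner` is expected to be "cwltool" or "toil-cwl-runner".
    """
    env = os.environ if env is None else env
    root = env.get("PREPROCESS_ROOT")
    if not root:
        # The workflow path cannot be resolved without this variable.
        raise RuntimeError("PREPROCESS_ROOT is not set")
    workflow = f"{root.rstrip('/')}/workflows/pipeline.cwl"
    return [runner, "--no-container", workflow, input_file]


if __name__ == "__main__":
    # Hypothetical installation path, used only for this demonstration.
    demo_env = {"PREPROCESS_ROOT": "/opt/preprocess"}
    print(shlex.join(build_run_command(env=demo_env)))
```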

Starting a run from within a container¶

If you followed the Docker installation instructions on the Downloading and Installation page, you can run the container using Docker as follows:

$ docker run --rm -v <source_directory>:<mount_point> -w <mount_point> preprocess cwltool --preserve-entire-environment --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json

Since the Pre-Processing Pipeline is running inside a container, do not forget to add the --no-container option to your CWL runner.

`cwltool` options¶

There are a number of command-line options you might want to consider adding when running cwltool:

--outdir: specifies the (relative) path to the directory containing the output of the pipeline (make sure to mount this directory when running the pipeline in a container)
--log-dir: specifies the location of the log files produced by the stdout and stderr of a CommandLineTool (make sure to mount this directory when running the pipeline in a container)
--preserve-entire-environment: pass your system’s environment variables on to the jobs (add this when the dependencies have been installed manually or when running the pipeline inside a container)
--no-container: do not execute jobs in a container (add this when the dependencies have been installed manually or when running fully inside a container)
--singularity: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
--debug: more verbose output, useful when debugging

Make sure to mount the output and log directories, specified by --outdir and --log-dir, when running the pipeline inside a container to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the cwltool documentation.
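To illustrate the advice on mounting the output and log directories, here is a sketch that builds a `docker run` command with one bind mount per directory. The container-side mount points (`/data`, `/output`, `/logs`) and the function name are assumptions made for this example; the image name `preprocess` and the workflow path are taken from the container command shown earlier on this page.

```python
import shlex

WORKFLOW = "/usr/local/share/prep/workflows/pipeline.cwl"


def build_container_command(data_dir, out_dir, log_dir, input_file="input.json"):
    """Return a docker argument list that mounts data, output and log directories."""
    # Host path -> container path; the container paths are hypothetical choices.
    mounts = [(data_dir, "/data"), (out_dir, "/output"), (log_dir, "/logs")]
    cmd = ["docker", "run", "--rm"]
    for host_path, container_path in mounts:
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd += [
        "-w", "/data",
        "preprocess",
        "cwltool",
        "--preserve-entire-environment",
        "--no-container",
        "--outdir", "/output",   # written to out_dir on the host
        "--log-dir", "/logs",    # written to log_dir on the host
        WORKFLOW,
        input_file,
    ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_container_command("/home/me/data", "/home/me/out", "/home/me/logs")))
```

Because the output and log paths inside the container are bind mounts, the files remain on the host after the container is removed by `--rm`.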

`toil` options¶

Similarly, these options might be of interest when using toil:

--outdir: specifies the path to the directory containing the output of the pipeline
--workDir: specifies the path to the directory where the temporary files generated by Toil should be placed
--log-dir: specifies the location of the log files produced by the stdout and stderr of a CommandLineTool
--logFile: path to the main log file
--jobStore: path to the Toil job-store (must not exist yet)
--batchSystem: use a specific batch system of an HPC cluster (e.g., slurm or single_machine)
--preserve-entire-environment: pass your system’s environment variables on to the jobs (add this when the dependencies have been installed manually)
--no-container: do not execute jobs in a container (add this when the dependencies have been installed manually or when running fully inside a container)
--singularity: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
--stats: with this option Toil collects runtime statistics (they can be used by toil stats)

Make sure to mount the output, log, and working directories when running the pipeline inside a container to ensure the files are not lost after execution.

A full overview of CLI arguments is available in the Toil documentation.
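The options above can be combined as in the following sketch, which prepares a run directory and assembles a `toil-cwl-runner` command. The directory layout (`jobstore`, `output`, `work`, `logs`), the log file name `toil.log`, and the function name are hypothetical; the sketch also assumes that the working directory should exist beforehand, whereas the job-store must not.

```python
import os
import tempfile


def build_toil_command(workflow, input_file, run_dir, batch_system="single_machine"):
    """Create output, work and log directories and return the toil argument list."""
    job_store = os.path.join(run_dir, "jobstore")
    if os.path.exists(job_store):
        # A fresh run needs a job-store path that does not exist yet.
        raise FileExistsError(f"job-store already exists: {job_store}")

    out_dir = os.path.join(run_dir, "output")
    work_dir = os.path.join(run_dir, "work")
    log_dir = os.path.join(run_dir, "logs")
    for directory in (out_dir, work_dir, log_dir):
        os.makedirs(directory, exist_ok=True)

    return [
        "toil-cwl-runner",
        "--no-container",
        "--jobStore", job_store,
        "--batchSystem", batch_system,
        "--outdir", out_dir,
        "--workDir", work_dir,
        "--log-dir", log_dir,
        "--logFile", os.path.join(log_dir, "toil.log"),
        workflow,
        input_file,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(" ".join(build_toil_command("pipeline.cwl", "input.json", tmp)))
```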

Configuring the pipeline¶

The parameters of the pipeline are provided as a JSON file. As an example, a minimal input could be a list of MeasurementSets (MSs) that you would like to process:

{
    "msin": [
        {
            "class": "Directory",
            "path": "/data/L888536_SAP000_SB026_uv.MS"
        },
        {
            "class": "Directory",
            "path": "/data/L888536_SAP000_SB027_uv.MS"
        }
    ]
}

Refer to the Overview of the Pipeline section for a full overview of all pipeline parameters and their default values.
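For longer lists of MeasurementSets, writing the JSON by hand becomes tedious. The sketch below generates the minimal input structure shown above from a list of paths; the helper name `make_pipeline_input` is hypothetical, and only the `msin` parameter from the example is covered.

```python
import json


def make_pipeline_input(ms_paths):
    """Return the minimal pipeline input: one CWL Directory entry per MeasurementSet."""
    return {
        "msin": [{"class": "Directory", "path": str(path)} for path in ms_paths]
    }


if __name__ == "__main__":
    paths = [
        "/data/L888536_SAP000_SB026_uv.MS",
        "/data/L888536_SAP000_SB027_uv.MS",
    ]
    # Redirect the output to a file (e.g. input.json) to use it with a CWL runner.
    print(json.dumps(make_pipeline_input(paths), indent=4))
```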

LOFAR Pre-Processing Pipeline
