Running the Pipeline
====================

Starting a pipeline run
-----------------------

The Pre-Processing Pipeline can be run from the command line using a CWL runner, e.g., ``cwltool`` or ``toil``. The commands below show how to start the pipeline with each of these runners:

.. code:: console

    $ cwltool --no-container $PREPROCESS_ROOT/workflows/pipeline.cwl input.json
    $ toil-cwl-runner --no-container $PREPROCESS_ROOT/workflows/pipeline.cwl input.json

where ``$PREPROCESS_ROOT`` refers to the location where the CWL files have been installed. This environment variable must be set before running either command. The pipeline parameters are provided via a JSON file, described at the bottom of this page.

Additionally, ``cwltool`` and ``toil`` come with a number of useful command-line arguments, some of which are listed below. Please refer to their respective documentation for a full overview.

.. note::

    Do not forget to add the ``--no-container`` option when the dependencies have been installed locally on your system. Apart from running the pipeline entirely inside a container (explained in the next section), this is the only supported way to execute the Pre-Processing Pipeline.

Starting a run from within a container
--------------------------------------

If you followed the Docker installation instructions on the :doc:`installation` page, you can run the container using Docker as follows:

.. code:: console

    $ docker run --rm -v : -w preprocess cwltool --preserve-entire-environment --no-container /usr/local/share/prep/workflows/pipeline.cwl input.json

Since the Pre-Processing Pipeline is already running inside a container, do not forget to add the ``--no-container`` option to your CWL runner.
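Outside a container, a run fails early if ``$PREPROCESS_ROOT`` is unset or does not point at the installed CWL files. A minimal pre-flight check is sketched below. It assumes the layout used in the commands above, i.e. ``workflows/pipeline.cwl`` under ``$PREPROCESS_ROOT`` (inside the container that root would correspond to ``/usr/local/share/prep``). The function name ``check_preprocess_root`` is hypothetical and not part of the pipeline:

.. code:: bash

    #!/usr/bin/env bash
    # Hypothetical pre-flight check before starting the pipeline.
    # Assumption: the workflow lives at $PREPROCESS_ROOT/workflows/pipeline.cwl,
    # as in the cwltool/toil-cwl-runner commands above.

    check_preprocess_root() {
        # Fail if the variable is unset or empty.
        if [ -z "${PREPROCESS_ROOT:-}" ]; then
            echo "error: PREPROCESS_ROOT is not set" >&2
            return 1
        fi
        # Fail if the main workflow file cannot be found under it.
        if [ ! -f "$PREPROCESS_ROOT/workflows/pipeline.cwl" ]; then
            echo "error: $PREPROCESS_ROOT/workflows/pipeline.cwl not found" >&2
            return 1
        fi
        return 0
    }

The check can be chained in front of the runner, e.g. ``check_preprocess_root && cwltool --no-container "$PREPROCESS_ROOT/workflows/pipeline.cwl" input.json``, so that a misconfigured environment is reported before any job starts.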
``cwltool`` options
-------------------

There are a number of command-line options you might want to consider adding when running ``cwltool``:

* ``--outdir``: specifies the path (absolute or relative) to the directory containing the output of the pipeline
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool``
* ``--preserve-entire-environment``: pass your system's environment variables through to the tools (needed when the dependencies have been installed manually, or when running the pipeline inside a container)
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually or when running fully inside a container)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--debug``: more verbose output, useful when debugging

Make sure to mount the output and log directories, specified by ``--outdir`` and ``--log-dir``, when running the pipeline inside a container to ensure the files are not lost after execution.

A full overview of CLI arguments is available in their `documentation `_.
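Putting several of the options above together, a local run could be wrapped in a small shell function. This is only a sketch: the function name ``run_local`` and the default names ``results``, ``logs`` and ``input.json`` are hypothetical, and it assumes the dependencies are installed locally and ``$PREPROCESS_ROOT`` is set as described earlier:

.. code:: bash

    #!/usr/bin/env bash
    # Sketch of a local (non-container) cwltool run using the options above.
    # run_local and its defaults (results/, logs/, input.json) are hypothetical.

    run_local() {
        local outdir=${1:-results}    # passed to --outdir
        local logdir=${2:-logs}       # passed to --log-dir
        local input=${3:-input.json}  # pipeline parameters (JSON)

        if [ -z "${PREPROCESS_ROOT:-}" ]; then
            echo "error: PREPROCESS_ROOT is not set" >&2
            return 1
        fi

        # Create the output and log directories before the run starts.
        mkdir -p "$outdir" "$logdir" || return 1

        # Dependencies are installed locally: no container is used, and the
        # full environment is passed through to the tools.
        cwltool \
            --no-container \
            --preserve-entire-environment \
            --outdir "$outdir" \
            --log-dir "$logdir" \
            "$PREPROCESS_ROOT/workflows/pipeline.cwl" \
            "$input"
    }

Calling ``run_local results logs input.json`` starts the run. When troubleshooting, ``--debug`` can be added to the ``cwltool`` call inside the function.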
``toil`` options
----------------

Similarly, these options might be of interest when using ``toil``:

* ``--outdir``: specifies the path to the directory containing the output of the pipeline
* ``--workDir``: specifies the path to the directory where the temporary files generated by Toil should be placed
* ``--log-dir``: specifies the location of the log files produced by the ``stdout`` and ``stderr`` of a ``CommandLineTool``
* ``--logFile``: path to the main log file
* ``--jobStore``: path to the Toil job store (must not exist yet)
* ``--batchSystem``: the batch system used to run the jobs, e.g., ``slurm`` on an HPC cluster or ``single_machine`` for a local run
* ``--preserve-entire-environment``: pass your system's environment variables through to the tools (needed when the dependencies have been installed manually)
* ``--no-container``: do not execute jobs in a container (add this when the dependencies have been installed manually or when running fully inside a container)
* ``--singularity``: use the Apptainer (previously Singularity) runtime for running containers instead of Docker
* ``--stats``: collect runtime statistics, which can later be inspected with ``toil stats``

Make sure to mount the output, log, and working directories when running the pipeline inside a container to ensure the files are not lost after execution.

A full overview of CLI arguments is available in their `documentation `_.

Configuring the pipeline
------------------------

The parameters of the pipeline are provided as a JSON file. As an example, a minimal input could be a list of MeasurementSets (MSs) that you would like to process:

.. code:: json

    {
      "msin": [
        {
          "class": "Directory",
          "path": "/data/L888536_SAP000_SB026_uv.MS"
        },
        {
          "class": "Directory",
          "path": "/data/L888536_SAP000_SB027_uv.MS"
        }
      ]
    }

Refer to the :doc:`overview` section for a full overview of all pipeline parameters and their default values.
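For more than a handful of MSs, the ``msin`` list can be generated rather than typed by hand. The sketch below assumes that all MSs are directories ending in ``.MS`` inside a single directory and that their paths contain no double quotes or backslashes (no JSON escaping is done). The function name ``write_msin_json`` is hypothetical:

.. code:: bash

    #!/usr/bin/env bash
    # Sketch: write a minimal pipeline input file listing every MeasurementSet
    # found in one directory. Assumptions (not part of the pipeline):
    #   - MSs are directories whose names end in ".MS";
    #   - paths contain no double quotes or backslashes (no JSON escaping).

    write_msin_json() {
        local msdir=${1:-}            # directory to scan for MSs
        local out=${2:-input.json}    # JSON file to write
        local first=1 ms

        if [ ! -d "$msdir" ]; then
            echo "error: '$msdir' is not a directory" >&2
            return 1
        fi
        # Use absolute paths, as in the example input above.
        msdir=$(cd "$msdir" && pwd) || return 1

        {
            printf '{\n  "msin": [\n'
            for ms in "$msdir"/*.MS; do
                [ -d "$ms" ] || continue   # also skips the unexpanded glob
                [ "$first" -eq 1 ] || printf ',\n'
                first=0
                printf '    { "class": "Directory", "path": "%s" }' "$ms"
            done
            printf '\n  ]\n}\n'
        } > "$out"
    }

For example, ``write_msin_json /data input.json`` would write a file with the same structure as the example above, containing one ``Directory`` entry per MS found in ``/data``.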