Overview of the Pipeline
========================
Processing workflow
-------------------
The pre-processing takes place in the ``preprocess`` step, which is a wrapper around `DP3 `_. This step covers four major categories: flagging, demixing, averaging, and compression. Flagging consists of multiple stages of flagging with DP3's PreFlagger, followed by flagging of radio-frequency interference (RFI) with `AOFlagger `_. Subsequently, A-team sources are demixed and the data is averaged in time and frequency. Finally, the visibility data is compressed using the `Dysco `_ compression algorithm.
The pipeline performs the following operations in order:
* flagging edge channels
* flagging correlation type (e.g., all auto-correlations)
* flagging of data at low elevations (typically below 15 degrees)
* flagging low-amplitude signals (below 1E-6)
* flagging of interfering radio signals
* optionally interpolating flagged values
* demixing of A-team sources
* averaging in time and frequency
* compression of visibility data
Output
------
The Pre-Processing Pipeline produces the following output in the directory specified by ``--outdir``:
* pre-processed LOFAR MeasurementSets
* a log file with the captured standard output and standard error (``pipeline.log``)
* diagnostic data products
Diagnostics
-----------
To aid quality assessment of the pre-processed MeasurementSets, the pipeline produces a number of QA data products.
Diagnostic plots are produced for:
* A-team source separation
* demixing solutions
* ratio of the noise before and after demixing
* data loss for each station
* RFI percentage over the observing bandwidth
* noise per station
Besides the plots, the pipeline also outputs the following raw data products as HDF5 files for further inspection:
* a single file containing the data loss, noise, and RFI percentage
* norm of the demixing solutions
* ratio of the noise before and after demixing
User-defined parameters
-----------------------
Most of these parameters set similarly named DP3 parameters. Hence, please refer to the relevant DP3 documentation pages for further details regarding their possible values; specifically, the following pages: `PreFlagger `_, `AOFlagger `_, `Demixer `_, and `Output `_.
**Mandatory parameters:**
* ``msin``: path to the input data (list of MeasurementSets).
* ``demix_timeres``: time resolution to which the data is averaged during demixing; this does not affect averaging of the output.
* ``demix_freqres``: frequency resolution to which the data is averaged during demixing; this does not affect averaging of the output.
* ``avg_timestep``: the number of time steps over which the output data is averaged.
* ``avg_freqstep``: the number of channels over which the output data is averaged.
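
As a rough illustration of what ``avg_timestep`` and ``avg_freqstep`` imply for the output, the sketch below computes the resolution after averaging. The function name ``averaged_resolution`` and the input values in the example are hypothetical and chosen only for illustration; they are not pipeline defaults.

```python
def averaged_resolution(time_res_s, chan_width_hz, avg_timestep, avg_freqstep):
    """Return (integration time in s, channel width in Hz) after averaging.

    Illustrative helper only; it is not part of the pipeline or of DP3.
    """
    if avg_timestep < 1 or avg_freqstep < 1:
        raise ValueError("averaging steps must be positive integers")
    # Averaging N samples together makes each output sample N times coarser.
    return time_res_s * avg_timestep, chan_width_hz * avg_freqstep


# Hypothetical input: 1 s integrations and 3 kHz channels, averaged by 4 in both axes.
out_time_res, out_chan_width = averaged_resolution(1.0, 3000.0, 4, 4)
print(f"{out_time_res} s, {out_chan_width} Hz")
```

Note that ``demix_timeres`` and ``demix_freqres`` are stated as resolutions, whereas the two parameters above are step counts.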
**Optional parameters:**
*Flagging options:*
* ``preflag_corrtype``: select a type of correlation to flag, e.g., the auto-correlations (default: ``auto``).
* ``preflag_elevation``: flag the selected elevations; the syntax is described in the `DP3 documentation `_ (default: ``0deg..15deg``).
* ``preflag_min_amplitude``: data below this amplitude will be flagged (default: ``1E-6``).
* ``aoflagger_rfistrategy``: the RFI flagging strategy used by AOFlagger (default: ``lofar-default.lua``).
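
To make the combined effect of the pre-flagging defaults concrete, here is a minimal sketch of the per-sample decision. It is not DP3's implementation: the function ``is_preflagged`` is hypothetical, and treating the elevation range as inclusive at both ends is an assumption.

```python
def is_preflagged(amplitude, elevation_deg, is_autocorrelation,
                  min_amplitude=1e-6, elevation_range=(0.0, 15.0),
                  flag_autocorrelations=True):
    """Return True if a sample would be flagged by any of the three criteria.

    Hypothetical helper mirroring the documented defaults; the inclusive
    elevation bounds are an assumption made for this sketch.
    """
    low, high = elevation_range
    return (
        (flag_autocorrelations and is_autocorrelation)  # preflag_corrtype = auto
        or low <= elevation_deg <= high                 # preflag_elevation
        or amplitude < min_amplitude                    # preflag_min_amplitude
    )


print(is_preflagged(amplitude=1.0, elevation_deg=45.0, is_autocorrelation=False))
print(is_preflagged(amplitude=1.0, elevation_deg=10.0, is_autocorrelation=False))
```

RFI flagging with AOFlagger is a separate stage and is not represented here.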
*Interpolation options:*
* ``use_interpolation``: enable interpolation, which replaces flagged values using a nearest-neighbour approach that takes a Gaussian-weighted sum of nearby data (default: ``false``).
* ``interpolation_windowsize``: size of the window over which the value is interpolated (default: ``15``). Note that this should be an odd number.
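
The idea of replacing a flagged value by a Gaussian-weighted sum of its neighbours can be sketched in one dimension as follows. This is a simplified illustration and not the DP3 interpolation code: the function name, the choice of kernel width (a quarter of the window), and the restriction to a single axis are all assumptions.

```python
import math


def interpolate_flagged(values, flags, window_size=15):
    """Replace flagged samples by a Gaussian-weighted mean of unflagged neighbours.

    Simplified 1-D sketch; kernel width and edge handling are assumptions.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size should be odd so the window is centred")
    half = window_size // 2
    sigma = max(half / 2.0, 1e-9)  # assumed kernel width, for illustration only
    result = list(values)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        weighted_sum = 0.0
        weight_total = 0.0
        # Walk over the window centred on sample i, clipped at the array edges.
        for j in range(max(0, i - half), min(len(values), i + half + 1)):
            if flags[j]:
                continue  # flagged neighbours contribute nothing
            weight = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            weighted_sum += weight * values[j]
            weight_total += weight
        if weight_total > 0.0:
            result[i] = weighted_sum / weight_total
        # Otherwise no unflagged neighbour exists and the value is left unchanged.
    return result


smoothed = interpolate_flagged([1.0, 99.0, 3.0], [False, True, False], window_size=3)
print(smoothed)
```

An odd window size keeps the same number of neighbours on either side of the flagged sample, which is why the sketch rejects even values.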
*Options for demixing:*
* ``demix_skymodel``: the skymodel used by the demixing algorithm (default: ``A-Team.skymodel``).
* ``demix_sources``: the list of sources to demix, e.g., ``[CasA_Gaussian, CygA_Gaussian]``. Note that these sources have to be present in the provided skymodel (default: ``[VirA_Gaussian, CygA_Gaussian, CasA_Gaussian, TauA_Gaussian]``).
* ``demix_baselines``: select the baselines used to demix; the baseline selection syntax is described in the `DP3 documentation `_ (default: ``[CR]S*&``).
* ``demix_ignoretarget``: if set to ``true``, the source model of the target will not be taken into account during demixing, i.e., the target will be ignored (default: ``false``).
* ``demix_lbfgs_robustdof``: the degrees of freedom of the LBFGS solver noise model (default: ``200``).
* ``demix_lbfgs_historysize``: the history size the LBFGS solver uses to approximate the inverse Hessian (default: ``10``).
* ``force_demix``: if set to ``true``, demix all sources specified by ``demix_sources``; if set to ``false``, do not demix at all; if set to ``null``, automatically determine which sources to demix (default: ``null``).
* ``use_dnn``: use the deep neural network model to determine the demixing parameters, if PyTorch is available (default: ``false``). Note: if PyTorch is not available or ``use_dnn`` is ``false``, the pipeline falls back to the fuzzy-logic-based approach; refer to `Yatawatta, et al. 2026 `_ for additional details.
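
The three states of ``force_demix`` can be summarised with the small sketch below. It only mirrors the behaviour described above: ``select_demix_sources`` is a hypothetical helper, and ``auto_select`` is a stand-in callable for the automatic (DNN or fuzzy-logic) selection, whose actual interface is not described here.

```python
def select_demix_sources(force_demix, demix_sources, auto_select):
    """Return the sources to demix for a given ``force_demix`` setting.

    ``auto_select`` is a hypothetical zero-argument callable standing in for
    the pipeline's automatic source selection.
    """
    if force_demix is True:
        return list(demix_sources)   # demix everything the user listed
    if force_demix is False:
        return []                    # demixing is skipped entirely
    if force_demix is None:          # ``null`` in the pipeline input
        return list(auto_select())
    raise ValueError("force_demix must be true, false, or null")


sources = ["CasA_Gaussian", "CygA_Gaussian"]
print(select_demix_sources(None, sources, auto_select=lambda: ["CygA_Gaussian"]))
```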
*Options for compression:*
* ``dysco_distribution``: compression distribution used by the Dysco compression algorithm (default: ``TruncatedGaussian``).
* ``dysco_databitrate``: the number of bits per float used to represent the visibility data (default: ``10``).
* ``dysco_weightbitrate``: the number of bits per float used to represent the weights (default: ``12``).
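
For a feel of what the bit rates mean, the sketch below estimates the nominal size reduction per stored float. It assumes that the uncompressed columns hold 32-bit floats and ignores any per-block metadata that the compressed format adds, so real compression factors will differ. The function name is hypothetical.

```python
UNCOMPRESSED_BITS_PER_FLOAT = 32  # assumption: single-precision floats before compression


def nominal_compression_ratio(bits_per_float):
    """Return the idealised size reduction for a given Dysco bit rate.

    Illustrative arithmetic only; metadata overhead is not accounted for.
    """
    if not 0 < bits_per_float <= UNCOMPRESSED_BITS_PER_FLOAT:
        raise ValueError("bit rate must be between 1 and 32")
    return UNCOMPRESSED_BITS_PER_FLOAT / bits_per_float


print(f"data:    {nominal_compression_ratio(10):.2f}x")   # dysco_databitrate default
print(f"weights: {nominal_compression_ratio(12):.2f}x")   # dysco_weightbitrate default
```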
*Miscellaneous parameters:*
* ``sasid``: identifier of the SAS process that called the pipeline; this unique identifier is prefixed with an 'L'. The SASID is used to name the output MSs, and replaces the Observation ID (default: reuse Observation ID).
* ``msin_autoweight``: set this parameter to ``true`` when the input consists of raw LOFAR data; this ensures that proper weights are set (default: ``true``).
* ``dp3_numthreads``: the number of threads per process used by DP3 (default: ``10``).
* ``dp3_log_filename``: the filename of the concatenated DP3 log files; the filename includes an extension (default: ``pipeline.log``).
* ``aoflagger_memorymax``: maximum amount of memory AOFlagger is allowed to use, in GB (default: ``0``, meaning no maximum is set).
* ``aoflagger_memoryperc``: percentage of the host machine's memory AOFlagger will use (default: ``0``, which means the setting is unused). Note: if both ``aoflagger_memorymax`` and this option are set, AOFlagger will use the requested percentage, clipped to the user-provided memory maximum.
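
The interaction between ``aoflagger_memorymax`` and ``aoflagger_memoryperc`` described above can be sketched as follows. The helper ``effective_aoflagger_memory_gb`` is hypothetical and only restates the documented rule; it is not taken from the pipeline or from AOFlagger.

```python
def effective_aoflagger_memory_gb(host_memory_gb, memorymax=0, memoryperc=0):
    """Return the memory limit in GB, or None when no limit is configured.

    A value of 0 means the corresponding option is unused, as documented.
    """
    limit = None
    if memoryperc > 0:
        limit = host_memory_gb * memoryperc / 100.0
    if memorymax > 0:
        # The absolute maximum clips the percentage-based request.
        limit = memorymax if limit is None else min(limit, memorymax)
    return limit


# On a hypothetical 64 GB host: 50 % would be 32 GB, but the 16 GB maximum wins.
print(effective_aoflagger_memory_gb(64, memorymax=16, memoryperc=50))
```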