Overview of the Pipeline
========================

Processing workflow
-------------------

Pre-processing takes place in the ``preprocess`` step, which is a wrapper around DP3. This step covers four major categories: flagging, demixing, averaging, and compression. Flagging comprises multiple stages of flagging with DP3's PreFlagger, followed by flagging of radio-frequency interference (RFI) with AOFlagger. Subsequently, A-team sources are demixed and the data are averaged in time and frequency. Finally, the visibility data are compressed using the Dysco compression algorithm.

The pipeline performs the following operations in order:

* flagging edge channels
* flagging by correlation type (e.g., all auto-correlations)
* flagging data at low elevations (typically below 15 degrees)
* flagging low-amplitude signals (below 1E-6)
* flagging interfering radio signals (RFI)
* optionally interpolating flagged values
* demixing A-team sources
* averaging in time and frequency
* compressing the visibility data

Output
------

The Pre-Processing Pipeline produces the following output in the directory specified by ``--outdir``:

* pre-processed LOFAR MeasurementSets
* a log file with the captured standard output and standard error (``pipeline.log``)
* diagnostic data products

Diagnostics
-----------

To aid quality assessment of the pre-processed MeasurementSets, the pipeline produces a number of QA data products.

Diagnostic plots of:

* A-team source separation
* demixing solutions
* ratio of the noise before and after demixing
* data loss for each station
* RFI percentage over the observing bandwidth
* noise per station

Besides the plots, the pipeline also outputs the following raw data products as HDF5 files for further inspection:

* a single file containing the data loss, noise, and RFI percentage
* norm of the demixing solutions
* ratio of the noise before and after demixing

User-defined parameters
-----------------------

Most of these parameters set similarly named DP3 parameters. Please refer to the relevant DP3 documentation pages for further details on their possible values, specifically the pages for PreFlagger, AOFlagger, Demixer, and Output.

**Mandatory parameters:**

* ``msin``: path to the input data (list of MeasurementSets).
* ``demix_timeres``: time resolution to average to during demixing; this does not affect averaging of the output.
* ``demix_freqres``: frequency resolution to average to during demixing; this does not affect averaging of the output.
* ``avg_timestep``: the number of time steps over which the output data are averaged.
* ``avg_freqstep``: the number of channels over which the output data are averaged.

**Optional parameters:**

*Flagging options:*

* ``preflag_corrtype``: the type of correlation to flag, e.g., the auto-correlations (default: ``auto``).
* ``preflag_elevation``: the elevation range to flag; the syntax is described in the DP3 documentation (default: ``0deg..15deg``).
* ``preflag_min_amplitude``: data below this amplitude will be flagged (default: ``1E-6``).
* ``aoflagger_rfistrategy``: the RFI flagging strategy used by AOFlagger (default: ``lofar-default.lua``).

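To illustrate how the mandatory parameters and the flagging options above fit together, the following Python sketch merges them into a single dictionary and prints it as JSON. It is a minimal sketch under stated assumptions: this page does not describe how parameters are passed to the pipeline, so the JSON layout, the helper ``build_inputs``, the MeasurementSet name, and the numeric values chosen for the mandatory parameters are all hypothetical. Only the parameter names and the flagging defaults come from the lists above.

```python
import json

# Mandatory parameter names, as listed above.
MANDATORY = ("msin", "demix_timeres", "demix_freqres", "avg_timestep", "avg_freqstep")

# Documented defaults of the flagging options.
FLAGGING_DEFAULTS = {
    "preflag_corrtype": "auto",
    "preflag_elevation": "0deg..15deg",
    "preflag_min_amplitude": 1e-6,
    "aoflagger_rfistrategy": "lofar-default.lua",
}


def build_inputs(mandatory, **overrides):
    """Hypothetical helper: merge mandatory values, flagging defaults and overrides."""
    missing = [name for name in MANDATORY if name not in mandatory]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    unknown = [name for name in overrides if name not in FLAGGING_DEFAULTS]
    if unknown:
        raise ValueError("unknown flagging options: " + ", ".join(unknown))
    # Later entries win, so overrides replace the documented defaults.
    return {**mandatory, **FLAGGING_DEFAULTS, **overrides}


inputs = build_inputs(
    {
        "msin": ["example_SB000_uv.MS"],  # hypothetical MeasurementSet name
        "demix_timeres": 10,              # placeholder value
        "demix_freqres": 64,              # placeholder value
        "avg_timestep": 4,                # placeholder value
        "avg_freqstep": 4,                # placeholder value
    },
    preflag_elevation="0deg..20deg",      # override one documented default
)
print(json.dumps(inputs, indent=2))
```

Rejecting unknown option names early is a design choice of this sketch; it catches misspelled parameter names before any processing starts.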
*Interpolation options:*

* ``use_interpolation``: enable interpolation, which replaces flagged values by interpolating them with a nearest-neighbour approach that takes a Gaussian-weighted sum of nearby data (default: ``false``).
* ``interpolation_windowsize``: size of the window over which the value is interpolated (default: ``15``). Note that this should be an odd number.

*Options for demixing:*

* ``demix_skymodel``: the skymodel used by the demixing algorithm (default: ``A-Team.skymodel``).
* ``demix_sources``: the list of sources to demix, e.g., ``[CasA_Gaussian, CygA_Gaussian]`` (default: ``[VirA_Gaussian, CygA_Gaussian, CasA_Gaussian, TauA_Gaussian]``). Note that these sources have to be present in the provided skymodel.
* ``demix_baselines``: the baselines used to demix; the baseline selection syntax is described in the DP3 documentation (default: ``[CR]S*&``).
* ``demix_ignoretarget``: if set to ``true``, the source model of the target will not be taken into account during demixing, i.e., the target will be ignored (default: ``false``).
* ``demix_lbfgs_robustdof``: the degrees of freedom of the LBFGS solver noise model (default: ``200``).
* ``demix_lbfgs_historysize``: the history size the LBFGS solver uses to approximate the inverse Hessian (default: ``10``).
* ``force_demix``: if set to ``true``, force demixing using all sources specified by ``demix_sources``; if set to ``false``, do not demix at all; if set to ``null``, automatically determine the sources to be demixed (default: ``null``).
* ``use_dnn``: use the deep neural network model to determine the demix parameters, if PyTorch is available (default: ``false``). Note: if PyTorch is not available or ``use_dnn`` is ``false``, the pipeline falls back to the fuzzy-logic-based approach; refer to Yatawatta, et al. 2026 for additional details.

*Options for compression:*

* ``dysco_distribution``: compression distribution used by the Dysco compression algorithm (default: ``TruncatedGaussian``).

* ``dysco_databitrate``: the number of bits per float used to represent the visibility data (default: ``10``).
* ``dysco_weightbitrate``: the number of bits per float used to represent the weights (default: ``12``).

*Miscellaneous parameters:*

* ``sasid``: identifier of the SAS process that called the pipeline; this unique identifier is prefixed with an 'L'. The SASID is used to name the output MSs and replaces the Observation ID (default: reuse Observation ID).
* ``msin_autoweight``: set this parameter to ``true`` when the input consists of raw LOFAR data; this ensures that proper weights are set (default: ``true``).
* ``dp3_numthreads``: the number of threads per process used by DP3 (default: ``10``).
* ``dp3_log_filename``: the filename of the concatenated DP3 log files; the filename includes an extension (default: ``pipeline.log``).
* ``aoflagger_memorymax``: maximum amount of memory, in GB, that AOFlagger is allowed to use (default: ``0``, meaning no maximum is set).
* ``aoflagger_memoryperc``: percentage of the host machine's memory that AOFlagger will use (default: ``0``, meaning the setting is unused). Note: if both ``aoflagger_memorymax`` and this option are set, AOFlagger uses the requested percentage, clipped to the user-provided memory maximum.

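The interplay between ``aoflagger_memorymax`` and ``aoflagger_memoryperc`` described above can be sketched in a few lines of Python. This is an illustration of the documented rule only, not AOFlagger's or the pipeline's actual code: the function ``effective_memory_limit_gb`` and its ``host_memory_gb`` argument are hypothetical, and the sketch assumes that a value of ``0`` for both options means no limit is imposed.

```python
def effective_memory_limit_gb(host_memory_gb, memorymax=0, memoryperc=0):
    """Return the resulting memory limit in GB, or None if no limit applies.

    Hypothetical helper mirroring the documented rule: the percentage is
    applied first and the result is clipped to the absolute maximum.
    """
    if memorymax < 0 or not 0 <= memoryperc <= 100:
        raise ValueError("memorymax must be >= 0 and memoryperc within 0..100")
    limit = None
    if memoryperc > 0:
        # Percentage of the host machine's memory.
        limit = host_memory_gb * memoryperc / 100
    if memorymax > 0:
        # Clip to the user-provided maximum (or use it on its own).
        limit = memorymax if limit is None else min(limit, memorymax)
    return limit


# Host with 256 GB of memory (assumed value for illustration).
print(effective_memory_limit_gb(256))                               # no limit set
print(effective_memory_limit_gb(256, memoryperc=50))                # percentage only
print(effective_memory_limit_gb(256, memorymax=64, memoryperc=50))  # clipped by maximum
```

With a 256 GB host, requesting 50 % yields 128 GB, and adding a 64 GB maximum clips that to 64 GB.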