Overview of the Pipeline
Processing workflow
Pre-processing takes place in the preprocess step, which is a wrapper around DP3. This step covers four major categories of operations: flagging, demixing, averaging, and compression. Flagging comprises multiple stages: several passes of DP3’s PreFlagger, followed by flagging of radio-frequency interference (RFI) with AOFlagger. Subsequently, A-team sources are demixed and the data are averaged in time and frequency. Finally, the visibility data are compressed using the Dysco compression algorithm.
The pipeline performs the following operations in order:
flagging edge channels
flagging correlation type (e.g., all auto-correlations)
flagging of data at low elevations (typically below 15 degrees)
flagging low-amplitude signals (below 1E-6)
flagging of interfering radio signals
optionally interpolating flagged values
demixing of A-team sources
averaging in time and frequency
compression of visibility data
Output
The Pre-Processing Pipeline produces the following output in the directory specified by --outdir:
pre-processed LOFAR MeasurementSets
a log file with the captured standard output and standard error (pipeline.log)
diagnostic data products
Diagnostics
To aid quality assessment of the pre-processed MeasurementSets, the pipeline produces a number of QA data products.
Diagnostic plots of the:
A-team source separation
demixing solutions
ratio of the noise before and after demixing
dataloss for each station
RFI percentage over the observing bandwidth
noise per station
Besides the plots, the pipeline also outputs the following raw data products as HDF5 files for further inspection:
a single file containing the dataloss, noise, and RFI percentage
norm of the demixing solutions
ratio of the noise before and after demixing
User-defined parameters
Most of these parameters set similarly named DP3 parameters. Hence, please refer to the relevant DP3 documentation pages for further details regarding their possible values; specifically, the following pages: PreFlagger, AOFlagger, Demixer, and Output.
Mandatory parameters:
msin: path to the input data (list of MeasurementSets).
demix_timeres: time resolution to average to during demixing; this does not affect averaging of the output.
demix_freqres: frequency resolution to average to during demixing; this does not affect averaging of the output.
avg_timestep: the number of time steps by which the output data are averaged.
avg_freqstep: the number of channels by which the output data are averaged.
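To make the effect of avg_timestep and avg_freqstep concrete, the sketch below computes the shape and time resolution of the output from a set of input values. The function name and the input numbers (3600 samples of 1 s, 64 channels) are hypothetical, and rounding up a trailing partial bin is an assumption rather than documented DP3 behaviour.

```python
import math


def averaged_shape(n_times, n_chans, time_res_s, avg_timestep, avg_freqstep):
    """Return (n_times_out, n_chans_out, time_res_out_s) after averaging.

    Assumption: a trailing partial bin still yields one output sample,
    hence the rounding up.
    """
    if avg_timestep < 1 or avg_freqstep < 1:
        raise ValueError("averaging steps must be positive integers")
    n_times_out = math.ceil(n_times / avg_timestep)
    n_chans_out = math.ceil(n_chans / avg_freqstep)
    return n_times_out, n_chans_out, time_res_s * avg_timestep


# Hypothetical observation: 3600 samples of 1 s, 64 channels.
print(averaged_shape(3600, 64, 1.0, avg_timestep=4, avg_freqstep=16))
# -> (900, 4, 4.0)
```

Note that demix_timeres and demix_freqres do not enter this calculation, since they only control the averaging applied during demixing.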
Optional parameters:
Flagging options:
preflag_corrtype: select a type of correlation to flag, e.g., the auto-correlations (default: auto).
preflag_elevation: flag the selected elevations; the syntax is described in the DP3 documentation (default: 0deg..15deg).
preflag_min_amplitude: data below this amplitude will be flagged (default: 1E-6).
aoflagger_rfistrategy: the RFI flagging strategy used by AOFlagger (default: lofar-default.lua).
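As an illustration of what preflag_min_amplitude means, the following sketch flags every visibility whose amplitude lies below the threshold. It is a plain-Python stand-in and not the PreFlagger implementation; the function name and the sample values are hypothetical.

```python
def flag_low_amplitude(visibilities, min_amplitude=1e-6):
    """Return one boolean per sample: True where |visibility| < min_amplitude."""
    return [abs(v) < min_amplitude for v in visibilities]


# Hypothetical complex visibilities; the second and third are near zero.
vis = [complex(1.2, -0.4), complex(0.0, 0.0), complex(3e-7, 2e-7), complex(5e-6, 0.0)]
print(flag_low_amplitude(vis))
# -> [False, True, True, False]
```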
Interpolation options:
use_interpolation: enable interpolation, which replaces flagged values using a nearest-neighbour approach that takes a Gaussian-weighted sum of nearby data (default: false).
interpolation_windowsize: size of the window over which the value is interpolated (default: 15). Note that this should be an odd number.
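The idea behind the interpolation can be sketched in one dimension as follows. This is a simplified illustration, not the DP3 implementation: the function name, the choice of kernel width (sigma), and the decision to leave a sample unchanged when its window holds no unflagged data are all assumptions.

```python
import math


def interpolate_flagged(values, flags, window_size=15, sigma=None):
    """Replace flagged samples by a Gaussian-weighted mean of unflagged neighbours."""
    if window_size % 2 == 0:
        raise ValueError("window_size should be an odd number")
    half = window_size // 2
    if sigma is None:
        sigma = max(half / 2.0, 1e-9)  # assumed kernel width, not taken from DP3
    out = list(values)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - half), min(len(values), i + half + 1)):
            if flags[j]:
                continue  # only unflagged neighbours contribute
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            weight_sum += w
            value_sum += w * values[j]
        if weight_sum > 0.0:
            out[i] = value_sum / weight_sum  # otherwise the sample is left as-is
    return out


# The flagged outlier (99.0) is replaced by a value close to 3.0 by symmetry.
print(interpolate_flagged([1.0, 2.0, 99.0, 4.0, 5.0],
                          [False, False, True, False, False],
                          window_size=5))
```

An odd window size keeps the window centred on the sample being replaced, which is why even values are rejected in this sketch.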
Options for demixing:
demix_skymodel: the skymodel used by the demixing algorithm (default: A-Team.skymodel).
demix_sources: the list of sources to demix, e.g., [CasA_Gaussian, CygA_Gaussian]. Note that these sources have to be present in the provided skymodel (default: [VirA_Gaussian, CygA_Gaussian, CasA_Gaussian, TauA_Gaussian]).
demix_baselines: select the baselines used to demix; the baseline selection syntax is described in the DP3 documentation (default: [CR]S*&).
demix_ignoretarget: if set to true, the source model of the target will not be taken into account during demixing, i.e., the target will be ignored (default: false).
demix_lbfgs_robustdof: the degrees of freedom of the LBFGS solver noise model (default: 200).
demix_lbfgs_historysize: the history size the LBFGS solver uses to approximate the inverse Hessian (default: 10).
force_demix: if set to true, force demixing using all sources specified by demix_sources; if set to false, do not demix at all; if set to null, automatically determine the sources to be demixed (default: null).
use_dnn: use the deep neural network model to determine demix parameters, if PyTorch is available (default: false). Note: if PyTorch is not available or use_dnn is false, the pipeline falls back to the fuzzy-logic-based approach; refer to Yatawatta et al. 2026 for additional details.
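The three-valued force_demix option and the use_dnn fallback can be summarised in a short sketch. Both function names are hypothetical, auto_selected stands in for whatever the automatic selection returns, and null is represented by Python's None; this only mirrors the documented behaviour and is not the pipeline's own code.

```python
import importlib.util


def sources_to_demix(force_demix, demix_sources, auto_selected):
    """True -> all demix_sources; False -> none; None (null) -> automatic selection."""
    if force_demix is True:
        return list(demix_sources)
    if force_demix is False:
        return []
    if force_demix is None:
        return list(auto_selected)  # hypothetical result of the automatic selection
    raise ValueError("force_demix must be true, false, or null")


def selection_method(use_dnn):
    """Return 'dnn' only if requested and PyTorch is importable, else 'fuzzy-logic'."""
    torch_available = importlib.util.find_spec("torch") is not None
    return "dnn" if use_dnn and torch_available else "fuzzy-logic"


configured = ["VirA_Gaussian", "CygA_Gaussian", "CasA_Gaussian", "TauA_Gaussian"]
print(sources_to_demix(True, configured, ["CasA_Gaussian"]))
print(sources_to_demix(None, configured, ["CasA_Gaussian"]))
print(selection_method(use_dnn=False))
```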
Options for compression:
dysco_distribution: compression distribution used by the Dysco compression algorithm (default: TruncatedGaussian).
dysco_databitrate: the number of bits per float used to represent the visibility data (default: 10).
dysco_weightbitrate: the number of bits per float used to represent the weights (default: 12).
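As a rough guide to what the bit rate implies, the sketch below estimates the size reduction of the visibility data. It assumes that uncompressed values are stored as 32-bit floats and ignores any metadata that Dysco stores alongside the compressed values, so the result is an upper bound on the achievable ratio rather than a measured figure. The function name is hypothetical.

```python
def dysco_size_ratio(bitrate, uncompressed_bits=32):
    """Estimate uncompressed/compressed size, ignoring compression metadata."""
    if not 0 < bitrate <= uncompressed_bits:
        raise ValueError("bitrate must be between 1 and uncompressed_bits")
    return uncompressed_bits / bitrate


# Default dysco_databitrate of 10 bits per float.
print(round(dysco_size_ratio(10), 2))
# -> 3.2
```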
Miscellaneous parameters:
sasid: identifier of the SAS process that called the pipeline; this unique identifier is prefixed with an ‘L’. The SASID is used to name the output MSs, and replaces the Observation ID (default: reuse Observation ID).
msin_autoweight: set this parameter to true when the input consists of raw LOFAR data; this ensures that proper weights are set (default: true).
dp3_numthreads: the number of threads per process used by DP3 (default: 10).
dp3_log_filename: the filename of the concatenated DP3 log files; the filename includes an extension (default: pipeline.log).
aoflagger_memorymax: maximum amount of memory, in GB, that AOFlagger is allowed to use (default: 0, meaning no maximum is set).
aoflagger_memoryperc: percentage of the host machine’s memory AOFlagger will use (default: 0, which means the setting is unused). Note: if both aoflagger_memorymax and this option are set, AOFlagger uses the requested percentage, clipped to the user-provided memory maximum.
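The interaction between aoflagger_memorymax and aoflagger_memoryperc can be sketched as below. The function name and the host memory figure are hypothetical; the logic only restates the rule described above (a value of 0 disables a setting, and the percentage is clipped by the maximum when both are given) and is not taken from the AOFlagger source.

```python
def aoflagger_memory_limit_gb(host_memory_gb, memorymax=0, memoryperc=0):
    """Return the effective memory limit in GB, or None when no limit is set."""
    limit = None
    if memoryperc > 0:
        limit = host_memory_gb * memoryperc / 100.0
    if memorymax > 0:
        # The user-provided maximum clips the percentage-based limit.
        limit = memorymax if limit is None else min(limit, memorymax)
    return limit


# Hypothetical host with 256 GB of memory.
print(aoflagger_memory_limit_gb(256))                               # -> None
print(aoflagger_memory_limit_gb(256, memoryperc=50))                # -> 128.0
print(aoflagger_memory_limit_gb(256, memorymax=64, memoryperc=50))  # -> 64
```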