Topic: Data Science
Podium Presentation in Room 5 on Thursday at 11:00 (Chair: Irene van den Broek)
Authors: Bo Burla (1), Jeremy John Selva (2), Shanshan Ji (1), Gao Liang (2), Peter Benke (2), Anne K. Bendt (1), Federico Torta (2), Markus R. Wenk (1,2)
Data processing and quality control are integral elements of quantitative omics workflows. They can have a major impact on the results and can themselves be sources of variability, bias and artefacts. Furthermore, the data processing methods used in lipidomics/metabolomics analyses are often insufficiently documented, which limits the transparency and reusability of published datasets.
The pipeline and software presented here aim to provide a structured, supervised and reproducible data processing and quality control workflow for raw quantitative lipidomics datasets.
The data processing and analytical quality control pipeline was developed based on published procedures (e.g. Broadhurst et al., Metabolomics, 2018) and on experience gained from processing lipidomics analyses in our lab. The core of the LIDAR pipeline is implemented as an R package with defined data structures and classes. The LIDAR user interface is implemented in R/Shiny with interactive plots and R Markdown-based reporting. The tool is deployed internally using Docker containers.
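To illustrate what "defined data structures and classes" can mean for such a pipeline, here is a minimal Python sketch of a dataset container that validates its own shape and keeps a log of applied processing steps. LIDAR itself is written in R and its actual classes are not described in this abstract; the class name `LipidDataset`, its fields, the sample-type labels (`"SPL"`, `"QC"`) and the feature names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LipidDataset:
    """Hypothetical container for a quantitative lipidomics dataset (not LIDAR's R class)."""

    sample_ids: list[str]
    sample_types: list[str]           # hypothetical labels, e.g. "SPL" (study sample), "QC"
    feature_ids: list[str]            # lipid species and internal standards (ISTDs)
    intensities: list[list[float]]    # peak areas, indexed [sample][feature]
    feature_istd: dict[str, str] = field(default_factory=dict)  # analyte -> assigned ISTD
    history: list[str] = field(default_factory=list)            # processing steps applied

    def __post_init__(self) -> None:
        # Reject inconsistent inputs early instead of failing mid-pipeline.
        if len(self.sample_types) != len(self.sample_ids):
            raise ValueError("need one sample type per sample")
        if len(self.intensities) != len(self.sample_ids):
            raise ValueError("need one intensity row per sample")
        if any(len(row) != len(self.feature_ids) for row in self.intensities):
            raise ValueError("need one intensity column per feature")
        # Every analyte and ISTD named in the ISTD map must be a known feature.
        unknown = set(self.feature_istd) | set(self.feature_istd.values())
        unknown -= set(self.feature_ids)
        if unknown:
            raise ValueError(f"unknown features in ISTD map: {sorted(unknown)}")

    def samples_of_type(self, sample_type: str) -> list[str]:
        """Return the IDs of all samples with the given type, in stored order."""
        return [s for s, t in zip(self.sample_ids, self.sample_types) if t == sample_type]

    def log_step(self, description: str) -> None:
        """Record a processing step so the workflow can be documented and re-run."""
        self.history.append(description)


# Hypothetical example: two study samples, one QC, one analyte and its ISTD.
ds = LipidDataset(
    sample_ids=["S1", "QC1", "S2"],
    sample_types=["SPL", "QC", "SPL"],
    feature_ids=["PC 34:1", "ISTD_PC"],
    intensities=[[1.0e5, 5.0e4], [1.1e5, 5.2e4], [0.9e5, 4.9e4]],
    feature_istd={"PC 34:1": "ISTD_PC"},
)
```

Validating in the constructor means every downstream step can rely on consistent dimensions, and the `history` list is one simple way to make the sequence of processing steps part of the dataset itself.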
The LIDAR toolbox comprises modules for system suitability monitoring, validation of integrated peaks based on retention times and ion ratios, internal standard (ISTD)-based normalization and quantification, correction for isotopic interferences, testing for matrix effects, drift and batch correction, and standardization using reference materials. Functions to filter datasets according to defined QC criteria (e.g. RSD, S/N) are implemented as well. LIDAR reports and visualizes the effect of each data processing step by comparing the data before and after it. This helps to identify analytical issues as well as artefacts introduced by data processing, which is valuable during method development and validation, but also for QA/QC of established assays. Using this tool, we show examples of artefacts that can result from data processing, e.g. that ISTD-based normalization can bias the results or inflate their variability. Furthermore, LIDAR provides a few lipidomics-specific data exploration tools, e.g. for examining fatty acid compositions and lipid pathways.
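The core of ISTD-based normalization, quantification and RSD-based QC filtering can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not LIDAR's R implementation: it assumes single-point quantification with equal analyte and ISTD response factors, and the function names, toy peak areas and RSD thresholds are made up for illustration. On toy numbers, it also reproduces the kind of artefact described above: when the ISTD signal fluctuates more than the analyte signal, dividing by it adds noise instead of removing it.

```python
import statistics


def istd_normalize(analyte_areas, istd_areas):
    """Divide each analyte peak area by its ISTD's peak area in the same injection."""
    return [a / i for a, i in zip(analyte_areas, istd_areas)]


def quantify(area_ratios, istd_amount):
    """Single-point quantification: amount = (analyte/ISTD area ratio) * spiked ISTD amount.

    Assumes the analyte and its ISTD have the same response factor.
    """
    return [r * istd_amount for r in area_ratios]


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD / mean * 100)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def passes_rsd_filter(qc_values, max_rsd=20.0):
    """Keep a feature only if its RSD across QC replicates is at or below max_rsd (%)."""
    return rsd_percent(qc_values) <= max_rsd


# Illustrative (made-up) peak areas from four replicate QC injections:
# the analyte signal is stable, but its assigned ISTD varies much more.
analyte_qc = [1000.0, 1010.0, 990.0, 1000.0]
istd_qc = [500.0, 600.0, 400.0, 500.0]

normalized_qc = istd_normalize(analyte_qc, istd_qc)
amounts_qc = quantify(normalized_qc, istd_amount=10.0)  # hypothetical ISTD amount

print(f"raw RSD: {rsd_percent(analyte_qc):.1f}%")                  # raw RSD: 0.8%
print(f"ISTD-normalized RSD: {rsd_percent(normalized_qc):.1f}%")   # ISTD-normalized RSD: 16.0%
print(passes_rsd_filter(analyte_qc, 15.0), passes_rsd_filter(normalized_qc, 15.0))  # True False
```

Comparing QC RSDs before and after normalization, as in the last lines, is one simple way to flag such cases. The 20% default is only a placeholder; acceptable QC RSD limits depend on the assay.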
The presented workflow, accompanied by a software toolbox, enables lab analysts to perform automated, supervised and reproducible processing of large-scale, complex LC-MS- and shotgun MS-based lipidomics datasets. Because the toolbox is implemented in R with defined data structures and interfaces, it should make it easier for the community to add new or improved functionality. The workflow may also be useful for targeted metabolomics assays. We hope that this workflow and toolbox will contribute to the ongoing efforts towards harmonization and more reproducible research in the lipidomics field.