MSACL 2017 US Abstract

A Computational Pipeline for Accurate and Reproducible Analysis of Peptides in Data Independent Acquisition MS Data: Application to Human Clinical Samples

Jarrett Egertson (Presenter)
University of Washington

Bio: Jarrett Egertson is a staff scientist in the Michael MacCoss lab at the University of Washington. He has 10 years of experience in the field of mass spectrometry-based proteomics, with specific expertise in the design of data independent acquisition (DIA)-based LC-MS/MS assays. His thesis work focused on the development of MSX -- a highly selective, multiplexed DIA approach implemented on quadrupole-Orbitrap mass spectrometers. He currently heads the services division of the MacCoss lab, which offers targeted assay design, evaluation, and execution to clients in industry and academia.

Authorship: Jarrett D. Egertson, Brian C. Searle, Gennifer E. Merrihew, Jason M. Gilmore, Ying S. Ting, Brendan X. MacLean, Lindsay K. Pino, Han-Yin Yang, Thomas J. Montine, Michael J. MacCoss
University of Washington, Seattle

Short Abstract

We present a pipeline for automated analysis of data acquired using data independent acquisition (DIA). Beyond removing the potential for human error, automated peak integration of DIA data is necessary due to the large number of analytes measured in a single DIA LC-MS/MS run. We present details of the data processing workflow in the context of an application to an Alzheimer’s disease cohort consisting of samples from human brain autopsies. We assess the impact of the pipeline on precision by comparing human and automated analysis of a “5x5” data set with samples assayed repeatedly both between and within days. We also demonstrate quality control steps in the form of summary statistics and visualizations for each step of the pipeline. Finally, we discuss containerization and versioning to support repeatable data processing.

Long Abstract

-- Introduction --

Similar to selected reaction monitoring (SRM) approaches, data independent acquisition (DIA) approaches quantify analytes based on the signal of a set of selective fragment ions measured by MS/MS and integrated over time. Unlike SRM, DIA approaches comprehensively measure MS/MS data for every precursor in a wide m/z range (e.g., 500–900 m/z) by acquiring a repeated cycle of wide isolation window MS/MS scans using a full scan mass analyzer. With the DIA approach, MS/MS data for quantitation can be extracted for any query peptide within a wide precursor m/z range, thus enabling highly multiplexed assays at the expense of selectivity compared to SRM. Additionally, DIA can be used as a tool for early development of SRM assays by assessing peptide stability, digestion kinetics, and linearity in a highly multiplexed manner (Egertson, MSACL 2016). While the acquisition of DIA data is relatively straightforward, analyzing the data in an accurate, reproducible, and repeatable manner remains a challenge.
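To make the extraction step concrete, the sketch below (a minimal Python illustration, not the pipeline's actual code; the Scan structure and the ppm tolerance are assumptions) pulls a fragment-ion chromatogram out of a cycle of wide-window MS/MS scans by summing signal near the fragment m/z in every scan whose isolation window covers the query precursor:

    # Minimal sketch of fragment-ion chromatogram extraction from DIA
    # scans. The Scan layout and the ppm tolerance are illustrative
    # assumptions, not the pipeline's actual data model.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Scan:
        rt: float                         # retention time (min)
        iso_lo: float                     # isolation window low bound (m/z)
        iso_hi: float                     # isolation window high bound (m/z)
        peaks: List[Tuple[float, float]]  # (fragment m/z, intensity) pairs

    def extract_chromatogram(scans, precursor_mz, fragment_mz, tol_ppm=20.0):
        """Sum intensity near fragment_mz in every scan whose isolation
        window covers precursor_mz; returns (rt, intensity) points."""
        tol = fragment_mz * tol_ppm / 1e6
        trace = []
        for scan in scans:
            if scan.iso_lo <= precursor_mz <= scan.iso_hi:
                signal = sum(i for mz, i in scan.peaks
                             if abs(mz - fragment_mz) <= tol)
                trace.append((scan.rt, signal))
        return trace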

-- Automated Processing of DIA Data --

Automated processing of DIA data is a necessity. As a demonstration, we present application of a DIA workflow analyzing human brain tissue in an Alzheimer’s disease cohort. Each 90-minute LC-MS/MS run results in measurements for ~50,000 peptides. While manual peak integration and validation are an option for many targeted-MS clinical assays, the sheer quantity of data in a DIA run makes this intractable. Furthermore, automated processing is expected to be more reproducible and repeatable. We present our workflow for automated analysis of DIA data consisting of the following steps:

Library Generation using PECAN:

DIA data are acquired on a pooled sample of representative tissue (multiple brain samples) using twelve injections which collectively measure peptide precursors between 400 and 1200 m/z (Egertson JD, MSACL 2016). To obtain maximal sensitivity in this step, each injection measures a 50 m/z precursor range with a repeated cycle of one 50 m/z wide MS scan and 25 adjacent 2 m/z wide MS/MS scans. An in-house search tool (PECAN; Ting YS, manuscript in preparation) is used to detect peptides in these analyses using a peptide-centric approach (1). The extracted MS/MS chromatograms and normalized retention time for each detected peptide are stored in a chromatogram library.
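As a rough illustration of this acquisition geometry (parameter values and the dictionary layout are placeholders, not the instrument method itself), the sketch below enumerates each injection's precursor range together with its adjacent 2 m/z MS/MS windows:

    # Illustrative arithmetic for the narrow-window library acquisition:
    # each injection covers a contiguous precursor range cycled as
    # adjacent 2 m/z MS/MS windows. Parameter values are placeholders.
    def library_injection_scheme(lo=400.0, hi=1200.0,
                                 per_injection=50.0, window=2.0):
        n_windows = int(per_injection / window)   # 25 windows per cycle
        injections = []
        start = lo
        while start < hi:
            ms2 = [(start + i * window, start + (i + 1) * window)
                   for i in range(n_windows)]
            injections.append({"ms1_range": (start, start + per_injection),
                               "ms2_windows": ms2})
            start += per_injection
        return injections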

Peptide Detection using EncyclopeDIA:

DIA data on individual brain samples are acquired using a single DIA LC-MS/MS run per sample with wide isolation window (20 m/z) MS/MS scans. The data are queried for all peptides in the library using EncyclopeDIA (Searle BC, manuscript in preparation). A false discovery rate is calculated using target-decoy statistics, and all peptides detected at q < 0.01 are reported. For each detected peptide, transitions with interference are removed, and integration boundaries are determined using a non-parametric approach.
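For readers unfamiliar with target-decoy statistics, the sketch below shows one common simple form of the q-value calculation (EncyclopeDIA's actual scoring and FDR machinery may differ):

    # Illustrative target-decoy q-value estimation in one common simple
    # form; EncyclopeDIA's actual statistics may differ.
    def qvalues(scored):
        """scored: list of (score, is_decoy) pairs, higher score = better.
        Returns [score, is_decoy, q] rows sorted best-first."""
        ranked = sorted(scored, key=lambda x: x[0], reverse=True)
        rows, decoys, targets = [], 0, 0
        for score, is_decoy in ranked:
            decoys += is_decoy
            targets += not is_decoy
            rows.append([score, is_decoy, decoys / max(targets, 1)])
        # q-value: lowest FDR achievable at or below each score threshold
        qmin = float("inf")
        for row in reversed(rows):
            qmin = min(qmin, row[2])
            row[2] = qmin
        return rows

    # report targets passing the q < 0.01 threshold:
    # detected = [r for r in qvalues(scored) if not r[1] and r[2] < 0.01]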

Outlier Removal:

Even with a tight false discovery rate enforced for peptide detection, a peptide sequence will occasionally be assigned to the wrong chromatographic peak. Such outliers are detected by comparing peak assignments for each peptide across all runs.
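One simple way to implement this cross-run comparison, offered here as an assumption rather than a description of the actual method, is to flag runs whose apex retention time for a peptide deviates strongly from the consensus:

    # Sketch of cross-run outlier flagging: discard a peak assignment
    # whose apex retention time deviates strongly from the peptide's
    # consensus across runs. The MAD cutoff is an illustrative choice.
    from statistics import median

    def flag_rt_outliers(apex_rts, n_mads=4.0):
        """apex_rts: {run_id: apex RT} for one peptide.
        Returns run_ids whose assignment looks inconsistent."""
        rts = list(apex_rts.values())
        center = median(rts)
        mad = median(abs(rt - center) for rt in rts) or 1e-9
        return {run for run, rt in apex_rts.items()
                if abs(rt - center) / mad > n_mads}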

Integration Boundary Imputation:

If a peptide is not detected in a run (but is detected in others), the quantity for that peptide can be “imputed” by non-parametric retention time alignment of peak boundaries from another run where the peptide was detected.
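A minimal sketch of such imputation follows, using piecewise-linear interpolation over anchor peptides detected in both runs as a stand-in for the non-parametric alignment actually used:

    # Sketch of boundary imputation via a piecewise-linear retention
    # time map built from anchor peptides detected in both runs.
    from bisect import bisect_left

    def make_rt_map(anchors):
        """anchors: (rt_in_reference, rt_in_target) pairs, sorted and
        strictly increasing in the reference coordinate."""
        xs = [a[0] for a in anchors]
        ys = [a[1] for a in anchors]
        def warp(rt):
            i = bisect_left(xs, rt)
            if i == 0:
                return ys[0]
            if i == len(xs):
                return ys[-1]
            f = (rt - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + f * (ys[i] - ys[i - 1])
        return warp

    # impute target-run boundaries from a run where the peptide was seen:
    # warp = make_rt_map(shared_anchor_rts)
    # start_target, end_target = warp(start_ref), warp(end_ref)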

Transition Selection:

Because transitions with interference are removed from each individual file, the set of transitions used for quantification across all runs is determined by a voting algorithm that selects the top N transitions based on the number of runs in which each transition occurred without interference and on the transition's intensity rank.
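A hypothetical rendering of such a vote (the exact scoring is an assumption): count interference-free occurrences across runs and break ties by median intensity rank:

    # Hypothetical form of the transition vote: count interference-free
    # occurrences across runs, break ties by median intensity rank.
    from statistics import median

    def select_transitions(per_run, n=5):
        """per_run: {run: {transition: (clean, intensity_rank)}}, where
        clean is True when no interference was flagged in that run."""
        votes, ranks = {}, {}
        for run_data in per_run.values():
            for t, (clean, rank) in run_data.items():
                votes[t] = votes.get(t, 0) + clean
                ranks.setdefault(t, []).append(rank)
        order = sorted(votes, key=lambda t: (-votes[t], median(ranks[t])))
        return order[:n]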

We assess the impact of each step of the pipeline on the precision of peptide quantification in a “5x5” experiment (2) measuring inter- and intra-assay reproducibility, with a comparison to data analyzed manually by human operators.
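For reference, a simple way to summarize precision in a 5x5 design is sketched below: intra-assay CV averaged over within-day replicates and inter-assay CV over the day means (a simplification of the variance-components analysis often used for such designs):

    # Simple precision summary for a 5x5 design (5 days x 5 replicates):
    # intra-assay CV averaged over within-day replicates, inter-assay CV
    # over the day means. A nested-ANOVA estimate is often preferred.
    from statistics import mean, stdev

    def cv(values):
        return stdev(values) / mean(values) * 100.0

    def five_by_five_cvs(areas):
        """areas: {day: [peak area for each within-day replicate]}."""
        intra = mean(cv(reps) for reps in areas.values())
        inter = cv([mean(reps) for reps in areas.values()])
        return intra, inter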

-- Building a Robust Pipeline --

Quality Control / Visualization:

Each step of the data processing pipeline produces a “quality control” summary statistic and visualization for quick detection of errors in data processing.
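As an example of the kind of check intended here (both the statistic and the threshold are illustrative assumptions), a run whose detection count falls well below the batch median can be flagged for review:

    # Illustrative per-step QC check: flag runs whose detected-peptide
    # count falls far below the batch median (threshold is an assumption).
    from statistics import median

    def flag_low_count_runs(counts, frac=0.5):
        """counts: {run: peptides detected at q < 0.01}."""
        m = median(counts.values())
        return {run: n for run, n in counts.items() if n < frac * m}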

Repeatability:

While many data processing pipelines are automated, not all are repeatable. Note that here we refer to repeatability of the compute pipeline rather than assay repeatability. Each step of the data processing pipeline is containerized and versioned. Containerization ensures that each individual data processing step is repeatable because it runs in an isolated computational environment that is re-initialized every time it is run. Without containerization, processing steps are vulnerable to unexpected changes, such as software updates applied since the last time the pipeline was run. Because each step is separately containerized, it is easy to remove or change individual steps by swapping in a different container. When the processing pipeline is used for a dataset, the version numbers of all components of the pipeline are stored. This approach allows for flexibility without sacrificing repeatability.
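One way such provenance might be recorded, sketched under assumed field names rather than the pipeline's actual format, is a per-dataset manifest listing the container image and version of every step:

    # Sketch of a per-dataset provenance record: the container image and
    # version of every pipeline step, so the exact computation can be
    # repeated later. Field names are assumptions for the example.
    import json
    from datetime import datetime, timezone

    def write_manifest(steps, path="pipeline_manifest.json"):
        """steps: [{"name": ..., "image": ..., "version": ...}, ...]"""
        manifest = {
            "generated": datetime.now(timezone.utc).isoformat(),
            "steps": steps,
        }
        with open(path, "w") as f:
            json.dump(manifest, f, indent=2)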

Transferability:

The pipeline is designed to be easily transferred so that the exact same computational steps can be applied to different datasets at different sites.

-- Application to an Alzheimer’s Disease Cohort --

We demonstrate the pipeline by applying it to a study of human brain autopsy samples from an Alzheimer’s disease cohort.


References & Acknowledgements:

(1) Ting YS, et al., Mol. Cell. Proteomics 2015

(2) Grant RP, Hoofnagle AN, Clinical Chemistry 2014


Financial Disclosure

Description      Y/N    Source
Grants           no
Salary           no
Board Member     no
Stock            no
Expenses         no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: no