MSACL 2018 US Abstract

Topic: Data Science

A Systematic Approach to Data-Independent Acquisition-MS for Plasma-based Proteomics Analyses

Shenyan Zhang (Presenter)
Cedars Sinai Medical Center

Bio: Dr. Shenyan Zhang is a postdoctoral scientist at Cedars Sinai Medical Center (CSMC), Los Angeles. She received her bachelor's degree in Applied Chemistry from Tianjin University, China, then joined the Beijing Institute of Genomics, Chinese Academy of Sciences, where she received her PhD in biochemistry and molecular biology. She now works at the Heart Institute of CSMC, and her main research area is clinical proteomics, including high-throughput quantitation of protein biomarkers by MRM and SWATH and intact protein analysis in plasma.

Authorship: Shenyan Zhang, Qin Fu, Vidya Venkatraman, Ronald Holewinski, Mitra Mastali, Koen Raedschelders, Jennifer Van Eyk
Cedars Sinai Medical Center, Los Angeles, CA

Short Abstract

Plasma is one of the most common matrices for clinical proteomics analyses. The advent of data-independent acquisition (DIA) mass spectrometry has enabled the simultaneous quantitative analysis of hundreds of proteins in plasma samples in support of population and disease studies. Unlike MRM workflows that target a constrained number of peptides for accurate quantitation, DIA workflows cannot be simultaneously tailored to suit all plasma protein subsets. We sought to identify the subset of proteins that are most amenable to quantitative analysis by DIA-MS from depleted and undepleted plasma samples, respectively. In the process, we built a QC strategy for DIA-MS that monitors the digestion, desalting, and LC-MS steps inherent to plasma proteomics workflows. This strategy serves as a useful guide for DIA-MS analysis of large cohort studies.

Long Abstract


The advent of data-independent acquisition (DIA) mass spectrometry has enabled the simultaneous quantitative analysis of hundreds of proteins in plasma samples in support of population and disease studies. Although plasma is one of the most common matrices for clinical proteomics analyses, its broad dynamic range and inherent complexity continue to pose significant challenges to discovery proteomics workflows. In contrast to MRM/SRM approaches that target a constrained number of peptides for accurate quantitation, DIA workflows cannot be simultaneously tailored to suit all plasma protein subsets. Our work sought to define preparative and acquisition parameters that maximize proteome coverage and quantitative accuracy from depleted and undepleted plasma samples, respectively. In this process, we developed a parallel preparation strategy in which each sample is divided into two aliquots: the first aliquot undergoes IgY14-based depletion and desalting to target less-abundant candidates that minimally interact with high-abundance matrix proteins; the second aliquot remains undepleted to preserve quantitative accuracy among candidates that interact more extensively with high-abundance matrix proteins (such as albumin). These experiments yielded a reproducible workflow with quantitative results for over 700 plasma proteins.


Human plasma spiked with equal amounts of β-galactosidase (b-gal) (Sigma) was depleted using a Seppro IgY14 column (Sigma). Depleted plasma samples were desalted upstream of MS analysis in three different ways: (1) peptide desalting by SPE; (2) protein desalting by LC; (3) two-step desalting, with protein desalting on Zeba spin columns (Thermo) followed by HLB peptide desalting (Waters). Individual sample preparation steps were evaluated for quality control using a BCA assay (Pierce) for depletion efficiency; MRM for desalting recovery and targeted digestion efficiency (albumin, b-gal, ApoE); and DIA-MS for overall digestion efficiency. The undepleted and depleted plasma samples were reduced, alkylated, and trypsin digested in parallel. Internal retention time standards were added before MS analysis for LC calibration. All DIA data were acquired on a TripleTOF 6600+ MS (Sciex) with variable window settings. A publicly available plasma ion library was used for DIA data analysis. DIA data were processed with OpenSWATH and MapDIA. For digestion evaluation, the DIA data were extracted and converted to mgf by DIA-Umpire, then searched with ProteinPilot (Sciex) in thorough mode.


To evaluate the plasma depletion procedure, we used commercial plasma spiked with β-galactosidase as QC samples. Protein quantification showed that 91.4% of protein was depleted, with a CV of 7.2% (n=5). In addition, human serum albumin and b-gal were selected as markers for high-abundance protein depletion and low-abundance protein recovery, respectively. MRM of albumin and b-gal indicated an overall albumin depletion efficiency of 99.6% with a CV of 3.4% (n=5), while overall b-gal recovery, including the digestion procedure, was 72.7% with a CV of 2.2% (n=5).
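The depletion, recovery, and CV figures above follow from simple ratios. The sketch below shows one way to compute them; it is not the authors' analysis code. The function names and replicate values are hypothetical, and it assumes the CV is taken across replicate efficiency values using the sample standard deviation.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def depletion_efficiency(before, after):
    """Percent of the starting amount (or signal) removed by depletion."""
    return 100.0 * (1.0 - after / before)


def recovery_percent(measured, expected):
    """Percent of a spiked standard (e.g. b-gal) recovered after processing."""
    return 100.0 * measured / expected


# Hypothetical replicate protein amounts (n=5); NOT data from this study.
protein_before_ug = [100.0, 100.0, 100.0, 100.0, 100.0]
protein_after_ug = [10.0, 12.0, 11.0, 9.0, 8.0]

efficiencies = [
    depletion_efficiency(b, a)
    for b, a in zip(protein_before_ug, protein_after_ug)
]
mean_efficiency = mean(efficiencies)      # mean percent depleted
efficiency_cv = percent_cv(efficiencies)  # replicate-to-replicate CV in percent
```

The same `percent_cv` helper applies to any replicate set, whether the inputs are BCA protein amounts or MRM peak areas.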

We evaluated three different desalting strategies for depleted plasma: (1) solid-phase peptide extraction after in-solution digestion; (2) simultaneous LC-based protein desalting and enrichment upstream of digestion using a C18 column; (3) size-exclusion spin-column desalting followed by in-solution digestion and subsequent peptide enrichment/desalting by solid-phase peptide extraction. We identified 564 proteins in undepleted plasma, and between 600 and 700 proteins in depleted plasma depending on the desalting strategy. Among these strategies, LC desalting provided the most comprehensive proteome coverage, with 909 plasma proteins identified in total, outperforming the SPE peptide desalting strategy by 77 identified proteins.

Due to the fundamental difference between depleted and undepleted plasma compositions, we optimized the DIA acquisition parameters of dwell time and number of windows for each duty cycle to maximize feature identification and quantification. For undepleted plasma, the number of identified proteins was similar across window settings, but the combination of 100 variable windows and a 30 ms dwell time generated the most quantifiable peptides (1832 peptides; CV<20%; n=3). For depleted plasma, a combination of 50 variable windows and a 60 ms dwell time resulted in the largest number of identified peptides (3803) as well as the most quantifiable peptides (2914 peptides; CV<20%; n=3).
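One way to read the two optima: assuming "dwell time" means the per-window accumulation time, both settings give the same total MS2 accumulation per cycle (100 × 30 ms = 50 × 60 ms = 3 s). The comparison therefore trades window width against per-window accumulation at a fixed cycle length. The sketch below works through that arithmetic. It ignores the MS1 survey scan and instrument overhead, and the chromatographic peak width is a hypothetical value, not a parameter reported in this abstract.

```python
def ms2_cycle_time_s(n_windows, dwell_ms):
    """Total MS2 accumulation time per cycle, in seconds.

    Simplification: the MS1 survey scan and inter-scan overhead are ignored.
    """
    return n_windows * dwell_ms / 1000.0


def points_per_peak(peak_width_s, cycle_time_s):
    """Approximate number of samplings across one chromatographic peak."""
    return peak_width_s / cycle_time_s


# Window/dwell combinations quoted in the text.
undepleted_cycle = ms2_cycle_time_s(n_windows=100, dwell_ms=30)
depleted_cycle = ms2_cycle_time_s(n_windows=50, dwell_ms=60)

# Hypothetical peak width for illustration only.
ASSUMED_PEAK_WIDTH_S = 30.0
undepleted_points = points_per_peak(ASSUMED_PEAK_WIDTH_S, undepleted_cycle)
depleted_points = points_per_peak(ASSUMED_PEAK_WIDTH_S, depleted_cycle)

print(undepleted_cycle, depleted_cycle, undepleted_points, depleted_points)
```

Under these assumptions the two settings sample each peak equally often, so the choice between them comes down to precursor selectivity versus signal accumulated per window.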

Sample preparation and instrument status are fundamental to data quality in large cohort studies. For this reason, comprehensive QC protocols are critical for long-term experimental integrity. Aside from BCA and MRM checkpoints for sample preparation QC, we also established a post-acquisition QC strategy that proved suitable for both depleted and undepleted plasma. Since traditional ion library-dependent data analysis is limited to the extraction of its own constituent peptides, it is unable to identify aberrantly digested or modified proteins that can arise from problematic sample preparation. Conversely, ion library-independent approaches such as DIA-Umpire rely on synthetic retention time standards, and these are often insufficient during low signal extraction. For these reasons, we identified and validated a set of 286 endogenous plasma peptides from 70 acquisitions that can serve as reliable internal retention time standards in both depleted and undepleted plasma. These assembled spectra show good coverage of non-tryptic or semi-tryptic peptides that are inconsistently represented in ion libraries. Ultimately, the proportion of aberrantly digested to correctly digested peptides reflected experimentally altered digestion conditions during QC validation.
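The digestion metric described above can be illustrated with a short sketch. It assumes that "aberrantly digested" means semi-tryptic or non-tryptic peptides and "correct" means fully tryptic peptides. It applies the common rule that trypsin cleaves after K or R but not before P. The input format, function names, and example peptides are hypothetical and do not come from the study.

```python
from collections import Counter

CLEAVAGE_RESIDUES = {"K", "R"}


def tryptic_termini(prev_residue, peptide, next_residue):
    """Count peptide termini consistent with trypsin cleavage (0, 1, or 2).

    prev_residue / next_residue are the flanking residues in the protein;
    "-" marks a protein terminus, which is accepted as a valid end.
    """
    n_ok = prev_residue == "-" or (
        prev_residue in CLEAVAGE_RESIDUES and peptide[0] != "P"
    )
    c_ok = next_residue == "-" or (
        peptide[-1] in CLEAVAGE_RESIDUES and next_residue != "P"
    )
    return int(n_ok) + int(c_ok)


def classify(prev_residue, peptide, next_residue):
    """Label a peptide as tryptic, semi-tryptic, or non-tryptic."""
    labels = {2: "tryptic", 1: "semi-tryptic", 0: "non-tryptic"}
    return labels[tryptic_termini(prev_residue, peptide, next_residue)]


def aberrant_to_correct_ratio(identifications):
    """Ratio of semi-/non-tryptic to fully tryptic identifications."""
    counts = Counter(classify(*ident) for ident in identifications)
    correct = counts["tryptic"]
    aberrant = counts["semi-tryptic"] + counts["non-tryptic"]
    return aberrant / correct if correct else float("inf")


# Hypothetical identifications: (previous residue, peptide, next residue).
example_ids = [
    ("K", "LVNEVTEFAK", "T"),  # both termini tryptic
    ("R", "DAHKSEVAHR", "F"),  # both termini tryptic (one missed cleavage)
    ("A", "VNEVTEFAK", "T"),   # only the C-terminus is tryptic
    ("A", "VNEVTEF", "A"),     # neither terminus is tryptic
]
ratio = aberrant_to_correct_ratio(example_ids)
```

Tracking this ratio per acquisition gives a single number that should rise when digestion conditions drift. Missed cleavages are not counted as aberrant here and could be added as a separate metric.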

In this study we performed high-throughput sample preparation using a 96-well plate-based automated liquid handling system. B-gal was spiked into every sample as an exogenous QC, while albumin was used as an endogenous QC; the CVs of b-gal and albumin in 128 plasma samples were 16.0% and 23.2%, respectively. Our plate layout contained 8 separate, non-sequential QC wells containing QC plasma and b-gal dispersed across the 96-well plate. We extracted albumin and b-gal from these wells to calculate intra-plate CVs of 2.5% and 3.5%, respectively (n=8). Moreover, peptides from pooled QC plasma digests were aliquoted, dried down, and stored at -80 °C as batch QC standards. These aliquots were acquired before and after every batch of 11 patients for QC monitoring of the LC-MS. The CVs of albumin and b-gal across 14 batches were 4.3% and 4.7%, respectively (n=28).

Conclusions & Discussion

The combination of both depleted and undepleted plasma samples for DIA-MS results in improved plasma proteome coverage and quantitation. Twinning depleted and undepleted workflows not only enabled us to define those proteins that were uniquely identified in a given preparation, but also informed which preparative strategy was quantitatively superior for those proteins that were consistently identified. We established a comprehensive QC system compatible with DIA-based plasma proteomics, and applied this system throughout a large cohort study.

References & Acknowledgements:

1. Tsou, C.C., et al., DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods, 2015. 12(3): p. 258-64.

2. Rost, H.L., et al., OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol, 2014. 32(3): p. 219-23.

3. Liu, Y., et al., Quantitative variability of 342 plasma proteins in a human twin population. Mol Syst Biol, 2015. 11(1): p. 786.

Financial Disclosure

Board Member: no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: