MSACL 2018 EU : Jauffrit

MSACL 2018 EU Abstract

Topic: Microbiology

Deciphering MALDI-TOF Identification of Bacteria Using a Proteogenomic Approach

Frédéric Jauffrit (Presenter)
Université de Lyon

Presenter Bio: A computer scientist by training, I joined the bioinformatics field out of a thirst for challenging data science applications.
I am currently finishing my PhD, in which I combine phylogenomics and proteogenomics to gain insight into bacterial identification by MALDI-TOF. This PhD is a collaboration between the LBBE (Laboratoire de Biométrie et Biologie Evolutive) and bioMérieux.
I developed the RiboDB database, a public resource for ribosomal proteins designed to help phylogeneticists resolve the evolutionary history of bacterial and archaeal species, from subspecies to phylum level.
My work has also focused on annotating MALDI-TOF spectra to better understand why MALDI-TOF enables bacterial identification. The ultimate goal of my work is to establish a link between proteomics and genomics for bacterial identification.

Authors: Frédéric Jauffrit (1,2), Corinne Beaulieu (1), Pierre-Jean Cotte-Pattat (3), Victoria Girard (3), Martin Welker (1), Céline Brochier-Armanet (2), Jean-Pierre Flandrois (2), Jean-Philippe Charrier (1)
(1) Microbiology Research Department, Marcy l’Etoile, bioMérieux S.A., France (2) Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France (3) R&D Microbiology, bioMérieux S.A., La Balme Les Grottes, France

Short Abstract

Ribosomal and non-ribosomal proteins giving rise to peaks in whole-cell MALDI-TOF MS (WC-MALDI-TOF MS) spectra were identified by proteogenomics, using the VITEK® MS calibrant strain Escherichia coli ATCC8739 as a case study.
Protein expression was evidenced using LC-MS/MS data and a six-frame translation of the entire genome. ORFs were then inferred, and MALDI-TOF peaks were assigned to candidate protein m/z values according to gene sequences, post-translational modifications (PTMs) and charge states.
Applied to 832 E. coli spectra, this method allowed the identification of all major MALDI-TOF peaks (56 peaks above 3% relative intensity) and, more broadly, of 424 proteins corresponding to 649 peaks.
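As a rough illustration of the six-frame translation and ORF inference summarized above, the Python sketch below translates a genome in all six reading frames and extends a peptide hit to its enclosing stop-to-stop region, trimmed to the first upstream methionine. It is a minimal sketch, not the authors' pipeline: the function names are hypothetical, only ATG (Met) is treated as a start codon even though bacteria also use alternative starts such as GTG and TTG, and only the first occurrence of a peptide in a frame is considered.

```python
from itertools import product

# Standard genetic code, codons ordered TCAG with the first base varying slowest.
# Bacterial table 11 has the same codon-to-amino-acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(genome):
    """Return {frame_label: protein_string} for frames +1..+3 and -1..-3."""
    genome = genome.upper()
    rc = reverse_complement(genome)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(genome[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def extend_to_orf(frame_protein, peptide):
    """Extend a peptide hit to its stop-to-stop region, trimmed to the first upstream Met.

    Returns None if the peptide is absent from this frame. Only the first
    occurrence is used, and only Met (ATG) is treated as a start.
    """
    pos = frame_protein.find(peptide)
    if pos == -1:
        return None
    upstream_stop = frame_protein.rfind("*", 0, pos)  # -1 if no stop upstream
    downstream_stop = frame_protein.find("*", pos + len(peptide))
    if downstream_stop == -1:
        downstream_stop = len(frame_protein)
    region = frame_protein[upstream_stop + 1:downstream_stop]
    offset = pos - (upstream_stop + 1)  # peptide position within region
    start = region.find("M", 0, offset + 1)  # first Met at or before the peptide
    return region if start == -1 else region[start:]
```

For example, `six_frame_translation("TTTTAAATGGCTAAAGGTTGGTAAGG")["+1"]` gives `"F*MAKGW*"`, and extending the peptide `"KGW"` in that frame gives `"MAKGW"`. A real pipeline would also keep genomic coordinates so that recovered ORFs can be compared with the existing annotation.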

Long Abstract

Introduction

Whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (WC-MALDI-TOF MS) can provide fast and accurate identification of bacterial and fungal species. However, how this technology distinguishes species is not fully understood. Ribosomal proteins have been identified as important biomarkers [1], but not all observed mass peaks originate from ribosomal proteins [2], and several recent studies have identified additional proteins present in MALDI-TOF spectra [3,4]. Using the VITEK® MS calibrant strain (Escherichia coli ATCC8739) as a case study, the present study highlights common pitfalls and shows how spectrum annotation can benefit from recent [5,6] and original developments in proteogenomics.

Methods

LC-MS/MS data were used to find evidence of peptide expression independently of conventional protein annotation methods. The six-frame translation of the complete genome sequence was used as a pseudo-protein database, and peptide-spectrum matches (PSMs) were identified using MS-GF+ [7]. Because such a database is much larger than a conventional proteome, a dedicated class-specific validation method was designed to minimize the false-positive PSMs caused by database inflation. Novel peptides were validated using several descriptors of PSM quality: precursor mass accuracy, precursor charge state, precursor isotope pattern, percentage of explained signal, peptide fragmentation coverage… Furthermore, to account for unpredicted post-translational modifications (PTMs), the specificity of the covered peptide sequence was used to search for alternative peptides with an “open search” strategy. This refined approach was compared with false discovery rate (FDR) estimation using a standard target/decoy approach.
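For reference, the standard target/decoy FDR estimation against which the refined validation was compared can be sketched as follows. This is a generic illustration, not MS-GF+ code and not the authors' class-specific method: each PSM is assumed to be a `(score, is_decoy)` pair where a higher score is better (E-value-style scores would first need transforming, e.g. to -log10 E-value), the FDR at each score threshold is estimated as decoys/targets, and q-values are the running minimum of that estimate. The helper `classwise_qvalues` is a hypothetical example of estimating FDR separately per peptide class (e.g. known vs novel), one common way to limit the effect of database inflation.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for (score, is_decoy) PSMs; higher scores are better.

    At each score rank, FDR is estimated as (#decoys / #targets) among PSMs
    scoring at least as well; the q-value is the lowest FDR at which the PSM
    would still be accepted. Ties keep their input order in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Running minimum from the worst-scoring PSM upwards gives monotone q-values.
    qvalues, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def accepted_target_scores(psms, fdr_threshold=0.01):
    """Scores of target PSMs whose q-value is at or below the threshold."""
    return [s for s, is_decoy, q in target_decoy_qvalues(psms) if not is_decoy and q <= fdr_threshold]


def classwise_qvalues(psms_by_class):
    """Hypothetical class-specific variant: estimate FDR separately per class."""
    return {cls: target_decoy_qvalues(psms) for cls, psms in psms_by_class.items()}
```

Separating classes matters here because novel peptides from a six-frame database are scored against a much larger search space than known proteins, so a single pooled FDR can hide a higher error rate among the novel hits.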

Open reading frames (ORFs) were then inferred by extending the identified peptide sequences along the genome to build the set of candidate proteins. Theoretical protein m/z values were calculated from the amino acid sequences, potential PTMs and charge states. Mass candidates were filtered according to protein expression as observed by LC-MS/MS and finally matched to peaks in the WC-MALDI-TOF spectra.
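The theoretical m/z calculation and peak matching described above might look like the sketch below. All assumptions are illustrative: average (not monoisotopic) residue masses, which suit intact proteins in linear-mode MALDI-TOF; a small hypothetical set of PTM mass shifts (acetylation, methylation, oxidation) plus optional removal of the initiator methionine; protonated ions [M+zH]z+; and a hypothetical 500 ppm matching tolerance. The residue masses are approximate and vary slightly between reference tables.

```python
# Approximate average residue masses (Da); values differ slightly between tables.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528  # added once per chain (N- and C-termini)
PROTON = 1.007276
# Hypothetical set of PTM mass shifts (average masses, Da).
PTM_DELTA = {"acetyl": 42.0367, "methyl": 14.0266, "oxidation": 15.9994}


def average_mass(seq, ptms=(), remove_initial_met=False):
    """Average neutral mass of a protein with optional PTMs and Met excision."""
    if remove_initial_met and seq.startswith("M"):
        seq = seq[1:]
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG + sum(PTM_DELTA[p] for p in ptms)


def candidate_mz(seq, ptm_sets=((),), charges=(1, 2)):
    """List ((met_removed, ptms, z), m/z) for every Met/PTM/charge combination."""
    met_options = (False, True) if seq.startswith("M") else (False,)
    candidates = []
    for met_removed in met_options:
        for ptms in ptm_sets:
            mass = average_mass(seq, ptms, met_removed)
            for z in charges:
                candidates.append(((met_removed, tuple(ptms), z), (mass + z * PROTON) / z))
    return candidates


def match_peaks(peaks, candidates, tol_ppm=500.0):
    """Return (peak, label, theoretical m/z) for every candidate within tol_ppm of a peak."""
    return [
        (peak, label, mz)
        for peak in peaks
        for label, mz in candidates
        if abs(peak - mz) / mz * 1e6 <= tol_ppm
    ]
```

Because several sequence/PTM/charge combinations can fall within the tolerance of one peak, the output is a list of all matches rather than a single assignment; narrowing the tolerance (through better calibration) or filtering by LC-MS/MS evidence is what reduces those ambiguities.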

This method was applied to a data set of 832 MALDI-TOF spectra of E. coli ATCC8739, internally calibrated using ribosomal protein masses. Multiple observations of individual peaks were averaged to further improve mass accuracy and add confidence to peak interpretations. Remaining ambiguities were resolved using quantitative data extracted from the LC-MS/MS dataset.
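Internal calibration and peak averaging can be sketched as below, under simplifying assumptions: each spectrum is a list of peak m/z values, a linear correction from observed to reference m/z is fitted by least squares on the calibrant peaks found within a hypothetical 1000 ppm window (real TOF calibrations are usually not linear in m/z, so this is only an approximation), and a peak's consensus m/z is the mean of its nearest matches across spectra. The reference masses and tolerances are placeholders, not the values used in the study.

```python
from statistics import mean


def ppm_error(observed, reference):
    return abs(observed - reference) / reference * 1e6


def nearest_within(peaks, target, tol_ppm):
    """Closest peak to target within tol_ppm, or None."""
    near = [p for p in peaks if ppm_error(p, target) <= tol_ppm]
    return min(near, key=lambda p: abs(p - target)) if near else None


def recalibrate(peaks, reference_mz, window_ppm=1000.0):
    """Linear least-squares recalibration of one spectrum on calibrant peaks."""
    pairs = [(nearest_within(peaks, r, window_ppm), r) for r in reference_mz]
    pairs = [(obs, ref) for obs, ref in pairs if obs is not None]
    if len({obs for obs, _ in pairs}) < 2:
        return list(peaks)  # too few distinct calibrants: leave spectrum unchanged
    mx = mean(obs for obs, _ in pairs)
    my = mean(ref for _, ref in pairs)
    sxx = sum((obs - mx) ** 2 for obs, _ in pairs)
    sxy = sum((obs - mx) * (ref - my) for obs, ref in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [slope * p + intercept for p in peaks]


def consensus_mz(spectra, target_mz, tol_ppm=300.0):
    """Mean m/z of the peak nearest target_mz in each spectrum, and how many spectra had it."""
    hits = [h for h in (nearest_within(s, target_mz, tol_ppm) for s in spectra) if h is not None]
    return (mean(hits), len(hits)) if hits else (None, 0)
```

Averaging over many recalibrated spectra shrinks random m/z error roughly with the square root of the number of observations, which is why a large data set such as the 832 spectra here can tighten the matching tolerance; the count returned by `consensus_mz` also indicates how reproducibly a peak is observed.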

Results

A total of 1,391 known proteins were identified by LC-MS/MS. Twenty-one PSMs, corresponding to nine novel peptides, satisfied all criteria of the novel-PSM validation method. All of these peptides mapped to supposedly non-coding regions. All but two of them could be assigned to ORFs encoding four proteins. Further investigation showed that ORFs could also be recovered around the two remaining novel peptides after correcting a repeated region and a frameshift, respectively. The use of LC-MS/MS greatly reduced the number of candidate proteins for MALDI-TOF peak assignment. Furthermore, the improved mass accuracy provided by internal spectrum calibration and peak m/z averaging helped reduce the number of ambiguous cases.

Evidence for the presence of 81 proteins corresponding to 196 peaks was confidently inferred. All major MALDI-TOF peaks (56 peaks above 3% relative intensity) could be assigned to these proteins. A further 319 less intense peaks, visible stochastically depending on spectrum quality, were also unambiguously identified, as were 134 peaks that could originate from more than one protein. The latter were subsequently disambiguated using LC-MS/MS quantitation, and 112 additional proteins were identified.
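One simple way to use LC-MS/MS quantitation to resolve peaks that match several candidate proteins is sketched below. This is an assumption about how such a rule could work, not the study's actual criterion: a peak is assigned to its most abundant candidate only when that candidate's LC-MS/MS abundance (e.g. spectral counts or summed intensity) exceeds the runner-up by a hypothetical factor `min_ratio`; otherwise the peak stays ambiguous. Protein identifiers and abundances in the test are made up.

```python
def disambiguate(peak_candidates, abundance, min_ratio=5.0):
    """Assign ambiguous peaks using LC-MS/MS abundance.

    peak_candidates: {peak_mz: [protein_id, ...]} candidates matching each peak.
    abundance: {protein_id: LC-MS/MS quantity}; missing proteins count as 0.
    Returns {peak_mz: protein_id, or None if still ambiguous}.
    """
    assignments = {}
    for peak, proteins in peak_candidates.items():
        ranked = sorted(proteins, key=lambda p: abundance.get(p, 0.0), reverse=True)
        if not ranked:
            assignments[peak] = None
        elif len(ranked) == 1:
            assignments[peak] = ranked[0]
        else:
            top = abundance.get(ranked[0], 0.0)
            second = abundance.get(ranked[1], 0.0)
            # Require a clear margin over the runner-up before trusting the assignment.
            dominant = top > 0 and (second == 0 or top / second >= min_ratio)
            assignments[peak] = ranked[0] if dominant else None
    return assignments
```

The ratio threshold trades coverage for reliability: a lower `min_ratio` resolves more peaks but risks assigning a peak to the wrong protein when two candidates are similarly abundant.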

Conclusions & Discussion

The possibility of identifying bacterial protein peaks in MALDI-TOF spectra was demonstrated in depth using the E. coli ATCC8739 strain as a case study. As a result, molecular information was provided for hundreds of small, abundant proteins. This paves the way for low-cost, fast and simple, albeit limited, molecular characterization of microorganisms through the identification of protein biomarkers.

As a side result, this study showed that proteogenomics can help improve the quality of genomic data by annotating proteins without relying on gene models. This is particularly important for small proteins, which often remain unannotated, even in model organisms. The study also demonstrated the potential of proteomics for correcting sequencing errors.


References & Acknowledgements:

[1] Arnold, R. J., & Reilly, J. P. (1999). Observation of Escherichia coli Ribosomal Proteins and Their Posttranslational Modifications by Mass Spectrometry. Analytical Biochemistry, 269(1), 105–112. https://doi.org/10.1006/abio.1998.3077

[2] Welker, M., & Moore, E. R. B. (2011). Applications of whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry in systematic microbiology. Systematic and Applied Microbiology, 34(1), 2–11. https://doi.org/10.1016/j.syapm.2010.11.013

[3] Armengaud, J. (2017). Defining Diagnostic Biomarkers Using Shotgun Proteomics and MALDI-TOF Mass Spectrometry. In Diagnostic Bacteriology (Methods in Molecular Biology, Vol. 1616, pp. 107–120). Springer.

[4] Fagerquist, C. K. (2017). Unlocking the proteomic information encoded in MALDI-TOF-MS data used for microbial identification and characterization. Expert Review of Proteomics, 14(1), 97–107. https://doi.org/10.1080/14789450.2017.1260451

[5] Armengaud, J. (2017). Proteogenomics of Non-model Microorganisms. In MALDI-TOF and Tandem MS for Clinical Microbiology (pp. 529–538). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118960226.ch20

[6] Zhu, Y., Orre, L. M., Johansson, H. J., Huss, M., Boekel, J., Vesterlund, M., … Lehtiö, J. (2018). Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow. Nature Communications, 9(1), 903. https://doi.org/10.1038/s41467-018-03311-y

[7] Kim, S., & Pevzner, P. A. (2014). MS-GF+ makes progress towards a universal database search tool for proteomics. Nature Communications, 5, 5277. https://doi.org/10.1038/ncomms6277


Financial Disclosure

Description     Y/N   Source
Grants          no
Salary          yes   EZUS Lyon, University of Lyon
Board Member    no
Stock           no
Expenses        no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above:

yes