MSACL 2016 EU Abstract

Quick Survey of Complex Microbiota by Tandem Mass Spectrometry

Jean Armengaud (Presenter)
CEA-Li2D

Bio: Jean Armengaud is an international expert in multi-OMICs approaches. He received his PhD in Biochemistry from the University of Grenoble, France (1994). He spent six years as a postdoctoral fellow at the Department of Microbiology, Helmholtz Centre for Infection Research, Braunschweig, Germany and the Structural Biology Institute, Grenoble, France. In 2000, he joined the French Alternative Energies and Atomic Energy Commission (CEA) to develop mass spectrometry for protein science, microbiology, and CBRN-E counter-measures. He is best known for his work on high-throughput proteomics and proteogenomics of bacteria, archaea, viruses, algae and animals, as well as the characterization of radiotolerant organisms and pathogens. His research unit, located near Avignon, France, is dedicated to the development of innovative technologies for the detection of pathogens and toxins.

Authorship: Olivier Pible, Charlotte Mappa, Gérard Steinmetz, Guylaine Miotello, François Allain, Béatrice Alpha-Bazin, Jean Armengaud
CEA-Marcoule, DRF/IBITEC-S/SPI/Li2D, Laboratory “Innovative technologies for Detection and Diagnostic”, BP 17171, F-30200 Bagnols-sur-Cèze, France

Short Abstract

Tandem MS is a powerful tool for identifying peptide sequences. The most abundant organisms in complex microbiota can be identified from the detected peptides. We have developed several tools for achieving a quick survey of complex microbiota by tandem MS. First, the generalist databases used to interpret tandem MS data contain numerous errors that should be corrected. Indeed, the quality of genome, protein sequence, and taxonomy databases can critically impact data interpretation. Second, a software program has been designed to extract phylogenetic information from the peptide sequences established by high-resolution tandem mass spectrometry. Third, a metaproteomic reference dataset has been proposed to assess the quality of the whole analytical pipeline. Several examples of microbiota surveys established with an experimental MS setup of only 1 hour will be presented.

Long Abstract

• Introduction

Next-Generation Sequencing has considerably changed our vision of medical microbiology, especially by revealing the importance of human microbiota. While 16S rRNA amplification and sequencing, metagenomics / metatranscriptomics, and culturomics are comprehensive approaches, diagnostics requires quicker surveys. Metaproteomics, which consists of the shotgun analysis of the whole proteome content by tandem mass spectrometry, has been proposed as an alternative for identifying the main microbial players and characterizing the active metabolic pathways. Several studies have highlighted its potential to follow the colonization of the infant gastrointestinal tract and to propose novel markers of dysfunction. The clinical impact of proteomic interrogation of the gut microbiota could be important if personalized metaproteomics becomes a routine and accessible methodology. We have developed several tools for achieving a quick survey of complex microbiota by means of high-throughput peptide identification by tandem mass spectrometry: corrections of generalist databases, bioinformatics approaches to extract phylogenetic information from the peptide sequences established by high-resolution tandem mass spectrometry, and a metaproteomic reference dataset acquired on a laboratory-assembled, defined microbial mixture for assessing the quality of the analytical pipeline.

• Methods

Complex microbiota surveys typically comprise a protein extraction step based on bead beating. After trypsin proteolysis, the resulting peptides are resolved using a 60-min gradient and analyzed by nanoLC-MS/MS on either an LTQ-Orbitrap XL hybrid mass spectrometer (Thermo) or a Q-Exactive HF mass spectrometer (Thermo) equipped with an ultra-high-field Orbitrap analyzer. A modified version of the NCBInr database is used for peptide inference, taxonomical attribution, and phylotyping. Peptide assignment is performed with the MASCOT search engine. A post-processing pipeline written in Python is then applied to obtain spectrum-to-taxon assignments, followed by identification and relative quantitation of the main organisms. The metaproteomic reference dataset has been recorded on an equalized mixture of peptides prepared from 24 different bacteria chosen from 20 different genera belonging to 5 representative bacterial phyla.

• Results

Not all entries in a nucleotide or protein sequence database are of equal quality, which casts serious doubt on meta-omics results (Pible & Armengaud, 2015). We have shown that cross-contamination of genome sequences is frequent yet underestimated. For example, we were intrigued by the genome of the cucumber, Cucumis sativus L., an economically important crop, which systematically appeared in metaproteomic analyses of samples comprising Enterobacter spp. We proved by systematic sequence similarity searches that the cucumber genome was contaminated with short genome fragments from Enterobacter bacteria, and thus should be corrected or at least annotated in public databases. We also detected some important inconsistencies in the description of higher taxonomical ranks of several microorganisms, as is the case for the Chlorophyta phylum. Therefore, we propose investing some effort in correcting generalist databases before conducting in-depth metaproteomics analysis.

A Python post-processing pipeline has been developed to extract phylogenetic information from the peptide sequences established by high-resolution tandem mass spectrometry. The determination of the organisms present in a given sample is based on the examination of MS/MS spectra pertaining to taxa at a given taxonomical level, starting at the highest taxonomical rank (superkingdom level) and then gradually treating lower taxonomical levels down to the species level. Spectra from validated taxa are excluded and the next best taxon is searched for until each clade is populated with a number of spectra coherent with the higher taxonomical level. Taxon-specific spectra are also used for taxa validation or confidence assessment. We propose to roughly quantify the different organisms present based on the number of peptide-to-taxon assignments, once all organisms have been confidently established. The Python procedure processes in less than 15 min a dataset comprising 40,000 MS/MS spectra, typically acquired along a 60-min gradient with a Q-Exactive HF instrument.
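
The iterative step described above (validate the best taxon, exclude its spectra, search for the next best) can be illustrated with a minimal Python sketch for a single taxonomical rank. This is not the authors' pipeline: the function names, the `min_spectra` threshold, and the toy spectrum-to-taxa mapping are assumptions made for illustration only, and a real implementation would also repeat the procedure rank by rank and use taxon-specific spectra for validation.

```python
from collections import Counter


def pick_taxa(spectrum_to_taxa, min_spectra=2):
    """Greedy selection at one rank (illustrative assumption, not the published method).

    spectrum_to_taxa maps a spectrum id to the set of taxa its peptide matches.
    Returns a list of (taxon, number_of_spectra) in order of selection.
    """
    # Ignore spectra that match no taxon at this rank.
    remaining = {s: set(t) for s, t in spectrum_to_taxa.items() if t}
    validated = []
    while remaining:
        counts = Counter(t for taxa in remaining.values() for t in taxa)
        # Sorting first makes ties resolve alphabetically (deterministic).
        best, n = max(sorted(counts.items()), key=lambda kv: kv[1])
        if n < min_spectra:  # hypothetical validation threshold
            break
        validated.append((best, n))
        # Exclude every spectrum explained by the validated taxon.
        remaining = {s: taxa for s, taxa in remaining.items() if best not in taxa}
    return validated


def relative_abundance(validated):
    """Rough quantitation: share of assigned spectra per validated taxon."""
    total = sum(n for _, n in validated)
    return {taxon: n / total for taxon, n in validated} if total else {}


# Toy data (invented for the example, not experimental results).
toy = {
    "s1": {"Escherichia", "Salmonella"},
    "s2": {"Escherichia"},
    "s3": {"Escherichia", "Salmonella"},
    "s4": {"Bacillus"},
    "s5": {"Bacillus"},
    "s6": {"Salmonella"},
}
selected = pick_taxa(toy)
shares = relative_abundance(selected)
```

In this toy case the spectra shared between two genera are attributed to the first validated genus, so the second genus is retained only if enough spectra remain after exclusion.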

The metaproteomic reference dataset comprises MS/MS spectra belonging to peptide sequences extracted from 24 different bacteria. These microorganisms have been chosen to cover 20 different genera belonging to 5 representative bacterial phyla (Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, and Deinococcus-Thermus). This dataset allows assessing the confidence level of any metaproteomics pipeline in terms of taxonomical typing on the one hand, and quantitation of the different items at several taxonomical ranks on the other.
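
As a hedged illustration of how such a reference dataset might be used, the sketch below scores a pipeline's output against the known composition of a defined mixture: precision and recall for taxonomical typing, and a mean absolute deviation for quantitation. The metrics, function names, and the "detected" values are assumptions chosen for the example; the abstract does not specify which scores the authors compute.

```python
def typing_scores(expected, detected):
    """Precision and recall of detected taxa against the known mixture."""
    expected, detected = set(expected), set(detected)
    true_positives = len(expected & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


def quant_error(expected_frac, observed_frac):
    """Mean absolute deviation between expected and observed fractions."""
    taxa = set(expected_frac) | set(observed_frac)
    if not taxa:
        return 0.0
    return sum(
        abs(expected_frac.get(t, 0.0) - observed_frac.get(t, 0.0)) for t in taxa
    ) / len(taxa)


# The five phyla named in the abstract; the detected set is invented.
expected_phyla = {
    "Firmicutes", "Actinobacteria", "Bacteroidetes",
    "Proteobacteria", "Deinococcus-Thermus",
}
detected_phyla = {"Firmicutes", "Proteobacteria", "Bacteroidetes", "Cyanobacteria"}
precision, recall = typing_scores(expected_phyla, detected_phyla)

# Hypothetical two-taxon fractions, for the quantitation score only.
error = quant_error({"A": 0.5, "B": 0.5}, {"A": 0.7, "B": 0.3})
```

The same two functions can be applied at each taxonomical rank (phylum, genus, species) to see where typing or quantitation starts to degrade.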

Several examples of microbiota surveys established with an experimental MS setup of only 1 hour will be presented. Fecal microbiota and mixed fungi-bacteria biofilms are routinely assessed without the need for any subculture or time-consuming sample preparation.

• Conclusions

The tools developed by our research team allow a quick survey of complex microbiota by means of high-throughput peptide identification by tandem mass spectrometry. Due to the continuous growth of generalist databases, metaproteomics approaches are applicable regardless of the sample origin. Our quick survey of complex microbiota could also benefit from improvements in tandem mass spectrometry in the coming years, as the speed and sensitivity of instruments may increase.


References & Acknowledgements:

Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J (2016) Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics. 13(2):185-99.

Pible O, Armengaud J (2015) Improving the quality of genome, protein sequence, and taxonomy databases: a prerequisite for microbiome meta-omics 2.0. Proteomics. 15(20):3418-23.

Armengaud J (2016) Next-generation proteomics faces new challenges in environmental biotechnology. Curr Opin Biotechnol. 38:174-82.


Financial Disclosure

Description     Y/N    Source
Grants          no
Salary          no
Board Member    yes    MSACL EU
Stock           no
Expenses        no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: no