MSACL 2018 US Abstract

Topic: Data Science

High-Throughput Mass Spectrometry Coupled to Machine Learning Algorithms to Enable Accelerated Biomarker Discovery

Jacques Corbeil (Presenter)
Laval University

Bio: Dr. Jacques Corbeil focuses on using the latest techniques in bioinformatics and machine learning to assist diagnostics, to facilitate prognostics and to optimize the response to treatment. Dr. Corbeil is using state-of-the-art instrumentation and big data analytics to facilitate the interpretation of complex phenotypic data. Dr. Corbeil’s research includes investigating how infectious microorganisms interact with their hosts, examining the effects of antibiotics on our microbial flora, exploring how to design small molecules and drugs to interfere with specific microbial functions and integrating omic data. Dr. Corbeil operates at the interface of computers and omic sciences.

Authorship: Pier-Luc Plante (1,2), Julie Carbonneau (1), Marie-Ève Hamelin (1), Guy Boivin (1), François Laviolette (2) and Jacques Corbeil (1,2)
(1) Laval University, Infectious Diseases Research Center and (2) Laval University, Big Data Research Center

Short Abstract

We developed an efficient methodology coupling high-throughput mass spectrometry and machine learning algorithms to reliably identify the minimum set of biomarkers (metabolites) that can be used as classifiers or predictors for specific phenotypes of interest.

Long Abstract


Respiratory viral infections represent a significant cause of morbidity and mortality and constitute one of the top reasons why individuals visit a hospital emergency room every year. Influenza viruses are of particular medical importance due to their significant burden on the healthcare system and their propensity to affect seniors. Rapid identification of infected patients and those at risk for severe infection is a critical unmet clinical need, as prompt interventions such as antiviral therapy for influenza can improve outcomes. We generated a classifier that is predictive of influenza infection. We are now aiming to develop a predictor in conjunction with clinical parameters to stratify the risk of viral disease severity.


We extracted metabolites from nasopharyngeal swabs (NPS) using a salt-assisted liquid-liquid extraction procedure that selects small polar metabolites. First, we precipitated proteins with acetonitrile, then created two liquid phases by adding a salt-saturated (NaCl) aqueous solution, followed by vortexing and centrifugation. The upper (organic) phase contains most of the polar metabolites. The main advantages of this method are its simplicity, speed and low cost, as it requires only a few microliters of solvent. Moreover, it does not require excessive dilution of the sample, which would decrease the sensitivity of metabolite detection. Two µl of the metabolite extract were deposited onto a 96-well LazWell plate (Phytronix). Once the extract had dried on the metallic surface, the plate was placed in the laser diode thermal desorption (LDTD) ion source. The metabolites were analyzed by mass spectrometry (MS) in positive and negative ionization modes on a high-resolution mass spectrometer (Synapt G2-Si, Waters). This method analyzes a sample very quickly, typically in less than 15 seconds, and provides quantitative and qualitative information about a vast panel of small molecules (less than 4000 daltons) in the form of a mass spectrum.


Using our metabolomic approach, we successfully differentiated NPS samples obtained from patients infected with influenza A (96 samples) from those of patients with influenza-like symptoms who tested negative by RT-PCR for 15 respiratory viruses, including influenza (96 samples). The mass spectra were corrected and aligned using the virtual lock mass (VLM) algorithm. We trained multiple machine learning models with the decision tree algorithm (N=302) by randomly assigning 80% of samples to the training set and 20% of samples to a validation set. These models correctly classified influenza-positive and -negative samples with an average accuracy of 82% (±5%) based on validation samples only, which were not used to train the algorithm. Sparse models using as few as 5-8 peaks (metabolites) were sufficient to achieve this level of classification accuracy. A learning curve projected that we would reach 90% accuracy with approximately 550 samples.
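
The evaluation scheme described above (repeated random 80/20 splits, a shallow decision tree fitted on the training portion, accuracy measured on held-out samples only) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the data are synthetic, the peak count, the informative peak indices, the tree depth and the number of repeats are all assumptions chosen for the example, and the small Gini-based tree is a stand-in for whatever decision tree implementation was actually used.

```python
import random
import statistics

def gini(pos, n):
    """Gini impurity of a node holding n samples, pos of them positive."""
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    n, total_pos = len(y), sum(y)
    best, best_score = None, gini(total_pos, n)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        pos_left = 0
        for k in range(1, n):
            pos_left += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # cannot split between identical values
            score = (k * gini(pos_left, k)
                     + (n - k) * gini(total_pos - pos_left, n - k)) / n
            if score < best_score - 1e-12:
                # Threshold at the lower value: rows <= lo go left, the rest right.
                best, best_score = (f, lo), score
    return best

def fit_tree(X, y, max_depth):
    """Grow a small binary tree; a leaf is the majority class (0 or 1)."""
    majority = int(sum(y) * 2 >= len(y))
    split = best_split(X, y) if max_depth > 0 else None
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))

def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def repeated_holdout(X, y, n_repeats, test_fraction, max_depth, rng):
    """Accuracy on held-out samples for each random train/validation split."""
    indices = list(range(len(y)))
    n_test = round(len(y) * test_fraction)
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], max_depth)
        correct = sum(predict(tree, X[i]) == y[i] for i in test)
        accuracies.append(correct / n_test)
    return accuracies

# Synthetic stand-in for aligned peak intensities (all values are assumptions).
rng = random.Random(0)
N_PER_CLASS, N_PEAKS, INFORMATIVE = 96, 20, (2, 7, 11)
N_REPEATS, MAX_DEPTH = 50, 3
X, y = [], []
for label in (0, 1):
    for _ in range(N_PER_CLASS):
        row = [rng.gauss(0.0, 1.0) for _ in range(N_PEAKS)]
        for f in INFORMATIVE:
            row[f] += 2.0 * label  # shift a few peaks in the positive class
        X.append(row)
        y.append(label)

accuracies = repeated_holdout(X, y, N_REPEATS, 0.2, MAX_DEPTH, rng)
print(f"validation accuracy: {statistics.mean(accuracies):.2f} "
      f"+/- {statistics.stdev(accuracies):.2f} over {N_REPEATS} splits")
```

Limiting the tree depth keeps each model sparse: a depth-3 tree can use at most seven distinct peaks, which is in the same spirit as the 5-8 peak models reported here. Reporting the mean and standard deviation across splits, rather than a single split, is what gives an estimate of the form "82% (±5%)".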

Conclusions & Discussion

Our approach rapidly identifies a minimal set of features that can classify and potentially predict a phenotype of interest. Moreover, these features can be combined with other parameters to make the predictors and classifiers even more robust.

References & Acknowledgements:

Financial Disclosure

Grants: yes, Waters and Phytronix
Board Member: yes, Compute Canada

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: