MSACL 2015 EU Abstract

Molecular Annotation and Mapping of Big Data from Spatial Metabolomics
Theodore Alexandrov
European Molecular Biology Laboratory

Theodore Alexandrov (1-3)
(1) European Molecular Biology Laboratory, (2) SCiLS GmbH, (3) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego

Short Abstract

Spatial metabolomics is emerging as a powerful approach for localizing hundreds of metabolites directly in sections of biological samples; its grand challenge lies in the molecular annotation of the big data it generates. Existing bioinformatics tools cannot be applied directly because of the sheer data size and the high complexity of the spectra. We developed algorithms for molecular annotation of High Resolution Imaging Mass Spectrometry data that integrate both spectral and spatial filters and map the results onto metabolic pathways. We will present our efficient implementation using modern big data technologies and apply it to 3D cell spheroids, microbial agar plates, and biological tissues. We will also present the European project METASPACE on Bioinformatics for Spatial Metabolomics.

Long Abstract

Metabolomics is recognized as a crucial scientific domain, promising to advance our understanding of cell biology, physiology, and medicine. Metabolomics complements genomics, transcriptomics, and proteomics by analyzing the final read-out of biochemical processes and by revealing the contributions of non-genetic factors, such as the environment, diet, or microbiome. In recent years, metabolomics has progressed from simply cataloguing chemical structures to answering complex biomedical questions and enabling new discoveries in life science. This progress has been made largely in the analysis of liquid samples, so the next frontier for metabolomics lies in spatial metabolomics, where the challenge is to localize hundreds of metabolites directly in sections of biological samples with cellular and sub-cellular spatial resolution.

The two major challenges in establishing spatial metabolomics are molecular annotation and the mapping of annotated molecules back into their spatial context and onto existing metabolomics knowledge bases.

We will present our recent and unpublished results on solving these challenges when using High mass Resolution (HR) imaging Mass Spectrometry (MS). This emerging technique of spatial metabolomics is able to resolve signals from molecules differing by 0.01 mass units. This, together with the high dynamic range of the technique (1e4-1e5), small pixel size (down to 5 µm), and high sample complexity when performing in situ analysis, generates an unprecedented amount of information from a sample. A dataset usually reaches 10-100 gigabytes in size and contains millions of mass spectrometry signals.

Existing bioinformatics tools developed for molecular annotation in other types of metabolomics MS (using chromatography, direct-infusion, or dried-droplet approaches) can hardly be applied to imaging MS because of the sheer size of the data generated and the high complexity of the spectra. Moreover, conventional metabolomics MS relies on MS/MS for molecular identification, whereas imaging MS delivers mostly MS1 data, which creates a need for novel methods to improve MS1-based molecular annotation.
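To illustrate why MS1-only annotation is ambiguous, the following is a minimal sketch of matching observed m/z values against a candidate list within a ppm tolerance. It is not the pipeline described in this abstract: the function names, the 5 ppm default tolerance, and the candidate entries are assumptions made for illustration only.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def annotate_peaks(observed_mzs, candidates, tol_ppm=5.0):
    """Return (observed m/z, matching candidate names) for each peak.

    `candidates` is a list of (name, theoretical m/z) pairs; both the
    structure and the default tolerance are illustrative assumptions.
    """
    results = []
    for mz in observed_mzs:
        hits = [name for name, theo in candidates
                if abs(ppm_error(mz, theo)) <= tol_ppm]
        results.append((mz, hits))
    return results


# Hypothetical candidates: two isomers share one theoretical m/z,
# so MS1 data alone cannot tell them apart.
example_candidates = [
    ("isomer_1", 181.0707),
    ("isomer_2", 181.0707),
    ("other_compound", 181.0859),
]

for mz, hits in annotate_peaks([181.0708, 250.0], example_candidates):
    print(mz, hits)
```

In this sketch a single peak matches both isomers, while the second peak matches nothing; additional evidence, such as the spatial information discussed in the next paragraph, is needed to make MS1-based annotations more reliable.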

Recently, we have developed an algorithmic pipeline for molecular annotation of the big data generated by HR imaging MS. This pipeline integrates existing spectral filters, newly developed spatial filters, and combinations of the two. We will show that using spatial information, such as relations between spectra from neighboring pixels, allows us to significantly reduce the number of false positives and improve the reliability of molecular annotation. We will present our efficient implementation of this pipeline using modern big data technologies, including Apache Spark, a framework for distributed in-memory computing, and the Amazon Web Services cloud platform. We will illustrate this approach by showing results of its application to 3D spheroids from cell lines and primary cells, microbial agar plates, and biological tissues.
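The abstract does not specify how the spatial filters are defined. As one hedged illustration of the general idea, the sketch below scores an ion image by the Pearson correlation between each pixel and the mean of its four direct neighbors: spatially structured images score high, while pixel-to-pixel noise scores low. The score, the function names, and the example images are assumptions and should not be read as the authors' method.

```python
from math import sqrt


def neighbor_mean_image(image):
    """Replace each pixel by the mean of its 4-connected neighbors."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            neighbors = [
                image[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # A 1x1 image has no neighbors; fall back to the pixel itself.
            row.append(sum(neighbors) / len(neighbors) if neighbors else image[r][c])
        result.append(row)
    return result


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spatial_score(image):
    """Illustrative spatial-structure score of an ion image, in [-1, 1]."""
    smoothed = neighbor_mean_image(image)
    flat = [v for row in image for v in row]
    flat_smoothed = [v for row in smoothed for v in row]
    return pearson(flat, flat_smoothed)


# Hypothetical 6x6 ion images: a two-region image and a checkerboard.
structured = [[1.0 if c < 3 else 0.0 for c in range(6)] for r in range(6)]
checkerboard = [[float((r + c) % 2) for c in range(6)] for r in range(6)]

print(spatial_score(structured) > spatial_score(checkerboard))
```

A filter built on such a score would keep only candidate annotations whose ion images exceed a chosen threshold; the threshold, like the score itself, would need to be validated on real data.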

In the second part, we will show recent advances in mapping the annotated molecules back into their spatial context and onto molecular knowledge bases. We will show how big and complex HR imaging MS data can be translated into molecular images and how these images can be mapped onto metabolic pathways. Finally, we will present METASPACE, a new European project on Bioinformatics for Spatial Metabolomics starting in July 2015, which aims to create an open online engine for spatial metabolomics.