MSACL 2016 US Abstract

A Workflow for Drug Discovery from Environmental Samples Using Molecular Networks

Stefano Bonissone (Presenter)
Digital Proteomics LLC

Authorship: Stefano Bonissone (1), Natalie Castellana (1)
(1) Digital Proteomics LLC, San Diego, CA

Short Abstract

The problem of identifying biomolecules that are not encoded in a reference using tandem mass spectrometry can be avoided by computing spectral similarities and building a network from them. Nodes in such a network represent spectra, while edges connect similar spectra with small mass differences. The spectral network approach allows the identification of new molecules by creating a network that contains labeled spectra. These labeled nodes make it possible to identify similar molecules, which appear as proximal nodes in the graph. We demonstrate this molecular network approach on an environmental sample, using spectral libraries of known natural product compounds to find new, similar candidates from the environment.

Long Abstract

Tandem mass spectrometry is a powerful tool for identifying biomolecules in a sample. However, for most types of molecules, either a reference library of annotated mass spectra or a good computational model of fragmentation is needed. For many environmental samples, the available reference libraries are small and woefully incomplete.

The problem of an incomplete reference can be mitigated by using the reference library not only to identify identical molecules, but also to identify molecules that are ‘related’ to reference molecules. In our workflow, we use spectral similarity as a proxy for molecular similarity. For example, if a peptide contains a mutation, many of its fragment ions will be identical to those of the non-mutated version, while the fragment ions containing the mutation will be shifted by exactly the mass difference between the two amino acids. In this way, we can find modification or mutation variants of reference molecules in our environmental samples.
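The mass-shift argument above can be checked with a small calculation. The following is a minimal sketch, not the workflow's own code: the peptide sequences, the S-to-A substitution, and the function names are hypothetical, and only singly charged b-ions (prefix fragments) are considered. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
}
PROTON = 1.007276  # mass of a proton (Da)


def b_ions(peptide):
    """Return singly charged b-ion m/z values (one per prefix fragment)."""
    masses = []
    total = PROTON
    for residue in peptide[:-1]:  # the full-length prefix is not a fragment
        total += RESIDUE_MASS[residue]
        masses.append(round(total, 5))
    return masses


def fragment_shifts(reference, variant):
    """Per-fragment m/z difference between a variant and its reference."""
    return [round(v - r, 5) for r, v in zip(b_ions(reference), b_ions(variant))]


reference = "GASVLK"  # hypothetical reference peptide
variant = "GAAVLK"    # hypothetical variant: S -> A at position 3

shifts = fragment_shifts(reference, variant)
print(shifts)
```

In this example, the fragments that end before the substituted residue have a shift of zero, and every fragment that contains it is shifted by the same amount, the difference between the alanine and serine residue masses (about -15.995 Da).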

We perform an all-to-all comparison of the spectra in the sample, computing a similarity score for each pair of spectra. After applying a similarity score cutoff, we visualize the space of related molecules by constructing a network. Nodes in this network represent spectra, while edges connect similar spectra. Clusters of connected nodes in the network represent families of molecules, and each member has a unique chemical composition.
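The pairwise comparison, cutoff, and clustering steps can be sketched as follows. This is an illustration under stated assumptions rather than the scoring used in the workflow: the abstract does not specify the similarity score, so a plain cosine score over binned peaks stands in for it (a fuller score would also match peaks shifted by the precursor mass difference). The bin width, the cutoff of 0.7, the spectrum names, and the peak lists are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def binned(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into a sparse vector keyed by m/z bin."""
    vector = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed stand-in score)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_network(spectra, cutoff=0.7):
    """All-to-all comparison; keep an edge for every pair scoring >= cutoff."""
    vectors = {name: binned(peaks) for name, peaks in spectra.items()}
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        score = cosine(vectors[u], vectors[v])
        if score >= cutoff:
            edges.append((u, v, score))
    return edges


def families(nodes, edges):
    """Connected components of the network, found by depth-first search."""
    adjacency = {node: set() for node in nodes}
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy spectra: s1 and s2 share two of three peaks; s3 is unrelated.
spectra = {
    "s1": [(100.1, 30.0), (200.2, 20.0), (300.3, 10.0)],
    "s2": [(100.1, 30.0), (200.2, 20.0), (314.3, 10.0)],
    "s3": [(150.5, 5.0), (250.5, 50.0)],
}

edges = build_network(spectra)
clusters = families(spectra, edges)
print(edges)
print(clusters)
```

With these toy inputs, s1 and s2 end up in one family and s3 stays a singleton. The all-to-all loop is quadratic in the number of spectra, which is acceptable for a sketch but would need indexing or filtering by precursor mass on a large dataset.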

The molecular network approach allows the identification of new molecules by creating a network in which some spectra are labeled. These labels can be obtained by building spectral libraries, i.e., by acquiring and identifying spectra from known molecules. Spectra identified by database searches can also be used as labels.

We demonstrate this molecular network approach using an environmental sample, utilizing spectral libraries from 535 FDA natural product compounds. In this experimental setup, the spectral library helps elucidate similar molecules through their proximity to labeled nodes within the graph. Unlabeled nodes proximal to a labeled node can be elucidated by propagating information along the edges of the molecular network. Discovery in such an environmental sample, with known labeled natural product compounds, gives us the potential to identify useful therapeutic molecules.
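One simple way to propagate information along the edges is a breadth-first search that starts from every labeled node at once, so that each unlabeled node inherits the label of its nearest labeled node together with the hop distance. The abstract does not state which propagation rule is used, so the sketch below is an assumption; the node names, labels, and the `max_hops` limit are hypothetical.

```python
from collections import deque


def propagate_labels(edges, labels, max_hops=2):
    """Assign each reachable node the label of its nearest labeled node.

    Returns {node: (label, hops)}, where hops is 0 for library spectra.
    Nodes farther than max_hops from any labeled node stay unannotated.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    annotations = {node: (label, 0) for node, label in labels.items()}
    queue = deque(sorted(labels))  # multi-source BFS from all labeled nodes
    while queue:
        node = queue.popleft()
        label, hops = annotations[node]
        if hops == max_hops:
            continue  # do not propagate beyond the hop limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in annotations:
                annotations[neighbor] = (label, hops + 1)
                queue.append(neighbor)
    return annotations


# Hypothetical network: two library spectra and four unlabeled sample spectra.
edges = [("lib_A", "u1"), ("u1", "u2"), ("u2", "u3"), ("lib_B", "u4")]
labels = {"lib_A": "compound A", "lib_B": "compound B"}

annotations = propagate_labels(edges, labels)
for node in sorted(annotations):
    print(node, annotations[node])
```

A propagated label marks a node as a candidate relative of the library compound, not as an identification. The hop count gives a rough measure of how far the candidate sits from direct library evidence.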


References & Acknowledgements:


Financial Disclosure

Description     Y/N    Source
Grants          no
Salary          yes    Digital Proteomics LLC
Board Member    no
Stock           no
Expenses        no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above:

yes