MSACL 2020 US : Chopra

MSACL 2020 US Abstract

Topic: Data Science

Podium Presentation in Room 5 on Thursday at 11:40 (Chair: Irene van den Broek)

Deep Learning for Prediction and Prospective Validation of Functional Groups for Autonomous Instrumentation

Gaurav Chopra (Presenter)
Purdue University

Presenter Bio(s): Gaurav Chopra is an Assistant Professor in the Department of Chemistry at Purdue University and a member of the Purdue Center for Cancer Research and the Purdue Institutes of Data Science, Drug Discovery, Integrative Neuroscience, and Immunology. His laboratory brings together chemistry, immunology, and machine learning to study chemical environments from atomic to molecular to cellular scales. Chopra obtained his Ph.D. in computational structural chemistry/biology with Dr. Michael Levitt (2013 Chemistry Nobel Laureate) at Stanford University and was a JDRF fellow in experimental immunology with Dr. Jeffrey A. Bluestone (University of California, San Francisco). Chopra is the recipient of NIH NCATS ASPIRE Awards and a Showalter Trust Award, and his lab is funded by NIH, NSF, the Department of Defense, Merck & Co., and the Indiana Biosciences Research Institute.

Authors: Gaurav Chopra
Department of Chemistry, Purdue University



Functional groups are central to chemistry, linking analytical, physical, organic, and materials chemistry. State-of-the-art identification of the functional groups present in an unknown chemical entity requires the expertise of a skilled spectroscopist to analyze and interpret Fourier Transform Infrared (FTIR), Mass Spectrometry (MS), and/or Nuclear Magnetic Resonance (NMR) data. This process can be time-consuming and error-prone, especially for complex chemical entities that are poorly characterized in the literature, and it is inefficient to use with synthetic robots that produce and test new molecules at an accelerated rate.


We introduce fast deep neural network architectures for accurately identifying all the functional groups of unknown compounds. We do not use any database, pre-established rules, procedures, or peak-matching methods for prediction. Instead, we derive patterns and correlations using chemical reactions available from publicly available FTIR and MS spectral data.


We introduce two new metrics (Molecular F1 score and Molecular Perfection rate) to measure how well all functional groups on a molecule are identified in practical use. The optimized model has a Molecular F1 score of 0.92 and a Molecular Perfection rate of 72%. Additionally, our trained neural network reveals patterns typically used by human chemists to identify standard groups. Next, we experimentally validated in our lab that our neural network, trained on single compounds, can predict functional groups in compound mixtures. Finally, we developed a long short-term memory (LSTM) based method to autonomously predict spectra from a chemical reaction.
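
The abstract does not define the two metrics, so the following is only a minimal sketch under stated assumptions: Molecular F1 is taken to be the F1 score computed per molecule over its functional-group labels and then averaged across molecules, and Molecular Perfection rate is taken to be the fraction of molecules for which every functional group is predicted correctly. The function names and the toy labels are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one row per molecule, one 0/1 entry per functional group


def per_molecule_f1(true_row: Sequence[int], pred_row: Sequence[int]) -> float:
    """F1 over the functional-group labels of a single molecule."""
    tp = sum(1 for t, p in zip(true_row, pred_row) if t and p)
    fp = sum(1 for t, p in zip(true_row, pred_row) if not t and p)
    fn = sum(1 for t, p in zip(true_row, pred_row) if t and not p)
    if tp + fp + fn == 0:
        # Assumption: no groups present and none predicted counts as a perfect score.
        return 1.0
    return 2 * tp / (2 * tp + fp + fn)


def molecular_f1(y_true: Labels, y_pred: Labels) -> float:
    """Assumed definition: mean of the per-molecule F1 scores."""
    scores = [per_molecule_f1(t, p) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


def molecular_perfection_rate(y_true: Labels, y_pred: Labels) -> float:
    """Assumed definition: fraction of molecules with every label correct."""
    perfect = sum(
        1
        for t, p in zip(y_true, y_pred)
        if all(bool(a) == bool(b) for a, b in zip(t, p))
    )
    return perfect / len(y_true)


# Toy example: 3 molecules, 3 hypothetical functional groups each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

print(round(molecular_f1(y_true, y_pred), 3))               # 0.778
print(round(molecular_perfection_rate(y_true, y_pred), 3))  # 0.333
```

Under these assumed definitions the perfection rate is the stricter measure: a single missed or spurious group makes a molecule count as imperfect, while it only lowers that molecule's F1 contribution.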


Our methodology shows potential practical utility for applications such as complex mixture identification and impurity detection in drug formulation. We hope it can pave the way for future methods for automated structural identification and help develop machine learning methods that make autonomous analytical instrumentation a reality.

Financial Disclosure

Board Member: no
Expenses: yes (Eli Lilly, Merck)
IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: