Topic: Data Science
Podium Presentation in Room 5 on Thursday at 11:40 (Chair: Irene van den Broek)
Authors: Gaurav Chopra
Functional groups are central to chemistry, linking analytical, physical, organic, and materials chemistry. State-of-the-art identification of the functional groups present in an unknown chemical entity requires the expertise of a skilled spectroscopist to analyze and interpret Fourier-transform infrared (FTIR), mass spectrometry (MS), and/or nuclear magnetic resonance (NMR) data. This process can be time-consuming and error-prone, especially for complex chemical entities that are poorly characterized in the literature, and it is inefficient to use with synthetic robots that produce and test new molecules at an accelerated rate.
We introduce fast deep neural network architectures for accurately identifying all the functional groups of unknown compounds. We do not use any database, pre-established rules, procedures, or peak-matching methods for prediction. Instead, we derive patterns and correlations using chemical reactions available from public FTIR and MS spectral data.
We introduce two new metrics (Molecular F1 score and Molecular Perfection rate) to measure how well a model identifies all the functional groups of a molecule in practical use. The optimized model has a Molecular F1 score of 0.92 and a Molecular Perfection rate of 72%. Additionally, our trained neural network reveals patterns typically used by human chemists to identify standard groups. Next, we experimentally validated in our lab that the neural network, trained on single compounds, can predict functional groups in compound mixtures. Finally, we developed a long short-term memory (LSTM) based method to predict spectra from a chemical reaction autonomously.
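The abstract does not define the two metrics, so the following Python sketch shows only one plausible reading: the Molecular F1 score is taken to be the F1 score computed over the functional groups of each molecule and then averaged across molecules, and the Molecular Perfection rate is taken to be the fraction of molecules whose predicted set of groups exactly matches the true set. These definitions, the function names, and the example molecules are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Set, Tuple


def molecular_f1(true_groups: Set[str], pred_groups: Set[str]) -> float:
    """F1 score over the functional groups of a single molecule (assumed definition)."""
    if not true_groups and not pred_groups:
        return 1.0  # nothing to find and nothing predicted: treat as perfect
    true_positives = len(true_groups & pred_groups)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_groups)
    recall = true_positives / len(true_groups)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Return (mean per-molecule F1, fraction of molecules predicted exactly)."""
    f1_scores = [molecular_f1(true, pred) for true, pred in pairs]
    perfect = [true == pred for true, pred in pairs]
    return sum(f1_scores) / len(pairs), sum(perfect) / len(pairs)


# Hypothetical (true, predicted) functional-group sets for three molecules.
example_pairs = [
    ({"alcohol", "ketone"}, {"alcohol", "ketone"}),  # exact match
    ({"amine", "ester"}, {"amine"}),                 # one group missed
    ({"alkene"}, {"alkene", "arene"}),               # one spurious group
]

mean_f1, perfection_rate = evaluate(example_pairs)
print(f"Molecular F1: {mean_f1:.2f}, Molecular Perfection rate: {perfection_rate:.0%}")
```

Under this reading, the perfection rate is the stricter of the two metrics: a molecule with one missed or one spurious group still contributes partial credit to the F1 average but counts as a failure for perfection.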
Our methodology shows practical promise for applications such as complex mixture identification and impurity detection in drug formulation. We hope it can pave the way for future methods for automated structural identification and contribute to machine learning methods that make autonomous analytical instrumentation a reality.
|Expenses||yes||Eli Lilly, Merck|