MSACL 2023 Abstract

Self-Classified Topic Area(s): Data Analytics > Precision Medicine > Multi-omics

Podium Presentation in Steinbeck 3 on Thursday at 15:55 (Chair: Stephen Master / Bo Burla)

Translation stage. Development of Innovative Mass Spectrometry Pipelines for Improved Cancer Diagnosis and Biomarker Identification Through Feature Extraction
Yanis Zirem (1), Michel Salzet (1), Isabelle Fournier (1)
(1) Laboratory PRISM Inserm U1192

Yanis Zirem (Presenter)
Laboratory PRISM Inserm U1192

Presenter Bio: As a bioinformatics engineer working with omics data, I'm passionate about data processing and developing new algorithms to advance health, medical treatments and personalized medicine. With a background in biochemistry and ML/AI, I consistently seek innovative solutions to complex challenges in mass spectrometry and bioinformatics.

Abstract

Introduction: SpiderMass is an ambient mass spectrometry technology that offers a promising solution for in vivo cancer diagnosis and prognosis. Accurate prediction of cancer from such data relies on supervised machine learning, and the choice of classification model can have a significant impact on the accuracy of the results. The objective of this research was to develop a reliable pipeline that automatically selects the most efficient classification model and extracts potential biomarkers.

Methods: Several machine learning classification models were compared to identify the one with the highest predictive accuracy and the shortest training time. The automated pipeline was developed in Python using open-source packages, including scikit-learn, pandas, numpy, scipy, lazypredict, eli5, matplotlib and seaborn. The pipeline consists of three main modules: 1) selection of the optimal classifier, 2) extraction of the specific ions that influence the prediction of each class, and 3) comparison of the relative abundances of all m/z peaks between classes to identify those that differ significantly (p-value below 0.05). The input data were acquired from cancer tissues, including ovarian, esophagogastric, and glioma samples.
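As an illustration of modules 1 and 3, here is a minimal Python sketch under stated assumptions; it is not the authors' pipeline. The data are synthetic (`make_classification` stands in for a spectra-by-m/z intensity matrix), the three candidate models are an arbitrary short list rather than the full set screened with lazypredict, and the Kruskal-Wallis test is an assumed choice, since the abstract does not name the statistical test used. The helper names `rank_classifiers` and `significant_features` are hypothetical, and no multiple-testing correction is applied.

```python
import time

import numpy as np
from scipy.stats import kruskal
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for real spectra: rows = spectra, columns = m/z bins.
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Assumed short list of candidate models (the abstract does not list them).
candidates = {
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def rank_classifiers(models, X, y, cv=5):
    """Module 1 sketch: rank models by mean CV accuracy, then by elapsed time."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        elapsed = time.perf_counter() - start  # time for all CV fits and scoring
        rows.append((name, scores.mean(), elapsed))
    # Highest accuracy first; ties broken by shortest time.
    return sorted(rows, key=lambda r: (-r[1], r[2]))


def significant_features(X, y, alpha=0.05):
    """Module 3 sketch: per-feature Kruskal-Wallis test across all classes."""
    classes = np.unique(y)
    pvals = np.array([
        kruskal(*[X[y == c, j] for c in classes]).pvalue
        for j in range(X.shape[1])
    ])
    return np.flatnonzero(pvals < alpha), pvals


ranking = rank_classifiers(candidates, X, y)
sig_idx, pvals = significant_features(X, y)

for name, acc, seconds in ranking:
    print(f"{name}: accuracy={acc:.3f}, time={seconds:.2f}s")
print(f"{len(sig_idx)} of {X.shape[1]} features have p < 0.05")
```

Sorting on accuracy first and time second mirrors the two selection criteria named in the abstract; a real pipeline might weight them differently.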

Results: The pipeline identified the most efficient model for each type of data and extracted biomarkers with high statistical significance. It highlighted the key features involved in characterizing each class and compared them with the results of the third module, which tested all features between the different classes. Features that were consistent between the two modules were considered potential biomarkers. For example, for glioma, the linear SVC model was the most effective classifier, with a 93% correct classification rate. The analysis revealed potential biomarkers, such as specific ions present in cancerous tissue compared with healthy and necrotic tissue (e.g. a high abundance of PAs in cancerous tissue compared to PCs and PSs in healthy tissue and an absence of PIs in necrotic tissue).
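The consensus step described above (keeping only features flagged by both the classifier-driven module and the statistical module) could be sketched as follows. This is an assumption-laden illustration, not the authors' code: it ranks features by the absolute value of `LinearSVC` coefficients instead of calling eli5, uses Kruskal-Wallis as an assumed test, and runs on synthetic data with hypothetical labels such as `mz_bin_0`. The helper names and the `top_k=10` cutoff are likewise hypothetical.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for real spectra: rows = spectra, columns = m/z bins.
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
mz_labels = np.array([f"mz_bin_{j}" for j in range(X.shape[1])])  # hypothetical


def top_features_per_class(X, y, top_k=10):
    """Module 2 sketch: top_k features per class by |coefficient|.

    Assumes more than two classes, so coef_ has one row per class.
    """
    model = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)).fit(X, y)
    svc = model.named_steps["linearsvc"]
    return {
        cls: set(np.argsort(np.abs(coefs))[::-1][:top_k].tolist())
        for cls, coefs in zip(svc.classes_, svc.coef_)
    }


def significant_feature_set(X, y, alpha=0.05):
    """Module 3 sketch: features whose abundance differs across classes."""
    classes = np.unique(y)
    pvals = np.array([
        kruskal(*[X[y == c, j] for c in classes]).pvalue
        for j in range(X.shape[1])
    ])
    return set(np.flatnonzero(pvals < alpha).tolist())


per_class = top_features_per_class(X, y)
significant = significant_feature_set(X, y)

# Consensus: features that both modules agree on, reported per class.
consensus = {cls: sorted(feats & significant) for cls, feats in per_class.items()}

for cls, feats in consensus.items():
    print(f"class {cls}: {[mz_labels[j] for j in feats]}")
```

Ranking by absolute coefficient ignores the sign, so it does not say whether an ion is more or less abundant in a class; comparing per-class abundances would be needed for that.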

Discussion: The findings of this study emphasize the importance of selecting an efficient classification model and using feature extraction techniques for cancer diagnosis and prognosis. The potential biomarkers identified in this study could be incorporated into a large database to advance personalized medicine and aid in the development of new drug treatments. Furthermore, combining other mass spectrometry technologies and analyzing other omics data could provide a more comprehensive view of each cancer, leading to more accurate diagnosis and prognosis.

Financial Disclosure

Description	Y/N	Source
Grants	no
Salary	yes	University of Lille
Board Member	no
Stock	no
Expenses	no
IP Royalty	no

Planning to mention or discuss specific products or technology of the company(ies) listed above:
no