MSACL 2017 US Abstract

Eliminating Review of Acceptable Mass Spectrometry Data – An Approach Using a Machine Learning Algorithm

Min Yu (Presenter)
University of Virginia

Bio: After completing my two-year fellowship training in Clinical Chemistry and Laboratory Medicine at the University of Virginia this July, I am now working as an instructor in the same department. Before that, I earned my MD degree, which included a one-year internship, in China and later completed my doctoral degree in Molecular and Environmental Toxicology at the University of Wisconsin-Madison. I gained broad medical knowledge through my medical training and built independent research skills during my graduate studies. These experiences provided a solid foundation for my current work in clinical chemistry.

Authorship: Min Yu, Lindsay Bazydlo, David Bruns, James Harrison
University of Virginia, Department of Pathology

Short Abstract

The volume of mass spectrometric testing in clinical laboratories has increased tremendously in recent years, leading to a need to enhance workflow. Automation has been achieved in sample preparation, data collection, and data processing, but turnaround time and productivity are hampered by time-consuming manual data review. Our aim is to develop a software tool that can automate the process of data quality review. Using cannabinoids as an example, we demonstrate the use of software to speed and enhance review and show that the Support Vector Machines algorithm is superior to other classifiers in its ability to distinguish results that need further investigation from results that can be released without further review. Replacing traditional manual review with this approach may increase throughput and efficiency while maintaining confidence in reported results.

Long Abstract

Introduction: Clinical laboratories are receiving increasing requests for mass spectrometric assays because of their advantages in analytical sensitivity and specificity. Automation of sample preparation and data processing has been implemented in some clinical laboratories to improve productivity and efficiency, but manual review of the generated data to ensure the accuracy of each reported result remains time-consuming because of the complexity of mass spectrometry data. To verify every patient sample result, the reviewer must evaluate numerous features including, but not limited to, peak shape, retention time (RT), ion ratio, and internal standard (IS) peak area. Acceptable results are then released, while problematic results are held for further investigation to determine whether they can be released. Because of the high performance of this analytical technique, the majority of the data reviewed is acceptable; review is in effect a search for the few samples with noticeable flaws. Our aim is to develop a software tool that automates the separation of reportable data from reviewable data, so that data quality review is required for only a limited number of samples. Evaluation of the quality of mass spectrometric analyses involves recognition of patterns across multiple features in each run. Machine learning strategies are designed for this type of problem and are widely used in many domains; in mass spectrometry, machine learning has previously been used in proteomics studies for protein identification. Existing data within our laboratory provided a set of manually classified samples labeled as reportable or reviewable, so we chose to evaluate several supervised machine learning models, which require training data with known class labels.

Methods: Data were collected retrospectively from our in-house cannabinoids (THC) assay from September 2015 to September 2016. Current practice within the laboratory involves manual review of all data by the laboratory director and medical technologist, and each sample is classified as either acceptable as is or requiring further investigation. Data were collected for all samples; if further investigation had been warranted at the time of clinical analysis, the sample was labeled as an outlier. Data were collected and pooled for this project by recall from two Agilent GC-MS instruments in our laboratory (a 6890-5975 and a 6890-5977) and included metrics such as retention time (RT), internal standard (IS) area, and ion ratios of THC and the internal standard.
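As a rough illustration of the data assembly step, the following sketch loads the recalled metrics into a feature matrix and a label vector; the file name, column names, and label encoding are hypothetical placeholders, not the laboratory's actual export format:

    import pandas as pd

    # Each row is one patient sample recalled from the GC-MS batch exports.
    df = pd.read_csv("thc_review_data.csv")  # hypothetical file name

    # Metrics used as features, plus the manually assigned review label
    # (1 = flagged for further investigation, 0 = acceptable as is).
    feature_cols = ["rt", "is_area", "thc_ion_ratio", "is_ion_ratio"]
    X = df[feature_cols].values
    y = df["outlier"].values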

Analytics and machine learning were carried out in the Python programming language (version 3.5.2) with the scikit-learn machine learning library. To balance variance across runs and instruments, the metrics were normalized to the corresponding values of the level 3 (of 5) calibrator. The data were divided into two classes: data noted on review to require further investigation were designated the outlier class, and data deemed acceptable were placed in the non-outlier class. The dataset was standardized using the StandardScaler utility class, with the mean and standard deviation computed on the training set and then applied to the test set. The supervised machine learning methods evaluated were Logistic Regression (LR), Support Vector Machines (SVMs), Decision Tree (CART), Random Forest (RF), and AdaBoost (AdaB). Performance of each classifier was evaluated using K-fold cross-validation, which splits a single dataset into k folds, trains on k-1 folds, tests on the remaining fold, and reports the mean performance across the k iterations. We used stratified 10-fold cross-validation, which maintains the target class percentages across the folds and is important for imbalanced datasets like ours. Because of the low incidence of outliers, we scored performance by recall of the outlier class (sensitivity) rather than by the average accuracy across the outlier and non-outlier classes. Method selection and hyperparameter tuning for the selected method were based on the stratified K-fold cross-validation score. Final validation of the selected method with tuned hyperparameters was performed by building the model on a training set and testing it on a test set created by a random split of the total data, with one third of the data held out for testing.
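A minimal sketch of this comparison in scikit-learn appears below, assuming X holds the calibrator-normalized metrics and y the outlier labels (1 = outlier, 0 = non-outlier) from above; each classifier is shown with library defaults rather than the settings used in the study:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

    models = [
        ("LR", LogisticRegression()),
        ("SVMs", SVC()),
        ("CART", DecisionTreeClassifier()),
        ("RF", RandomForestClassifier()),
        ("AdaB", AdaBoostClassifier()),
    ]

    # Stratified 10-fold cross-validation preserves the ~11% outlier rate in
    # every fold; scoring="recall" is sensitivity for the outlier class (1).
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in models:
        # Fitting StandardScaler inside the pipeline restricts the mean and
        # standard deviation computation to each training fold.
        pipe = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="recall")
        print("{}: {:.6f} ({:.6f})".format(name, scores.mean(), scores.std()))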

Results: Out of a total of 853 patient samples, there were 93 outliers (11%). For comparison of the classifier algorithms, the prediction scoring was weighted to favor the outlier class: because this approach is analogous to a screening test, higher recall of the outlier class translates into a decreased chance of missing a true outlier. The SVMs performed best on the datasets evaluated. The recall scores for the outlier class, reported as mean (standard deviation), were: SVMs, 0.955556 (0.073703); LR, 0.870000 (0.118431); RF, 0.857778 (0.120984); CART, 0.802222 (0.129367); AdaB, 0.780000 (0.164939). Hyperparameter tuning was then performed and the optimal parameters were determined. Lastly, we trained the SVMs classifier with the tuned parameters on the training portion of the random split and tested it on the test set. The recall (sensitivity) for the outlier class was 97%: only 1 of 35 outliers (in a test set of 282 patient samples) was missed by the trained model. On investigation, the missed outlier proved to be a patient result that had ultimately been reported after further investigation at the time of the original review. The total accuracy (correct prediction rate) across both classes was 95%, with 267 of 282 samples correctly predicted. In addition, the precision (positive predictive value) was 71%, compared with 11% for manual review, in which all of the data are examined. The impact of using this model as a screening method can be illustrated as follows: in a run of 20 patient samples, we would typically review all 20 samples to discover the 2 that need further investigation. With this tool, we would need to manually review only 3 of the 20 samples, and the review would include the two unacceptable samples and only one that would ultimately be released.
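The tuning and hold-out validation could be sketched as follows; the hyperparameter grid is illustrative (the abstract does not state which parameters were tuned), and a stratified one-third split stands in for the random split described above:

    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hold out one third of the data for final validation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=0)

    # Grid search over illustrative SVC hyperparameters, scored on outlier
    # recall with 10-fold (stratified by default) cross-validation on the
    # training set only.
    pipe = make_pipeline(StandardScaler(), SVC())
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
    search = GridSearchCV(pipe, grid, cv=10, scoring="recall")
    search.fit(X_train, y_train)

    # Evaluate the tuned model on the held-out test set.
    y_pred = search.predict(X_test)
    print("recall (sensitivity):", recall_score(y_test, y_pred))
    print("precision (PPV):", precision_score(y_test, y_pred))
    print("accuracy:", accuracy_score(y_test, y_pred))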

Conclusions: Application of the Support Vector Machines model to quality review of mass spectrometry data provides excellent sensitivity and good precision in distinguishing data that need review from data that can be released without manual review. Implementation of this tool would relieve the laboratory director and medical technologist of the task of needlessly reviewing acceptable data and allow them to focus on the samples that need review. Decreasing the time spent manually reviewing acceptable data has the potential to improve productivity and efficiency in the mass spectrometry workflow and, ultimately, the quality of health care.


References & Acknowledgements:


Financial Disclosure

Grants: no
Salary: no
Board Member: no
Stock: no
Expenses: no

IP Royalty: no

Planning to mention or discuss specific products or technology of the company(ies) listed above: no