INTRODUCTION: Accurate peak detection and annotation are the crux of global profiling metabolomics studies. The deconvolution of raw data into peak lists is the first step in a lengthy data analysis pipeline. Many tools are available for this conversion, and because each is built on different algorithms, results often vary between tools. While there has been debate over which tool works best, 3 open-source tools, in addition to proprietary software, have gained significant traction for data deconvolution: XCMS, MS-DIAL, and MZmine3+ADAP. However, there is insufficient information available on the comparative performance of these tools. Herein we present a systematic analysis of these 3 commonly used peak-picking software packages on a large-scale, multi-batch, semi-targeted dataset. We not only compare the common features identified by each software package but also evaluate the robustness of each when handling multiple batches of data, which will show key differences between the tools and the reproducibility of these programs.
METHODS: The Study of Environmental Enteropathy and Malnutrition in Pakistan (SEEM) was utilized for this comparative analysis. In this large dataset, over 800 urine samples from 400 children were analyzed in 9 analytical batches. A semi-targeted analysis of bile acids was conducted using a novel in-house method. The data were collected on a UHPLC-HRMS system consisting of a Q Exactive™ Plus hybrid quadrupole-Orbitrap™ mass spectrometer interfaced with a Vanquish ultra-high performance liquid chromatography (UHPLC) system (Thermo Scientific, Waltham, MA). For this comparative analysis, we are processing data with 3 open-source software packages: XCMS, MS-DIAL, and MZmine3 with the ADAP method. Most software created for untargeted analysis is based on 2 algorithms: peak detection and peak alignment. In XCMS, peak detection is conducted by the centWave algorithm, which uses peakwidth, the expected range of the chromatographic peak width, and ppm, the maximum expected deviation of m/z values of centroids corresponding to one chromatographic peak. After peaks are identified in each sample, they are aligned using a 0.05 Da m/z window, and RT deviations are corrected by identifying peak groups and then applying a loess smoothing function to correct the RTs. Thus, an RT tolerance value is not defined; rather, a span value of 0.2 is used to define the degree of smoothing for the regression. The MS-DIAL peak detection algorithm uses a mass tolerance (0.01 Da by default) and a mass slice width (0.05 recommended) to identify peaks. Its peak alignment algorithm is based on the Join Aligner used in MZmine, which creates a reference peak table based on a user-defined 'reference file' and fits each sample's peak table to this reference. The RT tolerance range suggested by MS-DIAL was 0.05-0.1 min, and 0.05 min was utilized for these data. MZmine3 with ADAP uses ADAP for peak detection. ADAP follows a similar approach to centWave in XCMS but with a key difference in the filtering process: ADAP sorts the peaks, prioritizes those with the greatest likelihood of being real peaks, and filters down from there. ADAP uses a 0.002 Da tolerance (the smallest among the 3 tools) for peak detection. Features are aligned using the Join Aligner, which calculates a score using an m/z tolerance of 0.001 Da and an RT tolerance of 0.1 min. Each batch was processed individually with each software package, and the data were then aligned across the batches. For this study, an in-house database was built to identify the bile acids.
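To make the tolerance-based alignment described above concrete, the following is a minimal Python sketch of matching a sample peak list to a reference peak table within an m/z and RT window. It is an illustration only and not the MZmine or MS-DIAL implementation: the `Feature` class, the `match_score` and `align` functions, the equal weighting of the m/z and RT terms, and the greedy assignment are all assumptions made for this example. Only the default tolerances (0.001 Da and 0.1 min) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A detected peak, reduced to its m/z (Da) and retention time (min)."""
    mz: float
    rt: float


def match_score(a, b, mz_tol=0.001, rt_tol=0.1):
    """Score how well two features agree, or return None if out of tolerance.

    The score is 1.0 for a perfect match and falls linearly to 0.0 at the
    edge of both tolerance windows. The 50/50 weighting is an assumption.
    """
    d_mz = abs(a.mz - b.mz)
    d_rt = abs(a.rt - b.rt)
    if d_mz > mz_tol or d_rt > rt_tol:
        return None
    return 1.0 - 0.5 * (d_mz / mz_tol) - 0.5 * (d_rt / rt_tol)


def align(reference, sample, mz_tol=0.001, rt_tol=0.1):
    """Greedily pair sample features with reference rows, best score first.

    Returns a dict mapping reference index -> sample index. Each reference
    row and each sample feature is used at most once.
    """
    candidates = []
    for i, ref in enumerate(reference):
        for j, feat in enumerate(sample):
            score = match_score(ref, feat, mz_tol, rt_tol)
            if score is not None:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # highest-scoring pairs are assigned first

    used_ref, used_sample, pairs = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_ref and j not in used_sample:
            pairs[i] = j
            used_ref.add(i)
            used_sample.add(j)
    return pairs
```

Under this sketch, a sample feature that shares an m/z with a reference row but elutes more than 0.1 min away is left unmatched, which is one way differing RT tolerances between tools can change the number of aligned features.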
RESULTS: While there are similarities in the algorithms across the software packages, the final results of each package vary significantly because of their differences. Additionally, each software package has its own recommended parameter settings, which also vary between packages. Each software package provided initial outputs of different sizes that were then cleaned to remove noise. Any peak with a signal-to-noise ratio below 10 between sample and blank injections, or with missing data in more than 20% of sample injections, was deemed noise and removed. Initial XCMS outputs ranged from 12,800 to 14,900 features, which were then filtered down to 6,500 to 8,300 features. Initial MS-DIAL outputs ranged from 40,900 to 79,900 features, which were then filtered down to 14,000 to 21,000 features. Initial MZmine3+ADAP outputs ranged from 20,700 to 29,600 features, which were then filtered down to 1,400 to 1,800 features. From the initial peak lists to the final filtered features, we see drastic differences between these 3 tools. All 15 internal standards, eluting at various points in the run, were clearly identified. For this novel in-house bile acid analysis, we must build a database to accurately identify typical and atypical bile acids. Using these semi-targeted data, we are able to apply the untargeted approach to data analysis while still having a pre-defined metabolite space for identification with which to compare the different tools. This database will allow us to identify which features are consistently identified across all the batches and which were found using each software tool. Currently, a preliminary database of 196 bile acids has been created from these data and continues to grow.
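The noise filter described above can be sketched in Python as follows. This is a hedged illustration rather than the code used in the study: the function name `keep_feature`, the representation of missing values as `None`, and the interpretation of the sample-to-blank criterion as the ratio of mean sample intensity to mean blank intensity are all assumptions. Only the two thresholds (10 and 20%) come from the text.

```python
from statistics import mean


def keep_feature(sample_intensities, blank_intensities,
                 min_ratio=10.0, max_missing=0.20):
    """Return True if a feature passes the noise filter, False otherwise.

    sample_intensities / blank_intensities: per-injection intensities, with
    None marking an injection where the feature was not detected (assumed
    encoding). A feature is removed when more than `max_missing` of the
    sample injections are missing, or when the mean sample intensity is
    less than `min_ratio` times the mean blank intensity.
    """
    n = len(sample_intensities)
    if n == 0:
        return False

    detected = [x for x in sample_intensities if x is not None]
    missing_fraction = (n - len(detected)) / n
    if missing_fraction > max_missing or not detected:
        return False

    blanks = [x for x in blank_intensities if x is not None]
    blank_level = mean(blanks) if blanks else 0.0
    if blank_level == 0:
        # Feature absent from the blanks: nothing to compare against, keep it.
        return True
    return mean(detected) / blank_level >= min_ratio
```

Applied to every row of a peak table, a filter of this shape would produce the kind of reduction reported above, although the exact counts depend on how each tool reports missing values and blank intensities.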
CONCLUSION: Any tool used in an analysis workflow has the potential to greatly influence the results, especially if it performs the initial peak-picking step. Identifying which tool to use, and subsequently optimizing its parameters, remains a challenge. In large-scale studies or new method development, the deconvolution tool will directly impact which compounds are identified and incorporated into a database. A comparative analysis of the tools reveals the pros and cons of each method, allowing future users to select the one best suited to their needs. Our study not only compared the 3 tools but also evaluated each tool's robustness when applied to multiple batches of data.