A detailed comparison of analysis processes for MCC-IMS data in disease classification - Automated methods can replace manual peak annotations.

Motivation: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile or...

Bibliographic Details
Main Authors: Salome Horsch, Dominik Kopczynski, Elias Kuthe, Jörg Ingo Baumbach, Sven Rahmann, Jörg Rahnenführer
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0184321&type=printable
author Salome Horsch
Dominik Kopczynski
Elias Kuthe
Jörg Ingo Baumbach
Sven Rahmann
Jörg Rahnenführer
collection DOAJ
description Motivation: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention by human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process.
Method: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios.
Results: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step, and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased, high-throughput use of the technology.
format Article
id doaj-art-8c6c111d15f743fb921a56c4dd7466cf
institution DOAJ
issn 1932-6203
language English
publishDate 2017-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-8c6c111d15f743fb921a56c4dd7466cf; 2025-08-20T02:46:01Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2017-01-01; vol. 12, no. 9, e0184321; 10.1371/journal.pone.0184321
title A detailed comparison of analysis processes for MCC-IMS data in disease classification - Automated methods can replace manual peak annotations.
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0184321&type=printable
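
The description names LM (Local Maxima) as one of the automated peak identification methods. The record does not say how the authors implement it, so the following is only a minimal sketch of the general idea, under stated assumptions: a measurement is treated as a small 2D intensity matrix, a peak is a cell strictly greater than all of its up to eight neighbours, and `min_intensity` is a hypothetical noise threshold. The function name `local_maxima` and the toy values are illustrative and are not taken from the article.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def local_maxima(matrix: Matrix, min_intensity: float = 0.0) -> List[Tuple[int, int, float]]:
    """Return (row, col, intensity) for each strict local maximum above min_intensity.

    Assumption: a peak must exceed every cell in its 8-neighbourhood;
    cells on the border are compared only with the neighbours that exist.
    """
    peaks = []
    n_rows = len(matrix)
    for r in range(n_rows):
        for c in range(len(matrix[r])):
            value = matrix[r][c]
            if value <= min_intensity:
                continue  # treat low intensities as noise
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(len(matrix[rr]), c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Made-up toy measurement: rows and columns stand in for the two axes of a spectrum.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 5.0, 0.3, 0.1],
    [0.1, 0.4, 0.2, 2.5],
    [0.0, 0.1, 0.3, 0.4],
]

print(local_maxima(toy, min_intensity=1.0))  # [(1, 1, 5.0), (2, 3, 2.5)]
```

In a full pipeline of the kind the description outlines, peak lists like this one would then be clustered across measurements and passed to a classifier; those steps are not shown here.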