Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance

Application of personalized medicine requires integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis....

Bibliographic Details
Main Authors: Carl-Magnus Svensson, Ron Hübler, Marc Thilo Figge
Format: Article
Language: English
Published: Wiley 2015-01-01
Series: Journal of Immunology Research
Online Access: http://dx.doi.org/10.1155/2015/573165
author Carl-Magnus Svensson
Ron Hübler
Marc Thilo Figge
collection DOAJ
description Application of personalized medicine requires integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis. The evaluation, and often training, of automated classifiers requires manually labelled data as ground truth. In many cases such labelling is not perfect, either because the data are ambiguous even for a trained expert or because of mistakes. Here we investigated the interobserver variability of image data comprising fluorescently stained circulating tumor cells and its effect on the performance of two automated classifiers, a random forest and a support vector machine. We found that uncertainty in annotation between observers limited the performance of the automated classifiers, especially when it was included in the test set on which classifier performance was measured. The random forest classifier turned out to be resilient to uncertainty in the training data, while the support vector machine’s performance was highly dependent on the amount of uncertainty in the training data. We finally introduced the consensus data set as a possible solution for evaluation of automated classifiers that minimizes the penalty of interobserver variability.
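The description mentions interobserver variability and a consensus data set without defining either in detail. The following is a minimal sketch of one possible reading, not the authors' implementation: it assumes the consensus set contains only those cells that every observer labelled identically, and it measures variability as simple pairwise agreement. The observer names, the labels, and the helper functions `pairwise_agreement` and `consensus_subset` are all hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels_by_observer):
    """Fraction of items on which each pair of observers gives the same label."""
    agreement = {}
    observers = sorted(labels_by_observer.items())
    for (name_a, labels_a), (name_b, labels_b) in combinations(observers, 2):
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        agreement[(name_a, name_b)] = matches / len(labels_a)
    return agreement


def consensus_subset(labels_by_observer):
    """Return {item_index: label} for items that all observers labelled identically.

    Assumption: "consensus" means unanimous agreement; the source record
    does not spell out the exact rule.
    """
    votes_per_item = zip(*labels_by_observer.values())
    return {
        index: votes[0]
        for index, votes in enumerate(votes_per_item)
        if len(set(votes)) == 1
    }


# Hypothetical annotations of six cell images: 1 = tumor cell, 0 = other.
annotations = {
    "observer_a": [1, 0, 1, 1, 0, 0],
    "observer_b": [1, 0, 0, 1, 0, 1],
    "observer_c": [1, 0, 1, 1, 0, 0],
}

agreement = pairwise_agreement(annotations)
consensus = consensus_subset(annotations)
```

Under this reading, a classifier would be evaluated only on the items in `consensus`, so that disagreements between annotators are not counted as classifier errors.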
format Article
id doaj-art-eb823e158d7e42e6abb58452d2b88f6d
institution Kabale University
issn 2314-8861
2314-7156
language English
publishDate 2015-01-01
publisher Wiley
record_format Article
series Journal of Immunology Research
spelling doaj-art-eb823e158d7e42e6abb58452d2b88f6d
2025-02-03T06:08:20Z
eng
Wiley
Journal of Immunology Research
2314-8861
2314-7156
2015-01-01
2015
10.1155/2015/573165
573165
Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance
Carl-Magnus Svensson 0
Ron Hübler 1
Marc Thilo Figge 2
Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology–Hans-Knöll-Institute (HKI), Beutenbergstraße 11a, 07745 Jena, Germany
Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology–Hans-Knöll-Institute (HKI), Beutenbergstraße 11a, 07745 Jena, Germany
Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology–Hans-Knöll-Institute (HKI), Beutenbergstraße 11a, 07745 Jena, Germany
Application of personalized medicine requires integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis. The evaluation, and often training, of automated classifiers requires manually labelled data as ground truth. In many cases such labelling is not perfect, either because of the data being ambiguous even for a trained expert or because of mistakes. Here we investigated the interobserver variability of image data comprising fluorescently stained circulating tumor cells and its effect on the performance of two automated classifiers, a random forest and a support vector machine. We found that uncertainty in annotation between observers limited the performance of the automated classifiers, especially when it was included in the test set on which classifier performance was measured. The random forest classifier turned out to be resilient to uncertainty in the training data while the support vector machine’s performance is highly dependent on the amount of uncertainty in the training data. We finally introduced the consensus data set as a possible solution for evaluation of automated classifiers that minimizes the penalty of interobserver variability.
http://dx.doi.org/10.1155/2015/573165
title Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance
url http://dx.doi.org/10.1155/2015/573165