Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance

Application of personalized medicine requires integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis....

Bibliographic Details
Main Authors:	Carl-Magnus Svensson, Ron Hübler, Marc Thilo Figge
Format:	Article
Language:	English
Published:	Wiley 2015-01-01
Series:	Journal of Immunology Research
Online Access:	http://dx.doi.org/10.1155/2015/573165

collection	DOAJ
description	Application of personalized medicine requires integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis. The evaluation, and often training, of automated classifiers requires manually labelled data as ground truth. In many cases such labelling is not perfect, either because the data are ambiguous even for a trained expert or because of mistakes. Here we investigated the interobserver variability of image data comprising fluorescently stained circulating tumor cells and its effect on the performance of two automated classifiers, a random forest and a support vector machine. We found that uncertainty in annotation between observers limited the performance of the automated classifiers, especially when it was included in the test set on which classifier performance was measured. The random forest classifier turned out to be resilient to uncertainty in the training data, while the support vector machine’s performance was highly dependent on the amount of uncertainty in the training data. We finally introduced the consensus data set as a possible solution for evaluation of automated classifiers that minimizes the penalty of interobserver variability.
id	doaj-art-eb823e158d7e42e6abb58452d2b88f6d
institution	Kabale University
issn	2314-8861 2314-7156
spelling	doaj-art-eb823e158d7e42e6abb58452d2b88f6d; 2025-02-03T06:08:20Z; eng; Wiley; Journal of Immunology Research; 2314-8861; 2314-7156; 2015-01-01; 2015; 10.1155/2015/573165; 573165; Carl-Magnus Svensson, Ron Hübler, Marc Thilo Figge (all: Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology–Hans-Knöll-Institute (HKI), Beutenbergstraße 11a, 07745 Jena, Germany); http://dx.doi.org/10.1155/2015/573165
title	Automated Classification of Circulating Tumor Cells and the Impact of Interobsever Variability on Classifier Training and Performance

Similar Items