Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text

Bibliographic Details
Main Authors:	Susanne Haaf, Frank Wiegand, Alexander Geyken
Format:	Article
Language:	deu
Published:	Text Encoding Initiative Consortium 2015-03-01
Series:	Journal of the Text Encoding Initiative
Subjects:	digitization; tools; double-keying; quality control; error classification; transcription accuracy
Online Access:	https://journals.openedition.org/jtei/739
ISSN:	2162-5603
DOI:	10.4000/jtei.739

Abstract

Among mass digitization methods, double-keying is considered to have the lowest error rate. The method requires two independent transcriptions of a text by two different operators. It is particularly well suited to historical texts, which often pose difficulties such as poor master copies, spelling variation, or complex text structures. Providers of data entry services using the double-keying method generally advertise very high accuracy rates (around 99.95% to 99.98%). These percentages are usually estimated from small samples, and little, if anything, is said about the actual amount of text proofread, the text genres covered, the error types, or the proofreaders. To obtain meaningful data on this question, it is necessary to analyze a large amount of text representing a balanced sample of different text types, to distinguish the structural XML/TEI level from the typographical level, and to differentiate between types of errors, which may originate from different sources and may not be equally severe. This paper presents an extensive and complex approach to the analysis and correction of double-keying errors, applied by the DFG-funded project "Deutsches Textarchiv" (German Text Archive, hereafter DTA) to evaluate and, ideally, increase the transcription and annotation accuracy of double-keyed DTA texts. Statistical analyses of the results of proofreading a large quantity of text are presented; they confirm the accuracy rates commonly claimed for the double-keying method.
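
The abstract turns on two quantities: the disagreements that comparing two independent keyings brings to light, and the accuracy rates that vendors advertise. The following Python is a minimal sketch of both under stated assumptions, not the DTA's procedure: `keying_discrepancies`, `accuracy_rate` and `max_errors` are hypothetical helpers, a character-level diff from the standard library's `difflib` stands in for whatever comparison a keying vendor actually performs, accuracy is assumed to mean 1 - errors / characters, and the sample line is invented.

```python
"""Hypothetical sketch: surfacing double-keying discrepancies and
computing a character accuracy rate. Not the DTA's actual workflow."""
from difflib import SequenceMatcher


def keying_discrepancies(keying_a: str, keying_b: str) -> list[tuple[str, str, str]]:
    """List (operation, text in A, text in B) for every span where two
    independent keyings of the same page disagree.

    Spans where both operators made the *same* mistake compare as equal
    and therefore never show up here.
    """
    # autojunk=False disables difflib's popularity heuristic, which can
    # otherwise suppress matches on frequent characters in texts of 200+ chars.
    matcher = SequenceMatcher(None, keying_a, keying_b, autojunk=False)
    return [
        (op, keying_a[i1:i2], keying_b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def accuracy_rate(errors: int, characters: int) -> float:
    """Accuracy as a fraction, assuming the definition 1 - errors / characters."""
    if characters <= 0:
        raise ValueError("characters must be positive")
    return 1 - errors / characters


def max_errors(rate: float, characters: int) -> float:
    """Errors per `characters` characters implied by an advertised accuracy rate."""
    return (1 - rate) * characters


if __name__ == "__main__":
    # Two keyings of the same (invented) line; operator B missed the umlaut.
    a = "Die Sonne ſcheint über dem Thal."
    b = "Die Sonne ſcheint uber dem Thal."
    print(keying_discrepancies(a, b))  # [('replace', 'ü', 'u')]

    # What the advertised 99.95% to 99.98% range means per 10,000 characters:
    # 99.95% leaves room for about 5 errors, 99.98% for about 2.
    for rate in (0.9995, 0.9998):
        print(f"{rate:.2%}: about {max_errors(rate, 10_000):.0f} errors per 10,000 characters")
```

The first helper also shows why proofreading remains necessary after double-keying: an error that both operators make identically produces no discrepancy and passes through the comparison unnoticed. The other two make the advertised range concrete in absolute terms.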

