A Tale of Two Transcriptions. Machine-Assisted Transcription of Historical Sources

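This article explains how two projects implement semi-automated transcription routines: one for census sheets in Norway and one for marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area, one of the world’s longest series of preserved vital records. Thus, in the project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional, as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned images on the Internet, opening up the possibility of further international cooperation on automating the transcription of historical source materials. As in projects to digitize printed materials, the optimal solution for handwritten sources is likely to be a combination of manual transcription and machine-assisted recognition.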

Bibliographic Details
Main Authors:	Gunnar Thorvaldsen, Joana Maria Pujadas-Mora, Trygve Andersen, Line Eikvil, Josep Lladós, Alícia Fornés, Anna Cabré
Format:	Article
Language:	English
Published:	International Institute of Social History 2015-01-01
Series:	Historical Life Course Studies
Subjects:	Word spotting; Optical Character Recognition; Vital records; Census; Nominative sources; Computer vision
Online Access:	https://www.openjournals.nl/index.php/hlcs/article/view/9355

_version_	1832571074206760960
author	Gunnar Thorvaldsen, Joana Maria Pujadas-Mora, Trygve Andersen, Line Eikvil, Josep Lladós, Alícia Fornés, Anna Cabré
author_facet	Gunnar Thorvaldsen, Joana Maria Pujadas-Mora, Trygve Andersen, Line Eikvil, Josep Lladós, Alícia Fornés, Anna Cabré
author_sort	Gunnar Thorvaldsen
collection	DOAJ
description	This article explains how two projects implement semi-automated transcription routines: one for census sheets in Norway and one for marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area, one of the world’s longest series of preserved vital records. Thus, in the project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional, as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned images on the Internet, opening up the possibility of further international cooperation on automating the transcription of historical source materials. As in projects to digitize printed materials, the optimal solution for handwritten sources is likely to be a combination of manual transcription and machine-assisted recognition.
format	Article
id	doaj-art-071e12e9b86c4e168dca545b8915346c
institution	Kabale University
issn	2352-6343
language	English
publishDate	2015-01-01
publisher	International Institute of Social History
record_format	Article
series	Historical Life Course Studies
spelling	doaj-art-071e12e9b86c4e168dca545b8915346c; 2025-02-02T13:13:40Z; eng; International Institute of Social History; Historical Life Course Studies; 2352-6343; 2015-01-01; 2; 10.51964/hlcs9355; A Tale of Two Transcriptions. Machine-Assisted Transcription of Historical Sources; Gunnar Thorvaldsen, Joana Maria Pujadas-Mora, Trygve Andersen, Line Eikvil, Josep Lladós, Alícia Fornés, Anna Cabré; https://www.openjournals.nl/index.php/hlcs/article/view/9355; Word spotting; Optical Character Recognition; Vital records; Census; Nominative sources; Computer vision
title	A Tale of Two Transcriptions. Machine-Assisted Transcription of Historical Sources
topic	Word spotting; Optical Character Recognition; Vital records; Census; Nominative sources; Computer vision
url	https://www.openjournals.nl/index.php/hlcs/article/view/9355
