Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format

Bibliographic Details
Main Authors: Susanne Haaf, Christian Thomas
Format: Article
Language: deu
Published: Text Encoding Initiative Consortium 2017-08-01
Series: Journal of the Text Encoding Initiative
Subjects:
Online Access: https://journals.openedition.org/jtei/1650
author Susanne Haaf
Christian Thomas
collection DOAJ
description This paper presents work in progress on the DTA “Base Format” for Manuscripts (DTABf-M), an extension to the DTA “Base Format” (DTABf) for the TEI-conformant annotation of manuscripts. The DTABf is a TEI-subset for the consistent, yet unambiguous, annotation of large amounts of historical text. During our work on the DTA corpora, the DTABf has continuously been subject to further adaptations to specific annotation needs. The latest addition, the DTABf-M, contains elements, attributes, and values necessary for the annotation of (historical) handwritten documents. The goal is to provide a TEI format for diverse manuscripts in large text corpora. While the DTABf covers a wide range of phenomena found not only in printed texts but also in manuscripts, there are certain manuscript-specific features which have to be additionally represented by the DTABf-M. There are several prerequisites for DTABf-M to be suitable for the DTA and its workflows and processes: First, it should be based on the original DTABf tagset, and only extend it if unavoidable. Second, like the DTABf, the DTABf-M should be created in a bottom-up approach, that is, based on actual phenomena found in handwritten texts which are transcribed and encoded using the DTABf. Third, the format should complement the DTABf, not replace it. Hence, it is necessary to find a modular way of integrating the DTABf-M into the DTABf. This paper describes how we deal with these issues in the process of developing the DTABf-M.
format Article
id doaj-art-c8a0fe08530a445086a9c74875f21e9f
institution Kabale University
issn 2162-5603
language deu
publishDate 2017-08-01
publisher Text Encoding Initiative Consortium
record_format Article
series Journal of the Text Encoding Initiative
spelling doaj-art-c8a0fe08530a445086a9c74875f21e9f | 2025-01-30T13:56:28Z | deu | Text Encoding Initiative Consortium | Journal of the Text Encoding Initiative | 2162-5603 | 2017-08-01 | 10 | 10.4000/jtei.1650 | Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format | Susanne Haaf | Christian Thomas | https://journals.openedition.org/jtei/1650 | interchange | interoperability | standardization | TEI customization | annotation of manuscripts | TEI corpora
title Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format
topic interchange
interoperability
standardization
TEI customization
annotation of manuscripts
TEI corpora
url https://journals.openedition.org/jtei/1650