The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings

Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically availab...

Bibliographic Details
Main Authors: Tomaž Erjavec, Andrej Pančur
Format: Article
Language: deu
Published: Text Encoding Initiative Consortium 2022-04-01
Series: Journal of the Text Encoding Initiative
Subjects: TEI ODD; parliamentary corpora; encoding recommendations
Online Access: https://journals.openedition.org/jtei/4133
author Tomaž Erjavec
Andrej Pančur
collection DOAJ
description Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically available online, thus making them ideal for compilation into corpora and for open distribution. For these reasons many countries have already produced corpora of parliamentary proceedings, but each typically in their own encoding, limiting their comparability and utilization in a multilingual setting. In this paper we propose an encoding schema which could serve as an interchange format for parliamentary corpora compiled for the purposes of scholarly investigations. The schema, called Parla-CLARIN, was developed within the CLARIN research infrastructure, and is written as a TEI ODD which includes a TEI customization and prose guidelines with examples of use. We discuss the coverage and choices made in designing the recommendations, and give an overview of the guidelines. We also discuss two other standard schemas for encoding parliamentary data, Akoma Ntoso and RDF, and their relation to Parla-CLARIN. We conclude by presenting corpora already encoded in Parla-CLARIN and discussing further work, especially the provision of a set of example documents and of transformation scripts that would make the proposed encoding more usable.
format Article
id doaj-art-6a8e01cee20f4779bece6a591e13a306
institution Kabale University
issn 2162-5603
language deu
publishDate 2022-04-01
publisher Text Encoding Initiative Consortium
record_format Article
series Journal of the Text Encoding Initiative
spelling doaj-art-6a8e01cee20f4779bece6a591e13a306 | 2025-01-30T13:56:41Z | deu | Text Encoding Initiative Consortium | Journal of the Text Encoding Initiative | 2162-5603 | 2022-04-01 | 14 | 10.4000/jtei.4133 | The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings | Tomaž Erjavec | Andrej Pančur | https://journals.openedition.org/jtei/4133 | TEI ODD | parliamentary corpora | encoding recommendations
title The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings
topic TEI ODD
parliamentary corpora
encoding recommendations
url https://journals.openedition.org/jtei/4133