The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings
Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically availab...
Main Authors: | Tomaž Erjavec, Andrej Pančur |
---|---|
Format: | Article |
Language: | deu |
Published: | Text Encoding Initiative Consortium, 2022-04-01 |
Series: | Journal of the Text Encoding Initiative |
Subjects: | TEI ODD, parliamentary corpora, encoding recommendations |
Online Access: | https://journals.openedition.org/jtei/4133 |
_version_ | 1832578464994033664 |
---|---|
author | Tomaž Erjavec; Andrej Pančur |
author_facet | Tomaž Erjavec; Andrej Pančur |
author_sort | Tomaž Erjavec |
collection | DOAJ |
description | Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically available online, thus making them ideal for compilation into corpora and for open distribution. For these reasons many countries have already produced corpora of parliamentary proceedings, but each typically in its own encoding, limiting their comparability and utilization in a multilingual setting. In this paper we propose an encoding schema which could serve as an interchange format for parliamentary corpora compiled for the purposes of scholarly investigations. The schema, called Parla-CLARIN, was developed within the CLARIN research infrastructure, and is written as a TEI ODD which includes a TEI customization and prose guidelines with examples of use. We discuss the coverage and choices made in designing the recommendations, and give an overview of the guidelines. We also discuss two other standard schemas for encoding parliamentary data, Akoma Ntoso and RDF, and their relation to Parla-CLARIN. We conclude by presenting corpora already encoded in Parla-CLARIN and discussing further work, especially the provision of a set of example documents and of transformation scripts that would make the proposed encoding more usable. |
format | Article |
id | doaj-art-6a8e01cee20f4779bece6a591e13a306 |
institution | Kabale University |
issn | 2162-5603 |
language | deu |
publishDate | 2022-04-01 |
publisher | Text Encoding Initiative Consortium |
record_format | Article |
series | Journal of the Text Encoding Initiative |
spelling | doaj-art-6a8e01cee20f4779bece6a591e13a306; 2025-01-30T13:56:41Z; deu; Text Encoding Initiative Consortium; Journal of the Text Encoding Initiative; 2162-5603; 2022-04-01; 14; 10.4000/jtei.4133; The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings; Tomaž Erjavec; Andrej Pančur; https://journals.openedition.org/jtei/4133; TEI ODD; parliamentary corpora; encoding recommendations |
title | The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings |
topic | TEI ODD; parliamentary corpora; encoding recommendations |
url | https://journals.openedition.org/jtei/4133 |