TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI

Bibliographic Details
Main Authors: Christophe Parisse, Carole Etienne, Loïc Liégeois
Format: Article
Language: eng
Published: Text Encoding Initiative Consortium 2021-07-01
Series: Journal of the Text Encoding Initiative
Online Access: https://journals.openedition.org/jtei/3464
Collection: DOAJ
Description: CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in corpus linguistics, especially on spoken language corpora. Because of the time required to collect and transcribe spoken language resources, their number is limited, and corpora therefore need to be interoperable and reusable in order to improve research on themes such as phonology, prosody, interaction, syntax, and textometry. To help researchers reach this goal, CORLI has designed a pair of tools: TEICORPO to assist in the conversion and use of spoken language corpora, and TEIMETA for metadata purposes. TEICORPO is based on the principle of an underlying common format, namely TEI XML as described in its specification for spoken language use (ISO 2016). The tool enables conversion between transcriptions created with alignment software such as CLAN, Transcriber, Praat, or ELAN, or stored in common file formats (CSV, XLSX, TXT, or DOCX), and the TEI format, which serves as a lossless pivot format. Backward conversion is possible in many cases, with limitations inherent in the target format. TEICORPO can run the TreeTagger part-of-speech tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or Iramuteq, making it suitable for editing spoken language corpora as well as for various research purposes.
ISSN: 2162-5603
DOI: 10.4000/jtei.3464
Topics: transcription, TEI, conversion, oral corpora, annotationBlock
Institution: Kabale University
Record ID: doaj-art-757ebaea906749528ced688b3a6bae6a