TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI

CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in corpus linguistics, especially on spoken langua...

Bibliographic Details
Main Authors:	Christophe Parisse, Carole Etienne, Loïc Liégeois
Format:	Article
Language:	deu
Published:	Text Encoding Initiative Consortium 2021-07-01
Series:	Journal of the Text Encoding Initiative
Subjects:	transcription; TEI; conversion; oral corpora; annotationBlock
Online Access:	https://journals.openedition.org/jtei/3464

author	Christophe Parisse Carole Etienne Loïc Liégeois
collection	DOAJ
description	CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in corpus linguistics, especially on spoken language corpora. Because of the time required to collect and transcribe spoken language resources, their number is limited, and thus corpora need to be interoperable and reusable in order to improve research on themes such as phonology, prosody, interaction, syntax, and textometry. To help researchers reach this goal, CORLI has designed a pair of tools: TEICORPO to assist in the conversion and use of spoken language corpora, and TEIMETA for metadata purposes. TEICORPO is based on the principle of an underlying common format, namely TEI XML as described in its specification for spoken language use (ISO 2016). The tool converts transcriptions created with alignment software such as CLAN, Transcriber, Praat, or ELAN, as well as files in common formats (CSV, XLSX, TXT, or DOCX), to and from the TEI format, which serves as a lossless pivot format. Backward conversion is possible in many cases, with limitations inherent in the target format. TEICORPO can run the TreeTagger part-of-speech tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or Iramuteq, making it suitable for editing spoken language corpora as well as for various research purposes.
format	Article
id	doaj-art-757ebaea906749528ced688b3a6bae6a
institution	Kabale University
issn	2162-5603
language	deu
publishDate	2021-07-01
publisher	Text Encoding Initiative Consortium
record_format	Article
series	Journal of the Text Encoding Initiative
spelling	doaj-art-757ebaea906749528ced688b3a6bae6a | 2025-01-30T13:56:37Z | deu | Text Encoding Initiative Consortium | Journal of the Text Encoding Initiative | 2162-5603 | 2021-07-01 | 13 | 10.4000/jtei.3464 | TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI | Christophe Parisse | Carole Etienne | Loïc Liégeois | https://journals.openedition.org/jtei/3464 | transcription | TEI | conversion | oral corpora | annotationBlock
title	TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI
topic	transcription TEI conversion oral corpora annotationBlock
url	https://journals.openedition.org/jtei/3464
