An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation

Bibliographic Details
Main Authors: João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Aria Fallah, Terrence J. Sejnowski, Claudia Lainscsek, Lindy Comstock
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05187-2
description Abstract Electroencephalography (EEG) holds promise for brain-computer interface (BCI) devices as a non-invasive measure of neural activity. With increased attention to EEG-based BCI systems, publicly available datasets incorporating the complex stimuli found in naturalistic speech are necessary to establish a common standard of performance within the BCI community. Effective solutions must overcome noise in the EEG signal and remain reliable across sessions and stimuli that reflect types of real-world linguistic complexity without overfitting to a dataset or task. We present two validated datasets (N=8 and N=16) for classification at the phoneme and word level and by the articulatory properties of phonemes. EEG signals were recorded from 64 channels while subjects listened to and repeated six consonants and five vowels. Individual phonemes were combined in different phonetic environments to produce coarticulated variation in 40 consonant-vowel pairs, 20 real words, and 20 pseudowords. Phoneme pairs and words were presented during a control condition and during transcranial magnetic stimulation (TMS) to assess whether stimulation would augment the EEG signal associated with specific articulatory processes.
issn 2052-4463
affiliations João Pedro Carvalho Moreira: Postgraduate Program in Electrical Engineering, Federal University of Minas Gerais
Vinícius Rezende Carvalho: Postgraduate Program in Electrical Engineering, Federal University of Minas Gerais
Eduardo Mazoni Andrade Marçal Mendes: Postgraduate Program in Electrical Engineering, Federal University of Minas Gerais
Aria Fallah: Department of Neurosurgery, University of California, Los Angeles
Terrence J. Sejnowski: Computational Neurobiology Laboratory, The Salk Institute for Biological Studies
Claudia Lainscsek: Computational Neurobiology Laboratory, The Salk Institute for Biological Studies
Lindy Comstock: Department of Psychiatry & Biobehavioral Studies, University of California, Los Angeles
citation Scientific Data, volume 12, issue 1, pages 1-15, 2025-06-01