An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation
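| Main Authors: | João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Aria Fallah, Terrence J. Sejnowski, Claudia Lainscsek, Lindy Comstock |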
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05187-2 |

| author | João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Aria Fallah, Terrence J. Sejnowski, Claudia Lainscsek, Lindy Comstock |
|---|---|
| collection | DOAJ |
| description | Electroencephalography (EEG) holds promise for brain-computer interface (BCI) devices as a non-invasive measure of neural activity. With increased attention to EEG-based BCI systems, publicly available datasets incorporating the complex stimuli found in naturalistic speech are necessary to establish a common standard of performance within the BCI community. Effective solutions must overcome noise in the EEG signal and remain reliable across sessions and stimuli that reflect types of real-world linguistic complexity without overfitting to a dataset or task. We present two validated datasets (N=8 and N=16) for classification at the phoneme and word level and by the articulatory properties of phonemes. EEG signals were recorded from 64 channels while subjects listened to and repeated six consonants and five vowels. Individual phonemes were combined in different phonetic environments to produce coarticulated variation in 40 consonant-vowel pairs, 20 real words, and 20 pseudowords. Phoneme pairs and words were presented during a control condition and during transcranial magnetic stimulation (TMS) to assess whether stimulation would augment the EEG signal associated with specific articulatory processes. |
| id | doaj-art-7743bdcbfd3e47aba4a0c8d4da302dca |
| institution | OA Journals |
| issn | 2052-4463 |
| affiliations | João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes: Postgraduate Program in Electrical Engineering, Federal University of Minas Gerais; Aria Fallah: Department of Neurosurgery, University of California, Los Angeles; Terrence J. Sejnowski, Claudia Lainscsek: Computational Neurobiology Laboratory, The Salk Institute for Biological Studies; Lindy Comstock: Department of Psychiatry & Biobehavioral Studies, University of California, Los Angeles |