An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05187-2 |
| Summary: | Electroencephalography (EEG) holds promise for brain-computer interface (BCI) devices as a non-invasive measure of neural activity. With increased attention to EEG-based BCI systems, publicly available datasets incorporating the complex stimuli found in naturalistic speech are necessary to establish a common standard of performance within the BCI community. Effective solutions must overcome noise in the EEG signal and remain reliable across sessions and stimuli that reflect types of real-world linguistic complexity without overfitting to a dataset or task. We present two validated datasets (N=8 and N=16) for classification at the phoneme and word level and by the articulatory properties of phonemes. EEG signals were recorded from 64 channels while subjects listened to and repeated six consonants and five vowels. Individual phonemes were combined in different phonetic environments to produce coarticulated variation in 40 consonant-vowel pairs, 20 real words, and 20 pseudowords. Phoneme pairs and words were presented during a control condition and during transcranial magnetic stimulation (TMS) to assess whether stimulation would augment the EEG signal associated with specific articulatory processes. |
|---|---|
| ISSN: | 2052-4463 |