An improved deep learning approach for speech enhancement
Single-channel speech enhancement refers to the task of improving the quality and intelligibility of a speech signal in a noisy environment. Time-domain and time-frequency-domain methods are the two main categories of approaches for speech enhancement. In this paper, we propose an approach based on a cro...
| Main Authors: | Malek Miled, Mohamed Anouar Ben Messaoud |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Universidade do Porto, 2023-11-01 |
| Series: | U.Porto Journal of Engineering |
| Subjects: | Speech Enhancement; Empirical Mode Decomposition; Principal Component Analysis; Learning Model |
| Online Access: | https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1531 |
| _version_ | 1849702757797199872 |
|---|---|
| author | Malek Miled; Mohamed Anouar Ben Messaoud |
| author_facet | Malek Miled; Mohamed Anouar Ben Messaoud |
| author_sort | Malek Miled |
| collection | DOAJ |
| description | Single-channel speech enhancement refers to the task of improving the quality and intelligibility of a speech signal in a noisy environment. Time-domain and time-frequency-domain methods are the two main categories of approaches for speech enhancement. In this paper, we propose an approach based on a cross-domain framework. This framework utilizes our knowledge of the spectrogram and overcomes some of the limitations faced by time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition and an improved version of principal component analysis. Then, we design a cross-domain learning framework to determine the correlations along the frequency and time axes. At a low SNR of -5 dB, the effectiveness of the proposed approach is demonstrated by objective and subjective measures, with average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. The results highlight the success of our approach in addressing low-SNR conditions. |
| format | Article |
| id | doaj-art-c506a633d4e743d7aced2ea7b49271c9 |
| institution | DOAJ |
| issn | 2183-6493 |
| language | English |
| publishDate | 2023-11-01 |
| publisher | Universidade do Porto |
| record_format | Article |
| series | U.Porto Journal of Engineering |
| spelling | doaj-art-c506a633d4e743d7aced2ea7b49271c9; 2025-08-20T03:17:32Z; eng; Universidade do Porto; U.Porto Journal of Engineering; 2183-6493; 2023-11-01; 9; 5; 10.24840/2183-6493_009-005_001531; An improved deep learning approach for speech enhancement; Malek Miled (0), https://orcid.org/0009-0002-4456-3748; Mohamed Anouar Ben Messaoud (1), https://orcid.org/0000-0002-7190-2736; Universidade do El Manar, Instituta de Engenharia; National School of Engineers of Tunis; Single-channel speech enhancement refers to the task of improving the quality and intelligibility of a speech signal in a noisy environment. Time-domain and time-frequency-domain methods are the two main categories of approaches for speech enhancement. In this paper, we propose an approach based on a cross-domain framework. This framework utilizes our knowledge of the spectrogram and overcomes some of the limitations faced by time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition and an improved version of principal component analysis. Then, we design a cross-domain learning framework to determine the correlations along the frequency and time axes. At a low SNR of -5 dB, the effectiveness of the proposed approach is demonstrated by objective and subjective measures, with average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. The results highlight the success of our approach in addressing low-SNR conditions.; https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1531; Speech Enhancement; Empirical Mode Decomposition; Principal Component Analysis; Learning Model |
| spellingShingle | Malek Miled; Mohamed Anouar Ben Messaoud; An improved deep learning approach for speech enhancement; U.Porto Journal of Engineering; Speech Enhancement; Empirical Mode Decomposition; Principal Component Analysis; Learning Model |
| title | An improved deep learning approach for speech enhancement |
| title_full | An improved deep learning approach for speech enhancement |
| title_fullStr | An improved deep learning approach for speech enhancement |
| title_full_unstemmed | An improved deep learning approach for speech enhancement |
| title_short | An improved deep learning approach for speech enhancement |
| title_sort | improved deep learning approach for speech enhancement |
| topic | Speech Enhancement; Empirical Mode Decomposition; Principal Component Analysis; Learning Model |
| url | https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1531 |
| work_keys_str_mv | AT malekmiled animproveddeeplearningapproachforspeechenhancement AT mohamedanouarbenmessaoud animproveddeeplearningapproachforspeechenhancement AT malekmiled improveddeeplearningapproachforspeechenhancement AT mohamedanouarbenmessaoud improveddeeplearningapproachforspeechenhancement |
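
The description reports an average SegSNR of -0.49 at an input SNR of -5 dB, but this record does not say how the authors compute that measure. As a rough illustration only, the sketch below follows one common convention for segmental SNR: the per-frame SNR in dB, clipped to a fixed range and averaged over frames. The function name `segmental_snr`, the 256-sample frame length, and the [-10, 35] dB clipping range are assumptions made for this example and are not taken from the article.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average per-frame SNR in dB, each frame clipped to [lo_db, hi_db].

    Frame length and clipping limits are illustrative assumptions,
    not values reported in the catalog record.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # guards against log(0) and division by zero
    frame_scores = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_scores.append(min(max(snr_db, lo_db), hi_db))
    if not frame_scores:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_scores) / len(frame_scores)


# Synthetic example: a sine wave and a copy attenuated by 10 percent.
clean = [math.sin(2.0 * math.pi * 5.0 * n / 256.0) for n in range(1024)]
attenuated = [0.9 * x for x in clean]
print(round(segmental_snr(clean, attenuated), 3))
```

Under this convention, a perfect reconstruction saturates at the upper clipping limit, and an all-zero output scores 0 dB because the error energy then equals the signal energy. A negative average such as the one quoted in the description means that, on average, the residual error within a frame still exceeds the clean speech energy.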