Lip-Reading Classification of Turkish Digits Using Ensemble Learning Architecture Based on 3DCNN

Bibliographic Details
Main Authors:	Ali Erbey, Necaattin Barışçı
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Applied Sciences
Subjects:	lip-reading ensemble learning 3DCNN
Online Access:	https://www.mdpi.com/2076-3417/15/2/563

Collection:	DOAJ
Abstract:	Understanding others correctly is essential for effective communication, and factors such as hearing difficulties or environmental noise can disrupt this process. Lip reading can help overcome these challenges, and with the growing success of deep learning architectures, research on lip reading has gained momentum. The aim of this study is to create a lip-reading dataset for Turkish digit recognition and to conduct predictive analyses on it. The dataset was divided into two subsets: the face region and the lip region. CNN, LSTM, and 3DCNN-based models (C3D, I3D, and 3DCNN+BiLSTM) were used. While LSTM models are effective at processing temporal data, the 3DCNN-based models, which process spatial and temporal information jointly, achieved higher accuracy in this study. Experimental results showed that the lip-region subset performed better than the face-region subset; on the lip region, CNN, LSTM, C3D, and I3D reached accuracies of 67.12%, 75.53%, 86.32%, and 93.24%, respectively. Ensemble learning, which combines the strengths of different models, yielded an additional 1.23% improvement, with the best result reaching 94.53% accuracy. These results indicate that 3DCNN architectures and ensemble learning methods are highly effective for lip reading in Turkish. Although this study focuses on Turkish digit recognition, the proposed methods may also carry over to other languages or to broader lip-reading applications.
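The abstract says the ensemble combines the strengths of different models but does not state the combination rule. As an illustration only, here is a minimal sketch of two common rules, soft voting (averaging class probabilities) and hard voting (majority over each model's top class), assuming every model outputs a probability vector over the same digit classes. The function names and the probabilities below are hypothetical; only the model names C3D, I3D, and 3DCNN+BiLSTM come from the record.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Illustrative sketch only: the record does not specify the ensemble rule,
# so soft and hard voting are shown as two common (assumed) options.
# Each inner sequence is one model's class-probability vector over the same classes.


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-model class-probability vectors."""
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    if len(weights) != len(prob_lists):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    return [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
            for c in range(n_classes)]


def soft_vote_predict(prob_lists: Sequence[Sequence[float]],
                      weights: Optional[Sequence[float]] = None) -> int:
    """Class with the highest (weighted) averaged probability."""
    return _argmax(soft_vote(prob_lists, weights))


def hard_vote_predict(prob_lists: Sequence[Sequence[float]]) -> int:
    """Majority vote over each model's top class (earliest-seen class wins ties)."""
    votes = Counter(_argmax(p) for p in prob_lists)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up probabilities over three classes, for illustration only.
    c3d = [0.6, 0.3, 0.1]
    i3d = [0.2, 0.7, 0.1]
    bilstm = [0.5, 0.4, 0.1]  # stands in for the 3DCNN+BiLSTM model
    models = [c3d, i3d, bilstm]
    print(hard_vote_predict(models))  # prints 0
    print(soft_vote_predict(models))  # prints 1
```

With these toy numbers the two rules disagree: two of the three models pick class 0, but the second model's higher confidence in class 1 tips the averaged probabilities toward class 1. Which rule and which weights the authors actually used is not stated in this record.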
DOAJ ID:	doaj-art-fb500c6d5c344e7d8b87401151166d99
Holding institution:	Kabale University
ISSN:	2076-3417
Affiliations:	Ali Erbey, Department of Computer Programming, Distance Education Vocational School, Usak University, Usak 64200, Türkiye; Necaattin Barışçı, Department of Computer Engineering, Faculty of Technology, Gazi University, Ankara 06560, Türkiye
Citation:	Applied Sciences, vol. 15, no. 2, article 563
DOI:	10.3390/app15020563

