Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data.
Main Authors: Ashwaq Waleed Abdul Ameer; Pedram Salehpour; Mohammad Asadpour
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Lip reading; visual speech recognition; NASNetMobile; transfer learning
Online Access: https://ieeexplore.ieee.org/document/10699339/
_version_ | 1832590331452850176 |
author | Ashwaq Waleed Abdul Ameer; Pedram Salehpour; Mohammad Asadpour |
author_sort | Ashwaq Waleed Abdul Ameer |
collection | DOAJ |
description | Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data. This includes scenarios where subjects are not directly facing the camera, words are not clearly articulated, or image cropping techniques are not used. In this study, we present a novel framework for visual speech recognition utilizing the NASNetMobile architecture. We evaluate the performance of this model on the GLips 2022 wild dataset, which is known for its unstructured and challenging nature. This lightweight model, with its low parameter count, is particularly well-suited for this task. The proposed framework extracts features from video frames in a time sequence, employing methods such as Convolutional Neural Networks (CNN), CNN-Gated Recurrent Units (CNN-GRU), Temporal CNN, and Temporal PointWise. We empirically demonstrate that NASNetMobile outperforms VGG19, DenseNet121, and MobileNetV3Large, which are used in existing models. The performance of NASNetMobile is evaluated using several key metrics. The model contains 4,680,739 parameters. It achieves an accuracy of 0.4840, an F1 score of 0.4850, and a weighted F1 score of 0.4850. Our findings show that NASNetMobile is more effective, increasing accuracy and F1 scores while reducing model complexity. |
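The description above outlines the framework's core idea: a pretrained NASNetMobile backbone extracts per-frame features, which a temporal head (e.g. a GRU, in the CNN-GRU variant) aggregates for word classification. The following is a minimal Keras sketch of that pipeline, not the authors' implementation: the frame count, input resolution, GRU width, and class count are illustrative assumptions, and `weights=None` is used only to keep the sketch self-contained (the paper's transfer-learning setup would load ImageNet-pretrained weights instead).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumptions, not taken from the paper).
NUM_FRAMES, H, W, NUM_CLASSES = 16, 224, 224, 500

# Frozen NASNetMobile backbone used as a per-frame feature extractor.
# pooling="avg" yields one 1056-dim feature vector per frame.
backbone = tf.keras.applications.NASNetMobile(
    include_top=False, weights=None, input_shape=(H, W, 3), pooling="avg")
backbone.trainable = False

# TimeDistributed applies the backbone to each frame in the sequence;
# the GRU then aggregates the frame features over time (CNN-GRU variant).
model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, 3)),
    layers.TimeDistributed(backbone),            # (batch, T, 1056)
    layers.GRU(128),                             # temporal aggregation
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Swapping the GRU for a 1-D temporal convolution over the feature sequence would give a Temporal-CNN-style head instead, under the same per-frame feature extraction.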
format | Article |
id | doaj-art-6a59dc14dc474fda99a8da8a888d9876 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Ashwaq Waleed Abdul Ameer; Pedram Salehpour (https://orcid.org/0000-0002-1300-7848); Mohammad Asadpour. All authors: Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. "Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset." IEEE Access, vol. 13, pp. 11623-11638, 2025-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3470521. Article number 10699339. DOAJ record doaj-art-6a59dc14dc474fda99a8da8a888d9876, indexed 2025-01-24T00:02:02Z. |
title | Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset |
topic | Lip reading; visual speech recognition; NASNetMobile; transfer learning |
url | https://ieeexplore.ieee.org/document/10699339/ |