Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset

Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data.

Bibliographic Details
Main Authors: Ashwaq Waleed Abdul Ameer, Pedram Salehpour, Mohammad Asadpour
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Lip reading; visual speech recognition; NASNetMobile; transfer learning
Online Access: https://ieeexplore.ieee.org/document/10699339/
author Ashwaq Waleed Abdul Ameer
Pedram Salehpour
Mohammad Asadpour
collection DOAJ
description Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data: scenarios where subjects are not directly facing the camera, words are not clearly articulated, or image cropping techniques are not used. In this study, we present a novel framework for visual speech recognition utilizing the NASNetMobile architecture and evaluate it on the GLips 2022 wild dataset, which is known for its unstructured and challenging nature. This lightweight model, with its low parameter count, is particularly well suited to the task. The proposed framework extracts features from video frames in a time sequence, employing methods such as Convolutional Neural Networks (CNN), CNN-Gated Recurrent Units (CNN-GRU), Temporal CNN, and Temporal PointWise. We empirically demonstrate that NASNetMobile outperforms VGG19, DenseNet121, and MobileNetV3Large, the backbones used in existing models. With 4,680,739 parameters, the model achieves an accuracy of 0.4840, an F1 score of 0.4850, and a weighted F1 score of 0.4850. Our findings show that NASNetMobile is more effective, increasing accuracy and F1 scores while reducing model complexity.
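The description above names the framework's components only at a high level. As a concrete illustration, below is a minimal sketch, in TensorFlow/Keras, of one of the listed variants: a pretrained NASNetMobile backbone applied per frame, followed by a GRU head (the CNN-GRU configuration). This is not the authors' published code; the clip length, frame resolution, vocabulary size, and head width are placeholder assumptions, not values taken from the paper.

```python
# Hypothetical CNN-GRU lip-reading model: a frozen, ImageNet-pretrained
# NASNetMobile encodes each frame, and a GRU classifies the word from the
# resulting feature sequence. All sizes below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 29              # assumed frames per word clip
FRAME_SHAPE = (224, 224, 3)  # NASNetMobile's default input resolution
NUM_CLASSES = 500            # assumed vocabulary size

# Pretrained NASNetMobile as a per-frame feature extractor (transfer learning).
backbone = tf.keras.applications.NASNetMobile(
    include_top=False,
    weights="imagenet",
    input_shape=FRAME_SHAPE,
    pooling="avg",           # global average pooling -> one vector per frame
)
backbone.trainable = False   # freeze the backbone; train only the head

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, *FRAME_SHAPE)),
    layers.TimeDistributed(backbone),  # apply the CNN to every frame
    layers.GRU(256),                   # model temporal dynamics of the clip
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

Freezing the backbone is only one common transfer-learning choice; fine-tuning part of NASNetMobile is another, and the record does not say which the authors used.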
format Article
id doaj-art-6a59dc14dc474fda99a8da8a888d9876
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-6a59dc14dc474fda99a8da8a888d9876 (indexed 2025-01-24T00:02:02Z)
Language: English
Publisher: IEEE
Series: IEEE Access (ISSN 2169-3536)
Published: 2025-01-01, vol. 13, pp. 11623-11638
DOI: 10.1109/ACCESS.2024.3470521 (IEEE document 10699339)
Title: Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
Authors: Ashwaq Waleed Abdul Ameer; Pedram Salehpour (https://orcid.org/0000-0002-1300-7848); Mohammad Asadpour
Affiliation (all authors): Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
Online Access: https://ieeexplore.ieee.org/document/10699339/
Subjects: Lip reading; visual speech recognition; NASNetMobile; transfer learning
title Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
topic Lip reading
visual speech recognition
NASNetMobile
transfer learning
url https://ieeexplore.ieee.org/document/10699339/