Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data.
Main Authors: Ashwaq Waleed Abdul Ameer; Pedram Salehpour; Mohammad Asadpour
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Lip reading; visual speech recognition; NASNetMobile; transfer learning
Online Access: https://ieeexplore.ieee.org/document/10699339/
_version_ | 1832590331452850176 |
author | Ashwaq Waleed Abdul Ameer; Pedram Salehpour; Mohammad Asadpour |
author_sort | Ashwaq Waleed Abdul Ameer |
collection | DOAJ |
description | Lip reading is extensively used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, the effectiveness of these models can be significantly affected by the quality of the data, especially when dealing with unstructured wild data. This includes scenarios where subjects are not directly facing the camera, words are not clearly articulated, or image cropping techniques are not used. In this study, we present a novel framework for visual speech recognition utilizing the NASNetMobile architecture. We evaluate the performance of this model on the GLips 2022 wild dataset, which is known for its unstructured and challenging nature. This lightweight model, with its low parameter count, is particularly well-suited for this task. The proposed framework extracts features from video frames in a time sequence, employing methods such as Convolutional Neural Networks (CNN), CNN-Gated Recurrent Units (CNN-GRU), Temporal CNN, and Temporal PointWise. We empirically demonstrate that NASNetMobile outperforms VGG19, DenseNet121, and MobileNetV3Large, which are used in existing models. The performance of NASNetMobile is evaluated using several key metrics. The model contains 4,680,739 parameters. It achieves an accuracy of 0.4840, an F1 score of 0.4850, and a weighted F1 score of 0.4850. Our findings show that NASNetMobile is more effective, increasing accuracy and F1 scores while reducing model complexity. |
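The description above outlines the framework's core idea: a pretrained NASNetMobile backbone extracts per-frame features, which a temporal head (e.g. a GRU, in the CNN-GRU variant) aggregates for word classification. The following is a minimal Keras sketch of that pipeline, not the authors' implementation: the frame count, input resolution, GRU width, and class count are illustrative assumptions, and `weights=None` is used only to keep the sketch self-contained (the paper's transfer-learning setup would load ImageNet-pretrained weights instead).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative hyperparameters (assumptions, not taken from the paper).
NUM_FRAMES, H, W, NUM_CLASSES = 16, 224, 224, 500

# Frozen NASNetMobile backbone used as a per-frame feature extractor.
# pooling="avg" yields one 1056-dim feature vector per frame.
backbone = tf.keras.applications.NASNetMobile(
    include_top=False, weights=None, input_shape=(H, W, 3), pooling="avg")
backbone.trainable = False

# TimeDistributed applies the backbone to each frame in the sequence;
# the GRU then aggregates the frame features over time (CNN-GRU variant).
model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, 3)),
    layers.TimeDistributed(backbone),            # (batch, T, 1056)
    layers.GRU(128),                             # temporal aggregation
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Swapping the GRU for a 1-D temporal convolution over the feature sequence would give a Temporal-CNN-style head instead, under the same per-frame feature extraction.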
format | Article |
id | doaj-art-6a59dc14dc474fda99a8da8a888d9876 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Ashwaq Waleed Abdul Ameer; Pedram Salehpour (https://orcid.org/0000-0002-1300-7848); Mohammad Asadpour. All authors: Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. "Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset." IEEE Access, vol. 13, pp. 11623-11638, 2025-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3470521. Article number 10699339. DOAJ record doaj-art-6a59dc14dc474fda99a8da8a888d9876, indexed 2025-01-24T00:02:02Z. |
title | Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset |
topic | Lip reading; visual speech recognition; NASNetMobile; transfer learning |
url | https://ieeexplore.ieee.org/document/10699339/ |