Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10699339/
Summary: Lip reading is widely used in visual speech recognition to classify and recognize words by analyzing mouth movements in silent videos. Transfer learning models are often employed to improve performance in this area. However, their effectiveness can be significantly affected by data quality, especially with unstructured "wild" data: scenarios where subjects are not directly facing the camera, words are not clearly articulated, or image cropping techniques are not applied. In this study, we present a novel framework for visual speech recognition based on the NASNetMobile architecture and evaluate it on the GLips 2022 wild dataset, which is known for its unstructured and challenging nature. This lightweight model, with its low parameter count, is particularly well-suited for the task. The proposed framework extracts features from video frames as a time sequence, employing methods such as Convolutional Neural Networks (CNN), CNN-Gated Recurrent Units (CNN-GRU), Temporal CNN, and Temporal PointWise. We empirically demonstrate that NASNetMobile outperforms VGG19, DenseNet121, and MobileNetV3Large, which are used in existing models. With 4,680,739 parameters, the model achieves an accuracy of 0.4840, an F1 score of 0.4850, and a weighted F1 score of 0.4850. Our findings show that NASNetMobile is more effective, increasing accuracy and F1 scores while reducing model complexity.
ISSN: 2169-3536
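The summary describes a pipeline of per-frame feature extraction from a pretrained backbone followed by temporal modeling. A minimal Keras sketch of the CNN-GRU variant is shown below; the frame count, frame size, number of classes, and GRU width are illustrative assumptions, not values taken from the paper (which reports a 4,680,739-parameter model).

```python
# Hedged sketch of a CNN-GRU lip-reading pipeline: NASNetMobile extracts a
# feature vector per frame, a GRU aggregates the frame sequence, and a dense
# softmax layer classifies the word. All dimensions below are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import NASNetMobile

NUM_FRAMES, H, W, NUM_CLASSES = 29, 96, 96, 500  # assumed, not from the paper

# NASNetMobile backbone without its classifier head; global average pooling
# yields one feature vector per frame. weights=None keeps the sketch offline;
# the paper's transfer-learning setting would use pretrained weights instead.
backbone = NASNetMobile(include_top=False, weights=None,
                        input_shape=(H, W, 3), pooling="avg")
backbone.trainable = False  # frozen backbone, as in typical transfer learning

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, 3)),
    layers.TimeDistributed(backbone),              # per-frame features
    layers.GRU(256),                               # temporal aggregation
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The `TimeDistributed` wrapper applies the same backbone to every frame, so the recurrent layer sees a sequence of fixed-length feature vectors; swapping the GRU for 1-D convolutions would give the Temporal CNN variant mentioned in the summary.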