Attention-Based Image-to-Video Translation for Synthesizing Facial Expression Using GAN

Bibliographic Details
Main Authors: Kidist Alemayehu, Worku Jifara, Demissie Jobir (School of Electrical Engineering and Computing)
Format: Article
Language: English
Published: Wiley, 2023-01-01
Series: Journal of Electrical and Computer Engineering
ISSN: 2090-0155
Online Access: http://dx.doi.org/10.1155/2023/6645356
Description:
The fundamental challenge in video generation is not only producing high-quality image sequences but also producing consistent frames with no abrupt shifts. With the development of generative adversarial networks (GANs), great progress has been made in image generation tasks that can be applied to facial expression synthesis. Most previous works focused on synthesizing frontal and near-frontal faces and relied on manual annotation. However, covering only the frontal and near-frontal area is insufficient for many real-world applications, and manual annotation fails when the video is incomplete. AffineGAN, a recent study, uses an affine transformation in latent space to infer the expression intensity value automatically; however, it requires extracting features of the target ground-truth image, and the generated image sequences are still unsatisfactory. To address these issues, this study infers the expression intensity value automatically without extracting features of the ground-truth images. A local dataset is prepared with frontal faces and two additional face positions (the left and right sides). Average content distance (ACD) metrics were measured across different experiments, and the proposed solution shows improvements over AffineGAN: ACD-I improves from 1.606 ± 0.018 to 1.584 ± 0.00, ACD-C from 1.452 ± 0.008 to 1.430 ± 0.009, and ACD-G from 1.769 ± 0.007 to 1.744 ± 0.01. This work concludes that integrating self-attention into the generator network improves the quality of the generated image sequences. In addition, assigning expression intensity values by distributing them evenly according to the number of frames improves the consistency of the generated sequences and enables the generator to produce videos of different lengths while keeping intensities within the range [0, 1].
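The intensity-assignment idea the abstract describes, distributing expression intensity values evenly over the frame count so that any clip length stays within [0, 1], can be illustrated with a minimal sketch. The function name and the use of PyTorch here are illustrative assumptions, not the authors' implementation:

```python
import torch

def intensity_schedule(num_frames: int) -> torch.Tensor:
    """Evenly spaced expression intensities in [0, 1], one per frame.

    Works for any frame count, so the generator can be conditioned to
    produce videos of different lengths without leaving [0, 1].
    """
    return torch.linspace(0.0, 1.0, num_frames)

# Example: a 5-frame clip gets intensities 0.00, 0.25, 0.50, 0.75, 1.00.
print(intensity_schedule(5))
```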
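Likewise, "integrating self-attention into the generator network" is commonly realized with a SAGAN-style attention block. The sketch below is a generic illustration of that technique in PyTorch, assuming the standard query/key/value formulation; it is not the paper's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial feature-map locations."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features to query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable residual weight, initialized to 0 so training starts
        # from the plain convolutional features.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # (B, HW, C//8)
        k = self.key(x).view(b, -1, h * w)                     # (B, C//8, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)              # (B, HW, HW)
        v = self.value(x).view(b, -1, h * w)                   # (B, C, HW)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection

# Usage: attend over a 32x32 feature map with 64 channels.
y = SelfAttention(64)(torch.randn(1, 64, 32, 32))
```

Such a block lets every spatial location attend to every other one, which is consistent with the abstract's claim that attention reduces abrupt shifts between generated frames.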