Building a Framework for Visual Question Answering Systems


Bibliographic Details
Main Authors: Maya Abu Hamoud, Wasim Safi
Format: Article
Language: Arabic
Published: Higher Commission for Scientific Research 2025-01-01
Series: Syrian Journal for Science and Innovation
Subjects: vqa, vgg19, glove, mscoco dataset
Online Access: https://journal.hcsr.gov.sy/archives/1504
author Maya Abu Hamoud
Wasim Safi
collection DOAJ
description Visual question answering (VQA) systems are among the recent advances in artificial intelligence and deep learning. They combine image processing with natural language understanding so that a system can answer questions about the content of an image. Their significance lies in their ability to interpret and analyze images in a manner similar to human comprehension, which makes them applicable to a wide range of critical fields. VQA systems are a step toward AI systems that bridge computer vision and human language understanding and interact with the real world in a deeper, more integrated way. This study explores and analyzes the methods and techniques used in visual question answering, with a focus on developing a model that can analyze and understand an image and answer questions about it. We developed a VQA system that uses the VGG19 model to extract image features, while questions and answers were encoded using GloVe and Label Encoding. The model was trained on the MSCOCO dataset, which contains a variety of images and related questions. Training parameters were tuned over multiple experiments to improve performance. The model reached reported F1 scores of 44.23% on training and 42.97% on validation, a slight improvement over other models that also used VGG19 on the same dataset. A web platform was also developed to test the system; it lets users evaluate answer accuracy and run the model on new images or on images from the dataset.
format Article
id doaj-art-24ae0a522a474f2c80cec7f160ba3a66
institution Kabale University
issn 2959-8591
language Arabic
publishDate 2025-01-01
publisher Higher Commission for Scientific Research
record_format Article
series Syrian Journal for Science and Innovation
spelling doaj-art-24ae0a522a474f2c80cec7f160ba3a66; 2025-01-26T08:27:37Z; ara; Higher Commission for Scientific Research; Syrian Journal for Science and Innovation; 2959-8591; 2025-01-01; 31; 10.5281/zenodo.14723207; Building a Framework for Visual Question Answering Systems; Maya Abu Hamoud (0), Wasim Safi (1); Syrian Virtual University, Damascus; Higher Institute for Applied Sciences and Technology, Damascus, Syria; https://journal.hcsr.gov.sy/archives/1504; vqa; vgg19; glove; mscoco dataset
title Building a Framework for Visual Question Answering Systems
topic vqa
vgg19
glove
mscoco dataset
url https://journal.hcsr.gov.sy/archives/1504
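
The description field above mentions encoding answers with Label Encoding and reporting an F1 score, but the record does not say how either was implemented. The following is a minimal Python sketch under stated assumptions, not the authors' code: the helper names `fit_label_encoder` and `macro_f1`, the toy answers, and the choice of macro averaging are all hypothetical, since the record does not state which F1 averaging was used.

```python
from collections import Counter


def fit_label_encoder(answers):
    """Map each distinct answer string to an integer id (sorted for determinism)."""
    classes = sorted(set(answers))
    return {answer: idx for idx, answer in enumerate(classes)}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    Macro averaging is an assumption; the record does not name the averaging used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from the MSCOCO dataset.
train_answers = ["yes", "no", "two", "yes", "red"]
encoder = fit_label_encoder(train_answers)

y_true = [encoder[a] for a in ["yes", "no", "yes", "red"]]
y_pred = [encoder[a] for a in ["yes", "yes", "yes", "two"]]
score = macro_f1(y_true, y_pred)
print(encoder)
print(round(score, 4))
```

Treating each distinct answer as a class turns VQA into a classification problem, which is what makes a class-based metric such as F1 applicable. In practice a library implementation (for example scikit-learn's `LabelEncoder` and `f1_score`) would likely replace these helpers.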