Building a Framework for Visual Question Answering Systems
VQA (Visual Question Answering) systems are among the latest advancements in the fields of artificial intelligence and deep learning. They integrate image processing with natural language understanding to enable intelligent systems to answer questions related to image content. The significance of th...
Main Authors: | Maya Abu Hamoud, Wasim Safi |
---|---|
Format: | Article |
Language: | Arabic |
Published: | Higher Commission for Scientific Research, 2025-01-01 |
Series: | Syrian Journal for Science and Innovation |
Subjects: | vqa, vgg19, glove, mscoco dataset |
Online Access: | https://journal.hcsr.gov.sy/archives/1504 |
_version_ | 1832586181631541248 |
---|---|
author | Maya Abu Hamoud Wasim Safi |
author_facet | Maya Abu Hamoud Wasim Safi |
author_sort | Maya Abu Hamoud |
collection | DOAJ |
description | Visual Question Answering (VQA) systems are among the latest advancements in artificial intelligence and deep learning. They integrate image processing with natural language understanding so that a system can answer questions about image content. Their significance lies in their ability to interpret and analyze images in a manner similar to human comprehension, which makes them applicable to a wide range of critical fields. VQA systems are a step toward AI systems that bridge the gap between computer vision and human language understanding, fostering a deeper and more integrated interaction with the real world. This study explored and analyzed the methods and techniques used in visual question answering, with a focus on developing a model that analyzes and understands images and responds to related queries. We developed a VQA system that uses the VGG19 model to extract image features, while questions and answers were encoded using GloVe and Label Encoding. The model was trained on the MSCOCO dataset, which contains a variety of images and related questions, and its performance was improved through multiple experiments that tuned the training parameters. The model reached an F1 score of 44.23% on the training set and 42.97% on the validation set, a slight improvement over other models that also used VGG19 on the same dataset. Additionally, a web platform was developed to test the system, enabling users to evaluate answer accuracy and use the model on new images or those from the dataset. |
format | Article |
id | doaj-art-24ae0a522a474f2c80cec7f160ba3a66 |
institution | Kabale University |
issn | 2959-8591 |
language | Arabic |
publishDate | 2025-01-01 |
publisher | Higher Commission for Scientific Research |
record_format | Article |
series | Syrian Journal for Science and Innovation |
spelling | doaj-art-24ae0a522a474f2c80cec7f160ba3a66; 2025-01-26T08:27:37Z; ara; Higher Commission for Scientific Research; Syrian Journal for Science and Innovation; 2959-8591; 2025-01-01; 3; 1; 10.5281/zenodo.14723207; Building a Framework for Visual Question Answering Systems; Maya Abu Hamoud (Syrian Virtual University, Damascus); Wasim Safi (Higher Institute for Applied Sciences and Technology, Damascus, Syria); https://journal.hcsr.gov.sy/archives/1504; vqa; vgg19; glove; mscoco dataset |
spellingShingle | Maya Abu Hamoud Wasim Safi Building a Framework for Visual Question Answering Systems Syrian Journal for Science and Innovation vqa vgg19 glove mscoco dataset |
title | Building a Framework for Visual Question Answering Systems |
topic | vqa vgg19 glove mscoco dataset |
url | https://journal.hcsr.gov.sy/archives/1504 |
work_keys_str_mv | AT mayaabuhamoud buildingaframeworkforvisualquestionansweringsystems AT wasimsafi buildingaframeworkforvisualquestionansweringsystems |
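
The description names three building blocks besides VGG19 image features: GloVe embeddings, Label Encoding, and an F1 score. The following is a minimal sketch of one plausible reading of those steps, not the authors' implementation. It assumes that questions are represented by averaging per-word vectors (the record does not say how GloVe vectors were combined), that answers are the label-encoded classes, and that F1 is macro-averaged (the record does not state the averaging mode). The function names and the toy embedding table are hypothetical; image feature extraction is omitted.

```python
def fit_label_encoder(answers):
    """Map each distinct answer string to an integer class id.

    Classes are sorted so the mapping is deterministic.
    """
    classes = sorted(set(answers))
    return {answer: idx for idx, answer in enumerate(classes)}


def encode_question(question, embeddings, dim):
    """Average the word vectors of a question (assumed pooling strategy).

    `embeddings` stands in for a GloVe lookup table; words that are
    missing from it are skipped. Returns a zero vector if no word matches.
    """
    words = [w.strip("?.,!") for w in question.lower().split()]
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-dimensional table standing in for real GloVe vectors.
    toy_embeddings = {"what": [1.0, 0.0], "color": [0.0, 1.0]}
    encoder = fit_label_encoder(["yes", "no", "red", "yes"])
    print(encoder)
    print(encode_question("What color?", toy_embeddings, dim=2))
    print(macro_f1([0, 1, 2, 2], [0, 2, 2, 2]))
```

In a full pipeline the question vector would be combined with the VGG19 image features and passed to a classifier over the encoded answer classes; how the authors fused the two inputs is not described in this record.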