MiniCPM-V LLaMA Model for Image Recognition: A Case Study on Satellite Datasets

Bibliographic Details
Main Authors: Kursat Komurcu, Linas Petkevicius
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10908656/
Description
Summary: This study evaluates the performance of the MiniCPM-V model on four satellite image datasets: MAI, RSICD, RSSCN7, and a newly created merged dataset that combines these three. The merged dataset was developed to improve generalization and widen the variation in data distribution associated with the labeling and training processes inherent in satellite image analysis. We systematically collected prediction results for each individual dataset and compared them against results reported in previous studies to benchmark the model's effectiveness. The findings indicate that large language models (LLMs), such as MiniCPM-V, exhibit promising capabilities in satellite image recognition. MiniCPM-V achieved an accuracy of 70.57% on RSSCN7, 62.19% on RSICD, 7.01% on MAI, and 43.49% on the merged dataset. Specifically, the model achieved high accuracy (more than 80%) on a majority of object classes across the datasets. However, it underperformed in classifying certain object categories and in recognizing all objects in multilabeled images, which suggests that while the model is robust overall, there are specific areas where its performance can be improved. Despite these limitations, the successful recognition of most objects underscores the potential of LLMs in advancing satellite imagery analysis. These results highlight the significant potential of integrating LLMs into remote sensing applications, offering a foundation for future research aimed at improving classification accuracy and expanding the range of detectable object classes by incorporating caption-level textual information.
ISSN: 1939-1404, 2151-1535