A multimodal framework for enhancing E-commerce information management using vision transformers and large language models