A scalable framework for evaluating multiple language models through cross-domain generation and hallucination detection