MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion