Text this: End-to-End Multi-Modal Speaker Change Detection with Pre-Trained Models