UFM: Unified feature matching pre-training with multi-modal image assistants