Multi-modal feature fusion with multi-head self-attention for epileptic EEG signals