Transformer-based language-independent gender recognition in noisy audio environments