VIOLET: Vectorized Invariance Optimization for Language Embeddings Using Twins
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11086585/ |
| Summary: | We present VIOLET, a novel positive pair-based information maximisation strategy for fine-tuning BERT to generate robust, invariant, and semantically meaningful sentence embeddings. VIOLET extends the Barlow Twins framework by addressing both redundancy reduction and invariance preservation within the embedding space. This is achieved through a combination of text-specific augmentations tailored for the nuances of natural language and a mixup-based regularisation mechanism that promotes smoother representation learning. Unlike conventional contrastive learning methods that rely on large batch sizes and hard negative mining to achieve performance, VIOLET operates exclusively on positive pairs. This eliminates the need for complex sampling strategies and significantly reduces training overhead. A key strength of VIOLET is its ability to perform consistently and robustly even with smaller batch sizes, making it an appealing choice for training on limited computational resources. Empirical results on the Semantic Textual Similarity Benchmark (STS-B) demonstrate that VIOLET achieves correlation scores on par with or exceeding several state-of-the-art sentence embedding models. These findings underscore the method’s effectiveness, scalability, and practical utility in a wide range of downstream natural language understanding tasks, particularly in settings where efficiency and stability are critical. Our implementation is provided at https://github.com/mikhail-ram/VIOLET. |
|---|---|
| ISSN: | 2169-3536 |
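
The summary describes VIOLET as an extension of the Barlow Twins framework that trains on positive pairs only, balancing invariance against redundancy reduction. As a rough illustration of that underlying objective, the sketch below computes the standard Barlow Twins loss for two batches of embeddings of the same sentences under different augmentations. It is not the authors' implementation: it omits VIOLET's text augmentations and its mixup-based regulariser, the function names `standardize` and `barlow_twins_loss` are hypothetical, and the weight `lambd=5e-3` is an assumed illustrative value. Pure Python is used for self-containment; a practical version would likely use a tensor library.

```python
import math


def standardize(z, eps=1e-8):
    """Standardize each embedding dimension to zero mean, unit variance over the batch."""
    n, d = len(z), len(z[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        cols.append([(x - mean) / (std + eps) for x in col])
    # Transpose back to batch-major layout: n rows of d values.
    return [[cols[j][i] for j in range(d)] for i in range(n)]


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Standard Barlow Twins loss for two views z_a, z_b, each a list of n rows of d floats.

    lambd is an assumed illustrative weight on the redundancy-reduction term.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    on_diag = 0.0   # invariance term: diagonal of the cross-correlation pushed toward 1
    off_diag = 0.0  # redundancy term: off-diagonal entries pushed toward 0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view A and dimension j of view B.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2
            else:
                off_diag += c_ij ** 2
    return on_diag + lambd * off_diag


# Toy batch: four 2-dimensional embeddings whose dimensions are uncorrelated.
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(view_a, view_a)
```

With identical views and decorrelated dimensions, as in the toy batch, the cross-correlation matrix is close to the identity and the loss is close to zero. No negative pairs appear anywhere in the computation, which is the property the summary contrasts with contrastive methods.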