Metadata Enriched Multi-Instance Contrastive Learning for High-Quality Facial Skin Visual Representations