Multi-proteins similarity-based sampling to select representative genomes from large databases