A curated benchmark dataset for molecular identification based on genome skimming