OryzaGP: rice gene and protein dataset for named-entity recognition