Text this: Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets.