Text this: Learning High-Order Features for Fine-Grained Visual Categorization with Causal Inference