Episodic Reasoning for Vision-Based Human Action Recognition