Text this: Visual Commonsense Causal Reasoning From a Still Image