A Multimodal Information Fusion Model for Robot Action Recognition with Time Series