Two-stream spatio-temporal GCN-transformer networks for skeleton-based action recognition