InterAcT: A generic keypoints-based lightweight transformer model for recognition of human solo actions and interactions in aerial videos.