Spatial-temporal attention for video-based assessment of intraoperative surgical skill