Predicting Pedestrian Counts for Crossing Scenario Based on Fused Infrared-Visual Videos