Cross-Attention Fusion of Visual and Geometric Features for Large-Vocabulary Arabic Lipreading