Text this: Improved crop row detection using attention-based vision transformers and convolutional neural networks with integrated depth modeling for high spatial accuracy