A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement