Detect material volume by fusing target detection and depth estimation information from heterogeneous cameras