Text this: Automatic Timbre Transformation Using Enhanced Diffusion Model