Enhancing Speech-to-Text Performance Using Few-Shot Learning and a Use Case in the Pandemic
Sajid Ali Khan, Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan.
Muhammed Aasem, Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan.
Javed Iqbal, Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan.
Hina Shaukat, Department of General Surgery, Combined Military Hospital (CMH), Lahore, Pakistan.
Corresponding Author:
Sajid Ali Khan (email@sajidalikhan.com)
Abstract:
This paper proposes a deep learning model that uses few-shot learning to improve speech-to-text performance, with a focus on accuracy for medical terminology. We study the challenges of accurately transcribing domain-specific language, such as medical jargon, with speech-to-text systems. Current solutions are limited by gaps in their built-in models. To address this, we propose a customized training approach that leverages domain-specific vocabulary. In our experiments, this approach achieved 95% accuracy on IBM Watson STT, outperforming AWS and Microsoft Azure. Our work contributes towards improving speech-to-text transcription in specialized domains, making it valuable for real-world applications.
Keywords:
Speech-to-Text; Deep Learning; Recurrent Neural Network; Few-Shot Learning; IBM Watson