). Only use the training dataset. If you use the validation set, it will be obvious to us, because Whisper overfits very easily. You can use Google Colab to fine-tune Whisper.
Once you have fine-tuned Whisper, deploy the model and create an API endpoint that can be used to transcribe audio files in MP3 format.
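
As a rough illustration of the endpoint, here is a minimal sketch using only Python's standard library. The `/transcribe` path, the JSON response shape, and the `start_server` and `make_handler` helpers are assumptions made for this example, not requirements of the task. The `transcribe` callable is a placeholder: in a real deployment it would wrap your fine-tuned Whisper model, and a web framework of your choice would work just as well.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_handler(transcribe: Callable[[bytes], str]):
    """Build a request handler that passes uploaded MP3 bytes to `transcribe`.

    `transcribe` is a hypothetical hook: plug in a function that runs your
    fine-tuned Whisper model on the raw audio bytes and returns the text.
    """

    class TranscribeHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the whole request body (the raw MP3 bytes) up front.
            length = int(self.headers.get("Content-Length", 0))
            audio = self.rfile.read(length)

            if self.path != "/transcribe":
                self._send_json(404, {"error": "not found"})
            elif not audio:
                self._send_json(400, {"error": "empty request body"})
            else:
                self._send_json(200, {"text": transcribe(audio)})

        def _send_json(self, status: int, payload: dict) -> None:
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Silence the default per-request logging for this sketch.
            pass

    return TranscribeHandler


def start_server(transcribe: Callable[[bytes], str],
                 host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the endpoint in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(transcribe))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Assuming the server is started with `start_server(my_transcribe, port=8000)`, where `my_transcribe` is your own function, a file could be sent with `curl --data-binary @clip.mp3 http://127.0.0.1:8000/transcribe`, and the response would be a JSON object with a single `text` field.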