Using Videopipe to generate automatic subtitles for your videos.
🧰 What can I use it for?
Videopipe is currently only capable of generating 🇳🇱 Dutch subtitles; however, the audio of the video can be in one of the following languages:
🇬🇧 English
🇳🇱 Dutch
The subtitles will be delivered back as both a .stl and a .srt file.
The .stl or .srt file can be imported directly into AVID inside a SubCap media effect, as you are probably used to.
The .srt file is a more human-readable format and could also be used if necessary; an example of what it looks like is shown below.
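To give an idea of the format, here is a small sketch in plain Python (no external libraries) that writes a two-cue example .srt file. The cue texts and timings are made up purely for illustration.

```python
# Minimal illustration of the .srt format: a cue number, a start --> end
# timecode (HH:MM:SS,mmm), the subtitle text, and a blank line between cues.
# The cues below are invented, just to show the structure.
example_srt = """\
1
00:00:01,000 --> 00:00:03,500
Welkom bij het achtuurjournaal.

2
00:00:03,600 --> 00:00:06,200
Vanavond: automatische ondertiteling.
"""

with open("voorbeeld.srt", "w", encoding="utf-8") as f:
    f.write(example_srt)
```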
Videopipe assumes that the entire video is in the same language. If multiple languages are present, it will try to transcribe the video based on the language that is used most.
🔧 How to use it?
You can follow the steps below to generate subtitles. Note that the process is slightly different for the RTL Nieuws Online Video Editorial team and the Promo Editors.
1️⃣ Export timeline: Use AVID Media Composer to send the video using one of the following two ‘send to playback’ profiles, depending on which team you are in. Make sure to remember the filename of the video.
Please make sure that there are no spaces in the filename.
An example of a correct filename is auto_sub_doc.mp4; a filename like auto sub doc.mp4 will not work. If you want to check this automatically, see the sketch below.
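If you want to check a filename before exporting, the snippet below is a small sketch in plain Python that flags spaces and suggests an underscore version; the function name is purely illustrative.

```python
def check_filename(filename: str) -> str:
    """Return a Videopipe-friendly filename, with spaces replaced by underscores."""
    if " " in filename:
        suggestion = filename.replace(" ", "_")
        print(f"'{filename}' contains spaces; use '{suggestion}' instead.")
        return suggestion
    print(f"'{filename}' looks fine.")
    return filename

check_filename("auto sub doc.mp4")   # suggests 'auto_sub_doc.mp4'
check_filename("auto_sub_doc.mp4")   # already fine
```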
2️⃣ Wait for processing: Grab a ☕ and wait 8 to 20 minutes for the video to be processed. Longer videos take longer to process.
3️⃣ Download subtitles: You will find the subtitles in a folder you can access, in files named after the video you sent in step 1️⃣. The folder location is different depending on the team you are in. If you prefer not to check by hand, see the sketch below.
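If you would rather not keep checking the folder by hand, here is a small sketch in plain Python that looks for the .srt and .stl files named after your export. The folder path and filename are placeholders you need to replace; the real location depends on your team.

```python
from pathlib import Path

# Placeholders: replace with your team's output folder and the filename from step 1.
TEAM_FOLDER = Path("/path/to/your/teams/folder")
FILENAME = "auto_sub_doc"  # without the .mp4 extension

for extension in (".srt", ".stl"):
    subtitle_file = TEAM_FOLDER / (FILENAME + extension)
    if subtitle_file.exists():
        print(f"Ready: {subtitle_file}")
    else:
        print(f"Not there yet: {subtitle_file}")
```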
4️⃣ Correct subtitles manually: The generated subtitles will not be completely free of mistakes. The general idea is to correct them manually to make sure they are 100% correct.
5️⃣ Send corrected subtitles back: Once you have manually corrected the subtitles, it is useful for the AI creating the subtitles to see what mistakes it has made, so it can learn and become better. If you are working in AVID, you can send back the corrected subtitles with the following steps:
Go to the Effect Editor and open your SubCap effect.
Click Caption Files -> Export Caption Data...
Navigate to the “CORRECTED_STL_SUBS” folder.
Export the subtitles as a .stl file with the same filename you used for the original video in step 1️⃣.
Things can go wrong; even machines make mistakes. Did you wait 20 minutes and are the subtitles still not in your team’s corresponding Dropbox location? Here’s what you can do:
Ask in the Slack channel what the status of the video is. Make sure to mention the filename in the message so we know what to look for. A Data Scientist will try to help you out as soon as possible.
🤖 How does it work?
For the curious reader: in reality it is quite a complex process, but in short, the subtitles are made using a sequence of (open-source) AI models:
1️⃣ Language identification: Identifying the (main) language spoken in the video.
2️⃣ Speech recognition: Identifying what has been said in the video; this results in a collection of sentences and their timing. To do this we currently leverage an open-source speech-recognition model.
3️⃣ Translation: If the spoken language was 🇬🇧 English, the sentences are translated to 🇳🇱 Dutch.
4️⃣ Formatting: The subtitles need to be formatted to adhere to requirements, such as the number of characters per line, how numbers are written out, and where the line breaks are.
5️⃣ Conversion: Converting the subtitles to the required .srt and .stl format.
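To make the sequence above a bit more concrete, here is a highly simplified sketch in Python of how such a pipeline could be wired together. The function bodies are placeholders rather than the actual models Videopipe uses, and the 42-characters-per-line limit is only an assumed example value.

```python
import textwrap

MAX_CHARS_PER_LINE = 42  # assumed example limit, not the real house rule


def identify_language(audio: str) -> str:
    """Stage 1: decide the main spoken language ('nl' or 'en').
    Placeholder for a real language-identification model."""
    return "en"


def transcribe(audio: str, language: str) -> list[tuple[float, float, str]]:
    """Stage 2: speech recognition, returning (start, end, sentence) tuples.
    Placeholder output standing in for a real speech-recognition model."""
    return [(1.0, 3.5, "Good evening and welcome to the news.")]


def translate(sentences, source_language: str):
    """Stage 3: translate English sentences to Dutch (placeholder)."""
    if source_language == "nl":
        return sentences
    # A real machine-translation model would run here.
    return [(start, end, f"[NL] {text}") for start, end, text in sentences]


def format_subtitles(sentences):
    """Stage 4: apply formatting rules, e.g. wrap lines to a maximum length."""
    return [
        (start, end, "\n".join(textwrap.wrap(text, MAX_CHARS_PER_LINE)))
        for start, end, text in sentences
    ]


def to_srt(subtitles) -> str:
    """Stage 5: convert to .srt text (the .stl conversion is similar in spirit)."""
    def timecode(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{timecode(start)} --> {timecode(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(subtitles, start=1)
    ]
    return "\n".join(blocks)


def run_pipeline(audio: str) -> str:
    """Run the five stages in order and return the .srt content."""
    language = identify_language(audio)
    sentences = transcribe(audio, language)
    sentences = translate(sentences, language)
    sentences = format_subtitles(sentences)
    return to_srt(sentences)


print(run_pipeline("auto_sub_doc.mp4"))
```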
🔮 Future work?
We are always working on improving the automatic subtitles, so this is an ongoing topic for us. Currently we are working on a feedback loop to collect the corrected subtitles, so that we can improve the system.
Is there something you would like to be possible? New languages? Identifying the speakers? Let us know by asking in the Slack channel, and we can see if we can shift our attention to that topic.