“We’ve received videos of online meetings conducted in Spanish from which we need to pull clips of relevant messages for a video project, but neither our agency’s video editor nor anyone on my team speaks Spanish. Could you transcribe the recordings and then translate them into timecoded English transcripts that our video agency can then use?”
So began a recent request from a client. We responded that we’d be happy to help them and, from the client’s perspective, their project was as good as taken care of. Behind the scenes, however, such a project is not as straightforward as it might appear: before recordings can be translated, they must first be transcribed so that the translators have a source transcript to work from.
And in a world with AI, language service providers need to determine for each transcription job whether it’s better to have a person or a machine transcribe the recording.
The classic route
The classic way to create a transcript is for a person simply to listen to the recording and type what they hear.
Of course, even for the speediest typists, transcribing word-for-word entails pausing throughout the recording to catch up with what’s been said. And, while it can be a relatively quick endeavor to transcribe a clear recording of one person speaking in their first language, imagine transcribing a recording of someone who speaks circuitously with incorrect grammar. Imagine transcribing a conversation between people speaking in their second or third language, who may use incorrect words or dialect from their first language.
Imagine transcribing a conversation between people who repeatedly interrupt each other or frequently … umm … ah … well … like … use filler words. These scenarios all make transcribing more time-consuming, and a 20-minute recording can take even an experienced transcriber up to two and a half hours to complete.
If the transcript is to serve as the basis for an article or subtitled video, then the person transcribing the audio or video needs to take the time to discern the intended meanings and insert margin notes with suggested correct phrasings, all to ensure the correct messaging will be carried through to the final product(s).
The machine route
When opting to use a machine to transcribe, it’s almost always advisable to have a person review the transcript to correct the machine’s errors. Machines can’t (yet!) discern nuances or process myriad context clues, so they seldom figure out what’s really being said by speakers who mumble, aren’t talking in their first language, or stumble with filler words and incorrect grammar or vocabulary.
(Though, thanks to AI transcription tools’ inability to understand complicated spoken language, those who conduct post-machine reviews get to enjoy fixing amusing misheard phrases such as a CEO referring to “devil’s inventor” instead of “winter in Davos.”)
Used in the proper circumstances, machine-generated transcriptions can save clients time and money. As with all areas where AI is available, it’s important to be thoughtful about how to employ AI. When deciding whether to use a machine to generate a transcript, it’s essential to consider the client’s data protection requirements.
In the case of confidential content that’s intended for a limited audience (and this can range from a CEO’s video message to the board to a recording of a meeting in which market-moving topics are discussed), it’s important to find out before work begins how and where the AI tool processes and, if necessary, stores the data. Since most free tools use public, unsecured clouds, they are often not GDPR compliant and thus not suitable for transcribing confidential content. Of course, when we use AI for transcription and other tasks, we use paid or proprietary tools that are GDPR compliant.
How to choose?
At Leinhäuser, when we’re determining whether to have a person or a machine transcribe content, we consider the quality of the recording and of the speakers’ delivery, the ultimate purpose of the transcript, and the client’s preferences.
For example, if a video features several people engaging in a dynamic conversation and speaking in dialect or with accents, it’s faster to opt for a human to do the transcription as the machine’s transcription of such a recording will most likely require time-consuming clean-up.
On the other hand, if the speakers in a recording used a teleprompter and/or spoke correctly and clearly, and ideally in their first language, then a machine transcription followed by a post-editing review by a human is often the most efficient option.
In the case of our client’s online meetings held in Spanish, with the client’s approval, we opted for the combined route of machine transcription followed by human post-editing.
There is no one right way to transcribe, but by evaluating a recording’s quality and knowing how the transcript will be used, it’s possible to determine the best way to go about a transcription. Tell us about your project and our audio-video team (audiovideo@leinhaeuser.com) will be happy to help you find the optimal path to take.