There are more subtitling tools for video content creators to use now than ever. And some of these tools do the work for you, which can be great. Automatic captioning tools can be a great start for people trying to optimize their content to reach wider audiences. The struggle that a lot of creators face is how to use these tools in a responsible and effective way. So we thought we would share a few thoughts on what you should know about automatic captioning tools.
Automatic Captions: What’s the Catch?
It might sound like a dream come true: a tool that can capture speech and turn it into captions for your video. And it’s true that automatic captioning technology can be used to reduce the workload of video producers, but it is a tool that should be used responsibly. No matter what your subject matter is, your audience deserves more than garbled captions that cause more confusion than connection.
Before we talk about solutions to issues around Automatic Speech Recognition (or ASR), we first have to understand the common problems.
Why do automatic captions get things wrong?
To understand the common pitfalls of using unedited automatic captions, you have to try to think like a machine. A machine is not as good at pattern-matching, language recognition, and error correction as an experienced human captioner would be. Automatic captioning technology takes the audio information in as raw, unconnected data and outputs according to its own algorithm.
Why do automatic captions get names wrong?
Proper names of people, places, and products are a good example of something that automatic captions can get wrong. While a human captioner can identify when a name comes up and either recognize it or research it, automatic captioning technology just provides the best guess that it can based on its own protocols.
If you want an automatic captioning tool to remember your brand name, for example, you have to train the tool to recognize your brand name across different scenarios. Automatic captioning tools might not recognize your brand name from different speakers, in different accents or dialects, or with interfering background audio.
What about multiple speakers?
Recognizing your brand name from one speaker is much easier for automatic captioning tools than recognizing it from multiple speakers. Making speech recognition technology speaker-independent (meaning able to recognize the same words coming from different speakers) is more complex than building something like Siri or Alexa, which can calibrate to one person’s voice. Without this kind of adjustment, automatic captioning might not capture your brand name. It might come out as a different word altogether, lessening the chance for your brand name to make an impact on your audience.
What about accents or dialects in your video?
If you have different speakers in your videos, they probably don’t all speak in an identical way. Pronunciation, emphasis, and word choice can vary from person to person in a way that is easier for human captioners to adjust to than it is for automatic captioning tools. There are often unfortunate biases against minority language dialects, which show up as more mistakes in captions. This is demoralizing and frustrating both for your audience and for people who speak those dialects.
What if my video has difficult audio?
Overlapping dialogue, construction sounds in the background in your video, or just poor audio quality can cause major mistakes in automated speech recognition output. The output can only be as good as the input, especially without the intuitive connections that human captioners can make on the fly! Even if your video has mostly great audio, there might be certain points where that fails. And with unreviewed automatic captions, that point could be where you lose the attention of your audience.
What are unreviewed automatic captions like for viewers?
Imagine you are walking down a path on a clear day. The sun is shining and you are ambling along in the shade. Suddenly, you trip on an upturned root and the peacefulness is disrupted. You are taken out of the flow of your experience for a moment in a way that is unsurprising. This is what it is like to see a run-of-the-mill mistake in captions created by people. Maybe there is a misspelled word or name, but you can usually tell from context what the original was supposed to be. It’s unfortunate, but it’s only a small detour in your overall journey.
Now imagine that same path but instead of upturned roots you see random objects that don’t fit into the idyllic scene at all. You see jars of mayonnaise hanging from tree branches, traffic cones instead of ferns lining your path, and other out-of-place objects strewn about. This is disconcerting. You are not sure you’re walking in the woods at all anymore. As you walk along, suddenly you trip on an old vinyl record sticking out of the ground after what must have been the most ultimate game of frisbee ever played. This is what it is like to encounter mistakes in unreviewed automatic captions. There is no intuitive leap that you can make from the mistake back to the source, because the mistake was made by a machine that doesn’t have the same ability to make intuitive leaps as humans do. Instead of a misspelled name, like Jeoffrey instead of Jeffrey, automatic captions might tell you that the speaker is talking about geography. And while that’s a fun subject, it can derail the attention of an audience that isn’t quite as curious about the life of Geography Bezos.
Ultimately, we can hope that automatic captions create a good starting point for your final captions. But they need to be reviewed and corrected so that your audience knows that you care about their experience. If the mistakes in your captions cause more frustration than flow, your audience will most likely feel like you are wasting their time.
How do I use automatic captioning responsibly?
The first step is to figure out what your goals are for your captions. Are you part of an organization that needs to create legally compliant captions? Are you trying to expand your audience into international markets? Are you trying to reach different language markets in your own country? Do you want your brand to be known to be committed to accessibility and inclusion?
Captions can help your content be more searchable, discoverable, and eye-catching. But after you catch your audience’s eye, will you keep it?
Commit to your content goals
If you decide to create captions for your video content, make sure that captioning is included as part of your strategy and workflow. Consider your video incomplete until those captions are completed. Have a plan for where your captions come from: human or machine. And, this is also crucial, have a plan for reviewing and editing those captions so that they are ready to wow your audience.
Set your standards
Create a list of high-impact vocabulary that you know will be in your videos: proper names like brands, people, events, organizations, ideologies and more. Make sure that these names are checked in each video for accuracy and consistency.
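If you are comfortable with a little scripting, a vocabulary check like this can be partly automated. Here is a minimal sketch in Python, assuming you keep your own list of correct names alongside the wrong versions you have already spotted in automatic captions. The glossary entries and the function name `find_glossary_issues` are made up for illustration; they are not part of any captioning tool.

```python
import re

# Hypothetical glossary: correct spelling -> misrecognitions you have seen before.
# Replace these entries with the names that matter for your own videos.
GLOSSARY = {
    "Amara": ["Amarah", "Omara"],
    "Jeffrey Bezos": ["Geography Bezos"],
}

def find_glossary_issues(caption_text, glossary=GLOSSARY):
    """Return (wrong_form, correct_form) pairs found in the caption text."""
    issues = []
    for correct, wrong_forms in glossary.items():
        for wrong in wrong_forms:
            # Whole-word, case-insensitive match, so "Amarah" is flagged
            # but the correct "Amara" is not.
            pattern = r"\b" + re.escape(wrong) + r"\b"
            if re.search(pattern, caption_text, re.IGNORECASE):
                issues.append((wrong, correct))
    return issues

if __name__ == "__main__":
    text = "Welcome to Amarah, founded after Geography Bezos visited."
    for wrong, correct in find_glossary_issues(text):
        print(f"Found '{wrong}', did you mean '{correct}'?")
```

A script like this only catches mistakes you already know about, so it supports a human review rather than replacing it.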
A good starting point is making sure your subtitles are readable. Make sure they don’t block too much of the screen: 2 or 3 lines maximum per subtitle. And make sure that your audience can read your subtitles at a comfortable speed. Check out our quick guide to creating captions as a starting point and add to it with tips that best suit your content.
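These readability checks can also be roughed out in code. The sketch below assumes each subtitle is a piece of text with a start and end time in seconds. The 3-line limit comes from the guideline above; the reading-speed limit of 20 characters per second is only a placeholder, not a standard from this post, so swap in whatever your own style guide recommends. The function name `check_subtitle` is hypothetical.

```python
# The line limit follows the "2 or 3 lines maximum" guideline.
MAX_LINES = 3
# Hypothetical placeholder value; use the reading speed from your own style guide.
MAX_CHARS_PER_SECOND = 20.0

def check_subtitle(text, start_seconds, end_seconds):
    """Return a list of readability problems for one subtitle."""
    problems = []
    lines = text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")

    duration = end_seconds - start_seconds
    if duration <= 0:
        problems.append("non-positive duration")
        return problems

    # Count characters on each line; the line breaks themselves are not read.
    chars = sum(len(line) for line in lines)
    cps = chars / duration
    if cps > MAX_CHARS_PER_SECOND:
        problems.append(
            f"{cps:.1f} characters per second (max {MAX_CHARS_PER_SECOND:.1f})"
        )
    return problems

if __name__ == "__main__":
    print(check_subtitle("Hello there", 0.0, 2.0))
    print(check_subtitle("one\ntwo\nthree\nfour", 0.0, 4.0))
```

An empty list means the subtitle passed both checks. Anything else is a prompt for a human reviewer to split, shorten, or retime that subtitle.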
Find a workspace that works for you
There are many captioning tools out there that can help you create, review, and edit your subtitles. You can combine whichever tools you find into a workflow. You could also try Amara’s Public Workspace to keep everything together in one place.
Ask for help when you need it
You can choose to review subtitles yourself, delegate to someone on your post-production team, or ask people online to review your subtitles. If you have fans, they might want to help you out! Create a discussion space for people and see if they want to create or review subtitles for your content. If you have patrons, this could be part of early access privileges. If you don’t ask, you’ll never know!
While you can review subtitles yourself, it might be better to ask someone else to review your subtitles before final publication. After all, two heads are better than one when it comes to review.
What is the takeaway?
Captions created by humans or machines can give your content a great SEO boost. Getting to know any new technology includes planning for its pitfalls. With automatic captioning, you have the opportunity to streamline your video production workflow. Creating captions for your audience is a great way to show that you care about nurturing and growing your connection with them. The takeaway here is that your video isn’t really complete without captions and your captions are not complete without review.
Quality captions show that you care
Videos often play muted by default on social media sites. If your video has unreviewed captions, they might not make sense. Your audience might roll their eyes and scroll by.
Wouldn’t you rather invite them in with careful quality captioning?
Some of your audience need captions to understand videos. Deaf or hard-of-hearing people rely on captions to access and enjoy content. Seeing mistakes in automatic captions can be incredibly frustrating and ultimately drive potential audience members away.
Captioned videos are often easier to understand for non-native speakers. Think about all of the people who are non-native speakers of your language right now. If you create subtitles that don’t match what is being said in the video, people are less likely to keep watching. Wouldn’t you rather share your message with the world?
New technology is great, but we still have the responsibility to respect each other.
We hope that this post has helped you see automatic captions as a starting point. They can give you a boost to get you where you need to go, but you have to take it the rest of the way. Create quality captions to show your audience that you care!
Happy subtitling!