Using the webvtt Ruby gem to display subtitles on the page
Published: 2013-02-19 17:12:00 -0500
Using the webvtt gem, you can display on the page the WebVTT subtitles, captions, or chapters you’ve created for HTML5 video or audio. If you’re already creating WebVTT files for your media, you ought to get as much use out of them as you can. I’ll show you one way you could use them.
Why include subtitles on the page?
There are a variety of reasons why you might want to include the subtitles or captions on a page. Placing the text of your WebVTT files on the page can improve SEO by giving search engines more content to index. Another reason to put the WebVTT content on the page is to add new features that work for your use case regardless of what the search engines are currently doing.
Media often has very little metadata to support discoverability of the content. A title and short description alone might not include the details someone is looking for. For instance, a name dropped in the middle of a video might not be the main subject of the content, but it could be the important detail for some user. That user might otherwise not find your content through a search engine because the detail was hidden inside the non-indexable media. Including the text from WebVTT on the page allows that text to be crawled and indexed, which gives people more ways to find the media.
I make this point about SEO because I’m uncertain whether crawlers are finding track element content. Are robots currently smart enough to crawl the content behind track elements? If anyone knows whether robots are crawling track element sources, I’d love to know! If they aren’t currently, I’d have to imagine that it is only a matter of time until they do. They could also be running speech recognition on the audio and video they find on the Web, much as YouTube can do some basic speech recognition.
Also consider how reading can be faster than watching time-based media. Instead of having to watch some of a video to make decisions about whether to continue on with it, scanning a transcript can give a much quicker idea of whether the media suits the user. It also would allow for searching through the content and jumping to interesting sections. This and other non-SEO use cases could be met with client-side parsing, but there still might be reasons why you’d want to do this processing server-side.
You might also want to cover the second part of SEO by providing users with a rich discovery experience on your own site. You could have an indexing script which can parse the file for indexing in a full-text search server. Every WebVTT file begins with the string “WEBVTT”. You don’t want every video or audio file you index to come up for a search for “WEBVTT” and there are other pieces like the cue timings that you’d want to remove before indexing. Depending on how your WebVTT file has been phrased and what full-text search engine you’re using, you’d probably want to concatenate the text. Otherwise phrase queries might not work as intended, since sentences often must span cues in captioning. (OK, this paragraph isn’t about why you’d want the content on the page, but it is a good reason to have this kind of library for parsing WebVTT available for server-side processing.)
At least one video polyfill includes the ability to auto-translate tracks. I expect this feature will come to other polyfills and browsers in the future. While we’re waiting for that to happen, including the text on the page allows ordinary Google Translate functionality to make the video more accessible.
So now that we have some reasons to need a library for parsing WebVTT files, let’s take a look at how it works.
Before showing how to display caption text on the page, let’s get started by showing how you could use the webvtt gem to read in a file and concatenate all of the text into a single string for indexing. You can install the webvtt gem by running the command gem install webvtt.
Here’s a short sample WebVTT file we will use for this example.
Every WebVTT file begins with the line “WEBVTT” followed by a series of cues separated by a blank line. For more on the format used here and the other features of this file format, see the WebVTT specification.
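To make the format concrete, here is a minimal sketch in plain Ruby. It holds a hypothetical sample file in a heredoc (the cue text and timings are invented for illustration) and checks the two structural rules just described. It deliberately does not use the webvtt gem, and the vtt_blocks helper is a name I made up for this example.

```ruby
# Hypothetical sample WebVTT content; the cue text and timings are invented.
SAMPLE_VTT = <<~VTT
  WEBVTT

  00:00:00.000 --> 00:00:04.000
  Welcome to this <i>short</i> sample video.

  00:00:04.000 --> 00:00:08.500
  Each cue has a timing line
  followed by one or more lines of text.
VTT

# Split a WebVTT document into its blank-line-separated blocks.
# The first block is the header; the remaining blocks are cues.
# Assumes Unix line endings, which is enough for this sketch.
def vtt_blocks(vtt)
  vtt.strip.split(/\n{2,}/)
end

header, *cue_blocks = vtt_blocks(SAMPLE_VTT)
puts header.start_with?("WEBVTT") # the required first line
puts cue_blocks.length            # number of cues in the sample
```

A real file can also carry cue identifiers, cue settings, and NOTE blocks, none of which this sketch tries to handle.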
We’ll use pry to show how to read in a file for parsing and manipulation of the cue text.
Since some cues can contain markup to allow for styling, you’ll want to strip the tags out as well.
Then you can use the clean_full_text variable to index the text in Solr or elasticsearch.
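The concatenate-and-strip step might look roughly like the following. This is a sketch under assumptions: the cue texts are a hypothetical array of strings standing in for whatever your parser gives you, clean_cue_text is a helper name invented here, and the regular expression is a simple tag stripper rather than a full WebVTT cue text parser.

```ruby
# Hypothetical cue texts, standing in for the text of each parsed cue.
cue_texts = [
  "Welcome to this <i>short</i> sample video,",
  "which shows how a sentence\ncan span more than one cue."
]

# Join the cue texts into one string, remove markup tags such as
# <i>, <b>, <c.classname>, or <v Speaker>, and collapse whitespace
# so that phrase queries can match across cue boundaries.
def clean_cue_text(texts)
  full_text = texts.join(" ")
  full_text.gsub(/<[^>]*>/, "").gsub(/\s+/, " ").strip
end

clean_full_text = clean_cue_text(cue_texts)
puts clean_full_text
```

Joining with a space before stripping matters here: without it the last word of one cue would run into the first word of the next.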
We can also look at the cues in the WebVTT file:
Mainly what we’re interested in are the text, start, and end attributes of each cue.
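To illustrate what a cue with those three attributes looks like, here is a simplified stand-in parser in plain Ruby. It is not the webvtt gem’s implementation or API; the Cue struct and parse_cues method are hypothetical names that only mirror the text, start, and end attributes mentioned above.

```ruby
# A stand-in for a parsed cue, mirroring the three attributes named above.
Cue = Struct.new(:start, :end, :text)

# Parse cues out of WebVTT content. A simplified sketch: it skips the
# header and any block without a timing line, and drops cue settings.
def parse_cues(vtt)
  cues = vtt.strip.split(/\n{2,}/).map do |block|
    lines = block.lines.map(&:chomp)
    timing_index = lines.index { |line| line.include?("-->") }
    next unless timing_index

    start_time, end_part = lines[timing_index].split("-->").map(&:strip)
    end_time = end_part.split(/\s+/).first # ignore settings like align:start
    Cue.new(start_time, end_time, lines.drop(timing_index + 1).join("\n"))
  end
  cues.compact
end

# Hypothetical sample content; the cue text is invented.
sample = "WEBVTT\n\n00:00:00.000 --> 00:00:04.000\nHello there.\n\n" \
         "00:00:04.000 --> 00:00:08.500 align:start\nSecond cue."

parse_cues(sample).each do |cue|
  puts "#{cue.start} to #{cue.end}: #{cue.text}"
end
```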
Adding track to page
Let’s assume that we have a Rails application, and we’re showing a video play page for a single video. Let’s further assume that our controller knows how to get the contents of the WebVTT file and assign it to the variable @webvtt, which is available in our view. We can then create a template which will convert our WebVTT file into a table of cue text and timestamp links.
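A rough sketch of such a template, written with Ruby’s standard ERB library so it can run outside Rails, might look like this. The Cue struct and render_transcript method are hypothetical stand-ins. In a Rails view you would use link_to instead of a hand-written anchor tag, and for brevity the data attribute here carries the raw start timestamp, where a real view would likely convert it to seconds first.

```ruby
require "erb"

# Hypothetical stand-in for a parsed cue with start, end, and text.
Cue = Struct.new(:start, :end, :text)

# Render cues as a table of timestamp links and cue text.
# ERB::Util.h escapes the values so cue text cannot inject markup.
def render_transcript(cues)
  template = ERB.new(<<~HTML)
    <table>
    <% cues.each do |cue| %>
      <tr>
        <td><a href="#" class="transcript_jump" data-video-jump-time="<%= ERB::Util.h(cue.start) %>"><%= ERB::Util.h(cue.start) %></a></td>
        <td><%= ERB::Util.h(cue.text) %></td>
      </tr>
    <% end %>
    </table>
  HTML
  template.result(binding)
end

cues = [Cue.new("00:00:00.000", "00:00:04.000", "Hello & welcome.")]
puts render_transcript(cues)
```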
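In the template, @webvtt.cues are iterated over, and each cue’s text and timestamps are displayed on the page. The important line is the link_to. The link is given a class of transcript_jump and a data-video-jump-time attribute, which carries the cue’s start time for the jump.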
The timestamp we get in a WebVTT file has hours, minutes, and seconds separated by colons and fractional seconds separated by a full stop. The conversion to seconds splits the timestamp on colons and then pops off the seconds, minutes, and, if present, hours from the end of the resulting array. ActiveSupport duration methods are then used to convert the hours and minutes into seconds. (This timestamp to seconds conversion might be something better done in the webvtt gem.)
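Here is a minimal sketch of that split-and-pop conversion. It uses plain arithmetic in place of the ActiveSupport duration methods so it runs without Rails, and timestamp_to_seconds is a name chosen for this example, not a method from the webvtt gem.

```ruby
# Convert a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") to seconds.
def timestamp_to_seconds(timestamp)
  parts = timestamp.split(":")
  seconds = parts.pop.to_f      # e.g. "02.500" becomes 2.5
  minutes = parts.pop.to_i
  hours = (parts.pop || 0).to_i # hours are optional in WebVTT
  hours * 3600 + minutes * 60 + seconds
end

puts timestamp_to_seconds("00:01:02.500") # 62.5
puts timestamp_to_seconds("01:02.500")    # 62.5
puts timestamp_to_seconds("01:00:00.000") # 3600.0
```

Popping from the end of the array is what lets the same code handle timestamps with and without an hours component.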
On the client side, when a transcript_jump link is clicked, the currentTime property for the video is set to the contents of the data-video-jump-time attribute converted to an integer. Then the video is played.
While this works, eventually it’d be nice to augment or replace this with the use of Media Fragments for uses like bookmarking and annotations.
I hope you begin to see some of the possibilities for how you can use the contents of your WebVTT transcripts for more than just displaying them on top of the video. Once you have WebVTT files for your video, there is more you can do with them. Let me know in the comments about other ideas or how you’re currently using your transcripts.