Filename extension | .vtt |
---|---|
Internet media type | text/vtt |
Developed by | World Wide Web Consortium (W3C) |
Initial release | 10 August 2010 [1] |
Latest release | 4 April 2019 |
Type of format | Timed text |
Extended from | SRT |
Standard | W3C WebVTT |
Open format? | Yes |
Free format? | Yes |
WebVTT (Web Video Text Tracks) is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track>
element.
The early drafts of its specification were written by the WHATWG in 2010 after discussions about what caption format should be supported by HTML5—the main options being the relatively mature, XML-based Timed Text Markup Language (TTML) or an entirely new but more lightweight standard based on the widely used SubRip format. The final decision was for the new standard, initially called WebSRT (Web Subtitle Resource Tracks). [2] It shared the .srt
file extension and was broadly based on the SubRip format, though not fully compatible with it. [3] The prospective format was later renamed WebVTT. [4] [5] In the January 13, 2011, version of the HTML5 Draft Report, the <track>
element was introduced and the specification was updated to document WebVTT cue text rendering rules. [6] The WebVTT specification is a W3C Candidate Recommendation, and the basic features are supported by all major browsers.
Browser | Cue Text Tags | Cue Positioning | CSS Styling |
---|---|---|---|
Chrome | 35+ | ||
Android stock browser | 5.0+ | ||
Opera | 22+ | ||
Safari | 7+ (iOS: 8+) | ||
Firefox | 31+ (Android: 32+) | — | |
Microsoft Edge | 12+ | — | |
Internet Explorer | 10+ | — |
Firefox implemented WebVTT in its nightly builds (Firefox 24), but initially it was not enabled by default. The feature had to be enabled in Firefox by going to the "about:config" page and setting the value of "media.webvtt.enabled" to true. [10] YouTube began supporting WebVTT in April, 2013. [11] As of July 24, 2014, Mozilla has enabled WebVTT on Firefox by default. [12]
Subtitles in a .vtt file show online, but not when stored on a local drive.
A sample file from the W3C captioning Roger Bingham interviewing Neil deGrasse Tyson: [13]
WEBVTT 00:11.000 --> 00:13.000 <v Roger Bingham>We are in New York City 00:13.000 --> 00:16.000 <v Roger Bingham>We're actually at the Lucern Hotel, just down the street 00:16.000 --> 00:18.000 <v Roger Bingham>from the American Museum of Natural History 00:18.000 --> 00:20.000 <v Roger Bingham>And with me is Neil deGrasse Tyson 00:20.000 --> 00:22.000 <v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium 00:22.000 --> 00:24.000 <v Roger Bingham>at the AMNH. 00:24.000 --> 00:26.000 <v Roger Bingham>Thank you for walking down here. 00:27.000 --> 00:30.000 <v Roger Bingham>And I want to do a follow-up on the last conversation we did. 00:30.000 --> 00:31.500 align:right size:50% <v Roger Bingham>When we e-mailed— 00:30.500 --> 00:32.500 align:left size:50% <v Neil deGrasse Tyson>Didn't we talk about enough in that conversation? 00:32.000 --> 00:35.500 align:right size:50% <v Roger Bingham>No! No no no no; 'cos 'cos obviously 'cos 00:32.500 --> 00:33.500 align:left size:50% <v Neil deGrasse Tyson><i>Laughs</i> 00:35.500 --> 00:38.000 <v Roger Bingham>You know I'm so excited my glasses are falling off here.
In June 2013, an example was added to the specification that included a new "region" setting. [14] This feature is supported since Firefox 59 and Safari 14.1 (14.5 on iOS) but not in any other browser. [15]
The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.
An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The current de facto standard is governed by the industry group WHATWG and is known as the HTML Living Standard.
A favicon, also known as a shortcut icon, website icon, tab icon, URL icon, or bookmark icon, is a file containing one or more small icons associated with a particular website or web page. A web designer can create such an icon and upload it to a website by several means, and graphical web browsers will then make use of it. Browsers that provide favicon support typically display a page's favicon in the browser's address bar and next to the page's name in a list of bookmarks. Browsers that support a tabbed document interface typically show a page's favicon next to the page's title on the tab, and site-specific browsers use the favicon as a desktop icon.
In web development, "tag soup" is a pejorative for HTML written for a web page that is syntactically or structurally incorrect. Web browsers have historically treated structural or syntax errors in HTML leniently, so there has been little pressure for web developers to follow published standards. Therefore there is a need for all browser implementations to provide mechanisms to cope with the appearance of "tag soup", accepting and correcting for invalid syntax and structure where possible.
In computer hypertext, a URI fragment is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.
SubRip is a free software program for Microsoft Windows which extracts subtitles and their timings from various video formats to a text file. It is released under the GNU GPL. Its subtitle format's file extension is .srt
and is widely supported. Each .srt
file is a human-readable file format where the subtitles are stored sequentially along with the timing information. Most subtitles distributed on the Internet are in this format.
The canvas element is part of HTML5 and allows for dynamic, scriptable rendering of 2D shapes and bitmap images. It is a low level, procedural model that updates a bitmap. HTML5 Canvas also helps in making 2D games.
Timed text is the presentation of text media in synchrony with other media, such as audio and video.
HTML5 is a markup language used for structuring and presenting hypertext documents on the World Wide Web. It was the fifth and final major HTML version that is now a retired World Wide Web Consortium (W3C) recommendation. The current specification is known as the HTML Living Standard. It is maintained by the Web Hypertext Application Technology Working Group (WHATWG), a consortium of the major browser vendors.
The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, leading Web browser vendors in 2004.
Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.
The HTML5 draft specification adds video
and audio
elements for embedding video and audio in HTML documents. The specification had formerly recommended support for playback of Theora video and Vorbis audio encapsulated in Ogg containers to provide for easier distribution of audio and video over the internet by using open standards, but the recommendation was soon after dropped.
A web worker, as defined by the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG), is a JavaScript script executed from an HTML page that runs in the background, independently of scripts that may also have been executed from the same HTML page. Web workers are often able to utilize multi-core CPUs more effectively.
Microdata is a WHATWG HTML specification used to nest metadata within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. Search engines benefit greatly from direct access to Microdata because it allows them to understand the information on web pages and provide more relevant results to users. Microdata uses a supporting vocabulary to describe an item and name-value pairs to assign values to its properties. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa and microformats.
HTML video is a subject of the HTML specification as the standard way of playing video via the web. Introduced in HTML5, it is designed to partially replace the object element and the previous de facto standard of using the proprietary Adobe Flash plugin, though early adoption was hampered by lack of agreement as to which video coding formats and audio coding formats should be supported in web browsers. As of 2020, HTML video is the only widely supported video playback technology in modern browsers, with the Flash plugin being phased out.
HTML audio is a subject of the HTML specification, incorporating audio input, playback, and synthesis, as well as speech to text, all in the browser.
A document type declaration, or DOCTYPE, is an instruction that associates a particular XML or SGML document with a document type definition (DTD). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.
Encrypted Media Extensions (EME) is a W3C specification for providing a communication channel between web browsers and the Content Decryption Module (CDM) software which implements digital rights management (DRM). This allows the use of HTML video to play back DRM-wrapped content such as streaming video services without the use of heavy third-party media plugins like Adobe Flash or Microsoft Silverlight. The use of a third-party key management system may be required, depending on whether the publisher chooses to scramble the keys.
Timed Text Markup Language (TTML), previously referred to as Distribution Format Exchange Profile (DFXP), is an XML-based W3C standard for timed text in online media and was designed to be used for the purpose of authoring, transcoding or exchanging timed text information presently in use primarily for subtitling and captioning functions. TTML2, the second major revision of the language, was finalized on November 8, 2018. It has been adopted widely in the television industry, including by Society of Motion Picture and Television Engineers (SMPTE), European Broadcasting Union (EBU), ATSC, DVB, HbbTV and MPEG CMAF and several profiles and extensions for the language exist nowadays.