MHTML

Filename extension: .mht, .mhtml
Internet media type: multipart/related, application/x-mimearchive
Type of format: Markup language
Extended from: HTML
Standard: RFC 2557 (proposed 1999)

MHTML, an initialism of "MIME encapsulation of aggregate HTML documents", is a Web archive file format used to combine, in a single computer file, the HTML code and its companion resources (such as images) that are represented by external hyperlinks in the web page's HTML code. The content of an MHTML file is encoded using the same techniques that were first developed for HTML email messages, using the MIME content type multipart/related. [1] MHTML files use an .mhtml or .mht filename extension.

Contents

The first part of the file is an e-mail header. The second part is normally HTML code. Subsequent parts are additional resources identified by their original uniform resource locators (URLs) and encoded in base64 binary-to-text encoding. MHTML was proposed as an open standard, then circulated in a revised edition in 1999 as RFC 2557.
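The structure described above can be sketched with Python's standard `email` package, which builds multipart/related messages directly. This is a minimal illustration, not how any particular browser writes its archives: the function name `build_mhtml`, the subject line, and the single-image layout are assumptions made for the example.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, image_url: str, image_bytes: bytes) -> bytes:
    """Pack one HTML page and one PNG image into a multipart/related message."""
    # The outer container; the library generates a boundary string when serializing.
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Example page"  # hypothetical title for this sketch

    # Second part of the file: the HTML code, tagged with its original URL.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)

    # Subsequent part: a resource, base64-encoded by MIMEImage's default encoder.
    image = MIMEImage(image_bytes, "png")
    image["Content-Location"] = image_url
    archive.attach(image)

    return archive.as_bytes()
```

Writing the returned bytes to a file with an .mhtml or .eml extension gives a file with the layout described above; whether a given browser renders it faithfully is not guaranteed.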

The .mhtml (Web archive) and .eml (email) filename extensions are interchangeable: a file with either extension can be renamed to the other. An .eml message can be sent by e-mail and displayed by an email client. An email message can be saved with a .mhtml or .mht filename extension and then opened for display in a web browser or for editing in other programs, including word processors and text editors.

Layout

The header of an MHTML file contains metadata such as a date and time stamp, page title, the source URL, and a unique randomized boundary string for separating resources contained within the file. The boundary string is defined at the beginning and used throughout the file.

From: <Saved by Blink>
Snapshot-Content-Location: https://en.wikipedia.org/wiki/Smartphone
Subject: Smartphone - Wikipedia
Date: Sat, 24 Sep 2022 00:34:32 -0000
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----MultipartBoundary--GsIBda0vjy2AKIAIliwl7JMwezXDRjDAsLje9khd5l----"

Then, the page resources are contained sequentially, starting with the page's rendered HTML source code. Each resource has its own metadata header which specifies its MIME type and the original location.

------MultipartBoundary--GsIBda0vjy2AKIAIliwl7JMwezXDRjDAsLje9khd5l----
Content-Type: text/html
Content-ID: <frame-D968CEC8BB7E60A1859261A8CA5DFB4D@mhtml.blink>
Content-Transfer-Encoding: binary
Content-Location: https://en.wikipedia.org/wiki/Smartphone

<!DOCTYPE html>

The MHTML file ends with a boundary string that is not followed by any data. [2]
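Because this layout is ordinary MIME, a generic email parser can walk it. The sketch below uses Python's standard `email` package to list each resource's MIME type, original location, and decoded size. The function name `list_mhtml_parts` is an assumption for the example, and real archives written by browsers may contain details that this sketch does not handle.

```python
import email
from email import policy


def list_mhtml_parts(raw: bytes) -> list:
    """Return (content type, Content-Location, decoded size) for each resource."""
    message = email.message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        # decode=True undoes base64 or quoted-printable transfer encoding.
        body = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get("Content-Location"), len(body)))
    return parts
```

Calling `list_mhtml_parts(open("page.mhtml", "rb").read())` on a saved page (the filename is hypothetical) returns one tuple per embedded resource, starting with the HTML document.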

Browser support

Some browsers support the MHTML format, either directly or through third-party extensions, but the process for saving a web page along with its resources as an MHTML file is not standardized. Due to this, a web page saved as an MHTML file using one browser may render differently on another.

Internet Explorer

Internet Explorer was the first browser to support reading and saving web pages and their external resources as a single MHTML file, starting with version 5.0.

Microsoft Edge

Since switching to the Chromium source code, Edge supports saving pages as MHTML.

Opera

Support for saving web pages as MHTML files was made available in the Opera 9.0 web browser. [3] From Opera 9.50 through the rest of the Presto-based Opera product line (at Opera 12.16 as of 19 July 2013), the default format for saving pages is MHTML. The initial release of the new WebKit/Blink-based Opera (Opera 15) did not support MHTML, but subsequent releases (Opera 16 onwards) do.

MHTML can be enabled by typing "opera://flags#save-page-as-mhtml" at the address bar.

Google Chrome

Creating MHTML files in Google Chrome is enabled by default in version 86.

Yandex Browser

Creating MHTML (multipart/related) files in Yandex Browser is enabled by default in version 22.7.4.960 (July 2022).

Vivaldi

Like Google Chrome, the Chromium-based Vivaldi browser has been able to save web pages as MHTML files since the 2.3 release. [4]

Reading and writing MHTML files is enabled by toggling the "vivaldi://flags/#save-page-as-mhtml" option.

Firefox

Mozilla Firefox does not support MHTML. [5] Until the advent of version 57 ("Firefox Quantum"), MHT files could be read and written by installing a browser extension, such as Mozilla Archive Format or UnMHT.

Safari

As of version 3.1.1, Apple Inc.'s Safari web browser does not natively support the MHTML format. Instead, Safari supports the webarchive format, and the macOS version includes a print-to-PDF feature.

As with most other modern web browsers, support for MHTML files can be added to Safari via various third-party extensions.

Konqueror

As of version 3.5.7, KDE's Konqueror web browser does not support MHTML files. An extension project, mhtconv, can be used to allow saving and viewing of MHTML files.

ACCESS NetFront

NetFront 3.4 (on devices such as the Sony Ericsson K850) can view and save MHTML files.

Pale Moon

Pale Moon requires an extension to be installed to read and write MHT files. One freely available extension is MozArchiver, a fork of the Mozilla Archive Format extension.

GNOME Web

GNOME Web has supported reading and saving web pages in MHTML since version 3.14.1, released in September 2014. [6]

MHT viewers

There are commercial software products for viewing MHTML files and converting them to other formats, such as PDF and ePub. Some HTML editor programs can view and edit MHTML files.

MIME type

The MIME type for MHTML is not well agreed upon. MIME types in use include multipart/related and application/x-mimearchive.

Other apps

Problem Steps Recorder

Problem Steps Recorder for Windows can save its output to MHT format.

Save to Google Drive extension

The "Save to Google Drive" extension for Google Chrome offers MHTML as one of its output formats.

Microsoft OneNote

Microsoft OneNote, starting with OneNote 2010, emails individual pages as .mht files.

Evernote

Evernote for Windows can export notes in MHT format, as an alternative to HTML or its own native .enex format.

Exploits

In May 2015, a researcher noted that attackers could build malicious documents by creating an MHT file, appending an MSO object at the end (MSO is a file format used by the Microsoft Outlook e-mail application), and renaming the resulting file with a .doc extension. [7] The delivery method would be by spam emails. [8]

In April 2019, a security researcher published details about an XML external entity (XXE) vulnerability that could be exploited when a user opens an MHT file. Since the Windows operating system is set to automatically open all MHT files, by default, in Internet Explorer, the exploit could be triggered when a user double-clicked on a file that they received via email, instant messaging, or another vector, including a different browser. [9]

Alternatives

data URI scheme

The data URI scheme offers an alternative for including separate elements such as images, style-sheets and scripts in-line when serving an HTML request or saving an HTML resource for offline use. Like the embedded content within MHTML, data URIs use Base64 encoding of the external resources (which may be binary or text) to embed them in-line within the HTML markup. HTML pages saved with external elements embedded using the data URI scheme are standard web pages, and can be opened by any modern browser, including browsers not supporting MHTML such as Mozilla Firefox. [10] Unlike MHTML, saving web pages with their external resources embedded using data URIs requires a third-party extension to be installed in the browser. [11]
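The embedding step can be illustrated with a short Python sketch that Base64-encodes a resource and substitutes the resulting data URI for its URL in the markup. The function names are assumptions for this example, and the plain string replacement is a simplification: a real page-saving tool would parse the HTML and CSS rather than search for literal URLs.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a Base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_resource(html: str, url: str, data: bytes, mime_type: str) -> str:
    """Replace every literal occurrence of `url` in the markup with a data URI."""
    return html.replace(url, to_data_uri(data, mime_type))
```

The result is a single ordinary HTML file, which is why no archive-aware reader is needed to open it.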

Mozilla Archive Format

The Mozilla Archive Format (MAFF) is a legacy Web archive file format that was supported by Firefox from 2004 to 2018 through an add-on. [12] Unlike both MHTML and data URIs, MAFF uses a ZIP container to preserve both the HTML file and its external elements. In October 2017 the add-on developer announced the format would no longer be supported in future versions of Firefox. [13]
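The ZIP-container approach can be sketched with Python's standard `zipfile` module. This only illustrates the general idea of storing a page and its resources as separate archive entries; the function name and the entry names under `page/` are hypothetical, and the sketch does not reproduce MAFF's actual internal layout or metadata.

```python
import io
import zipfile


def pack_page(html: str, resources: dict[str, bytes]) -> bytes:
    """Store a page and its resources as separate entries in one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The HTML file is kept unmodified as its own entry.
        archive.writestr("page/index.html", html)
        # Each external element is stored as raw bytes, with no Base64 step.
        for name, data in resources.items():
            archive.writestr(f"page/{name}", data)
    return buffer.getvalue()
```

Unlike MHTML or data URIs, this keeps binary resources in their original form and lets the container compress them.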

References

  1. Holden, Amanda. "Difference of HTML & MHTML". Archived from the original on 17 November 2017. Retrieved 17 November 2017.
  2. "2. The MHTML File Format - Hunchly Knowledge Base". support.hunch.ly. October 17, 2018. Retrieved 24 September 2022.
  3. Santambrogio, Claudio (10 March 2006). "…and one more weekly!". Opera Software. Archived from the original on 15 January 2010. Retrieved 2009-05-15.
  4. von Tetzchner, Jon (6 February 2019). "Vivaldi Update | Auto-Stacking Tabs". Vivaldi (in French). Retrieved 16 May 2019.
  5. "Bug 40873 - Save as rfc 2557 MHTML; complete webpage in one file".
  6. "NEWS · master · GNOME / Epiphany". 28 July 2023.
  7. Kovacs, Eduard (May 11, 2015). "Attackers Hide Malicious Macros in MHTML Documents". SecurityWeek.Com. Retrieved April 19, 2019.
  8. Mosuela, Lordian (July 10, 2015). "New Tricks of Macro Malware". Cyren. Retrieved April 19, 2019.
  9. Cimpanu, Catalin (April 12, 2019). "Internet Explorer zero-day lets hackers steal files from Windows PCs". ZDNet. Retrieved April 19, 2019.
  10. "Data URLs - HTTP". MDN. Retrieved April 2, 2023.
  11. Brinkmann, Martin (September 3, 2018). "Save any webpage as a single file in Chrome or Firefox - gHacks Tech News". ghacks.net. Retrieved April 2, 2023.
  12. "Mozilla Archive Format Add-on - File Format Overview". amadzone. Retrieved April 2, 2023.
  13. "Firefox Addon: MAF - Mozilla Archive Format". Archived from the original on 2 November 2017. Retrieved 2 April 2023.