HTML Tidy

Original author(s): Dave Raggett
Developer(s): HTML Tidy Advocacy Community Group
Stable release: 5.8.0 [1] / 16 July 2021
Repository: github.com/htacg/tidy-html5
Written in: C
Operating system: BSD, Linux, macOS, Microsoft Windows
Type: Library, console application
License: W3C Software License
Website: www.html-tidy.org

HTML Tidy is a console application for correcting invalid HyperText Markup Language (HTML), detecting potential web accessibility errors, and improving the layout and indent style of the resulting markup. It is also available as a cross-platform library that gives other applications access to the same features.

History

HTML Tidy was originally developed by Dave Raggett [2] of the World Wide Web Consortium (W3C). In 2003 it was released as a SourceForge project, where it was managed by various maintainers. [3]

In 2012, the project moved to GitHub, [4] where it was maintained by Michael Smith, also of W3C, [5] and gained HTML5 support.

In 2015, the HTML Tidy Advocacy Community Group (HTACG) was formed as a W3C Community Group to manage and develop HTML Tidy. [6] [7]

HTML Tidy's source code is written in ANSI C for portability, and compiled binaries are available for a variety of platforms. It is distributed under the W3C Software Notice and License, a permissive BSD-style license. Up-to-date versions are available as source code from the project's Git repository on GitHub, or as binary packages for multiple operating systems from its GitHub Releases page.

Features

Examples of corrections to invalid or poorly constructed HTML:
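
To make one such correction concrete, the following is a minimal, self-contained sketch in C of appending missing end tags to a fragment. It is an illustration only and is not HTML Tidy's actual algorithm or API: the function name `close_open_tags`, the fixed limits `MAX_DEPTH` and `MAX_NAME`, and the simplifications (no void elements such as `<br>`, no comments, no self-closing syntax) are all assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32   /* hypothetical limit on nesting depth */
#define MAX_NAME  16   /* hypothetical limit on tag-name length, incl. NUL */

/*
 * Toy repair (not HTML Tidy's algorithm): copy `in` to `out`, then append
 * an end tag for every element that is still open at the end of the input.
 * Returns 0 on success, -1 if `out` is too small or nesting is too deep.
 */
int close_open_tags(const char *in, char *out, size_t out_size)
{
    char stack[MAX_DEPTH][MAX_NAME];  /* names of currently open elements */
    int depth = 0;
    size_t len = strlen(in);

    if (len + 1 > out_size)
        return -1;
    memcpy(out, in, len + 1);

    for (const char *p = in; *p; p++) {
        if (*p != '<')
            continue;

        int closing = (p[1] == '/');
        const char *q = p + 1 + closing;
        char name[MAX_NAME];
        size_t n = 0;

        /* Read the tag name, lower-cased so that <B> matches </b>. */
        while (isalnum((unsigned char)*q) && n < MAX_NAME - 1)
            name[n++] = (char)tolower((unsigned char)*q++);
        name[n] = '\0';

        if (n == 0)
            continue;  /* not a tag name, e.g. "<!DOCTYPE" or a stray '<' */

        if (closing) {
            /* Pop only when the end tag matches the innermost open element. */
            if (depth > 0 && strcmp(stack[depth - 1], name) == 0)
                depth--;
        } else {
            if (depth == MAX_DEPTH)
                return -1;
            strcpy(stack[depth++], name);
        }
    }

    /* Close whatever is still open, innermost element first. */
    while (depth > 0) {
        const char *open_name = stack[--depth];
        size_t need = strlen(open_name) + 3;  /* "</" + name + ">" */

        if (len + need + 1 > out_size)
            return -1;
        len += (size_t)sprintf(out + len, "</%s>", open_name);
    }
    return 0;
}
```

Called with the input `<p><b>bold text` and a sufficiently large buffer, this sketch writes `<p><b>bold text</b></p>`; well-formed input is copied through unchanged. A real repair tool has to handle far more cases, such as mismatched or misnested end tags, optional end tags, and void elements.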

References

  1. https://github.com/htacg/tidy-html5/releases/tag/5.8.0. 16 July 2021. Retrieved 25 November 2021.
  2. Raggett, Dave. "Clean up your Web pages with HTML TIDY". W3C.org. Retrieved 12 February 2015. (Dave Raggett's legacy HTML Tidy page.)
  3. "SourceForge.net Repository - [tidy] Index of /". Tidy.cvs.sourceforge.net. Retrieved 25 April 2015.
  4. tidy-html5 on GitHub.
  5. Smith, Michael. "Michael[tm] Smith". W3C.org. Retrieved 12 February 2015.
  6. "HTACG". HTACG.org. Retrieved 25 April 2015.
  7. Derry, Jim (15 January 2015). "HTML Tidy Advocacy Community Group". W3.org. Retrieved 25 April 2015.