HTML Tidy

Last updated
HTML Tidy
Original author(s) Dave Raggett
Developer(s) HTML Tidy Advocacy Community Group
Stable release
5.8.0 [1] / 16 July 2021;2 years ago (16 July 2021)
Repository
Written in C
Operating system BSD, Linux , macOS , Microsoft Windows
Type Library, Console Application
License W3C Software License
Website www.html-tidy.org   OOjs UI icon edit-ltr-progressive.svg

HTML Tidy is a console application for correcting invalid HyperText Markup Language (HTML), detecting potential web accessibility errors, and for improving the layout and indent style of the resulting markup. It is also a cross-platform library for computer applications that provides HTML Tidy's features.

Contents

History

HTML Tidy was developed by Dave Raggett [2] of the World Wide Web Consortium (W3C). Later it was released as a SourceForge project in 2003 and managed by various maintainers. [3]

In 2012, the project was moved to GitHub, [4] and maintained by Michael Smith, also of W3C, [5] where HTML5 support was added.

In 2015, the HTML Tidy Advocacy Community Group (HTACG) was formed for management and development of HTML Tidy as a W3C Community Group. [6] [7]

HTML Tidy source code is written in ANSI C for portability. Compiled binary files are available for a variety of platforms. It is available under the W3C Software Notice and License, a permissive BSD-style license. Up-to-date versions are available as source code cloned from its GitHub Git version control repository, or in binary packages for multiple operating systems from its GitHub Releases repository.

Features

Examples of corrections to invalid or poorly constructed HTML:

See also

Related Research Articles

<span class="mw-page-title-main">HTML</span> HyperText Markup Language

The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It defines the meaning and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

A HTML editor is a program used for editing HTML, the markup of a web page. Although the HTML markup in a web page can be controlled with any text editor, specialized HTML editors can offer convenience, added functionality, and organisation. For example, many HTML editors handle not only HTML, but also related technologies such as CSS, XML and JavaScript or ECMAScript. In some cases they also manage communication with remote web servers via FTP and WebDAV, and version control systems such as Subversion or Git. Many word processing, graphic design and page layout programs that are not dedicated to web design, such as Microsoft Word or Quark XPress, also have the ability to function as HTML editors.

An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The most commonly used version is HTML 4.01, which became official standard in December 1999. An HTML document is composed of a tree of simple HTML nodes, such as text nodes, and HTML elements, which add semantics and formatting to parts of document. Each element can have HTML attributes specified. Elements can also have content, including other elements and text.

Web standards are the formal, non-proprietary standards and other technical specifications that define and describe aspects of the World Wide Web. In recent years, the term has been more frequently associated with the trend of endorsing a set of standardized best practices for building web sites, and a philosophy of web design and development that includes those methods.

In web development, "tag soup" is a pejorative for syntactically or structurally incorrect HTML written for a web page. Because web browsers have historically treated structural or syntax errors in HTML leniently, there has been little pressure for web developers to follow published standards, and therefore there is a need for all browser implementations to provide mechanisms to cope with the appearance of "tag soup", accepting and correcting for invalid syntax and structure where possible.

A lightweight markup language (LML), also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to write using any generic text editor and easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output.

<span class="mw-page-title-main">W3C Markup Validation Service</span> Validator service by the World Wide Web Consortium

The Markup Validation Service is a validator by the World Wide Web Consortium (W3C) that allows Internet users to check pre-HTML5 HTML and XHTML documents for well-formed markup against a document type definition. Markup validation is an important step towards ensuring the technical quality of web pages. However, it is not a complete measure of web standards conformance. Though W3C validation is important for browser compatibility and site usability, it has not been confirmed what effect it has on search engine optimization.

A source-code-hosting facility is a file archive and web hosting facility for source code of software, documentation, web pages, and other works, accessible either publicly or privately. They are often used by open-source software projects and other multi-developer projects to maintain revision and version history, or version control. Many repositories provide a bug tracking system, and offer release management, mailing lists, and wiki-based project documentation. Software authors generally retain their copyright when software is posted to a code hosting facilities.

<span class="mw-page-title-main">Markdown</span> Plain text markup language

Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as a markup language that is easy to read in its source code form. Markdown is widely used for blogging and instant messaging, and also used elsewhere in online forums, collaborative software, documentation pages, and readme files.

<span class="mw-page-title-main">Mercurial</span> Distributed revision-control tool for software developers

Mercurial is a distributed revision control tool for software developers. It is supported on Microsoft Windows and Unix-like systems, such as FreeBSD, macOS, and Linux.

In HTML, div and span tags are elements used to define parts of a document, so that they are identifiable when a unique classification is necessary. Where other HTML elements such as p (paragraph), em (emphasis), and so on, accurately represent the semantics of the content, the additional use of span and div tags leads to better accessibility for readers and easier maintainability for authors. Where no existing HTML element is applicable, span and div can valuably represent parts of a document so that HTML attributes such as class, id, lang, or dir can be applied.

Strikethrough is a typographical presentation of words with a horizontal line through their center, resulting in text like this. Contrary to censored or sanitized (redacted) texts, the words remain readable. This presentation signifies one of two meanings. In ink-written, typewritten, or other non-erasable text, the words are a mistake and not meant for inclusion. When used on a computer screen, however, it indicates deleted information, as popularized by Microsoft Word's revision and track changes features. It can also be used deliberately to imply a change of thought.

<span class="mw-page-title-main">HTML5</span> Fifth and current version of hypertext markup language

HTML5 is a markup language used for structuring and presenting content on the World Wide Web. It is the fifth and final major HTML version that is a World Wide Web Consortium (W3C) recommendation. The current specification is known as the HTML Living Standard. It is maintained by the Web Hypertext Application Technology Working Group (WHATWG), a consortium of the major browser vendors.

JSDoc is a markup language used to annotate JavaScript source code files. Using comments containing JSDoc, programmers can add documentation describing the application programming interface of the code they're creating. This is then processed, by various tools, to produce documentation in accessible formats like HTML and Rich Text Format. The JSDoc specification is released under CC BY-SA 3.0, while its companion documentation generator and parser library is free software under the Apache License 2.0.

<span class="mw-page-title-main">Semantic HTML</span> HTML used to reinforce meaning of documents or webpages

Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in web pages and web applications rather than merely to define its presentation or look. Semantic HTML is processed by traditional web browsers as well as by many other user agents. CSS is used to suggest its presentation to human users.

Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.

Content Security Policy (CSP) is a computer security standard introduced to prevent cross-site scripting (XSS), clickjacking and other code injection attacks resulting from execution of malicious content in the trusted web page context. It is a Candidate Recommendation of the W3C working group on Web Application Security, widely supported by modern web browsers. CSP provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website—covered types are JavaScript, CSS, HTML frames, web workers, fonts, images, embeddable objects such as Java applets, ActiveX, audio and video files, and other HTML5 features.

HTML parsers are software for automated Hypertext Markup Language (HTML) parsing. They have two main purposes:

A document type declaration, or DOCTYPE, is an instruction that associates a particular XML or SGML document with a document type definition (DTD). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.

Crosswalk Project was an open-source web app runtime built with the latest releases of Chromium and Blink from Google. The project was founded by Intel's Open Source Technology Center in September 2013.

References

  1. Error: Unable to display the reference properly. See the documentation for details.
  2. Raggett, Dave. "Clean up your Web pages with HTML TIDY". W3C.org. Retrieved 2015-02-12. (Dave Raggett's legacy HTML Tidy page.)
  3. "SourceForge.net Repository - [tidy] Index of /". Tidy.cvs.sourceforge.net. Retrieved 2015-04-25.
  4. tidy-html5 on GitHub
  5. Smith, Michael. "Michael[tm] Smith". W3C.org. Retrieved 2015-02-12.
  6. "HTACG". HTACG.org. Retrieved 2015-04-25.
  7. Jim Derry (15 January 2015). "HTML Tidy Advocacy Community Group". W3.org. Retrieved 2015-04-25.