This article has multiple issues. Please help improve it or discuss these issues on the talk page . (Learn how and when to remove these messages)
|
Initial release | May 22, 2002 |
---|---|
Stable release | 4.4.0 / July 28, 2024 |
Repository | |
Written in | Java |
Operating system | Cross-platform (JVM) |
Available in | English |
Type | Web browser |
License | Apache License 2.0 |
Website | https://htmlunit.sourceforge.io/ |
HtmlUnit is a headless web browser written in Java. It allows high-level manipulation of websites from other Java code, including filling and submitting forms and clicking hyperlinks. It also provides access to the structure and the details within received web pages. HtmlUnit emulates parts of browser behaviour including the lower-level aspects of TCP/IP and HTTP. A sequence such as getPage(url)
, getLinkWith("Click here")
, click()
allows a user to navigate through hypertext and obtain web pages that include HTML, JavaScript, Ajax and cookies. This headless browser can deal with HTTPS security, basic HTTP authentication, automatic page redirection and other HTTP headers. It allows Java test code to examine returned pages either as text, an XML DOM, or as collections of forms, tables, and links. [1]
The goal is to simulate real browsers; namely Chrome, Firefox and Edge.
The most common use of HtmlUnit is test automation of web pages, but sometimes it can be used for web scraping, or downloading website content.
The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.
Dynamic HTML, or DHTML, is a term which was used by some browser vendors to describe the combination of HTML, style sheets and client-side scripts that enabled the creation of interactive and animated documents. The application of DHTML was introduced by Microsoft with the release of Internet Explorer 4 in 1997.
A bookmarklet is a bookmark stored in a web browser that contains JavaScript commands that add new features to the browser. They are stored as the URL of a bookmark in a web browser or as a hyperlink on a web page. Bookmarklets are usually small snippets of JavaScript executed when user clicks on them. When clicked, bookmarklets can perform a wide variety of operations, such as running a search query from selected text or extracting data from a table.
Cross-site scripting (XSS) is a type of security vulnerability that can be found in some web applications. XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same-origin policy. During the second half of 2007, XSSed documented 11,253 site-specific cross-site vulnerabilities, compared to 2,134 "traditional" vulnerabilities documented by Symantec. XSS effects vary in range from petty nuisance to significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site's owner network.
Ajax is a set of web development techniques that uses various web technologies on the client-side to create asynchronous web applications. With Ajax, web applications can send and retrieve data from a server asynchronously without interfering with the display and behaviour of the existing page. By decoupling the data interchange layer from the presentation layer, Ajax allows web pages and, by extension, web applications, to change content dynamically without the need to reload the entire page. In practice, modern implementations commonly utilize JSON instead of XML.
A dynamic web page is a web page constructed at runtime, as opposed to a static web page, delivered as it is stored. A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and including the setting up of more client-side processing. A client-side dynamic web page processes the web page using JavaScript running in the browser as it loads. JavaScript can interact with the page via Document Object Model (DOM), to query page state and modify it. Even though a web page can be dynamic on the client-side, it can still be hosted on a static hosting service such as GitHub Pages or Amazon S3 as long as there is not any server-side code included.
In HTML, the standard markup language for documents designed to be displayed in a web browser, <div>
and <span>
tags are elements used to define parts of a document, so that they are identifiable when a unique classification is necessary. Where other HTML elements such as <p>
(paragraph), <em>
(emphasis), and so on, accurately represent the semantics of the content, the additional use of <span>
and <div>
tags leads to better accessibility for readers and easier maintainability for authors. Where no existing HTML element is applicable, <span>
and <div>
can valuably represent parts of a document so that HTML attributes such as class
, id
, lang
, or dir
can be applied.
Progressive enhancement is a strategy in web design that puts emphasis on web content first, allowing everyone to access the basic content and functionality of a web page, whilst users with additional browser features or faster Internet access receive the enhanced version instead. This strategy speeds up loading and facilitates crawling by web search engines, as text on a page is loaded immediately through the HTML source code rather than having to wait for JavaScript to initiate and load the content subsequently, meaning content ready for consumption "out of the box" is served immediately, and not behind additional layers.
Comet is a web application model in which a long-held HTTPS request allows a web server to push data to a browser, without the browser explicitly requesting it. Comet is an umbrella term, encompassing multiple techniques for achieving this interaction. All these methods rely on features included by default in browsers, such as JavaScript, rather than on non-default plugins. The Comet approach differs from the original model of the web, in which a browser requests a complete web page at a time.
Selenium is an open source umbrella project for a range of tools and libraries aimed at supporting browser automation. It provides a playback tool for authoring functional tests across most modern web browsers, without the need to learn a test scripting language. It also provides a test domain-specific language (Selenese) to write tests in a number of popular programming languages, including JavaScript (Node.js), C#, Groovy, Java, Perl, PHP, Python, Ruby and Scala. Selenium runs on Windows, Linux, and macOS. It is open-source software released under the Apache License 2.0.
Google Web Toolkit, or GWT Web Toolkit, is an open-source set of tools that allows web developers to create and maintain JavaScript front-end applications in Java. It is licensed under Apache License 2.0.
Aptana, Inc. is a company that makes web application development tools for use with a variety of programming languages. Aptana's main products include Aptana Studio, Aptana Cloud and Aptana Jaxer.
The Prototype JavaScript Framework is a JavaScript framework created by Sam Stephenson in February 2005 as part of Ajax support in Ruby on Rails. It is implemented as a single file of JavaScript code, usually named prototype.js
. Prototype is distributed standalone, but also as part of larger projects, such as Ruby on Rails, script.aculo.us and Rico. As of March 2021, according to w3techs, Prototype is used by 0.6% of all websites.
The Yahoo! User Interface Library (YUI) is a discontinued open-source JavaScript library for building richly interactive web applications using techniques such as Ajax, DHTML, and DOM scripting. YUI includes several cores CSS resources. It is available under a BSD License. Development on YUI began in 2005 and Yahoo! properties such as My Yahoo! and the Yahoo! front page began using YUI in the summer of that year. YUI was released for public use in February 2006. It was actively developed by a core team of Yahoo! engineers.
jQuery is a JavaScript library designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animations, and Ajax. It is free, open-source software using the permissive MIT License. As of August 2022, jQuery is used by 77% of the 10 million most popular websites. Web analysis indicates that it is the most widely deployed JavaScript library by a large margin, having at least three to four times more usage than any other JavaScript library.
Firebug is a discontinued free and open-source web browser extension for Mozilla Firefox that facilitated the live debugging, editing, and monitoring of any website's CSS, HTML, DOM, XHR, and JavaScript.
ThunderHawk is a discontinued web browser from Bitstream available for a full range of operating systems in high end and mass-market mobile phones and personal digital assistants. It is basically meant for mobile operators and original equipment manufacturers and not meant to download for normal users.
A single-page application (SPA) is a web application or website that interacts with the user by dynamically rewriting the current web page with new data from the web server, instead of the default method of loading entire new pages. The goal is faster transitions that make the website feel more like a native app.
A headless browser is a web browser without a graphical user interface.
Capybara is a web-based test automation software that simulates scenarios for user stories and automates web application testing for behavior-driven software development. It is written in the Ruby programming language.