HtmlUnit

Last updated
HtmlUnit
Initial releaseMay 22, 2002 (2002-05-22)
Stable release
4.1.0 / April 28, 2024;3 days ago (2024-04-28)
Repository
Written in Java
Operating system Cross-platform (JVM)
Available in English
Type Web browser
License Apache License 2.0
Website https://htmlunit.sourceforge.io/

HtmlUnit is a headless web browser written in Java. It allows high-level manipulation of websites from other Java code, including filling and submitting forms and clicking hyperlinks. It also provides access to the structure and the details within received web pages. HtmlUnit emulates parts of browser behaviour including the lower-level aspects of TCP/IP and HTTP. A sequence such as getPage(url), getLinkWith("Click here"), click() allows a user to navigate through hypertext and obtain web pages that include HTML, JavaScript, Ajax and cookies. This headless browser can deal with HTTPS security, basic HTTP authentication, automatic page redirection and other HTTP headers. It allows Java test code to examine returned pages either as text, an XML DOM, or as collections of forms, tables, and links. [1]

Contents

The goal is to simulate real browsers; namely Chrome, Firefox and Edge.

The most common use of HtmlUnit is test automation of web pages, but sometimes it can be used for web scraping, or downloading website content.

Benefits

Drawbacks

Used technologies

Libraries using HtmlUnit

See also

Related Research Articles

<span class="mw-page-title-main">Document Object Model</span> Convention for representing and interacting with objects in HTML, XHTML, and XML documents

The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.

Dynamic HTML, or DHTML, is a term which was used by some browser vendors to describe the combination of HTML, style sheets and client-side scripts that enabled the creation of interactive and animated documents. The application of DHTML was introduced by Microsoft with the release of Internet Explorer 4 in 1997.

<span class="mw-page-title-main">Bookmarklet</span> Web browser bookmark containing JavaScript code

A bookmarklet is a bookmark stored in a web browser that contains JavaScript commands that add new features to the browser. They are stored as the URL of a bookmark in a web browser or as a hyperlink on a web page. Bookmarklets are usually small snippets of JavaScript executed when user clicks on them. When clicked, bookmarklets can perform a wide variety of operations, such as running a search query from selected text or extracting data from a table.

Cross-site scripting (XSS) is a type of security vulnerability that can be found in some web applications. XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same-origin policy. During the second half of 2007, XSSed documented 11,253 site-specific cross-site vulnerabilities, compared to 2,134 "traditional" vulnerabilities documented by Symantec. XSS effects vary in range from petty nuisance to significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site's owner network.

In software engineering, the terms frontend and backend refer to the separation of concerns between the presentation layer (frontend), and the data access layer (backend) of a piece of software, or the physical infrastructure or hardware. In the client–server model, the client is usually considered the frontend and the server is usually considered the backend, even when some presentation work is actually done on the server itself.

In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive but necessary tasks in a formalized testing process already in place, or perform additional testing that would be difficult to do manually. Test automation is critical for continuous delivery and continuous testing.

Ajax is a set of web development techniques that uses various web technologies on the client-side to create asynchronous web applications. With Ajax, web applications can send and retrieve data from a server asynchronously without interfering with the display and behaviour of the existing page. By decoupling the data interchange layer from the presentation layer, Ajax allows web pages and, by extension, web applications, to change content dynamically without the need to reload the entire page. In practice, modern implementations commonly utilize JSON instead of XML.

In HTML, <div> and <span> tags are elements used to define parts of a document, so that they are identifiable when a unique classification is necessary. Where other HTML elements such as <p> (paragraph), <em> (emphasis), and so on, accurately represent the semantics of the content, the additional use of <span> and <div> tags leads to better accessibility for readers and easier maintainability for authors. Where no existing HTML element is applicable, <span> and <div> can valuably represent parts of a document so that HTML attributes such as class, id, lang, or dir can be applied.

<span class="mw-page-title-main">Progressive enhancement</span> Web design strategy putting emphasis web content first

Progressive enhancement is a strategy in web design that puts emphasis on web content first, allowing everyone to access the basic content and functionality of a web page, while users with additional browser features or faster Internet access receive the enhanced version instead. This strategy speeds up loading and facilitates crawling by web search engines, as text on a page is loaded immediately through the HTML source code rather than having to wait for JavaScript to initiate and load the content subsequently, meaning content ready for consumption "out of the box" is served immediately, and not behind additional layers.

Selenium is an open source umbrella project for a range of tools and libraries aimed at supporting browser automation. It provides a playback tool for authoring functional tests across most modern web browsers, without the need to learn a test scripting language. It also provides a test domain-specific language (Selenese) to write tests in a number of popular programming languages, including JavaScript (Node.js), C#, Groovy, Java, Perl, PHP, Python, Ruby and Scala. Selenium runs on Windows, Linux, and macOS. It is open-source software released under the Apache License 2.0.

<span class="mw-page-title-main">Google Web Toolkit</span> Free Java library

Google Web Toolkit, or GWT Web Toolkit, is an open-source set of tools that allows web developers to create and maintain JavaScript front-end applications in Java. It is licensed under Apache License 2.0.

<span class="mw-page-title-main">Aptana</span> Text editor

Aptana, Inc. is a company that makes web application development tools for use with a variety of programming languages. Aptana's main products include Aptana Studio, Aptana Cloud and Aptana Jaxer.

<span class="mw-page-title-main">YUI Library</span>

The Yahoo! User Interface Library (YUI) is a discontinued open-source JavaScript library for building richly interactive web applications using techniques such as Ajax, DHTML, and DOM scripting. YUI includes several cores CSS resources. It is available under a BSD License. Development on YUI began in 2005 and Yahoo! properties such as My Yahoo! and the Yahoo! front page began using YUI in the summer of that year. YUI was released for public use in February 2006. It was actively developed by a core team of Yahoo! engineers.

jQuery is a JavaScript library designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animations, and Ajax. It is free, open-source software using the permissive MIT License. As of August 2022, jQuery is used by 77% of the 10 million most popular websites. Web analysis indicates that it is the most widely deployed JavaScript library by a large margin, having at least three to four times more usage than any other JavaScript library.

QF-Test from Quality First Software is a cross-platform software tool for automated testing of programs via the graphical user interface. The program is specialized on cross-browser test automation of static and dynamic web-based applications. Version 4.1 added support for MacOS and the Apple Safari and Microsoft Edge browsers via the Selenium WebDriver. RESTful web service testing. From version 5.0, Windows applications can also be tested and modern C++ applications. Version 5.3 added support for the Chrome DevTools protocol, which allows browsers to be controlled using CDP drivers.

<span class="mw-page-title-main">Firebug (software)</span> Web development add-on for Firefox

Firebug is a discontinued free and open-source web browser extension for Mozilla Firefox that facilitated the live debugging, editing, and monitoring of any website's CSS, HTML, DOM, XHR, and JavaScript.

Front-end web development is the development of the graphical user interface of a website through the use of HTML, CSS, and JavaScript so users can view and interact with that website.

A headless browser is a web browser without a graphical user interface.

Capybara is a web-based test automation software that simulates scenarios for user stories and automates web application testing for behavior-driven software development. It is written in the Ruby programming language.

<span class="mw-page-title-main">Vue.js</span> Open-source JavaScript library for building user interfaces

Vue.js is an open-source model–view–viewmodel front end JavaScript library for building user interfaces and single-page applications. It was created by Evan You, and is maintained by him and the rest of the active core team members.

References

  1. "HtmlUnit Home". Sourceforge. Retrieved 30 August 2019.
  2. Beust, Cédric; Suleiman, Hani (15 October 2007). Next Generation Java Testing: TestNG and Advanced Concepts. Pearson Education. ISBN   9780132702270 . Retrieved 30 August 2019.
  3. "HtmlUnit Driver". Github. Retrieved 30 August 2019.
  4. "Testing HTML Unit". GWT Project. Retrieved 30 August 2019.

Bibliography

Further reading