Headless browser

A headless browser is a web browser without a graphical user interface.

Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or over a network connection. They are particularly useful for testing web pages because they render and interpret HTML the same way a browser would, including styling (page layout, color and font selection) and the execution of JavaScript and Ajax, which are usually not available with other testing methods. [1]

Google Chrome (since version 59 [2] [3]) and Firefox (since version 56 [4] [5]) natively support remote control of the browser. This made earlier efforts, notably PhantomJS, obsolete. [6]
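
As a rough illustration of command-line control, the Python sketch below builds an argument list that asks Chrome to load a page without a window and print the resulting DOM. The `--headless` and `--dump-dom` switches are Chrome command-line flags; the binary name `google-chrome` and the helper names `headless_chrome_command` and `dump_dom` are assumptions made for this example and will differ between systems.

```python
import subprocess


def headless_chrome_command(url, binary="google-chrome"):
    """Build the argument list for a headless Chrome run that prints the DOM.

    `binary` is an assumed executable name; adjust it for the local install.
    """
    return [binary, "--headless", "--dump-dom", url]


def dump_dom(url, binary="google-chrome", timeout=30):
    """Run Chrome headlessly and return whatever it writes to standard output."""
    result = subprocess.run(
        headless_chrome_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

Calling `dump_dom("https://example.com")` should return the page's HTML as a string after scripts have run, provided a Chrome binary is installed under the assumed name.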

Use cases

The main use cases for headless browsers are:

Other uses

Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help its search engine index content from websites that use Ajax. [7]

Headless browsers have also been misused in various ways:

However, a study of browser traffic in 2018 found no preference by malicious actors for headless browsers. [3] There is no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes, like DDoS attacks, SQL injections or cross-site scripting attacks.

Usage

As several major browsers natively support headless mode through APIs, some software exists to perform browser automation through a unified interface. These include:

Test automation

Some test automation software and frameworks include headless browsers as part of their testing apparatus. [3]
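
One way such a test might look is sketched below with Selenium's Python bindings driving Chrome in headless mode. This is an illustrative sketch, not a description of any particular framework's internals: the function names `make_headless_chrome` and `page_title` are hypothetical, and the code assumes the third-party `selenium` package and a local Chrome installation are available.

```python
def make_headless_chrome():
    """Create a Selenium-controlled Chrome instance in headless mode.

    Assumes the third-party `selenium` package is installed; it is imported
    lazily so that the rest of this module loads without it.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def page_title(url, driver_factory=make_headless_chrome):
    """Load `url` in a browser and return the document title.

    `driver_factory` is injectable so a test can substitute a stand-in driver.
    """
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

A test could then assert on the returned title, for example comparing `page_title(url)` against the title the page is expected to have. Passing the driver factory as a parameter keeps the check itself independent of which browser is used.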

Alternatives

Another approach is to use software that provides browser APIs. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom [17] is the most complete provider. While most are able to support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than full browsers, but are unable to correctly interpret many popular websites. [18] [19] [20]
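
The limitation can be illustrated with a deliberately simpler stand-in: Python's standard-library `html.parser`, which parses markup but never executes scripts. This is not jsdom and supports far less, but it shows why content generated by JavaScript is invisible to tools that do not run the page. The class name, helper function and sample pages are made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text outside <script> and <style>; scripts are never executed."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the static text of a page, joined with single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample pages: one static, one filled in by a script at load time.
STATIC_PAGE = "<html><body><h1>Hello</h1><p>Static text</p></body></html>"
SCRIPTED_PAGE = (
    "<html><body><div id='app'></div>"
    "<script>document.getElementById('app').textContent = 'Hello';</script>"
    "</body></html>"
)
```

Here `visible_text(STATIC_PAGE)` yields `"Hello Static text"`, while `visible_text(SCRIPTED_PAGE)` yields an empty string, because the text would only exist after the script had run in a real or headless browser.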

Another is HtmlUnit, a headless browser written in Java. HtmlUnit uses the Rhino engine to provide JavaScript and Ajax support as well as partial rendering capability. [21] [22]

List of headless browsers

Various software packages provide headless browser APIs.

Another notable earlier effort was envjs, released in 2008 by John Resig, a simulated browser environment written in JavaScript for the Rhino engine. [29]

See also

References

  1. "What is a headless browser?". arhg.net. 7 October 2009.
  2. "Getting Started with Headless Chrome". developers.google.com. 27 April 2017.
  3. Bekerman, Dima (2018-11-28). "Headless Chrome: DevOps Love It, So Do Hackers, Here's Why | Imperva". Blog. Retrieved 2021-02-22.
  4. "Firefox 56 release notes". developer.mozilla.org. 26 February 2023.
  5. "Headless mode - browser support". developer.mozilla.org. Archived from the original on 2018-06-03. Retrieved 2017-08-31.
  6. "Quick Start". phantomjs.org.
  7. Mueller, John (2009-10-07). "Official Google Webmaster Central Blog: A proposal for making AJAX crawlable". Official Google Webmaster Central Blog.
  8. Rawlings, Matt (2013-11-20). "Headless Browser Botnet Used in 150 hour DDoS attack". Business 2 Community.
  9. Mello Jr., John P. (2014-03-25). "Headless Web Traffic Threatens Internet Economy". ecommercetimes.com.
  10. Raywood, Dan (2014-04-01). "Headless browsers: legitimate software that enables attack". ITProPortal.
  11. Mueller, Neal. "Credential stuffing". owasp.org.
  12. Sheth, Himanshu (2020-11-17). "Selenium 4 Is Now W3C Compliant: All You Need To Know".
  13. "GitHub - Playwright". GitHub. Retrieved 2021-04-11.
  14. "Github - Puppeteer". GitHub. Retrieved 2021-04-11.
  15. Silva, Francisco (2019-05-29). "From capybara-webkit to Headless Chrome and ChromeDriver". Blog | Imaginary Cloud. Retrieved 2021-02-22.
  16. Bintz, John. "jasmine-headless-webkit -- The fastest way to run your Jasmine specs!". johnbintz.github.io. Retrieved 2021-02-22.
  17. "JSDOM at GitHub - Pretending to be a visual browser". GitHub. Retrieved 2021-04-18.
  18. "assaf/zombie". GitHub.
  19. "ヘルペスが口や目からうつる?感染した時の症状と病院の治療方法とは". www.envjs.com. Archived from the original on 2015-02-23. Retrieved 2015-03-13.
  20. "JavaScriptMVC - EnvJS". javascriptmvc.com.
  21. Mike Bowler. "HtmlUnit – Welcome to HtmlUnit". sourceforge.net.
  22. "Platform (Vaadin 7.3.4 API)". vaadin.com. 6 November 2014.
  23. "scrapinghub/splash". GitHub. 20 December 2021.
  24. "DARPA - Open Catalog". Archived from the original on 2015-05-28. Retrieved 2015-05-28.
  25. "Zombie". labnotes.org.
  26. "SimpleBrowserDotNet/SimpleBrowser". SimpleBrowserDotNet. 2021-02-10. Retrieved 2021-02-22.
  27. "DotNetBrowser Examples". TeamDev. 2021-03-12. Retrieved 2021-03-12.
  28. "DotNetBrowser". TeamDev. 2021-05-05.
  29. Resig, John (2008-10-12). "env-js: A pure-JavaScript browser environment" via GitHub.