Crawljax

Last updated

Crawljax is a free and open source web crawler for automatically crawling and analyzing dynamic Ajax-based Web applications. [1]

One major point of difference between Crawljax and other traditional web crawlers is that Crawljax is an event-driven dynamic crawler, capable of exploring JavaScript-based DOM state changes. Crawljax can be used to crawl and create a static mirror of any Ajax web application. [2]

Related Research Articles

Dynamic HTML, or DHTML, is a term which was used by some browser vendors to describe the combination of HTML, style sheets and client-side scripts that enabled the creation of interactive and animated documents. The application of DHTML was introduced by Microsoft with the release of Internet Explorer 4 in 1997.

<span class="mw-page-title-main">Web crawler</span> Software which systematically browses the World Wide Web

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.

<span class="mw-page-title-main">World Wide Web</span> Linked hypertext system on the Internet

The World Wide Web is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).

Distributed web crawling is a distributed computing technique whereby Internet search engines employ many computers to index the Internet via web crawling. Such systems may allow for users to voluntarily offer their own computing and bandwidth resources towards crawling web pages. By spreading the load of these tasks across many computers, costs that would otherwise be spent on maintaining large computing clusters are avoided.

<span class="mw-page-title-main">Web application</span> Application that uses a web browser as a client

A web application is application software that is accessed using a web browser. Web applications are delivered on the World Wide Web to users with an active network connection.

<span class="mw-page-title-main">Googlebot</span> Web crawler used by Google

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.

Jakarta Faces, formerly Jakarta Server Faces and JavaServer Faces (JSF) is a Java specification for building component-based user interfaces for web applications. It was formalized as a standard through the Java Community Process as part of the Java Platform, Enterprise Edition. It is an MVC web framework that simplifies the construction of user interfaces (UI) for server-based applications by using reusable UI components in a page.

Ajax is a set of web development techniques that uses various web technologies on the client-side to create asynchronous web applications. With Ajax, web applications can send and retrieve data from a server asynchronously without interfering with the display and behaviour of the existing page. By decoupling the data interchange layer from the presentation layer, Ajax allows web pages and, by extension, web applications, to change content dynamically without the need to reload the entire page. In practice, modern implementations commonly utilize JSON instead of XML.

<span class="mw-page-title-main">Dynamic web page</span> Type of web page

A dynamic web page is a web page constructed at runtime, as opposed to a static web page, delivered as it is stored. A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and including the setting up of more client-side processing. A client-side dynamic web page processes the web page using JavaScript running in the browser as it loads. JavaScript can interact with the page via Document Object Model (DOM), to query page state and modify it. Even though a web page can be dynamic on the client-side, it can still be hosted on a static hosting service such as GitHub Pages or Amazon S3 as long as there is not any server-side code included.

A web framework (WF) or web application framework (WAF) is a software framework that is designed to support the development of web applications including web services, web resources, and web APIs. Web frameworks provide a standard way to build and deploy web applications on the World Wide Web. Web frameworks aim to automate the overhead associated with common activities performed in web development. For example, many web frameworks provide libraries for database access, templating frameworks, and session management, and they often promote code reuse. Although they often target development of dynamic web sites, they are also applicable to static websites.

Echo is a web application framework created by the company NextApp. The latest iteration, Echo3, allows writing applications in either server-side Java or client-side JavaScript. Server-side applications do not require developer knowledge of HTML, HTTP, or JavaScript. Client-side JavaScript-based applications do not require a server, but can communicate with one via AJAX.

MochiKit is a light-weight JavaScript library written and maintained by Bob Ippolito.

Comet is a web application model in which a long-held HTTPS request allows a web server to push data to a browser, without the browser explicitly requesting it. Comet is an umbrella term, encompassing multiple techniques for achieving this interaction. All these methods rely on features included by default in browsers, such as JavaScript, rather than on non-default plugins. The Comet approach differs from the original model of the web, in which a browser requests a complete web page at a time.

<span class="mw-page-title-main">Google Web Toolkit</span> Free Java library

Google Web Toolkit, or GWT Web Toolkit, is an open-source set of tools that allows web developers to create and maintain JavaScript front-end applications in Java. It is licensed under Apache License 2.0.

Google Developers is Google's site for software development tools and platforms, application programming interfaces (APIs), and technical resources. The site contains documentation on using Google developer tools and APIs—including discussion groups and blogs for developers using Google's developer products.

A JavaScript library is a library of pre-written JavaScript code that allows for easier development of JavaScript-based applications, especially for AJAX and other web-centric technologies. They can be included in a website by embedding it directly in the HTML via a script tag.

<span class="mw-page-title-main">RichFaces</span>

RichFaces was an open source Ajax-enabled component library for JavaServer Faces, hosted by JBoss. It allows easy integration of Ajax capabilities into enterprise application development. It reached its end-of-life in June 2016.

A single-page application (SPA) is a web application or website that interacts with the user by dynamically rewriting the current web page with new data from the web server, instead of the default method of a web browser loading entire new pages. The goal is faster transitions that make the website feel more like a native app.

ZK is an open-source Ajax Web application framework, written in Java, that enables creation of graphical user interfaces for Web applications with little required programming knowledge.

References

  1. Mesbah, Ali; Van Deursen, Arie; Lenselink, Stefan (2012). "Crawling Ajax-Based Web Applications through Dynamic Analysis of User Interface State Changes". ACM Transactions on the Web. 6: 1–30. doi:10.1145/2109205.2109208. S2CID   1351916.
  2. "Understand JavaScript SEO Basics | Google Search Central".