Data Toolbar

Developer(s): DataTool Services
Operating system: Microsoft Windows
Type: Browser toolbar, Web scraping
Website: www.datatoolbar.com

Data Toolbar is a Web scraping add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome Web browsers. It collects structured data from Web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program. [1]
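As a rough illustration of the last step described above (turning extracted records into a table that a spreadsheet can open), the following Python sketch writes a list of records to CSV. The function name `records_to_csv` and the sample fields are hypothetical; this is not Data Toolbar's own export code, and its actual output formats are not described here.

```python
import csv
import io


def records_to_csv(records):
    """Flatten a list of dict records into CSV text, one row per record.

    Columns are the union of all keys in first-seen order, so records
    with missing fields still line up (the gap becomes an empty cell).
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records, as a scraper might extract them from a product list.
rows = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.99", "stock": "12"},
]
print(records_to_csv(rows))
```

Taking the union of keys is one simple way to handle items that expose different fields; a real tool might instead let the user choose the columns.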


Algorithm

The program implements a variation of the generic tree-matching algorithm that accounts for nested lists. [2] That is, within a given Web page, the program recursively traverses the branches of the page's DOM tree to detect nested lists of data items that match the format of the specified content. Because it compares the hierarchical structure of the page rather than treating its HTML as a flat string, this approach is known to have several advantages over simple string matching. [3]
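Data Toolbar's own implementation is not published here, so the following is only a sketch of the general technique. It shows the classic simple tree matching (STM) algorithm, the basic tree-comparison step that generalized tree-matching methods build on, plus a naive detector that walks the tree recursively and groups adjacent, structurally similar siblings as candidate record lists. The `Node` class, the function names, and the 0.8 similarity threshold are hypothetical, and the nested-list refinements of the generalized algorithm are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a DOM element: a tag name and ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Number of node pairs in a maximum order-preserving matching of a and b.

    Roots must share a tag; children are aligned in order with a
    dynamic-programming table, so the page's nesting is respected.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i][j - 1],
                table[i - 1][j],
                table[i - 1][j - 1]
                + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1  # +1 for the matched roots


def size(t: Node) -> int:
    """Total number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Matched nodes relative to the average size of the two trees."""
    return simple_tree_match(a, b) / ((size(a) + size(b)) / 2)


def find_record_lists(node: Node, threshold: float = 0.8) -> list[list[Node]]:
    """Recursively collect runs of two or more adjacent, similar siblings."""
    found = []
    run = node.children[:1]
    for prev, cur in zip(node.children, node.children[1:]):
        if similarity(prev, cur) >= threshold:
            run.append(cur)
        else:
            if len(run) > 1:
                found.append(run)
            run = [cur]
    if len(run) > 1:
        found.append(run)
    for child in node.children:
        found.extend(find_record_lists(child, threshold))
    return found


# Hypothetical page: a heading followed by a list of three similar items.
ul = Node("ul", [Node("li", [Node("a"), Node("span")]) for _ in range(3)])
page = Node("body", [Node("h1"), ul])
runs = find_record_lists(page)
print(len(runs), [n.tag for n in runs[0]])  # prints: 1 ['li', 'li', 'li']
```

A plain string comparison of the three `<li>` elements would be thrown off by differing text or attributes. Comparing tag structure instead lets the repeated items be recognized as one list of records.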

Features

Similar tools

Sources

  1. "A guide to the mortgage banking industry's leading providers of high-tech products and services". The Journal for Mortgage Banking Professionals. Zackin Publications. 25 (2): 14. January 2011.
  2. Laender, Alberto H. F.; Ribeiro-Neto, Berthier A.; da Silva, Altigran S.; Teixeira, Juliana S. "A Brief Survey of Web Data Extraction Tools". ACM SIGMOD Record. 31 (2). 2002.
  3. Jindal, Nitin; Liu, Bing. "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction". Proceedings of the Tenth SIAM International Conference on Data Mining. 2010.
