MetaCrawler

Type of site: Metasearch engine
Available in: English
Owner: System1
URL: metacrawler.com
Registration: No
Launched: July 7, 1995

MetaCrawler is a search engine. It is a registered trademark of InfoSpace and was created by Erik Selberg.


It was originally a metasearch engine, as its name suggests. Throughout its lifetime it combined web search results from sources including Google, Yahoo!, Bing (formerly Live Search), Ask.com, About.com, MIVA, LookSmart and other search engines. MetaCrawler also gave users the option to search for images, video, news, business and personal telephone directories, and, for a time, audio.
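The general idea of combining ranked results from several engines can be sketched as follows. This is a minimal illustration only, not MetaCrawler's actual algorithm: the engine names, the example URLs, the `merge_results` helper, and the reciprocal-rank scoring rule are all assumptions made for the example.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge per-engine ranked URL lists into a single ranked list.

    Hypothetical scoring rule: each URL earns 1/rank from every engine
    that returned it, so URLs ranked highly by several engines rise to
    the top. Duplicate URLs within one engine's list are counted once.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:
                continue
            seen.add(url)
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists standing in for responses from three engines.
sample = {
    "engine_a": ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    "engine_b": ["http://example.com/2", "http://example.com/1"],
    "engine_c": ["http://example.com/2", "http://example.com/4"],
}
merged = merge_results(sample)
for url in merged:
    print(url)
```

A real metasearch service would also have to fetch the result lists over the network, tolerate slow or failing engines, and normalize URLs before deduplicating them; those steps are left out here.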

History

MetaCrawler was originally developed in 1994 at the University of Washington by graduate student Erik Selberg and Professor Oren Etzioni as Selberg's Ph.D. qualifying project. [1] It was created to provide a reliable abstraction layer over web search engines so that semantic structure on the World Wide Web could be studied. However, it proved to be a useful service in its own right and posed a number of research challenges of its own. MetaCrawler was not the first metasearch engine on the World Wide Web. That distinction belongs to SavvySearch, developed at Colorado State University, which launched just four months before MetaCrawler. [2]

MetaCrawler originally ran on four Digital Equipment Corporation AlphaStations [3] and processed several hundred thousand queries per day, which began to place a significant bandwidth load on UW. It became clear that MetaCrawler needed some way of paying for the queries it was forwarding to the primary search engines. Some time after the search engine launched, NetBot, Inc., which was cofounded by Etzioni, [4] was established to commercialize MetaCrawler [5] and three other UW programs: Ahoy! The HomePage Finder, Occam, and ShopBot. Ahoy! and Occam were never commercialized. NetBot then combined the core of MetaCrawler with ShopBot to create a meta-shopping website, Jango. [6]

MetaCrawler launched on July 7, 1995. [7]

MetaCrawler site in 1996

As of late 1995, MetaCrawler logged over 7,000 search queries per week, and accessed six services: Galaxy, InfoSeek, Lycos, Open Text, WebCrawler and Yahoo. [8] By late 1996, there were over 150,000 queries per day. [9]

MetaCrawler's owners were unable to find a workable business model, so in January 1997 they sold it to another Internet startup company, Go2Net, [10] in which Microsoft co-founder Paul Allen later acquired a 54 percent stake. [11] Go2Net went public in April of that year, listing on Nasdaq. [12] MetaCrawler had about 30,000 daily visitors at the start of 1997; by mid-1998 that figure had jumped to 275,000. [13]

Old MetaCrawler logo, used c. 1997 to 2003

NetBot was purchased by Excite in October 1997 for $35 million, and Jango became part of the Excite Network Shopping Channel. [14] Both Selberg and Etzioni returned to work at UW until 1999, when they joined Go2Net for a year, leaving just before Go2Net's acquisition by InfoSpace, Inc. in July 2000 for $4.2 billion. [15] By that time, Go2Net had purchased another metasearch engine, Dogpile. [16]

In 2014, MetaCrawler was merged into another one of InfoSpace's search engines, Zoo.com, [17] which was originally launched in 2006. [18] The MetaCrawler domain at first redirected to Zoo.com, [19] [20] but was afterwards changed to redirect to msxml.excite.com, the search page for Excite, also operated by InfoSpace. [21] [22]

In July 2016, InfoSpace was sold by parent company Blucora to OpenMail for $45 million, putting MetaCrawler under the ownership of OpenMail. [23] OpenMail was later renamed System1. [24]

In 2017, MetaCrawler relaunched as its own search engine. [25]


Related Research Articles

In general computing, a search engine is an information retrieval system designed to help find information stored on a computer system. It is an information retrieval software program that discovers, crawls, transforms, and stores information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. A search engine normally consists of four components, as follows: a search interface, a crawler, an indexer, and a database. The crawler traverses a document collection, deconstructs document text, and assigns surrogates for storage in the search engine index. Online search engines store images, link data and metadata for the document as well.

WebCrawler is a search engine, and one of the oldest surviving search engines on the web today. For many years, it operated as a metasearch engine. WebCrawler was the first web search engine to provide full text search.

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search-engine programs. This is in contrast to the "surface web", which is accessible to anyone using the Internet. Computer scientist Michael K. Bergman is credited with inventing the term in 2001 as a search-indexing term.

Excite (web portal): Internet portal

Excite is an American web portal operated by IAC that provides a variety of outsourced content including news and weather, a metasearch engine, and a user homepage. In the United States, the main Excite homepage had long been a personal start page called My Excite. Excite once operated a webmail service commonly known as Excite Mail until August 31, 2021.

Dogpile: Metasearch engine

Dogpile is a metasearch engine for information on the World Wide Web that fetches results from Google, Yahoo!, Yandex, Bing, and other popular search engines, including those from audio and video content providers such as Yahoo!.

Metasearch engine: Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Startpage.com: Privacy-focused search engine based in the Netherlands

Startpage is a Dutch search engine company that highlights privacy as its distinguishing feature. The website advertises that it allows users to obtain Google Search results while protecting users' privacy by not storing personal information or search data and removing all trackers. Startpage.com also includes an Anonymous View browsing feature that allows users the option to open search results via proxy for increased anonymity.

Singingfish

Singingfish was an audio/video search engine that powered audio and video search for Windows Media Player, WindowsMedia.com, RealOne/RealPlayer, Real Guide, AOL Search, Dogpile, Metacrawler and Singingfish.com, among others. Launched in 2000, it was one of the earliest and longest-lived search engines dedicated to multimedia content. After AOL acquired it in 2003, it was slowly folded into the AOL search offerings, and all web hits from RMC TV to Singingfish were redirected to AOL Video. As of February 2007, Singingfish had ceased to exist as a separate service.

InfoSpace

InfoSpace, Inc. was an American company that offered private-label search engine, online directory, and metadata feed services. The company's flagship metasearch site was Dogpile, and its other notable consumer brands were WebCrawler and MetaCrawler. After the company was renamed Blucora in 2012, the InfoSpace business unit was sold to data management company OpenMail.

Search engine: Software system that is designed to search for information on the World Wide Web

A search engine is a software system that finds web pages that match a web search. It searches the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of hyperlinks to web pages, images, videos, infographics, articles, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories and social bookmarking sites, which are maintained by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Any internet-based content that cannot be indexed and searched by a web search engine falls under the category of deep web.

A comparison shopping website, sometimes called a price comparison website, price analysis tool, comparison shopping agent, shopbot, aggregator or comparison shopping engine, is a vertical search engine that shoppers use to filter and compare products based on price, features, reviews and other criteria. Most comparison shopping sites aggregate product listings from many different retailers but do not directly sell products themselves, instead earning money from affiliate marketing agreements. In the United Kingdom, these services made between £780m and £950m in revenue in 2005. E-commerce accounted for an 18.2 percent share of total business turnover in the United Kingdom in 2012. Online sales already account for 13% of the total UK economy, and it is expected to increase to 15% by 2017. Comparison shopping websites have contributed substantially to the expansion of the e-commerce industry.

Info.com

Info is a metasearch engine, which as of 2013, provided results from search engines Google, Yahoo!, Ask, Bing, Yandex, and Open Directory. As of 2004, news search was powered by Topix.net, Info.com's web search engine information was powered by Shopping.com and Info.com had White Page and Yellow Page search. As of 2013, Info.com also had search plugins for Google Chrome, Internet Explorer and Firefox.

PolyCola, previously known as GahooYoogle, is a metasearch engine which was created by Arbel Hakopian.

LeapFish

LeapFish.com was a search aggregator that retrieved results from other portals and search engines, including Google, Bing and Yahoo!, and also search engines of blogs, videos etc. It was a registered trademark of Dotnext Inc, launched on 3 November 2008.

Netbot was the first commercial Internet price comparison service. Founded by University of Washington Computer Science professors Oren Etzioni and Daniel S. Weld, the company was funded by ARCH Venture Partners, Alta Partners and the Madrona Venture Group, and the University of Washington was also a shareholder. Netbot introduced the Jango comparison shopping “agent” first as a browser plug-in and later as a server product. In addition, the company operated MetaCrawler, a metasearch engine, before licensing it to Go2Net. In October 1997, Netbot was acquired by the Excite portal for $35M.

Oren Etzioni: American computer scientist and entrepreneur

Oren Etzioni is an American entrepreneur, Professor Emeritus of computer science, and founding CEO of the Allen Institute for Artificial Intelligence (AI2). On June 15, 2022, he announced that he will step down as CEO of AI2 effective September 30, 2022. After that time, he will continue as a board member and advisor. Etzioni will also take the position of Technical Director of the AI2 Incubator.

Searx: Metasearch engine

Searx is a free and open-source metasearch engine, available under the GNU Affero General Public License version 3, with the aim of protecting the privacy of its users. To this end, Searx does not share users' IP addresses or search history with the search engines from which it gathers results. Tracking cookies served by the search engines are blocked, preventing user-profiling-based results modification. By default, Searx queries are submitted via HTTP POST, to prevent users' query keywords from appearing in webserver logs. Searx was inspired by the Seeks project, though it does not implement Seeks' peer-to-peer user-sourced results ranking.

Zoo.com is a metasearch engine that, as of 2006, provided results from search engines and other sources, including Google, Yahoo! and Wikipedia. Also as of 2006, Zoo.com provided news content from ABC News, Fox News and Yahoo! News.

References

  1. "Erik Selberg IoT Conference Bio". Retrieved 2019-09-25.
  2. jeff.dalton. "Meta-Search Part I: The Beginning". Archived from the original on 2019-01-26. Retrieved 2019-01-29.
  3. "Federated Search". federatedsearchblog.com. Retrieved 2019-01-29.
  4. "Robots Mailing List Archive: The Metacrawler, Reborn". webdoc.gwdg.de. Retrieved 2019-01-29.
  5. "ASEE PRISM - Apr 1999 - Cover Story - Building Strategic Partnerships". www.prism-magazine.org. Archived from the original on 2015-09-09. Retrieved 2019-01-29.
  6. "Shopping Engine History - SingleFeed, Shopping Engines made easy". www.singlefeed.com. Retrieved 2019-01-29.
  7. "Multi-Service Search and Comparison Using the MetaCrawler". www.w3.org. Retrieved 2019-02-12.
  8. "Multi-Service Search and Comparison Using the MetaCrawler". homes.cs.washington.edu. Retrieved 2019-01-29.
  9. "The MetaCrawler Architecture for Resource Aggregation on the Web". homes.cs.washington.edu. Retrieved 2019-01-29.
  10. "Robots Mailing List Archive: The Metacrawler, Reborn". webdoc.gwdg.de. Retrieved 2019-01-29.
  11. "Paul Allen sets 54-percent stake in Go2Net - Mar. 15, 1999". money.cnn.com. Retrieved 2019-01-29.
  12. Court, Randolph (1998-04-23). "Go2net's Low-Overhead Plan". Wired. ISSN 1059-1028. Retrieved 2019-01-29.
  13. King, Suzanne. "O.P. Web site developer is Seattle-bound". American City Business Journals. Retrieved 2019-01-29.
  14. "Excite to buy NetBot". CNET. Retrieved 2019-01-29.
  15. Cole, August; Francisco, Bambi. "InfoSpace to acquire Go2Net". MarketWatch. Retrieved 2019-01-29.
  16. "Google Added To Go2Net's MetaCrawler and Dogpile Metasearch Services – News announcements – News from Google – Google". googlepress.blogspot.com. Retrieved 2019-01-29.
  17. "MetaCrawler". 2014-01-01. Archived from the original on 2014-01-01. Retrieved 2019-01-29.
  18. "New Kid-Friendly Search Engine Zoo.com Offers Wealth of Information Without the Worry". www.businesswire.com. 2006-11-14. Retrieved 2019-01-29.
  19. "What is Zoo?". www.computerhope.com. Retrieved 2019-01-29.
  20. "Module 1". mason.gmu.edu. Archived from the original on 2019-01-26. Retrieved 2019-01-29.
  21. "Søgemaskiner - Search-on.dk JR Corp". www.search-on.dk. Retrieved 2019-01-29.
  22. "MetaCrawler". Archived from the original on 2016-01-01. Retrieved 2019-01-29.
  23. "Blucora to sell InfoSpace business for $45 million". Seattle Times. July 5, 2016.
  24. "System1 raises $270 million for 'consumer intent' advertising". L.A. Biz. Retrieved 2017-12-01.
  25. "MetaCrawler". 2017-04-03. Archived from the original on 2017-04-03. Retrieved 2019-01-29.