ALIWEB

Type of site: Search engine
URL: ALIWEB at the Wayback Machine (archived 18 June 1997)
Launched: May 1994
Current status: Defunct

ALIWEB (Archie-Like Indexing for the Web) is considered the second Web search engine after JumpStation.

First announced in November 1993 [1] by developer Martijn Koster while working at Nexor, and presented in May 1994 [2] at the First International Conference on the World Wide Web at CERN in Geneva, ALIWEB preceded WebCrawler by several months. [3]

ALIWEB allowed users to submit the locations of index files on their sites, [3] [4] which enabled the search engine to include their webpages together with user-written page descriptions and keywords. This let webmasters define the terms that would lead users to their pages, and it avoided the need to send out bots (e.g. the Wanderer, JumpStation) that used up bandwidth. Because relatively few people submitted their sites, ALIWEB was not widely used.
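
For illustration, a site's index file consisted of simple template entries (ALIWEB used IAFA-style templates, and such files were commonly named site.idx). A minimal sketch of one entry follows; the path, title, description, and keywords below are invented:

  Template-Type: DOCUMENT
  Title:         Example Widgets Home Page
  URI:           /widgets/index.html
  Description:   A hand-written summary of the page, shown in ALIWEB's results.
  Keywords:      widgets, examples, demonstration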

Martijn Koster, who was also instrumental in the creation of the Robots Exclusion Standard, [5] [6] detailed the background and objectives of ALIWEB with an overview of its functions and framework in the paper he presented at CERN. [2]

Koster is not associated with a commercial website posing as ALIWEB. [7]

Related Research Articles

Meta elements are tags used in HTML and XHTML documents to provide structured metadata about a Web page. They are part of a web page's head section. Multiple Meta elements with different attributes can be used on the same page. Meta elements can be used to specify page description, keywords and any other metadata not provided through the other head elements and attributes.
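
For example, a page might declare a description and keywords in its head section; the values below are placeholders:

  <head>
    <meta charset="utf-8">
    <meta name="description" content="A short, human-written summary of the page.">
    <meta name="keywords" content="aliweb, search engines, history">
  </head>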

Web crawler: Software which systematically browses the World Wide Web

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.
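
As a rough sketch of the idea (not any particular engine's implementation), a crawler fetches a page, stores it for indexing, extracts its links, and queues those links for later visits. The example below uses only Python's standard library; the seed URL is a placeholder.

  # Minimal breadth-first crawler sketch (illustrative only). A real crawler
  # must also honour robots.txt, rate limits, and content-type checks.
  from collections import deque
  from html.parser import HTMLParser
  from urllib.parse import urljoin
  from urllib.request import urlopen

  class LinkParser(HTMLParser):
      """Collects href values from <a> tags."""
      def __init__(self):
          super().__init__()
          self.links = []

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              for name, value in attrs:
                  if name == "href" and value:
                      self.links.append(value)

  def crawl(seed, limit=10):
      queue, seen, pages = deque([seed]), {seed}, {}
      while queue and len(pages) < limit:
          url = queue.popleft()
          try:
              html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
          except OSError:
              continue  # skip unreachable pages
          pages[url] = html  # store the page for later indexing
          parser = LinkParser()
          parser.feed(html)
          for href in parser.links:
              absolute = urljoin(url, href)
              if absolute.startswith("http") and absolute not in seen:
                  seen.add(absolute)
                  queue.append(absolute)
      return pages

  # pages = crawl("https://example.com/")  # placeholder seed URL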

World Wide Web: Linked hypertext system on the Internet

The World Wide Web is an information system that enables content sharing over the Internet in a user-friendly way, designed to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).

Website: Set of related web pages served from a single domain

A website is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Websites are typically dedicated to a particular topic or purpose, such as news, education, commerce, entertainment, or social media. Hyperlinking between web pages guides the navigation of the site, which often starts with a home page. The most-visited sites are Google, YouTube, and Facebook.

Web design encompasses many different skills and disciplines in the production and maintenance of websites. The different areas of web design include web graphic design; user interface design; authoring, including standardised code and proprietary software; user experience design; and search engine optimization. Often many individuals will work in teams covering different aspects of the design process, although some designers will cover them all. The term "web design" is normally used to describe the design process relating to the front-end design of a website, including writing markup. Web design partially overlaps web engineering in the broader scope of web development. Web designers are expected to have an awareness of usability and be up to date with web accessibility guidelines.

Archie (search engine): FTP search engine

Archie is a tool for indexing FTP archives, allowing users to more easily identify specific files. It is considered the first Internet search engine. The original implementation was written in 1990 by Alan Emtage, then a postgraduate student at McGill University in Montreal, Canada.

robots.txt: Internet protocol

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
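
A minimal example of such a file, served from the site root; the paths and the bot name below are hypothetical:

  User-agent: *
  Disallow: /private/

  User-agent: ExampleBot
  Disallow: /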

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

In the context of the World Wide Web, deep linking is the use of a hyperlink that links to a specific, generally searchable or indexed, piece of web content on a website, rather than the website's home page. The URL contains all the information needed to point to a particular item. Deep linking is different from mobile deep linking, which refers to directly linking to in-app content using a non-HTTP URI.
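
For example, using the reserved example.com domain and an invented product path:

  Home page: https://www.example.com/
  Deep link: https://www.example.com/products/widget-42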

Googlebot: Web crawler used by Google

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. Googlebot was created to function concurrently on thousands of machines in order to enhance its performance and adapt to the expanding size of the internet. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
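
A minimal sketch of such an automated extraction (not how any particular tool works): the script below fetches one page with Python's standard library and pulls out its title and headings; the URL is a placeholder.

  # Minimal scraping sketch (illustrative only): fetch one page and extract
  # its title and headings using only the standard library.
  from html.parser import HTMLParser
  from urllib.request import urlopen

  class TitleAndHeadings(HTMLParser):
      def __init__(self):
          super().__init__()
          self.capture = None
          self.title = ""
          self.headings = []

      def handle_starttag(self, tag, attrs):
          if tag in ("title", "h1", "h2"):
              self.capture = tag

      def handle_endtag(self, tag):
          if tag == self.capture:
              self.capture = None

      def handle_data(self, data):
          if self.capture == "title":
              self.title += data
          elif self.capture in ("h1", "h2"):
              self.headings.append(data.strip())

  html = urlopen("https://example.com/", timeout=10).read().decode("utf-8", "replace")
  parser = TitleAndHeadings()
  parser.feed(html)
  print(parser.title)     # page title
  print(parser.headings)  # heading texts, e.g. one row for a local spreadsheet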

Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.
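
A minimal sitemap file describing a single URL might look like this; the location and date are placeholders:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://www.example.com/about.html</loc>
      <lastmod>2024-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
    </url>
  </urlset>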

nofollow is a setting on a web page hyperlink that directs search engines not to use the link for page ranking calculations. It is specified in the page as a type of link relation; that is: <a rel="nofollow" ...>. Because search engines often calculate a site's importance according to the number of hyperlinks from other sites, the nofollow setting allows website authors to indicate that the presence of a link is not an endorsement of the target site's importance.
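
A complete link carrying the attribute, with a placeholder target, looks like:

  <a href="https://www.example.com/" rel="nofollow">Example link</a>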

Search engine: Software system for finding relevant information on the Web

A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.

History of the World Wide Web: Information system running on the Internet

The World Wide Web is a global information medium that users can access via computers connected to the Internet. The term is often mistakenly used as a synonym for the Internet, but the Web is a service that operates over the Internet, just as email and Usenet do. The history of the Internet and the history of hypertext date back significantly further than that of the World Wide Web.

JumpStation

JumpStation was the first WWW search engine that behaved, and appeared to the user, the way current web search engines do. It started indexing on 12 December 1993 and was announced on the Mosaic "What's New" webpage on 21 December 1993. It was hosted at the University of Stirling in Scotland.

Yandex Search is a search engine owned by the company Yandex, based in Russia. In January 2015, Yandex Search generated 51.2% of all of the search traffic in Russia according to LiveInternet.

Nexor

Nexor Limited is a privately held company based in Nottingham, providing products and services to safeguard government, defence and critical national infrastructure computer systems. It was originally known as X-Tel Services Limited.

Martijn Koster is a Dutch software engineer noted for his pioneering work on Internet searching.

References

  1. Martijn Koster (30 November 1993). "ANNOUNCEMENT: ALIWEB (Archie-Like Indexing for the WEB)". comp.infosystems.
  2. "List of PostScript files for the WWW94 advance proceedings". First International Conference on the World-Wide Web. June 1994. Archived from the original on 2018-05-08. Retrieved 2007-06-03. Title: "Aliweb - Archie-Like Indexing in the Web." Author: Martijn Koster. Institute: NEXOR Ltd., UK. PostScript, Size: 213616, Printed: 10 pages.
  3. Chris Sherman (3 December 2002). "Happy Birthday, Aliweb!". Search Engine Watch. Archived from the original on 2006-10-17. Retrieved 2007-01-03.
  4. Wes Sonnenreich (1997). "A History of Search Engines". John Wiley & Sons website.
  5. Martijn Koster. "Robots Exclusion". robotstxt.org. Archived from the original on 2007-11-07. Retrieved 2007-06-03.
  6. Martijn Koster. "Robots in the Web: threat or treat?". Reprinted with permission from ConneXions, The Interoperability Report, Volume 9, No. 4, April 1995. Archived from the original on 2007-01-02. Retrieved 2007-01-03.
  7. Martijn Koster. "Historical Web Services: ALIWEB". Martijn Koster's Historical Web Services page. Archived from the original on 2007-01-16. Quote: "Note that I have nothing to do with aliweb.com. It appears some marketing company has taken the old aliweb code and data, and are using it as a site for advertising purposes. Their search results are worthless. Their claim to have trademarked 'aliweb' I have been unable to confirm in patent searches. My recommendation is that you avoid them."