Archie (search engine)

Archie
Type of site: Web search engine
Owner: Alan Emtage
URL: archie.icm.edu.pl/archie_eng.html
Launched: 10 September 1990 [1]
Current status: Offline [2]

Archie is a tool for indexing FTP archives, allowing users to more easily identify specific files. It is considered the first Internet search engine. [3] The original implementation was written in 1990 by Alan Emtage, then a postgraduate student at McGill University in Montreal, Canada. [4] [5] [6] [7]

Screenshot of Archie

Archie was superseded by other, more sophisticated search engines, including Jughead and Veronica. These were in turn superseded by search engines like Yahoo! in 1995 and Google in 1998. Work on Archie ceased in the late 1990s. A legacy Archie server was maintained for historical purposes in Poland at the University of Warsaw's Interdisciplinary Centre for Mathematical and Computational Modelling.

Origin

Archie first appeared in 1986, while Emtage was the systems manager at the McGill University School of Computer Science. His predecessor had attempted to persuade the institution to connect to the Internet, but the high cost (roughly $35,000 per year for a sluggish link to Boston) made it difficult to persuade the appropriate parties that the investment was worthwhile. [8]

The name derives from the word "archive" without the v. Emtage has said that, contrary to popular belief, there was no association with Archie Comics. [9] Despite this, other early Internet search technologies such as Jughead and Veronica were named after characters from the comics. Anarchie, one of the earliest graphical FTP clients, was named for its ability to perform Archie searches.

How Archie worked

The earliest versions of Archie simply searched a list of public anonymous File Transfer Protocol (FTP) sites using the Telnet protocol and created index files that were available via FTP. To view the contents of a file, a user first had to download it. The indexes were updated on a regular basis: Archie contacted each site roughly once a month, so as not to waste too many resources of the remote servers, and requested a listing. These listings were stored in local files to be searched using the Unix grep command.
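The grep-style search over stored listings can be illustrated with a minimal Python sketch. It is not Archie's actual code: the host names, the paths, and the `search_listings` helper are hypothetical, and the listings are held in memory here rather than in local files.

```python
import re

# Hypothetical stored listings: one text blob per FTP host, standing in for
# the local files saved after each site returned its directory listing.
LISTINGS = {
    "ftp.example.edu": "/pub/gnu/emacs-18.55.tar.Z\n/pub/tex/README\n",
    "archive.example.org": "/mirrors/gnu/emacs-18.55.tar.Z\n/games/rogue.tar.Z\n",
}


def search_listings(pattern, listings):
    """Return (host, path) pairs for every listing line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for host in sorted(listings):  # sorted for a stable, repeatable order
        for line in listings[host].splitlines():
            if regex.search(line):  # same idea as grep: match anywhere in the line
                hits.append((host, line))
    return hits


results = search_listings(r"emacs", LISTINGS)
for host, path in results:
    print(f"{host}:{path}")
```

Because the search only scans listing lines, it can report where a file lives but says nothing about what the file contains, which matches the limitation described in this section.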

The developers populated the engine's servers with databases of anonymous FTP host directories. [10] This was used to find specific file titles, since the list was plugged into a searchable database of FTP sites. [11] Archie neither recognized natural language requests nor indexed the content inside the files. Therefore, users had to know the title of the file they wanted. The ability to index the content inside the files was later introduced by Gopher.
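A small sketch may help show why users had to know the file title. It assumes a simple in-memory index keyed only by file name; the data, `build_name_index`, and `lookup` are hypothetical and do not reflect Archie's real database format.

```python
import posixpath
from collections import defaultdict

# Hypothetical directory listings gathered from anonymous FTP hosts.
SITE_PATHS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.55.tar.Z", "/pub/tex/README"],
    "archive.example.org": ["/mirrors/gnu/emacs-18.55.tar.Z"],
}


def build_name_index(listings):
    """Map each bare file name to the (host, path) locations where it appears."""
    index = defaultdict(list)
    for host, paths in sorted(listings.items()):
        for path in paths:
            # Only the name is indexed; file contents are never read.
            index[posixpath.basename(path)].append((host, path))
    return index


name_index = build_name_index(SITE_PATHS)


def lookup(name):
    """Return known locations for an exact file name, or an empty list."""
    return name_index.get(name, [])


print(lookup("emacs-18.55.tar.Z"))       # two locations
print(lookup("a text editor for Unix"))  # a descriptive query finds nothing
```

A query phrased as a description returns no results because nothing in the index describes what a file is for; only its name was recorded.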

Development

Emtage and Bill Heelan wrote a script allowing people to log in and search the collected information using the Telnet protocol at the host "archie.mcgill.ca" [132.206.2.3]. [12] Later, more efficient front- and back-ends were developed, and the system grew from a local tool into a network-wide resource and a popular service available from multiple sites around the Internet. The collected data would be exchanged between neighbouring Archie servers. The servers could be accessed in multiple ways: using a local client (such as archie or xarchie); telnetting to a server directly; sending queries by electronic mail; [13] and later via a World Wide Web interface. At the peak of its popularity, the Archie search engine accounted for 50% of Montreal Internet traffic. [14]

In 1992, Emtage, along with Peter Deutsch and with some financial help from McGill University, formed Bunyip Information Systems, which offered a licensed commercial version of the Archie search engine used by millions of people worldwide. Heelan followed them into Bunyip soon after, where he, together with Bibi Ali and Sandro Mazzucato, significantly updated the Archie database and indexed web pages. Work on the search engine ceased in the late 1990s.

References

  1. Deutsch, Peter (11 September 1990). "[next] An Internet archive server server (was about Lisp)". Retrieved 29 December 2017.
  2. Search seems to be dead (timeout). Last update of the database: 2011.
  3. "The First Search Engine, Archie". Archived from the original on 21 June 2007. Retrieved 26 May 2007.
  4. "Archie". PC Magazine. Retrieved 20 September 2020.
  5. Alexandra Samuel (21 February 2017). "Meet Alan Emtage, the Black Technologist Who Invented ARCHIE, the First Internet Search Engine". ITHAKA. Retrieved 20 September 2020.
  6. Loop News Barbados (30 August 2019). "Alan Emtage- a Barbadian you should know". loopnewsbarbados.com. Retrieved 28 April 2022.
  7. Dino Grandoni, Alan Emtage (April 2013). "Alan Emtage: The Man Who Invented The World's First Search Engine (But Didn't Patent It)". HuffPost. Retrieved 21 September 2020.
  8. "Article by Kevin Savetz". 9 July 2015. Archived from the original on 9 July 2015. Retrieved 18 March 2023.
  9. BBC Radio 4 - Saturday Live, 7 November 2009.
  10. West, Nicholas. A Rough Guide to the Internet. Lulu.com. ISBN 9781471005374.
  11. Ledford, Jerri L. (2015). Search Engine Optimization Bible. Hoboken, NJ: John Wiley & Sons. p. 4. ISBN 9780470452646.
  12. "Peter Deutsch: archie - An Electronic Directory Service for the Internet". Retrieved 23 February 2012.
  13. "EFF's (Extended) Guide to the Internet - Your Friend Archie". www2.cs.duke.edu. 12 September 1994. Retrieved 8 January 2020.
  14. Deutsch, P. (2000). "Archie-a Darwinian development process". IEEE Internet Computing. 4: 69–71. doi:10.1109/4236.815865. Retrieved 14 December 2023.

Further reading