Ahmia

Type of site: Web search engine
Created by: Juha Nurmi [1]
URL: ahmia.fi
juhanurmihxlp77nkq76byazcldy2hlmovfu2epvl5ankdibsot4csyd.onion (via Tor)
Launched: 2014 [1]
Current status: Online

Ahmia is a clearnet search engine for Tor's onion services, created by Juha Nurmi in 2014. [2] It is accessible both through its clear-web website and through its onion service. It is one of the primary tools Tor users employ to discover and access onion websites. [3]

Overview

Developed during the 2014 Google Summer of Code by Juha Nurmi with support from the Tor Project, [1] Ahmia indexes onion websites on the Tor network. [5] The search engine is open source: [4] the crawler component is based on Scrapy, [6] the index component is built on Elasticsearch, [7] and the website component is developed with Django. [8]
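
To make the architecture concrete, the following is a minimal sketch of the crawler layer: a Scrapy spider that fetches onion pages through a local HTTP-to-Tor proxy and yields documents suitable for indexing. It is an illustration under stated assumptions (a proxy such as Privoxy listening on 127.0.0.1:8118 and a placeholder seed address), not Ahmia's actual crawler code.

    import scrapy

    class OnionSpider(scrapy.Spider):
        """Minimal onion-service crawler sketch (not Ahmia's real spider)."""
        name = "onion_sketch"
        # Placeholder seed address; a real crawler would load seeds from a database.
        start_urls = ["http://exampleonionaddressxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion/"]
        # Local HTTP proxy assumed to forward requests into Tor (e.g. Privoxy).
        PROXY = "http://127.0.0.1:8118"

        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(url, meta={"proxy": self.PROXY})

        def parse(self, response):
            # Emit a document for the index, then follow onion links.
            yield {
                "url": response.url,
                "title": response.css("title::text").get(default=""),
                "body": " ".join(response.css("body ::text").getall())[:10000],
            }
            for href in response.css("a::attr(href)").getall():
                if ".onion" in href:
                    yield response.follow(href, callback=self.parse,
                                          meta={"proxy": self.PROXY})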

Ahmia has a strict policy of filtering child sexual abuse material, and since October 2023 it has expanded this filter to cover all sexually related searches, in response to the widespread distribution of, and searching for, child sexual abuse material on Tor. [9] A study in Scientific Reports described Ahmia's filtering policies and its role in combating the distribution of illicit content on the Tor network. The paper's first author, Juha Nurmi, is the creator of Ahmia, and he expanded the filtering policies in November 2023. According to the publication, the decision to broaden content filtering followed research findings that 11 percent of search sessions sought child sexual abuse material on Tor and that around one-fifth of onion websites hosted such unlawful content. [10]
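
The mechanics of query-level filtering can be sketched as a blocklist check applied before a query ever reaches the index. The sketch below is illustrative only: the term list is a placeholder, and Ahmia's actual term list and matching logic are not reproduced here.

    # Illustrative query filter; BLOCKED_TERMS is a placeholder, not
    # Ahmia's actual filter list.
    BLOCKED_TERMS = {"example_blocked_term_1", "example_blocked_term_2"}

    def is_query_allowed(query: str) -> bool:
        """Reject queries containing any blocked term before searching."""
        words = set(query.lower().split())
        return words.isdisjoint(BLOCKED_TERMS)

    def search(query: str, run_index_search):
        # Filtered queries return no results at all.
        if not is_query_allowed(query):
            return []
        return run_index_search(query)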

The service draws on GlobaLeaks submissions and Tor2web statistics for hidden service discovery [11] and, as of July 2015, had indexed about 5,000 sites. [12] Ahmia is also affiliated with the Hermes Center for Transparency and Digital Rights, an organization that promotes transparency and freedom-enabling technologies. [13]

In July 2015 the site published a list of hundreds of fraudulent clones of web pages, including clones of such sites as DuckDuckGo as well as dark web pages. [14] [15] According to Nurmi, "someone runs a fake site on a similar address to the original one and tries to fool people with that" with the intent of scamming them, for example by harvesting bitcoin through spoofed bitcoin addresses. [16]
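
One way such clones can be flagged, sketched below, is to compare a suspected mirror against the original page: a scam clone is typically near-identical except for the rewritten payment addresses. This is an illustrative heuristic, not the method used to compile Ahmia's list.

    import difflib
    import re

    # Loose pattern for legacy bitcoin addresses (illustrative, not exhaustive).
    BTC_ADDR = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

    def looks_like_scam_clone(original_html: str, mirror_html: str) -> bool:
        """Flag mirrors that match the original except for bitcoin addresses."""
        similarity = difflib.SequenceMatcher(None, original_html, mirror_html).ratio()
        orig_addrs = set(BTC_ADDR.findall(original_html))
        mirror_addrs = set(BTC_ADDR.findall(mirror_html))
        # Nearly identical pages with different payment addresses are suspicious.
        return similarity > 0.9 and orig_addrs != mirror_addrs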

Related Research Articles

<span class="mw-page-title-main">Web crawler</span> Software which systematically browses the World Wide Web

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.

robots.txt – Filename used to indicate portions for web crawling

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
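
For illustration, Python's standard urllib.robotparser can evaluate such a file; the rules and URLs below are made up for the example.

    from urllib import robotparser

    # A made-up robots.txt that bars all crawlers from /private/.
    rules = [
        "User-agent: *",
        "Disallow: /private/",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    print(rp.can_fetch("*", "https://example.com/index.html"))  # True
    print(rp.can_fetch("*", "https://example.com/private/x"))   # False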

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid search traffic rather than direct traffic, referral traffic, social media traffic, or paid traffic.

<span class="mw-page-title-main">Googlebot</span> Web crawler used by Google

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.

The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search-engine programs. This is in contrast to the "surface web", which is accessible to anyone using the Internet. Computer scientist Michael K. Bergman is credited with coining the phrase in 2001 as a search-indexing term.

<span class="mw-page-title-main">Metasearch engine</span> Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. The gathered results are then ranked and presented to the user.

<span class="mw-page-title-main">YaCy</span> Peer-to-peer search engine

YaCy is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003. The engine is written in Java and, as of September 2006, was distributed across several hundred computers, so-called YaCy peers.

<span class="mw-page-title-main">.onion</span> Pseudo–top-level internet domain

.onion is a special-use top-level domain name designating an anonymous onion service, which was formerly known as a "hidden service", reachable via the Tor network. Such addresses are not actual DNS names, and the .onion TLD is not in the Internet DNS root, but with the appropriate proxy software installed, Internet programs such as web browsers can access sites with .onion addresses by sending the request through the Tor network.
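
A minimal sketch of such proxy-mediated access, assuming a local Tor daemon on its default SOCKS port 9050 and the requests library installed with SOCKS support (requests[socks]); the onion address is Ahmia's own, listed above.

    import requests

    # Tor's SOCKS proxy, assumed at its default address. The "socks5h"
    # scheme makes the proxy (i.e. Tor) resolve the .onion name rather
    # than local DNS, which cannot resolve it.
    proxies = {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    }

    url = "http://juhanurmihxlp77nkq76byazcldy2hlmovfu2epvl5ankdibsot4csyd.onion/"
    resp = requests.get(url, proxies=proxies, timeout=60)
    print(resp.status_code)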

<span class="mw-page-title-main">Search engine</span> Software system for finding relevant information on the Web

A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.

A vertical search engine is distinct from a general web search engine, in that it focuses on a specific segment of online content. They are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the Library of Congress, Mocavo, Nuroa, Trulia, and Yelp.

<span class="mw-page-title-main">Tor (network)</span> Free and open-source anonymity network based on onion routing

Tor is a free overlay network for enabling anonymous communication. Built on free and open-source software and more than seven thousand volunteer-operated relays worldwide, it routes users' Internet traffic via a random path through the network.

<span class="mw-page-title-main">DuckDuckGo</span> American software company and Web search engine

DuckDuckGo is an American software company focused on online privacy, whose flagship product is a search engine of the same name. Founded by Gabriel Weinberg in 2008, its later products include browser extensions and a custom DuckDuckGo web browser. Headquartered in Paoli, Pennsylvania, DuckDuckGo is a privately held company with about 200 employees. The company's name is a reference to the children's game duck, duck, goose.

<span class="mw-page-title-main">BTDigg</span> Search engine

BTDigg is the first Mainline DHT search engine. It participated in the BitTorrent DHT network, supporting the network and maintaining a correspondence between magnet links and a set of torrent attributes, which are indexed and inserted into a database. For end users, BTDigg provides full-text database search via a Web interface that retrieves the torrents matching a user's text query, with support for queries in European and Asian languages. The project name is an acronym of BitTorrent Digger. It went offline in June 2016, reportedly due to index spam, but appeared to be back online in 2024. Owing to IP filtering, many IP addresses cannot access btdig.com directly, although its Tor onion address remains reachable because Tor traffic bypasses such filtering.

Seeks is a free and open-source project licensed under the GNU Affero General Public License version 3 (AGPL-3.0-or-later). It exists to create an alternative to the current market-leading search engines, driven by user concerns rather than corporate interests. The original manifesto was created by Emmanuel Benazera and Sylvio Drouin and published in October 2006. The project was under active development until April 2014, with both stable releases of the engine and revisions of the source code available for public use. In September 2011, Seeks won an innovation award at the Open World Forum Innovation Awards. The Seeks source code has not been updated since April 28, 2014 and no Seeks nodes have been usable since February 6, 2016.

Elasticsearch is a search engine based on Apache Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Official clients are available in Java, .NET (C#), PHP, Python, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine.
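
A minimal sketch of that HTTP/JSON interface, assuming a local unsecured Elasticsearch node on localhost:9200 and a hypothetical index named "pages":

    import requests

    ES = "http://localhost:9200"  # assumed local, unsecured node

    # Index a schema-free JSON document; Elasticsearch infers the mapping.
    # refresh=true makes the document searchable immediately.
    doc = {"url": "http://example.onion/", "title": "Example", "body": "hello onion"}
    requests.post(f"{ES}/pages/_doc", json=doc, params={"refresh": "true"})

    # Full-text search over the same index via the _search endpoint.
    query = {"query": {"match": {"body": "onion"}}}
    hits = requests.get(f"{ES}/pages/_search", json=query).json()
    print(hits["hits"]["total"])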

<span class="mw-page-title-main">The Hidden Wiki</span> Defunct Tor wiki

The Hidden Wiki was a dark web MediaWiki wiki operating as a Tor hidden service that could be anonymously edited after registering on the site. The main page served as a directory of links to other .onion sites.

The dark web is the World Wide Web content that exists on darknets that use the Internet but require specific software, configurations, or authorization to access. Through the dark web, private computer networks can communicate and conduct business anonymously without divulging identifying information, such as a user's location. The dark web forms a small part of the deep web, the part of the web not indexed by web search engines, although sometimes the term deep web is mistakenly used to refer specifically to the dark web.

Grams was a search engine for Tor based darknet markets launched in April 2014, and closed in December 2017. The service allowed users to search multiple darknet markets for products like drugs and guns from a simple search interface, and also provided the capability for its users to hide their transactions through its bitcoin tumbler Helix.

StormCrawler is an open-source collection of resources for building low-latency, scalable web crawlers on Apache Storm. It is provided under Apache License and is written mostly in Java.

References

  1. "Tor ♥ Ahmia Project: Supporting Google Summer of Code 2014". Tor Project. Retrieved 6 January 2025.
  2. Nurmi, Juha. "About Ahmia". Ahmia. Retrieved 5 January 2025.
  3. Winter, Philipp; Edmundson, Anne; Roberts, Laura M.; Dutkowska-Żuk, Agnieszka; Chetty, Marshini; Feamster, Nick (2018). How do Tor users interact with onion services? (PDF). 27th USENIX Security Symposium (USENIX Security 18). Retrieved 6 January 2025.
  4. Greif, Björn (14 July 2015). "Gefälschte .onion-Websites spähen Tor-Nutzer aus" (in German). ZDNet. Retrieved 4 August 2015.
  5. "Google Can't Search the Deep Web, So How Do Deep Web Search Engines Work?: Networks Course blog for INFO 2040/CS 2850/Econ 2040/SOC 2090" . Retrieved 2019-03-07.
  6. "Ahmia Crawler: Open-source crawler for Ahmia search engine". GitHub. Retrieved 6 January 2025.
  7. "Ahmia Index: Elasticsearch-based indexing component for Ahmia search engine". GitHub. Retrieved 6 January 2025.
  8. "Ahmia Site: Open-source website component for Ahmia search engine". GitHub. Retrieved 6 January 2025.
  9. Nurmi, Juha. "Ahmia Legal Disclaimer". Ahmia. Retrieved 5 January 2025.
  10. Nurmi, Juha; Paju, Arttu; Brumley, Billy Bob; Insoll, Tegan; Ovaska, Anna K.; Soloveva, Valeriia; Vaaranen-Valkonen, Nina; Aaltonen, Mikko; Arroyo, David (2024-04-03). "Investigating child sexual abuse material availability, searches, and users on the anonymous Tor network for a public health intervention strategy". Scientific Reports. 14: 7849. arXiv:2404.14112. doi:10.1038/s41598-024-58346-7. Retrieved 2025-01-06.
  11. "About us". Retrieved 3 August 2015.
  12. Leyden, John (7 July 2015). "Heart of Darkness: Mass of clone scam sites appear". The Register. Retrieved 3 August 2015.
  13. "The new search engines shining a light on the Deep Web". The Kernel. 2014-09-28. Archived from the original on 2020-03-27. Retrieved 2019-03-07.
  14. MacGregor, Alice (1 July 2015). "Hundreds of Dark Web mirror sites 'booby-trapping' Tor users". Archived from the original on 20 July 2015. Retrieved 3 August 2015.
  15. Marwan, Peter (14 July 2015). "Anonymität von TOR-Nutzern durch Fake-Websites gefährdet" (in German). ITespresso. Retrieved 4 August 2015.
  16. Weissman, Cale Guthrie (July 2, 2015). "Someone is creating fake websites on the dark web to try to lure in and hack people". Business Insider. Retrieved 2019-03-07.