| Type of site | Web search engine |
| --- | --- |
| Created by | Juha Nurmi [1] |
| URL | ahmia; juhanurmihxlp77nkq76byazcldy2hlmovfu2epvl5ankdibsot4csyd.onion |
| Launched | 2014 [1] |
| Current status | Online |
Ahmia is a clearnet search engine for Tor's onion services created by Juha Nurmi in 2014. [2] Ahmia is accessible both through its clearnet website and through its own onion service. It is one of the primary tools Tor users rely on to discover and access onion websites. [3]
Developed by Juha Nurmi during the 2014 Google Summer of Code with support from the Tor Project, [1] Ahmia indexes onion websites on the Tor network. [5] The search engine is open source: [4] the crawler component is based on Scrapy, [6] the index component is built with Elasticsearch, [7] and the website component is developed with Django. [8]
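Ahmia's actual pipeline uses Scrapy for crawling and Elasticsearch for indexing; as a language-neutral illustration of that crawl-and-index loop, here is a minimal standard-library-only sketch. All names and the in-memory "index" are assumptions for illustration, not Ahmia's real implementation:

```python
# Sketch of a crawl -> index step: parse one HTML page, store its visible
# text in an in-memory index, and return the .onion links to crawl next.
# (A real crawler like Ahmia's would fetch pages through a Tor proxy and
# write documents to Elasticsearch instead.)
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OnionLinkParser(HTMLParser):
    """Collect outgoing links and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def is_onion(url):
    """Keep only links that point at .onion services."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def index_page(url, html, index):
    """Index one page's text; return onion links for the crawl frontier."""
    parser = OnionLinkParser(url)
    parser.feed(html)
    index[url] = " ".join(parser.text_parts)
    return [link for link in parser.links if is_onion(link)]
```

Filtering the frontier with `is_onion` mirrors what makes a vertical crawler like Ahmia's different from a general one: links leaving the onion address space are simply never followed.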
Ahmia has a strict policy of filtering child sexual abuse material, and since October 2023 it has expanded its filter to cover all sexually related searches, in response to the widespread distribution of, and searching for, child sexual abuse material on Tor. [9] A study in Scientific Reports described Ahmia's filtering policies and its role in combating the distribution of illicit content on the Tor network. The paper's first author, Juha Nurmi, the creator of Ahmia, expanded the filtering policies in November 2023. According to the publication, the decision to broaden content filtering was driven by the research findings, which showed that 11 percent of search sessions sought child sexual abuse material on Tor and that around one-fifth of onion websites hosted such unlawful content. [10]
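One common way to implement a search filter like the policy described above is to reject a query before it ever reaches the index. The blocklist contents and token-matching strategy below are assumptions for illustration; Ahmia's actual filter terms and mechanism are not specified here:

```python
# Sketch of query-level filtering: reject any search whose query contains
# a blocklisted token. Hypothetical example terms only; a real deployment
# would maintain a large, curated list.
import re

BLOCKED_TERMS = {"blockedterm1", "blockedterm2"}


def is_allowed(query: str) -> bool:
    """Return False if any token of the query is on the blocklist."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return not any(token in BLOCKED_TERMS for token in tokens)
```

Filtering at the query layer complements filtering at the index layer (refusing to index flagged sites at all); a defense in depth typically uses both.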
The service discovers hidden services through GlobaLeaks submissions and Tor2web statistics, [11] and as of July 2015 it had indexed about 5,000 sites. [12] Ahmia is also affiliated with the Hermes Center for Transparency and Digital Rights, an organization that promotes transparency and freedom-enabling technologies. [13]
In July 2015 the site published a list of hundreds of fraudulent clones of web pages, including clones of sites such as DuckDuckGo as well as dark web pages. [14] [15] According to Nurmi, "someone runs a fake site on a similar address to the original one and tries to fool people with that" with the intent of scamming users, for example by collecting bitcoin through spoofed bitcoin addresses. [16]