Google hacking

[1] Google hacking, also named Google dorking, [2] [3] is a hacker technique that uses Google Search and other Google applications to find security holes in the configuration and computer code that websites use.

Basics

Google hacking involves using operators in the Google search engine to locate specific sections of text on websites that are evidence of vulnerabilities, for example specific versions of vulnerable Web applications. The search query intitle:admbook intitle:Fversion filetype:php would locate PHP web pages with the strings "admbook" and "Fversion" in their titles, indicating that the PHP-based guestbook Admbook is in use, an application with a known code injection vulnerability. Default installations of applications commonly include their running version in every page they serve, for example "Powered by XOOPS 2.2.3 Final", and that string can be used to search for websites running vulnerable versions.
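A site owner can check their own pages for the kind of version banner described above before a search engine indexes it. The following is a minimal sketch, assuming banners follow the form "Powered by <name> <version>"; the pattern, the function name find_version_banners, and the sample HTML are illustrative assumptions, and real applications word their banners in many different ways.

```python
import re

# Assumed banner shape: "Powered by <name> <dotted version>".
# Real banners vary by application, so treat this pattern as a starting point.
BANNER_RE = re.compile(r"Powered by\s+([A-Za-z][\w.-]*)\s+(\d+(?:\.\d+)+)")


def find_version_banners(html: str) -> list[tuple[str, str]]:
    """Return (application, version) pairs found in the given page text."""
    return BANNER_RE.findall(html)


# Hypothetical page footer, using the banner quoted in the text.
sample = '<div class="footer">Powered by XOOPS 2.2.3 Final</div>'
banners = find_version_banners(sample)
```

Any pair returned by such a check points to a string that could be removed from the page template or made less specific.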

Devices connected to the Internet can also be found this way. For example, a search string such as inurl:"Mode=" will find public web cameras.
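Queries like the ones above are plain strings of operator:value pairs separated by spaces. The sketch below shows one way such a string could be assembled and URL-encoded, for instance when auditing one's own site. The helper names build_query and search_url are hypothetical, and the search URL with its q parameter is an assumption that does not come from the text above.

```python
from urllib.parse import urlencode


def build_query(terms):
    """Join (operator, value) pairs into a single search query string."""
    parts = []
    for operator, value in terms:
        # Quote values containing spaces so they are searched as one phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{operator}:{value}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query; the base URL is an assumption for illustration."""
    return f"{base}?{urlencode({'q': query})}"


# The Admbook example from the text, rebuilt from its three operators.
admbook = build_query(
    [("intitle", "admbook"), ("intitle", "Fversion"), ("filetype", "php")]
)
admbook_url = search_url(admbook)
```

Keeping the operators as data rather than as a hand-typed string makes it easier to maintain a list of checks and run them against a single domain.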

History

The concept of "Google hacking" dates back to August 2002, when Chris Sullo included the "nikto_google.plugin" in the 1.20 release of the Nikto vulnerability scanner. [4] In December 2002 Johnny Long began to collect Google search queries that uncovered vulnerable systems and/or sensitive information disclosures – labeling them googleDorks. [5]

The list of Google Dorks grew into a large dictionary of queries, which were eventually organized into the original Google Hacking Database (GHDB) in 2004. [6] [7]

Concepts explored in Google hacking have been extended to other search engines, such as Bing [8] and Shodan. [9] Automated attack tools [10] use custom search dictionaries to find vulnerable systems and sensitive information disclosures in public systems that have been indexed by search engines. [11]

Google Dorking has been involved in some notorious cybercrime cases, such as the Bowman Avenue Dam hack [12] and the CIA breach where around 70% of its worldwide networks were compromised. [13] Star Kashman, a legal scholar, was among the first to study the legality of this technique. [14] Kashman argues that while Google Dorking is technically legal, it has often been used to carry out cybercrime and frequently leads to violations of the Computer Fraud and Abuse Act. [15] Her research highlights the legal and ethical implications of the technique and calls for greater attention to, and regulation of, its use.

Protection

robots.txt is a well-known file used for search engine optimization and for protection against Google dorking. A site can use it to disallow crawling of everything or of specific endpoints, which keeps Google's crawlers away from sensitive endpoints such as admin panels. Because robots.txt is itself publicly readable, however, hackers can still search it to discover those endpoints.
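As an illustration, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and checks which paths a crawler may fetch. The rules and the paths /admin/ and /backup/ are hypothetical examples, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all crawlers out of two sensitive directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if the rules above let the given agent fetch the path."""
    return parser.can_fetch(agent, path)


admin_allowed = is_crawlable("/admin/login.php")
index_allowed = is_crawlable("/index.html")
```

The file above also shows the weakness noted in the text: anyone who reads it learns that /admin/ and /backup/ exist, so it complements access control on those endpoints and does not replace it.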

References

  1. Schennikova, N. V. (2016). "LINGUISTIC PROJECTION: POLEMIC NOTES". University proceedings. Volga region. Humanities (4). doi:10.21685/2072-3024-2016-4-12. ISSN 2072-3024.
  2. "Term Of The Day: Google Dorking - Business Insider". Business Insider. Archived from the original on June 19, 2020. Retrieved January 17, 2016.
  3. "Google dork query". techtarget.com. Archived January 16, 2020, at the Wayback Machine.
  4. "nikto-versions/nikto-1.20.tar.bz2 at master · sullo/nikto-versions". GitHub. Archived from the original on August 30, 2023. Retrieved August 30, 2023.
  5. "googleDorks created by Johnny Long". Johnny Long. Archived from the original on December 8, 2002. Retrieved December 8, 2002.
  6. "Google Hacking Database (GHDB) in 2004". Johnny Long. Archived from the original on July 7, 2007. Retrieved October 5, 2004.
  7. Google Hacking for Penetration Testers, Volume 1. Johnny Long. 2005. ISBN 1931836361.
  8. "Bing Hacking Database (BHDB) v2". Bishop Fox. July 15, 2013. Archived from the original on June 8, 2019. Retrieved August 27, 2014.
  9. "Shodan Hacking Database (SHDB) - Part of SearchDiggity tool suite". Bishop Fox. Archived from the original on June 8, 2019. Retrieved June 21, 2013.
  10. "SearchDiggity - Search Engine Attack Tool Suite". Bishop Fox. July 15, 2013. Archived from the original on June 8, 2019. Retrieved August 27, 2014.
  11. "Google Hacking History". Bishop Fox. July 15, 2013. Archived from the original on June 3, 2019. Retrieved August 27, 2014.
  12. "Seven Iranians Working for Islamic Revolutionary Guard Corps-Affiliated Entities Charged for Conducting Coordinated Campaign of Cyber Attacks Against U.S. Financial Sector". UNITED STATES DEPARTMENT OF JUSTICE. Archived from the original on September 24, 2023. Retrieved March 27, 2023.
  13. Gallagher, Sean. "How did Iran find CIA Spies? They googled it". Ars Technica. Archived from the original on October 18, 2023. Retrieved March 27, 2023.
  14. Kashman, Star (2023). "GOOGLE DORKING OR LEGAL HACKING: FROM THE CIA COMPROMISE TO YOUR CAMERAS AT HOME, WE ARE NOT AS SAFE AS WE THINK". Washington Journal of Law, Technology & Arts. 18 (2).
  15. Kashman, Star (2023). "GOOGLE DORKING OR LEGAL HACKING: FROM THE CIA COMPROMISE TO YOUR CAMERAS AT HOME, WE ARE NOT AS SAFE AS WE THINK". Washington Journal of Law, Technology & Arts. 18 (2): 1. Archived from the original on October 23, 2023. Retrieved March 27, 2023.