Google hacking, also known as Google dorking, [1] [2] [3] is a hacker technique that uses Google Search and other Google applications to find security holes in the configuration and computer code that websites use.
Google hacking involves using operators in the Google search engine to locate specific strings of text on websites that are evidence of vulnerabilities, for example, specific versions of vulnerable web applications. The search query intitle:admbook intitle:Fversion filetype:php
would locate PHP web pages with the strings "admbook" and "Fversion" in their titles, indicating that the PHP-based guestbook Admbook is in use, an application with a known code-injection vulnerability. It is common for default installations of applications to include their running version in every page they serve, for example, "Powered by XOOPS 2.2.3 Final", which can be used to search for websites running vulnerable versions.
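For illustration, a dork query such as the one above is simply submitted as an ordinary Google search. The following minimal sketch (Python) URL-encodes the example query from the text and prints the corresponding search URL; only the standard https://www.google.com/search?q= endpoint is assumed.

# Minimal sketch: build a Google Search URL for the dork query cited above.
from urllib.parse import quote_plus

dork = 'intitle:admbook intitle:Fversion filetype:php'
print("https://www.google.com/search?q=" + quote_plus(dork))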
Devices connected to the Internet can also be located. For example, a search string such as inurl:"Mode="
will find public web cameras.
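As a defensive illustration, the same kind of query can be combined with Google's documented site: operator to check whether pages from one's own domain are indexed in this way. A minimal sketch (Python), where example.com is a placeholder domain not taken from the article:

from urllib.parse import quote_plus

# Restrict the webcam-style query from the text to a single (placeholder)
# domain, e.g. to audit whether one's own hosts appear in such results.
query = 'inurl:"Mode=" site:example.com'
print("https://www.google.com/search?q=" + quote_plus(query))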
The concept of "Google hacking" dates back to August 2002, when Chris Sullo included the "nikto_google.plugin" in the 1.20 release of the Nikto vulnerability scanner. [4] In December 2002 Johnny Long began to collect Google search queries that uncovered vulnerable systems and/or sensitive information disclosures – labeling them googleDorks. [5]
The list of Google Dorks grew into a large dictionary of queries, which were eventually organized into the original Google Hacking Database (GHDB) in 2004. [6] [7]
Concepts explored in Google hacking have been extended to other search engines, such as Bing [8] and Shodan. [9] Automated attack tools [10] use custom search dictionaries to find vulnerable systems and sensitive information disclosures in public systems that have been indexed by search engines. [11]
Google Dorking has been involved in some notorious cybercrime cases, such as the Bowman Avenue Dam hack [12] and the CIA breach in which around 70% of its worldwide networks were compromised. [13] Star Kashman, a legal scholar, was one of the first to study the legality of this technique. [14] Kashman argues that while Google Dorking is technically legal, it has often been used to carry out cybercrime and frequently leads to violations of the Computer Fraud and Abuse Act. [15] Her research has highlighted the legal and ethical implications of the technique, emphasizing the need for greater attention and regulation to be applied to its use.
Robots.txt is a well-known file used for search engine optimization and for protection against Google dorking. A website can use robots.txt to disallow crawling of all content or of specific endpoints, which prevents Google's crawlers from indexing sensitive endpoints such as admin panels; however, attackers can still read robots.txt itself to discover those endpoints.
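As an illustration, a minimal robots.txt that asks crawlers not to visit a hypothetical admin path could look like the following; the /admin/ path is an assumed example, and, as noted above, the file itself remains publicly readable:

User-agent: *
Disallow: /admin/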
Google Search is a search engine operated by Google. It allows users to search for information on the Internet by entering keywords or phrases. Google Search uses algorithms to analyze and rank websites based on their relevance to the search query. It is the most popular search engine worldwide.
robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
CiteSeerX is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science.
Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.
The Computer Fraud and Abuse Act of 1986 (CFAA) is a United States cybersecurity bill that was enacted in 1986 as an amendment to existing computer fraud law, which had been included in the Comprehensive Crime Control Act of 1984. Prior to computer-specific criminal laws, computer crimes were prosecuted as mail and wire fraud, but the applicable law was often insufficient.
Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.
The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search-engine programs. This is in contrast to the "surface web", which is accessible to anyone using the Internet. Computer scientist Michael K. Bergman is credited with coining the term in 2001 as a search-indexing term.
A black hat is a computer hacker who violates laws or ethical standards for nefarious purposes, such as cybercrime, cyberwarfare, or malice. These acts can range from piracy to identity theft. A black hat is often referred to as a "cracker".
Footprinting is the technique used for gathering information about computer systems and the entities they belong to. To get this information, a hacker might use various tools and technologies. This information is very useful to a hacker who is trying to crack a whole system.
A web API is an application programming interface (API) for either a web server or a web browser. As a web development concept, it can be related to a web application's client side. A server-side web API consists of one or more publicly exposed endpoints to a defined request–response message system, typically expressed in JSON or XML by means of an HTTP-based web server. A server API (SAPI) is not considered a server-side web API, unless it is publicly accessible by a remote web application.
A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.
Google Search Console is a web service by Google which allows webmasters to check indexing status, search queries, crawling errors and optimize visibility of their websites.
DuckDuckGo is an American software company that offers a number of products intended to help people protect their online privacy. The flagship product is a search engine that has been praised by privacy advocates. Subsequent products include extensions for all major web browsers and a custom DuckDuckGo web browser.
Cyberweapons are commonly defined as malware agents employed for military, paramilitary, or intelligence objectives as part of a cyberattack. This includes computer viruses, trojans, spyware, and worms that can introduce malicious code into existing software, causing a computer to perform actions or processes unintended by its operator.
Shodan is a search engine that lets users search for various types of servers connected to the internet using a variety of filters. Some have also described it as a search engine of service banners, which is metadata that the server sends back to the client. This can be information about the server software, what options the service supports, a welcome message or anything else that the client can find out before interacting with the server.
Cassidy Marie Wolf is an American TV host, model and beauty queen who was crowned Miss Teen USA 2013.
Blackshades is a malicious trojan horse used by hackers to control infected computers remotely. The malware targets computers using operating systems based on Microsoft Windows. According to US officials, over 500,000 computer systems have been infected worldwide with the software.
SCADA Strangelove is an independent group of information security researchers founded in 2012, focused on security assessment of industrial control systems (ICS) and SCADA.
Vault 7 is a series of documents that WikiLeaks began to publish on 7 March 2017, detailing the activities and capabilities of the United States Central Intelligence Agency (CIA) to perform electronic surveillance and cyber warfare. The files, dating from 2013 to 2016, include details on the agency's software capabilities, such as the ability to compromise cars, smart TVs, web browsers including Google Chrome, Microsoft Edge, Mozilla Firefox, and Opera, the operating systems of most smartphones including Apple's iOS and Google's Android, and computer operating systems including Microsoft Windows, macOS, and Linux. A CIA internal audit identified 91 malware tools out of more than 500 tools in use in 2016 being compromised by the release. The tools were developed by the Operations Support Branch of the CIA.
Searx is a free and open-source metasearch engine, available under the GNU Affero General Public License version 3, with the aim of protecting the privacy of its users. To this end, Searx does not share users' IP addresses or search history with the search engines from which it gathers results. Tracking cookies served by the search engines are blocked, preventing user-profiling-based results modification. By default, Searx queries are submitted via HTTP POST, to prevent users' query keywords from appearing in webserver logs. Searx was inspired by the Seeks project, though it does not implement Seeks' peer-to-peer user-sourced results ranking.