Bingbot

Original author(s): Microsoft
Type: Web crawler
Website: Bingbot FAQ

Bingbot is a web-crawling robot (a type of Internet bot) deployed by Microsoft in October 2010 to supply Bing. [1] It collects documents from the web to build a searchable index for the Bing search engine. It performs the same function as Google's Googlebot. [2]

Behavior

A typical user agent string for Bingbot is "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)". This string appears in web server logs and tells the webmaster which client is requesting a file. A webmaster can use the included agent identifier, "bingbot", to allow or disallow access to the site (access is allowed by default). [3] To deny access, the webmaster can either use the Robots Exclusion Standard, which relies on Bingbot behaving well and honoring the rules, or use server-specific means, which rely on the web server to do the blocking. [4]
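As a minimal sketch of the Robots Exclusion Standard approach described above, the following Python uses the standard library's `urllib.robotparser` to evaluate a robots.txt that restricts the "bingbot" agent. The rules, the `/private/` path, the `example.com` URLs and the "otherbot" agent are hypothetical, chosen only for illustration; how Bingbot itself interprets robots.txt is not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep bingbot out of /private/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /private/

User-agent: *
Disallow:
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# The agent identifier ("bingbot") is what the rules are matched against.
print(parser.can_fetch("bingbot", "https://example.com/private/page.html"))
print(parser.can_fetch("bingbot", "https://example.com/index.html"))
print(parser.can_fetch("otherbot", "https://example.com/private/page.html"))
```

This only models what a well-behaved crawler would conclude from the file. A crawler that ignores robots.txt has to be blocked by the web server itself.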

Related Research Articles

Meta elements are tags used in HTML and XHTML documents to provide structured metadata about a Web page. They are part of a web page's head section. Multiple Meta elements with different attributes can be used on the same page. Meta elements can be used to specify page description, keywords and any other metadata not provided through the other head elements and attributes.

Web crawler Software which systematically browses the World Wide Web

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.

robots.txt Internet protocol

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

Googlebot Web crawler used by Google

Googlebot is the web crawler software used by Google that collects documents from the web to build a searchable index for the Google Search engine. This name is actually used to refer to two different types of web crawlers: a desktop crawler and a mobile crawler.

In computing, the User-Agent header is an HTTP header intended to identify the user agent responsible for making a given HTTP request. Whereas the character sequence User-Agent comprises the name of the header itself, the header value that a given user agent uses to identify itself is colloquially known as its user agent string. The user agent for the operator of a computer used to access the Web has encoded within the rules that govern its behavior the knowledge of how to negotiate its half of a request-response transaction; the user agent thus plays the role of the client in a client–server system. Often considered useful in networks is the ability to identify and distinguish the software facilitating a network session. For this reason, the User-Agent HTTP header exists to identify the client software to the responding server.

The noindex value of an HTML robots meta tag requests that automated Internet bots avoid indexing a web page. Reasons why one might want to use this meta tag include advising robots not to index a very large database, web pages that are very transitory, web pages that are under development, web pages that one wishes to keep slightly more private, or the printer and mobile-friendly versions of pages. Since the burden of honoring a website's noindex tag lies with the author of the search robot, sometimes these tags are ignored. Also the interpretation of the noindex tag is sometimes slightly different from one search engine company to the next.
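To make the mechanism concrete, here is a small sketch that checks whether a page carries a robots meta tag with the noindex value, using Python's standard `html.parser`. The class name, function name and sample markup are hypothetical. It only detects the tag and does not model how any particular search engine treats it.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def has_noindex(html: str) -> bool:
    """Return True if the document asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical page used only for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
print(has_noindex(PAGE))
```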

Search engine marketing (SEM) is a form of Internet marketing that involves the promotion of websites by increasing their visibility in search engine results pages (SERPs) primarily through paid advertising. SEM may incorporate search engine optimization (SEO), which adjusts or rewrites website content and site architecture to achieve a higher ranking in search engine results pages to enhance pay per click (PPC) listings and increase the Call to action (CTA) on the website.

Msnbot was a web-crawling robot, deployed by Microsoft to collect documents from the web to build a searchable index for the MSN Search engine. It went into beta in 2004, and had full public release in 2005. The month of October 2010 saw the official retirement of msnbot from most active web crawling duties and its replacement by bingbot.

A sitemap is a list of pages of a web site within a domain.

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.
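The per-URL fields mentioned above correspond to the `loc`, `lastmod`, `changefreq` and `priority` elements of the Sitemaps XML format. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`. The `build_sitemap` helper, the dictionary layout and the sample URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace of the Sitemaps 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a sitemap XML string from a list of dicts describing URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Only <loc> is required; the other fields are optional hints.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical entry used only for illustration.
print(build_sitemap([
    {"loc": "https://example.com/", "lastmod": "2024-01-01",
     "changefreq": "daily", "priority": 0.8},
]))
```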

nofollow is a setting on a web page hyperlink that directs search engines not to use the link for page ranking calculations. It is specified in the page as a type of link relation; that is: <a rel="nofollow" ...>. Because search engines often calculate a site's importance according to the number of hyperlinks from other sites, the nofollow setting allows website authors to indicate that the presence of a link is not an endorsement of the target site's importance.

Search engine Software system for finding relevant information on the Web

A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.

Microsoft engineering groups are the operating divisions of Microsoft. Starting in April 2002, Microsoft organised itself into seven groups, each an independent financial entity. In September 2005, Microsoft announced a reorganization of its then seven groups into three. In July 2013, Microsoft announced another reorganization into five engineering groups and six corporate affairs groups. In June 2015, Microsoft reformed into three engineering groups. In September 2016, a new group was created to focus on artificial intelligence and research. On March 29, 2018, a new structure merged all of these into three.

Google Search Console is a web service by Google which allows webmasters to check indexing status, search queries, crawling errors and optimize visibility of their websites.

Bing Webmaster Tools

Bing Webmaster Tools is a free service, part of Microsoft's Bing search engine, that allows webmasters to submit their websites to the Bing crawler for indexing and to see their site's performance in Bing. The service also offers tools for webmasters to troubleshoot the crawling and indexing of their website, submission of new URLs, Sitemap creation, submission and ping tools, website statistics, consolidation of content submission, and new content and community resources.

BotSeer was a Web-based information system and search tool used for research on Web robots and trends in Robots Exclusion Protocol deployment and adherence. It was created and designed by Yang Sun, Isaac G. Councill, Ziming Zhuang and C. Lee Giles. BotSeer was in operation from approximately 2007 to 2010.

80legs Web crawling service

80legs is a web crawling service that allows its users to create and run web crawls through its software as a service platform.

Yandex Search is a search engine owned by the company Yandex, based in Russia. In January 2015, Yandex Search generated 51.2% of all of the search traffic in Russia according to LiveInternet.

References

  1. "BingBot Crawl Activity Surging?". Retrieved 2016-07-16.
  2. Shenoy, Aravind; Prabhu, Anirudh (2016-07-26). Introducing SEO: Your quick-start guide to effective SEO practices. Apress. ISBN 978-1-4842-1854-9.
  3. Team, IntroBooks. Microsoft Bing's Algorithm Explained. IntroBooks.
  4. Siegler, M. G. (2009-07-17). "Ignore That Scary MSNbot, It's Just The Friendly BingBot - Unless It Attacks!". TechCrunch. Retrieved 2023-10-22.