Perplexity AI

Perplexity AI, Inc.
Company type Private
Industry Artificial intelligence
Genre Search engine
Founded August 2022
Founders
  • Aravind Srinivas
  • Andy Konwinski
  • Denis Yarats
  • Johnny Ho
Headquarters San Francisco, California, US
Key people
Aravind Srinivas (CEO)
Number of employees
55 [1] (2024)
Screenshot of Perplexity (2024)
Website www.perplexity.ai

Perplexity AI is an AI-powered research and conversational search engine that answers queries using natural language predictive text. It is based in San Francisco, California.


Founded in 2022, Perplexity generates answers from web sources and cites those sources as links within the text of its response. [4] Perplexity operates on a freemium model: the free product uses the company's own standalone large language model (LLM), which incorporates natural language processing (NLP) capabilities, while the paid version, Perplexity Pro, has access to GPT-4, Claude 3.5, Mistral Large, Llama 3 and an experimental Perplexity model. [2] [4] [1] By the first quarter of 2024, the company reported 15 million active users. [5]

History

Perplexity was founded in 2022 by Aravind Srinivas, Denis Yarats, Johnny Ho and Andy Konwinski, engineers with backgrounds in back-end systems, AI and machine learning. Yarats, the CTO, was an AI research scientist at Meta, while Srinivas, the CEO, worked at OpenAI as an AI researcher. Ho, the Chief Strategy Officer, worked as an engineer at Quora, then as a quantitative trader on Wall Street, and Konwinski was among the founding team at Databricks. [6]

As of April 2024, Perplexity had raised $165 million in funding, valuing the company at over $1 billion. [1] Investors include Jeff Bezos, Nvidia, Databricks, Bessemer Venture Partners, Susan Wojcicki, Jeff Dean, Yann LeCun, Andrej Karpathy, Nat Friedman, and Garry Tan. [7] [8] [1]

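In June 2024, Bloomberg News reported that SoftBank's Vision Fund 2 was investing between $10 million and $20 million in Perplexity at a valuation of $3 billion. [9]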

Functionality

Perplexity's main product is its search engine, which relies on natural language processing. [7] It uses the context of a user's queries to personalize search results, then summarizes those results into a text answer with inline citations. [7]

Perplexity's paid variant, the "Pro" mode (formerly Copilot), asks the user clarifying questions to refine queries. It also lets users upload and analyze local files, including images, and generate images with AI, and it provides access to an API. [7] In April 2024, Perplexity launched an enterprise version of its product. [1]

In May 2024, Perplexity launched a new feature called Pages, which generates a customizable webpage based on user prompts. Pages utilizes Perplexity’s AI search models to gather information and create a research presentation that can be published and shared with others. [10]

Controversies

In June 2024, Forbes publicly criticized Perplexity for its use of Forbes content. According to Forbes, Perplexity published a story that was largely copied from a proprietary Forbes article without mentioning or prominently citing Forbes. In response, Srinivas said that the feature had some "rough edges" and accepted the feedback, but maintained that Perplexity only "aggregates" information rather than plagiarizing it. [11] [12]

Later that month, separate investigations by Wired magazine and web developer Robb Knight found that Perplexity was not respecting the robots.txt standard, which lets websites ask web crawlers not to access some or all of their content, despite Perplexity reportedly claiming otherwise. Perplexity publicly lists the IP address ranges and user agent strings of its web crawlers, but according to Wired and Knight, it used undisclosed IP addresses and spoofed user agent strings when ignoring robots.txt. [13] [14] Wired also stated that, in some cases, Perplexity may be summarizing "not actual news articles but reconstructions of what they say based on URLs and traces of them left in search engines like extracts and metadata, offering summaries purporting to be based on direct access to the relevant text." [13]

In response, Srinivas stated in a phone interview that "Perplexity is not ignoring the Robot Exclusions Protocol... We don't just rely on our own web crawlers, we rely on third-party web crawlers as well." He explained that the web crawler identified by Wired was owned by a third-party provider. [15] When asked whether Perplexity would stop scraping Wired content through third parties, Srinivas responded that "it's complicated." [15]
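The allegations above turn on two checks a website operator can make against an incoming crawler: whether the site's robots.txt rules permit the user agent string the crawler declares, and whether the request comes from the IP ranges the crawler's operator has published. The following is a minimal sketch of both checks using Python's standard `urllib.robotparser` and `ipaddress` modules. The robots.txt content, the crawler name `ExampleBot`, the URLs and the IP ranges (drawn from the RFC 5737 documentation blocks) are all hypothetical, and this is not code used by Perplexity, Wired or Robb Knight.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: block one named crawler entirely
# and keep every other client out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical IP ranges published for ExampleBot (RFC 5737 documentation blocks).
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text already held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robots_allows(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the rule group matched by this user agent string permits url."""
    return parser.can_fetch(user_agent, url)


def ip_is_published(client_ip: str) -> bool:
    """True if the request's source address lies in a published crawler range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in PUBLISHED_RANGES)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://news.example.com/2024/06/story.html"
    # The declared crawler name matches its own group and is blocked site-wide.
    print(robots_allows(parser, "ExampleBot/1.0", url))
    # A browser-like string falls through to the "*" group and is allowed.
    print(robots_allows(parser, "Mozilla/5.0 (X11; Linux x86_64)", url))
    # Source-address check against the published ranges.
    print(ip_is_published("192.0.2.17"))
    print(ip_is_published("203.0.113.5"))
```

Because robots.txt rules are keyed on the name a client chooses to send, compliance is voluntary: a client presenting a browser-like user agent string is matched against the `*` group rather than the crawler's own group. Comparing the source address with the operator's published ranges is a complementary signal that does not depend on the declared name.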

Amazon Web Services (AWS), which hosts the Perplexity crawler, has a terms-of-service clause prohibiting its customers from ignoring the robots.txt standard. Amazon began a "routine" investigation into Perplexity's use of Amazon Elastic Compute Cloud (EC2). [16]

In July 2024, Perplexity announced a publishers' program to share advertising revenue with partner publishers. [17]


References

  1. Ghaffary, Shirin (2024-04-23). "AI Search Startup Perplexity Valued at $1 Billion in Funding Round". Bloomberg News. Archived from the original on 2024-04-24.
  2. "Startup Perplexity Challenges Google With AI Search". The Wall Street Journal. 2024-01-04. Archived from the original on 2024-01-10. Retrieved 2024-01-10.
  3. "Perplexity Free based on GPT-3.5". discord.com. Perplexity Community Moderator "IceLavaMan". Retrieved 2024-10-08.
  4. Singh, Shubham (2024-01-06). "Perplexity AI raises $73.6M in funding round led by Nvidia, Bezos, now valued at $522M". Business Today. Retrieved 2024-01-15.
  5. Goode, Lauren (2024-03-21). "Perplexity's Founder Was Inspired by Sundar Pichai. Now They're Competing to Reinvent Search". Wired. Archived from the original on 2024-07-16. Retrieved 2024-07-27. "The startup says its user base has grown to 15 million active users, growing 50 percent from the 10 million reported just two months ago."
  6. "AI-powered search engine Perplexity AI lands $26M, launches iOS app". TechCrunch. 2023-04-04. Archived from the original on 2024-03-05. Retrieved 2024-05-05.
  7. Wiggers, Kyle (2024-01-04). "AI-powered search engine Perplexity AI, now valued at $520M, raises $73.6M". TechCrunch. Archived from the original on 2024-01-07. Retrieved 2024-01-07.
  8. "Announcing our series A funding round and mobile app launch". Perplexity.ai. 2023-04-28. Archived from the original on 2024-04-22. Retrieved 2024-04-24.
  9. "SoftBank to invest in search startup Perplexity AI at $3 bln valuation". Reuters. 2024-06-27. Archived from the original on 2024-08-02. Retrieved 2024-08-02. "Japanese technology investor SoftBank Group Corp's (9984.T), Vision Fund 2 is investing between $10 million and $20 million in U.S. search startup Perplexity AI at a valuation of $3 billion, Bloomberg News reported on Thursday."
  10. David, Emilia (2024-05-30). "Perplexity will research and write reports". The Verge. Archived from the original on 2024-06-20. Retrieved 2024-06-24.
  11. O'Brien, Matt (2024-06-15). "AI startup Perplexity wants to upend search business. News outlet Forbes says it's ripping them off". Associated Press. Archived from the original on 2024-06-20. Retrieved 2024-06-20.
  12. Lane, Randall (2024-06-11). "Why Perplexity's Cynical Theft Represents Everything That Could Go Wrong With AI". Forbes. Retrieved 2024-06-20.
  13. Mehrotra, Dhruv; Marchman, Tim (2024-06-19). "Perplexity Is a Bullshit Machine". Wired. Archived from the original on 2024-06-20. Retrieved 2024-06-20.
  14. Knight, Robb (2024-06-15). "Perplexity AI Is Lying about Their User Agent". Robb Knight. Archived from the original on 2024-06-20. Retrieved 2024-06-20.
  15. Sullivan, Mark (2024-06-21). "Perplexity CEO Aravind Srinivas responds to plagiarism and infringement accusations". Fast Company. Retrieved 2024-06-24.
  16. Mehrotra, Dhruv; Couts, Andrew (2024-06-27). "Amazon Is Investigating Perplexity Over Claims of Scraping Abuse". Wired. Retrieved 2024-07-03.
  17. Robison, Kylie (2024-07-30). "Perplexity is cutting checks to publishers following plagiarism accusations". The Verge. Retrieved 2024-08-04.