| Company type | Private |
| --- | --- |
| Industry | Artificial intelligence |
| Genre | Search engine |
| Founded | August 2022 |
| Founders | Aravind Srinivas, Andy Konwinski, Denis Yarats, Johnny Ho |
| Headquarters | San Francisco, California, US |
| Key people | Aravind Srinivas (CEO) |
| Services | Perplexity search engine |
| Number of employees | ~100 (2024) [1] [2] |
| Website | perplexity.ai |
Perplexity AI is a conversational search engine that uses large language models (LLMs) to answer queries with sources drawn from the web, citing links inline within its text responses. [3] [4] Its developer, Perplexity AI, Inc., is based in San Francisco, California. [5]
Perplexity was founded in 2022 by Aravind Srinivas, Andy Konwinski, Denis Yarats and Johnny Ho, engineers with backgrounds in back-end systems, artificial intelligence (AI) and machine learning.
Perplexity operates on a freemium model and also offers an enterprise version of its product. [2] The free tier uses the company's standalone LLM, based on GPT-3.5, with web browsing. [7] [8]
It uses the context of a user's queries to provide personalized search results. Perplexity summarizes the search results and produces a text response with inline citations. [8]
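The summarize-and-cite behavior described above can be sketched in a few lines. This is purely illustrative, not Perplexity's actual pipeline; the function name, snippet data, and URLs below are all hypothetical:

```python
# Illustrative sketch only -- not Perplexity's actual implementation.
# Retrieved web snippets are numbered so an answer can cite them as [n],
# and the numbered sources are appended as footnotes.

def answer_with_citations(answer, snippets):
    """Append a numbered source list matching the [n] markers in `answer`."""
    footnotes = "\n".join(
        f"[{i}] {s['url']}" for i, s in enumerate(snippets, start=1)
    )
    return f"{answer}\n\nSources:\n{footnotes}"

# Hypothetical retrieved snippets; in a real system these would come from
# a web search and retrieval step.
snippets = [
    {"text": "Perplexity was founded in 2022.", "url": "https://example.com/a"},
    {"text": "Responses include inline citations.", "url": "https://example.com/b"},
]

result = answer_with_citations(
    "Perplexity launched in 2022 [1] and cites its sources inline [2].",
    snippets,
)
print(result)
```

In a production system, the numbered snippets would be passed to the LLM as context, with the model instructed to emit the `[n]` markers itself.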
Perplexity's Pages feature also lets users generate customizable webpages and research presentations from prompts. [9]
On 18 November 2024, Perplexity, which is backed by Amazon and AI chipmaker Nvidia, launched a shopping hub. The feature presents product cards showing relevant items in response to shopping-related questions. [12]
Internal Knowledge Search enables Pro and Enterprise Pro users to search across web content and internal documents simultaneously. Users can upload and search through Excel, Word, PDF, and other common file formats. Enterprise Pro users have a limit of 500 files for upload and indexing. [13]
In October 2024, Perplexity introduced new finance-related features, including lookups of stock prices and company earnings data. The tool provides real-time stock quotes and price tracking, industry peer comparisons and basic financial analysis tools. The platform sources its financial data from Financial Modeling Prep (FMP). [14] [15]
Perplexity Spaces was released in October 2024 as an AI-powered collaboration hub. The platform allows users to create customized knowledge spaces that combine web searches with personal file integration. Users can upload up to 50 different documents, with a 25MB size limit per file. [16]
As of 2024, Perplexity had raised $165 million in funding, valuing the company at over $1 billion. [2] In December 2024, it closed a $500 million funding round, raising its valuation to $9 billion. [14] [17] [18]
In July 2024, Perplexity announced the launch of a new publishers' program to share ad revenue with partners. [19]
Perplexity AI plans to introduce ads [20] [21] on its search platform by Q4 of 2024. [22]
In June 2024, Forbes publicly criticized Perplexity for its use of Forbes content. According to Forbes, Perplexity published a story that was largely copied from a proprietary Forbes article without prominently citing Forbes.
In response, Srinivas said that the feature had some "rough edges" and accepted feedback, but maintained that Perplexity only "aggregates" rather than plagiarizes information. [24] [25]
In June 2024, separate investigations by the magazine Wired and web developer Robb Knight found that Perplexity does not respect the robots.txt standard, which lets websites tell web crawlers not to scrape their content, despite Perplexity reportedly claiming otherwise.
Perplexity publicly lists the IP address ranges and user-agent strings of its web crawlers, but according to Wired and Knight, it uses undisclosed IP addresses and spoofed user-agent strings when ignoring robots.txt. [26] [27]
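The Robots Exclusion Protocol at issue here can be checked with Python's standard-library `urllib.robotparser`. The rules and user-agent names below are examples for illustration, not Perplexity's actual crawler configuration:

```python
# Check crawl permissions under a robots.txt policy using the Python
# standard library. "ExampleBot" and the rules are illustrative only.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally fetched with rp.set_url(...) and rp.read(); parsed inline here.
rp.parse([
    "User-agent: ExampleBot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Allow: /",
])

# ExampleBot is barred from /private/, while other agents are not.
print(rp.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("OtherBot", "https://example.com/private/page"))    # True
```

A compliant crawler consults this check before each fetch; the investigations cited above allege that requests were made regardless of the answer, under user-agent strings not covered by the published rules.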
Wired also stated that, in some cases, Perplexity may be summarizing:
"not actual news articles but reconstructions of what they say based on URLs and traces of them left in search engines like extracts and metadata, offering summaries purporting to be based on direct access to the relevant text." [26]
In response, Srinivas stated in a phone interview that:
"Perplexity is not ignoring the Robot Exclusions Protocol... We don't just rely on our own web crawlers, we rely on third-party web crawlers as well."
Srinivas explained that the web crawler identified by Wired was owned by a third-party provider. [28]
When asked whether Perplexity would cease scraping Wired content using third parties, Srinivas responded that "it's complicated." [28]
Amazon Web Services, which hosts the Perplexity crawler, prohibits its customers from ignoring the robots.txt standard in its terms of service.
Amazon began a "routine" investigation into the company's usage of Amazon Elastic Compute Cloud. [29]
In October 2024, The New York Times (NYT) sent a cease-and-desist notice to Perplexity to stop accessing and using NYT content, claiming that Perplexity is violating its copyright by scraping data from its website. [30]
NYT is also suing OpenAI and Microsoft for copyright infringement for similarly using millions of its articles to train the large language models that power ChatGPT. [31]
The cease-and-desist notice sent by NYT lawyers read in part:
"Perplexity and its business partners have been unjustly enriched by using, without authorization, The Times's expressive, carefully written and researched, and edited journalism without a license." [32]
Perplexity plans to respond to the notice by October 30, 2024. [30]
The same month, Dow Jones and New York Post filed a lawsuit against Perplexity, alleging copyright infringement. The lawsuit also alleges that Perplexity attributed quotes to an article on F-16 jets for Ukraine that never appeared in the original article. [33]