Amazon Polly

Initial release: November 29, 2016
Available in: 41 languages
Type: Speech synthesis
Website: aws.amazon.com/polly/

Amazon Polly is a cloud service from Amazon Web Services, a subsidiary of Amazon.com, that converts text into spoken audio. [1] [2] [3] It allows developers to create speech-enabled applications and products. [4] It was launched in November 2016 [5] [6] [7] and, as of December 2024, includes more than 100 voices across 41 language variants, [8] some of which are higher-quality Neural Text-to-Speech voices. Users include Duolingo, a language education platform. [9]
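As an illustration of how a developer might request speech from the service, the following is a minimal Python sketch that assumes the AWS SDK for Python (boto3) is installed and that AWS credentials are configured. The helper names `make_polly_client` and `text_to_mp3_bytes` are hypothetical, and the region, the voice "Joanna", and the neural engine are illustrative choices rather than recommendations.

```python
from typing import Any


def make_polly_client(region_name: str = "us-east-1") -> Any:
    """Create a Polly client (assumes boto3 and AWS credentials are available)."""
    import boto3  # imported lazily so the helper below can be used without it

    return boto3.client("polly", region_name=region_name)


def text_to_mp3_bytes(
    client: Any,
    text: str,
    voice_id: str = "Joanna",  # illustrative voice choice
    engine: str = "neural",    # illustrative engine choice
) -> bytes:
    """Send text to the client's synthesize_speech call and return MP3 bytes."""
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # The response carries the audio as a readable stream.
    return response["AudioStream"].read()
```

With credentials in place, a call such as `text_to_mp3_bytes(make_polly_client(), "Hello from Polly")` would return audio bytes that can be written to an `.mp3` file.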

Related Research Articles

Amazon Web Services: On-demand cloud computing company

Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered, pay-as-you-go basis. Clients will often use this in combination with autoscaling. These cloud computing web services provide various services related to networking, compute, storage, middleware, IoT and other processing capacity, as well as software tools via AWS server farms. This frees clients from managing, scaling, and patching hardware and operating systems. One of the foundational services is Amazon Elastic Compute Cloud (EC2), which allows users to have at their disposal a virtual cluster of computers, with extremely high availability, which can be interacted with over the internet via REST APIs, a CLI or the AWS console. AWS's virtual computers emulate most of the attributes of a real computer, including hardware central processing units (CPUs) and graphics processing units (GPUs) for processing; local/RAM memory; hard-disk (HDD)/SSD storage; a choice of operating systems; networking; and pre-loaded application software such as web servers, databases, and customer relationship management (CRM).

Google Translate: Multilingual neural machine translation service

Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, as well as an API that helps developers build browser extensions and software applications. As of December 2024, Google Translate supports 249 languages and language varieties at various levels. It served over 200 million people daily in May 2013, and over 500 million total users as of April 2016, with more than 100 billion words translated daily.

Danny Lange: Danish computer scientist

Danny B. Lange is a Danish computer scientist who has worked on machine learning for IBM, Microsoft, Amazon Web Services, Uber, and Unity Technologies.

Virtual assistant: Software agent

A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, graphical interface, or voice - as some virtual assistants are able to interpret human speech and respond via synthesized voices.

Figure Eight Inc.: American software company

Figure Eight was a human-in-the-loop machine learning and artificial intelligence company based in San Francisco.

Siri: Software-based personal assistant from Apple

Siri is a digital assistant purchased, developed, and popularized by Apple Inc., which is included in the iOS, iPadOS, watchOS, macOS, tvOS, audioOS, and visionOS operating systems. It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual language usages, searches, and preferences, returning individualized results.

Duolingo: American educational technology company

Duolingo, Inc., is an American educational technology company that produces learning apps and provides language certification. Duolingo offers courses on 44 languages, ranging from English, French, and Spanish to less commonly studied languages such as Welsh, Irish, and Navajo, and even constructed languages such as Klingon. It also offers courses on music and math. The learning method incorporates gamification to motivate users with points, rewards and interactive lessons featuring spaced repetition. The app promotes short, daily lessons for consistent-phased practice.

Google Cloud Platform (GCP) is a suite of cloud computing services offered by Google that provides a series of modular cloud services including computing, data storage, data analytics, and machine learning, alongside a set of management tools. It runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and Google Docs, according to Verma et al. Registration requires a credit card or bank account details.

Amazon Echo, often shortened to Echo, is a brand of smart speakers developed by Amazon. Echo devices connect to the voice-controlled intelligent personal assistant service Alexa, which will respond when a user says "Alexa". Users may change this wake word to "Amazon", "Echo", "Computer", and other options. The features of the device include voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, and playing audiobooks, in addition to providing weather, traffic and other real-time information. It can also control several smart devices, acting as a home automation hub.

Annapurna Labs is an Israeli microelectronics company. Since January 2015 it has been a wholly owned subsidiary of Amazon.com. Amazon reportedly acquired the company for its Amazon Web Services division for US$350–370M.

Autoscaling, also spelled auto scaling or auto-scaling, and sometimes also called automatic scaling, is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that cost money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to support sudden and large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low, and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of load balancing.

Google Assistant: AI-powered digital assistant from Google

Google Assistant is a virtual assistant software application developed by Google that is primarily available on home automation and mobile devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations, unlike the company's previous virtual assistant, Google Now.

This is a timeline of Amazon Web Services, which offers a suite of cloud computing services that make up an on-demand computing platform.

Amazon Alexa, or Alexa, is a virtual assistant technology largely based on a Polish speech synthesizer named Ivona, bought by Amazon in 2013. It was first used in the Amazon Echo smart speaker and the Amazon Echo Dot, Echo Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of natural language processing for tasks such as voice interaction, music playback, creating to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, news, and other real-time information. Alexa can also control several smart devices as a home automation system. Alexa capabilities may be extended by installing "skills" such as weather programs and audio features. It performs these tasks using automatic speech recognition, natural language processing, and other forms of weak AI.

Amazon Lex is a service for building conversational interfaces into any application using voice and text. It powers the Amazon Alexa virtual assistant. In April 2017, the platform was released to the developer community, and it was suggested that it could be used for conversational interfaces including web and mobile apps, robots, toys, drones, and more. Amazon had already launched Alexa Voice Services, which developers can use to integrate Alexa into their own devices, such as smart speakers and alarm clocks; however, Lex does not require that end users interact with the Alexa assistant per se, but rather with any type of assistant or interface. As of February 2018, users can define a response for Amazon Lex chatbots directly from the AWS management console.

Witlingo: Software as a service company

Witlingo is a B2B software as a service (SaaS) company that enables businesses and organizations to engage with members of their communities using human language technology and conversational AI, such as speech recognition, natural language processing, IVR, virtual assistant apps on smartphone platforms (iOS and Android), chatbots, and digital audio.

Speechmatics

Speechmatics is a technology company based in Cambridge, England, which develops automatic speech recognition software (ASR) based on recurrent neural networks and statistical language modelling. Speechmatics was originally named Cantab Research Ltd when founded in 2006 by speech recognition specialist Dr. Tony Robinson.

Amazon SageMaker AI is a cloud-based machine-learning platform that allows developers to create, train, and deploy machine-learning (ML) models in the cloud. It can also be used to deploy ML models on embedded systems and edge devices. The platform was launched in November 2017.

Amazon Rekognition is a cloud-based software as a service (SaaS) computer vision platform that was launched in 2016. It has been sold to, and used by, a number of United States government agencies, including U.S. Immigration and Customs Enforcement (ICE) and Orlando, Florida police, as well as private entities.

Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.

References

  1. Dignan, Larry (September 20, 2018). "Amazon's slew of new Echo, Alexa devices obscures new developer tools, features". ZDNet.
  2. "Amazon Polly". TechTarget. Retrieved 22 September 2018.
  3. Perez, Sarah (February 8, 2018). "Amazon launches a Polly WordPress plugin that turns blog posts into audio, including podcasts". TechCrunch.
  4. Alawadhi, Neha (August 29, 2018). "AWS announces addition of Hindi language support for Amazon Polly". moneycontrol.com.
  5. "AWS makes Amazon Polly talk in Hindi in addition to Indian English". Digit. August 29, 2018.
  6. "Amazon announces three new AI services called Lex, Polly and recognition for AWS". Firstpost.com. December 1, 2016.
  7. Lardinois, Frederic (November 30, 2016). "Amazon launches Amazon AI to bring its machine learning smarts to developers". TechCrunch.
  8. "Languages Supported by Amazon Polly - Amazon Polly". docs.aws.amazon.com. Retrieved 17 December 2024.
  9. "Powering Language Learning on Duolingo with Amazon Polly". 12 May 2017.