ElevenLabs

ElevenLabs Inc.
Company type: Private company
Industry: Artificial intelligence
Founded: 2022
Founders:
  • Piotr Dąbkowski (CTO)
  • Mateusz Staniszewski (CEO)
Headquarters: New York City, United States
Website: elevenlabs.io

ElevenLabs is a software company that specializes in developing natural-sounding speech synthesis and text-to-speech software, using artificial intelligence and deep learning.

It has been identified as one of the major companies in the ongoing AI boom, also known as the AI spring. [1]

History

ElevenLabs was co-founded in 2022 by Piotr Dąbkowski, a former Google machine learning engineer, and Mateusz Staniszewski, a former Palantir deployment strategist. [2] Both were raised in Poland, and they reportedly found their inspiration for ElevenLabs in watching poorly dubbed American films. [3] [4]

Dąbkowski and Staniszewski initially considered several funding options, including working with a startup accelerator. In January 2023, they announced that they had secured a $2 million pre-seed round. The startup's focus on AI voice technology, a still-emerging field in Europe, played a significant role in attracting investors. The pre-seed round was led by Credo Ventures, with participation from Concept Ventures. [5]

In January 2023, ElevenLabs publicly released its beta platform. [6]

In June 2023, ElevenLabs raised $19 million in a Series A funding round at a valuation of about $100 million, [7] [8] even though the company had no office and only 15 employees. [4] [8] The round was co-led by the venture capital firm Andreessen Horowitz, former GitHub CEO Nat Friedman, and entrepreneur Daniel Gross. SV Angel also participated, along with prominent individuals including Mike Krieger (co-founder of Instagram), Brendan Iribe (co-founder of Oculus), Mustafa Suleyman (co-founder of DeepMind), and Tim O'Reilly (founder of O'Reilly Media). The company also announced that Andreessen Horowitz would join the ElevenLabs board. [3]

On January 22, 2024, ElevenLabs raised an additional $80 million in a Series B funding round, which brought the company's valuation to $1.1 billion. The round was led by Andreessen Horowitz, Friedman, Gross, and Sequoia Capital. The company also announced several new products, including its Voice Marketplace, AI Dubbing Studio, and mobile app. [9]

Products

ElevenLabs is best known for Speech Synthesis, its browser-based, AI-assisted text-to-speech software, which produces lifelike speech by synthesizing vocal emotion and intonation. [10] The company states that the software adjusts the intonation and pacing of delivery to match the context of the input text. [11] The software analyzes contextual cues in the text to detect emotions such as anger, sadness, happiness, or alarm and to infer the intended sentiment, [12] with the aim of producing more realistic, human-like inflection. As of 2023, the startup was in the process of patenting this technology. [5] On its beta site, users can submit text and generate audio files with a selection of default voices. Paying users can also upload custom voice samples to the company's voice cloning tool and create new voices from them. [13]

Voice Library is the company's feature for sharing voice profiles created with its Voice Design technology. With these pre-designed profiles, users can pick a voice that suits their needs instead of building one from scratch. [14] Another tool, VoiceLab, lets users clone a voice from a few short audio samples and also create entirely new synthetic voices. [3]

On June 20, 2023, ElevenLabs released the AI Speech Classifier, a tool for recognizing AI-generated audio, which it claims is the first of its kind. [3] The tool is accessible through an API. It is designed to determine whether an uploaded audio sample was generated with ElevenLabs' proprietary AI technology. [4] The company has said it intends to work with other AI developers on a universal detection system that could be adopted across the industry. [15]

In July 2023, ElevenLabs announced "Projects", a tool for creating long-form spoken content such as audiobooks and dialogue with contextually aware synthetic or custom voices. [4] [16] The tool was released that September. In August 2023, ElevenLabs expanded its voice generation to 28 languages. An in-house AI model automatically detects languages such as Korean, Dutch, and Vietnamese and supports what the company calls "emotionally rich" multilingual speech generation. The company also announced that its technology had officially exited beta. [17] [18]

In October 2023, ElevenLabs introduced "AI Dubbing", a tool that translates speech into more than 20 languages. The feature is designed to preserve the speaker's original voice, emotion, and intonation. It relies on proprietary methods for noise removal, speaker differentiation, transcription, and synchronizing the translated speech with the original audio. [19]

Uses

ElevenLabs' use cases span a range of sectors.

Content creators have used ElevenLabs for podcasts, narration, and comedy shows. [20] [21] [22] In March 2023, comedian Drew Carey used ElevenLabs' voice cloning tool to recreate his voice for an episode of his radio show, Friday Night Freakout. [11] In April 2023, Polish TV and radio presenter Jarosław Kuźniar used a synthesized version of his voice to deliver a series of podcasts on the war in Ukraine. [23] Seth Godin has also used ElevenLabs to narrate his AI-focused podcast. [3]

In March 2023, the streaming automation service Super-Hi-Fi partnered with ElevenLabs to launch a fully automated radio service called "AI Radio". Its virtual DJ is voiced by ElevenLabs' software, using text generated with ChatGPT. [24] ElevenLabs has also been used to narrate games and voice game characters through partnerships with the Swedish game developer Paradox Interactive and the UK-based Magicave. [3] [25]

Publishers and authors have used ElevenLabs to narrate audiobooks and newsletters. [5] [26] On June 13, 2023, Storytel announced an exclusive partnership with the company. Under the agreement, ElevenLabs would create voices tailored to Storytel's core markets and produce AI-narrated audiobooks. Storytel also announced a voice-changing feature, VoiceSwitcher, intended to let each listener personalize how an audiobook is narrated. [27] [28]

ElevenLabs has been used to generate audio for dubbing videos into other languages, including by content creators. [5] [8] The platform has been described as able to accurately replicate almost any accent in any language. [29] Fans have also used ElevenLabs to create inspirational messages in the voices of their favorite celebrities. [30]

In February 2023, Vice reporter Joseph Cox reported that he had recorded five minutes of himself talking. He then used ElevenLabs to create a voice deepfake that defeated a bank's voice-authentication system. [31]

ElevenLabs' usage guidelines forbid cloning voices for abusive purposes such as fraud, discrimination, hate speech, or online abuse. They do, however, permit use of the platform for "caricature, parody and satire" and "artistic and political speech contributing to public debates". The company reserves the right to suspend the accounts and content of users who violate these guidelines. It also says it will cooperate with authorities and report illegal activity in accordance with applicable law. [3] In January 2023, the company acknowledged that its platform had been used in "voice cloning misuse cases" [32] and strengthened its safeguards against malicious use of its technology. [33]

Reception

After its launch in January 2023, ElevenLabs quickly gained momentum and was praised for its voice output quality, fast generation times, and a "generous free tier". Reviewers also praised its ability to correctly pronounce names with uncommon pronunciations. Similar tools often struggle with such names because they cater primarily to Western names. [34] The company passed one million registered users between its launch and June 2023. [3] [4] [35]

Criticism and controversy

ElevenLabs was criticized after users abused its software to generate controversial statements in the voices of celebrities, public officials, and other famous individuals. [36] [37] [38] [39] [33] The issue drew particular attention after 4chan users used the tool to share hateful messages. [40] [15] The software's ability to closely imitate real voices has raised ethical concerns, and critics have described it as a form of deepfake. [41] In response, the company said it would work to mitigate abuse through safeguards and identity verification. [6] It then limited its voice cloning feature to paid subscribers, [42] arguing that requiring payment information would improve accountability, [43] and it has banned users who repeatedly violate its terms of service.

In the lead-up to the January 2024 New Hampshire Democratic primary, thousands of residents received AI-generated robocalls impersonating Joe Biden that urged them not to vote on primary day. The New Hampshire attorney general's office opened an investigation and traced the calls to a Texas-based company. Audio experts concluded that the call was likely made with ElevenLabs' tools. In response, CEO Mati Staniszewski said the company was "dedicated to preventing the misuse of audio AI tools" but did not comment on specific incidents. [44]

Concerns have also been raised about the ethics of ElevenLabs' training data sources. Several voice actors have claimed that ElevenLabs used samples of their voices without their consent. [45] ElevenLabs and similar companies have also been seen as a potential threat to the voice acting industry. [18]

References

  1. Burroughs, Callum; Kanetkar, Riddhi. "The FOMO is real for venture capitalists paying big premiums to invest in AI startups right now". Business Insider. Retrieved December 10, 2023.
  2. Kanetkar, Riddhi. "This startup, founded by ex-Google and Palantir staffers, uses AI to generate realistic voiceovers. Here's the 14-slide pitch deck ElevenLabs used to raise $2 million". Business Insider. Retrieved February 9, 2023.
  3. "Now hear this: Voice cloning AI startup ElevenLabs nabs $19M from a16z and other heavy hitters". VentureBeat. June 20, 2023. Retrieved July 25, 2023.
  4. Wiggers, Kyle (June 20, 2023). "Voice-generating platform ElevenLabs raises $19M, launches detection tool". TechCrunch. Retrieved July 25, 2023.
  5. Kanetkar, Riddhi. "Hot AI startup ElevenLabs, founded by ex-Google and Palantir staff, is set to raise $18 million at a $100 million valuation. Check out the 14-slide pitch deck it used for its $2 million pre-seed". Business Insider. Retrieved July 25, 2023.
  6. "A new AI voice tool is already being abused to make deepfake celebrity audio clips". Engadget. Retrieved February 3, 2023.
  7. "The trials and tribulations of AI voice tech". Financial Times. June 21, 2023. Retrieved July 25, 2023.
  8. Hunt, Simon (June 20, 2023). "AI firm ElevenLabs achieves $100 million valuation within months of launch". Evening Standard. Retrieved July 25, 2023.
  9. "ElevenLabs Releases New Voice AI Products and Raises $80M Series B". January 22, 2024.
  10. "Generative AI comes for cinema dubbing: Audio AI startup ElevenLabs raises pre-seed". Sifted. January 23, 2023. Retrieved February 3, 2023.
  11. Ashworth, Boone (April 12, 2023). "AI Can Clone Your Favorite Podcast Host's Voice". Wired. Retrieved April 25, 2023.
  12. WIRED Staff. "This Podcast Is Not Hosted by AI Voice Clones. We Swear". Wired. ISSN 1059-1028. Retrieved July 25, 2023.
  13. Frauenfelder, Mark (January 12, 2023). "Software lets you design new synthetic voices from scratch". Boing Boing. Retrieved February 3, 2023.
  14. "As Generative AI booms, this British startup secures $2M to imitate human voices — TFN". Tech Funding News. January 25, 2023. Retrieved February 5, 2023.
  15. Thompson, Stuart A. (March 12, 2023). "Making Deepfakes Gets Cheaper and Easier Thanks to A.I." The New York Times. ISSN 0362-4331. Retrieved July 25, 2023.
  16. Bonk, Lawrence. "ElevenLabs' Powerful New AI Tool Lets You Make a Full Audiobook in Minutes". Lifewire. Retrieved July 25, 2023.
  17. "ElevenLabs' AI Voice Generator Can Now Fake Your Voice in 30 Languages". Gizmodo. August 22, 2023. Retrieved September 25, 2023.
  18. Wiggers, Kyle (August 22, 2023). "ElevenLabs' voice-generating tools launch out of beta". TechCrunch. Retrieved September 25, 2023.
  19. Sharma, Shubham (October 10, 2023). "ElevenLabs introduces AI Dubbing, translating video and audio into 20 languages". VentureBeat. Retrieved November 28, 2023.
  20. Knibbs, Kate. "Generative AI Podcasts Are Here. Prepare to Be Bored". Wired. ISSN 1059-1028. Retrieved July 25, 2023.
  21. Suciu, Peter. "Arrested Succession Parody On YouTube Features 'Narration' By AI-Generated Ron Howard". Forbes. Retrieved July 25, 2023.
  22. Fadulu, Lola (July 6, 2023). "Can A.I. Be Funny? This Troupe Thinks So". The New York Times. ISSN 0362-4331. Retrieved July 25, 2023.
  23. "Sztuczna inteligencja czyta głosem Jarosława Kuźniara. Rewolucja w radiu i podcastach". Press.pl (in Polish). April 9, 2023. Retrieved April 25, 2023.
  24. McLane, Paul (March 29, 2023). "AI Radio Demonstrates AI Partnership". Radioworld. Retrieved April 25, 2023.
  25. "Magicave announces Beneath The Six game with an AI narrator". VentureBeat. July 6, 2023. Retrieved July 26, 2023.
  26. "AI-Generated Voice Firm Clamps Down After 4chan Makes Celebrity Voices for Abuse". Vice. January 30, 2023. Retrieved February 3, 2023.
  27. Anderson, Porter (June 13, 2023). "'AI Voices' in Audiobooks: Storytel in ElevenLabs Partnership". Publishing Perspectives. Retrieved July 25, 2023.
  28. "Storytel enters strategic partnership with ElevenLabs and announces upcoming launch of new VoiceSwitcher feature". Bloomberg.com. June 13, 2023. Retrieved July 25, 2023.
  29. Wise, James (June 30, 2023). "Imagine your child calling for money. Except it's not them – it's an AI scam". The Guardian. ISSN 0261-3077. Retrieved July 25, 2023.
  30. Hunter-Tilney, Ludovic (May 27, 2023). "Can AI make me a musical star?". Financial Times. Retrieved July 25, 2023.
  31. Newman, Lily Hay. "AI-Generated Voice Deepfakes Aren't Scary Good—Yet". Wired. ISSN 1059-1028. Retrieved July 25, 2023.
  32. Hern, Alex; Milmo, Dan (February 24, 2023). "Everything you wanted to know about AI – but were afraid to ask". The Guardian. ISSN 0261-3077. Retrieved July 25, 2023.
  33. Milmo, Dan; Hern, Alex (May 20, 2023). "Elections in UK and US at risk from AI-driven disinformation, say experts". The Guardian. ISSN 0261-3077. Retrieved July 25, 2023.
  34. Desai, Saahil (July 17, 2023). "A Voicebot Just Left Me Speechless". The Atlantic. Retrieved September 25, 2023.
  35. "Your AI Clone Can Fool Family, Your Bank, But Not Your Video Meeting - Tech News Briefing - WSJ Podcasts". WSJ. Retrieved July 25, 2023.
  36. Jimenez, Jorge (January 31, 2023). "AI company promises changes after 'voice cloning' tool used to make celebrities say awful things". PC Gamer. Retrieved February 3, 2023.
  37. "People Are Still Terrible: AI Voice-Cloning Tool Misused for Deepfake Celeb Clips". PCMag Middle East. January 31, 2023. Retrieved July 25, 2023.
  38. "Internet Up in Arms as 4Chan User Uses AI Voice Simulator To Deepfake Emma Watson's Voice, Makes Her Read Hitler's Autobiography – FandomWire". FandomWire. February 2, 2023. Retrieved February 3, 2023.
  39. "The generative A.I. software race has begun". Fortune. Retrieved February 3, 2023.
  40. Vincent, James (January 31, 2023). "4chan users embrace AI voice clone tool to generate celebrity hatespeech". The Verge. Retrieved February 3, 2023.
  41. "Seeing is believing? Global scramble to tackle deepfakes". Yahoo News. Retrieved February 3, 2023.
  42. @elevenlabsio (January 31, 2023). "Thank you everyone for your advice. We love what you're creating, but a set of actors use our tech for malicious purposes. We decided to take the following steps to address the issues:" (Tweet). Retrieved April 25, 2023 via Twitter.
  43. @elevenlabsio (January 31, 2023). "This will keep our tools accessible while allowing us to fight potential misuse. Payment details won't always prevent abuse, but they make VoiceLab users less anonymous and force them to think twice before sharing improper content" (Tweet). Retrieved April 25, 2023 via Twitter.
  44. Knibbs, Kate. "Researchers Say the Deepfake Biden Robocall Was Likely Made With Tools From AI Startup ElevenLabs". Wired. ISSN 1059-1028. Retrieved February 15, 2024.
  45. "Your Favorite Voice Actors Call Out AI Sites Copying Voices Without Consent". Kotaku. February 13, 2023. Retrieved December 10, 2023.