Speechmatics

Speechmatics Ltd
Company type: Privately held company
Industry: Speech recognition
Founded: 2006
Founder: Tony Robinson
Headquarters: Cambridge, UK
Locations: Cambridge, UK; London, UK; Chennai, India; Brno, Czech Republic
Area served: Global
Key people: Katy Wigdahl (CEO)
Products: Automatic Speech Recognition (ASR), Cloud-based ASR, Speech-to-text, Autonomous Speech Recognition
Revenue: €11,342,008 (2021)
Number of employees: 100–250
Website: www.speechmatics.com

Speechmatics is a technology company based in Cambridge, England, that develops automatic speech recognition (ASR) software based on recurrent neural networks and statistical language modelling. The company was founded in 2006 as Cantab Research Ltd by speech recognition specialist Dr. Tony Robinson.[1][2]

Speechmatics offers its speech recognition technology to solution and service providers to integrate into their own products, regardless of industry or use case.[3] Businesses use Speechmatics to transcribe human speech into text across genders and demographic groups. The technology can be deployed on-premises and in public and private clouds.[4][5]

History

Speechmatics was founded in 2006 by Tony Robinson, who pioneered the application of recurrent neural networks to speech recognition.[6][7][8] He was among the first to demonstrate the practical capabilities of deep neural networks and their application to speech recognition.[9]

In 2014, the company led the development of a billion-word text corpus for measuring progress in statistical language modelling and placed the corpus in the public domain to help accelerate the development of speech recognition technology.[10]

In 2017, the company announced it had developed a new computational method for building language models quickly.[11] Around the same time, Speechmatics announced a partnership with the Qatar Computing Research Institute (QCRI) to develop advanced Arabic speech-to-text services.[12]

In 2018, Speechmatics became the first ASR provider to develop a Global English language pack, which incorporates all dialects and accents of English into a single model.[13]

In 2019, the company raised £6.35 million in a Series A funding round.[14] The investment, from Albion Venture Capital, IQ Capital, and Amadeus Capital Partners, enabled Speechmatics to scale as a fast-growing technology start-up. In the same year, the company won a Queen's Award for Innovation.[15][16]

In 2020, Speechmatics expanded geographically, opening offices in Brno, Czech Republic; Denver, USA; and Chennai, India.[17][18]

In March 2021, Speechmatics launched on the Microsoft Azure Marketplace, allowing businesses to use its speech recognition engine directly within the Microsoft Azure technology stack.[19]

In December 2021, Speechmatics and consumer AI startup Personal.ai announced a partnership to offer individuals a personal AI that records and recalls their conversations, spoken notes, reminders and meeting details, regardless of the speaker's accent or dialect of English.[20]

In March 2023, Speechmatics released Ursa, a speech-to-text engine that the company says sets a new benchmark in transcription accuracy. Trained on millions of hours of audio data, Ursa is designed to capture spoken words in noisy and challenging environments.[21]

In July 2024, Speechmatics introduced Flow, an API for voice interactions that allows businesses to build responsive speech interactions into their products.[22]

Products and services

In February 2018, Speechmatics launched Global English, a single English language pack supporting all major English accents for use in speech-to-text transcription. Global English was trained on spoken data from users in 40 countries and on billions of words drawn from global sources, and the company describes it as one of the most comprehensive and accurate accent-agnostic transcription solutions on the market.[23][24]

In November 2020, the company launched Global Spanish, which it described as the first language pack on the market to support all major Spanish accents. Global Spanish is a single Spanish language pack trained on data drawn from a wide range of sources, particularly from Latin America, and is marketed as an accurate and comprehensive accent-independent Spanish speech-to-text solution.[25]

In October 2021, Speechmatics launched its ‘Autonomous Speech Recognition’ software.[26][27] Built with deep learning techniques and the company's self-supervised models, the software was reported to outperform comparable offerings from Amazon, Apple, Google, and Microsoft, a step towards the company's stated mission of understanding all voices.[28][29]

Awards and recognition

Speechmatics was named in the FT 1000: Europe's Fastest Growing Companies list for four consecutive years, from 2019 to 2022.[30][31]

In 2018, the company won High Growth Business of the Year at the SME National Business Awards.[32]

In 2019, Speechmatics won the Queen's Award for Enterprise in the Innovation category.[33][34]

References

  1. Clawson, Trevor. "Finding A Voice - Can A UK Startup Compete With IT's Heavy Hitters In The Speech Recognition Market". Forbes. Retrieved 17 May 2018.
  2. Research, QY (April 2018). "Global Speech and Voice Recognition Market Research Report 2018". QY Research Reports: 118. Retrieved 17 May 2018.
  3. Cooper, Lanna. "Shaping the future of speech recognition". Startups Magazine. Retrieved 22 March 2022.
  4. "Speechmatics pushes forward recognition of accented English". TechCrunch. Retrieved 22 March 2022.
  5. Turner, Brian (29 September 2021). "Best speech-to-text software in 2022: Free, paid and online voice recognition apps and services". TechRadar India. Retrieved 22 March 2022.
  6. Robinson, Tony; Fallside, Frank (July 1991). "A recurrent error propagation network speech recognition system". Computer Speech and Language. 5 (3): 259–274. doi:10.1016/0885-2308(91)90010-N.
  7. Robinson, Tony (1996). "The Use of Recurrent Neural Networks in Continuous Speech Recognition". Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science. Vol. 355. pp. 233–258. CiteSeerX 10.1.1.364.7237. doi:10.1007/978-1-4613-1367-0_10. ISBN 978-1-4612-8590-8.
  8. Wakefield, Jane (14 March 2008). "Speech recognition moves to text". BBC News. Retrieved 24 August 2020.
  9. Robinson, Tony (September 1993). "A neural network based, speaker independent, large vocabulary, continuous speech recognition system: the WERNICKE project". Third European Conference on Speech Communication and Technology. 1: 1941–1944. Retrieved 17 May 2018.
  10. Chelba, Ciprian; Mikolov, Tomas (March 2014). "One billion word benchmark for measuring progress in statistical language modeling". Interspeech. arXiv:1312.3005. Bibcode:2013arXiv1312.3005C.
  11. Orlowski, Andrew. "Total recog: British AI makes universal speech breakthrough". The Register. Situation Publishing. Retrieved 17 May 2018.
  12. Erdagami, Ahmed. "QCRI in deal with UK's Speechmatics to take Arabic transcription technology global". Qatar is Booming. ME Business Wire. Retrieved 17 May 2018.
  13. "Speechmatics Improves Language Recognition Accuracy with Next Generation Update". audioXpress. Retrieved 22 March 2022.
  14. "Speechmatics raises £6.35m Series A". UK Tech News. 21 October 2019. Retrieved 22 March 2022.
  15. "Speechmatics' Queen's Award honours translation at its best". Cambridge Independent. 25 September 2019. Retrieved 22 March 2022.
  16. "Eight Cambridge life science and technology companies win Queen's Awards | Business Weekly | Technology News | Business news | Cambridge and the East of England". www.businessweekly.co.uk. Retrieved 22 March 2022.
  17. "Live Chennai: Speechmatics adds Chennai office as part of global expansion plans, Speechmatics, Speech Therapy, Second International Office in Chennai, Hive Collaborative Workspaces". www.livechennai.com. Retrieved 22 March 2022.
  18. Ohr, Thomas (21 October 2019). "Cambridge-based Speechmatics raises €7.4 million for the global expansion of its speech recognition technology". EU-Startups.
  19. Team, EBM ADMIN (24 March 2021). "Speechmatics launches on the Microsoft Azure Marketplace to offer any-context speech recognition technology at scale". European Business Magazine. Retrieved 4 April 2022.
  20. McFarl, Alex (10 December 2021). "Speechmatics Partners With Personal.ai to Capture Voice Memories". Unite.AI. Retrieved 4 April 2022.
  21. Shenwai, Tanushree (15 March 2023). "Speechmatics Introduces Ursa: A Speech-To-Text System That Delivers Unprecedented Performance Across A Diverse Range of Voices". MarkTechPost. Retrieved 23 August 2024.
  22. "Speechmatics Introduces Flow API for Advanced Speech Interactions". audioXpress. 25 July 2024. Retrieved 23 August 2024.
  23. "Speechmatics Launches Global English, an Accent-agnostic Language Pack for Speech-to-text Transcription". www.bloomberg.com. 20 February 2018. Retrieved 22 March 2022.
  24. Turner, Brian (29 September 2021). "Best speech-to-text software in 2022: Free, paid and online voice recognition apps and services". TechRadar. Retrieved 22 March 2022.
  25. "Speechmatics launches industry-first Global Spanish language pack for automatic speech-to-text transcription at scale | Cambridge Network". www.cambridgenetwork.co.uk. Retrieved 22 March 2022.
  26. Coldewey, Devin. "Speechmatics pushes forward recognition of accented English". TechCrunch. Retrieved 22 March 2022.
  27. McFarl, Alex (26 October 2021). "Speechmatics Launches Autonomous Speech Recognition Software". Unite.AI. Retrieved 22 March 2022.
  28. "How Speechmatics is leading the way in tackling AI bias". Information Age. 28 October 2021. Retrieved 22 March 2022.
  29. Burt, Chris (27 October 2021). "Speechmatics says dramatic speech recognition bias reduced with unlabelled training data | Biometric Update". www.biometricupdate.com. Retrieved 22 March 2022.
  30. "FT 1000: the sixth annual list of Europe's fastest-growing companies". 1 March 2022. Retrieved 4 April 2022.
  31. Smith, Ian (1 March 2019). "The FT 1000: third annual list of Europe's fastest-growing companies". Financial Times. Retrieved 4 April 2022.
  32. "City successes at SME national awards 2018". Cambridge Independent. 19 December 2018. Retrieved 4 April 2022.
  33. "Speechmatics recognised with prestigious Queen's Award | Cambridge Network". www.cambridgenetwork.co.uk. Retrieved 4 April 2022.
  34. "Cambridge Independent: Speechmatics' Queen's Award honours innovation at its best - News & insight". Cambridge Judge Business School. 18 September 2019. Retrieved 4 April 2022.