Speechmatics

Speechmatics Ltd
Company type: Privately held company
Industry: Speech recognition
Founded: 2006
Founder: Tony Robinson
Headquarters: Cambridge, UK
Locations: Cambridge, UK; London, UK; Chennai, India; Brno, Czech Republic
Area served: Global
Key people: Katy Wigdahl (CEO)
Products: Automatic Speech Recognition (ASR), cloud-based ASR, speech-to-text, Autonomous Speech Recognition
Revenue: €11,342,008 (2021)
Number of employees: 100–250
Website: www.speechmatics.com

Speechmatics is a technology company based in Cambridge, England, that develops automatic speech recognition (ASR) software based on recurrent neural networks and statistical language modelling. The company was founded in 2006 as Cantab Research Ltd by speech recognition specialist Dr. Tony Robinson. [1] [2]


Speechmatics Ltd became the trading name of Cantab Research Ltd in 2012, when the company commercialized its technology. [3] Speechmatics offers its speech recognition engine to solution and service providers to integrate into their own stacks, regardless of industry or use case. [4] Businesses use Speechmatics to transcribe human speech into text across genders, accents, and demographics. The technology can be deployed on-premises and in public and private clouds. [5] [6]

History

Speechmatics was founded in 2006 by Tony Robinson, who pioneered the application of recurrent neural networks to speech recognition. [7] [8] [9] He was among the first to demonstrate the practical capabilities of deep neural networks and their usefulness for speech recognition. [10] In 2012, Cantab Research Ltd commercialized its speech recognition product and began selling the technology to customers as Speechmatics Ltd. [11]

In 2014, the company led the development of a billion-word text corpus for measuring progress in statistical language modelling and placed the corpus in the public domain to help accelerate the development of speech recognition technology. [12]

In 2017, the company announced it had developed a new computational method for building language models rapidly. [13] Around the same time, Speechmatics announced a partnership with the Qatar Computing Research Institute (QCRI) to develop advanced Arabic speech-to-text services. [14]

In 2018, Speechmatics became the first ASR provider to develop a Global English language pack, which incorporates major dialects and accents of English into a single model. [15]

In 2019, the company raised £6.35 million in venture capital in a Series A funding round. [16] With investment from Albion Venture Capital, IQ Capital, and Amadeus Capital Partners, Speechmatics was able to scale as a fast-growth technology start-up. In the same year, the company won a Queen's Award for Innovation. [17] [18]

In 2020, Speechmatics began expanding geographically beyond its product development, opening offices in Brno, Czech Republic; Denver, USA; and Chennai, India. [19] [11]

In March 2021, Speechmatics launched on the Microsoft Azure Marketplace, offering its any-context speech recognition technology at scale. Consuming Speechmatics' speech recognition engine directly within the Microsoft Azure stack lets businesses adopt the technology quickly, with fewer barriers to adoption. [20]

In December 2021, Speechmatics and consumer AI startup Personal.ai announced a partnership to offer individuals a personal AI that captures their conversations, spoken notes, reminders, and meeting remarks, regardless of the speaker's English dialect or accent. [21]

Product and services

In February 2018, Speechmatics launched Global English, a single English language pack supporting all major English accents for use in speech-to-text transcription. Global English (GE) was trained on spoken data from users in 40 countries and on billions of words drawn from global sources, making it one of the most comprehensive and accurate accent-agnostic transcription solutions on the market. [22] [23]

In November 2020, the company launched the first Global Spanish language pack on the market to support all major Spanish accents. Global Spanish (GS) is a single Spanish language pack trained on data drawn from a wide range of sources, particularly Latin American ones, positioning it as an accurate and comprehensive accent-independent Spanish language pack for speech-to-text. [24]

In October 2021, Speechmatics launched its 'Autonomous Speech Recognition' software. [25] [26] Built with recent deep learning techniques and the company's self-supervised models, it outperformed comparable offerings from Amazon, Apple, Google, and Microsoft in the company's own benchmarks, a step towards its stated mission to understand all voices. [27] [28]

Awards and recognition

Speechmatics was named in the FT 1000: Europe's Fastest Growing Companies list for four consecutive years, from 2019 to 2022. [29] [30]

In 2018, the company won High Growth Business of the Year at the SME National Business Awards. [31]

In 2019, Speechmatics won the Queen's Award for Enterprise in the Innovation category. [32] [33]


References

  1. Clawson, Trevor. "Finding A Voice - Can A UK Startup Compete With IT's Heavy Hitters In The Speech Recognition Market". Forbes. Retrieved 17 May 2018.
  2. Research, QY (April 2018). "Global Speech and Voice Recognition Market Research Report 2018". QY Research Reports: 118. Retrieved 17 May 2018.
  3. "New vision as Speechmatics moves into Science Park site". Cambridge Independent. 18 March 2020. Retrieved 22 March 2022.
  4. Cooper, Lanna. "Shaping the future of speech recognition". Startups Magazine. Retrieved 22 March 2022.
  5. "Speechmatics pushes forward recognition of accented English". TechCrunch. Retrieved 22 March 2022.
  6. Turner, Brian (29 September 2021). "Best speech-to-text software in 2022: Free, paid and online voice recognition apps and services". TechRadar India. Retrieved 22 March 2022.
  7. Robinson, Tony; Fallside, Frank (July 1991). "A recurrent error propagation network speech recognition system". Computer Speech and Language. 5 (3): 259–274. doi:10.1016/0885-2308(91)90010-N.
  8. Robinson, Tony (1996). "The Use of Recurrent Neural Networks in Continuous Speech Recognition". Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science. Vol. 355. pp. 233–258. CiteSeerX   10.1.1.364.7237 . doi:10.1007/978-1-4613-1367-0_10. ISBN   978-1-4612-8590-8.
  9. Wakefield, Jane (14 March 2008). "Speech recognition moves to text". BBC News . Retrieved 24 August 2020.
  10. Robinson, Tony (September 1993). "A neural network based, speaker independent, large vocabulary, continuous speech recognition system: the WERNICKE project". Third European Conference on Speech Communication and Technology. 1: 1941–1944. Retrieved 17 May 2018.
  11. Ohr, Thomas (21 October 2019). "Cambridge-based Speechmatics raises €7.4 million for the global expansion of its speech recognition technology". EU-Startups.
  12. Chelba, Ciprian; Mikolov, Tomas (March 2014). "One billion word benchmark for measuring progress in statistical language modeling". Interspeech. arXiv: 1312.3005 . Bibcode:2013arXiv1312.3005C.
  13. Orlowski, Andrew. "Total recog: British AI makes universal speech breakthrough". The Register. Situation Publishing. Retrieved 17 May 2018.
  14. Erdagami, Ahmed. "QCRI in deal with UK's Speechmatics to take Arabic transcription technology global". Qatar is Booming. ME Business Wire. Retrieved 17 May 2018.
  15. "Speechmatics Improves Language Recognition Accuracy with Next Generation Update". audioXpress. Retrieved 22 March 2022.
  16. "Speechmatics raises £6.35m Series A". UK Tech News. 21 October 2019. Retrieved 22 March 2022.
  17. "Speechmatics' Queen's Award honours translation at its best". Cambridge Independent. 25 September 2019. Retrieved 22 March 2022.
  18. "Eight Cambridge life science and technology companies win Queen's Awards". Business Weekly. www.businessweekly.co.uk. Retrieved 22 March 2022.
  19. "Speechmatics adds Chennai office as part of global expansion plans". Live Chennai. www.livechennai.com. Retrieved 22 March 2022.
  20. Team, EBM ADMIN (24 March 2021). "Speechmatics launches on the Microsoft Azure Marketplace to offer any-context speech recognition technology at scale". European Business Magazine. Retrieved 4 April 2022.
  21. McFarl, Alex (10 December 2021). "Speechmatics Partners With Personal.ai to Capture Voice Memories". Unite.AI. Retrieved 4 April 2022.
  22. "Speechmatics Launches Global English, an Accent-agnostic Language Pack for Speech-to-text Transcription". www.bloomberg.com. 20 February 2018. Retrieved 22 March 2022.
  23. Turner, Brian (29 September 2021). "Best speech-to-text software in 2022: Free, paid and online voice recognition apps and services". TechRadar. Retrieved 22 March 2022.
  24. "Speechmatics launches industry-first Global Spanish language pack for automatic speech-to-text transcription at scale | Cambridge Network". www.cambridgenetwork.co.uk. Retrieved 22 March 2022.
  25. Coldewey, Devin. "Speechmatics pushes forward recognition of accented English". TechCrunch. Retrieved 22 March 2022.
  26. McFarl, Alex (26 October 2021). "Speechmatics Launches Autonomous Speech Recognition Software". Unite.AI. Retrieved 22 March 2022.
  27. "How Speechmatics is leading the way in tackling AI bias". Information Age. 28 October 2021. Retrieved 22 March 2022.
  28. Burt, Chris (27 October 2021). "Speechmatics says dramatic speech recognition bias reduced with unlabelled training data". Biometric Update. www.biometricupdate.com. Retrieved 22 March 2022.
  29. "FT 1000: the sixth annual list of Europe's fastest-growing companies". 1 March 2022. Retrieved 4 April 2022.
  30. Smith, Ian (1 March 2019). "The FT 1000: third annual list of Europe's fastest-growing companies". Financial Times. Retrieved 4 April 2022.
  31. "City successes at SME national awards 2018". Cambridge Independent. 19 December 2018. Retrieved 4 April 2022.
  32. "Speechmatics recognised with prestigious Queen's Award | Cambridge Network". www.cambridgenetwork.co.uk. Retrieved 4 April 2022.
  33. "Speechmatics' Queen's Award honours innovation at its best". Cambridge Judge Business School. 18 September 2019. Retrieved 4 April 2022.