Audio Analytic

Type: Private
Industry: Software, embedded systems
Founded: 2010, Cambridge, UK
Founder: Dr. Christopher Mitchell (CEO)
Headquarters: Cambridge, UK
Key people: Dr. Robert Swann (chairman), Amy Weatherup (director)
Products: Sound recognition systems
Website: www.audioanalytic.com

Audio Analytic is a British company, headquartered in Cambridge, England, that developed a patented sound recognition software framework called ai3, which enables devices to understand context through sound. The framework includes an embeddable software platform that can react to a range of sounds, such as smoke and carbon monoxide alarms, window breakage, infant crying and dog barking.


History

The company grew out of founder Christopher Mitchell's doctoral research at Anglia Ruskin University, with seed investment from the East of England Development Agency (EEDA) and local Cambridge Angels investors.[citation needed]

In 2022, Audio Analytic was acquired by Meta, the owner of Facebook and Instagram.[1]

Products

Audio Analytic sells ai3, a software package that is embedded on a device, along with an assortment of sound profiles that the software can recognise, including warning alarms, window breakage, an infant crying, and voice activity.[2]
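
ai3's programming interface is proprietary and not publicly documented, but the description above suggests a common embedded pattern: load per-sound profiles, feed the engine audio frames, and receive a callback when a sound is recognised. The sketch below illustrates only that general pattern; every name in it (Recognizer, load_profile, process) is invented for illustration and is not ai3's actual API.

```python
# Illustrative only: a toy stand-in for an embedded sound-recognition
# engine. All names here are hypothetical; ai3's real API is not public.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Recognizer:
    # sound profile name -> callback fired when that sound is detected
    profiles: Dict[str, Callable[[str, float], None]] = field(default_factory=dict)

    def load_profile(self, name: str,
                     on_detect: Callable[[str, float], None]) -> None:
        # A real engine would load a trained acoustic model here.
        self.profiles[name] = on_detect

    def process(self, frame: List[float],
                classify: Callable[[str, List[float]], float]) -> None:
        # A real engine runs its own classifier per profile; the caller
        # supplies a scoring function here so the sketch stays runnable.
        for name, on_detect in self.profiles.items():
            score = classify(name, frame)
            if score >= 0.9:  # confidence threshold
                on_detect(name, score)


# Usage: print an alert when the (fake) smoke-alarm score is high.
rec = Recognizer()
rec.load_profile("smoke_alarm", lambda n, s: print(f"{n} detected ({s:.2f})"))
rec.process([0.0] * 160, classify=lambda name, frame: 0.95)
```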

Audio Analytic developed the Polyphonic Sound Detection Score (PSDS), a metric for evaluating the performance of sound recognition algorithms when applied to polyphonic sound recordings.[3][4][5] They also released an accompanying software framework that implements the PSDS.[6]
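
The accompanying framework, psds_eval (reference 6), is a Python package. A minimal sketch of scoring one system with it follows; the signatures, keyword names and tab-separated column formats here follow the usage documented in the repository's README and may differ in later versions, and the .tsv filenames are placeholders.

```python
# Minimal sketch of computing a PSD-Score with psds_eval, following the
# repository README (signatures assumed from that documentation).
import pandas as pd
from psds_eval import PSDSEval

# Ground truth events and per-file durations as tab-separated tables,
# with columns filename/onset/offset/event_label and filename/duration.
gt = pd.read_csv("ground_truth.tsv", sep="\t")
meta = pd.read_csv("metadata.tsv", sep="\t")

evaluator = PSDSEval(dtc_threshold=0.5, gtc_threshold=0.5,
                     cttc_threshold=0.3, ground_truth=gt, metadata=meta)

# Each operating point is one set of detections, e.g. the output of the
# system under test at one decision threshold.
detections = pd.read_csv("detections_op1.tsv", sep="\t")
evaluator.add_operating_point(detections)

# PSDS integrates the PSD-ROC up to a maximum effective false-positive
# rate; alpha_ct and alpha_st weight cross-triggers and the variance of
# performance across classes.
psds = evaluator.psds(alpha_ct=0.0, alpha_st=0.0, max_efpr=100)
print(f"PSD-Score: {psds.value:.4f}")
```

Adding several operating points (one per threshold) traces out the PSD-ROC curve that the score summarises.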


References

  1. Field, Matthew (6 November 2022). "Cambridge start-up is bought by Facebook owner as Zuckerberg pushes deeper into the metaverse". The Telegraph. ISSN 0307-1235. Retrieved 7 November 2022.
  2. Bedingfield, Will (5 September 2019). "AI sound recognition will help protect your home from burglary". Wired UK. ISSN 1357-0978. Retrieved 1 October 2020.
  3. Bilen, Cagdas; Ferroni, Giacomo; Tuveri, Francesco; Azcarreta, Juan; Krstulovic, Sacha (May 2020). "A Framework for the Robust Evaluation of Sound Event Detection". ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 61–65. arXiv:1910.08440. doi:10.1109/ICASSP40776.2020.9052995. ISBN 978-1-5090-6631-5. S2CID 204788761.
  4. DCASE 2020 Challenge. "Sound event detection and separation in domestic environments - DCASE". dcase.community. Retrieved 4 August 2020.
  5. Wisdom, Scott; Erdogan, Hakan; Ellis, Daniel P. W.; Serizel, Romain; Turpault, Nicolas; Fonseca, Eduardo; Salamon, Justin; Seetharaman, Prem; Hershey, John R. (2020). "What's All the FUSS About Free Universal Sound Separation Data?". In preparation.
  6. Audio Analytic (22 July 2020). "audioanalytic/psds_eval". GitHub. Retrieved 4 August 2020.