| Type | Private |
| --- | --- |
| Industry | Software, Embedded |
| Founded | Cambridge, UK (2010) |
| Founder | Dr. Christopher Mitchell (CEO) |
| Headquarters | Cambridge, UK |
| Key people | Dr. Robert Swann (chairman, formerly of Alphamosaic); Amy Weatherup (director) |
| Products | Sound recognition systems |
| Website | www |
Audio Analytic is a British company headquartered in Cambridge, England, that developed ai3, a patented sound recognition software framework which gives technology the ability to understand context through sound. The framework includes an embeddable software platform that can react to a range of sounds, such as smoke alarms and carbon monoxide alarms, window breakage, infant crying and dogs barking.
The company grew out of founder Christopher Mitchell's doctoral research at Anglia Ruskin University, with seed investment from EEDA (the East of England Development Agency) and local Cambridge Angels investors.[citation needed]
In 2022, Audio Analytic was acquired by Meta, the owner of Facebook and Instagram.[1]
Audio Analytic sells ai3, a software package that is embedded on a device, along with an assortment of sound profiles that the software can recognise, including warning alarms, window breakage, an infant crying, and voice activity.[2]
Audio Analytic developed the Polyphonic Sound Detection Score (PSDS), a metric for evaluating the performance of sound recognition algorithms when applied to polyphonic sound recordings.[3][4][5] The company also released an accompanying software framework that implements the PSDS.[6]
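As an illustration of the idea behind the metric, the sketch below computes a simplified PSDS-like score as the normalised area under a PSD-ROC curve (true-positive rate against effective false positives per hour). This is a toy reduction for illustration only, not the official psds_eval implementation: it omits the full metric's per-class averaging and cross-trigger penalties, and the operating points shown are hypothetical.

```python
import numpy as np

def simplified_psds(operating_points, max_efpr=100.0):
    """Normalised area under a PSD-ROC curve, clipped at max_efpr.

    operating_points: (eFPR, TPR) pairs, one per decision threshold,
    with eFPR in false positives per hour and TPR in [0, 1].
    """
    pts = sorted(p for p in operating_points if p[0] <= max_efpr)
    efpr = np.array([p[0] for p in pts] + [max_efpr])
    tpr = np.array([p[1] for p in pts] + [pts[-1][1]])
    # Trapezoidal integration of TPR over eFPR, normalised so that a
    # perfect detector (TPR = 1 everywhere) would score 1.0.
    area = np.sum((tpr[1:] + tpr[:-1]) / 2.0 * np.diff(efpr))
    return area / max_efpr

# Three hypothetical thresholds tracing out a PSD-ROC curve:
print(simplified_psds([(1.0, 0.62), (10.0, 0.81), (50.0, 0.93)]))
```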
Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
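A minimal sketch of digital-domain processing, assuming NumPy: a gain specified in decibels is converted to a linear amplitude ratio and applied sample-by-sample to a synthetic tone.

```python
import numpy as np

sample_rate = 16_000                        # samples per second
t = np.arange(sample_rate) / sample_rate    # one second of sample times
signal = 0.1 * np.sin(2 * np.pi * 440 * t)  # a quiet 440 Hz tone

gain_db = 6.0                       # boost by 6 dB
gain = 10 ** (gain_db / 20)         # dB -> linear amplitude ratio (~2x)
louder = np.clip(signal * gain, -1.0, 1.0)  # apply gain, avoid clipping
```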
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. Those involved in MIR may have a background in academic musicology, psychoacoustics, psychology, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these.
OpenCV is a library of programming functions mainly for real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and licensed as free and open-source software under the Apache License 2.0. Since 2011, OpenCV has featured GPU acceleration for real-time operations.
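A minimal usage sketch, assuming the Python bindings (`cv2`) and a placeholder file name: load an image, convert it to greyscale, and run Canny edge detection.

```python
import cv2

image = cv2.imread("input.jpg")                 # "input.jpg" is a placeholder
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # OpenCV loads images as BGR
edges = cv2.Canny(gray, 100, 200)               # lower/upper hysteresis thresholds
cv2.imwrite("edges.jpg", edges)
```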
The Blender Game Engine was a free and open-source 3D production suite used for making real-time interactive content. It was previously embedded within Blender, but support for it was dropped in 2019, with the release of Blender 2.8. The game engine was written from scratch in C++ as a mostly independent component, and includes support for features such as Python scripting and OpenAL 3D sound.
Video content analysis or video content analytics (VCA), also known as video analysis or video analytics (VA), is the capability of automatically analyzing video to detect and determine temporal and spatial events.
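As a toy illustration of temporal-event detection, the sketch below flags motion by thresholding the pixel-wise difference between consecutive greyscale frames; the `frames` input, the thresholds and the whole pipeline are hypothetical simplifications of what commercial VCA systems do.

```python
import numpy as np

def motion_events(frames, pixel_threshold=25, min_changed_pixels=500):
    """Return indices of frames that differ enough from their predecessor.

    frames: iterable of greyscale frames as uint8 NumPy arrays.
    """
    events, previous = [], None
    for i, frame in enumerate(frames):
        frame = frame.astype(np.int16)  # avoid uint8 wrap-around on subtraction
        if previous is not None:
            changed = np.abs(frame - previous) > pixel_threshold
            if changed.sum() >= min_changed_pixels:
                events.append(i)        # a temporal event: motion at frame i
        previous = frame
    return events
```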
Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.
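The sketch below shows what a probabilistic programming system automates, using a model small enough to solve by brute force: inferring a coin's bias from observed flips via grid approximation. In a PP language one would state only the prior and likelihood and let a generic inference engine do the rest.

```python
import numpy as np

heads, flips = 7, 10                       # observed data
grid = np.linspace(0, 1, 1001)             # candidate values for the bias
prior = np.ones_like(grid)                 # uniform prior over the bias
likelihood = grid**heads * (1 - grid)**(flips - heads)
posterior = prior * likelihood
posterior /= posterior.sum()               # normalise to a distribution

print("posterior mean bias:", np.sum(grid * posterior))  # ~0.667
```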
Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California.
TrueAudio is the name of AMD's ASIC intended to serve as a dedicated co-processor for computationally expensive advanced audio signal processing, such as convolution reverberation effects and 3D audio effects. TrueAudio has been integrated into some AMD GPUs and APUs since 2013.
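What convolution reverberation computes, sketched in software with NumPy (a synthetic decaying-noise burst stands in for a measured room impulse response): every output sample is a weighted sum over the whole impulse response, which is why the workload invites dedicated hardware or FFT-based shortcuts.

```python
import numpy as np

sample_rate = 16_000
dry = np.random.randn(sample_rate)                     # 1 s of input audio
decay = np.exp(-np.arange(sample_rate // 2) / 2000.0)  # 0.5 s decay envelope
impulse_response = decay * np.random.randn(sample_rate // 2)

wet = np.convolve(dry, impulse_response)  # the computationally expensive step
wet /= np.max(np.abs(wet))                # normalise the output level
```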
Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.
MulticoreWare Inc is a software development company offering products and services related to HEVC video compression, machine learning, compilers for heterogeneous computing, and software performance optimization. MulticoreWare's customers include AMD, Microsoft, Google, Qualcomm and Telestream. MulticoreWare was founded in 2009 and today has offices in three countries: the United States, China and India.
spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.
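A minimal usage sketch, assuming the small English model has been installed (`python -m spacy download en_core_web_sm`): part-of-speech tags, dependency labels and named entities for one sentence.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Audio Analytic was founded in Cambridge in 2010.")

for token in doc:
    print(token.text, token.pos_, token.dep_)  # per-token annotations
for ent in doc.ents:
    print(ent.text, ent.label_)                # named entities
```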
Sound recognition is a technology based on both traditional pattern recognition theories and audio signal analysis methods. Sound recognition systems combine preliminary data processing, feature extraction and classification algorithms: feature vectors, produced by the preliminary processing and linear predictive coding, are passed to a classifier.
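A sketch of that pipeline, assuming NumPy: the LPC coefficients are solved from the autocorrelation normal equations, and the nearest-centroid classifier and its `centroids` table are hypothetical stand-ins for a trained model.

```python
import numpy as np

def lpc_features(frame, order=8):
    """LPC coefficients for one audio frame via the autocorrelation method."""
    frame = frame * np.hamming(len(frame))          # preliminary data processing
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]   # autocorrelations r[0..order]
    # Solve the Toeplitz normal equations R a = r[1:] for the predictor.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def classify(frame, centroids):
    """Assign the class whose mean feature vector is nearest (hypothetical)."""
    features = lpc_features(frame)
    return min(centroids, key=lambda c: np.linalg.norm(features - centroids[c]))
```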
ML.NET is a free software machine learning library for the C# and F# programming languages. It also supports Python models when used together with NimbusML. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions.
Voice computing is the discipline that develops hardware or software to process voice inputs.
raylib is a cross-platform open-source software development library. The library was made to create graphical applications and games.
An audio deepfake is a type of artificial intelligence used to create convincing speech that sounds like specific people saying things they did not say. The technology was initially developed for applications intended to improve human life: it can be used to produce audiobooks and to help people who have lost their voices regain them. Commercially, it has opened the door to several opportunities, such as more personalized digital assistants, natural-sounding text-to-speech and speech translation services.