Modular Audio Recognition Framework

Last updated
Modular Audio Recognition Framework (MARF)
Developer(s) Serguei A. Mokhov, Stephen Sinclair, Ian Clement, Dimitrios Nicolacopoulos, The MARF Research and Development Group
Stable release
snapshot-0.3.0-devel-20070108 / January 8, 2007 (2007-01-08)
Repository Blue pencil.svg
Operating system Any with the JVM installed
Type Pattern recognition, Artificial intelligence, Signal processing, Software frameworks
License BSD
Website sourceforge.net/projects/marf

Modular Audio Recognition Framework (MARF) is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Java and arranged into a modular and extensible framework that attempts to facilitate addition of new algorithms. MARF may act as a library in applications or be used as a source for learning and extension. A few example applications are provided to show how to use the framework. There is also a detailed manual [1] and the API reference [2] in the javadoc format as the project tends to be well documented. MARF, its applications, and the corresponding source code and documentation are released under the BSD-style license.

A voice message is a message containing audio of a person's voice. Voice itself could be 'packaged' and sent through the IP backbone so that it reaches its marked 'address'. In a technical sense, the process of sending 'voice packets' is a semi passive way of communication. However, given the speed at which it could be delivered can make the communication sound seamless.

Sound mechanical wave that is an oscillation of pressure transmitted through a solid, liquid, or gas, composed of frequencies within the range of hearing; pressure wave, generated by vibrating structure

In physics, sound is a vibration that typically propagates as an audible wave of pressure, through a transmission medium such as a gas, liquid or solid.

Writing Representation of language in a textual medium

Writing is a medium of human communication that represents language and emotion with signs and symbols. In most languages, writing is a complement to speech or spoken language. Writing is not a language, but a tool used to make languages be read. Within a language system, writing relies on many of the same structures as speech, such as vocabulary, grammar, and semantics, with the added dependency of a system of signs or symbols. The result of writing is called text, and the recipient of text is called a reader. Motivations for writing include publication, storytelling, correspondence, record keeping and diary. Writing has been instrumental in keeping history, maintaining culture, dissemination of knowledge through the media and the formation of legal systems.

Contents

See also

Related Research Articles

In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.

Fast Fourier transform O(n logn) divide and conquer algorithm to calculate the discrete Fourier transforms

A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse factors. As a result, it manages to reduce the complexity of computing the DFT from , which arises if one simply applies the definition of DFT, to , where is the data size. The difference in speed can be enormous, especially for long data sets where N may be in the thousands or millions. In the presence of round-off error, many FFT algorithms are much more accurate than evaluating the DFT definition directly. There are many different FFT algorithms based on a wide range of published theories, from simple complex-number arithmetic to group theory and number theory.

The Open Sound System (OSS) is an interface for making and capturing sound in Unix and Unix-like operating systems. It is based on standard Unix devices system calls. The term also sometimes refers to the software in a Unix kernel that provides the OSS interface; it can be thought of as a device driver for sound controller hardware. The goal of OSS is to allow the writing of sound-based applications that are agnostic of the underlying sound hardware.

SuperCollider is an environment and programming language originally released in 1996 by James McCartney for real-time audio synthesis and algorithmic composition.

ASP.NET is an open-source server-side web application framework designed for web development to produce dynamic web pages. It was developed by Microsoft to allow programmers to build dynamic web sites, web applications and web services.

DirectShow

DirectShow, codename Quartz, is a multimedia framework and API produced by Microsoft for software developers to perform various operations with media files or streams. It is the replacement for Microsoft's earlier Video for Windows technology. Based on the Microsoft Windows Component Object Model (COM) framework, DirectShow provides a common interface for media across various programming languages, and is an extensible, filter-based framework that can render or record media files on demand at the request of the user or developer. The DirectShow development tools and documentation were originally distributed as part of the DirectX SDK. Currently, they are distributed as part of the Windows SDK.

The Adaptive Multi-Rateaudio codec is an audio compression format optimized for speech coding. AMR speech codec consists of a multi-rate narrowband speech codec that encodes narrowband (200–3400 Hz) signals at variable bit rates ranging from 4.75 to 12.2 kbit/s with toll quality speech starting at 7.4 kbit/s.

OpenMAX, often shortened as "OMX", is a non-proprietary and royalty-free cross-platform set of C-language programming interfaces. It provides abstractions for routines that are especially useful for processing of audio, video, and still images. It is intended for low power and embedded system devices that need to efficiently process large amounts of multimedia data in predictable ways, such as video codecs, graphics libraries, and other functions for video, image, audio, voice and speech.

Harmonic Vector Excitation Coding, abbreviated as HVXC is a speech coding algorithm specified in MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate mode and sampling frequency 8 kHz. It also operates at lower bitrates, such as 1.2 - 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

A blackboard system is an artificial intelligence approach based on the blackboard architectural model, where a common knowledge base, the "blackboard", is iteratively updated by a diverse group of specialist knowledge sources, starting with a problem specification and ending with a solution. Each knowledge source updates the blackboard with a partial solution when its internal constraints match the blackboard state. In this way, the specialists work together to solve the problem. The blackboard model was originally designed as a way to handle complex, ill-defined problems, where the solution is the sum of its parts.

A multimedia framework is a Software framework that handles media on a computer and through a network. A good multimedia framework offers an intuitive API and a modular architecture to easily add support for new audio, video and container formats and transmission protocols. It is meant to be used by applications such as media players and audio or video editors, but can also be used to build videoconferencing applications, media converters and other multimedia tools. Data is processed among modules automatically, it is unnecessary for app to pass buffers between connected modules one by one.

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

Windows Vista has many significant new features compared with previous Microsoft Windows versions, covering most aspects of the operating system.

MARF may refer to:

XDAIS or eXpressDsp Algorithm Interoperability Standard is a standard for algorithm development by Texas Instruments for the TMS320 DSP family. The standard was first introduced in 1999 and was created to facilitate integration of DSP algorithms into systems without re-engineering cost. The XDAIS standard address the issues of algorithm resource allocation and consumption on a DSP. Algorithms that comply with the standard are tested and awarded an "eXpressDSP-compliant" mark upon successful completion of the test

Microsoft Translator multilingual machine translation cloud service provided by Microsoft

Microsoft Translator is a multilingual machine translation cloud service provided by Microsoft. Microsoft Translator is integrated across multiple consumer, developer, and enterprise products; including Bing, Microsoft Office, SharePoint, Microsoft Edge, Microsoft Lync, Yammer, Skype Translator, Visual Studio, Internet Explorer, and Microsoft Translator apps for Windows, Windows Phone, iPhone and Apple Watch, and Android phone and Android Wear.

Media Lovin' Toolkit (MLT) is an open source multimedia framework, designed and developed for television broadcasting. It provides a toolkit for broadcasters, video editors, media players, transcoders, web streamers and many more types of applications. The functionality of the system is provided via an assortment of ready to use tools, XML authoring components, and an extensible plug-in based API.

In computer programming, an application programming interface (API) is a set of subroutine definitions, communication protocols, and tools for building software. In general terms, it is a set of clearly defined methods of communication among various components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer.

The Colony Framework is an open source plugin framework specification. Implementations of the specification offer a runtime component model, that allows for plugins to be installed, started, stopped, updated and uninstalled without requiring the application container to be stopped. The specification relies heavily on the Inversion of control principle, in order to make it easier for application components to discover and interact with each other.

HTML5 Audio is a subject of the HTML5 specification, incorporating audio input, playback, and synthesis, as well as speech to text, in the browser.

References

New Jersey State of the United States of America

New Jersey is a state in the Mid-Atlantic and Northeastern regions of the United States. It is located on a peninsula, bordered on the north and east by the state of New York, particularly along the extent of the length of New York City on its western edge; on the east, southeast, and south by the Atlantic Ocean; on the west by the Delaware River and Pennsylvania; and on the southwest by the Delaware Bay and Delaware. New Jersey is the fourth-smallest state by area but the 11th-most populous, with 9 million residents as of 2017, and the most densely populated of the 50 U.S. states; its biggest city is Newark. New Jersey lies completely within the combined statistical areas of New York City and Philadelphia. New Jersey was the second-wealthiest U.S. state by median household income as of 2017.

United States Federal republic in North America

The United States of America (USA), commonly known as the United States or America, is a country comprising 50 states, a federal district, five major self-governing territories, and various possessions. At 3.8 million square miles, the United States is the world's third or fourth largest country by total area and is slightly smaller than the entire continent of Europe's 3.9 million square miles. With a population of over 327 million people, the U.S. is the third most populous country. The capital is Washington, D.C., and the largest city by population is New York City. Forty-eight states and the capital's federal district are contiguous in North America between Canada and Mexico. The State of Alaska is in the northwest corner of North America, bordered by Canada to the east and across the Bering Strait from Russia to the west. The State of Hawaii is an archipelago in the mid-Pacific Ocean. The U.S. territories are scattered about the Pacific Ocean and the Caribbean Sea, stretching across nine official time zones. The extremely diverse geography, climate, and wildlife of the United States make it one of the world's 17 megadiverse countries.

Footnotes