Kevin Lenzo

Kevin Lenzo (born 1967) is an American computer scientist. [1] He wrote the original infobot, founded The Perl Foundation (and was its chairman until 2007 [2]) and the Yet Another Perl Conference (YAPC) series, [3] released CMU Sphinx as open-source software, founded Cepstral LLC, and has been a major contributor to the Festival Speech Synthesis System, FestVox, and Flite. His voice is the basis for a number of synthetic voices, including voices in FreeTTS and Flite and the cmu_us_kal_diphone Festival voice. He has also contributed Perl modules to CPAN. Lenzo was also a founding member of the 1980s funk band "Leftover Funk".

Related Research Articles

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies enabling computers to recognize and translate spoken language into text, with searchability as a main benefit. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It incorporates knowledge and research from the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis.
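
CMU Sphinx, which the article's subject helped release as open source, can be exercised offline from Python. A minimal sketch, assuming the third-party SpeechRecognition and pocketsphinx packages are installed; the file name test.wav is a placeholder:

    # Offline recognition with CMU PocketSphinx via the third-party
    # "speech_recognition" package (pip install SpeechRecognition pocketsphinx).
    # The file name "test.wav" is a placeholder.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:   # 16 kHz mono WAV works best
        audio = recognizer.record(source)      # read the whole file

    # recognize_sphinx runs entirely offline using PocketSphinx
    print(recognizer.recognize_sphinx(audio))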

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
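
A minimal sketch of the text-to-speech step described above, assuming the third-party pyttsx3 package, which wraps the platform's native synthesizer:

    # Text-to-speech with the third-party pyttsx3 package (pip install pyttsx3);
    # it wraps SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux.
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)    # speaking rate in words per minute
    engine.say("Hello from a text to speech engine.")
    engine.runAndWait()                # block until the audio finishes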

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.
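
The distinction is straightforward to demonstrate in code. A minimal sketch, assuming the third-party librosa and soundfile packages; input.wav is a placeholder:

    # Time stretching vs. pitch scaling with librosa
    # (pip install librosa soundfile). "input.wav" is a placeholder.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("input.wav")

    # Time stretch: 1.5x faster, pitch unchanged
    faster = librosa.effects.time_stretch(y, rate=1.5)

    # Pitch scale: up four semitones, speed unchanged
    higher = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

    sf.write("faster.wav", faster, sr)
    sf.write("higher.wav", higher, sr)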

The Festival Speech Synthesis System is a general multi-lingual speech synthesis system originally developed by Alan W. Black, Paul Taylor and Richard Caley at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh. Substantial contributions have also been provided by Carnegie Mellon University and other sites. It is distributed under a free software license similar to the BSD License.
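
Festival is scriptable through its built-in Scheme interpreter, where Lenzo's cmu_us_kal_diphone voice (mentioned above) can be selected. A minimal sketch driving the festival binary from Python, assuming both the binary and that voice package are installed:

    # Pipe Scheme commands to the Festival interpreter. Assumes the
    # "festival" binary and the cmu_us_kal_diphone voice are installed.
    import subprocess

    scheme_cmds = """
    (voice_kal_diphone)                      ; select Lenzo's diphone voice
    (SayText "Hello from the kal diphone voice.")
    """
    subprocess.run(["festival", "--pipe"], input=scheme_cmds,
                   text=True, check=True)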

New Orleans Jazz & Heritage Festival

The New Orleans Jazz & Heritage Festival is an annual celebration of local music and culture held at the Fair Grounds Race Course in New Orleans, Louisiana. Jazz Fest attracts thousands of visitors to New Orleans each year. The New Orleans Jazz & Heritage Festival and Foundation Inc., as it is officially named, was established in 1970 as a 501(c)(3) nonprofit organization (NPO). The Foundation is the original organizer of the New Orleans Jazz & Heritage Festival presented by Shell Oil Company, a corporate financial sponsor. The Foundation was established primarily to redistribute the funds generated by Jazz Fest into the local community. As an NPO, their mission further states that the Foundation "promotes, preserves, perpetuates and encourages the music, culture and heritage of communities in Louisiana through festivals, programs and other cultural, educational, civic and economic activities". The founders of the organization included pianist and promoter George Wein, producer Quint Davis and the late Allison Miner.

Yet Another Perl Conference (YAPC), called The Perl Conference (TPC) from 2016 to 2019 and The Perl and Raku Conference from 2020 on, is a series of conferences discussing the Perl programming language, usually organized under the auspices of The Perl Foundation and Yet Another Society, a "non-profit corporation for the advancement of collaborative efforts in computer and information sciences". The name is an homage to yacc, "Yet Another Compiler Compiler".

Human image synthesis

Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer-generated imagery have featured synthetic images of human-like characters digitally composited onto real or other simulated film material. Towards the end of the 2010s, deep learning artificial intelligence has been applied to synthesize images and video that look like humans, without need for human assistance once the training phase has been completed, whereas the older seven-dimensional (7D) capture route required massive amounts of human work.

Residual-excited linear prediction (RELP) is an obsolete speech coding algorithm. It was originally proposed in the 1970s and can be seen as an ancestor of code-excited linear prediction (CELP). Unlike CELP, however, RELP directly transmits the residual signal. To achieve lower rates, that residual signal is usually down-sampled. The algorithm is hardly used anymore in audio transmission.
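
The residual is what remains after the linear-prediction (analysis) filter is applied to the speech signal. A minimal sketch of that general computation, assuming the third-party librosa and scipy packages and a placeholder file speech.wav; it illustrates the LPC residual itself, not any specific RELP codec:

    # Compute an LPC residual, the signal RELP-style coders transmit.
    # Assumes librosa and scipy (pip install librosa scipy);
    # "speech.wav" is a placeholder.
    import librosa
    import scipy.signal

    y, sr = librosa.load("speech.wav", sr=8000)   # narrowband speech

    a = librosa.lpc(y, order=10)                  # LPC coefficients, a[0] == 1
    residual = scipy.signal.lfilter(a, [1.0], y)  # inverse (analysis) filter

    # RELP then down-samples the residual to lower the bit rate
    residual_ds = residual[::2]                   # crude 2x down-sampling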

The core idea of Artificial Intelligence systems integration is making individual software components, such as speech synthesizers, interoperable with other components, such as common sense knowledgebases, in order to create larger, broader and more capable A.I. systems. The main methods that have been proposed for integration are message routing, or communication protocols that the software components use to communicate with each other, often through a middleware blackboard system.
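
As a toy illustration of the blackboard pattern described above (a minimal sketch; every name in it is invented for the example, not drawn from any real system):

    # Toy blackboard: components share state instead of calling each
    # other directly. All names here are invented for illustration.
    class Blackboard:
        def __init__(self):
            self.entries = {}

        def post(self, key, value):
            self.entries[key] = value

        def read(self, key):
            return self.entries.get(key)

    board = Blackboard()

    # A speech recognizer component posts its hypothesis...
    board.post("utterance", "turn on the lights")

    # ...and a separate planning component consumes it, with no
    # direct coupling between the two components.
    if "lights" in (board.read("utterance") or ""):
        board.post("action", "lights_on")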

As of the early 2000s, several speech recognition (SR) software packages exist for Linux. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.

The CMU Pronouncing Dictionary is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.
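
The dictionary's format is one plain-text line per entry: a word, then its ARPAbet phonemes with stress digits on the vowels (e.g. "HELLO  HH AH0 L OW1"). A minimal parsing sketch, assuming a local copy of the cmudict-0.7b file (the path is a placeholder):

    # Parse CMUdict: one "WORD  PH1 PH2 ..." entry per line.
    # "cmudict-0.7b" is a placeholder path to a local copy.
    pronunciations = {}
    with open("cmudict-0.7b", encoding="latin-1") as f:
        for line in f:
            if line.startswith(";;;"):          # header/comment lines
                continue
            word, *phones = line.split()
            # Alternate pronunciations appear as WORD(2), WORD(3), ...
            pronunciations[word.lower()] = phones

    print(pronunciations.get("hello"))          # ['HH', 'AH0', 'L', 'OW1']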

eSpeakNG is a compact, open-source, software speech synthesizer for Linux, Windows, and other platforms. It uses a formant synthesis method, providing many languages in a small size. Much of the programming for eSpeakNG's language support is done using rule files with feedback from native speakers.
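
eSpeakNG is usually driven from the command line. A minimal sketch invoking it from Python, assuming the espeak-ng binary is installed and on the PATH:

    # Call the eSpeakNG command-line synthesizer; assumes the
    # "espeak-ng" binary is installed and on the PATH.
    import subprocess

    subprocess.run(
        ["espeak-ng",
         "-v", "en",              # voice/language
         "-s", "150",             # speed in words per minute
         "-w", "hello.wav",       # write a WAV file instead of playing audio
         "Hello from eSpeak NG."],
        check=True,
    )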

LumenVox is a privately held speech recognition software company based in San Diego, California. LumenVox has been described as one of the market leaders in the speech recognition software industry.

The Microsoft text-to-speech voices are speech synthesizers provided for use with applications that use the Microsoft Speech API (SAPI) or the Microsoft Speech Server Platform. There are client, server, and mobile versions of Microsoft text-to-speech voices. Client voices ship with Windows operating systems; server voices are available for download for use with server applications such as Speech Server and Lync on both Windows client and server platforms; and mobile voices often ship with more recent versions.
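
Applications typically reach these voices through SAPI's COM interface. A minimal sketch from Python, assuming a Windows machine with the third-party pywin32 package installed:

    # Speak through the default installed SAPI voice via COM.
    # Assumes Windows plus the third-party pywin32 package
    # (pip install pywin32).
    import win32com.client

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak("Hello from a Microsoft SAPI voice.")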

The "White Camel" award is given to important contributors to the Perl Programming Language community.

Cepstral is a provider of speech synthesis technology and services. It was founded in June 2000 by scientists from Carnegie Mellon University, including the computer scientists Kevin Lenzo and Alan W. Black. It is a privately held corporation headquartered in Pittsburgh, Pennsylvania. The company provides speech synthesis applications on a broad range of platforms, from small hand-held devices to large centralized servers.

Loquendo is a multinational computer software technology corporation, headquartered in Torino, Italy, that provides speech recognition, speech synthesis, speaker verification, and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab, also had offices in the United Kingdom, Spain, Germany, France, and the United States.

Rob A. Rutenbar

Rob A. Rutenbar is an American academic noted for contributions to software tools that automate analog integrated circuit design, and custom hardware platforms for high-performance automatic speech recognition. He is Senior Vice Chancellor for Research at the University of Pittsburgh, where he leads the university's strategic and operational vision for research and innovation.

Synthetic media is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence algorithms, such as for the purpose of misleading people or changing an original meaning. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, speech synthesis, and more. Though experts use the term "synthetic media," individual methods such as deepfakes and text synthesis are sometimes not referred to as such by the media but instead by their respective terminology. Significant attention arose towards the field of synthetic media starting in 2017 when Motherboard reported on the emergence of pornographic videos altered with the use of AI algorithms to insert the faces of famous actresses. Fears of synthetic media include the potential to supercharge fake news, the spread of misinformation, distrust of reality, mass automation of creative and journalistic jobs, and potentially a complete retreat into AI-generated fantasy worlds. Synthetic media is an applied form of artificial imagination.

References

  1. IT Conversations: "Kevin Lenzo - Text to Speech: Make It Talk", recorded 2006-01-25.
  2. The Perl Foundation: "Perl Foundation Board of Directors election, new people, new positions", accessed 2010-07-10.
  3. "The Timeline of Perl and its Culture".