Dissociated press is a parody generator (a computer program that generates nonsensical text). The generated text is based on another text using the Markov chain technique. The name is a play on "Associated Press" and the psychological term dissociation (although word salad is more typical of conditions like aphasia and schizophrenia – which is, however, frequently confused with dissociative identity disorder by laypeople).
An implementation of the algorithm is available in Emacs. Another implementation is available as a Perl module in CPAN, Games::Dissociate. [1]
The algorithm starts by printing a number of consecutive words (or letters) from the source text. Then it searches the source text for an occurrence of the few last words or letters printed out so far. If multiple occurrences are found, it picks a random one, and proceeds with printing the text following the chosen occurrence. After a predetermined length of text is printed out, the search procedure is repeated for the newly printed ending.
Considering that words and phrases tend to appear in specific grammatical contexts, the resulting text usually seems correct grammatically, and if the source text is uniform in style, the result appears to be of similar style and subject, and takes some effort on the reader's side to recognize as not genuine. Still, the randomness of the assembly process deprives it of any logical flow - the loosely related parts are connected in a nonsensical way, creating a humorously abstract, random result.
Here is a short example of word-based Dissociated Press applied to the Jargon File: [2]
Here is a short example of letter-based Dissociated Press applied to the same source:
The dissociated press algorithm is described in HAKMEM (1972) Item #176. The name "dissociated press" is first known to have been associated with the Emacs implementation.
Brian Hayes discussed a Travesty algorithm in Scientific American in November 1983. [3] The article provided a garbled William Faulkner passage:
When he got on the table, he come in. He never come out of my own pocket as a measure of protecting the company against riot and bloodshed. And when he said. "You tell me a bus ticket, let alone write out no case histories. Then the law come back with a knife!"
Hugh Kenner and Joseph O'Rourke of Johns Hopkins University discussed their frequency table-based Travesty generator for microcomputers in BYTE in November 1984. The article included the Turbo Pascal source for two versions of the generator, one using Hayes' algorithm and another using Claude Shannon's Hellbat algorithm. [3] Murray Lesser offered a compiled BASIC version in the magazine in July 1985, [4] in September 1985 Peter Wayner offered a version that used tree data structures instead of frequency tables, [5] and in December 1985 Neil J. Rubenking offered a version written in Turbo Pascal that stored frequency information in a B-tree. [6]
The editor war is the rivalry between users of the Emacs and vi text editors. The rivalry has become an enduring part of hacker culture and the free software community.
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.
A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process. Pseudorandom number generators are often used in computer programming, as traditional sources of randomness available to humans rely on physical processes not readily available to computer programs, although developments in hardware random number generator technology have challenged this.
In cryptography, RC4 is a stream cipher. While it is remarkable for its simplicity and speed in software, multiple vulnerabilities have been discovered in RC4, rendering it insecure. It is especially vulnerable when the beginning of the output keystream is not discarded, or when nonrandom or related keys are used. Particularly problematic uses of RC4 have led to very insecure protocols such as WEP.
In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.
Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroups were generated by Markov chain techniques, based on text from other postings. The username is a play on the words "Markov chain". Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person.
A cryptographic hash function (CHF) is a hash algorithm that has special properties desirable for a cryptographic application:
Algorithmic composition is the technique of using algorithms to create music.
Ctags is a programming tool that generates an index file of names found in source and header files of various programming languages to aid code comprehension. Depending on the language, functions, variables, class members, macros and so on may be indexed. These tags allow definitions to be quickly and easily located by a text editor, a code search engine, or other utility. Alternatively, there is also an output mode that generates a cross reference file, listing information about various names found in a set of language files in human-readable form.
SCIgen is a paper generator that uses context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are formed, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator subsequently became used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen generated papers and the creation of detection software to combat its use.
A travesty is an absurd or grotesque misrepresentation, a parody, or grossly inferior imitation. In literary or theatrical contexts it may refer to:
Random number generation is a process by which, often by means of a random number generator (RNG), a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. This means that the particular outcome sequence will contain some patterns detectable in hindsight but impossible to foresee. True random number generators can be hardware random-number generators (HRNGs), wherein each generation is a function of the current value of a physical environment's attribute that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-called "random number generations" done by pseudorandom number generators (PRNGs), which generate numbers that only look random but are in fact predetermined—these generations can be reproduced simply by knowing the state of the PRNG.
CryptGenRandom is a deprecated cryptographically secure pseudorandom number generator function that is included in Microsoft CryptoAPI. In Win32 programs, Microsoft recommends its use anywhere random number generation is needed. A 2007 paper from Hebrew University suggested security problems in the Windows 2000 implementation of CryptGenRandom. Microsoft later acknowledged that the same problems exist in Windows XP, but not in Vista. Microsoft released a fix for the bug with Windows XP Service Pack 3 in mid-2008.
In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. Modern symmetric-key ciphers are specifically designed to be immune to such an attack. In other words, modern encryption schemes are pseudorandom permutations and are designed to have ciphertext indistinguishability. If an algorithm is found that can distinguish the output from random faster than a brute force search, then that is considered a break of the cipher.
Computational statistics, or statistical computing, is the study which is the intersection of statistics and computer science, and refers to the statistical methods that are enabled by using computational methods. It is the area of computational science specific to the mathematical science of statistics. This area is fast developing. The view that the broader concept of computing must be taught as part of general statistical education is gaining momentum.
Pop music automation is a field of study among musicians and computer scientists with a goal of producing successful pop music algorithmically. It is often based on the premise that pop music is especially formulaic, unchanging, and easy to compose. The idea of automating pop music composition is related to many ideas in algorithmic music, Artificial Intelligence (AI) and computational creativity.
Parody generators are computer programs which generate text that is syntactically correct, but usually meaningless, often in the style of a technical paper or a particular writer. They are also called travesty generators and random text generators.
A paper generator is computer software that composes scholarly papers in the style of those that appear in academic journals or conference proceedings. Typically, the generator uses technical jargon from the field to compose sentences that are grammatically correct and seem erudite but are actually nonsensical. The prose is supported by tables, figures, and references that may be valid in themselves, but are randomly inserted rather than relevant.
The following outline is provided as an overview of, and topical guide to, machine learning:
This article is based in part on the Jargon File, which is in the public domain.