Dissociated press

Last updated

Dissociated press is a parody generator (a computer program that generates nonsensical text). The generated text is based on another text using the Markov chain technique. The name is a play on "Associated Press" and the psychological term dissociation (although word salad is more typical of conditions like aphasia and schizophrenia – which is, however, frequently confused with dissociative identity disorder by laypeople).

Contents

An implementation of the algorithm is available in Emacs. Another implementation is available as a Perl module in CPAN, Games::Dissociate. [1]

The algorithm

The algorithm starts by printing a number of consecutive words (or letters) from the source text. Then it searches the source text for an occurrence of the few last words or letters printed out so far. If multiple occurrences are found, it picks a random one, and proceeds with printing the text following the chosen occurrence. After a predetermined length of text is printed out, the search procedure is repeated for the newly printed ending.

Considering that words and phrases tend to appear in specific grammatical contexts, the resulting text usually seems correct grammatically, and if the source text is uniform in style, the result appears to be of similar style and subject, and takes some effort on the reader's side to recognize as not genuine. Still, the randomness of the assembly process deprives it of any logical flow - the loosely related parts are connected in a nonsensical way, creating a humorously abstract, random result.

Examples

Here is a short example of word-based Dissociated Press applied to the Jargon File: [2]

wart: n. A small, crocky feature that sticks out of an array (C has no checks for this). This is relatively benign and easy to spot if the phrase is bent so as to be not worth paying attention to the medium in question.

Here is a short example of letter-based Dissociated Press applied to the same source:

window sysIWYG: n. A bit was named aften /bee´t@/ prefer to use the other guy's re, especially in every cast a chuckle on neithout getting into useful informash speech makes removing a featuring a move or usage actual abstractionsidered interj. Indeed spectace logic or problem!

History

The dissociated press algorithm is described in HAKMEM (1972) Item #176. The name "dissociated press" is first known to have been associated with the Emacs implementation.

Brian Hayes discussed a Travesty algorithm in Scientific American in November 1983. [3] The article provided a garbled William Faulkner passage:

When he got on the table, he come in. He never come out of my own pocket as a measure of protecting the company against riot and bloodshed. And when he said. "You tell me a bus ticket, let alone write out no case histories. Then the law come back with a knife!"

Hugh Kenner and Joseph O'Rourke of Johns Hopkins University discussed their frequency table-based Travesty generator for microcomputers in BYTE in November 1984. The article included the Turbo Pascal source for two versions of the generator, one using Hayes' algorithm and another using Claude Shannon's Hellbat algorithm. [3] Murray Lesser offered a compiled BASIC version in the magazine in July 1985, [4] in September 1985 Peter Wayner offered a version that used tree data structures instead of frequency tables, [5] and in December 1985 Neil J. Rubenking offered a version written in Turbo Pascal that stored frequency information in a B-tree. [6]

See also

Related Research Articles

The editor war is the rivalry between users of the Emacs and vi text editors. The rivalry has become an enduring part of hacker culture and the free software community.

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.

In cryptography, RC4 is a stream cipher. While it is remarkable for its simplicity and speed in software, multiple vulnerabilities have been discovered in RC4, rendering it insecure. It is especially vulnerable when the beginning of the output keystream is not discarded, or when nonrandom or related keys are used. Particularly problematic uses of RC4 have led to very insecure protocols such as WEP.

<span class="mw-page-title-main">Hardware random number generator</span> Cryptographic device

In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm. Such devices are often based on microscopic phenomena that generate low-level, statistically random "noise" signals, such as thermal noise, the photoelectric effect, involving a beam splitter, and other quantum phenomena. These stochastic processes are, in theory, completely unpredictable for as long as an equation governing such phenomena is unknown or uncomputable. This is in contrast to the paradigm of pseudo-random number generation commonly implemented in computer programs.

Pretty-printing is the application of any of various stylistic formatting conventions to text files, such as source code, markup, and similar kinds of content. These formatting conventions may entail adhering to an indentation style, using different color and typeface to highlight syntactic elements of source code, or adjusting size, to make the content easier for people to read, and understand. Pretty-printers for source code are sometimes called code formatters or beautifiers.

In classical cryptography, the running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, is used to provide a very long keystream. Usually, the book to be used would be agreed ahead of time, while the passage to be used would be chosen randomly for each message and secretly indicated somewhere in the message.

Algorithmic composition is the technique of using algorithms to create music.

A word salad, or schizophasia, is a "confused or unintelligible mixture of seemingly random words and phrases", most often used to describe a symptom of a neurological or mental disorder. The term schizophasia is used in particular to describe the confused language that may be evident in schizophrenia. The words may or may not be grammatically correct, but are semantically confused to the point that the listener cannot extract any meaning from them. The term is often used in psychiatry as well as in theoretical linguistics to describe a type of grammatical acceptability judgement by native speakers, and in computer programming to describe textual randomization.

Ctags is a programming tool that generates an index file of names found in source and header files of various programming languages to aid code comprehension. Depending on the language, functions, variables, class members, macros and so on may be indexed. These tags allow definitions to be quickly and easily located by a text editor, a code search engine, or other utility. Alternatively, there is also an output mode that generates a cross reference file, listing information about various names found in a set of language files in human-readable form.

SCIgen is a paper generator that uses context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are formed, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator subsequently became used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen generated papers and the creation of detection software to combat its use.

A travesty is an absurd or grotesque misrepresentation, a parody, or grossly inferior imitation. In literary or theatrical contexts it may refer to:

<span class="mw-page-title-main">Random number generation</span> Producing a sequence that cannot be predicted better than by random chance

Random number generation is a process by which, often by means of a random number generator (RNG), a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. This means that the particular outcome sequence will contain some patterns detectable in hindsight but unpredictable to foresight. True random number generators can be hardware random-number generators (HRNGS) that generate random numbers, wherein each generation is a function of the current value of a physical environment's attribute that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-called "random number generations" done by pseudorandom number generators (PRNGs) that generate numbers that only look random but are in fact pre-determined—these generations can be reproduced simply by knowing the state of the PRNG.

In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. Modern symmetric-key ciphers are specifically designed to be immune to such an attack. In other words, modern encryption schemes are pseudorandom permutations and are designed to have ciphertext indistinguishability. If an algorithm is found that can distinguish the output from random faster than a brute force search, then that is considered a break of the cipher.

<span class="mw-page-title-main">GNU Emacs</span> GNU version of the Emacs text editor

GNU Emacs is a free software text editor. It was created by GNU Project founder Richard Stallman, based on the Emacs editor developed for Unix operating systems. GNU Emacs has been a central component of the GNU project and a flagship project of the free software movement. Its name has occasionally been shortened to GNUMACS. The tag line for GNU Emacs is "the extensible self-documenting text editor".

<span class="mw-page-title-main">Emacs</span> Family of text editors

Emacs, originally named EMACS, is a family of text editors that are characterized by their extensibility. The manual for the most widely used variant, GNU Emacs, describes it as "the extensible, customizable, self-documenting, real-time display editor". Development of the first Emacs began in the mid-1970s, and work on its direct descendant, GNU Emacs, continues actively; the latest version is 28.2, released in September 2022.

Pop music automation is a field of study among musicians and computer scientists with a goal of producing successful pop music algorithmically. It is often based on the premise that pop music is especially formulaic, unchanging, and easy to compose. The idea of automating pop music composition is related to many ideas in algorithmic music, Artificial Intelligence (AI) and computational creativity.

Parody generators are computer programs which generate text that is syntactically correct, but usually meaningless, often in the style of a technical paper or a particular writer. They are also called travesty generators and random text generators.

<span class="mw-page-title-main">Paper generator</span> Software to create fake academic articles

A paper generator is computer software that composes scholarly papers in the style of those that appear in academic journals or conference proceedings. Typically, the generator uses technical jargon from the field to compose sentences that are grammatically correct and seem erudite but are actually nonsensical. The prose is supported by tables, figures, and references that may be valid in themselves, but are randomly inserted rather than relevant.

<span class="mw-page-title-main">Outline of machine learning</span> Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

References

  1. Burke, Sean M. and Avi Finkel. Games::Dissociate distribution in CPAN. Retrieved 2012-11-13. Most recent release: 2010, "v1.0".
  2. Raymond, Eric S. (2003-12-29). "Dissociated Press". Jargon File 4.4.7. Retrieved 2007-04-10.
  3. 1 2 Kenner, Hugh; O'Rourke, Joseph (November 1984). "A Travesty Generator for Micros". BYTE. p. 129. Retrieved 23 October 2013.
  4. Lesser, Murray (July 1985). "Travesty Revisited". BYTE. p. 163. Retrieved 27 October 2013.
  5. Wayner, Peter (September 1985). "Build a Travesty Tree". BYTE. p. 183. Retrieved 27 October 2013.
  6. Rubenking, Neil J. (December 1985). "Travesty with Database". BYTE. p. 161. Retrieved 28 October 2013.

This article is based in part on the Jargon File, which is in the public domain.