Rogeting

Rogeting is an informal neologism for modifying a published source by substituting synonyms for enough words to fool plagiarism detection software, often producing meaningless phrases through extensive synonym swapping. The term, a reference to Roget's Thesaurus, has been attributed to Chris Sadler, principal lecturer in business information systems at Middlesex University, who found the practice in papers submitted by his students.[1][2][3] Little research into Rogeting has been conducted, so there is no scholarly evidence of the practice more broadly.

Rogeting consists simply of replacing words with synonyms chosen from a thesaurus. Because plagiarism detection software operates by comparing the sample text against publicly available source materials, changing words can fool the software. Several websites perform this word substitution online for free. A plagiarism checker would not usually be able to identify the original source. The main drawback is that the automatically generated text might not sound natural, or might not make sense at all, and so requires the intervention of a human proofreader, who has to be careful not to reuse words that were present in the original source. An example of a common phrase that becomes meaningless when synonyms are chosen at random is "left behind" (e.g., "The purpose of the daycare checklist was to ensure no children were left behind"), which turns into the bizarre "sinister buttocks".
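The effect of synonym swapping on text matching can be illustrated with a minimal sketch. It assumes a checker that compares overlapping three-word sequences ("shingles") and scores the overlap with Jaccard similarity. That is one common text-matching technique, not a description of how any particular commercial product works. The function names and the toy thesaurus are hypothetical and exist only for this illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0  # texts too short to form any shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


def substitute(text, synonyms):
    """Swap every word that has an entry in the toy thesaurus."""
    def swap(match):
        word = match.group(0)
        return synonyms.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", swap, text)


# Hypothetical thesaurus entries, picked without regard to context.
TOY_THESAURUS = {
    "purpose": "aim",
    "ensure": "guarantee",
    "children": "kids",
    "left": "sinister",
    "behind": "buttocks",
}

original = ("The purpose of the daycare checklist was to ensure "
            "no children were left behind")
rogeted = substitute(original, TOY_THESAURUS)

print(rogeted)
print(jaccard(original, original))  # identical texts
print(jaccard(original, rogeted))   # after synonym swapping
```

In this toy case, changing five of the fourteen words leaves only 4 shared trigrams out of 20 distinct ones, so the similarity falls from 1.0 to 0.2, while the rewritten sentence ends in "sinister buttocks" and no longer makes sense.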

Related Research Articles

A thesaurus, sometimes called a synonym dictionary or dictionary of synonyms, is a reference work which arranges words by their meanings, sometimes as a hierarchy of broader and narrower terms, sometimes simply as lists of synonyms and antonyms. They are often used by writers to help find the best word to express an idea:

...to find the word, or words, by which [an] idea may be most fitly and aptly expressed

A dysphemism is an expression with connotations that are derogatory either about the subject matter or to the audience. Dysphemisms contrast with neutral or euphemistic expressions. Dysphemism may be motivated by fear, distaste, hatred, contempt, or humour.

Synonym: Words or phrases having the same meaning

A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words begin, start, commence, and initiate are all synonyms of one another: they are synonymous. The standard test for synonymy is substitution: one form can be replaced by another in a sentence without changing its meaning. Words are considered synonymous in only one particular sense: for example, long and extended in the context long time or extended time are synonymous, but long cannot be used in the phrase extended family. Synonyms with exactly the same meaning share a seme or denotational sememe, whereas those with inexactly similar meanings share a broader denotational or connotational sememe and thus overlap within a semantic field. The former are sometimes called cognitive synonyms and the latter, near-synonyms, plesionyms or poecilonyms.

Glossary of library and information science

This page is a glossary of library and information science.

Paraphrase: Restatement of the meaning of a text or passage using other words

A paraphrase is a restatement of the meaning of a text or passage using other words. The term itself is derived via Latin paraphrasis, from Ancient Greek παράφρασις (paráphrasis) 'additional manner of expression'. The act of paraphrasing is also called paraphrasis.

Turnitin is an Internet-based plagiarism detection service run by the American company Turnitin, LLC, a subsidiary of Advance Publications.

Stylometry is the application of the study of linguistic style, usually to written language. It has also been applied successfully to music and fine-art paintings. Another conceptualization defines it as the linguistic discipline that evaluates an author's style through the application of statistical analysis to a body of their work.

A wordfilter is a script typically used on Internet forums or chat rooms that automatically scans users' posts or comments as they are submitted and automatically changes or censors particular words or phrases.

A statistically improbable phrase (SIP) is a phrase or set of words that occurs more frequently in a document than in some larger corpus. Amazon.com uses this concept in determining keywords for a given book or chapter, since keywords of a book or chapter are likely to appear disproportionately within that section. Christian Rudder has also used this concept with data from online dating profiles and Twitter posts to determine the phrases most characteristic of a given race or gender in his book Dataclysm. SIPs with a linguistic density of two or three words (adjective, adjective, noun or adverb, adverb, verb) will signal the author's attitude, premise or conclusions to the reader or express an important idea.
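The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: phrases are taken to be two-word sequences, the score is the ratio of a phrase's relative frequency in the document to its relative frequency in the reference corpus, and a crude add-one adjustment avoids division by zero. The function names are hypothetical, and nothing here reflects the method actually used by Amazon.com or by Rudder.

```python
import re
from collections import Counter


def phrases(text, n=2):
    """Return the list of lowercase n-word phrases in text, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def sip_scores(document, corpus, n=2):
    """Score each phrase in document by how over-represented it is
    relative to corpus (higher means more improbable)."""
    doc_counts = Counter(phrases(document, n))
    corpus_counts = Counter(phrases(corpus, n))
    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for phrase, count in doc_counts.items():
        doc_rate = count / doc_total
        # Assumption: add-one adjustment so phrases absent from the
        # corpus get a small nonzero rate instead of dividing by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = doc_rate / corpus_rate
    return scores


def top_phrases(document, corpus, n=2, top=3):
    """Return the highest-scoring phrases, best first."""
    scores = sip_scores(document, corpus, n)
    return sorted(scores, key=scores.get, reverse=True)[:top]


document = ("sinister buttocks appear here and sinister buttocks "
            "puzzle the marker of the essay")
corpus = ("the marker of the essay read the essay and the marker "
          "graded the essay")

scores = sip_scores(document, corpus)
print(top_phrases(document, corpus, top=1))
print(scores["sinister buttocks"] > scores["the essay"])
```

With these toy texts, "sinister buttocks" scores highest because it appears twice in the document and never in the corpus, whereas "the essay" is common in the corpus and scores low. A realistic reference corpus would need to be far larger than the document for the scores to mean much.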

A foreign language writing aid is a computer program or any other instrument that assists a non-native language user in writing decently in their target language. Assistive operations can be classified into two categories: on-the-fly prompts and post-writing checks. Assisted aspects of writing include: lexical, syntactic, lexical semantic and idiomatic expression transfer, etc. Different types of foreign language writing aids include automated proofreading applications, text corpora, dictionaries, translation aids and orthography aids.

Text simplification is an operation used in natural language processing to change, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because of communication needs in an increasingly complex and interconnected world more dominated by science, technology, and new media. But natural human languages pose huge problems because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well-programmed, cannot easily process. However, researchers have discovered that, to reduce linguistic diversity, they can use methods of semantic compression to limit and simplify a set of words used in given texts.

Plagiarism detection or content similarity detection is the process of locating instances of plagiarism or copyright infringement within a work or document. The widespread use of computers and the advent of the Internet have made it easier to plagiarize the work of others.

Article spinning is a writing technique used in search engine optimization (SEO), and other applications, which creates what deceitfully appears to be new content from what already exists. Content spinning, also known as Rogeting, works by replacing specific words, phrases, sentences, or even entire paragraphs with any number of alternate versions, in order to provide a slightly different variation with each spin. This process can be completely automated or written manually as many times as needed. Early content produced through automated methods often resulted in articles which were hard or even impossible to read. However, as article-spinning techniques were refined they became more sophisticated, and can now result in readable articles which, upon cursory review, can appear original.

Plagiarism: Using another author's work as if it were one's own original work

Plagiarism is the fraudulent representation of another person's language, thoughts, ideas, or expressions as one's own original work. Although precise definitions vary, depending on the institution, such representations are generally considered to violate academic integrity and journalistic ethics as well as social norms of learning, teaching, research, fairness, respect, and responsibility in many cultures. It is subject to sanctions such as penalties, suspension, expulsion from school or work, substantial fines, and even imprisonment.

Assemblage refers to a text "built primarily and explicitly from existing texts to solve a writing or communication problem in a new context". The concept was first proposed by Johndan Johnson-Eilola and Stuart Selber in the journal Computers & Composition in 2007. The notion of assemblages builds on remix and remix practices, which blur distinctions between invented and borrowed work. This idea predates modernism, with the quote by Edgar Allan Poe, "There is no greater mistake than the supposition that a true originality is a mere matter of impulse or inspiration. To originate, is carefully, patiently, and understandingly to combine."

In the context of information retrieval, a thesaurus is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.

Sketch Engine: Corpus manager and text analysis software

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in 90+ languages.

Unicheck is a cloud-based plagiarism detection software that finds similarities, citations and references in texts.

Peter Mark Roget (1779–1869) was a British physician and lexicographer known for his thesaurus.

Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as semantic parsing and generation of new samples to expand existing corpora.

References

  1. Grove, Jack (7 August 2014). "Sinister buttocks? Roget would blush at the crafty cheek: Middlesex lecturer gets to the bottom of meaningless phrases found while marking essays". Times Higher Education. Retrieved 15 July 2015.
  2. Schuman, Rebecca (14 August 2014). "Cease Rogeting Proximately!". Slate. Retrieved 15 July 2015.
  3. "Rogeting: why 'sinister buttocks' are creeping into students' essays". The Guardian. 8 August 2014. Retrieved 15 July 2015.