Writeprint

Writeprint is a method in forensic linguistics for identifying the authors of online text, likened to a digital fingerprint. Identity is established by comparing the distinguishing stylometric characteristics of an unknown text with known writing samples from the suspected author (writer invariants). Even without a suspect, a writeprint can suggest background characteristics of the author, such as nationality and education. [1]
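The comparison of an unknown text against known samples can be illustrated with a minimal Python sketch. It is not the method of the cited papers: the ten-word `FUNCTION_WORDS` list, the use of cosine similarity, and the helper names `style_vector` and `closest_author` are all assumptions chosen for illustration, and a real writeprint would draw on far more features.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: a few English function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_author(unknown_text, known_samples):
    """Return the candidate whose known sample is most similar in style.

    known_samples maps a candidate's name to a sample of their writing.
    """
    target = style_vector(unknown_text)
    return max(
        known_samples,
        key=lambda name: cosine_similarity(target, style_vector(known_samples[name])),
    )
```

Calling `closest_author(unknown, {"A": sample_a, "B": sample_b})` returns the name of the candidate whose sample has the most similar function-word profile. The result is only a ranking among the supplied candidates, not proof of authorship.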

There are five broad aspects to author identification in writeprint:

While the five aspects above are the traditional basis of author identification, online text has features of its own. Choice of font, use of emojis, and links to other websites all provide paths to identification that are absent from traditional text analysis. [4]
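As a rough illustration of how such online-specific features might be measured, the Python sketch below counts links and emojis per word in a message. The patterns are simplifying assumptions: `EMOJI_PATTERN` covers only a few common Unicode ranges, `URL_PATTERN` matches only `http`/`https` links, and the function name `online_features` is hypothetical. Font choice is left out because it cannot be recovered from plain text.

```python
import re

# Assumed patterns for illustration; neither is exhaustive.
URL_PATTERN = re.compile(r"https?://\S+")
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def online_features(message):
    """Return per-word rates of links and emojis in a message."""
    word_count = len(message.split()) or 1  # avoid division by zero
    return {
        "links_per_word": len(URL_PATTERN.findall(message)) / word_count,
        "emojis_per_word": len(EMOJI_PATTERN.findall(message)) / word_count,
    }
```

Rates like these could be added to a stylometric feature vector alongside the traditional features.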


References

  1. Li, Jiexun; Zheng, Rong; Chen, Hsinchun (April 2006). "From Fingerprint to Writeprint". Communications of the ACM. 49 (4): 76–82. doi:10.1145/1121949.1121951. S2CID 14341797.
  2. Iqbal, F.; Binsalleeh, H.; Fung, B.; Debbabi, M. (October 2010). "Mining writeprints from anonymous e-mails for forensic investigation" (PDF). Digital Investigation. 7 (1–2): 56–64. doi:10.1016/j.diin.2010.03.003.
  3. Abbasi, Ahmed; Chen, Hsinchun; Nunamaker Jr., Jay F. (Summer 2008). "Stylometric Identification in Electronic Markets: Scalability and Robustness". Journal of Management Information Systems. 25 (1): 49–78. doi:10.2753/MIS0742-1222250103. JSTOR 40398926. S2CID 3941985.
  4. Rehmeyer, Julie (January 13, 2007). "Digital Fingerprints". Science News. 171 (2): 26–28. doi:10.1002/scin.2007.5591710210. JSTOR 3982506.