Adversarial stylometry

Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which creates difficulties for whistleblowers, activists, hoaxers and fraudsters, among others. The privacy risk is expected to grow as machine learning techniques and text corpora develop.

All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches exist, with some overlap: imitation, substituting another author's style for one's own; translation, applying machine translation in the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style so that it does not resemble the author's own.
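The idea of a paraphrase acting as an adversarial example can be made concrete with a deliberately crude classifier. The sketch below assumes a stylometric model that represents each text by the relative frequencies of a small, hypothetical list of function words and attributes an anonymous text to the candidate whose profile is most similar (cosine similarity). Real stylometric systems use far richer features; the word list, the names `profile` and `attribute`, and the example texts are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: function words are a classic content-independent
# style marker. Real systems use many more features.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it", "which", "upon"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word among the tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(anonymous: str, known: dict[str, str]) -> str:
    """Attribute anonymous to the candidate whose known writing is most similar."""
    query = profile(anonymous)
    return max(known, key=lambda author: cosine(query, profile(known[author])))


if __name__ == "__main__":
    known = {
        "alice": "Upon the hill, upon the road, the house of the miller.",
        "bob": "It is a thing that I like and that I want to do.",
    }
    original = "Upon the bridge the cart of the farmer rolled."
    paraphrase = "It is a cart that a farmer rolled to a bridge."
    print(attribute(original, known))    # alice
    print(attribute(paraphrase, known))  # bob
```

In this toy setting the rewording moves the text from alice's profile to bob's purely by changing function words; the rewording is also loose, which illustrates how masking can pull against the faithfulness of the paraphrase.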

Manually obscuring style is possible but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi-automatic or fully automatic, could assist an author. How best to perform the task, and how to design such tools, is an open research question. While some approaches have been shown to defeat particular stylometric analyses, especially those that do not account for the possibility of adversarial authors, establishing safety against unknown analyses remains difficult. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools.

It is uncertain whether the practice of adversarial stylometry is itself detectable. Some studies have found that particular methods leave signals in the output text, but a stylometrist who does not know which methods may have been used may be unable to detect them reliably.

History

Rao & Rohatgi (2000), an early work in adversarial stylometry, [1] identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges. [2] Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts. [1]

Brennan & Greenstadt (2009) introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods; [3] other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus. [4]

Motivations

Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but that pseudonymous authors who have produced corpora of thousands of words without practising adversarial stylometry may be vulnerable. [5] Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but correctly matched blog and author only a fifth of the time; [6] identification improved with the number of posts written by the author in the corpus. [7] Even if an author is not identified, some of their characteristics may still be deduced stylometrically, [8] or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification. [7] Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates. [9]

Modern machine learning techniques offer powerful tools for identification; [10] further development of corpora and computational stylometric techniques is likely to raise further privacy issues. [11] Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry (that authors have invariant, content-independent 'style fingerprints') is uncertain, but that "the deanonymisation attack is a real privacy concern". [12]

Those interested in practicing adversarial stylometry and stylistic deception include whistleblowers avoiding retribution; [13] journalists and activists; [10] perpetrators of frauds and hoaxes; [14] authors of fake reviews; [15] literary forgers; [16] criminals disguising their identity from investigators; [17] and, generally, anyone with a desire for anonymity or pseudonymity. [13] Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias). [18] [19] Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored. [20]

Methods

With imitation, the author attempts to mislead stylometry by matching their style to another author's. [21] An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry. [22] Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from. [23]

Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can alter texts significantly, even to the point of incomprehensibility; improved translation tools reduce this risk. More simply structured texts can be easier to machine translate without losing the original meaning. [21] Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output. [24] [25] With low-quality translation tools, an author may need to correct major translation errors manually while avoiding the hazard of re-introducing stylistic characteristics. [2] Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, though more common with several intermediate translations; however, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal. [26] Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed. [23]

Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be done at the time of writing by conscious modification, or as part of a revision process in which feedback from the targeted metric is used to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning, [27] and likewise genres that permit more variation allow more obfuscation. [28] However, longer texts are harder to obfuscate thoroughly. [29] Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style. [30] With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles. [31] Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or format a document in a characteristic way. [2] [32] Stylometric signals vary in how easily they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation. [33] Automated obfuscation may require large amounts of training data written by the author. [29]
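The normalising step can be sketched as a rule-based rewrite, assuming the author has already listed their own habitual misspellings, variants and pet phrases. The `IDIOSYNCRASIES` table below is a hypothetical example, not drawn from any study; a real list would be specific to one author, and each rewrite would still need review for soundness.

```python
import re

# Hypothetical author-specific habits mapped to neutral forms:
# a recurrent misspelling, a less common variant, and a pet phrase.
IDIOSYNCRASIES = {
    "definately": "definitely",
    "amongst": "among",
    "at the end of the day": "ultimately",
}


def normalise(text: str, rules: dict[str, str] = IDIOSYNCRASIES) -> str:
    """Replace each idiosyncratic form with its neutral equivalent.

    Longer patterns are applied first so that multi-word phrases are not
    broken up by shorter rules. Matching is case-insensitive and limited to
    whole words; the replacement is inserted as written, so capitalisation
    of the matched text is not preserved.
    """
    for pattern in sorted(rules, key=len, reverse=True):
        regex = re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)
        text = regex.sub(rules[pattern], text)
    return text


if __name__ == "__main__":
    print(normalise("I definately think, at the end of the day, the choice amongst them matters."))
    # I definitely think, ultimately, the choice among them matters.
```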

Two possible designs for automated adversarial stylometry are rule-based paraphrasing systems and encoder–decoder architectures, in which the text passes through an intermediate representation that is (intended to be) style-neutral. [34] Automated methods also differ in whether or not they receive feedback from an identification system. [35] With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem: textual variants are explored until the result is sufficiently far stylistically (in the case of obfuscation) or near (in the case of imitation), at which point it constitutes an adversarial example for that identification system. [36] [37]
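The heuristic-search framing can be sketched as greedy hill climbing, one simple search strategy among many. The sketch assumes the author has query access to a similarity score; that score is a stand-in (cosine similarity of character-trigram counts), and `RULES` is a small invented table of paraphrase rules. Neither reproduces any published system.

```python
import math
from collections import Counter

# Hypothetical paraphrase rules (phrase -> alternative). A real system would
# need far more rules and a check that each rewrite preserves meaning.
RULES = [
    ("do not", "don't"),
    ("in order to", "to"),
    ("however", "but"),
    ("very large", "huge"),
]


def trigrams(text: str) -> Counter:
    """Character-trigram counts, a common stylometric feature."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of trigram counts: a stand-in for the feedback an
    identification system would give (higher means more alike)."""
    ca, cb = trigrams(a), trigrams(b)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def neighbours(text: str) -> list[str]:
    """Variants reachable by applying one rule to its first occurrence."""
    return [text.replace(old, new, 1) for old, new in RULES if old in text]


def search(text: str, reference: str, mode: str = "obfuscate",
           threshold: float = 0.5, max_steps: int = 10) -> str:
    """Greedy search away from (obfuscate) or toward (imitate) reference.

    Stops once similarity crosses threshold, at a local optimum, or after
    max_steps rewrites.
    """
    sign = -1.0 if mode == "obfuscate" else 1.0  # maximise sign * similarity
    current, score = text, sign * similarity(text, reference)
    for _ in range(max_steps):
        if score >= sign * threshold:
            break  # sufficiently far (obfuscate) or near (imitate)
        candidates = [(sign * similarity(c, reference), c) for c in neighbours(current)]
        if not candidates:
            break
        best_score, best = max(candidates)
        if best_score <= score:
            break  # no single rewrite improves the objective
        current, score = best, best_score
    return current
```

Here `reference` is the author's own known writing when obfuscating, or the target author's writing when imitating. Greedy search can stall at a local optimum, and every accepted rewrite would still need checking for soundness.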

Evaluation

How best to mask stylometric characteristics in practice, and which tasks to perform manually, which with tool assistance, and which fully automatically, is an open field of research, especially for short documents with limited potential for variation. [38] [11] Manual adversarial stylometry can be preferred or even required if the author does not trust available computers with the task (as may be the case for a whistleblower, for example). [23] Software tools require maintenance; Wang, Juola & Riddell (2022) report that there is no maintained obfuscation software suitable for general use. [39] Zhai et al. (2022) identify DS-PAN (Castro-Castro, Ortega Bueno & Muñoz 2017) and Mutant-X (Mahmood et al. 2019) as the 2022 state of the art in automated obfuscation. [40] Manual stylistic modulation requires significant effort and scales poorly; tool assistance can reduce the burden to varying degrees. [41] Deterministic automated methods can lose effectiveness against a classifier trained adversarially, that is, one whose training set includes output from the style transfer program. [42]
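The last point can be illustrated with a toy nearest-neighbour attributor. The sketch assumes a hypothetical deterministic obfuscator (a fixed word-substitution table) and invented training texts. Trained only on the authors' own texts, the classifier misattributes an obfuscated text; once the training set also contains the obfuscator's output for each known text, the same obfuscated query is attributed correctly.

```python
import math
from collections import Counter

# Hypothetical deterministic obfuscator: the same fixed substitutions every time.
SUBSTITUTIONS = {"upon": "on", "whilst": "while", "shall": "will"}


def obfuscate(text: str) -> str:
    return " ".join(SUBSTITUTIONS.get(w, w) for w in text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_author(query: str, training: list[tuple[str, str]]) -> str:
    """1-nearest-neighbour attribution over bag-of-words count vectors."""
    q = Counter(query.lower().split())
    _, author = max(training, key=lambda pair: cosine(q, Counter(pair[0].lower().split())))
    return author


def adversarially_trained(training: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Add the obfuscator's output for every training text, keeping its label."""
    return training + [(obfuscate(text), author) for text, author in training]


if __name__ == "__main__":
    training = [
        ("whilst i shall indeed wait upon you", "alice"),
        ("while i will wait on you", "bob"),
    ]
    query = "i shall indeed call upon you whilst here"  # written by alice
    masked = obfuscate(query)  # "i will indeed call on you while here"
    print(nearest_author(query, training))                          # alice
    print(nearest_author(masked, training))                         # bob
    print(nearest_author(masked, adversarially_trained(training)))  # alice
```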

Potthast, Hagen & Stein (2016) give three criteria for evaluating adversarial stylometry methods: safety, meaning that stylistic characteristics are reliably eliminated; soundness, meaning that the semantic content of the text is not unacceptably altered; and sensibility, meaning that the output is "well-formed and inconspicuous". Compromising any one of these too deeply is typically an unacceptable result, and the three trade off against each other in practice. [43] Potthast, Hagen & Stein (2016) find that automatically evaluating sensibility, and specifically whether output is acceptably grammatical and well-formed, is difficult; [44] automated evaluation of soundness is somewhat more promising, but manual review is the best method. [45]

Although safety is an important property of an adversarial stylometry method, it can still be usefully traded away if what stylometry could reveal is already obtainable through non-stylometric analysis. For example, an author discussing their own upbringing in Britain is unlikely to care if stylometry can reveal that their text is typical of British English. [46] [47]

Evaluating the safety of different approaches is complicated because identification resistance fundamentally depends on which identification methods are under consideration. [48] The property of being resilient to unknown analyses is called transferability. [49] Gröndahl & Asokan (2020b) identify four threat models for authors, varying with their knowledge of how their text will be analysed and what training data will be used: query access, with the weakest analyst and the strongest author, who knows both the methods of analysis and the training data; architecture access, where the author knows the analysis methods but not the training data; data access, where the author knows the training data but not the analysis methods; and surrogate access, with the weakest author and the strongest analyst, where the author knows neither the methods of analysis nor the training data. [34] Further, when choosing a method, an author must rely on their threat model and trust that it is valid: that unknown analyses able to detect remaining stylistic signals cannot or will not be performed, or that the masking successfully transfers; [50] a stylometrist who knows how the author attempted to mask their style, however, may be able to exploit some weakness in the method and render it unsafe. [51] Much of the research into automated methods has assumed that the author has query access, which may not generalise to other settings. [52] Masking methods that internally use an ensemble of different analyses as a model of their adversary may transfer better against unseen analyses. [35]
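As summarised here, the four threat models differ along two independent axes: whether the author knows the analysis methods, and whether they know the training data. The sketch below encodes that two-by-two classification; the enum and function names are illustrative and are not taken from Gröndahl & Asokan (2020b).

```python
from enum import Enum


class ThreatModel(Enum):
    """Threat models as summarised in the text. Query access is the strongest
    author position and surrogate access the weakest."""
    QUERY_ACCESS = "knows analysis methods and training data"
    ARCHITECTURE_ACCESS = "knows analysis methods only"
    DATA_ACCESS = "knows training data only"
    SURROGATE_ACCESS = "knows neither methods nor training data"


def threat_model(knows_methods: bool, knows_data: bool) -> ThreatModel:
    """Classify an author's position by what they know about the analysis."""
    if knows_methods and knows_data:
        return ThreatModel.QUERY_ACCESS
    if knows_methods:
        return ThreatModel.ARCHITECTURE_ACCESS
    if knows_data:
        return ThreatModel.DATA_ACCESS
    return ThreatModel.SURROGATE_ACCESS


if __name__ == "__main__":
    print(threat_model(knows_methods=True, knows_data=False).name)  # ARCHITECTURE_ACCESS
```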

A thorough loss of soundness defeats the purpose of communication, though some change in meaning may be tolerable if the core message is preserved; requiring only textual entailment, or allowing automatic summarisation, are other ways to lose some meaning in a possibly tolerable way. [53] Rewriting an existing text to defeat stylometry, as opposed to consciously removing stylistic characteristics during composition, poses challenges in retaining textual meaning. [54] Gröndahl & Asokan (2020a) assess the problem of unsoundness as "the most important challenge" for research into fully automatic approaches. [11]

For sensibility, if a text is so ungrammatical as to be incomprehensible, or so ill-formed that it cannot fit into its genre, then the method has failed, but compromises short of that point may be useful. [44] If inconspicuity is partially lost, more expensive and less scalable analyses may be performed (e.g., consulting a forensic linguist) to confirm suspicions or gather further evidence. [55] The impact of a total inconspicuity failure varies depending on the motivation for performing adversarial stylometry: for someone simply attempting to stay anonymous (e.g., a whistleblower), detection may not be an issue; for a literary forger, however, detection would be disastrous. [16] Adversarial stylometry can leave evidence of its practice, which is an inconspicuity failure. [56] [57] In the Brennan–Greenstadt corpus, the texts have been found to share a common "style" of their own. [58] However, Gröndahl & Asokan (2020a) assess existing evidence as insufficient to prove that adversarial stylometry is always detectable, with only limited methods having been studied. [59] Improving the smoothness of the output text may make automated tools harder to detect. [60] The overall detectability of adversarial authorship has not been thoroughly studied; if the methods available to the author are unknown to the stylometrist, detection may be impossible. [11]

The problems of author identification and verification in an adversarial setting are very different from those of recognising naïve or cooperative authors. [61] Deliberate attempts to mask authorship are described by Juola & Vescovi (2011) as a "problem for the current state of stylometric art", [62] and Brennan, Afroz & Greenstadt (2012) state that, despite stylometry's high performance in identifying non-adversarial authors, manual application of adversarial methods renders it unreliable. [63]

Kacmarcik & Gamon (2006) observe that low-dimensional stylometric models which operate on small numbers of features are less resistant to adversarial stylometry. [64] Research has found that authors vary in how well they are able to modulate their style, with some able to perform the task successfully even without training. [39] Wang, Juola & Riddell (2022), a replication and reproduction of Brennan, Afroz & Greenstadt (2012), found that all three of imitation, translation and obfuscation meaningfully reduced the effectiveness of authorship attribution, with manual obfuscation being somewhat more effective than manual imitation or translation, which performed similarly to each other; the original study found that imitation was superior. [65] Potthast, Hagen & Stein (2016) reported that even simple automated methods of adversarial stylometry caused major difficulties for state-of-the-art authorship identification systems, though at significant soundness and sensibility cost. [66] Adversarially aware identification systems can perform much better against adversarial stylometry provided that they know the set of obfuscation methods that might have been used, even if they misidentify which one was actually applied. [67]

References

  1. Brennan, Afroz & Greenstadt 2012, pp. 3–4.
  2. Kacmarcik & Gamon 2006, p. 445.
  3. Juola & Vescovi 2011, p. 117.
  4. Afroz, Brennan & Greenstadt 2012, p. 466.
  5. Rao & Rohatgi 2000, 1.3 Contributions.
  6. Gröndahl & Asokan 2020a, p. 19.
  7. Narayanan et al. 2012, p. 301.
  8. Emmery, Kádár & Chrupała 2021, p. 2388.
  9. Shetty, Schiele & Fritz 2018, 1 Introduction.
  10. Mahmood et al. 2019, p. 54.
  11. Gröndahl & Asokan 2020a, p. 28.
  12. Gröndahl & Asokan 2020a, p. 3.
  13. Kacmarcik & Gamon 2006, p. 444.
  14. Afroz, Brennan & Greenstadt 2012, p. 461.
  15. Gröndahl & Asokan 2020a, p. 4.
  16. Potthast, Hagen & Stein 2016, p. 5.
  17. Juola & Vescovi 2011, p. 115.
  18. Xu et al. 2019, p. 247.
  19. Mireshghallah & Berg-Kirkpatrick 2021, p. 2009.
  20. Uchendu, Le & Lee 2022, p. 1.
  21. Neal et al. 2018, p. 6.
  22. Kacmarcik & Gamon 2006, p. 446.
  23. Wang, Juola & Riddell 2022, p. 2.
  24. Adelani et al. 2021, p. 8687.
  25. Wang, Juola & Riddell 2022, p. 8.
  26. Neal et al. 2018, pp. 6–7.
  27. Neal et al. 2018, p. 26.
  28. Mahmood et al. 2019, p. 55.
  29. Afroz, Brennan & Greenstadt 2012, p. 471.
  30. Mireshghallah & Berg-Kirkpatrick 2021, pp. 2009–2010.
  31. Rao & Rohatgi 2000, 5 Future Directions.
  32. Juola & Vescovi 2011, pp. 121–123.
  33. Gröndahl & Asokan 2020b, p. 177.
  34. Haroon et al. 2021, p. 1.
  35. Bevendorff et al. 2019, p. 1098.
  36. Saedi & Dras 2020, p. 181.
  37. Neal et al. 2018, p. 27.
  38. Wang, Juola & Riddell 2022, p. 3.
  39. Zhai et al. 2022, p. 7374.
  40. Gröndahl & Asokan 2020a, pp. 21–22.
  41. Gröndahl & Asokan 2020b, p. 176.
  42. Potthast, Hagen & Stein 2016, p. 6.
  43. Potthast, Hagen & Stein 2016, pp. 12–13.
  44. Potthast, Hagen & Stein 2016, p. 11.
  45. Almishari, Oguz & Tsudik 2014, p. 6.
  46. Xu et al. 2019, pp. 247–248.
  47. Kacmarcik & Gamon 2006, p. 448.
  48. Haroon et al. 2021, p. 3.
  49. Emmery, Kádár & Chrupała 2021, pp. 2388–2389.
  50. Potthast, Hagen & Stein 2016, pp. 9–10.
  51. Gröndahl & Asokan 2020b, p. 189.
  52. Potthast, Hagen & Stein 2016, pp. 11–12.
  53. McDonald et al. 2012, 7.1 Further Work.
  54. Potthast, Hagen & Stein 2016, p. 13.
  55. Mahmood, Shafiq & Srinivasan 2020, p. 2235.
  56. Afroz, Brennan & Greenstadt 2012, p. 462.
  57. Juola 2012, pp. 93–94.
  58. Gröndahl & Asokan 2020a, p. 2.
  59. Mahmood, Shafiq & Srinivasan 2020, p. 2243.
  60. Afroz, Brennan & Greenstadt 2012, p. 464.
  61. Juola & Vescovi 2011, p. 123.
  62. Brennan, Afroz & Greenstadt 2012, p. 2.
  63. Kacmarcik & Gamon 2006, p. 451.
  64. Wang, Juola & Riddell 2022, pp. 7–8.
  65. Potthast, Hagen & Stein 2016, p. 21.
  66. Zhai et al. 2022, p. 7373.

Bibliography