Shlomo Argamon

Born 1967 (age 55–56)
Education B.S. applied mathematics, MPhil, Ph.D. computer science
Alma mater Carnegie Mellon University, Yale University
Occupation Computational linguistics
Employer Illinois Institute of Technology
Known for Computational stylistics
Title Director, Master of Data Science; Director, Linguistic Cognition Laboratory, Illinois Institute of Technology
Website lingcog.blogspot.com

Shlomo Argamon is an American-Israeli computer scientist and forensic linguist. He is currently chair of the computer science department, a tenured professor of computer science, and director of the Master of Data Science [1] program at the Illinois Institute of Technology in Chicago, IL.

Education

Shlomo Argamon received his B.S. in applied mathematics from Carnegie Mellon University and his MPhil and Ph.D. in computer science from Yale University, where he was supervised by Drew McDermott. [2] He then spent two years doing postdoctoral research under a Fulbright Foundation fellowship with Sarit Kraus at Bar-Ilan University in Ramat Gan, Israel.

Research

Since the late 1990s, Argamon has worked primarily on the computational linguistic analysis of non-denotational meaning, including computational analysis of language stylistics, sentiment analysis, [3] [4] [5] and metaphor analysis. [6] He has also published well-cited research on active learning (machine learning), [7] metalearning, [8] and robotic mapping.

Computational Stylistics

Argamon is best known for his work on computational stylistics, particularly author profiling. Together with Moshe Koppel and others, he has shown how statistical analysis of word usage can determine an author's age, sex, native language, and personality type with high accuracy in English-language texts. [9] [10] [11] His work has also shown how textual features indicating differences between male and female authorship are consistent between languages and across time. [12] [13] [14]
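As a rough illustration of the general idea of profiling by word usage, the following minimal sketch represents each text by the relative frequencies of a few common function words and assigns a new text to the group whose average profile is nearest. It is not the method used in the cited studies: the word list, the function names, the group labels "A" and "B", and the nearest-centroid rule are all assumptions made for this example, and published work relies on far larger feature sets and stronger classifiers.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; real studies use hundreds of features).
FUNCTION_WORDS = ["the", "of", "and", "a", "in", "to",
                  "i", "you", "she", "he", "with", "for"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(labeled_texts):
    """Build one average style profile per label from (label, text) pairs."""
    groups = {}
    for label, text in labeled_texts:
        groups.setdefault(label, []).append(style_vector(text))
    return {label: centroid(vecs) for label, vecs in groups.items()}


def predict(model, text):
    """Return the label whose profile is closest to the text's style vector."""
    vec = style_vector(text)
    return min(model, key=lambda label: distance(model[label], vec))


# Toy usage with made-up texts and arbitrary group labels.
model = train([
    ("A", "the cat of the house and the dog of the yard"),
    ("B", "i think you and i should go with her to see you"),
])
guess = predict(model, "i hope you will come with me to meet you")
```

Function words are used here because they are largely independent of topic, which is one common motivation for stylistic features; the toy data above says nothing about real accuracy.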

He has also developed computational stylistic methods that provide insights into the meaning of stylistic differences. One of Argamon's key innovations for this purpose is the development of computational stylistic analysis using systemic functional linguistics. [15] [16] For example, together with Jeff Dodick and Paul Chase, he examined whether there are clear and consistent differences between scientific method in experimental sciences and historical sciences. Their work showed how using systemic functional features in computational stylistic analysis provides evidence for multiple scientific methodologies of the sorts posited previously by philosophers of science. [17]

Forensic linguistics is commonly divided into two major components: written language and spoken language. [18] Written-language analysis is applied to transcripts of police interviews with both witnesses and suspects, and to text material such as criminal messages, terrorist threats, and blackmail messages. Such texts may be translated from one language to another and are then reviewed to help answer questions about the author of the message. Many kinds of text material can be examined, including notes, phone messages, typed and handwritten letters, and text from social media. Combining computational stylistics with scientific methods allows still more to be determined, which can help enhance cybersecurity.

Linguistics for Cybersecurity

Recently, Argamon has pushed for the increased use of linguistic analysis for attribution of cybersecurity attacks. He has pointed out how linguistic attribution techniques can often be used to good effect on natural language texts that arise in different attack scenarios, and has provided analyses for high-profile cases such as the Sony Pictures hack, [19] [20] the Democratic National Committee cyber attacks, [21] and the Shadow Brokers NSA leak. [22] [23]

Data Science

In 2013, Argamon founded the Illinois Institute of Technology Master of Data Science program, [1] which he currently directs. The program seeks to teach students "to think about the real problems that need to be solved, not to simply find technical solutions." Argamon views data scientists as "sensemakers", whose job is not merely to produce analytic results, but to help their clients make sense of a complex, uncertain, and fast-changing world through rigorous analysis and explanation of the data. [24] [25]

Honors

Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

Stylistics, a branch of applied linguistics, is the study and interpretation of texts of all types and/or spoken language in regard to their linguistic and tonal style, where style is the particular variety of language used by different individuals and/or in different situations or settings. For example, the vernacular, or everyday language may be used among casual friends, whereas more formal language, with respect to grammar, pronunciation or accent, and lexicon or choice of words, is often used in a cover letter and résumé and while speaking during a job interview.

Text mining, also referred to as text data mining, similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a KDD process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.

Forensic linguistics, legal linguistics, or language and the law, is the application of linguistic knowledge, methods, and insights to the forensic context of law, language, crime investigation, trial, and judicial procedure. It is a branch of applied linguistics.

Discourse analysis (DA), or discourse studies, is an approach to the analysis of written, vocal, or sign language use, or any significant semiotic event.

Stylometry is the application of the study of linguistic style, usually to written language. It has also been applied successfully to music and to fine-art paintings. Another conceptualization defines it as the linguistic discipline that evaluates an author's style through the application of statistical analysis to a body of their work.

Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.

Internet linguistics is a domain of linguistics advocated by the English linguist David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and of other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human–computer interaction (HCI) leading to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts such as Gretchen McCulloch have acknowledged that linguistics has a contributing role in it, in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. Such study aims to benefit both linguists and web users.

Computational criminology is an interdisciplinary field which uses computing science methods to formally define criminology concepts, improve our understanding of complex phenomena, and generate solutions for related problems.

Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguistics is concerned with both the cognitive and social aspects of language. It is considered a scientific field as well as an academic discipline; it has been classified as a social science, natural science, cognitive science, or part of the humanities.

Carole Elisabeth Chaski is a forensic linguist who is considered one of the leading experts in the field. Her research has led to improvements in the methodology and reliability of stylometric analysis and inspired further research on the use of this approach for authorship identification. Her contributions have served as expert testimony in several federal and state court cases in the United States and Canada. She is president of ALIAS Technology and executive director of the Institute for Linguistic Evidence, a non-profit research organization devoted to linguistic evidence.

Moshe Koppel is an American-Israeli computer scientist, Talmud scholar and political activist. Koppel was born and raised in New York, where he received a traditional Jewish education. He studied at Yeshivat Har Etzion, received a B.A. from Yeshiva University and in 1979 completed his doctorate in mathematics under the supervision of Martin Davis at the Courant Institute of New York University. He spent a post-doctoral year at the Institute for Advanced Study in Princeton before moving to Israel in 1980. He has been a member of the Department of Computer Science in Bar-Ilan University since then.

Native-language identification (NLI) is the task of determining an author's native language (L1) based only on their writings in a second language (L2). NLI works through identifying language-usage patterns that are common to specific L1 groups and then applying this knowledge to predict the native language of previously unseen texts. This is motivated in part by applications in second-language acquisition, language teaching and forensic linguistics, amongst others.

Jussi Karlgren is a Swedish computational linguist, research scientist at Spotify, and co-founder of text analytics company Gavagai AB. He holds a PhD in computational linguistics from Stockholm University, and the title of docent of language technology at Helsinki University.

Author profiling is the analysis of a given set of texts in an attempt to uncover various characteristics of the author based on stylistic- and content-based features, or to identify the author. Characteristics analysed commonly include age and gender, though more recent studies have looked at other characteristics like personality traits and occupation.

Ellen Riloff is an American computer scientist currently serving as a professor at the School of Computing at the University of Utah. Her research focuses on Natural Language Processing and Computational Linguistics, specifically information extraction, sentiment analysis, semantic class induction, and bootstrapping methods that learn from unannotated texts.

References

  1. "Master of Data Science | IIT College of Science".
  2. http://webmail.cs.yale.edu/publications/techreports/tr1032.ps.gz [ dead link ]
  3. Kenneth Bloom, Navendu Garg, and Shlomo Argamon. Extracting appraisal expressions. In Proc. Human Language Technologies: Conference of the North American Association for Computational Linguistics (NAACL-HLT), Rochester, New York, April, 2007.
  4. Casey Whitelaw, Navendu Garg, and Shlomo Argamon. Using appraisal groups for sentiment analysis. In Proc. Conference on Information and Knowledge Management, Bremen, Germany, November 2005.
  5. Shlomo Argamon, Ken Bloom, Andrea Esuli, and Fabrizio Sebastiani. Automatically Determining Attitude Type and Force for Sentiment Analysis. 3rd Language and Technology Conference, Poznan, Poland, October 2007.
  6. Lisa Gandy, Nadji Allan, Mark Atallah, Ophir Frieder, Newton Howard, Sergey Kanareykin, Moshe Koppel, Mark Last, Yair Neuman, Shlomo Argamon. Automatic identification of conceptual metaphors with limited knowledge. In Proc. Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI-13), Bellevue, WA, July 2013.
  7. Shlomo Argamon-Engelson and Ido Dagan. Committee-based sample selection for probabilistic classifiers. Journal of Artificial Intelligence Research, 11:335-360, 1999.
  8. Julio Ortega, Moshe Koppel, and Shlomo Argamon-Engelson. Arbitrating among competing classifiers using learned referees. Knowledge and Information Systems, 3(4):470–490, 2001.
  9. Argamon, Shlomo, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. "Gender, genre, and writing style in formal written texts." Text 23, no. 3 (2003): 321-346.
  10. Argamon, Shlomo, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. "Automatically profiling the author of an anonymous text." Communications of the ACM 52, no. 2 (2009): 119-123.
  11. Argamon, Shlomo, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. "Mining the Blogosphere: Age, gender and the varieties of self-expression." First Monday 12, no. 9 (2007). http://journals.uic.edu/ojs/index.php/fm/article/view/2003
  12. Argamon, Shlomo, Jean-Baptiste Goulain, Russell Horton, and Mark Olsen. "Vive la Différence! Text mining gender difference in French literature." Digital Humanities Quarterly 3, no. 2 (2009).
  13. Argamon, Shlomo, Russell Horton, Mark Olsen, and Sterling Stuart Stein. "Gender, Race, and Nationality in BlackDrama, 1850-2000: Mining Differences in Language Use in Authors and their Characters." Proceedings of Digital Humanities (2007).
  14. Hota, Sobhan R., Shlomo Argamon, and Rebecca Chung. "Gender in Shakespeare: Automatic stylistics gender character classification using syntactic, lexical and lemma features." Proc. Chicago Colloquium on Digital Humanities and Computer Science (DHCS) (2006).
  15. Argamon, Shlomo, Casey Whitelaw, Paul Chase, Sobhan Raj Hota, Navendu Garg, and Shlomo Levitan. "Stylistic text classification using functional lexical features." Journal of the American Society for Information Science and Technology 58, no. 6 (2007): 802-822.
  16. Argamon, Shlomo, and Moshe Koppel. "The rest of the story: Finding meaning in stylistic variation." In The Structure of Style, pp. 79-112. Springer, Berlin, Heidelberg, 2010.
  17. Argamon, Shlomo, Jeff Dodick, and Paul Chase. "Language use reflects scientific methodology: A corpus-based study of peer-reviewed journal articles." Scientometrics 75, no. 2 (2008): 203-238.
  18. "Language Matters! – Exciting insights into the realm of Applied Linguistics". Retrieved 2020-04-20.
  19. "Doubts Persist on U.S. Claims of North Korean Role in Sony Hack". NPR.org.
  20. "New Study May Add to Skepticism Among Security Experts That North Korea Was Behind Sony Hack". 2014-12-24.
  21. Savage, Charlie; Perlroth, Nicole (2016-07-27). "Is D.N.C. Email Hacker a Person or a Russian Front? Experts Aren't Sure". The New York Times.
  22. "The NSA Data Leakers Might be Faking Their Awful English to Deceive Us". 2016-08-18.
  23. "Second Snowden could be behind sale of NSA hacking tools".
  24. "The Well-Rounded Data Scientist". 16 April 2014.
  25. "Becoming a Data Scientist Podcast Episode 03: Shlomo Argamon | Becoming a Data Scientist".
  26. "BCS Register of Members". wam.bcs.org. Retrieved 2018-10-05.
  27. "Sixth Annual Forensic Linguistics Distinguished Visitor Lecture" (PDF).