Rada Mihalcea

Mihalcea at Bob and Betty Beyster Building, University of Michigan
Born
Education Technical University of Cluj-Napoca (1992), Southern Methodist University (1999, 2001), Oxford University (2010)
Occupation Professor at the University of Michigan
Known for

Rada Mihalcea is the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering at the University of Michigan. She has made contributions to natural language processing, multimodal processing, and computational social science. With Paul Tarau, she is the co-inventor of the TextRank algorithm, [1] which is widely used for text summarization.


Career

Mihalcea has a Ph.D. in Computer Science and Engineering from Southern Methodist University (2001) and a Ph.D. in Linguistics from Oxford University (2010). [2] In 2017 she was named Director of the Artificial Intelligence Laboratory in Computer Science and Engineering at the University of Michigan. In 2018, Mihalcea was elected vice president of the Association for Computational Linguistics (ACL). She is a professor of Computer Science and Engineering at the University of Michigan, where she also leads the Language and Information Technologies (LIT) Lab. [3]

A prolific researcher, Mihalcea has authored or coauthored over 350 articles since 1998 on topics ranging from semantic analysis of text to lie detection. [4] Her work has been cited over 40,000 times on Google Scholar, making her one of the most cited scholars in Multimodal Interaction and Computational Social Science. [5]

In 2008, Mihalcea received the Presidential Early Career Award for Scientists and Engineers (PECASE). [6] She is an ACM Fellow (since 2019) and an AAAI Fellow (since 2021).

Mihalcea is an outspoken promoter of diversity in computer science. She also supports an expansion of the traditional analysis of educational success, which tends to focus on academic behaviour, to include student life, personality and background outside of the classroom. [7] Mihalcea leads Girls Encoded, a program designed to develop the pipeline of women in computer science as well as to retain the women who have entered into the program. [8] [9] [10]

Awards

Research

Mihalcea is known for her research in natural language processing, multimodal processing, and computational social science. In a collaboration she leads at the University of Michigan, Mihalcea has created software that can detect human lying. [15] In a study of video clips of high-profile court cases, a computer was more accurate at detecting deception than human judges. [16] [17] [18]

Mihalcea's lie-detection software uses machine learning techniques to analyze video clips of actual trials. [19] In her 2015 study, the team used clips from The Innocence Project, a national organization that works to reexamine cases where individuals were tried without the benefit of DNA testing, with the aim of exonerating wrongfully convicted individuals. [20] After identifying common human gestures, they transcribed the audio from the video clips of trials and analyzed how often subjects labeled deceptive used various words and phrases. The system was 75% accurate in identifying which subjects were deceptive among 120 videos. [20] [21] That puts Mihalcea's algorithm on par with the most commonly accepted form of lie detection, polygraph tests, which are roughly 85% accurate when testing guilty people and 56% accurate when testing the innocent. [22] She notes there are still improvements to be made, in particular to account for cultural and demographic differences. [20] A possibly unique advantage of Mihalcea's study was the real-world, high-stakes nature of the footage analyzed. In laboratory experiments, it is difficult to create a setting that motivates people to truly lie. [23]
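The word-usage step described above, comparing how often particular words appear in transcripts labeled deceptive versus truthful, can be illustrated with a minimal Python sketch. This is not the team's actual software: the function names `word_rates` and `rate_differences` are hypothetical, the transcripts are invented toy sentences, and a simple difference in relative frequency stands in for whatever features the published system used.

```python
from collections import Counter
import re


def word_rates(transcripts):
    """Return each word's share of all tokens across a list of transcripts."""
    counts = Counter()
    for text in transcripts:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def rate_differences(deceptive, truthful):
    """Per word: rate in deceptive transcripts minus rate in truthful ones."""
    d, t = word_rates(deceptive), word_rates(truthful)
    return {w: d.get(w, 0.0) - t.get(w, 0.0) for w in set(d) | set(t)}


# Hypothetical toy transcripts, invented for illustration only.
deceptive = ["um I did not see him", "um he was not there"]
truthful = ["I saw him there", "I was there"]

diffs = rate_differences(deceptive, truthful)

# Show the words most over-represented in the deceptive transcripts.
for word, diff in sorted(diffs.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: {diff:+.3f}")
```

A positive value means the word makes up a larger share of the deceptive transcripts than of the truthful ones. In a real system such per-word rates would be only one input among others (for example, gesture features) to a trained classifier.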

In 2018, Mihalcea and her collaborators worked on an algorithm-based system that identifies linguistic cues in fake news stories. It successfully found fakes up to 76% of the time, compared to a human success rate of 70%. [24]

Publications

Books

Journals and conferences

Personal life

Mihalcea was born in Cluj-Napoca, Romania, where she attended the Technical University of Cluj-Napoca.

She can speak Romanian, English, Italian, and French.

Mihalcea has two children, Zara (b. 2009) and Caius (b. 2013), both born in Dallas, Texas.

She is married to Mihai Burzo, an associate professor of engineering at the University of Michigan–Flint. They met while they were both completing Ph.D.s at Southern Methodist University in 2001 [25] and have often collaborated on research, [26] such as the 2015 study on lie detection. [22]


References

  1. "TextRank: Bringing Order into Texts" (PDF). ACL. Retrieved 2024-03-17.
  2. "The Language of Humor, PhD Dissertation". Oxford University. Retrieved 2021-02-13.
  3. "Language Information and Technologies". lit.eecs.umich.edu. Retrieved 2019-03-07.
  4. "Rada Mihalcea". dblp. Retrieved 2024-03-16.
  5. "Rada Mihalcea". Google Scholar. Retrieved 2024-03-17.
  6. "President Honors Outstanding Early-Career Scientists". National Science Foundation. Retrieved 2017-08-30.
  7. "U Michigan MIDAS Program Backs Student Success Research". Campus Technology. Retrieved 2016-06-23.
  8. "Girls Encoded". girlsencoded.eecs.umich.edu. Retrieved 2019-03-07.
  9. "Making a difference for women in academia". University of Michigan EECS. Retrieved 2019-03-07.
  10. "A champion for women in computer science". University of Michigan EECS. Retrieved 2019-03-07.
  11. "2019 ACM Fellows Recognized for Far-Reaching Accomplishments that Define the Digital Age". Association for Computing Machinery. Retrieved 2019-12-11.
  12. "Sarah Goddard Power Award". The University Record. Retrieved 2019-03-07.
  13. "Carol Hollenshead Award | Center for the Education of Women | University of Michigan". www.cew.umich.edu. Retrieved 2019-03-07.
  14. "President Honors Outstanding Early-Career Scientists | NSF - National Science Foundation". www.nsf.gov. Retrieved 2019-03-07.
  15. "Researchers Develop New Lie-Detecting Software". Topnews.in. Retrieved 2015-12-16.
  16. "Can you spot a liar? Fail safe ways to determine if someone is telling the truth". New Zealand Herald. Retrieved 2017-01-30.
  17. "New Developed Software can detect lie with %75 success – Baltimore News". Albany Daily Star. Retrieved 2016-08-17. [permanent dead link]
  18. "To spot a liar, look at their hands". Quartz. 12 December 2015. Retrieved 2015-12-12.
  19. "Courtroom fibs used to develop lie-detecting software". Gizmag. 2015-12-12. Retrieved 2015-12-12.
  20. "University professors create new software to detect lies". Michigan Daily. 10 December 2015. Retrieved 2015-12-11.
  21. "Liar, Liar Pants On Fire: 6 Signs Computers Use To Spot Liars With 75% Accuracy". Medical Daily. 2015-12-15. Retrieved 2015-12-16.
  22. "5 Ways to Tell If Someone is Lying to You". Yahoo! Health. 15 December 2015. Retrieved 2015-12-15.
  23. "New software analysis words, gestures to detect lies". Jagran Post. Retrieved 2015-12-11.
  24. "Fake news detector algorithm works better than a human". University of Michigan News. 2018-08-21. Retrieved 2019-03-26.
  25. "Episode 31: From Romania – Immigrant Computer Scientists Podcast". Retrieved 2023-03-19.
  26. "Mihai Burzo's research works | University of Michigan".