Marti Hearst

Nationality: American
Alma mater: University of California, Berkeley (BA, MS, PhD)
Known for: Hearst patterns
Scientific career
Fields: Computer Science
Institutions: University of California, Berkeley
Thesis: Context and structure in automated full-text information access (1994)
Doctoral advisor: Robert Wilensky
Other academic advisors: Michael Stonebraker
Doctoral students: Cecilia R. Aragon
Website: people.ischool.berkeley.edu/~hearst/

Martha Alice Hearst is a professor in the School of Information at the University of California, Berkeley. She did early work in corpus-based computational linguistics, including some of the first work on automating sentiment analysis [1] and word sense disambiguation. [2] She invented an algorithm that became known as "Hearst patterns", [3] which applies lexico-syntactic patterns to recognize hyponymy [4] (ISA) relations with high accuracy in large text collections, including an early application of it to WordNet; [5] this algorithm is widely used in commercial text mining applications, including ontology learning. Hearst also did early work on automatic segmentation of text into topical discourse units, inventing a now well-known approach called TextTiling. [6]
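The idea behind lexico-syntactic patterns can be illustrated with a minimal sketch. It handles only one pattern ("X such as A, B and C") and approximates every noun phrase by a single word, which is a simplifying assumption made here for brevity; the function name `extract_hyponyms` and the regular expressions are hypothetical and are not taken from Hearst's work.

```python
import re

# A list item is approximated as one word that is not "and" or "or".
# (Assumption for this sketch: real systems match full noun phrases.)
ITEM = r"(?!(?:and|or)\b)\w+"

# "<hypernym> such as <item>, <item>, ... (and|or) <item>"
SUCH_AS = re.compile(
    rf"(\w+),? such as ((?:{ITEM}, )*{ITEM}(?:,? (?:and|or) {ITEM})?)"
)

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        # Split the enumerated list on commas and on the final conjunction.
        items = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs

if __name__ == "__main__":
    sentence = "works by authors such as Herrick, Goldsmith, and Shakespeare"
    for hyponym, hypernym in extract_hyponyms(sentence):
        print(f"{hyponym} ISA {hypernym}")
```

The hypernym is returned exactly as it appears in the text (for example, the plural "authors"); a fuller implementation would normally lemmatize it and use a parser or chunker to find noun phrase boundaries.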

Hearst's research is on user interfaces for search engine technology [7] [8] [9] and big data analytics. [10] [11] [12] She did early work on information visualization for search user interfaces, inventing the TileBars query term visualization. [13] Her Flamenco research project investigated and developed the now widely used faceted navigation approach for searching and browsing web sites and information collections. [14] [15] She wrote the first academic book on the topic, Search User Interfaces (Cambridge University Press, 2009). [12]

Hearst is an Edge Foundation contributing author and a member of the Usage Panel of the American Heritage Dictionary of the English Language. [citation needed]

Hearst received her B.A., M.S., and Ph.D. in computer science, all from Berkeley. [16] In 2013 she became a fellow of the Association for Computing Machinery. [17] She became a member of the CHI Academy in 2017, and has previously served as president of the Association for Computational Linguistics and on the advisory council of NSF's CISE Directorate. [18] Additionally, she has been a member of the Web Board for CACM, the Usage Panel for the American Heritage Dictionary, the Edge.org panel of experts, the research staff at Xerox PARC, and the boards of ACM Transactions on the Web, Computational Linguistics, ACM Transactions on Information Systems, and IEEE Intelligent Systems. [citation needed]

Hearst has received an NSF CAREER award, an IBM Faculty Award, and an Okawa Foundation Fellowship. Her work on user interfaces has earned her two Google Research Awards, and she has received four Excellence in Teaching Awards. She has also led projects funded by over $3.5M in research grants. [19]

Hearst’s publications date back to 1990, when ‘A Hybrid Approach to Restricted Text Interpretation’ appeared in the AAAI Spring Symposium on Text Based Intelligent Systems, held at Stanford University in March of that year. [citation needed]

Related Research Articles

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.

In linguistics and natural language processing, a corpus or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.

DBLP: Computer science bibliography website

DBLP is a computer science bibliography website. Starting in 1993 at Universität Trier in Germany, it grew from a small collection of HTML files and became an organization hosting a database and logic programming bibliography site. Since November 2018, DBLP has been a branch of Schloss Dagstuhl – Leibniz-Zentrum für Informatik (LZI). DBLP listed more than 5.4 million journal articles, conference papers, and other publications on computer science in December 2020, up from about 14,000 in 1995 and 3.66 million in July 2016. All important journals on computer science are tracked. Proceedings papers of many conferences are also tracked. It is mirrored at three sites across the Internet.

Ben Shneiderman: American computer scientist

Ben Shneiderman is an American computer scientist, a Distinguished University Professor in the University of Maryland Department of Computer Science, which is part of the University of Maryland College of Computer, Mathematical, and Natural Sciences at the University of Maryland, College Park, and the founding director (1983-2000) of the University of Maryland Human-Computer Interaction Lab. He conducted fundamental research in the field of human–computer interaction, developing new ideas, methods, and tools such as the direct manipulation interface, and his eight rules of design.

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

Terminology extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.

Faceted search augments lexical search with a faceted navigation system, allowing users to narrow results by applying filters based on a faceted classification of the items. It is a parametric search technique. A faceted classification system classifies each information element along multiple explicit dimensions, facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, predetermined, taxonomic order.
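As a rough illustration of narrowing results by facet, the sketch below filters a small in-memory collection and counts the remaining values of a facet, which is what an interface would typically show next to each filter. The item data, the facet names (`color`, `material`), and the helper functions are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical catalogue: each item is classified along two facets.
ITEMS = [
    {"title": "Red mug", "color": "red", "material": "ceramic"},
    {"title": "Red bowl", "color": "red", "material": "glass"},
    {"title": "Blue mug", "color": "blue", "material": "ceramic"},
    {"title": "Blue vase", "color": "blue", "material": "glass"},
]

def filter_items(items, selections):
    """Keep the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]

def facet_counts(items, facet):
    """Count how many of the given items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)

if __name__ == "__main__":
    red_items = filter_items(ITEMS, {"color": "red"})
    print([item["title"] for item in red_items])
    print(dict(facet_counts(red_items, "material")))
```

Because each facet is an independent dimension, filters can be applied in any order: selecting `material` first and `color` second yields the same result set.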

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

Human–computer information retrieval (HCIR) is the study and engineering of information retrieval techniques that bring human intelligence into the search process. It combines the fields of human-computer interaction (HCI) and information retrieval (IR) and creates systems that improve search by taking into account the human context, or through a multi-step search process that provides the opportunity for human feedback.

Ed Huai-Hsin Chi is a Taiwanese American computer scientist and research scientist at Google, known for his early work in applying the theory of information scent to predict usability of websites.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

A corpus manager is a tool for multilingual corpus analysis, which allows effective searching in corpora.

Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts, through some specific characteristics, through user-provided sketches, or through content-based similarity search.

Emily Menon Bender is an American linguist who is a professor at the University of Washington. She specializes in computational linguistics and natural language processing. She is also the director of the University of Washington's Computational Linguistics Laboratory. She has published several papers on the risks of large language models and on ethics in natural language processing.

Argument technology is a sub-field of collective intelligence and artificial intelligence that focuses on applying computational techniques to the creation, identification, analysis, navigation, evaluation and visualisation of arguments and debates.

Shumin Zhai: Human–computer interaction research scientist

Shumin Zhai is a Chinese-born American Canadian human–computer interaction (HCI) research scientist and inventor. He is known for his research on input devices and interaction methods, swipe-gesture-based touchscreen keyboards, eye-tracking interfaces, and models of human performance in human-computer interaction. His studies have contributed to both foundational models and understandings of HCI and practical user interface designs and flagship products. He previously worked at IBM, where he invented the ShapeWriter text entry method for smartphones, which is a predecessor to the modern Swype keyboard. Dr. Zhai's publications have won the ACM UIST Lasting Impact Award and the IEEE Computer Society Best Paper Award, among others. Dr. Zhai is currently a Principal Scientist at Google, where he leads and directs research, design, and development of human-device input methods and haptics systems.

Meredith Ringel Morris is an American computer scientist who works in human-computer interaction and collaborative web search. She is a principal scientist at Google Brain and an affiliate professor at the University of Washington in The Paul G. Allen School of Computer Science & Engineering and in The Information School.

Preslav Nakov is a computer scientist who works on natural language processing. He is particularly known for his research on fake news detection, automatic detection of offensive language, and biomedical text mining. Nakov obtained a PhD in computer science under the supervision of Marti Hearst from the University of California, Berkeley. He was the first person to receive the John Atanasov Presidential Award, given by the President of Bulgaria for achievements in the development of the information society.

References

  1. Hearst, M. (1992). Direction-Based Text Interpretation as an Information Access (in Text-Based Intelligent Systems). Lawrence Erlbaum.
  2. Hearst, M. (1991). "Noun Homograph Disambiguation using Local Context in Large Text Corpora" (PDF). Proceedings of the 7th Annual Conference of the UW Centre for the New OED and Text Research: Using Corpora. Oxford. Retrieved February 15, 2013.
  3. Indurkhya, N., Damerau, F. (2010). Handbook of Natural Language Processing. Chapman & Hall/CRC. p. 594.
  4. "Automatic Acquisition of Hyponyms from Large Text Corpora" (PDF). Proceedings of the Fourteenth International Conference on Computational Linguistics. Nantes, France. 1992. Retrieved February 15, 2013.
  5. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
  6. "Multi-Paragraph Segmentation of Expository Text" (PDF). Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, NM. June 1994. Retrieved February 15, 2013.
  7. "ACM Hypertext 2011 Keynotes". 22nd ACM Conference on Hypertext and Hypermedia. Association for Computing Machinery. 2011-06-06. Archived from the original on 2016-06-04. Retrieved 2013-05-08.
  8. Tate, Ryan (2013-01-15). "Facebook Announces New Search Engine". Wired. Wired.com. Retrieved 2013-05-08.
  9. Hearst, Marti A. (2011-11-01). "'Natural' Search User Interfaces". Communications of the ACM, Vol. 54, No. 11. Association for Computing Machinery. pp. 60–67. Retrieved 2013-05-08.
  10. Isaac, Mike (2012-12-14). "Twitter Takes Big Data to School". AllThingsD. Retrieved 2013-05-08.
  11. Keen, Andrew (2012-05-12). "Keen On… Big Data: Why UC Berkeley Might Have An Edge Over Stanford [TCTV]". TechCrunch.com. Retrieved 2013-05-08.
  12. Yee, Christopher (2012-11-13). "Five Questions with Marti Hearst, 'Big Data' pioneer". The Daily Californian. University of California, Berkeley. Retrieved 2013-05-08.
  13. Hearst, M. (1995). "TileBars: Visualization of Term Distribution Information in Full Text Information Access" (PDF). Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI). ACM SIGCHI Conference on Human Factors in Computing Systems. Denver, CO. Retrieved February 15, 2013.
  14. Hearst, M. (September 2000). "Next Generation Web Search: Setting Our Sites" (PDF). In Gravano, Luis (ed.). In IEEE Data Engineering Bulletin. Special issue on Next Generation Web Search. Retrieved February 15, 2013.
  15. Yee, K-P., Swearingen, K., Li, K., and Hearst, M. (2003). "Faceted Metadata Image Search and Browsing" (PDF). In Proceedings of ACM CHI 2003. Retrieved February 15, 2013.
  16. Hearst, Martha Alice (1994). Context and structure in automated full-text information access (Ph.D. thesis). University of California, Berkeley. OCLC 33496523. ProQuest 304100421.
  17. "ACM Names Fellows for Computing Advances that Are Transforming Science and Society". Association for Computing Machinery. Archived 2014-07-22 at the Wayback Machine. Accessed 2013-12-10.
  18. "Marti A. Hearst: Bio and CV". people.ischool.berkeley.edu. Retrieved 2021-09-08.
  19. "Marti A. Hearst: Bio and CV". people.ischool.berkeley.edu. Retrieved 2021-09-08.