Marti Hearst

Hearst in 2008
Nationality: American
Alma mater: University of California, Berkeley (BA, MS, PhD)
Known for: Hearst patterns
Fields: Computer Science
Institutions: University of California, Berkeley
Thesis: Context and structure in automated full-text information access (1994)
Doctoral advisor: Robert Wilensky
Other academic advisors: Michael Stonebraker
Doctoral students: Cecilia R. Aragon
Website: people.ischool.berkeley.edu/~hearst/

Martha Alice Hearst is a professor in the School of Information at the University of California, Berkeley. She did early work in corpus-based computational linguistics, including some of the first work in automating sentiment analysis [1] and word sense disambiguation. [2] She invented an algorithm that became known as "Hearst patterns", [3] which applies lexico-syntactic patterns to recognize hyponymy (ISA) relations [4] with high accuracy in large text collections, including an early application to WordNet; [5] this algorithm is widely used in commercial text mining applications, including ontology learning. Hearst also did early work on automatically segmenting text at topical discourse boundaries, inventing a now well-known approach called TextTiling. [6]
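
As an illustration, here is a minimal sketch of one such pattern ("Y such as X1, X2 and X3") in Python. It uses a single regular expression over plain text, whereas the original algorithm matched part-of-speech-tagged noun phrases; the example sentence and the function name are invented for illustration.

```python
import re

# One classic Hearst pattern: "Y such as X1, X2 and X3" suggests that
# each Xi is a hyponym (ISA child) of Y. This sketch matches bare words;
# the original algorithm operated over part-of-speech-tagged noun phrases.
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+?)(?:\.|;|$)")

def hearst_such_as(text):
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        # Split the conjoined list "X1, X2 and X3" into individual terms.
        for item in re.split(r",| and | or ", match.group(2)):
            if item.strip():
                pairs.append((item.strip(), hypernym))  # (hyponym, hypernym)
    return pairs

print(hearst_such_as("She admired authors such as Austen, Eliot and Woolf."))
# [('Austen', 'authors'), ('Eliot', 'authors'), ('Woolf', 'authors')]
```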

Hearst's research is on user interfaces for search engine technology [7] [8] [9] and big data analytics. [10] [11] [12] She did early work in information visualization for search user interfaces, inventing the TileBars query term visualization. [13] Her Flamenco research project investigated and developed the now widely used faceted navigation approach for searching and browsing web sites and information collections. [14] [15] She wrote the first academic book on the topic, Search User Interfaces (Cambridge University Press, 2009). [12]
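
To give a feel for the TileBars idea, here is a toy ASCII rendition in Python: one row per query term, one cell per text segment, shaded by how often the term occurs there. The document and query terms are invented, and real TileBars render grayscale cells per (segment, term) pair rather than characters.

```python
# Toy ASCII TileBars: shade each (segment, term) cell by occurrence count.
SHADES = " .:#"  # 0, 1, 2, 3+ occurrences per segment

def tilebar(segments, term):
    counts = [seg.lower().split().count(term) for seg in segments]
    return "".join(SHADES[min(c, 3)] for c in counts)

segments = ["the network protocol layer",
            "protocol errors and retries retries",
            "closing remarks"]
for term in ["protocol", "retries"]:
    print(f"{term:>10} |{tilebar(segments, term)}|")
#   protocol |.. |
#    retries | : |
```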

Hearst is an Edge Foundation contributing author and a member of the Usage Panel of the American Heritage Dictionary of the English Language.[citation needed]

Hearst received her B.A., M.S., and Ph.D. in computer science, all from Berkeley. [16] In 2013 she became a fellow of the Association for Computing Machinery. [17] She became a member of the CHI Academy in 2017, and has previously served as president of the Association for Computational Linguistics and on the advisory council of NSF's CISE Directorate. [18] Additionally, she has been a member of the Web Board for CACM, the Usage Panel for the American Heritage Dictionary, the Edge.org panel of experts, the research staff at Xerox PARC, and the boards of ACM Transactions on the Web, Computational Linguistics, ACM Transactions on Information Systems, and IEEE Intelligent Systems.[citation needed]

Hearst has received an NSF CAREER award, an IBM Faculty Award, and an Okawa Foundation Fellowship. Her work on user interfaces has had a profound impact on the industry, earning her two Google Research Awards and four Excellence in Teaching Awards. She has also led research projects supported by over $3.5M in grants. [19]

Hearst's publications date back to 1990, when "A Hybrid Approach to Restricted Text Interpretation" appeared at the AAAI Spring Symposium on Text-Based Intelligent Systems, held at Stanford University in March of that year.[citation needed]

Related Research Articles

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
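
As a concrete example of document retrieval, here is a minimal inverted index in Python: each term maps to the set of documents containing it, and a conjunctive query intersects those sets. The three "documents" are invented.

```python
from collections import defaultdict

# Minimal inverted index: term -> set of documents containing the term.
docs = {1: "berkeley information school",
        2: "search user interfaces",
        3: "information retrieval and search"}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Conjunctive query: return documents containing every query term."""
    postings = [index[term] for term in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("information search"))  # {3}
```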

Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005), there are three perspectives of text mining: information extraction, data mining, and knowledge discovery in databases (KDD). Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
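
Sentiment analysis, one of the typical tasks listed above, can be sketched in a few lines of Python using a lexicon of scored words; the tiny lexicon here is invented, and real systems use large curated lexicons or trained classifiers.

```python
# Lexicon-based sentiment scoring: sum per-word scores from a sentiment
# lexicon. The lexicon below is a toy stand-in for illustration only.
LEXICON = {"good": 1, "great": 1, "excellent": 1,
           "bad": -1, "poor": -1, "terrible": -1}

def sentiment(text):
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Great interface but poor documentation and bad search"))
# negative (score = 1 - 1 - 1 = -1)
```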

<span class="mw-page-title-main">DBLP</span> Computer science bibliography website

DBLP is a computer science bibliography website. Starting in 1993 at Universität Trier in Germany, it grew from a small collection of HTML files into an organization hosting a database and logic programming bibliography site. Since November 2018, DBLP has been a branch of Schloss Dagstuhl – Leibniz-Zentrum für Informatik (LZI). DBLP listed more than 5.4 million journal articles, conference papers, and other computer science publications in December 2020, up from about 14,000 in 1995 and 3.66 million in July 2016. All major computer science journals are tracked, as are the proceedings of many conferences. It is mirrored at three sites across the Internet.

<span class="mw-page-title-main">Ben Shneiderman</span> American computer scientist

Ben Shneiderman is an American computer scientist, a Distinguished University Professor in the Department of Computer Science at the University of Maryland, College Park (part of the College of Computer, Mathematical, and Natural Sciences), and the founding director (1983–2000) of the University of Maryland Human-Computer Interaction Lab. He conducted fundamental research in the field of human–computer interaction, developing new ideas, methods, and tools such as the direct manipulation interface, and his eight rules of design.

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.
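
A hedged sketch of the distributional flavor of statistical semantics in Python: represent each word by counts of the words occurring near it, and compare meanings with cosine similarity. The three-sentence "corpus" is invented and far too small to be meaningful; it only shows the mechanics.

```python
import math
from collections import Counter

corpus = ["the boat sailed on the water",
          "the ship sailed on the sea",
          "the dog barked at the cat"]

def context_vector(word):
    """Count the words that co-occur with `word` in the same sentence."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            vec.update(t for t in tokens if t != word)
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(cosine(context_vector("boat"), context_vector("ship")))  # ~0.86
print(cosine(context_vector("boat"), context_vector("dog")))   # ~0.57
```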

Terminology extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
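
A crude sketch of the idea in Python, treating frequent stopword-filtered bigrams as candidate terms; real extractors add part-of-speech filters and statistical ranking measures such as C-value or TF-IDF. The stopword list and example text are invented.

```python
from collections import Counter

STOPWORDS = {"the", "of", "a", "is", "to", "and", "in"}

def candidate_terms(text, min_count=2):
    """Frequent stopword-filtered bigrams as naive term candidates."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [" ".join(b) for b, n in bigrams.items() if n >= min_count]

corpus = ("query terms drive search interfaces and search interfaces "
          "shape query terms")
print(candidate_terms(corpus))  # ['query terms', 'search interfaces']
```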

Faceted search augments lexical search with a faceted navigation system, allowing users to narrow results by applying filters based on a faceted classification of the items. It is a parametric search technique. A faceted classification system classifies each information element along multiple explicit dimensions, facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, predetermined, taxonomic order.
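
A minimal Python sketch of faceted navigation: each item carries a value along every facet, filters stack to narrow the result set, and counts are computed per facet value as in a faceted sidebar. The catalog and facet names are invented.

```python
# Each item is classified along explicit facets (topic, format, year).
items = [
    {"title": "Field Guide", "topic": "nature", "format": "paperback", "year": 2019},
    {"title": "City Atlas",  "topic": "travel", "format": "hardcover", "year": 2019},
    {"title": "Star Charts", "topic": "nature", "format": "hardcover", "year": 2021},
]

def narrow(results, facet, value):
    """Apply one facet filter; calls can be stacked to narrow further."""
    return [item for item in results if item[facet] == value]

def facet_counts(results, facet):
    """Counts shown beside each facet value in a faceted UI sidebar."""
    counts = {}
    for item in results:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts

nature = narrow(items, "topic", "nature")
print(facet_counts(nature, "format"))  # {'paperback': 1, 'hardcover': 1}
print(narrow(nature, "year", 2021))    # [{'title': 'Star Charts', ...}]
```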

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.
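
Continuing the earlier Hearst-pattern sketch, here is a hedged Python illustration of the assembly step: collect extracted ISA pairs into a small taxonomy and walk its ancestor chains. Real systems also normalize terms, resolve conflicts, and serialize the result in an ontology language such as OWL; the pairs below are invented.

```python
from collections import defaultdict

# ISA pairs as might be produced by pattern-based extraction.
isa_pairs = [("sparrow", "bird"), ("eagle", "bird"), ("bird", "animal")]

taxonomy = defaultdict(list)  # parent -> list of children
for child, parent in isa_pairs:
    taxonomy[parent].append(child)

def ancestors(term):
    """Walk ISA links upward from a term to the taxonomy root."""
    chain, current = [], term
    while True:
        parents = [p for p, kids in taxonomy.items() if current in kids]
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)

print(ancestors("sparrow"))  # ['bird', 'animal']
```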

Human–computer information retrieval (HCIR) is the study and engineering of information retrieval techniques that bring human intelligence into the search process. It combines the fields of human-computer interaction (HCI) and information retrieval (IR) and creates systems that improve search by taking into account the human context, or through a multi-step search process that provides the opportunity for human feedback.
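
One concrete HCIR mechanism is relevance feedback: after an initial search, the user marks some results relevant and the system expands the query with terms drawn from those documents. A crude Python sketch with invented documents (a real system would weight terms, e.g. Rocchio-style, rather than count them):

```python
from collections import Counter

def expand_query(query, relevant_docs, extra_terms=2):
    """Add the most frequent new terms from user-approved documents."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(doc.lower().split())
    for term in query.split():
        counts.pop(term, None)  # don't re-add terms already in the query
    new_terms = [t for t, _ in counts.most_common(extra_terms)]
    return query + " " + " ".join(new_terms)

liked = ["faceted search interfaces for browsing",
         "faceted navigation in digital libraries"]
print(expand_query("search", liked))  # e.g. "search faceted interfaces"
```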

<span class="mw-page-title-main">Eric Horvitz</span> American computer scientist, and Technical Fellow at Microsoft

Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.

Ed Huai-Hsin Chi is a Taiwanese American computer scientist and research scientist at Google, known for his early work in applying the theory of information scent to predict usability of websites.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

A corpus manager is a tool for multilingual corpus analysis, which allows effective searching in corpora.

Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspect a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume an engaged user who wants to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, and semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars, or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts, through specific characteristics, through user-provided sketches, or through content-based similarity search.
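
Shot transition detection, one of the analysis steps mentioned above, can be sketched very simply: flag a boundary wherever consecutive frame color histograms differ by more than a threshold. In this hedged Python example the "frames" are invented 4-bin histograms standing in for decoded video frames, and the threshold is arbitrary.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def shot_boundaries(frames, threshold=0.5):
    """Indices where the histogram jump suggests a hard cut."""
    return [i for i in range(1, len(frames))
            if histogram_distance(frames[i - 1], frames[i]) > threshold]

frames = [
    [0.7, 0.1, 0.1, 0.1],  # shot 1: dark scene
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],  # cut to shot 2: bright scene
    [0.1, 0.1, 0.3, 0.5],
]
print(shot_boundaries(frames))  # [2]
```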

Emily Menon Bender is an American linguist who is a professor at the University of Washington. She specializes in computational linguistics and natural language processing. She is also the director of the University of Washington's Computational Linguistics Laboratory. She has published several papers on the risks of large language models and on ethics in natural language processing.

<span class="mw-page-title-main">Wendy Mackay</span> Computer Scientist

Wendy Elizabeth Mackay is a Canadian researcher specializing in human-computer interaction. She has served in all of the roles on the SIGCHI committee, including Chair. She is a member of the CHI Academy and a recipient of a European Research Council Advanced Grant. She was a visiting professor at Stanford University between 2010 and 2012, and received the ACM SIGCHI Lifetime Service Award in 2014.

Argument technology is a sub-field of collective intelligence and artificial intelligence that focuses on applying computational techniques to the creation, identification, analysis, navigation, evaluation and visualisation of arguments and debates.

Batya Friedman is an American professor in the University of Washington Information School. She is also an adjunct professor in the Paul G. Allen School of Computer Science and Engineering and adjunct professor in the Department of Human-Centered Design and Engineering, where she directs the Value Sensitive Design Research Lab. She received her PhD in learning sciences from the University of California, Berkeley School of Education in 1988, and has an undergraduate degree from Berkeley in computer science and mathematics.

Preslav Nakov is a computer scientist who works on natural language processing. He is particularly known for his research on fake news detection, automatic detection of offensive language, and biomedical text mining. Nakov obtained a PhD in computer science from the University of California, Berkeley, under the supervision of Marti Hearst. He was the first person to receive the John Atanasov Presidential Award for achievements in the development of the information society, given by the President of Bulgaria.

References

  1. Hearst, M. (1992). "Direction-Based Text Interpretation as an Information Access Refinement". In Text-Based Intelligent Systems. Lawrence Erlbaum.
  2. Hearst, M. (1991). "Noun Homograph Disambiguation using Local Context in Large Text Corpora" (PDF). Proceedings of the 7th Annual Conference of the UW Centre for the New OED and Text Research: Using Corpora. Oxford. Retrieved February 15, 2013.
  3. Indurkhya, N.; Damerau, F. (2010). Handbook of Natural Language Processing. Chapman & Hall/CRC. p. 594.
  4. "Automatic Acquisition of Hyponyms from Large Text Corpora" (PDF). Proceedings of the Fourteenth International Conference on Computational Linguistics. Nantes, France. 1992. Retrieved February 15, 2013.
  5. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
  6. "Multi-Paragraph Segmentation of Expository Text" (PDF). Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, NM. June 1994. Retrieved February 15, 2013.
  7. "ACM Hypertext 2011 Keynotes". 22nd ACM Conference on Hypertext and Hypermedia. Association for Computing Machinery. 2011-06-06. Archived from the original on 2016-06-04. Retrieved 2013-05-08.
  8. Tate, Ryan (2013-01-15). "Facebook Announces New Search Engine". Wired. Wired.com. Retrieved 2013-05-08.
  9. Hearst, Marti A. (2011-11-01). "'Natural' Search User Interfaces". Communications of the ACM, Vol. 54, No. 11. Association for Computing Machinery. pp. 60–67. Retrieved 2013-05-08.
  10. Isaac, Mike (2012-12-14). "Twitter Takes Big Data to School". AllThingsD. Retrieved 2013-05-08.
  11. Keen, Andrew (2012-05-12). "Keen On… Big Data: Why UC Berkeley Might Have An Edge Over Stanford [TCTV]". TechCrunch.com. Retrieved 2013-05-08.
  12. Yee, Christopher (2012-11-13). "Five Questions with Marti Hearst, 'Big Data' pioneer". The Daily Californian. University of California, Berkeley. Retrieved 2013-05-08.
  13. Hearst, M. (1995). "TileBars: Visualization of Term Distribution Information in Full Text Information Access" (PDF). Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI). ACM SIGCHI Conference on Human Factors in Computing Systems. Denver, CO. Retrieved February 15, 2013.
  14. Hearst, M. (September 2000). "Next Generation Web Search: Setting Our Sites" (PDF). In Gravano, Luis (ed.). IEEE Data Engineering Bulletin, special issue on Next Generation Web Search. Retrieved February 15, 2013.
  15. Yee, K-P.; Swearingen, K.; Li, K.; Hearst, M. (2003). "Faceted Metadata Image Search and Browsing" (PDF). Proceedings of ACM CHI 2003. Retrieved February 15, 2013.
  16. Hearst, Martha Alice (1994). Context and structure in automated full-text information access (Ph.D. thesis). University of California, Berkeley. OCLC 33496523. ProQuest 304100421.
  17. "ACM Names Fellows for Computing Advances that Are Transforming Science and Society". Association for Computing Machinery. Archived from the original on 2014-07-22. Retrieved 2013-12-10.
  18. "Marti A. Hearst: Bio and CV". people.ischool.berkeley.edu. Retrieved 2021-09-08.
  19. "Marti A. Hearst: Bio and CV". people.ischool.berkeley.edu. Retrieved 8 September 2021.