ChengXiang Zhai | |
---|---|
Alma mater | Nanjing University Carnegie Mellon University |
Scientific career | |
Fields | Information Retrieval Text Mining Natural Language Processing Machine Learning Bioinformatics |
Institutions | University of Illinois at Urbana-Champaign |
Thesis | Risk Minimization and Language Modeling in Text Retrieval |
Doctoral advisor | John D. Lafferty |
Website | czhai |
ChengXiang Zhai is a computer scientist. He is a Donald Biggar Willett Professor in Engineering in the Department of Computer Science at the University of Illinois at Urbana-Champaign. [1]
Zhai received the BS (1984), MS (1987, under Guoliang Zheng), and PhD (1990, under Jiafu Xu) in Computer Science from Nanjing University. He spent 1990 to 1993 working at Nanjing University's State Key Laboratory for Novel Software Technology. In 1993, he left for America to pursue a second PhD, this time at Carnegie Mellon University (CMU) with David A. Evans. [2] Evans then left to spend more time with the company ClariTech. Zhai obtained from CMU a MS (1997) in computational linguistics and then started working with John Lafferty. He finally received from CMU a PhD in Language and Information Technologies in 2002. [3]
Since then, he has been an Assistant Professor (2002–2008), [4] Associate Professor (2008–2013), [5] Professor (2013–2018), and Donald Biggar Willett Professor (2018–) at the UIUC Department of Computer Science. [6] He also holds joint appointments with the Carl R. Woese Institute for Genomic Biology, Department of Statistics, and School of Information Sciences at UIUC. [3]
Zhai's son Alex has earned three medals at the International Mathematical Olympiad. [22] [23]
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.
Gerard A. "Gerry" Salton was a professor of Computer Science at Cornell University. Salton was perhaps the leading computer scientist working in the field of information retrieval during his time, and "the father of Information Retrieval". His group at Cornell developed the SMART Information Retrieval System, which he initiated when he was at Harvard. It was the very first system to use the now popular vector space model for information retrieval.
A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.
The Gerard Salton Award is presented by the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR) every three years to an individual who has made "significant, sustained and continuing contributions to research in information retrieval". SIGIR also co-sponsors the Vannevar Bush Award, for the best paper at the Joint Conference on Digital Libraries.
Susan Dumais is an American computer scientist who is a leader in the field of information retrieval, and has been a significant contributor to Microsoft's search technologies. According to Mary Jane Irwin, who heads the Athena Lecture awards committee, “Her sustained contributions have shaped the thinking and direction of human-computer interaction and information retrieval."
Robert William "Bob" Harper, Jr. is a computer science professor at Carnegie Mellon University who works in programming language research. Prior to his position at Carnegie Mellon, Harper was a research fellow at the University of Edinburgh.
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. Query expansion involves techniques such as:
Plagiarism detection or content similarity detection is the process of locating instances of plagiarism or copyright infringement within a work or document. The widespread use of computers and the advent of the Internet have made it easier to plagiarize the work of others.
SIGIR is the Association for Computing Machinery's Special Interest Group on Information Retrieval. The scope of the group's specialty is the theory and application of computers to the acquisition, organization, storage, retrieval and distribution of information; emphasis is placed on working with non-numeric information, ranging from natural language to highly structured data bases.
XML retrieval, or XML information retrieval, is the content-based retrieval of documents structured with XML. As such it is used for computing relevance of XML documents.
W. Bruce Croft is a distinguished professor of computer science at the University of Massachusetts Amherst whose work focuses on information retrieval. He is the founder of the Center for Intelligent Information Retrieval and served as the editor-in-chief of ACM Transactions on Information Systems from 1995 to 2002. He was also a member of the National Research Council Computer Science and Telecommunications Board from 2000 to 2003. Since 2015, he is the Dean of the College of Information and Computer Sciences at the University of Massachusetts Amherst. He was Chair of the UMass Amherst Computer Science Department from 2001 to 2007.
Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.
David Ron Karger is an American computer scientist who is professor and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology.
Monika Henzinger is a German computer scientist, and is a former director of research at Google. She is currently a professor at the Institute of Science and Technology Austria. Her expertise is mainly on algorithms with a focus on data structures, algorithmic game theory, information retrieval, search algorithms and Web data mining. She is married to Thomas Henzinger and has three children.
John D. Lafferty is an American scientist, Professor at Yale University and leading researcher in machine learning. He is best known for proposing the Conditional Random Fields with Andrew McCallum and Fernando C.N. Pereira.
Shumin Zhai is a Chinese-born American Canadian Human–computer interaction (HCI) research scientist and inventor. He is known for his research specifically on input devices and interaction methods, swipe-gesture-based touchscreen keyboards, eye-tracking interfaces, and models of human performance in human-computer interaction. His studies have contributed to both foundational models and understandings of HCI and practical user interface designs and flagship products. He previously worked at IBM where he invented the ShapeWriter text entry method for smartphones, which is a predecessor to the modern Swype keyboard. Dr. Zhai's publications have won the ACM UIST Lasting Impact Award and the IEEE Computer Society Best Paper Award, among others, and he is most known for his research specifically on input devices and interaction methods, swipe-gesture-based touchscreen keyboards, eye-tracking interfaces, and models of human performance in human-computer interaction. Dr. Zhai is currently a Principal Scientist at Google where he leads and directs research, design, and development of human-device input methods and haptics systems.
Wei Wang is a Chinese-born American computer scientist. She is the Leonard Kleinrock Chair Professor in Computer Science and Computational Medicine at University of California, Los Angeles and the director of the Scalable Analytics Institute (ScAi). Her research specializes in big data analytics and modeling, database systems, natural language processing, bioinformatics and computational biology, and computational medicine.
Hsiao-Wuen Hon is a Taiwanese-US researcher in speech technology, and coauthor of the book Spoken Language Processing. He is Corporate Vice President of Microsoft and Chairman of Microsoft's Asia-Pacific R&D Group.
Learned sparse retrieval or sparse neural search is an approach to text search which uses a sparse vector representation of queries and documents. It borrows techniques both from lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE and its successor SPLADE v2. Others include DeepCT, uniCOIL, EPIC, DeepImpact, TILDE and TILDEv2, Sparta, SPLADE-max, and DistilSPLADE-max.