Philip S. Yu

Last updated October 24, 2024

Philip S. Yu (born c. 1952) is an American computer scientist and professor of information technology at the University of Illinois at Chicago. He holds over 300 patents, and is known for his work in the field of data mining.

Biography

Yu received his B.S. in electrical engineering from National Taiwan University and his M.S. and Ph.D., also in electrical engineering, from Stanford University in 1978. He received an M.B.A. from New York University in 1982.

He started his career in private enterprise at IBM's Thomas J. Watson Research Center, where he eventually became manager of the Software Tools and Techniques group. He is currently Distinguished Professor and Wexler Chair in Information Technology at the Department of Computer Science of the University of Illinois at Chicago.

Yu holds over 300 U.S. patents, is an ACM and IEEE Fellow, is editor-in-chief of ACM Transactions on Knowledge Discovery from Data, has chaired numerous conferences, and has received several awards, including from IBM and the IEEE.^[1] In 2022, he and his coauthors Yizhou Sun, Jiawei Han, Xifeng Yan, and Tianyi Wu received the Very Large Data Bases Endowment Inc. (VLDB) 2022 Test of Time Award for their 2011 research paper "PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks".^[2]

His research interests are in the fields of "data mining (especially on graph/network mining), social network, privacy preserving data publishing, data stream, database systems, and Internet applications and technologies."^[3] Yu is an ISI Highly Cited researcher. According to Google Scholar, Yu's H-index is among the ten highest in computer science.^[4]

Selected works

Yu has authored or co-authored several books and over 650 academic articles,^[5] including:

Zhang, Jiawei, and Philip S. Yu. Broad Learning Through Fusions: An Application on Social Networks. Springer, 2019.
Park, Jong Soo, Ming-Syan Chen, and Philip S. Yu. "An effective hash-based algorithm for mining association rules." Vol. 24. No. 2. ACM, 1995.
Chen, Ming-Syan, Jiawei Han, and Philip S. Yu. "Data mining: an overview from a database perspective." IEEE Transactions on Knowledge and Data Engineering 8.6 (1996): 866–883.
Aggarwal, Charu C., et al. "Fast algorithms for projected clustering." ACM SIGMOD Record. Vol. 28. No. 2. ACM, 1999.
Aggarwal, Charu C., et al. "A framework for clustering evolving data streams." Proceedings of the 29th international conference on Very large data bases-Volume 29. VLDB Endowment, 2003.
Wang, Haixun, et al. "Mining concept-drifting data streams using ensemble classifiers." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
Ross Quinlan, Qiang Yang, Zhou Zhihua, David Hand, et al. "Top 10 algorithms in data mining." Knowledge and Information Systems 14.1 (2008): 1–37.

Related Research Articles

Scott J. Shenker is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science Institute in Berkeley, California.

SIGKDD, representing the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining, hosts an influential annual conference.

Process mining is a family of techniques used to analyze event data in order to understand and improve operational processes. Part of the fields of data science and process management, process mining is generally built on logs that contain case id, a unique identifier for a particular process instance; an activity, a description of the event that is occurring; a timestamp; and sometimes other information such as resources, costs, and so on.

Jiawei Han is a Chinese-American computer scientist and writer. He currently holds the position of Michael Aiken Chair Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research focuses on data mining, text mining, database systems, information networks, data mining from spatiotemporal data, Web data, and social/information network data.

In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations that deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such examples may arouse suspicion of having been generated by a different mechanism, or appear inconsistent with the remainder of that set of data.

In computer science, data stream clustering is defined as the clustering of data that arrive continuously such as telephone records, multimedia data, financial transactions etc. Data stream clustering is usually studied as a streaming algorithm and the objective is, given a sequence of points, to construct a good clustering of the stream, using a small amount of memory and time.

Tomasz Imieliński is a Polish-American computer scientist, most known in the areas of data mining, mobile computing, data extraction, and search engine technology. He is currently a professor of computer science at Rutgers University in New Jersey, United States.

l-diversity, also written as ℓ-diversity, is a form of group-based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade-off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The l-diversity model is an extension of the k-anonymity model, which reduces the granularity of data representation using techniques including generalization and suppression such that any given record maps onto at least k-1 other records in the data. The l-diversity model addresses a weakness of the k-anonymity model: protecting identities to the level of k individuals is not equivalent to protecting the corresponding sensitive values that were generalized or suppressed, especially when the sensitive values within a group exhibit homogeneity. To handle this, the l-diversity model promotes intra-group diversity of sensitive values in the anonymization mechanism.
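
The difference between the two models can be illustrated with a minimal Python sketch. It assumes the simplest variant, usually called distinct l-diversity, in which each group of records sharing the same quasi-identifier values must contain at least l different sensitive values. The function names, the field names (`age`, `zip`, `diagnosis`), and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_records(records, quasi_identifiers):
    """Group records by their (already generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every group contains at least k records."""
    groups = group_records(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group contains at least l distinct sensitive values."""
    groups = group_records(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already generalized toy table.
records = [
    {"age": "20-29", "zip": "606**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "606**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "607**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "607**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
two_diverse = is_distinct_l_diverse(records, ["age", "zip"], "diagnosis", 2)
```

In this toy table every group has two records, so the table is 2-anonymous, but the first group holds a single diagnosis value, so it is not 2-diverse: anyone known to be in that group is revealed to have that diagnosis. This is the homogeneity weakness described above.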

Bing Liu is a Chinese-American professor of computer science who specializes in data mining, machine learning, and natural language processing. In 2002, he became a scholar at the University of Illinois at Chicago. He holds a PhD from the University of Edinburgh (1988). His PhD advisors were Austin Tate and Kenneth Williamson Currie, and his PhD thesis was titled Reinforcement Planning for Resource Allocation and Constraint Satisfaction.

Shojiro Nishio is a Japanese information scientist and technology scholar and the 18th president of Osaka University. Having co-authored or co-edited more than 55 books and more than 650 refereed journal or conference papers as well as serving on editorial boards of major information sciences journals, Nishio is considered one of the most prominent and influential researchers on database systems and networks.

Huan Liu is a Shanghai-born Chinese computer scientist.

Latifur Khan joined the University of Texas at Dallas in 2000, where he has been conducting research and teaching as a Professor in the Department of Computer Science.

An associative classifier (AC) is a kind of supervised learning model that uses association rules to assign a target value. The term associative classification was coined by Bing Liu et al., who defined a model made of rules "whose right-hand side are restricted to the classification class attribute".

Wei Wang is a Chinese-born American computer scientist. She is the Leonard Kleinrock Chair Professor in Computer Science and Computational Medicine at the University of California, Los Angeles and the director of the Scalable Analytics Institute (ScAi). Her research specializes in big data analytics and modeling, database systems, natural language processing, bioinformatics and computational biology, and computational medicine.

In network theory, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a social network, predicting co-authorship links in a citation network, and predicting interactions between genes and proteins in a biological network. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time $t$, the goal is to predict the links at time $t+1$. Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security-related applications.

S. ("Muthu") Muthukrishnan is a computer scientist of Indian origin, known for his work in streaming algorithms, auction design, and pattern matching. He is vice president of sponsored products, Amazon Advertising.

Yixin Chen is a computer scientist, academic, and author. He is a professor of computer science and engineering at Washington University in St. Louis.

References

[1] "On Mining Big Data". Illinois Institute of Technology. Retrieved October 4, 2023.
[2] "Philip S. Yu receives Test of Time Award". University of Illinois at Chicago. Retrieved October 4, 2023.
[3] Curriculum Philip S. Yu at cs.uic.edu. Accessed September 2, 2013.
[4] See H-index for computer science. Google Scholar's H-index metric includes self-citations.
[5] Philip S. Yu Google Scholar profile.

External links

Philip S. Yu's homepage at the University of Illinois at Chicago

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

