Gregory I. Piatetsky-Shapiro (born 7 April 1958) is a data scientist, co-founder of the KDD conferences, and co-founder and past chair of the Association for Computing Machinery SIGKDD group for Knowledge Discovery, Data Mining and Data Science. [1] He is the founder and president of KDnuggets, [2] a discussion and learning website for Business Analytics, Data Mining and Data Science.
A Jewish refugee from the Soviet Union, Gregory Piatetsky was born in Moscow, Russia, to Inna Mogilevskaya and mathematician Ilya Piatetski-Shapiro. He was admitted in 1970 to Physics-Mathematics School no. 2, a leading math school in Moscow. [3] [4]
In March 1974, Piatetsky emigrated to Israel with his family, studying mathematics and computer science at Tel Aviv University and, for one semester, at Technion. [5] He subsequently earned MS (1979) and Ph.D. (1984) degrees from NYU Courant Institute. [6]
In 1984, his first paper was published in SIGMOD, proving that secondary index selection is NP-complete by reducing it to a set cover problem. [7] In his dissertation, he proved that the greedy method for set cover has a lower bound of 1 − 1/e ≈ 63% of the optimal. [8]
He joined GTE Laboratories, where he worked on intelligent interfaces relating to databases. In 1989, he proposed a new project at GTE called "Knowledge Discovery in Databases". The project created advanced prototypes, including KEFIR (Key Findings Reporter), [9] a system for analysis and summarization of key changes in large databases, which was a forerunner of systems like Google Analytics Intelligence. A KEFIR prototype was applied to GTE health care data and received GTE's highest technical award. [10]
In 1997, he left GTE to join Knowledge Stream Partners (KSP), where he was Director and later Vice President and Chief Scientist. [11] In April 2000, KSP was acquired by Xchange, Inc., [12] where Piatetsky served as VP and Chief Scientist. [11]
Piatetsky left Xchange in May 2001 to become a self-employed consultant and focus on KDnuggets. [13]
In 1989, Piatetsky organized the first workshop on Knowledge Discovery in Data (KDD-89), held at IJCAI-1989 in Detroit, MI. [1] This workshop had over 60 attendees, including researchers Ross Quinlan and Jaime Carbonell.
Piatetsky organized the next two KDD workshops, in 1991 and 1993. [1] With Usama Fayyad and Ramasamy (Sam) Uthurusamy, he expanded the workshops into an annual international conference on Data Mining and was the General Chair of the KDD-98 conference. [14] He served as the chair of the KDD Steering Committee until 1998, when the SIGKDD group was formed as part of ACM to run the annual KDD conference and help promote research in Knowledge Discovery and Data Mining. He served as Director of SIGKDD from 2001 to 2005 and as SIGKDD Chair from 2005 to 2009. [15]
In 1997, Piatetsky and Ismail Parsa initiated the KDD Cup competition, which was the world's first open data mining contest. [16]
The annual ACM SIGKDD conference is the leading research conference on Knowledge Discovery and Data Mining, according to Microsoft Academic search [17] and Google Scholar. [18] The 21st ACM SIGKDD conference was held in Sydney, Australia in August 2015.
In 1993, Piatetsky started Knowledge Discovery Nuggets (KDnuggets) as a newsletter to connect researchers who attended the KDD-93 workshop. With the emergence of the Internet and Mosaic, he and Chris Matheus eventually created the website Knowledge Discovery Mine, [19] hosted at GTE Labs. The newsletter served as an unofficial publication of KDD workshops. When Piatetsky left GTE Labs, he created the KDnuggets website, [20] with the mission of covering the field with short, concise "nuggets". The resource started as a directory for the subjects of data mining and data science, including software, jobs, academic positions, CFP (calls for papers), companies, courses, datasets, education, meetings, publications and webcasts.
KDnuggets' main focus is to cover the fields of Business Analytics, Data Mining, and Data Science, including interviews with key leaders. It offers a free data mining course for advanced undergraduates or first-year graduate students. [21]
@KDnuggets Twitter was
In February 2015, Piatetsky and Data ScienceTech Institute announced a partnership and he became an Honorary Member of its Scientific Advisory Board. [22]
In 1991, Piatetsky and William (Bud) Frawley edited their first book, Knowledge Discovery in Databases. In 1996, Piatetsky, Usama Fayyad, Padhraic Smyth, and Ramasamy Uthurusamy edited a follow-up, Advances in Knowledge Discovery and Data Mining. [23]
Piatetsky also helped launch and co-edit the Data Mining and Knowledge Discovery journal. He authored 9 edited books and collections and over 60 technical papers, articles and book chapters, mostly focusing on data mining and knowledge discovery.
Gregory Piatetsky-Shapiro received the first ACM SIGKDD Service Award for starting the KDD conferences and for contributions to the KDD community, including the KDnuggets newsletter. Dr. Piatetsky-Shapiro is the founder of the Knowledge Discovery in Database conference series (KDD, now the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).