This article is an autobiography or has been extensively edited by the subject or by someone connected to the subject (October 2024).
A major contributor to this article appears to have a close connection with its subject (August 2024).
Geoffrey I. Webb | |
---|---|
Citizenship | Australia |
Scientific career | |
Fields | Data Science, Computer Science, Artificial Intelligence, Machine Learning, Computational Biology |
Institutions | Monash University (Department of Data Science and Artificial Intelligence) |
Geoffrey I. Webb (also known as Geoff Webb) is a professor in the Department of Data Science and Artificial Intelligence at Monash University, founder and director of G. I. Webb and Associates, a data mining software development and consultancy company, [1] and former editor-in-chief of the journal Data Mining and Knowledge Discovery. [2]
He is an editor of the Encyclopedia of Machine Learning. [3]
Webb is a foundation member of the editorial advisory board of the journal Statistical Analysis and Data Mining. [4]
In 2017, Webb was elected to the Executive Committee of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). [5]
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by identifying patterns and trends through means such as statistical pattern learning. According to Hotho et al. (2005), three perspectives on text mining can be distinguished: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
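The "structure the text, then derive patterns" pipeline can be illustrated with TF-IDF term weighting, one common way to turn raw documents into weighted term vectors. This is a minimal sketch chosen for illustration, not the method of Hotho et al.; the regex tokenizer and the `tf * log(N / df)` weighting are simplifying assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and keep runs of letters: a crude way to structure raw text."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document, weighting by tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


docs = ["The cat sat on the mat.", "The dog sat on the log.", "Cats and dogs."]
w = tf_idf(docs)
print(max(w[0], key=w[0].get))  # 'cat': rare terms outweigh the frequent 'the'
```

Terms that appear in many documents (such as "the") are down-weighted, so the highest-weighted terms are the ones that distinguish a document from the rest of the collection.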
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in databases using measures of interestingness, such as support and confidence. Given a collection of transactions, each containing a variety of items, association rules describe how certain items are connected, for example which items tend to occur together.
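Support and confidence, two standard interestingness measures, can be computed directly from a list of transactions. The sketch below brute-forces single-item rules `{a} -> {b}` over a tiny, made-up basket dataset; it is an illustration of the measures only, not an efficient algorithm such as Apriori, and the function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(X ∪ Y) / support(X)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pairwise_rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules lhs -> rhs that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue  # prune rules whose itemset is too rare
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
print(pairwise_rules(transactions, min_support=0.5, min_confidence=0.8))
# [('butter', 'bread', 0.5, 1.0)]
```

Every basket containing butter also contains bread, so `butter -> bread` has confidence 1.0, while the reverse rule only reaches 2/3.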
Bernhard Schölkopf is a German computer scientist known for his work in machine learning, especially on kernel methods and causality. He is a director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he heads the Department of Empirical Inference. He is also an affiliated professor at ETH Zürich, honorary professor at the University of Tübingen and Technische Universität Berlin, and chairman of the European Laboratory for Learning and Intelligent Systems (ELLIS).
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities.
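The "read once, limited storage" constraint is illustrated by the Misra-Gries frequent-items summary, a classic one-pass algorithm swapped in here as an example (it is not mentioned in the text above). With `k - 1` counters, any item occurring more than `n / k` times in a stream of length `n` is guaranteed to survive, and its stored count underestimates the true count by at most `n / k`.

```python
def misra_gries(stream, k):
    """One-pass heavy-hitters summary using at most k - 1 counters (k >= 2)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "a"]
print(misra_gries(stream, k=3))  # {'a': 4}; the true count of 'a' is 6
```

Memory stays bounded by `k - 1` entries no matter how long the stream grows, which is the trade-off stream mining accepts in exchange for approximate counts.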
In predictive analytics, data science, machine learning and related fields, concept drift, or simply drift, is a change in the data over time that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in fields that involve dynamically changing data and data models.
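One simple way to detect drift is to monitor a model's error rate and flag a change when recent errors rise well above a baseline. The class below is a hypothetical minimal sketch, not a published detector: the fixed reference window, the sliding recent window, and the `threshold` value are illustrative assumptions.

```python
from collections import deque


class WindowDriftDetector:
    """Flag drift when the recent error rate exceeds the baseline rate by `threshold`.

    Hypothetical sketch: window size and threshold are illustrative choices.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = []                  # first `window` outcomes form the baseline
        self.recent = deque(maxlen=window)   # sliding window of the latest outcomes

    def update(self, error):
        """Record one prediction outcome (1 = wrong, 0 = right); return True on drift."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - ref_rate > self.threshold


detector = WindowDriftDetector(window=10, threshold=0.25)
errors = [0] * 20 + [1] * 10   # the model starts failing after step 20
flags = [detector.update(e) for e in errors]
print(flags.index(True))  # 22
```

In practice a detection would trigger adaptation, such as retraining the model on recent data and resetting the baseline.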
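K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques. Rather than finding all patterns that occur sufficiently frequently in the data, it finds the k patterns that optimize a user-specified measure of interest.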
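The contrast with frequent pattern discovery can be sketched as "rank by an interest measure, keep the top k" with no minimum-support threshold. The example below assumes leverage, `support(X ∪ Y) - support(X) * support(Y)`, as the measure and restricts the search to single-item rules; both choices are illustrative, and real k-optimal systems use far more efficient search.

```python
import heapq
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def leverage(lhs, rhs, transactions):
    """Observed co-occurrence minus the co-occurrence expected under independence."""
    return (support({lhs, rhs}, transactions)
            - support({lhs}, transactions) * support({rhs}, transactions))


def top_k_rules(transactions, k):
    """Return the k single-item rules lhs -> rhs with the highest leverage.

    Leverage is symmetric, so a -> b and b -> a always tie.
    """
    items = sorted(set().union(*transactions))
    scored = ((leverage(a, b, transactions), a, b) for a, b in permutations(items, 2))
    return heapq.nlargest(k, scored)


transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
print(top_k_rules(transactions, k=2))
# [(0.125, 'butter', 'bread'), (0.125, 'bread', 'butter')]
```

The user controls the output size through `k` and the choice of measure, rather than by tuning a support threshold.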
In science, randomized experiments are the experiments that allow the greatest reliability and validity of statistical estimates of treatment effects. Randomization-based inference is especially important in experimental design and in survey sampling.
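Randomization-based inference can be illustrated with an exact randomization (permutation) test: under the sharp null hypothesis of no treatment effect, every way of re-assigning units to treatment is equally likely, so the p-value is the share of assignments whose difference in means is at least as extreme as the observed one. This is a minimal sketch for small samples only, since it enumerates every assignment; the data are made up.

```python
from itertools import combinations
from statistics import mean


def randomization_test(treated, control):
    """Exact two-sided p-value for the difference in means under re-randomization."""
    pooled = treated + control
    n_t = len(treated)
    observed = abs(mean(treated) - mean(control))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point ties with the observed value count.
        if abs(mean(t) - mean(c)) >= observed - 1e-12:
            count += 1
    return count / total


print(randomization_test([5, 6, 7], [1, 2, 3]))  # 0.1
```

Only 2 of the 20 possible assignments (the observed one and its mirror image) produce a difference as large as 4, giving p = 0.1.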
In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.
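A simple statistical way to flag observations that "deviate significantly from the majority of the data" is the modified z-score based on the median absolute deviation (MAD), which is robust to the outliers it is trying to find. The sketch uses the commonly cited constant 0.6745 and cut-off 3.5; treating a zero MAD as "no outliers" is a simplifying assumption.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: the modified z-score is undefined
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(mad_outliers(readings))  # [7]
```

Because the median and MAD barely move when the value 50 is added, the outlier stands out clearly; a mean-and-standard-deviation rule would be pulled toward it.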
Data Mining and Knowledge Discovery is a bimonthly peer-reviewed scientific journal focusing on data mining, published by Springer Science+Business Media. It was established in 1996 and first published in 1997 by Kluwer Academic Publishers, with Usama Fayyad as founding editor-in-chief. The first editorial summarizes why the journal was started.
Fraud is a significant problem for governments and businesses, and specialized analysis techniques are required to discover it. These techniques include knowledge discovery in databases (KDD), data mining, machine learning and statistics, and they have been applied successfully to many kinds of electronic fraud.
In network theory, link analysis is a data-analysis technique used to evaluate relationships between nodes. Relationships may be identified among various types of nodes, including organizations, people and transactions. Link analysis has been used for investigation of criminal activity, computer security analysis, search engine optimization, market research, medical research, and art.
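PageRank, a well-known link-analysis algorithm from search engines, is swapped in here as one concrete example of evaluating nodes by their links. The sketch uses plain power iteration with the conventional damping factor 0.85 and spreads the rank of nodes without outgoing links evenly over all nodes; it assumes every link target also appears as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over an adjacency dict {node: [linked nodes]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            targets = graph[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: distribute its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
print(max(scores, key=scores.get))  # 'c', the only node with two incoming links
```

Each iteration conserves the total rank, so the scores always sum to 1 and can be read as the long-run share of time a random surfer spends on each node.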
Foster Provost is an American computer scientist, information systems researcher, and Professor of Data Science, Professor of Information Systems and Ira Rennert Professor of Entrepreneurship at New York University's Stern School of Business. He is also the Director for the Data Science and AI Initiative at Stern's Fubon Center for Technology, Business and Innovation. Professor Provost has a Bachelor of Science from Duquesne University in physics and mathematics and a Master of Science and Ph.D. in computer science from the University of Pittsburgh.
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.
Usama M. Fayyad is an American-Jordanian data scientist and a co-founder of the KDD conferences and of ACM SIGKDD, the association for Knowledge Discovery and Data Mining. He is a speaker on business analytics, data mining, data science, and big data. He was formerly the chief data officer at Barclays Bank.
Ryszard Stanisław Michalski was a Polish-American computer scientist. Michalski was Professor at George Mason University and a pioneer in the field of machine learning.
Gregory I. Piatetsky-Shapiro is a data scientist and the co-founder of the KDD conferences, and co-founder and past chair of the Association for Computing Machinery SIGKDD group for Knowledge Discovery, Data Mining and Data Science. He is the founder and president of KDnuggets, a discussion and learning website for Business Analytics, Data Mining and Data Science.
Sushmita Mitra is an Indian computer scientist. She is currently a Full Professor (HAG) and a former head of the Machine Intelligence Unit at the Indian Statistical Institute, Kolkata. Her research interests include data science, machine learning, bioinformatics, soft computing and medical imaging. She was named a Fellow of the IEEE for her neuro-fuzzy and hybrid approaches in pattern recognition.
Cynthia Diane Rudin is an American computer scientist and statistician specializing in machine learning and known for her work in interpretable machine learning. She is the director of the Interpretable Machine Learning Lab at Duke University, where she is a professor of computer science, electrical and computer engineering, statistical science, and biostatistics and bioinformatics. In 2022, she won the Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from the Association for the Advancement of Artificial Intelligence (AAAI) for her work on the importance of transparency for AI systems in high-risk domains.
Michael R. Berthold is a German computer scientist, entrepreneur, academic and author. He held the chair for bioinformatics and information mining at Konstanz University, and is an honorary professor at Óbuda University. He is also the co-founder of KNIME, and has served as president and CEO of KNIME AG since 2017.