Geoff Webb

Geoffrey I. Webb
Citizenship: Australia
Awards: Inaugural Eureka Prize for Excellence in Data Science, 2017
IEEE Fellow
Pacific-Asia Conference on Knowledge Discovery and Data Mining Distinguished Research Contributions Award, 2022
Australian Computer Society ICT Researcher of the Year Award, 2016
IEEE International Conference on Data Mining Outstanding Service Award, 2013
Australian Research Council Outstanding Researcher Award, 2014
Scientific career
Fields: Data Science
Computer Science
Artificial Intelligence
Machine Learning
Computational Biology
Institutions: Monash University, Department of Data Science and Artificial Intelligence
Notable students: Ying Yang
Jiangning Song
Chang Wei Tan

Geoffrey I. Webb (also known as Geoff Webb) is Professor of Computer Science at Monash University, founder and director of the data mining software development and consultancy company G. I. Webb and Associates, [1] and former editor-in-chief of the journal Data Mining and Knowledge Discovery. [2] Before joining Monash University, he was on the faculty at Griffith University from 1986 to 1988 and then at Deakin University from 1988 to 2002.

Webb has published more than 280 scientific papers in the fields of machine learning, data science, data mining, data analytics, time series analytics, big data, bioinformatics and user modeling. [3] He is an editor of the Encyclopedia of Machine Learning. [4]

Webb created the Averaged One-Dependence Estimators (AODE) machine learning algorithm [5] and its generalization, Averaged N-Dependence Estimators (ANDE), [6] and has worked extensively on statistically sound association rule learning. [7] [8] [9] [10] His early work included advocating the use of machine learning to create black box user models; [11] interactive machine learning; [12] [13] decision tree grafting; [14] and one of the first approaches to association rule learning using minimum support and confidence to find the rules for the first associative classifier, FBM. [15] He has developed multiple novel approaches to time series classification. [16] [17] [18] He has also worked on diverse problems including concept drift, [19] scalable learning of graphical models, [20] human-in-the-loop machine learning, [21] and computational protein biology. [22]

Webb's awards include the inaugural Eureka Prize for Excellence in Data Science, 2017, [23] IEEE Fellow, [24] the Pacific-Asia Conference on Knowledge Discovery and Data Mining Distinguished Research Contributions Award, 2022, [25] the Australian Computer Society ICT Researcher of the Year Award, 2016, [26] the IEEE International Conference on Data Mining Outstanding Service Award, 2013, [27] an Australian Research Council Outstanding Researcher Award, 2014, [28] and multiple Australian Research Council Discovery Grants. [29] He has twice been recognised by The Australian Research Magazine as Australia's leading Bioinformatics and Computational Biology researcher. [30] [31]

Webb is a foundation member of the editorial advisory board of the journal Statistical Analysis and Data Mining. [32] He has served on the editorial boards of the journals Machine Learning, ACM Transactions on Knowledge Discovery from Data, User Modeling and User-Adapted Interaction, and Knowledge and Information Systems.

Webb was elected to the ACM Special Interest Group on Knowledge Discovery and Data Mining Executive Committee in 2017. [33]

References

  1. "G. I. Webb and Associates"
  2. "Data Mining and Knowledge Discovery Journal" Retrieved on 2013-10-20.
  3. Geoff Webb publications indexed by Google Scholar
  4. "Encyclopedia of Machine Learning"
  5. Webb, Geoffrey; J. Boughton; Z. Wang (2005). "Not So Naive Bayes: Aggregating One-Dependence Estimators". Machine Learning. 58 (1): 5–24. CiteSeerX 10.1.1.3.7847. doi:10.1007/s10994-005-4258-6. S2CID 13148847.
  6. Webb, Geoffrey; J. Boughton; F. Zheng; K.M. Ting; H. Salem (2012). "Learning by extrapolation from marginal to full-multivariate probability distributions: Decreasingly naive Bayesian classification". Machine Learning. 86 (2): 233–272. doi:10.1007/s10994-011-5263-6.
  7. Webb, Geoffrey (2007). "Discovering Significant Patterns". Machine Learning. 68 (1): 1–33. doi:10.1007/s10994-007-5006-x.
  8. Webb, Geoffrey (2008). "Layered Critical Values: A Powerful Direct-Adjustment Approach to Discovering Significant Patterns". Machine Learning. 71 (2–3): 307–323. doi:10.1007/s10994-008-5046-x.
  9. Webb, Geoffrey (2010). "Self-Sufficient Itemsets: An Approach to Screening Potentially Interesting Associations Between Items". Transactions on Knowledge Discovery from Data. 4: 3:1–3:20. doi:10.1145/1644873.1644876. S2CID 774593.
  10. Webb, Geoffrey (2011). "Filtered-top-k Association Discovery". WIREs Data Mining and Knowledge Discovery. 1 (3): 183–192. CiteSeerX 10.1.1.228.2541. doi:10.1002/widm.28. S2CID 14839879.
  11. Webb, Geoffrey; M. Kuzmycz (1996). "Feature based modelling: a methodology for producing coherent, consistent, dynamically changing models of agents' competencies". User Modeling and User-Adapted Interaction. 5 (2): 117–150. doi:10.1007/BF01099758. S2CID 12003265.
  12. Webb, Geoffrey (1996). "Integrating Machine Learning With Knowledge Acquisition Through Direct Interaction With Domain Experts". Knowledge-Based Systems. 9 (4): 253–266. CiteSeerX 10.1.1.228.3037. doi:10.1016/0950-7051(96)01033-7.
  13. Webb, Geoffrey; J. Wells; Z. Zheng (1999). "An Experimental Evaluation of Integrating Machine Learning with Knowledge Acquisition". Machine Learning. 35 (1): 5–14. doi:10.1023/A:1007504102006.
  14. Webb, Geoffrey (1996). "Further Experimental Evidence Against The Utility Of Occam's Razor". Journal of Artificial Intelligence Research. 4: 397–417. doi:10.1613/jair.228. S2CID 6088084.
  15. Webb, Geoffrey (1989). "A Machine Learning Approach to Student Modelling" (PDF). Proceedings of the Third Australian Joint Conference on Artificial Intelligence (AI 89). pp. 195–205.
  16. Dempster, Angus; F. Petitjean; G. Webb (2020). "ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels". Data Mining and Knowledge Discovery. 34 (5): 1454–1495. arXiv:1910.13051. doi:10.1007/s10618-020-00701-z. S2CID 204949593.
  17. Fawaz, Hassan; B. Lucas; G. Forestier; C. Pelletier; D. Schmidt; J. Weber; G. Webb; L. Idoumghar; P. Muller; F. Petitjean (2020). "InceptionTime: Finding AlexNet for Time Series Classification". Data Mining and Knowledge Discovery. 34 (5): 1936–1962. arXiv:1909.04939. doi:10.1007/s10618-020-00710-y. S2CID 202572652.
  18. Shifaz, Ahmed; C. Pelletier; F. Petitjean; G. Webb (2020). "TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification". Data Mining and Knowledge Discovery. 34 (3): 742–775. arXiv:1906.10329. doi:10.1007/s10618-020-00679-8. S2CID 195584256.
  19. "Concept Drift: Learning From Non-Stationary Distributions"
  20. "Scalable Graphical Modeling"
  21. "Interactive machine learning and data analytics"
  22. "Computational Biology"
  23. "Eureka Prize Winners 2017"
  24. "IEEE Fellows 2015"
  25. "PAKDD 2022 Awards"
  26. ACS Digital Disruptors Awards Winners 2016
  27. "IEEE Data Mining Awards". Archived from the original on 18 August 2017. Retrieved 20 October 2013.
  28. Discovery Projects Funding Outcomes for Projects Commencing in 2014
  29. "Discovery Projects Funding Outcomes". Archived from the original on 23 October 2013. Retrieved 20 October 2013.
  30. "Deep Dive Into Research". No. 2021. The Australian. 10 November 2021. Retrieved 10 November 2022.
  31. "Our top researchers in Field Leaders Engineering & Computer Science". No. 2023. The Australian. 9 November 2022. Retrieved 10 November 2022.
  32. Statistical Analysis and Data Mining Editorial Board
  33. About SIGKDD