Domain driven data mining is a data mining methodology for discovering actionable knowledge and delivering actionable insights from complex data and behaviors in a complex environment. It studies the corresponding foundations, frameworks, algorithms, models, architectures, and evaluation systems for actionable knowledge discovery. [1] [2]
Data-driven pattern mining and knowledge discovery in databases [3] face the challenge that the discovered outputs are often not actionable. In the era of big data, how to effectively discover actionable insights from complex data and environments is critical. A significant paradigm shift is the evolution from data-driven pattern mining to domain-driven actionable knowledge discovery. [4] [5] [6] Domain driven data mining aims to enable the discovery and delivery of actionable knowledge and actionable insights.
Domain driven data mining has attracted significant attention from both academia and industry. A workshop series on domain driven data mining was held with the IEEE International Conference on Data Mining from 2007 to 2014, and a special issue was published by the IEEE Transactions on Knowledge and Data Engineering. [7] Various new research problems and challenges have also emerged in the last decade, in which the incorporation of domain knowledge into data mining processes and models, such as deep neural networks, graph embedding, text mining, and reinforcement learning, is critically important. [8] [9]
Actionable knowledge refers to knowledge that can inform decision-making and be converted into decision-making actions. [5] [10] The actionability of data mining and machine learning findings, also called knowledge actionability, refers to the satisfaction of both technical (statistical) and business-oriented evaluation metrics or measures from objective [11] [12] and/or subjective [13] perspectives. The research and innovation on actionable knowledge discovery can be deemed a paradigm shift from knowledge discovery from data to actionable knowledge discovery and delivery, [14] [15] achieved by mining complex data for complex knowledge in multi-feature, multi-source, or multi-method scenarios. [16]
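To illustrate how knowledge actionability might be assessed in practice, the Python sketch below combines a technical (statistical) measure with a business-oriented measure into a single score. It is a minimal sketch under stated assumptions: the rule fields, weights, threshold, and example rules are hypothetical and do not come from any particular published framework.

```python
# Minimal sketch: ranking mined association rules by a combined
# technical + business actionability score. All field names, weights,
# the threshold, and the example rules are illustrative assumptions.

def actionability(rule, w_tech=0.5, w_biz=0.5, threshold=0.6):
    """Combine a technical (statistical) metric with a business metric."""
    technical = rule["confidence"] * rule["support"]   # objective, data-driven
    business = rule["expected_profit_norm"]            # domain-supplied, scaled to [0, 1]
    score = w_tech * technical + w_biz * business
    return score, score >= threshold

rules = [
    {"name": "rule_A", "support": 0.9, "confidence": 0.95, "expected_profit_norm": 0.1},
    {"name": "rule_B", "support": 0.4, "confidence": 0.80, "expected_profit_norm": 0.9},
]

for r in rules:
    score, actionable = actionability(r)
    print(r["name"], round(score, 3), "actionable" if actionable else "not actionable")
```

In this toy setting, a rule with very strong statistics but little business value can fall below the actionability threshold, while a moderately supported but business-relevant rule passes it, which is the kind of trade-off the notion of actionability is meant to capture.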
Actionable insight enables accurate and in-depth understanding of things or objects and their characteristics, events, stories, occurrences, patterns, exceptions, and evolution and dynamics hidden in the data, and it supports corresponding decision-making actions on top of those insights. Actionable knowledge may disclose actionable insights.
Software architecture is the set of structures needed to reason about a software system and the discipline of creating such structures and systems. Each structure comprises software elements, relations among them, and properties of both elements and relations.
In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address commonsense knowledge was the Advice Taker, proposed by John McCarthy in 1959.
Software visualization or software visualisation refers to the visualization of information of and related to software systems—either the architecture of their source code or metrics of their runtime behavior—and their development process by means of static, interactive or animated 2-D or 3-D visual representations of their structure, execution, behavior, and evolution.
In predictive analytics, data science, machine learning and related fields, concept drift or drift is an evolution of data that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in the fields that involve dynamically changing data and data models.
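As a simple illustration of drift detection, the sketch below monitors a model's error rate over a sliding window of recent predictions and flags drift when the rate rises noticeably above an initial baseline. The window size and tolerance are illustrative assumptions, and more principled detectors (for example DDM or ADWIN) exist.

```python
# Minimal sketch of drift detection by monitoring a model's error rate
# over a sliding window; window size and tolerance are illustrative.
from collections import deque

class SlidingWindowDriftDetector:
    def __init__(self, window=200, tolerance=0.10):
        self.window = deque(maxlen=window)  # recent 0/1 prediction errors
        self.baseline = None                # error rate on the first full window
        self.tolerance = tolerance          # allowed increase before flagging drift

    def update(self, y_true, y_pred):
        """Record one labelled prediction; return True if drift is suspected."""
        self.window.append(0 if y_true == y_pred else 1)
        if len(self.window) < self.window.maxlen:
            return False
        error_rate = sum(self.window) / len(self.window)
        if self.baseline is None:
            self.baseline = error_rate
            return False
        return error_rate > self.baseline + self.tolerance
```

When the detector returns True, retraining or adapting the model on recent data is one common form of drift adaptation.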
Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.
Software analytics is the analytics specific to the domain of software systems taking into account source code, static and dynamic characteristics as well as related processes of their development and evolution. It aims at describing, monitoring, predicting, and improving the efficiency and effectiveness of software engineering throughout the software lifecycle, in particular during software development and software maintenance. The data collection is typically done by mining software repositories, but can also be achieved by collecting user actions or production data.
Normalized compression distance (NCD) is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, two systems, two genomes, to name a few. Such a measurement should not be application dependent or arbitrary. A reasonable definition for the similarity between two objects is how difficult it is to transform them into each other.
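Concretely, NCD is usually computed from compressed lengths as NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the length of the input after compression by a real compressor. The Python sketch below uses zlib as an illustrative compressor; the choice of compressor and the example strings are assumptions for illustration.

```python
# Minimal sketch of normalized compression distance (NCD) using zlib as
# an approximate compressor C; any real-world compressor can be substituted.
import zlib

def c(data: bytes) -> int:
    """Compressed length of the data."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Objects that share structure compress well together and get a lower NCD.
print(ncd(b"the quick brown fox " * 20, b"the quick brown fox " * 19))   # small
print(ncd(b"the quick brown fox " * 20, b"lorem ipsum dolor sit " * 20)) # larger
```

Roughly speaking, similar objects yield a distance close to 0 and unrelated objects yield a distance close to 1, regardless of the type of data being compared.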
Usama M. Fayyad is an American-Jordanian data scientist and co-founder of KDD conferences and ACM SIGKDD association for Knowledge Discovery and Data Mining. He is a speaker on Business Analytics, Data Mining, Data Science, and Big Data. He recently left his role as the Chief Data Officer at Barclays Bank.
Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and act upon the information, often for the purpose of advertising to users or conducting research. The term is an analogy to the resource extraction process of mining for rare minerals. Resource extraction mining requires mining companies to sift through vast quantities of raw ore to find precious minerals; likewise, social media mining requires human data analysts and automated software programs to sift through massive amounts of raw social media data in order to discern patterns and trends relating to social media usage, online behaviours, sharing of content, connections between individuals, online buying behaviour, and more. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use them to design their strategies or introduce new programs, products, processes or services.
Bing Liu is a Chinese-American professor of computer science who specializes in data mining, machine learning, and natural language processing. In 2002, he joined the University of Illinois at Chicago. He holds a PhD from the University of Edinburgh (1988). His PhD advisors were Austin Tate and Kenneth Williamson Currie, and his PhD thesis was titled Reinforcement Planning for Resource Allocation and Constraint Satisfaction.
Author name disambiguation is a type of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish individuals with the name "John Smith".
Data mining, the process of discovering patterns in large data sets, has been used in many applications.
Behavior informatics (BI) is the informatics of behaviors, aimed at obtaining behavior intelligence and behavior insights. BI is a research area that combines science and technology, particularly engineering approaches. Its purposes include the analysis of current behaviors and the inference of possible future behaviors, typically through pattern recognition.
Longbing Cao is an AI and data science researcher at the University of Technology Sydney, Australia. His broad research interest involves artificial intelligence, data science, behavior informatics, and their enterprise applications.
Agent mining is an interdisciplinary area that synergizes multiagent systems with data mining and machine learning.
Multi-task optimization is a paradigm in the optimization literature that focuses on solving multiple self-contained tasks simultaneously. The paradigm has been inspired by the well-established concepts of transfer learning and multi-task learning in predictive analytics.
An associative classifier (AC) is a kind of supervised learning model that uses association rules to assign a target value. The term associative classification was coined by Bing Liu et al., in which the authors defined a model made of rules "whose right-hand side are restricted to the classification class attribute".
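A minimal Python sketch of this idea follows: each class association rule has an antecedent over ordinary attributes and a right-hand side restricted to the class attribute, and an instance is classified by the highest-confidence rule whose antecedent it matches, falling back to a default class. The rules, attribute names, and default class are hypothetical examples, not taken from any specific associative classification system.

```python
# Minimal sketch of an associative classifier: class association rules whose
# right-hand side is restricted to the class attribute, applied by choosing the
# highest-confidence matching rule. All rules and attributes are illustrative.

rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no",  0.92),  # (antecedent, class, confidence)
    ({"outlook": "overcast"},                  "yes", 0.98),
    ({"humidity": "normal"},                   "yes", 0.85),
]
default_class = "yes"  # fallback when no rule matches

def classify(instance: dict) -> str:
    matching = [(cls, conf) for lhs, cls, conf in rules
                if all(instance.get(attr) == val for attr, val in lhs.items())]
    if not matching:
        return default_class
    return max(matching, key=lambda m: m[1])[0]

print(classify({"outlook": "sunny", "humidity": "high", "wind": "weak"}))  # -> "no"
```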
Gautam Das is a computer scientist in the field of database research. He is an ACM Fellow and IEEE Fellow.
Martin Ester is a Canadian-German Full Professor of Computing Science at Simon Fraser University. His research focuses on data mining and machine learning.
Spatial embedding is a feature learning technique used in spatial analysis in which points, lines, polygons, or other spatial data types representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension.
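As a rough illustration only, the sketch below discretizes geographic points into grid cells and maps each cell to a low-dimensional vector through a lookup table. The grid resolution, embedding dimension, and random initialization are assumptions standing in for vectors that would normally be learned from data.

```python
# Minimal sketch of a spatial embedding: geographic points are discretized into
# grid cells, and each cell is mapped to a low-dimensional real-valued vector.
# Resolution, dimension, and random initialization are illustrative placeholders
# for vectors that would in practice be learned from spatial data.
import numpy as np

rng = np.random.default_rng(0)
CELL_DEG = 0.5   # grid resolution in degrees
DIM = 8          # embedding dimension
table = {}       # lazily filled cell -> vector lookup table

def embed(lat: float, lon: float) -> np.ndarray:
    cell = (int(lat // CELL_DEG), int(lon // CELL_DEG))
    if cell not in table:
        table[cell] = rng.normal(size=DIM)   # placeholder for a learned vector
    return table[cell]

# Nearby points fall in the same cell and therefore share an embedding vector.
print(np.allclose(embed(51.501, -0.142), embed(51.502, -0.141)))  # True
```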