Domain driven data mining

Domain driven data mining is a data mining methodology for discovering actionable knowledge and delivering actionable insights from complex data and behaviors in complex environments. It studies the foundations, frameworks, algorithms, models, architectures, and evaluation systems for actionable knowledge discovery. [1] [2]

Data-driven pattern mining and knowledge discovery in databases [3] face the challenge that their discovered outputs are often not actionable. In the era of big data, effectively discovering actionable insights from complex data and environments is critical. A significant paradigm shift is the evolution from data-driven pattern mining to domain-driven actionable knowledge discovery. [4] [5] [6] Domain driven data mining aims to enable the discovery and delivery of actionable knowledge and actionable insights.

Domain driven data mining has attracted significant attention from both academia and industry. A workshop series on domain driven data mining ran from 2007 to 2014 alongside the IEEE International Conference on Data Mining, and the IEEE Transactions on Knowledge and Data Engineering published a special issue on the topic. [7] New research problems and challenges have also emerged over the last decade, in which incorporating domain knowledge into data mining processes and models, such as deep neural networks, graph embedding, text mining, and reinforcement learning, is critically important. [8] [9]

Actionable knowledge

Actionable knowledge is knowledge that can inform decision-making actions and be converted into them. [5] [10] The actionability of data mining and machine learning findings, also called knowledge actionability, refers to the satisfaction of both technical (statistical) and business-oriented evaluation metrics or measures, from objective [11] [12] and/or subjective [13] perspectives. Research on actionable knowledge discovery can be regarded as a paradigm shift from knowledge discovery from data to actionable knowledge discovery and delivery, [14] [15] achieved by mining complex data for complex knowledge in multi-feature, multi-source, or multi-method scenarios. [16]
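
As a rough illustration of the dual requirement described above, the following sketch treats a pattern as actionable only when it passes both a technical threshold and a business threshold. The `Pattern` class, the choice of confidence and expected profit as the two measures, and the threshold values are all assumptions made for this example; they are not taken from the cited literature, which discusses a much wider range of objective and subjective measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A mined pattern with one technical and one business measure (both hypothetical)."""
    name: str
    confidence: float       # technical (statistical) interestingness, in [0, 1]
    expected_profit: float  # business-oriented interestingness, in currency units


def is_actionable(p: Pattern, min_confidence: float, min_profit: float) -> bool:
    """A pattern counts as actionable only if it passes BOTH thresholds."""
    return p.confidence >= min_confidence and p.expected_profit >= min_profit


def actionable_patterns(patterns, min_confidence=0.8, min_profit=1000.0):
    """Keep the patterns that satisfy both the technical and the business test."""
    return [p for p in patterns if is_actionable(p, min_confidence, min_profit)]


if __name__ == "__main__":
    mined = [
        Pattern("A -> B", confidence=0.95, expected_profit=50.0),    # strong but low value
        Pattern("C -> D", confidence=0.85, expected_profit=4000.0),  # passes both tests
        Pattern("E -> F", confidence=0.40, expected_profit=9000.0),  # valuable but unreliable
    ]
    for p in actionable_patterns(mined):
        print(p.name)
```

With these illustrative thresholds, a purely data-driven filter on confidence alone would keep the first two patterns, whereas the combined test keeps only the one that is both statistically reliable and worth acting on from the business side.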

Actionable insight

An actionable insight is an accurate, in-depth understanding of things or objects and their characteristics, events, stories, occurrences, patterns, exceptions, evolution, and dynamics hidden in the data world, together with the decision-making actions that can be taken on the basis of that understanding. Actionable knowledge may disclose actionable insights.

References

  1. Cao, L.; Zhao, Y.; Yu, P.; Zhang, C. (2010). Domain Driven Data Mining. Springer. ISBN 978-1-4419-5737-5.
  2. Zhang, C.; Yu, P. S.; Bell, D. (June 2010). "IEEE TKDE Special Issue on Domain-driven Data Mining". IEEE Transactions on Knowledge and Data Engineering. 22 (6): 753–754. doi:10.1109/TKDE.2010.74. S2CID 29503757.
  3. Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. (1996). "From Data Mining to Knowledge Discovery in Databases". AI Magazine. 17 (3): 37–54.
  4. Fayyad, U.; et al. (2003). "Summary from the KDD-03 Panel—Data Mining: The Next 10 Years". ACM SIGKDD Explorations Newsletter. 5 (2): 191–196. doi:10.1145/980972.981004. S2CID   37284526.
  5. Cao, L.; Zhang, C.; Yang, Q.; Bell, D.; Vlachos, M.; Taneri, B.; Keogh, E.; Yu, P.; Zhong, N.; et al. (2007). "Domain-Driven, Actionable Knowledge Discovery". IEEE Intelligent Systems. 22 (4): 78–89. doi:10.1109/MIS.2007.67. S2CID 15928505.
  6. Fayyad, U.; Smyth, P. (1996). "From Data Mining to Knowledge Discovery: An Overview". Advances in Knowledge Discovery and Data Mining, (U. Fayyad and P. Smyth, Eds.): 1–34.
  7. "DDDM".
  8. "International Workshop on Domain-driven Data Mining (DDDM)".
  9. "International Journal of Data Science and Analytics".
  10. Yang, Q.; et al. (2007). "Extracting Actionable Knowledge from Decision Trees". IEEE Transactions on Knowledge and Data Engineering. 19 (1): 43–56. doi:10.1109/TKDE.2007.250584. S2CID 18053232.
  11. Hilderman, R.; Hamilton, H. (2000). "Applying Objective Interestingness Measures in Data Mining Systems". PKDD 2000: 432–439.
  12. Freitas, A. (1998). "On Objective Measures of Rule Surprisingness". Proc. European Conf. Principles and Practice of Knowledge Discovery in Databases: 1–9.
  13. Liu, B. (2000). "Analyzing the Subjective Interestingness of Association Rules". IEEE Intelligent Systems. 15 (5): 47–55. doi:10.1109/5254.889106.
  14. Cao, L.; Zhao, Y.; Zhang, H.; Luo, D.; Zhang, C. (2010). "Flexible Frameworks for Actionable Knowledge Discovery". IEEE Transactions on Knowledge and Data Engineering. 22 (9): 1299–1312.
  15. Cao, L. (2012). "Actionable Knowledge Discovery and Delivery". WIREs Data Mining and Knowledge Discovery. 2 (2): 149–163.
  16. Cao, L. (2013). "Combined Mining: Analyzing Object and Pattern Relations for Discovering and Constructing Complex but Actionable Patterns". WIREs Data Mining and Knowledge Discovery. 3 (2): 140–155.