Educational data mining

Educational data mining (EDM) is a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. [1] In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. [2] The field is closely tied to that of learning analytics, and the two have been compared and contrasted. [3]

Definition

Educational data mining refers to techniques, tools, and research designed for automatically extracting meaning from large repositories of data generated by or related to people's learning activities in educational settings. [4] Quite often, this data is extensive, fine-grained, and precise. For example, several learning management systems (LMSs) track information such as when each student accessed each learning object, how many times they accessed it, and how many minutes the learning object was displayed on the user's computer screen. As another example, intelligent tutoring systems record data every time a learner submits a solution to a problem. They may collect the time of the submission, whether or not the solution matches the expected solution, the amount of time that has passed since the last submission, the order in which solution components were entered into the interface, etc. The precision of this data is such that even a fairly short session with a computer-based learning environment (e.g. 30 minutes) may produce a large amount of process data for analysis.
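As a rough illustration of the kind of fine-grained LMS data described above, the sketch below aggregates raw access events into per-student, per-object counts and display times. The log layout, the identifiers, and the function name `summarize_access` are hypothetical and are not taken from any particular LMS.

```python
from collections import defaultdict

# Hypothetical log rows: (student_id, learning_object_id, minutes_displayed).
# Real LMS exports differ; this layout is an assumption for illustration.
access_log = [
    ("s1", "video_1", 4.0),
    ("s1", "video_1", 2.5),
    ("s1", "quiz_1", 10.0),
    ("s2", "video_1", 1.0),
]


def summarize_access(log):
    """Return {(student, object): {"accesses": count, "minutes": total}}."""
    summary = defaultdict(lambda: {"accesses": 0, "minutes": 0.0})
    for student, learning_object, minutes in log:
        entry = summary[(student, learning_object)]
        entry["accesses"] += 1        # how many times the object was opened
        entry["minutes"] += minutes   # total time it was displayed
    return dict(summary)


usage = summarize_access(access_log)
for key, stats in sorted(usage.items()):
    print(key, stats)
```

Even this toy log yields several features per student, which suggests why a short session can produce a large amount of process data.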

In other cases, the data is less fine-grained. For example, a student's university transcript may contain a temporally ordered list of courses taken by the student, the grade that the student earned in each course, and when the student selected or changed his or her academic major. EDM leverages both types of data to discover meaningful information about different types of learners and how they learn, the structure of domain knowledge, and the effect of instructional strategies embedded within various learning environments. These analyses provide new information that would be difficult to discern by looking at the raw data. For example, analyzing data from an LMS may reveal a relationship between the learning objects that a student accessed during the course and their final course grade. Similarly, analyzing student transcript data may reveal a relationship between a student's grade in a particular course and their decision to change their academic major. Such information provides insight into the design of learning environments, which allows students, teachers, school administrators, and educational policy makers to make informed decisions about how to interact with, provide, and manage educational resources.
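One simple way to look for a relationship such as the one between learning objects accessed and final course grade is a correlation coefficient. The following is a minimal sketch using Pearson's r on invented numbers; the variable names and data are assumptions, and a correlation found this way would not by itself show that accessing more objects causes higher grades.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Invented data: learning objects accessed and final grade for five students.
objects_accessed = [3, 5, 8, 10, 12]
final_grade = [55, 60, 72, 80, 85]

r = pearson_r(objects_accessed, final_grade)
print(f"r = {r:.3f}")
```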

History

While the analysis of educational data is not itself a new practice, recent advances in educational technology, including the increase in computing power and the ability to log fine-grained data about students' use of a computer-based learning environment, have led to an increased interest in developing techniques for analyzing the large amounts of data generated in educational settings. This interest translated into a series of EDM workshops held from 2000 to 2007 as part of several international research conferences. [5] In 2008, a group of researchers established what has become an annual international research conference on EDM, the first of which took place in Montreal, Quebec, Canada. [6]

As interest in EDM continued to increase, EDM researchers established an academic journal in 2009, the Journal of Educational Data Mining, for sharing and disseminating research results. In 2011, EDM researchers established the International Educational Data Mining Society to connect EDM researchers and continue to grow the field.

The introduction of public educational data repositories in 2008, such as the Pittsburgh Science of Learning Center's (PSLC) DataShop and the National Center for Education Statistics (NCES), has made educational data mining more accessible and feasible, contributing to the field's growth. [7]

Goals

Ryan S. Baker and Kalina Yacef [8] identified the following four goals of EDM:

  1. Predicting students' future learning behavior – This goal can be achieved through student modeling, that is, by creating student models that incorporate the learner's characteristics, including detailed information such as their knowledge, behaviours and motivation to learn. The learner's user experience and overall satisfaction with learning are also measured.
  2. Discovering or improving domain models – The methods and applications of EDM make it possible to discover new models and to improve existing ones. Examples include illustrating the educational content to engage learners and determining optimal instructional sequences to support the student's learning style.
  3. Studying the effects of educational support that can be achieved through learning systems.
  4. Advancing scientific knowledge about learning and learners by building and incorporating student models, and by developing the field of EDM research and the technology and software it uses.

Users and stakeholders

There are four main groups of users and stakeholders involved with educational data mining.

Phases

As research in the field of educational data mining has continued to grow, a myriad of data mining techniques have been applied to a variety of educational contexts. In each case, the goal is to translate raw data into meaningful information about the learning process in order to make better decisions about the design and trajectory of a learning environment. Thus, EDM generally consists of four phases: [2] [5]

  1. The first phase of the EDM process (not counting pre-processing) is discovering relationships in data. This involves searching through a repository of data from an educational environment with the goal of finding consistent relationships between variables. Several algorithms for identifying such relationships have been utilized, including classification, regression, clustering, factor analysis, social network analysis, association rule mining, and sequential pattern mining.
  2. Discovered relationships must then be validated in order to avoid overfitting.
  3. Validated relationships are applied to make predictions about future events in the learning environment.
  4. Predictions are used to support decision-making processes and policy decisions.

During phases 3 and 4, data is often visualized or in some other way distilled for human judgment. [2] A large amount of research has been conducted on best practices for visualizing data.
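The four phases can be sketched end to end on a toy data set. The example below is only an illustration under stated assumptions: the data (hours of tutor use and exam scores), the choice of a least-squares line as the discovered relationship, the held-out split, and the 60-point decision threshold are all invented for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def mean_abs_error(model, xs, ys):
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Invented data: hours of tutor use and exam score.
train_hours, train_scores = [1, 2, 3, 4, 5, 6], [52, 55, 61, 64, 70, 73]
held_hours, held_scores = [2.5, 4.5], [58, 67]

# Phase 1: discover a relationship in the training data.
model = fit_line(train_hours, train_scores)

# Phase 2: validate on data not used for fitting, to guard against overfitting.
held_error = mean_abs_error(model, held_hours, held_scores)

# Phase 3: apply the validated relationship to predict a future outcome.
predicted = predict(model, 7)

# Phase 4: use the prediction to support a decision (hypothetical threshold).
needs_support = predicted < 60

print(model, held_error, predicted, needs_support)
```

A held-out error much larger than the training error would be a sign that the discovered relationship does not generalize.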

Main approaches

Of the general categories of methods mentioned, prediction, clustering and relationship mining are considered universal methods across all types of data mining; however, discovery with models and distillation of data for human judgment are considered more prominent approaches within educational data mining. [7]

Discovery with models

In the discovery with models method, a model is developed via prediction, clustering, or knowledge engineering based on human reasoning, and is then used as a component in another analysis, namely prediction or relationship mining. [7] When the method is used for prediction, the created model's predictions serve as inputs for predicting a new variable. [7] When it is used for relationship mining, the created model enables the analysis of relationships between its predictions and additional variables in the study. [7] In many cases, discovery with models uses validated prediction models that have proven generalizability across contexts.

Key applications of this method include discovering relationships between student behaviors, characteristics and contextual variables in the learning environment. [7] Both broad and specific research questions can also be explored across a wide range of contexts using this method.
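A minimal sketch of the idea, assuming a hand-written (knowledge-engineered) rule as the first model: the rule estimates how often a student pauses for a long time between actions, and that estimate is then used as a variable in a second analysis that compares post-test scores. The threshold, the field names, and the data are hypothetical and are not drawn from any published detector.

```python
from statistics import mean

# Assumption: a pause longer than this many seconds counts as off-task.
OFF_TASK_GAP_SECONDS = 120


def off_task_rate(gaps):
    """First model: share of actions preceded by a long pause."""
    return sum(gap > OFF_TASK_GAP_SECONDS for gap in gaps) / len(gaps)


# Invented data: seconds between consecutive actions, plus a post-test score.
students = {
    "s1": {"gaps": [10, 15, 200, 12], "post_test": 82},
    "s2": {"gaps": [300, 250, 20, 400], "post_test": 58},
    "s3": {"gaps": [8, 9, 11, 14], "post_test": 90},
    "s4": {"gaps": [150, 180, 30, 500], "post_test": 63},
}

# Step 1: run the model to produce a new variable for each student.
rates = {sid: off_task_rate(data["gaps"]) for sid, data in students.items()}

# Step 2: use the model's output as a component in a second analysis.
high = [students[sid]["post_test"] for sid, rate in rates.items() if rate >= 0.5]
low = [students[sid]["post_test"] for sid, rate in rates.items() if rate < 0.5]
score_gap = mean(low) - mean(high)

print(rates, score_gap)
```

In practice the first model would need to be validated before its output is trusted as an input to further analysis.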

Distillation of data for human judgment

Humans can make inferences about data that may be beyond the scope of what an automated data mining method can provide. [7] In educational data mining, data is distilled for human judgment for two key purposes: identification and classification. [7]

For the purpose of identification, data is distilled to enable humans to identify well-known patterns, which may otherwise be difficult to interpret. For example, the learning curve, classic to educational studies, is a pattern that clearly reflects the relationship between learning and experience over time.
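As an illustration of distilling data into a learning curve, the sketch below computes the error rate at each practice opportunity and prints a crude text chart. The attempt records and the function name `learning_curve` are invented for this example; real analyses typically track opportunities per skill and per student.

```python
from collections import defaultdict

# Invented records: (student, opportunity number for one skill, correct?).
attempts = [
    ("s1", 1, False), ("s1", 2, False), ("s1", 3, True), ("s1", 4, True),
    ("s2", 1, False), ("s2", 2, True), ("s2", 3, True), ("s2", 4, True),
]


def learning_curve(rows):
    """Return [(opportunity, error_rate)] sorted by opportunity number."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for _student, opportunity, correct in rows:
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return [(k, errors[k] / totals[k]) for k in sorted(totals)]


curve = learning_curve(attempts)
for opportunity, error_rate in curve:
    # One '#' per 10 percentage points of error, for quick visual inspection.
    print(f"{opportunity}: {'#' * round(error_rate * 10)} {error_rate:.2f}")
```

A falling error rate across opportunities is the pattern a human reader would be expected to recognize at a glance.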

Data is also distilled for the purpose of classifying features of the data, which in educational data mining is used to support the development of a prediction model. Classification greatly expedites the development of the prediction model.

The goal of this method is to summarize and present information in a useful, interactive and visually appealing way in order to make sense of large amounts of educational data and to support decision making. [9] In particular, this method helps educators understand usage information and the effectiveness of course activities. [9] Key applications for the distillation of data for human judgment include identifying patterns in student learning, behavior, and opportunities for collaboration, and labeling data for future use in prediction models. [7]

Applications

A list of the primary applications of EDM is provided by Cristobal Romero and Sebastian Ventura. [5]

New research on mobile learning environments also suggests that data mining can be useful. Data mining can be used to help provide personalized content to mobile users, despite the differences in managing content between mobile devices and standard PCs and web browsers.

New EDM applications will focus on allowing non-technical users to use and engage in data mining tools and activities, making data collection and processing more accessible for all users of EDM. Examples include statistical and visualization tools that analyze social networks and their influence on learning outcomes and productivity. [14]

Courses

  1. In October 2013, Coursera offered a free online course on "Big Data in Education" that taught how and when to use key methods for EDM. [15] This course moved to edX in the summer of 2015, [16] and has continued to run on edX annually since then. A course archive is now available online. [17]
  2. Teachers College, Columbia University offers an MS in Learning Analytics. [18]

Publication venues

Considerable amounts of EDM work are published at the peer-reviewed International Conference on Educational Data Mining, organized by the International Educational Data Mining Society.

EDM papers are also published in the Journal of Educational Data Mining (JEDM).

Many EDM papers are routinely published in related conferences, such as Artificial Intelligence and Education, Intelligent Tutoring Systems, and User Modeling, Adaptation, and Personalization.

In 2011, Chapman & Hall/CRC Press, Taylor and Francis Group published the first Handbook of Educational Data Mining. This resource was created for those who are interested in participating in the educational data mining community. [14]

Contests

In 2010, the Association for Computing Machinery's KDD Cup was conducted using data from an educational setting. [33] The data set was provided by the DataShop, and it consisted of over 1,000,000 data points from students using a cognitive tutor. [34] Six hundred teams competed for over US$8,000 in prize money (which was donated by Facebook). The goal for contestants was to design an algorithm that, after learning from the provided data, would make the most accurate predictions from new data. The winners submitted an algorithm that utilized feature generation (a form of representation learning), random forests, and Bayesian networks. [35]

Costs and challenges

Technological advancements bring costs and challenges associated with implementing EDM applications. These include the cost of storing logged data and the cost of hiring staff dedicated to managing data systems. [36] Moreover, data systems may not always integrate seamlessly with one another, and even with the support of statistical and visualization tools, creating one simplified version of the data can be difficult. [36] Furthermore, choosing which data to mine and analyze can be challenging, [36] making the initial stages very time-consuming and labor-intensive. From beginning to end, EDM strategy and implementation must uphold privacy and ethics [36] for all stakeholders involved.

References

  1. "EducationalDataMining.org". 2013. Retrieved 2013-07-15.
  2. R. Baker (2010) Data Mining for Education. In McGaw, B., Peterson, P., Baker, E. (Eds.) International Encyclopedia of Education (3rd edition), vol. 7, pp. 112-118. Oxford, UK: Elsevier.
  3. G. Siemens, R.S.j.d. Baker (2012). "Learning analytics and educational data mining: Towards communication and collaboration". Proceedings of the 2nd International Conference on Learning Analytics and Knowledge. pp. 252–254. doi:10.1145/2330601.2330661. ISBN   9781450311113. S2CID   207196058.
  4. "educationaldatamining.org". Retrieved 2020-11-14.
  5. C. Romero, S. Ventura. Educational Data Mining: A Review of the State-of-the-Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 40(6), 601-618, 2010.
  6. "http://educationaldatamining.org/EDM2008/". Retrieved 2013-09-04.
  7. Baker, Ryan. "Data Mining for Education" (PDF). Oxford, UK: Elsevier. Retrieved 9 February 2014.
  8. Baker, R.S.; Yacef, K. (2009). "The state of educational data mining in 2009: A review and future visions". JEDM-Journal of Educational Data Mining. 1 (1): 3–17.
  9. Romero, Cristobal; Ventura, Sebastian (January–February 2013). "WIREs Data Mining Knowl Discov". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 3 (1): 12–27. doi:10.1002/widm.1075. S2CID 18019486.
  10. Romero, Cristobal; Ventura, Sebastian (2007). "Educational data mining: A survey from 1995 to 2005". Expert Systems with Applications. 33 (1): 135–146. doi:10.1016/j.eswa.2006.04.005.
  11. "Assessing the Economic Impact of Copyright Reform in the Area of Technology-Enhanced Learning". Industry Canada. Archived from the original on 13 April 2014. Retrieved 6 April 2014.
  12. Azarnoush, Bahareh, et al. "Toward a Framework for Learner Segmentation." JEDM-Journal of Educational Data Mining 5.2 (2013): 102-126.
  13. U.S. Department of Education, Office of Educational Technology. "Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief" (PDF). Archived from the original (PDF) on 11 June 2014. Retrieved 30 March 2014.
  14. Romero, C.; Ventura, S.; Pechenizkiy, M.; Baker, R. S. (2010). Handbook of educational data mining. CRC Press.
  15. "Big Data in Education". Coursera. Retrieved 30 March 2014.
  16. "Big Data in Education". edX. Retrieved 13 October 2015.
  17. "Big Data in Education". Retrieved 17 July 2018.
  18. "Learning Analytics | Teachers College Columbia University". www.tc.columbia.edu. Retrieved 2015-10-13.
  19. "Home". www.educationaldatamining.org. Retrieved 1 July 2022.
  20. "EDM'09 - Home". www.educationaldatamining.org. Retrieved 1 July 2022.
  21. "EDM2010". 2011-10-20. Archived from the original on 20 October 2011. Retrieved 2022-07-02.
  22. "EDM2011". 20 October 2011. Archived from the original on 9 May 2021. Retrieved 1 July 2022.
  23. "EDM2012". Archived from the original on 8 May 2013. Retrieved 1 July 2022.
  24. "EDM2013". Archived from the original on 29 December 2013. Retrieved 1 July 2022.
  25. "EDM2014". Archived from the original on 30 January 2014. Retrieved 1 July 2022.
  26. "EDM2015". Archived from the original on 8 October 2014. Retrieved 1 July 2022.
  27. "EDM2016". Archived from the original on 13 May 2022. Retrieved 1 July 2022.
  28. "EDM2017". Archived from the original on 30 April 2017. Retrieved 1 July 2022.
  29. "EDM2018". Archived from the original on 13 May 2022. Retrieved 1 July 2022.
  30. "EDM2019". Archived from the original on 13 May 2022. Retrieved 1 July 2022.
  31. "EDM2020". Archived from the original on 22 January 2022. Retrieved 1 July 2022.
  32. "EDM2021". Archived from the original on 15 August 2021. Retrieved 1 July 2022.
  33. "KDD Cup 2010". KDD. Archived from the original on 15 July 2010. Retrieved 1 July 2022.
  34. "PSLC DataShop". DataShop. Archived from the original on 26 June 2010. Retrieved 1 July 2022.
  35. Yu, Hsiang-Fu; Lin, Chih-Jen; Lin, Hsuan-Tien; Lin, Shou-De; Wei, Yin-Hsuan; Weng, Jui-Yu; Chang, Chun-Fu; Yan, En-Syu; McKenzie, Todd; Lou, Jing-Kai; Hsieh, Hsun-Ping (2010). "Feature Engineering and Classifier Ensemble for KDD Cup 2010" (PDF). DataShop. Archived from the original (PDF) on 3 March 2022. Retrieved 1 July 2022.
  36. "How Can Educational Data Mining and Learning Analytics Improve and Personalize Education?". EdTechReview. 18 June 2013. Retrieved 9 April 2014.