Matthias von Davier | |
---|---|
Occupation(s) | Psychometrician, academic, inventor, and author |
Awards | ETS Scientist Award, Educational Testing Service (2006); Bradley Hanson Award for Contributions to Educational Measurement, National Council on Measurement in Education (2012); Award for Significant Contribution to Educational Measurement and Research Methodology, American Educational Research Association (AERA) (2017) |
Academic background | |
Education | Master's in Psychology; Dr. rer. nat. |
Alma mater | Kiel University |
Academic work | |
Institutions | TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College |
Matthias von Davier is a psychometrician, academic, inventor, and author. He is the executive director of the TIMSS & PIRLS International Study Center at the Lynch School of Education and Human Development and the J. Donald Monan, S.J., University Professor in Education at Boston College. [1]
Von Davier's research focuses on developing advanced psychometric models and methodologies for analyzing complex educational and survey data. He has authored and co-authored more than 130 research articles, chapters, and research reports, along with six books, including Advancing Human Assessment, part of the series Methodology of Educational Measurement and Assessment, which he co-edited. [2] He is the recipient of numerous awards, including the 2006 ETS Research Scientist Award, [3] the 2012 National Council on Measurement in Education (NCME) Brad Hanson Award for Contributions to Educational Measurement, [4] and the AERA Division D 2017 Award for Significant Contribution to Measurement and Research Methodology for his book Handbook of International Large-Scale Assessment. [5]
Von Davier has been a Fellow of the American Educational Research Association (AERA) since 2021 [6] and an elected member of the National Academy of Education since 2022. [7] He has served as editor of two leading scientific journals, the British Journal of Mathematical and Statistical Psychology and Psychometrika, [8] and was one of the two founding editors of the Springer journal Large-Scale Assessments in Education, a joint publication of the IEA and ETS. [9] He has also been an invited keynote speaker for the Anne Anastasi Lecture at Fordham University, [10] the 9th IEA International Research Conference in Dubai, [11] the Cross-Straits Conference on Educational Measurement in Nanchang, China, the International Meeting of the Psychometric Society, the University of Connecticut, [12] Ludwig Maximilian University of Munich, [13] and the Organisation for Economic Co-operation and Development. [14]
Von Davier obtained a master's degree in psychology with honors from the Faculty of Mathematics and Science (Mathematisch-Naturwissenschaftliche Fakultät) at Kiel University (CAU) in 1993. He completed a doctoral degree (Dr. rer. nat.) in psychology at the same faculty in 1996. [15]
Von Davier began his career as an Assistant Research Scientist at the Institute for Science Education (IPN) at Kiel University. He was then awarded a postdoctoral fellowship at Educational Testing Service (ETS) in Princeton, NJ, where he developed item-fit measures for complex IRT models. He served as a Research Scientist in the Center for Global Assessment at ETS, Princeton, from November 2000 to April 2004. [16]
In 2004, von Davier became a Senior Research Scientist at the Center for Global Assessment in Princeton, where he led initiatives focused on evaluating outcomes-based models. In 2007, he continued as a Senior Research Scientist while also serving as Technical Director for the National Assessment of Educational Progress (NAEP) Task Order Component and managing the Virtual Research Laboratory at the ETS/IEA Research Institute, and in May 2007 he became a Principal Research Scientist. [17]
Von Davier was appointed Director of Research at the Center for Global Assessment in June 2011, overseeing international survey assessment research and co-leading the ETS Research Initiative. From September 2013, he served as co-director of the Center for Global Assessment, concurrently holding the position of Senior Research Director from October 2014. In January 2017, he became a Distinguished Research Scientist at the National Board of Medical Examiners (NBME) in Philadelphia. [18] Since September 2020, he has been the executive director of the TIMSS & PIRLS International Study Center in the Lynch School of Education and Human Development at Boston College, alongside his role as the J. Donald Monan, S.J., University Professor in Education at the same institution. [19]
Von Davier's areas of study include item response theory (IRT), latent class analysis, and diagnostic classification models, with a broader emphasis on classification and mixture distribution models, computational statistics, person fit, item fit, model checking, and hierarchical model extensions for categorical data analysis. [20]
Von Davier's quantitative methodological research in psychometrics has yielded several patents. [21] [22] [23] [24]
Von Davier's work in psychometrics has centered on model development, model fit, and estimation methods, including parallel computation and estimation of latent variable models in complex data collection designs. Key examples include his contributions to extensions of the Rasch model, such as conditional maximum likelihood estimation of various polytomous Rasch models, extensions of mixture-distribution Rasch models, and polytomous HYBRID models. [25] He has worked on fit assessment in latent variable models, encompassing person, item, and model fit. Among other contributions, the general diagnostic model [26] is considered a flexible diagnostic classification model for both binary and polytomous data, as well as for binary and polytomous ordinal attributes. His work also includes the parallel-E, parallel-M algorithm. [27] A fundamental result on intelligence testing practices using discontinue rules was derived by von Davier and collaborators. [28]
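For readers unfamiliar with the dichotomous Rasch model that underlies much of this line of work, the probability of a correct response depends only on the difference between person ability and item difficulty. A minimal sketch in Python (the function names are illustrative and not taken from von Davier's software):

```python
import math

def rasch_prob(theta, b):
    """P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)):
    probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_log_likelihood(responses, theta, difficulties):
    """Log-likelihood of one person's 0/1 response vector at ability theta,
    assuming local independence across items."""
    ll = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        ll += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return ll
```

Estimation methods such as conditional maximum likelihood or the EM-type algorithms mentioned above operate on likelihoods of exactly this form, maximized over item and person parameters.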
He has also developed models that integrate information on achievement, non-response, and process data, including extensions of the speed-accuracy model. [29] [30] Additionally, his research has examined the use of artificial intelligence in automated item generation and automated scoring. [31] [32] [33]
Von Davier's applied research has focused on applying psychometric methods in international large-scale assessment. In his roles at ETS and Boston College, he led the psychometric work on transitioning PIAAC 2012, PISA 2015, TIMSS 2019, and PIRLS 2021 from paper-based to computer-based trendlines, using mode-effect models with data from studies designed to align results from paper-based and computer-based assessments. [34]
Another line of his research has concerned the more general issue of linking in large-scale educational assessments. [35] [36]
A third line of von Davier's research has addressed response styles and corrections for survey response bias in self-reports, with applications ranging from mixture models for personality data [37] [38] to the pitfalls of attempting to correct response bias through anchoring vignettes. [39] More recently, his research has focused on the use of process data in assessment to improve achievement estimation and contextualize assessment results. [29] [30] [40]
Von Davier has authored and co-authored over 150 publications in peer-reviewed journals, edited books, monographs, and research report series. His h-index is 52. He has co-edited several books on topics ranging from latent variable models in psychometrics to international large-scale assessments and NLP in assessment. [41]
Von Davier's first book, Multivariate and Mixture Distribution Rasch Models: Extensions and Applications, explored advanced applications and extensions of the Rasch model across various disciplines, including education, psychology, and the health sciences. Allan S. Cohen commented in the Journal of the American Statistical Association, "This book, published in honor of the retirement of Jürgen Rost, is an edited volume of 22 invited chapters written by eminent researchers in the field of item response theory (IRT)." [42] His next book, The Role of International Large-Scale Assessments: Perspectives from Technology, Economy, and Educational Research, published in 2012, discussed the significance of large-scale international assessments as catalysts for change in understanding the distribution of human capital, impacting policy, education, and research. In 2013, he co-edited the Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis with Leslie Rutkowski and David Rutkowski, which explored the methodology, technical details, and policy implications of international large-scale assessments (ILSAs) in education. Terry Ackerman remarked, "This book is an excellent resource and guide to international large-scale assessments or ILSAs. The three editors have done an excellent job identifying a group of prominent scholars whose expertise ranges from international testing and behavioral statistics to educational policy." [43]
Alongside Randy E. Bennett, von Davier published Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS in 2017, detailing advancements in human assessment made by ETS, covering measurement and statistics, education policy, psychology, and the development of widely used educational surveys and methodologies. Building on this exploration of assessment methodologies, he co-edited Advancing Natural Language Processing in Educational Assessment in 2023 with Victoria Yaneva, which examined the implementation, benefits, and challenges of using NLP in educational testing and assessment. In addition, his book Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages provided an overview of diagnostic classification models (DCMs), discussing their development, application, and advantages in offering detailed evaluations of test-taker performance across multiple skill domains compared to traditional assessment models. Yu Bao reviewed the book and stated, "The Handbook of Diagnostic Classification Models serves as a reference book that consists of a comprehensive collection of the majority of research topics and a summary of the influential publications within recent decades." [44]
In his highly cited studies, von Davier described practices researchers can use for analyzing and reporting data from large-scale international assessments, addressing common issues and statistical complexities to ensure unbiased results. [45] He emphasized the importance of correctly using plausible values in large-scale survey data analysis to avoid biased estimates and underscored the need to follow established procedures and guidelines. [46] Additionally, he presented a diagnostic model for multidimensional skill profiles using maximum likelihood techniques, demonstrated its application with simulated and real data, [47] and introduced general diagnostic models (GDMs) for estimating skill profiles, suitable for polytomous data and missing responses, with a focus on TOEFL Internet-based testing (iBT) field test data. [48] In related research, he showed that the G-DINA and LCDM approaches to diagnostic modeling are special cases of the GDM. [49] Some of his later research has focused on large language models, recurrent neural networks, and other AI methods, and how they can be used in automated item generation, automated scoring, and other applications in large-scale educational assessment. [31] [32] [33]
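The plausible-values methodology referred to here draws several proficiency values per respondent and combines statistics computed once per plausible value using Rubin's multiple-imputation rules. A minimal sketch, assuming lists of per-plausible-value point estimates and their sampling variances are already available (the function name is illustrative):

```python
import statistics

def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value statistics via Rubin's rules.

    Returns the pooled point estimate and its total variance: the average
    sampling (within-imputation) variance plus the inflated
    between-imputation variance across the m plausible values."""
    m = len(estimates)
    point = statistics.mean(estimates)            # pooled estimate
    within = statistics.mean(sampling_variances)  # average sampling variance
    between = statistics.variance(estimates)      # variance across the m PVs
    total_variance = within + (1.0 + 1.0 / m) * between
    return point, total_variance
```

Running each analysis separately on every plausible value and only then pooling, as above, is what distinguishes correct use of plausible values from the biased shortcut of averaging them into a single score first.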