Matthias von Davier

ISBN 978-0387329161
  • The Role of International Large-Scale Assessments: Perspectives from Technology, Economy, and Educational Research (2012) ISBN 978-9400797116
  • Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis (2013) ISBN 978-1439895122
  • Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS (2017) ISBN 978-3319586878
  • Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages (2019) ISBN 978-3030055837
  • Advancing Natural Language Processing in Educational Assessment (2023) ISBN 978-1032244525
  • Selected articles

    Related Research Articles

    Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities. Psychometrics is concerned with the objective measurement of latent constructs that cannot be directly observed. Examples of latent constructs include intelligence, introversion, mental disorders, and educational achievement. The levels of individuals on nonobservable latent variables are inferred through mathematical modeling based on what is observed from individuals' responses to items on tests and scales.

    Educational Testing Service: Educational testing and assessment organization

    Educational Testing Service (ETS), founded in 1947, is the world's largest private educational testing and assessment organization. It is headquartered in Lawrence Township, New Jersey, but has a Princeton address.

    In psychometrics, item response theory (IRT) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, item response theory treats the difficulty of each item as information to be incorporated in scaling items.
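    As a minimal sketch of the idea that items differ in difficulty, the following Python snippet evaluates a two-parameter logistic (2PL) item response function, one common IRT model. The function names, the item labels, and the parameter values are illustrative assumptions and are not taken from any particular test.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a 2PL model.

    theta: test taker's ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters: (discrimination a, difficulty b).
ITEMS = {"easy": (1.0, -1.0), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}


def response_probabilities(theta: float) -> dict:
    """Success probability on each hypothetical item for one ability level."""
    return {name: irf_2pl(theta, a, b) for name, (a, b) in ITEMS.items()}


if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0):
        probs = response_probabilities(theta)
        print(theta, {name: round(p, 3) for name, p in probs.items()})
```

    For a fixed ability, the success probability falls as item difficulty rises, which is the information a simple summated score ignores.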

    Likert scale: Psychometric measurement scale

    A Likert scale is a psychometric scale named after its inventor, American social psychologist Rensis Likert, which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term is often used interchangeably with rating scale, although there are other types of rating scales.

    Programme for International Student Assessment: Scholastic performance study by the OECD

    The Programme for International Student Assessment (PISA) is a worldwide study by the Organisation for Economic Co-operation and Development (OECD) in member and non-member nations intended to evaluate educational systems by measuring 15-year-old school pupils' scholastic performance on mathematics, science, and reading. It was first performed in 2000 and then repeated every three years. Its aim is to provide comparable data with a view to enabling countries to improve their education policies and outcomes. It measures problem solving and cognition.

    The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, attitudes, or personality traits, and the item difficulty. For example, it may be used to estimate a student's reading ability or the extremity of a person's attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health profession, agriculture, and market research.
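    In its dichotomous form, the trade-off between person and item can be written as follows. The notation is the conventional one and is an assumption here: \theta_n is the ability of person n, b_i is the difficulty of item i, and X_{ni} is the scored response (1 for correct or endorsed, 0 otherwise).

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\log \frac{P(X_{ni} = 1 \mid \theta_n, b_i)}{P(X_{ni} = 0 \mid \theta_n, b_i)} = \theta_n - b_i
```

    The log-odds of success depend only on the difference between ability and difficulty, so a person whose ability equals the item difficulty succeeds with probability 0.5.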

    The polytomous Rasch model is a generalization of the dichotomous Rasch model. It is a measurement model that has potential application in any context in which the objective is to measure a trait or ability through a process in which responses to items are scored with successive integers. For example, the model is applicable to the use of Likert scales, rating scales, and to educational assessment items for which successively higher integer scores are intended to indicate increasing levels of competence or attainment.
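    The sketch below computes category probabilities for one item scored 0, 1, ..., m, using the threshold parameterization commonly associated with the polytomous Rasch model. The function name and the threshold values are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def category_probabilities(theta: float, thresholds: List[float]) -> List[float]:
    """Probabilities of scores 0..m for an item with m ordered thresholds.

    The exponent for score x is the sum of (theta - tau_k) over the first x
    thresholds; score 0 has exponent 0.
    """
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, thresholds: List[float]) -> float:
    """Expected integer score on the item for a given trait level."""
    probs = category_probabilities(theta, thresholds)
    return sum(score * p for score, p in enumerate(probs))


if __name__ == "__main__":
    taus = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a four-category item
    for theta in (-2.0, 0.0, 2.0):
        print(theta, [round(p, 3) for p in category_probabilities(theta, taus)])
```

    With a single threshold the function reduces to the dichotomous Rasch model, which is the sense in which the polytomous model generalizes it.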

    Estimation of a Rasch model is the process of estimating the model's parameters from matrices of response data. Various techniques are employed for this purpose. The most common approaches are types of maximum likelihood estimation, such as joint and conditional maximum likelihood estimation. Joint maximum likelihood (JML) equations are efficient, but inconsistent for a finite number of items, whereas conditional maximum likelihood (CML) equations give consistent and unbiased item estimates. Person estimates are generally thought to have bias associated with them, although weighted likelihood estimation methods for the estimation of person parameters reduce the bias.
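    To make the joint approach concrete, here is a rough sketch of JML for the dichotomous Rasch model, alternating one Newton step for each person parameter with one Newton step for each item parameter. The function names, the step clamp, the fixed iteration count, and the toy response matrix are assumptions made for illustration; the sketch does nothing about the bias mentioned above and assumes no person or item has a zero or perfect score.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def clamp(step: float, limit: float = 1.0) -> float:
    """Limit the size of a Newton step to keep the iteration stable."""
    return max(-limit, min(limit, step))


def fit_jml(X: List[List[int]], n_iter: int = 200) -> Tuple[List[float], List[float]]:
    """Return (person abilities, item difficulties) for a 0/1 response matrix."""
    n_persons, n_items = len(X), len(X[0])
    person_scores = [sum(row) for row in X]
    item_scores = [sum(X[n][i] for n in range(n_persons)) for i in range(n_items)]
    theta = [0.0] * n_persons
    b = [0.0] * n_items
    for _ in range(n_iter):
        # Person step: match each expected raw score to the observed raw score.
        for n in range(n_persons):
            p = [sigmoid(theta[n] - b[i]) for i in range(n_items)]
            info = sum(q * (1.0 - q) for q in p)
            theta[n] += clamp((person_scores[n] - sum(p)) / info)
        # Item step: match each expected item score to the observed item score.
        for i in range(n_items):
            p = [sigmoid(theta[n] - b[i]) for n in range(n_persons)]
            info = sum(q * (1.0 - q) for q in p)
            b[i] += clamp((sum(p) - item_scores[i]) / info)
        # Identify the scale by centering item difficulties at zero.
        shift = sum(b) / n_items
        b = [x - shift for x in b]
        theta = [t - shift for t in theta]
    return theta, b


# Toy data: 6 persons by 4 items, no zero or perfect scores.
RESPONSES = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

if __name__ == "__main__":
    abilities, difficulties = fit_jml(RESPONSES)
    print([round(t, 2) for t in abilities])
    print([round(d, 2) for d in difficulties])
```

    Persons with the same raw score receive the same ability estimate, reflecting the fact that the raw score is a sufficient statistic in the Rasch model.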

    Psychometric software refers to specialized programs used for the psychometric analysis of data obtained from tests, questionnaires, polls or inventories that measure latent psychoeducational variables. Although some psychometric analyses can be performed using general statistical software such as SPSS, most require specialized tools designed specifically for psychometric purposes.

    Benjamin Drake Wright: American psychometrician (1926–2015)

    Benjamin Drake Wright was an American psychometrician. He is largely responsible for the widespread adoption of Georg Rasch's measurement principles and models. In the wake of what Rasch referred to as Wright's “almost unbelievable activity in this field” in the period from 1960 to 1972, Rasch's ideas entered the mainstream in high-stakes testing, professional certification and licensure examinations, and in research employing tests, surveys, and assessments across a range of fields. Wright's seminal contributions to measurement continued until 2001, and included articulation of philosophical principles, production of practical results and applications, software development, development of estimation methods and model fit statistics, vigorous support for students and colleagues, and the founding of professional societies and new publications.

    Anton Formann: Austrian research psychologist, statistician and psychometrician

    Anton K. Formann was an Austrian research psychologist, statistician, and psychometrician. He is renowned for his contributions to item response theory, latent class analysis, the measurement of change, mixture models, categorical data analysis, and quantitative methods for research synthesis (meta-analysis).

    David Andrich is an Australian academic and assessment specialist. He has made substantial contributions to quantitative social science, including seminal work on the polytomous Rasch model for measurement, which is used in the social sciences, in health, and in other areas.

    The Mokken scale is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically-ordered items that measure the same underlying, latent concept. This method is named after the political scientist Rob Mokken who suggested it in 1971.

    Computational psychometrics is an interdisciplinary field fusing theory-based psychometrics, learning and cognitive sciences, and data-driven AI-based computational models as applied to large-scale/high-dimensional learning, assessment, biometric, or psychological data. Computational psychometrics is frequently concerned with providing actionable and meaningful feedback to individuals based on measurement and analysis of individual differences as they pertain to specific areas of enquiry.

    Automatic item generation (AIG), or automated item generation, is a process linking psychometrics with computer programming. It uses a computer algorithm to automatically create test items that are the basic building blocks of a psychological test. The method was first described by John R. Bormuth in the 1960s but was not developed until recently. AIG uses a two-step process: first, a test specialist creates a template called an item model; then, a computer algorithm is developed to generate test items. So, instead of a test specialist writing each individual item, computer algorithms generate families of items from a smaller set of parent item models. More recently, neural networks, including large language models such as the GPT family, have been used successfully for generating items automatically.
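    The two-step process can be sketched in a few lines of Python. The item model, the variable ranges, and the function names below are hypothetical examples of a template-based generator; they do not reproduce any published AIG system.

```python
import itertools
from string import Template
from typing import Callable, Dict, List

# Step 1: an item model written by a test specialist (hypothetical example).
ITEM_MODEL = Template(
    "A train travels at $speed km/h for $hours hours. How far does it travel, in km?"
)
VARIABLES: Dict[str, List[int]] = {"speed": [60, 80, 100], "hours": [2, 3]}


def distance_key(binding: Dict[str, int]) -> int:
    """Answer key for the item model above: distance = speed * time."""
    return binding["speed"] * binding["hours"]


# Step 2: an algorithm that expands the item model into a family of items.
def generate_items(
    model: Template,
    variables: Dict[str, List[int]],
    key_fn: Callable[[Dict[str, int]], int],
) -> List[dict]:
    names = sorted(variables)
    items = []
    for values in itertools.product(*(variables[name] for name in names)):
        binding = dict(zip(names, values))
        items.append({"stem": model.substitute(binding), "key": key_fn(binding)})
    return items


if __name__ == "__main__":
    for item in generate_items(ITEM_MODEL, VARIABLES, distance_key):
        print(item["stem"], "->", item["key"])
```

    One parent item model with three speeds and two durations yields a family of six items, each with its own answer key.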

    Alina Anca von Davier is a psychometrician and researcher in computational psychometrics, machine learning, and education. Von Davier is a researcher, innovator, and an executive leader with over 20 years of experience in EdTech and in the assessment industry. She is the Chief of Assessment at Duolingo, where she leads the Duolingo English Test research and development area. She is also the Founder and CEO of EdAstra Tech, a service-oriented EdTech company. In 2022, she joined the University of Oxford as an Honorary Research Fellow, and Carnegie Mellon University as a Senior Research Fellow.

    Randy Elliot Bennett is an American educational researcher who specializes in educational assessment. He is currently the Norman O. Frederiksen Chair in Assessment Innovation at Educational Testing Service in Princeton, NJ. His research and writing focus on bringing together advances in cognitive science, technology, and measurement to improve teaching and learning. He received the ETS Senior Scientist Award in 1996, the ETS Career Achievement Award in 2005, the Teachers College, Columbia University Distinguished Alumni Award in 2016, Fellow status in the American Educational Research Association (AERA) in 2017, the National Council on Measurement in Education's (NCME) Bradley Hanson Award for Contributions to Educational Measurement in 2019, the E. F. Lindquist Award from AERA and ACT in 2020, elected membership in the National Academy of Education in 2022, and the AERA Cognition and Assessment Special Interest Group Outstanding Contribution to Research in Cognition and Assessment Award in 2024. Randy Bennett was elected President of both the International Association for Educational Assessment (IAEA), a worldwide organization primarily constituted of governmental and NGO measurement organizations, and the National Council on Measurement in Education (NCME), whose members are employed in universities, testing organizations, state and federal education departments, and school districts.

    Mark Daniel Reckase is an educational psychologist and expert on quantitative methods and measurement who is known for his work on computerized adaptive testing, multidimensional item response theory, and standard setting in educational and psychological tests. Reckase is University Distinguished Professor Emeritus in the College of Education at Michigan State University.

    Jacqueline P. Leighton is a Canadian-Chilean educational psychologist, academic and author. She is a full professor in the Faculty of Education as well as vice-dean of Faculty Development and Faculty Affairs at the University of Alberta.

    Fumiko Samejima (1930–c. 2021) was a prominent Japanese-born psychometrician best known for her development of the Graded Response Model (GRM), a fundamental approach in Item Response Theory (IRT). Her innovative methods became influential in psychological and educational measurement, particularly in improving the accuracy of tests involving Likert-scale questions and other graded responses. She published her seminal paper “Estimation of Latent Ability Using a Response Pattern of Graded Scores” in 1969. This publication became a foundational reference in psychometric literature, significantly advancing the analysis of ordered categorical data.

    Matthias von Davier
    Occupation(s): Psychometrician, academic, inventor, and author
    Awards:
      ETS Scientist Award, Educational Testing Service (2006)
      Bradley Hanson Award for Contributions to Educational Measurement, National Council on Measurement in Education (2012)
      Award for Significant Contribution to Educational Measurement and Research Methodology, American Educational Research Association (AERA) (2017)
    Academic background
    Education: Master's in Psychology; Dr. rer. nat.
    Alma mater: Kiel University