Alternatives to Probabilistic Psychological Measurement

This lecture explores historical and contemporary critiques of probabilistic psychometrics and surveys alternative mathematical and conceptual frameworks proposed for psychological measurement.

Philosophical framing

Psychological measurement, at its core, grapples with the fundamental question of whether psychological attributes possess a quantitative structure amenable to numerical representation. This is not merely a technical issue; it concerns the very nature of psychological reality. Prevailing probabilistic psychometrics, with its reliance on latent variables and statistical inference, implicitly assumes such a quantitative structure, often without rigorous empirical justification. Drawing on philosophical critiques of measurement itself, this lecture explores traditions that challenge this assumption and seek to understand psychological phenomena through lenses that do not presuppose commensurability with standard numerical scales.

Introduction

The landscape of psychological measurement has long been dominated by probabilistic approaches, primarily rooted in Classical Test Theory (CTT) and Item Response Theory (IRT). These frameworks conceptualize psychological attributes as latent variables that can be measured only with some degree of error, and they rely heavily on statistical inference to quantify these constructs [McNeish & Wolf, 2020]. However, this probabilistic paradigm has faced persistent, fundamental critiques that question both its foundations and its suitability for capturing the complexity of psychological phenomena.

One of the most potent criticisms centers on the assumption that psychological attributes are inherently quantitative. Michell argues that psychometrics often assigns numbers to attributes "without ever considering whether those attributes can sustain the operations represented within the empirical numeric relation system so imposed" [Barrett, 2003, p. 75]. This "measurement imperative," stemming from a Pythagorean philosophy that equates science with quantification, has led to a "pathology of science" in which numerical manipulation precedes empirical validation of quantitative structure [Barrett, 2003]. This lecture surveys the intellectual alternatives that have emerged from such critiques, exploring frameworks that seek to measure psychological phenomena without relying on probability theory, or at least without its foundational assumptions.

Critique and limitations

One significant limitation of current psychometric practices, particularly those relying on sum scores, is the implicit assumption of a quantitative structure that may not empirically exist for many psychological attributes. As Michell and Barrett forcefully argue, psychometrics often assigns numbers without first verifying whether the underlying attribute can sustain the operations implied by those numbers. If a psychological construct, say "anxiety," does not possess interval or ratio scale properties, then treating sum scores as if it did (e.g., calculating means and standard deviations, or using parametric tests) can lead to misleading conclusions about the magnitude of differences between individuals. Without empirical validation of quantitative structure, any subsequent statistical inference, however sophisticated, operates on potentially flawed premises, and it becomes difficult to say what the numbers represent beyond ordinal rankings.
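The point about ordinal rankings can be made concrete with a small sketch. The data and the two codings below are hypothetical and chosen only for illustration: both codings preserve the order of the five response categories, so neither is privileged if the attribute is merely ordinal. The sketch compares two groups by mean and by median under each coding.

```python
from statistics import mean, median

# Hypothetical 5-point item responses for two groups (illustrative data only).
group_a = [3, 3, 3, 3]
group_b = [2, 2, 2, 5]

# Two numerical codings that both preserve the order of the categories.
equal_spaced = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def recode(scores, coding):
    """Map each response category to its numerical code."""
    return [coding[s] for s in scores]


def compare(stat, coding):
    """Report which group scores higher on `stat` under the given coding."""
    a = stat(recode(group_a, coding))
    b = stat(recode(group_b, coding))
    if a > b:
        return "A > B"
    if a < b:
        return "A < B"
    return "A = B"


for name, coding in [("equal_spaced", equal_spaced), ("stretched_top", stretched_top)]:
    print(name, "| mean:", compare(mean, coding), "| median:", compare(median, coding))
```

In this toy case the ordering of the group means flips when the top category is stretched, while the ordering of the medians does not. Conclusions that depend on the arbitrary choice of coding are exactly the ones that require evidence of interval structure.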

Another critical weakness lies in the often-unexamined assumptions embedded within statistical models, which can severely compromise the generalizability and replicability of findings. Greenland et al. highlight that "every method of statistical inference depends on a complex web of assumptions," many of which are "unrealistic or at best unjustified" in practice [Greenland et al., 2016]. For instance, the assumptions of random sampling and independent observations, crucial for many probabilistic models, are frequently violated in psychological research, especially with convenience samples. If these foundational assumptions are not met, the statistical tests and confidence intervals derived from them become unreliable, leading to inflated Type I error rates or inaccurate effect size estimates. Even if a study reports statistically significant results, the findings might not generalize to other populations or contexts, and attempts to replicate them could fail not because a true effect is absent but because the assumptions of the underlying statistical model do not hold in the new context.
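A minimal simulation sketch can show how a violated independence assumption inflates the Type I error rate. Everything here is an assumption made for illustration: the function names, the cluster structure (5 clusters of 20 observations per group), the intraclass correlation of 0.5, and the use of a normal critical value of 1.96. There is no true group difference in either condition, so every rejection is a false positive.

```python
import random
from statistics import fmean, variance


def naive_test_rejects(x, y, critical=1.96):
    """Two-sample test that treats every observation as independent."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return abs(fmean(x) - fmean(y)) / se > critical


def sample_group(rng, n_clusters, cluster_size, icc):
    """Draw observations with total variance 1 and intraclass correlation `icc`."""
    between_sd = icc ** 0.5
    within_sd = (1.0 - icc) ** 0.5
    values = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, between_sd)  # shared by the whole cluster
        for _ in range(cluster_size):
            values.append(cluster_effect + rng.gauss(0.0, within_sd))
    return values


def false_positive_rate(icc, n_sims=1000, seed=0):
    """Share of simulated null studies in which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = sample_group(rng, n_clusters=5, cluster_size=20, icc=icc)
        y = sample_group(rng, n_clusters=5, cluster_size=20, icc=icc)
        rejections += naive_test_rejects(x, y)
    return rejections / n_sims


rate_independent = false_positive_rate(icc=0.0)
rate_clustered = false_positive_rate(icc=0.5)
print("independent observations:", rate_independent)
print("clustered observations:  ", rate_clustered)
```

With independent observations the rejection rate should sit near the nominal 5%, whereas with clustered observations analysed as if they were independent it should be several times higher. The test itself is unchanged between the two runs; only the validity of its assumption differs.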

Finally, the widespread reliance on traditional probabilistic models often overlooks the dynamic, contextual, and emergent nature of many psychological phenomena. Jana Uher critiques the application of propensity probabilities to individuals, arguing that "the individual’s empirical frequencies cannot converge (i.e., stabilise) in the long run as may be true for physical systems." Psychological attributes, such as personality, are assumed to develop and change over the lifespan, meaning that probabilities themselves are not static. This transience and context-dependency challenge models that seek to measure fixed, latent traits. If psychological states are better understood as emergent properties of complex, interacting systems, as suggested by network approaches [Borsboom et al., 2021], then static probabilistic models that treat attributes as stable, independent entities will inevitably misrepresent the phenomenon. The difficulty in addressing this limitation lies in developing mathematical frameworks that can adequately capture such inherent dynamism and context-sensitivity without sacrificing the rigor required for scientific inquiry.
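The claim about non-converging frequencies can be illustrated with a deliberately simple toy sequence. This is not Uher's own formalization: the two hypothetical "persons" below, and the assumption that the changing person alternates between showing and not showing a behaviour in phases that double in length, are constructed only to make the contrast visible.

```python
from itertools import accumulate


def running_frequency(events):
    """Relative frequency of 1s after each observation."""
    return [total / n for n, total in enumerate(accumulate(events), start=1)]


def stable_person(n_obs):
    """Behaviour shown on every other occasion: a fixed long-run rate of 0.5."""
    return [1 if t % 2 == 0 else 0 for t in range(n_obs)]


def changing_person(n_obs):
    """Behaviour switches on and off in phases of length 1, 2, 4, 8, ..."""
    events, state, phase_len = [], 1, 1
    while len(events) < n_obs:
        events.extend([state] * phase_len)
        state, phase_len = 1 - state, phase_len * 2
    return events[:n_obs]


def late_spread(freqs):
    """Range of the running frequency over the second half of the record."""
    tail = freqs[len(freqs) // 2:]
    return max(tail) - min(tail)


n_obs = 2 ** 14 - 1  # record length chosen to end at a phase boundary
stable = running_frequency(stable_person(n_obs))
changing = running_frequency(changing_person(n_obs))

print("spread, stable person:  ", late_spread(stable))
print("spread, changing person:", late_spread(changing))
```

For the stable sequence the running frequency settles ever closer to 0.5, so a long-run probability is a meaningful summary. For the changing sequence it keeps swinging between roughly one third and two thirds no matter how long the record becomes, so there is no single value that the observed frequencies estimate.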

Sources

Joseph F. Hair; G. Tomas M. Hult; Christian M. Ringle; Marko Sarstedt; Nicholas P. Danks; Soumya Ray. Partial Least Squares Structural Equation Modeling (PLS-SEM) Using R (2021)
Nelson Cowan. The magical number 4 in short-term memory: A reconsideration of mental storage capacity (2001)
Dale H. Schunk. Self-Efficacy and Academic Motivation (1991)
Nikolaus Kriegeskorte. Representational similarity analysis – connecting the branches of systems neuroscience (2008)
Sander Greenland; Stephen Senn; Kenneth J. Rothman; John B. Carlin; Charles Poole; Steven N. Goodman; Douglas G. Altman. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations (2016)
Jens Hainmueller; Daniel J. Hopkins; Teppei Yamamoto. Causal Inference in Conjoint Analysis: Understanding Multidimensional Choices via Stated Preference Experiments (2013)
T. Bedirhan Üstün; Somnath Chatterji; Nenad Kostanjsek; Jürgen Rehm; Cille Kennedy; JoAnne E. Epping‐Jordan; Shekhar Saxena; Michael Von Korff; Charles B. Pull. Developing the World Health Organization Disability Assessment Schedule 2.0 (2010)
Robert M. Sellers; Mia Smith Bynum; J. Nicole Shelton; Stephanie J. Rowley; Tabbye M. Chavous. Multidimensional Model of Racial Identity: A Reconceptualization of African American Racial Identity (1998)
Jose Benitez; Jörg Henseler; Ana Castillo; Florian Schuberth. How to perform and report an impactful analysis using partial least squares: Guidelines for confirmatory and explanatory IS research (2019)
Monica Eriksson; Bengt Lindström. Antonovsky’s sense of coherence scale and the relation with health: a systematic review (2006)
R. Duncan Luce; John W. Tukey. Simultaneous conjoint measurement: A new type of fundamental measurement (1964)
Sarah Stewart‐Brown; Alan Tennant; Ruth Tennant; Stephen Platt; Jane Parkinson; Scott Weich. Internal construct validity of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS): a Rasch analysis using data from the Scottish Health Education Population Survey (2009)
Denny Borsboom; Marie K. Deserno; Mijke Rhemtulla; Sacha Epskamp; Eiko I. Fried; Richard J. McNally; Donald J. Robinaugh; Marco Perugini; Jonas Dalege; Giulio Costantini; Adela‐Maria Isvoranu; Anna Wysocki; Claudia D. van Borkulo; Riet van Bork; Lourens Waldorp. Network analysis of multivariate data in psychological science (2021)
David Meunier; Renaud Lambiotte; Edward T. Bullmore. Modular and Hierarchically Modular Organization of Brain Networks (2010)
Sacha Epskamp; Lourens Waldorp; René Mõttus; Denny Borsboom. The Gaussian Graphical Model in Cross-Sectional and Time-Series Data (2018)
Jane C. Weeks; Paul J. Catalano; Angel M. Cronin; Matthew Finkelman; Jennifer W. Mack; Nancy L. Keating; Deborah Schrag. Patients' Expectations about Effects of Chemotherapy for Advanced Cancer (2012)
J. C. Hobart. The Multiple Sclerosis Impact Scale (MSIS-29): A new patient-based outcome measure (2001)
Kari Sentz; Scott Ferson. Combination of Evidence in Dempster-Shafer Theory (2002)
Lex Borghans; Angela Duckworth; James J. Heckman; Bas ter Weel. The Economics and Psychology of Personality Traits (2008)
Fanni Bányai; Ágnes Zsila; Orsolya Király; Anikó Maráz; Zsuzsanna Elekes; Mark D. Griffiths; Cecilie Schou Andreassen; Zsolt Demetrovics. Problematic Social Media Use: Results from a Large-Scale Nationally Representative Adolescent Sample (2017)
Hans Strasburger; Ingo Rentschler; Martin Jüttner. Peripheral vision and pattern recognition: A review (2011)
Gerd Gigerenzer. Why Heuristics Work (2008)
A. E. Ades; G. Lu; Julian P. T. Higgins. The Interpretation of Random-Effects Meta-Analysis in Decision Models (2005)
Jens B. Asendorpf; Mark Conner; Filip De Fruyt; Jan De Houwer; Jaap J. A. Denissen; Klaus Fiedler; Susann Fiedler; David C. Funder; Reinhold Kliegl; Brian A. Nosek; Marco Perugini; Brent W. Roberts; Manfred Schmitt; Marcel A. G. van Aken; Hannelore Weber; Jelte M. Wicherts. Recommendations for Increasing Replicability in Psychology (2013)
Louis Guttman. A Basis for Scaling Qualitative Data (1944)
Daniel Kardefelt‐Winther; Alexandre Heeren; Adriano Schimmenti; Antonius J. van Rooij; Pierre Maurage; Michelle Colder Carras; Johan Edman; Alex Blaszczynski; Yasser Khazaal; Joël Billieux. How can we conceptualize behavioural addiction without pathologizing common behaviours? (2017)
Wolf Mehling; Michael Acree; Anita L. Stewart; Jonathan Silas; Alexander Jones. The Multidimensional Assessment of Interoceptive Awareness, Version 2 (MAIA-2) (2018)
Åsa Lundgren‐Nilsson; Ingibjörg H. Jónsdóttir; Julie Pallant; Gunnar Ahlborg. Internal construct validity of the Shirom-Melamed Burnout Questionnaire (SMBQ) (2012)
Marco Fabbri; Alessia Beracci; Monica Martoni; Debora Meneo; Lorenzo Tonetti; Vincenzo Natale. Measuring Subjective Sleep Quality: A Review (2021)
Daniel McNeish; Melissa Gordon Wolf. Thinking twice about sum scores (2020)
Lotfi A. Zadeh. Fuzzy Sets (1996)
Patrick Mair; Reinhold Hatzinger. Extended Rasch Modeling: The eRm Package for the Application of IRT Models in R (2007)
Joel Michell. Measurement in Psychology (1999)
Matt N Williams; Carlos Alberto Gomez Grajales; Dason Kurkiewicz. Assumptions of Multiple Regression: Correcting Two Misconceptions (2020)
Richard Perline; Benjamin D. Wright; Howard Wainer. The Rasch Model as Additive Conjoint Measurement (1979)
Fabian Dablander; Max Hinne. Node centrality measures are a poor substitute for causal inference (2019)
E.H. Mamdani. Fuzzy sets and applications: selected papers by L A Zadeh (1988)
Jana Uher. Personality Psychology: Lexical Approaches, Assessment Methods, and Trait Concepts Reveal Only Half of the Story—Why it is Time for a Paradigm Shift (2013)
Joseph A. Goguen. L. A. Zadeh. Fuzzy sets. Information and control, vol. 8 (1965), pp. 338–353. - L. A. Zadeh. Similarity relations and fuzzy orderings. Information sciences, vol. 3 (1971), pp. 177–200. (1973)
David H Krantz. Conjoint measurement: The Luce-Tukey axiomatization and some extensions (1964)
Wenhua Zhao; Wei Jiang; Huilin Wang; Jianbo He; Cuiyun Su; Qitao Yu. Impact of Smoking History on Response to Immunotherapy in Non-Small-Cell Lung Cancer: A Systematic Review and Meta-Analysis (2021)
Paul Barrett. Beyond psychometrics (2003)
Probabilistic Models for Some Intelligence and Attainment Tests (2010)
Clark. Commerce, Culture, and Liberty: Readings on Capitalism Before Adam Smith (1776)
Clark. The Distribution of Wealth: A Theory of Wages, Interest and Profits (1908)
L. S. Vygotsky. Mind in Society: The Development of Higher Psychological Processes (1978)
Alice Ambrose; Ludwig Wittgenstein; G. E. M. Anscombe. Philosophical Investigations. (1954)
Max van Manen. Researching Lived Experience: Human Science for an Action Sensitive Pedagogy (1990)
Jean Piaget. The origins of intelligence in children. (1952)
Francisco J. Varela; Eleanor Rosch; Evan Thompson. The Embodied Mind: Cognitive Science and Human Experience (2017)
R. L. French; Kurt Lewin; Dorwin Cartwright. Field Theory in Social Science (1953)
Bärbel Inhelder; Jean Piaget. The growth of logical thinking: From childhood to adolescence. (1958)
Jean Piaget. The Construction Of Reality In The Child (2013)
Jean Piaget. The construction of reality in the child. (1954)
Jean Piaget. The Moral Judgment Of The Child (2013)
Kurt Lewin. Principles of topological psychology. (1936)