Latent Variables and Evaluation

Why evaluation needs latent variable realism to make sense

Sep 21, 2024

In October, I’ll be speaking about the philosophy of measurement at a panel during the American Evaluation Association’s annual conference in Portland, OR. When my talk starts, I’ll simultaneously post the text to Substack for those who can’t attend. (This seems like an obvious use case for Substack - documenting conference talks that I probably won’t publish in a journal anyway. Maybe we should all do it.)

The topic of my talk is the set of philosophical requirements for measurement of subjective mental traits and states, one of which, I argue, is realism about latent variables. This post is an introduction to the idea of latent variable realism for evaluators. If you are reading this post in the future and your interest in latent variable realism was piqued by my AEA talk, this is a great place to start. For most readers, however, I hope this will simply be a helpful introduction to what I consider a critical topic.

What is a latent variable?

To understand why “latent variable realism is important to evaluation”, let’s break my phrase down into its component parts. In psychometrics, a latent variable (also known as an underlying or hidden variable) is a theoretical construct that cannot be directly observed or measured but can be inferred from patterns in observable data. Latent variables are often considered to be the true cause of the relationships among the manifest variables, which are the variables we can observe directly.

For example, in a test of reading comprehension, a latent variable might represent an individual's true reading ability. This ability cannot be directly observed, but the test scores on specific reading comprehension items (the manifest variables) can be used to estimate this underlying ability. In a psychological study of personality traits, a latent variable might represent a person's true level of extraversion or neuroticism. These traits are not directly observable, but the answers to specific questions or scales (the manifest variables) can be used to estimate these underlying traits.
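To make the common-cause idea concrete, here is a minimal Python simulation built on assumptions of my own: a single normally distributed trait, a linear one-factor model, and hypothetical loadings of 0.8 and 0.7. The two items never affect each other, yet they correlate because both depend on the trait. Because a simulation knows the trait (which a real study never does), we can subtract its contribution and check that the leftover noise is roughly uncorrelated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_items(loadings, n_people=5000, seed=1):
    """Hypothetical one-factor model: item = loading * trait + independent noise."""
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    items = []
    for lam in loadings:
        noise_sd = (1 - lam ** 2) ** 0.5  # keeps each item's variance close to 1
        items.append([lam * t + rng.gauss(0, noise_sd) for t in trait])
    return trait, items


loadings = (0.8, 0.7)  # hypothetical item loadings
trait, (item_a, item_b) = simulate_items(loadings)

# The items never influence each other, yet they correlate (about 0.8 * 0.7 = 0.56)
# because they share a common cause.
r_items = pearson(item_a, item_b)

# Subtract the part of each item explained by the trait: what remains is the
# independent noise, so the residuals are approximately uncorrelated.
resid_a = [x - loadings[0] * t for x, t in zip(item_a, trait)]
resid_b = [y - loadings[1] * t for y, t in zip(item_b, trait)]
r_resid = pearson(resid_a, resid_b)

print(f"item correlation: {r_items:.2f}, residual correlation: {r_resid:.2f}")
```

Real analyses run this logic in reverse: techniques like factor analysis start from the observed correlations and estimate the loadings and the trait, rather than being handed them.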

Indirectly observable causal forces rule the world around us. Photo by Sören Funk.

Latent variables are modeled using statistical techniques such as factor analysis (FA), structural equation modeling (SEM), or Rasch measurement theory. These techniques get off the ground by making some mathematical assumptions about the data (FA and SEM) or by specifying requirements that the data must meet (Rasch). From there, they use mathematical functions to explore or confirm the relationships among manifest variables (usually items on a test or survey). These patterns of correlation can be simple or complex, and they help us make inferences about the unobserved variable that causes all the manifest variables.
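As an illustration of the Rasch approach, the sketch below uses the dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between a person's ability θ and an item's difficulty b. It assumes the item difficulties are already known (the values are hypothetical) and finds the maximum-likelihood ability estimate with Newton-Raphson. Calibrating the difficulties and checking whether the data actually meet the model's requirements is not shown.

```python
import math


def rasch_prob(theta, b):
    """Rasch model: P(correct) for a person of ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability estimate by Newton-Raphson, treating item difficulties as known."""
    score = sum(responses)
    if score in (0, len(responses)):
        # All-wrong and all-right patterns have no finite ML estimate.
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                  # observed minus expected score
        information = sum(p * (1 - p) for p in probs)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical, already-calibrated items
theta_1 = estimate_ability([1, 1, 1, 0, 0], difficulties)
theta_2 = estimate_ability([0, 1, 1, 0, 1], difficulties)
print(round(theta_1, 3), round(theta_2, 3))
```

The two response patterns above have the same number correct and receive the same estimate: under the Rasch model, the raw score is a sufficient statistic for ability.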

Many people of a strong empirical tendency are instantly skeptical when they hear that we are making inferences from observables to unobservables. After all, moving science towards “observation language” was a major project of analytic philosophy for much of the 20th century. Without going too deeply into this debate, for now I’ll simply observe that we couldn’t get very far without some sort of reasoning about the common causes of observable things. When you come home after being away for the day and find that 1) there is a new hole in the sofa, 2) there is stuffing on the floor, and 3) your dog is giving you that guilty¹ look, you have no difficulty inferring the common cause of these observables. This inference is defeasible - meaning that it could be wrong - but it is justifiable.

What is realism about latent variables?

In general, ontological realism refers to the idea that at least some things exist independently of the mind. The most dramatic alternative to realism is idealism, which asserts that nothing exists independently of the mind. There are various epistemic (that is, knowledge-related) positions that we could use to qualify our stance on the existence of mind-independent things: naive realism is the position that it is easy to know things about reality, while critical realism is the position that it is very hard to know things about reality. Some positions, like logical positivism (e.g., Carnap) and radical constructivism (e.g., von Glasersfeld), conflate ontology and epistemology - both claim in different ways that, because we cannot directly observe reality, it either doesn’t exist or is actually multiple fragmented “realities.” (This paragraph is a broad sketch to get us going - there are some versions of empiricism and constructivism that make more sense.)

Realism about latent variables means that latent variables have an existence independent of their measurement or use as theoretical terms. Things like “satisfaction with the program” or “reading ability” are entities or attributes with properties that exist whether or not we are looking. These constructs are not merely statistical fictions created to simplify data analysis but rather represent genuine psychological or scientific phenomena. They are not just tools for modeling and understanding observable behavior, and they are not reducible to an “operational” definition. We can’t define the thing we are measuring solely in terms of the manifest variables we use to measure it.

Realism about latent variables does not commit us to the idea that we can reliably know anything about them. This is perhaps the single most important philosophical lesson about realism that I can teach. Why is measurement of latent variables so hard? It turns out that we need to get a lot of things right for it to work. We need to have an excellent and complete definition of the construct (the thing we are measuring). We need to be able to measure all its important dimensions or to cut it up into smaller unidimensional constructs. We need to remove as much noise as possible from the measurement process by removing construct-irrelevant dimensions. An academic researcher could spend an entire career trying to accurately measure a latent variable and still fall short. People who are obsessed with IQ demonstrate immense hubris in this regard.
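Why construct-irrelevant dimensions matter can be shown with a small simulation. The setup is hypothetical, not a real instrument: six items each load 0.7 on the trait we want, and in the contaminated version every item also loads 0.5 on a shared nuisance factor (think of heavy reading demands on a math test). Item variances are held near 1 in both versions, so the only difference is whether part of each item's error is shared across items.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sum_score_validity(nuisance_loading, n_items=6, n_people=5000, seed=7):
    """Correlation between the sum score and the true trait in a hypothetical test.

    Each item = 0.7 * trait + nuisance_loading * nuisance + independent noise,
    where the nuisance factor is one construct-irrelevant dimension shared by every item.
    """
    rng = random.Random(seed)
    trait_loading = 0.7
    noise_sd = (1 - trait_loading ** 2 - nuisance_loading ** 2) ** 0.5
    traits, totals = [], []
    for _ in range(n_people):
        t = rng.gauss(0, 1)  # the trait we want to measure
        u = rng.gauss(0, 1)  # shared construct-irrelevant dimension
        total = 0.0
        for _ in range(n_items):
            total += trait_loading * t + nuisance_loading * u + rng.gauss(0, noise_sd)
        traits.append(t)
        totals.append(total)
    return pearson(totals, traits)


clean = sum_score_validity(nuisance_loading=0.0)
contaminated = sum_score_validity(nuisance_loading=0.5)
print(f"clean: {clean:.2f}, contaminated: {contaminated:.2f}")
```

Independent noise partly cancels out when items are summed, but a shared construct-irrelevant dimension does not, so under these assumptions the contaminated sum score tracks the trait noticeably less well (population values of roughly 0.92 versus 0.79).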

Nevertheless, realists know that if we don’t try to measure latent variables, we will be missing a lot of what is really going on in the world. US inflation has slowed down according to economic indicators and wages are rising, but how do people feel about the economy? The answer to this question will matter for both markets and elections. The personality trait of neuroticism appears to predict chronic illness² - wouldn’t it be great to develop health interventions that work well for neurotic people? We have to identify them first.

What does this have to do with evaluation?

Imagine explaining the following to your evaluation stakeholders:

To show that the program worked, we measured participants’ level of domain knowledge before and after the training. Of course, domain knowledge in this area doesn’t actually exist independent of our attempt to measure it, or it may exist in one reality while participants exist in another reality, honestly we aren’t sure.

I imagine that the follow-up questions would be versions of “What?” and “Huh?”. The fact that some evaluators privately think this but never say it out loud does not improve the situation much. The realist account would go something like this:

To show that the program worked, we measured participants’ level of domain knowledge before and after the training. Of course, domain knowledge can be hard to measure, so we took great care to validate our instruments using several sources of evidence, which you can find in the main report, including but not limited to statistical techniques. It is always possible to do a better job measuring constructs like “knowledge gain”, so we plan to continue refining our instruments until we can no longer identify areas for improvement or the program ends.
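Which statistical techniques belong in that body of evidence is left open here. One common ingredient is an internal-consistency check such as Cronbach's alpha, sketched below on hypothetical pre-test scores. Alpha speaks to reliability only; it cannot by itself show that the items measure domain knowledge rather than something else, which is why the other sources of evidence matter.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix (list of lists).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical pre-test: 5 participants x 4 knowledge items, each scored 0-3.
pre_test = [
    [1, 2, 1, 1],
    [2, 2, 3, 2],
    [0, 1, 1, 0],
    [3, 3, 2, 3],
    [2, 1, 2, 2],
]
print(round(cronbach_alpha(pre_test), 2))
```

The same function would be run on the post-test as well, since an instrument's reliability is a property of scores from a particular administration, not something established once and for all.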

This sounds like the beginning of a good conversation about the program. The main issue here is not just that the alternatives to realism sound ridiculous, but that the entire evaluation enterprise hinges on the constructs being real. Programs are supposed to cause things to happen - real things. If our evaluation is not picking up on those things, then it isn’t an evaluation.

In my last post, I wrote about realist evaluation and mechanisms. Realist evaluation is not the same thing as latent variable realism (my topic today) but they are related. Realist evaluation asks us to identify the mechanisms by which programs do what they do rather than to evaluate the program’s effects as a black box. Latent variables are often part of the mechanism in human services programs. Training programs work by increasing knowledge (latent), mental health programs work by reducing the severity of mental illnesses (latent), and so forth. If we don’t think that these are real things, then it is unclear from the perspective of realist evaluation 1) what programs are supposed to be doing, and 2) how we might identify mechanisms.

¹ Yes, I know that dogs aren’t guilty; they’re actually scared. Like I said, inference to unobservable common causes is difficult and theory-laden.

² Hudek-Knežević, J., & Kardum, I. (2009). Five-factor personality dimensions and 3 health-related personality constructs as predictors of health. Croatian Medical Journal, 50(4), 394-402.
