I read a lot of evaluations in my daily work. It’s not uncommon for me to begin my own evaluations by summarizing all the previous reports that have been written about an evaluand, sometimes going back many years if the organization is a large one. What I have found while doing this is that the overall quality of evaluation work is poor. Most things that are called “evaluations” do not actually evaluate anything.
While reading these reports, I could sort them into two piles. The first pile would contain evaluations that are good in theory but fail in execution. The second pile would be the much larger stack of evaluations that are not even on the right track to begin with, so their execution doesn’t matter. People in academia who ask me about my work tend to assume that most of the problems with evaluation are of the former sort: inadequate sample sizes, randomization failures, issues collaborating with institutions, and so forth. One of the factors they generally underestimate, and which will land a report in the second pile, is the problem of pseudoevaluation.
Daniel Stufflebeam introduces the issue of pseudoevaluation like this:
Evaluators and their clients are sometimes tempted to shade, selectively release, or even falsify findings. While such efforts might look like sound evaluations, they are aptly termed pseudo-evaluations if they fail to produce and report valid assessments of merit and worth to all right-to-know audiences.
Pseudoevaluations often are motivated by political objectives. For example, persons holding or seeking authority may present unwarranted claims about their achievements and/or the faults of their opponents or hide potentially damaging information. These objectionable approaches are presented because they deceive through evaluation and can be used by those in power to mislead constituents or to gain and maintain an unfair advantage over others, especially persons with little power. If evaluators acquiesce to and support pseudoevaluations, they help promote and support injustice, mislead decision making, lower confidence in evaluation services, and discredit the evaluation profession.
Stufflebeam¹ names two types of pseudoevaluations: public relations-inspired studies and politically controlled studies. He considers these “bad and questionable practices” because they do not produce and report valid assessments of merit and worth to all audiences who have a right to know.
These two types of pseudoevaluation are similar but distinct, so I’ll compare and contrast them before getting further into our discussion.
Public Relations-Inspired Studies
PR-inspired studies do not seek truth but rather aim to create a favorable impression of a program, often by acquiring and broadcasting biased information. They are driven by the information needs of propagandists (Stufflebeam’s term) and seek to confirm an organization’s claims of excellence and secure public support. They avoid gathering or releasing negative findings, relying on methods like biased surveys, inappropriate norms, and the selective reporting of positive results. The fatal flaw of these studies is the built-in bias to report only positive information, which misleads stakeholders.
These evaluations are built to fail. When the client says “don’t look in that cupboard,” the evaluator dutifully leaves it closed. When negative or even neutral results arise, the evaluator succumbs to pressure to quash them.
Politically Controlled Studies
Politically controlled studies may seek the truth, but the findings are not shared with all right-to-know audiences. The client's purpose is to gain or maintain influence, power, or money, and the questions addressed are those of interest to the client and special groups who share their aims. Typical methods include covert investigations, private polls, and the selective release of findings. These studies can be considered pseudoevaluations when clients violate agreements to disclose some or all of the findings.
As a point of clarification, Stufflebeam does not argue that all evaluation reports need to be public information by default. Rather, he is arguing that it is malpractice to hide evaluation findings from right-to-know audiences when there is a pre-existing agreement to share them with those specific audiences.
Put differently, we could fix a politically controlled study by sharing methods and results more transparently with the right-to-know audiences. The evaluation study itself may be well done, but powerful actors have prevented the whole truth from going public.
Comparing the two types of pseudoevaluations
Both types of pseudoevaluations are motivated by political objectives, where people in power present unwarranted claims or hide damaging information. In PR-inspired studies, this happens at the level of the study design, while in the politically controlled studies this happens at the level of dissemination. The final effect is the same, although the damage caused by politically controlled studies is easier to fix – all it takes is for a whistleblower to emerge.
Smart authoritarians will choose PR-inspired studies, which provide political cover for their activities, over politically controlled studies, which create potential ammunition for their enemies. The exception to this logic occurs when authoritarians actually need a good evaluation for decision-making purposes, so they take the risk.
PR-inspired studies call for more “dirty tricks”, so there is usually a black-hat social scientist or statistician in the mix somewhere to make sure that the study arrives at the positive results the PR department (or their equivalent representative) expects. This may show up as a requirement to work closely with an in-house “analyst” who has final say over all instruments. You will usually realize you are involved in a PR-inspired pseudoevaluation before it’s over. Politically controlled evaluations are harder for evaluators to avoid, since the study itself may be of high quality; the evaluator may be given assurances that the results will be shared with all right-to-know parties, then be politely ushered off the project.
At a systemic level, both kinds of pseudoevaluation tend to create the same problem for the evidence base, namely publication bias. Reporting only positive results while hiding negative or null findings, whether by biasing the instruments or by suppressing the unfavorable numbers after the fact, generates a lopsided evidence base for future decision-makers. The next time someone is trying to decide whether this sort of intervention or product is a good idea, they will do a literature review and see that, on balance, the approach works, even if this isn’t actually true. In other words, both kinds of pseudoevaluation not only mislead the public about the particular evaluand, they actually make progress less likely for everyone else.
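To make the mechanism concrete, here is a minimal, hypothetical simulation of how selective reporting skews a literature review. The effect size, noise level, and reporting threshold are invented for illustration, not drawn from any real evaluation literature.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.0   # assume the intervention actually does nothing
N_STUDIES = 200     # hypothetical number of small evaluations
NOISE_SD = 1.0      # sampling noise in each study's estimate

# Each "study" produces a noisy estimate of the true effect.
estimates = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N_STUDIES)]

# Under a pseudoevaluation regime, only favorable-looking results get released.
published = [e for e in estimates if e > 0.5]

print(f"Mean effect across all {len(estimates)} studies: {statistics.mean(estimates):+.2f}")
print(f"Mean effect across the {len(published)} published studies: {statistics.mean(published):+.2f}")
# The literature review sees a solidly positive effect for a null intervention.
```

The reviewer who only ever sees the published pile concludes that the intervention works; the bias lives in the filter, not in any individual report.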
Are pseudoevaluations “approaches”?
Stufflebeam’s monograph divides the field of evaluation into 22 different “approaches” and then evaluates these approaches on disciplinary criteria. He says that he “uses the term evaluation approach rather than evaluation model because the former is broad enough to cover illicit as well as laudatory practices” (p.9). However, I do think it is interesting to ask whether “approach” is the right word to describe pseudoevaluations, since it implies that these are ways that people choose to initially set up an evaluation. In my experience, a better word for these two types of pseudoevaluation, borrowed from dynamical systems theory, would be attractor, since attractors are states towards which systems tend to evolve. I would wager that many people who are involved in pseudoevaluations did not begin with the intention of doing things this way. Rather, they responded to incentives in their local system as the evaluation developed. What might these incentives be?
Early results are received poorly by leadership, who react by punishing or threatening stakeholders who are involved with the evaluation.
The organizational environment becomes more politically hostile during the timespan of the evaluation, causing stakeholders to fear the consequences of public disclosures.
The organization starts having problems in other areas that have nothing to do with the evaluand and stakeholders decide that the evaluation could give them a “win” if they can control the results.
Internal politics can cause the futures of particular people to become yoked to the perceived success or failure of the evaluand. These politics can change while the evaluation is ongoing, even if they were not there at the beginning.
In other words, the relationship between stakeholders and the evaluator is a relationship between a principal and an agent. This relationship can get out of sync, and the interests of the evaluator can begin to diverge from the interests of at least some of the stakeholders, creating a classic principal-agent problem. In most cases, the evaluator is responsible for upholding the interests of many stakeholders, which makes this a multiple principal problem: a special kind of principal-agent problem in which the agent is tasked with taking actions on behalf of multiple principals. One popular attempt to deny that this is a problem is utilization-focused evaluation, which holds that evaluators are beholden to only one principal – the primary intended user. Rather than define the problem out of existence, I prefer to stare it in the face and think in terms of user networks.
As a result of the multiple principal problem, the principal-agent relationship that sets the stage for a good evaluation is fragile. Disturbances can undermine it easily, sending the system on a journey towards one of the pseudoevaluation attractors.
This distinction matters, I think, because treating pseudoevaluation as a type of evaluation implies that it is a fixed category. We either are or are not doing a pseudoevaluation, and once we admit that we are doing one, it’s conceptually difficult to imagine fixing the problem. If we think of pseudoevaluation as an attractor, there may be actions we can take to push the evaluation’s trajectory back towards a different state, and it becomes easier to imagine fixing an evaluation that has gone pseudo.
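For readers who haven’t met the dynamical-systems vocabulary, here is a toy sketch of what an attractor is. The potential function and parameters are arbitrary and purely metaphorical; the point is only that small early nudges decide which stable state the system settles into.

```python
# Toy attractor demo: a 1D system with a double-well "potential"
# V(x) = (x**2 - 1)**2 has two stable states (attractors) at x = -1 and x = +1.
# Whichever basin a trajectory starts in, or is nudged into, it settles there.

def drift(x: float) -> float:
    """Negative gradient of V(x) = (x**2 - 1)**2."""
    return -4 * x * (x**2 - 1)

def settle(x0: float, steps: int = 10_000, dt: float = 0.01) -> float:
    """Follow the dynamics from x0 until the trajectory settles."""
    x = x0
    for _ in range(steps):
        x += dt * drift(x)
    return x

# Starting points that differ only slightly end up at different attractors.
for start in (-0.2, -0.05, 0.05, 0.2):
    print(f"start {start:+.2f} -> settles near {settle(start):+.2f}")
```

The analogy: an evaluation that starts out basically sound can still be nudged, by incentives like those listed above, into the basin of one of the pseudoevaluation attractors.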
How do we avoid faking it?
When evaluators participate in pseudoevaluations, they promote injustice, mislead decision-making, and discredit the profession. How do we avoid this situation?
The first step is, of course, prevention. To prevent PR-inspired evaluation, watch out for intense pressure to go against best practices and your own professional standards by changing survey instruments or interview questions. One red flag I have learned to look for is stakeholders who get openly angry about the development of a research instrument. Provided you haven’t done anything really weird, anyone who yells and stamps their feet about a survey question is engaging in a theatrical performance to put political pressure on you. Being desperate to please and conflict averse will make you a bad evaluator and, more generally, open to manipulation. To prevent politically controlled evaluations, start planning for the sharing of results early in the study, before results are collected. Get a list of right-to-know parties and include them in meeting invitations as the evaluation goes along. Don’t create easy opportunities for stakeholders to get cut out of the process. If possible, disseminate results yourself instead of relying on stakeholders to do their own dissemination.

If you find yourself in a situation in which certain political forces are already conspiring to push things towards pseudoevaluation, it’s time to take a step back and strategize. Do not go with the flow. If you see the evaluation moving toward a PR-inspired evaluation, it may be time to do more education about the fundamentals of evaluation with stakeholders. Return to what they really want to know and why this is important to them. Explain that biasing the data will waste their time and resources. Play through some positive scenarios that can happen if they do evaluation right. If you see the evaluation moving towards the politically controlled attractor state, start talking explicitly about the situation with stakeholders. Point out the benefits of involving right-to-know parties and the high chances that they will find out about the results anyway, sooner or later.
If it is too late for all of this, you may need to leave the evaluation. While this will probably cost you money in the short term, putting your name to a pseudoevaluation will cost you credibility and self-respect in the long term. Talking to evaluators with longer careers than mine, I’ve heard several harrowing stories of uncovering fraud and major mismanagement – none of these evaluators expressed any regret about doing the right thing and walking away. Unfortunately, however, individual acts of conscience will not be enough to get us out of the pseudoevaluation problem.
Winning the coordination game
What happens to clients whose evaluators walk away after refusing to conduct a pseudoevaluation? Presumably, they hire another evaluator. This evaluator in turn has a choice to conduct a pseudoevaluation or quit, and the scenario repeats. In game theory, this is a coordination game: a situation in which each actor does best by aligning their choice – to “cooperate” or “defect” – with what the other actors are doing. If all evaluators commit to rigorous, high-quality evaluations, then these will become the norm. If some evaluators succumb to incentives for bias, they may gain short-term rewards (funding, continued contracts), creating pressure for others to follow suit.
The more evaluators who participate in pseudoevaluation, the harder it is for any single evaluator to remain honest—because funders will simply hire the ones who produce favorable results. This creates a self-reinforcing equilibrium: as more evaluations become politicized, organizations expect evaluations to serve political purposes, and evaluators who insist on truthfulness may be starved out of the system.
Now imagine that this goes on for a long time. Do things get better or worse? Eventually they stabilize in a Nash equilibrium, a state in which no player can improve their own situation by unilaterally switching to a different strategy. A Nash equilibrium in this context emerges when 1) no individual evaluator benefits from unilaterally switching to a more truthful stance, because doing so would cost them future contracts, and 2) no stakeholder benefits from demanding more rigorous evaluation, as doing so could expose the weaknesses of their evaluand. Once this equilibrium is reached, pseudoevaluation becomes an attractor.
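Here is a minimal sketch of this game with a single evaluator and a single client. The payoff numbers are invented purely to illustrate the structure; the brute-force check simply tests which strategy pairs leave neither player with a profitable unilateral deviation.

```python
from itertools import product

# Hypothetical payoffs (evaluator, client); the numbers are invented for
# illustration only. Evaluator strategies: "honest" or "pseudo";
# client strategies: "rigor" or "favorable".
PAYOFFS = {
    ("honest", "rigor"):     (3, 3),   # both coordinate on sound evaluation
    ("honest", "favorable"): (0, 1),   # evaluator refuses, loses the contract
    ("pseudo", "rigor"):     (1, 0),   # biased work for a client who wanted rigor
    ("pseudo", "favorable"): (2, 2),   # both coordinate on pseudoevaluation
}

EVALUATOR_MOVES = ("honest", "pseudo")
CLIENT_MOVES = ("rigor", "favorable")

def is_nash(e_move: str, c_move: str) -> bool:
    """A profile is a Nash equilibrium if neither player gains by deviating alone."""
    e_pay, c_pay = PAYOFFS[(e_move, c_move)]
    best_e = all(PAYOFFS[(alt, c_move)][0] <= e_pay for alt in EVALUATOR_MOVES)
    best_c = all(PAYOFFS[(e_move, alt)][1] <= c_pay for alt in CLIENT_MOVES)
    return best_e and best_c

for profile in product(EVALUATOR_MOVES, CLIENT_MOVES):
    if is_nash(*profile):
        print("Nash equilibrium:", profile)
# Prints both ('honest', 'rigor') and ('pseudo', 'favorable').
```

With these illustrative payoffs the game has two pure-strategy equilibria, one good and one bad, which is exactly what makes it a coordination problem: which one the field lands in depends on what everyone expects everyone else to do.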
The main reason I like to play out Nash equilibria in this way is that they are excellent for showing us the consequences of a failure to cooperate. Nash equilibria result from individuals acting only in their own interests and only as atomized units. The attractor in this equilibrium is the default condition towards which the system drifts unless a major intervention disrupts it. What might this major cooperative intervention be? Well, for starters, a more aggressive professional association analogous to the American Medical Association or the American Bar Association would be able to name and shame bad actors, such as stakeholders who demand pseudoevaluation. If this association offered testing and licensure, it could withdraw licensure from professionals who repeatedly produced biased evaluations. It could provide legal support and professional insurance for smaller evaluation firms who might be vulnerable to lawsuits from more powerful clients. It could lobby for policy changes, such as rules in government contracting that shift from hiring evaluators based on previous “satisfactory” results to rewarding those whose findings are later validated by independent audits.
There are other cooperative solutions as well, such as creating a large union (think SAG-AFTRA for evaluators), but the strategy is the same from a game theoretic standpoint: shift the incentives by imposing additional costs on players who engage in pseudoevaluation.
¹ Every time I say “Daniel Stufflebeam” out loud in a classroom or similar setting someone snickers at his last name. At these times I am moved to mention that, like several of the foundational figures of our discipline, Stufflebeam was a certifiable badass, whose accomplishments included founding the renowned evaluation center at Western Michigan University and the Center for Research on Educational Accountability and Teacher Evaluation (CREATE). He was tough too, surviving being hit by a bus and a kidnapping by Basque separatists. See my previous post on his CIPP framework.