10 Comments
Jane Smith:

Just to check my understanding, it sounds like your main critique is that “exploratory” evaluations are essentially glorified p-hacking.

Anthony Clairmont:

Hi Jane - thanks for reading! You're a step ahead of me as I'm going to get into the statistical methodology in a post coming up soon. P-hacking is definitely a concern I have, since they ran tests on so many potential outcomes. I see nothing wrong with exploratory evaluations (like goal-free evaluations) per se. In this post, my main critique of this exploratory evaluation is that they never got around to 1) choosing from among the many potential criteria for success that they nominated, 2) setting any standards within those criteria, or 3) rendering an evaluative judgment about whether the program works. Stuff happened, some of it looked good, but we don't get a verdict on whether it was good enough to justify the intervention. What do you think?

Jane Smith:

Do you think an exploratory evaluation should have pre-set criteria and come to a verdict about the worth of a program? I always thought the goal of exploratory evaluations was to surface questions, metrics, and learnings for future evaluations and program adaptation. Wouldn’t coming to a conclusion about the overall worth of a program be more of a question for a summative evaluation?

Based on your summary and analysis, it seems the problems with the MTO evaluation as described are:

1) that the results were presented as if they came from a more robust study rather than an exploratory one (that is definitely how the MTO study was presented in my classes, with its lack of pre-set outcomes relegated to the fine print); and

2) that the MTO evaluation findings were not contextualized among other programs that cost a similar amount and/or produced results of a similar magnitude.

Perhaps you are making the argument that evaluation should always contextualize results in this way, regardless of evaluation approach. I definitely wish this were more common.

Anthony Clairmont:

Your remarks here are highlighting something about the MTO study that doesn't sit right with me: the "final" evaluation report is presented as a summative evaluation but the methodology is that of an exploratory study. I think that in formative evaluation, I still want to have criteria and standards. Perhaps we should coin a term for a type of "research inquiry" or something that doesn't actually make any evaluative claims so that we don't have to worry about the central logic of evaluation when conducting one.

I agree with your first and second points. My main concern about the evaluation is that there are no criteria or standards for success, which makes it a non-evaluation.

Finally, I want to be clear that I am not above my own critique here. I've worked on plenty of evaluations, particularly earlier in my career, that don't meet my current definition of evaluation. It's easy to accidentally do "program research", particularly if key data don't come through or you run out of runway on time or budget. My goal in these pieces is not to wag my finger at practitioners - it's really to try to generate some coherent theoretical groundwork for myself and others.

Jane Smith:

Interesting—so what criteria or standards for success might an exploratory evaluation use?

Anthony Clairmont:

To paraphrase Joseph Wholey's chapter in the Handbook of Practical Program Evaluation (since I have it handy on my shelf!), exploratory evaluation commonly refers to a couple of things. It can be an evaluability assessment, which has its own criteria based on the readiness of the program. It can also be a rapid feedback evaluation, which "begins only after there is agreement on the goals (including goals for assessing, controlling, or enhancing important side effects) in terms of which a program is to be evaluated." Thus, in the case of RFE, I think the criteria should still be set upstream of our data collection.

Jane Smith:

This is very helpful, thank you! Have you considered making a Substack post with something like your “recommended resources” or “10 books that brought me here”?

Julian King:

Thanks also for linking to my article about different VfM methods. I would argue that CBA doesn’t let us off the hook from selecting criteria. It comes packaged with a single criterion (Kaldor-Hicks efficiency), so when we select CBA we are taking up a values position whether we declare it (or know it) or not. I argue, let’s define explicit, context-specific criteria and standards first, and then decide whether CBA has a place among our mix of methods.

Anthony Clairmont:

Thanks for the insight, Julian - you've introduced me to the idea of Kaldor-Hicks efficiency. What you're saying about defining criteria before selecting a method makes a lot of sense.

Julian King:

Another great article, thanks Anthony! Heather Nunns reviewed a sample of public sector evaluations in NZ and found only a minority of them (8/30) fully modelled explicit evaluative reasoning:

https://www.semanticscholar.org/paper/Evaluative-reasoning-in-public-sector-evaluation-in-Nunns-Peace/f0638341627a7fd7f1b5b3f944bc5b1051f9b5fe
