Abductive Coding

A good default strategy for qualitative data analysis in evaluation. (Vila-Henninger et al., 2024)

Jan 06, 2025

The ability to handle qualitative data is an important skill for an evaluator. I have the good fortune to train a lot of evaluators on methodology and to have taught both qualitative and quantitative methods at the undergraduate and graduate levels. I find that the easiest way to summarize the sorts of qualitative methods I teach and commonly encounter is to place them on a continuum from theory-driven to data-driven. I notice that many researchers seem naturally drawn to one of these poles – perhaps it’s a matter of personality. People endowed with strong beliefs and a sense that the world generally fits together in some logical (if not necessarily rational) way tend to gravitate towards theory-driven approaches. People who see the world as full of paradoxes, contradictions, and devilish details seem to go for data-driven methods. The more qualitative data analysis I do as an evaluator, though, the more I’m convinced we need to do both. This is my main argument for abductive coding versus the common alternatives.

But first: what is abduction?

You’ve heard of deduction. You’ve heard of induction. How about abduction?

Abduction is a form of reasoning that focuses on generating explanatory hypotheses from surprising or unexpected observations. Unlike deduction, which starts with general principles and applies them to specific cases, or induction, which derives general conclusions from specific instances, abduction seeks to identify an explanation for a puzzling phenomenon. In simpler terms, it's detective work: you start with an intriguing clue and work backwards to construct a plausible scenario that accounts for it.1

To illustrate, imagine you are evaluating a program designed to improve literacy rates among elementary school students. After implementing the program, you find, surprisingly, that while reading comprehension scores have increased as expected, vocabulary scores have stagnated or even declined. This unexpected outcome calls for an abductive inquiry to understand what might be happening.

You might start by revisiting the program components and the existing literature on literacy development. Perhaps the program, while effective in enhancing reading comprehension strategies, inadvertently reduced the time devoted to vocabulary instruction. Or maybe the assessment tools used to measure vocabulary were not sensitive enough to capture the specific types of vocabulary gains fostered by the program.

Through an iterative process of considering different possible explanations, gathering further evidence, and refining your hypotheses, you arrive at a plausible explanation that accounts for the unexpected results. You gather more information and decide whether your hypothesis accounts for it. If there are more surprises in the new data, you refine your theory until there are no more surprises. This kind of reasoning, which seems to me like an integral part of science, was first named by the American pragmatist philosopher CS Peirce.2

Abductive reasoning forms the basis of a form of qualitative coding called “abductive coding” (Vila-Henninger et al., 2024).3 In this essay, I’m arguing that it should probably be your default way to handle qualitative data as a program evaluator. Of course, certain specific situations may call for other approaches, but I think abductive coding should be your first stop. Before I explain why this is the case, let me tell you how abductive coding works.

How to Perform Abductive Coding

Step 1: Generating an Abductive Codebook

Start with a Deductive, Broadly-Themed Codebook: Begin by creating a codebook based on existing theories and the research questions driving your analysis. This codebook should be comprehensive, encompassing multiple theoretical perspectives related to your research questions. For instance, in an evaluation of a temporary housing program for people with criminal records, your initial codebook might include codes for "job security," "housing stability," and "rehabilitation."

Integrate Inductive Codes through Group Coding: Apply the initial codebook to a strategically chosen, diverse subset of your data. During this process, allow for the emergence of inductive codes that capture unexpected patterns or anomalies observed in the data. For example, while coding data on the housing program, you might discover an emergent theme of "interactions with family" that was not initially anticipated. Regularly convene with your research team to discuss these inductive codes, refine their definitions, and integrate them into the codebook.

Finalize the Abductive Codebook: After saturating the inductive coding process, finalize the codebook. Apply this comprehensive codebook to the entire corpus of data, recoding previously coded segments to maintain consistency.
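
As a rough illustration of Step 1, here is a minimal Python sketch of a codebook that records whether each code came from theory (deductive) or from the data (inductive). The class names, the `origin` field, and the short definitions are my own assumptions for illustration; they are not taken from Vila-Henninger et al. or from any particular qualitative analysis software.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    name: str
    definition: str
    origin: str  # "deductive" (from theory) or "inductive" (from the data)


@dataclass
class Codebook:
    codes: dict = field(default_factory=dict)
    finalized: bool = False

    def add(self, name, definition, origin):
        # Once the team has finalized the codebook, new codes are refused
        # so the full corpus is coded against one stable set of codes.
        if self.finalized:
            raise ValueError("codebook is finalized; no new codes accepted")
        if origin not in ("deductive", "inductive"):
            raise ValueError(f"unknown origin: {origin}")
        self.codes[name] = Code(name, definition, origin)

    def by_origin(self, origin):
        # Alphabetical list of code names with the given origin.
        return sorted(c.name for c in self.codes.values() if c.origin == origin)

    def finalize(self):
        self.finalized = True


codebook = Codebook()

# Deductive codes from theory and the evaluation questions
# (definitions are hypothetical placeholders).
codebook.add("job security", "Getting or keeping paid work", "deductive")
codebook.add("housing stability", "Keeping a steady place to live", "deductive")
codebook.add("rehabilitation", "Progress away from reoffending", "deductive")

# Inductive code that emerged while coding the diverse subset.
codebook.add("interactions with family", "Contact with relatives", "inductive")

codebook.finalize()
print(codebook.by_origin("deductive"))
print(codebook.by_origin("inductive"))
```

Keeping the origin of each code makes it easy to report later which parts of the analysis were expected and which were surprises.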

Step 2: Abductive Data Reduction Through Code Equations

Construct Code Equations for Text Reduction: Craft "code equations" by combining codes using Boolean operators like "AND," "OR," and "COOC" (co-occurrence). These equations operationalize complex phenomena that extend beyond individual codes, allowing for a more focused analysis. For example, to investigate the relationship between housing and jobs, you could create an equation like "Housing Stability AND Job Security AND Sobriety."
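
To make the idea of a code equation concrete, here is a small Python sketch that treats each coded text segment as a set of codes and builds equations from composable `AND` and `OR` functions. The helper names (`has`, `AND`, `OR`, `apply_equation`) and the four example segments are hypothetical. I leave out COOC because its exact meaning depends on how your software defines overlapping passages; in this sketch the segment is the unit of analysis, so `AND` on one segment already behaves like a simple co-occurrence check.

```python
# Each segment is a coded excerpt: an id plus the set of codes applied to it.
# These segments are made-up illustrations, not real data.
segments = [
    {"id": 1, "codes": {"housing stability", "job security"}},
    {"id": 2, "codes": {"housing stability"}},
    {"id": 3, "codes": {"job security", "sobriety"}},
    {"id": 4, "codes": {"housing stability", "job security", "sobriety"}},
]


def has(code):
    # True when the given code was applied to the segment.
    return lambda codes: code in codes


def AND(*parts):
    # True when every part of the equation holds for the segment.
    return lambda codes: all(part(codes) for part in parts)


def OR(*parts):
    # True when at least one part of the equation holds for the segment.
    return lambda codes: any(part(codes) for part in parts)


def apply_equation(equation, segments):
    # Return the ids of the segments that satisfy the equation.
    return [seg["id"] for seg in segments if equation(seg["codes"])]


# "Housing Stability AND Job Security AND Sobriety"
three_way = AND(has("housing stability"), has("job security"), has("sobriety"))

# "Job Security AND (Housing Stability OR Sobriety)"
nested = AND(has("job security"), OR(has("housing stability"), has("sobriety")))

print(apply_equation(three_way, segments))  # [4]
print(apply_equation(nested, segments))     # [1, 3, 4]
```

The point of the reduction is visible even in this toy case: the three-way equation narrows four segments down to the one excerpt worth reading closely.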

Verify Code Equations: Meticulously analyze the text excerpts identified by each code equation. Verify whether these excerpts genuinely reflect the phenomenon the equation intends to capture. Document any "false positives" — instances where the codes co-occur but the intended phenomenon is absent. For example, a text excerpt coded with "Housing Stability" and "Job Security" may not necessarily discuss "Sobriety." This verification process ensures the accuracy and analytical rigor of the coding. Revise the code equations as needed based on this analysis, refining them until they effectively capture the intended phenomena.
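
The verification step can be tracked with very little machinery. The sketch below assumes a reviewer has read every excerpt that a code equation returned and recorded a yes/no judgment about whether the intended phenomenon is really present. The function name `verify_equation`, the segment ids, and the judgments are hypothetical.

```python
def verify_equation(matched_ids, judgments):
    """Summarize a manual review of the excerpts matched by one code equation.

    matched_ids: ids of the segments the equation returned.
    judgments:   dict mapping segment id -> True if the reviewer agreed the
                 intended phenomenon is present, False if it is not.
    """
    unreviewed = [i for i in matched_ids if i not in judgments]
    if unreviewed:
        raise ValueError(f"segments not yet reviewed: {unreviewed}")

    valid = [i for i in matched_ids if judgments[i]]
    false_positives = [i for i in matched_ids if not judgments[i]]
    # Share of matched excerpts that genuinely show the phenomenon.
    share_valid = len(valid) / len(matched_ids) if matched_ids else 0.0
    return {
        "valid": valid,
        "false_positives": false_positives,
        "share_valid": share_valid,
    }


# Hypothetical review of four excerpts returned by one equation.
matched_ids = [1, 3, 4, 7]
judgments = {1: True, 3: False, 4: True, 7: True}

report = verify_equation(matched_ids, judgments)
print(report["false_positives"])  # [3]
print(report["share_valid"])      # 0.75
```

A low share of valid excerpts is a signal to revise the equation and run the review again, rather than a number to report to stakeholders.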

Step 3: In-depth Abductive Qualitative Analysis

Inductively Code Reduced Cases: For the text segments identified as valid instances of the phenomena under investigation (those matching the verified code equations), develop a further layer of inductive coding. This step goes deeper into the nuances of the identified phenomena, capturing granular details that may not be apparent at the broader code equation level.
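
One simple way to organize this second layer is to attach finer-grained sub-codes only to the verified excerpts and then group the excerpts by sub-code. The sketch below is an assumption about how you might keep track of that; the function name `group_by_subcode`, the ids, and the sub-code labels are all hypothetical.

```python
from collections import defaultdict


def group_by_subcode(verified_ids, subcodes):
    """Group verified excerpts under their second-layer inductive codes.

    verified_ids: ids of excerpts that passed code-equation verification.
    subcodes:     dict mapping excerpt id -> list of inductive sub-codes.
    Returns (groups, uncoded): sub-code -> excerpt ids, plus the ids of
    verified excerpts that have not received any sub-code yet.
    """
    groups = defaultdict(list)
    uncoded = []
    for seg_id in verified_ids:
        labels = subcodes.get(seg_id, [])
        if not labels:
            uncoded.append(seg_id)
        for label in labels:
            groups[label].append(seg_id)
    return dict(groups), uncoded


# Hypothetical sub-codes for excerpts that matched a verified equation.
verified_ids = [1, 4, 7]
subcodes = {
    1: ["steady address helped job search"],
    4: ["steady address helped job search", "shift work strained housing rules"],
}

groups, uncoded = group_by_subcode(verified_ids, subcodes)
print(groups)
print(uncoded)  # [7]
```

The list of uncoded excerpts is a small safeguard: it shows which verified cases still need a close read before the manual analysis begins.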

Conduct Manual Qualitative Analysis: Engage in in-depth qualitative analysis of the coded data. This stage involves interpreting the findings, identifying patterns, and constructing theoretical explanations grounded in the data. This process is iterative, demanding careful consideration of the codes, the relationships between them, and their theoretical implications. This step culminates in the articulation of research findings, contributing to theory development and refinement.

Abductive Coding is a Good Default for Evaluation

In evaluation, we usually don’t start coding data with no idea of what is going on. We’ve already consulted the literature, talked to program staff and participants, analyzed statistical data, and toured the facilities. It is silly to pretend we have a tabula rasa level of understanding of what is going on. It gets even sillier as we gain experience in the field. Thus, it makes sense to start with a hypothesized codebook.

However, we don’t want to miss anything that would defy our expectations, so we need a procedure to revise that codebook and capture surprises. In fact, we want to draw special attention to surprises, because they surprised a knowledgeable evaluator. This is why we want Step 1.

Next, the reason we are doing qualitative data analysis in evaluation is always to answer an evaluation question. I regularly end up with 300 pages of transcribed interviews that need to turn into an answer to a question that stakeholders asked. This requires data reduction. We need to determine what sorts of ideas in our datasets fall into which categories and the relationships between those categories. Code equations formalize those relationships at the highest level.

However, we also need to make sure we aren’t building castles in the sky. The code equations we end up with have to make sense in terms of the original data. So much for Step 2.

Just arriving at a series of categories that have a certain number of coded tokens in them is not usually very helpful for stakeholders, as in: “We found 20 instances of people complaining about the program and 25 instances of praising the program…” We want another layer of inductive interpretation within each category. Doing this makes writing about qualitative data much easier. We also want to look for patterns holistically among our data and extrapolate to human-level interpretations of what is going on here. I like to think of this as the “narrative layer” of the analysis, at which we tell the story of the data and analysis. People who are more skeptical of methodology usually just start at this step without the previous analysis, weaving stories about what they see. In high-stakes scenarios, like using qualitative data to make recommendations about the future of publicly-funded programs and policies, I would argue that we want these narrative layer analyses to be conditional on a systematic coding process provided in steps one and two. Thus, Step 3.

To understand the benefits of abductive coding, consider some alternative qualitative coding methods. Grounded theory is a very popular method for evaluators to say that they are using, but as basically everyone who is serious about grounded theory has pointed out by now, it is a sophisticated method and most people do it wrong.4 The main problem with grounded theory from an epistemic perspective is that it asks us to forget what we know and try to be purely inductive in order to build theory. However, this involves throwing away potentially important information, including our background knowledge as seasoned professionals.

Another alternative I commonly see in evaluations would be “open coding” using a framework like qualitative content analysis (QCA). This is a more lightweight framework from a theoretical perspective and it doesn’t make as many demands in terms of strict induction. We start coding the data, create a category for each new kind of token we find, and keep going until we have a full listing of all the categories we need to describe the data. We may do some cleanup on the categories in the name of parsimony, but that’s the basic idea. The issue here is that we tend to end up with coding frames that perfectly fit the data like shrink wrap, with no codes for anything that isn’t there (but perhaps which we might have expected) and no sense of the appropriate level of generality of the codes.

Abductive coding gives us the right mix of theory and empirical attention to detail. It puts a special premium on discovering surprising insights and telling us things we didn’t already know about the data. It has multiple rounds of quality checking built into the process. It helps scaffold the writing process of moving from “coded dataset” to results. I think that these features would help improve most of the evaluations that I read.

If you have a preferred qualitative coding method for evaluation data, what is it? If you’ve tried abductive coding, what was your experience? Drop me a comment or a direct message.

In this post, I’m talking about abduction in the original sense that CS Peirce used it. In contemporary philosophy, you will sometimes hear abduction defined as “inference to the best explanation.” That is an interesting kind of inference, but not the one I’m talking about today, and not the kind that abductive coding is about.

The collected works of Peirce are in the public domain. You can read about abduction here.

Dr. Vila-Henninger and colleagues have made their paper on abductive coding open access, so you can read a detailed version of the method I am summarizing for free here.

I highly recommend reading the original 1967 Glaser and Strauss The Discovery of Grounded Theory to everyone who is passionate about qualitative methods, particularly before having any public opinions about grounded theory. The original text will surprise you in ways that I hope you can abductively accommodate.

