Audrey Watters, scholar and author at Hack Education, once told me that if an email response requires more than 500 words, you should just hit PUBLISH. So here I am…

Thing is, I’m not quite sure if I’m answering the right question. That should bother me, but it doesn’t, because much bigger issues around the question have been nagging at me since I started this response last week. So, while I wait for clarification of the intent of the initial inquiry that sparked my little internal dialog, I’m just going to continue with what’s perplexing me.

This current email notwithstanding, I’ve had a number of conversations lately about whether studies exist to link X (some form of edtech) with Y (some measure of “student achievement”). Really, this is not an unreasonable request. Billions of dollars have been invested in educational technology, so does it work? Well, herein lies the rub.

Short answer: I have no idea.

Since I hit publish, obviously there’s a long answer coming.

X -> Y

Asking whether X leads to Y implies a relationship between inputs and outputs. For example, a recent Brookings article asked if E-rate funding (input) led to improved student outcomes in the form of test scores (output). Similarly, the 2015 OECD report examined the relationship between access to technology (input) and PISA scores (output). The email that sparked this post asked if any specific plans/curricula with technology (input) led to increased student outcomes (output). As I said, I’m still waiting on clarification, so I am not sure if this person associated outcomes with test scores, but I’m making that assumption for now.

At the crux of this discourse lies an often overlooked component of these sorts of questions: what happened between X and Y? Leviton and Lipsey refer to this as “Black Box” treatment theory – an input goes into a black box and an output emerges, but what matters most is what happened inside. Understanding that requires two things: a theory of treatment to explain what should happen and a logic model to document the intended process.

Let’s take personalized learning as an example. I’ve been reading about it in the form of the effects of blended learning, Artificial Intelligence, learning analytics, and adaptive platforms. If personalized learning is the solution or the input… what’s the problem it’s trying to solve? (I wish that my blog had sound effects, because this is when you’d hear the screech of a record needle.) This lack of problem definition is actually the first big issue. For the moment, let’s assume that test scores – serving as a proxy for “achievement” – are the problem. Would raising test scores actually solve the problem intended to be addressed by personalized learning? I’m going to let this philosophical, yet critical, question hang for now. This post is about measuring the effects of edtech, so let’s get back on track.

So, why make the assumption that personalized learning solves the problem of raising test scores? In other words, what’s the theoretical underpinning? What framework would support this theory of change? Since we are working off of a hypothetical, let’s go with Universal Design for Learning as our hypothetical theoretical framework. If learning is universally accessible, then students have the scaffolds and supports to develop as expert learners; and if they work to become expert learners, then they gain the knowledge, skills, and attitudes to demonstrate their learning and understanding, which would manifest as “improved achievement” (aka. test scores).

Hold on a second… are test scores the output or is achievement the outcome? In which case, shouldn’t “achievement” be defined more holistically to include the tenets of expert learning? Because, could it be possible that test scores might go up before a student really develops into an expert learner?

Correlation & Causation

Now, let’s stop here for a second and think about how relationships are represented in research. Broadly speaking, researchers talk about correlation and causation. Correlation refers to the idea that some statistical relationship exists between two items. These relationships are often illustrated by scatter plots, where a clustering of data points suggests some form of relationship. However, as Edward Tufte has famously illustrated, and as a recent article from John Jerrim at University College London beautifully demonstrates, correlations can be somewhat absurd. As a warning about the upcoming release of PISA scores, Jerrim explains the need to critically analyze any correlational data. To make his point explicit, he illustrates the comical – yet correlational – relationship between eating fish and ice cream and increased PISA scores.

Correlational data has also created issues for those of us in the edtech space. From the statistics discussed in Dr. Jean Twenge’s iGen linking the rise in screen time to teenage depression rates, to recent articles associating screen time with decreasing NAEP reading scores, these reports can instill panic and mask deeper understanding of problems and solutions. And yet, though I haven’t run the numbers myself, consumption of ice cream and fish could also present similar correlations, so this data needs to be viewed critically and from multiple perspectives.
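
To see just how easily a fish-and-ice-cream correlation can appear, here’s a minimal sketch in Python with entirely made-up numbers: two series that have nothing to do with each other but both happen to drift upward over time, and the correlation coefficient that falls out.

```python
# Toy illustration of a spurious correlation (all numbers are invented).
import numpy as np

rng = np.random.default_rng(seed=0)
years = np.arange(2000, 2020)

# Two unrelated quantities that both trend upward over the years.
ice_cream_consumption = 10 + 0.3 * (years - 2000) + rng.normal(0, 0.5, years.size)
mean_test_score = 480 + 1.5 * (years - 2000) + rng.normal(0, 3.0, years.size)

# Pearson correlation between the two series.
r = np.corrcoef(ice_cream_consumption, mean_test_score)[0, 1]
print(f"correlation: {r:.2f}")  # strong positive, despite no causal link
```

The shared upward trend over time does all of the work here – exactly the kind of lurking third variable behind the correlations Jerrim warns about.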

Now, equally problematic is the constant quest for causal evidence of effect. Researchers often present these claims as “statistically significant.” This push largely comes from policy: federal legislation calling for teachers to use “research-based practices” and the establishment of Randomized Control Trials (RCTs) as the gold standard of research design. As with correlation, there are issues with the idea of proving anything in social science. As educators know all too well, no two students or classes are the same. What works for one may not work for others. Beyond this knowledge from experience lies the reality that it is not possible to completely control for all of the external factors that influence education research.

Let’s go back to our hypothetical study of examining the effects of personalized learning on student achievement (aka. test scores). If an improvement can be detected by some measure, did personalized learning CAUSE that increase or was the increase an EFFECT of the personalized learning? This is where the lightbulb comes in.

A professor explained the problem of making causal claims in social science using this metaphor. If you flip a light switch, the light turns on. Did the switch CAUSE the light to turn on, or did the light turn on as an EFFECT of flipping the switch? A basic understanding of electricity would have you choose the latter option. The switch closes an electric circuit, allowing current to flow to the bulb, which then illuminates. Yes, that’s an oversimplification, but it does illustrate that light is an EFFECT of the switch – not the other way around. With education research, a counterfactual – what would have happened without the program – is not available. Therefore, we cannot claim with 100% certainty that it was the program itself that caused the measured effect.

So now, let’s say that the change resulted from some program – oh wait! New issue. Let’s talk about what’s being compared in order to measure the EFFECT. If I design my own study with my own sample, which may or may not be representative of any other sample of students, then I could do some form of pre- and post-tests to look for a change over time. More interesting are larger studies like PISA and NAEP, which may use similar demographics of kids (i.e., all fourth graders) but not the exact same kids to make claims between different measurement periods. Hmm… Let’s go back to the challenge that not all kids are identical. So in reality, if there is a variation in scores between two points in time, whether statistically significant or not, whether the same sample of kids or not, we still have a remaining question before we can even ask whether personalized learning led to achievement: what happened inside the box?
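
For what it’s worth, here is a minimal sketch (invented numbers again) of the simple pre/post design described above. A paired t-test can happily report a “statistically significant” gain while telling us nothing about what produced it.

```python
# Hypothetical pre/post comparison for a "personalized learning" program.
# All scores are simulated; nothing here reflects real students or real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

pre = rng.normal(loc=500, scale=50, size=60)        # simulated pre-test scores
post = pre + rng.normal(loc=8, scale=20, size=60)   # post-test scores with a modest average gain

t_stat, p_value = stats.ttest_rel(post, pre)        # paired t-test on the same (simulated) students
print(f"mean gain: {np.mean(post - pre):.1f} points, p = {p_value:.3f}")

# Even when p < 0.05, the test only says the scores moved; it says nothing
# about WHY they moved – the platform, teacher practice, test familiarity,
# or something else entirely happening inside the black box.
```

In other words, a significant p-value would still leave the black box closed.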

Into the Box

Let’s work through the logic of our hypothetical program, designed around the theoretical framework of Universal Design for Learning. This logic model is far from complete, but it will give you an idea (see the sketch after the list)…

  • Assumptions:  students develop as expert learners when they experience learning that has been personalized in terms of representation, action & expression, and affect & motivation
  • Inputs: some AI platform, learning analytics, content, resources, teachers, etc.
  • Activities: students use the technology, students read & discuss & do things, students take formative and summative assessments, teachers use data to adjust instruction
  • Outputs: test scores, projects, presentations, conversations, data dashboards, etc.
  • Outcome(s): “achievement”; development as an “expert learner”
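
Since a logic model is only useful if it actually gets documented, here is a minimal sketch of the hypothetical model above written down as plain Python data, so that each assumption, input, activity, output, and outcome can be revisited (and measured) during implementation. The structure and names are illustrative only, not a standard logic-model format.

```python
# The hypothetical UDL-based logic model above, captured as plain data.
# Entries mirror the list in the post and are illustrative only.
logic_model = {
    "assumptions": [
        "students develop as expert learners when learning is personalized "
        "in representation, action & expression, and affect & motivation",
    ],
    "inputs": ["AI platform", "learning analytics", "content", "resources", "teachers"],
    "activities": [
        "students use the technology",
        "students read, discuss, and create",
        "students take formative and summative assessments",
        "teachers use data to adjust instruction",
    ],
    "outputs": ["test scores", "projects", "presentations", "conversations", "data dashboards"],
    "outcomes": ["achievement", "development as an expert learner"],
}

for stage, items in logic_model.items():
    print(f"{stage}: {len(items)} item(s) to document and measure")
```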

Now let’s look at all of the items that could be measured inside the box:

  • How is the content culturally responsive and multicultural?
  • How do students have equitable access to the content and curriculum?
  • How are projects and activities meaningful and authentic to the student?
  • What is the ratio of time spent with technology as compared to time spent on collaborative, social, student-centered tasks or activities?
  • How does the teacher use formative data to inform instruction and differentiate/personalize to meet the needs of the students?

All of these questions then become necessary in order to evaluate the process of implementation as well as to explain the results of any outcome measures. Back to the question that sparked this post… Does any form of educational technology, or curriculum associated with technology, lead to improved student outcomes and achievement? I’m not sure. Similarly, could screen time be a “cause” of decreased NAEP or PISA scores? Maybe. But so could ice cream, or political climate, or natural disasters, or school safety, or lack of available professional development, or teachers using technology – or any other program – in ways that were unintended by the design and thus of low quality, or fish, or a host of other things.

While all of this was swirling in my head, I stumbled on an article from a team of esteemed scholars calling for researchers to Abandon Statistical Significance. I think it’s time that, as researchers, we really work to document reality in context and then present it in a way that helps teachers and leaders transfer and adapt what they learn from those studies to their own districts, schools, and classrooms. If, as researchers, we really want to help improve practice, then it’s time that we focus less on the outcomes and more on the process that allowed us to get there.
