Companies that define a few key metrics, then iterate on and optimize those metrics, are more likely to succeed. At least that’s how the conventional wisdom goes. Many prominent startup investors, including Jeff Jordan, a partner at Andreessen Horowitz, advocate measuring categories such as growth, engagement, and financials.
But for education technology companies, conventional metrics are insufficient. None of the standard measures of growth, engagement, or financials directly captures student learning outcomes, and many education entrepreneurs have only a vague notion of how their tools affect the students they serve. Why build a product for classrooms without knowing whether it improves education and creates more learning opportunities for students?
As education entrepreneurs, we need consistent, standardized measures of educational outcomes to help us make informed decisions and optimize for student outcomes. We need to report and reflect on them just as frequently and rigorously as we do on other core metrics such as revenue and engagement. Companies should be able to answer these two questions:
- When students use your program, do they perform well on measurable learning outcomes?
- How much of that performance can be attributed to your program?
These are hard questions, but answering them has become easier in recent years as schools have collected more data. Standardized tests may not always be a complete or accurate measure of efficacy, but when used carefully, they can be a useful, consistent indicator of a tool’s impact on student learning. I propose two key metrics that most learning products can track and report on.
Let’s start with the first question: are students who use a program learning any better?
1. Track shifts in median performance percentile
Performance percentiles are a commonly used measure of how well students are learning. A student’s percentile indicates how they are performing relative to their peers. For example, a student at the 45th percentile in math is scoring higher than 45% of students nationwide. The median student, by definition, scores at the 50th percentile.
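To make the definition concrete, here is a minimal sketch in Python of how a percentile rank could be computed against a national sample of scores. The function name and the toy data are illustrative assumptions, not any real assessment’s scale.

```python
from bisect import bisect_left

def percentile_rank(score, national_scores):
    """Percentage of national scores strictly below this student's score."""
    sorted_scores = sorted(national_scores)
    num_below = bisect_left(sorted_scores, score)
    return 100.0 * num_below / len(sorted_scores)

# Toy national sample; a real one would contain many thousands of scores.
national = [180, 195, 200, 205, 210, 215, 220, 230, 240, 260]
print(percentile_rank(212, national))  # 50.0 -> the median student
```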
Median percentile growth is a specific, broadly useful metric for measuring overall academic progress for a set of students. If the median percentile in a district grows year over year, then it means the students in the district learned more than their peers nationwide over that period. If the gains are large and sustained, then it is a good indicator that the education those students are receiving is improving (although by itself, percentile shift does not tell you why the district saw that growth).
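Here is one way the metric itself could be computed, assuming you already have each student’s national percentile for two consecutive years. This is a minimal sketch with hypothetical cohort data:

```python
from statistics import median

# Hypothetical national percentiles for the same cohort of students,
# measured one year apart.
percentiles_last_year = [32, 41, 45, 50, 58, 63, 70]
percentiles_this_year = [38, 44, 51, 55, 60, 68, 74]

shift = median(percentiles_this_year) - median(percentiles_last_year)
print(f"Median percentile shift: {shift:+.1f}")  # +5.0
```

A positive shift means the cohort gained ground on national peers; a shift near zero means it merely kept pace.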
Districts and their partner vendors are increasingly using median percentile growth to understand what works. Districts that require students to take standardized tests, such as the widely used NWEA MAP or newer assessments like PARCC and SBAC, can use that data to understand the impact of the programs they adopt. Companies that create those programs can use the same data to see whether they are having the positive impact they aim for.
2. Use students as their own control group
Students can see an increase in scores for a number of reasons—such as better teachers, new facilities, or an innovative technology program. As entrepreneurs, how do we know whether changes in student learning can be attributed to a specific program as opposed to other factors?
A randomized controlled study is the most direct way to establish causality. In such an experiment, students are randomly split into two groups; one receives a program, the other does not, and their results are compared. However, these studies are impractical in many cases: if every student in a school adopts a program, for instance, students in a control group would have to miss out. And even where controlled studies are possible, they cannot be run frequently enough to produce the consistent, regular results we need to inform product development.
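For reference, the analysis behind such a study can be quite simple once random assignment is done. The sketch below assumes we have a score gain for every student after the study period; the data and the even split are illustrative, not a prescription for real study design.

```python
import random
from statistics import mean

def random_split(student_ids, seed=42):
    """Randomly split students into treatment and control groups."""
    shuffled = list(student_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical score gains measured for each student after one semester.
gains = {"s1": 4, "s2": 7, "s3": 1, "s4": 6, "s5": 3, "s6": 8}

treatment, control = random_split(gains)
effect = mean(gains[s] for s in treatment) - mean(gains[s] for s in control)
print(f"Estimated program effect: {effect:+.1f} points")
```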
Companies whose products address a subset of the curriculum may be able to measure impact on student learning without relying on a control group. At eSpark, we analyze how students perform in one or two of the Common Core subject domains while the other domains are covered in class. We can then compare the percentile gains in those focus areas against the rest of the curriculum.
For example, a third-grade math student may work on just Fractions and Measurement & Data, but not on the other knowledge domains (Geometry, Operations & Algebraic Thinking, and Number & Operations in Base Ten) specified for that grade level. In this case, we’d look at the relative difference in performance between Fractions and Measurement & Data (the target domains) and Geometry, Operations & Algebraic Thinking, and Number & Operations in Base Ten (the regular domains). If students score higher in their target domains than in their regular domains, that suggests a particular tool or new instructional approach may have had a positive impact, as the sketch below illustrates.
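Here is a minimal sketch of that within-student comparison, assuming each student receives a percentile per Common Core domain. The data and the simple averaging are illustrative; a real analysis would aggregate over many students and test for significance.

```python
from statistics import mean

TARGET_DOMAINS = {"Fractions", "Measurement & Data"}
REGULAR_DOMAINS = {"Geometry", "Operations & Algebraic Thinking",
                   "Number & Operations in Base Ten"}

def target_vs_regular_gap(domain_percentiles):
    """Mean percentile in target domains minus mean percentile in regular domains."""
    target = mean(p for d, p in domain_percentiles.items() if d in TARGET_DOMAINS)
    regular = mean(p for d, p in domain_percentiles.items() if d in REGULAR_DOMAINS)
    return target - regular

# Hypothetical end-of-year percentiles for one third grader.
student = {
    "Fractions": 62, "Measurement & Data": 58,
    "Geometry": 49, "Operations & Algebraic Thinking": 51,
    "Number & Operations in Base Ten": 47,
}
print(f"Target-domain advantage: {target_vs_regular_gap(student):+.1f}")  # +11.0
```

A consistently positive gap across many students suggests the program is adding value in its focus areas; a gap near zero suggests the gains seen elsewhere would likely have happened anyway.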
Measure, rinse, repeat
As an industry, we need to track and rigorously report on the educational outcomes of our products. We should report these results to our employees, investors, and customers so that we stay focused on the ultimate goal of student impact. That way, we can optimize for the right thing and ensure that the reality of educational technology lives up to its promise to improve opportunities for students. Only then can we build products that truly change the world.