Algorithms will one day play a pivotal role in teaching and learning, so colleges need to pay close attention to how they work and who builds them.
That’s the argument of Candace Thille, who has a love-distrust relationship with adaptive software. She pioneered the use of adaptive learning in college teaching, starting the Open Learning Initiative at Carnegie Mellon University more than 15 years ago, and she sees how powerful it can be in a classroom. But she has real ambivalence about how adaptive learning is moving from the laboratory to the classroom.
Specifically she worries that some companies have made their products a “black box” that professors and researchers can’t understand or control. Just as professors wouldn’t bring in a teaching assistant they couldn’t talk to or collaborate with, they shouldn’t adopt interactive tools that are mysterious and locked.
Thille, an assistant professor of education at Stanford University's Graduate School of Education who previously worked in the private sector, stresses that she’s not against software companies. But she does worry about them gaining too much control over the educational algorithms of tomorrow.
EdSurge sat down with Thille last week at the ASU+GSV Summit, as part of our Thought Leader Interview series on the future of education. Below is an edited and condensed version of the conversation, or watch the complete interview.
EdSurge: You were doing adaptive learning before it was cool. What got you into this?
Thille: The Open Learning Initiative, which is the project that I founded at Carnegie Mellon, started with a desire from the William and Flora Hewlett Foundation to provide access to high-quality post-secondary education for those who would not otherwise have it. Their first big project in that space was MIT’s OpenCourseWare. That was way back in 2001. And they came to Carnegie Mellon looking for other schools that would be interested in putting all of their course materials online—to drive that open access.
Well, two things: At Carnegie Mellon we were never going to do anything that's a derivative work of something MIT is doing. But we also had a different idea to propose. So Mike Smith and Cathy Casserly [from the Hewlett Foundation] came and listened to the idea, and they were not interested—they were about to leave—but the day they were visiting was September 11th, 2001. So they ended up spending three days in Pittsburgh that they hadn't planned on spending, which gave us a lot of time to talk through and really understand what their interest was. And their interest was to find some way to provide access to high-quality education.
And our argument was that high-quality education is more than just putting materials that support education online and making them available. There's actually a lot of pedagogical knowledge that has to go into how you teach someone something. And so we came up with the idea of the Open Learning Initiative, which was to take the years, the decades, of research that had been done at Carnegie Mellon in intelligent-tutoring systems, and blend that AI perspective with disciplinary experts and with the idea of open content to create the OLI courses.
We didn't say we were going to put all of Carnegie Mellon's courses online, but we started with just four. And so we started collecting the data, actually, because we recognized that we would never see the learners that were supposed to be benefiting from this. So we wanted to just get some insight into, "Is there any evidence that these environments that we're carefully constructing based on what we know about human learning are actually supporting learning?"
We started collecting the data for that purpose, and then around the same time we started OLI, we applied for an NSF grant for a science of learning center. And we recognized that the data we were collecting was not only good for understanding whether the environment was supporting individual learners, but could also give us insight into human learning.
More recently you have raised concerns about the way many colleges are doing adaptive learning. You’ve said you’re worried that companies are creating “black boxes” that colleges can’t learn from, and that it could lead to losing control of an essential aspect of the teaching process. Can you talk more about that?
Thank you for asking that question because I think oftentimes when my work is presented in the press, sometimes it is argued that I'm anti-business—that I think the commercial sector shouldn't be in this space. And a lot of people don't realize that before I became an academic, I worked for 18 years in the private sector, in a strategy and management consulting firm. So I'm very supportive of business. I understand how businesses work.
The other thing that people often say to me is, "Candace, you're just thinking about the unethical startups out there, and there are a lot of startups that are incredibly ethical. So you shouldn't paint them all with that unethical brush." And my response to that is, I completely believe that most—99.999 percent—of the educational-technology companies out there are ethical. That's not a question to me. And I believe they are truly trying to create products that support learners and attend to challenges in education.
But I also know from being in the private sector that the number one driver has to be, when you're a for-profit company, returning shareholder value. And if you are not taking that as your primary driver, then you are behaving unethically.
So it is not that I don't think people are ethical; it is that I think there might be a conflict between the primary ethic that has to drive you when you are returning shareholder value and meeting the needs of the public good. It's sort of the reason that, when we developed blood banks, the decision was made years ago not to pay people to donate blood and not to have blood banks be for-profit entities, because the needs of the public good were a bit in conflict with what they would become if they were totally driven by markets.
So do I think that markets have no place in this? Absolutely not. I think that there's going to be a big and robust place for the commercial and private sector. I just think that we haven't figured out yet a functional and productive relationship between the private sector and not-for-profit higher education to really drive the development of these technologies in a way that will maximize the public good.
It seems part of your argument is that the algorithms being built for adaptive learning are going to be part of the academic core in the future—and so you worry that colleges are outsourcing something that is their primary mission when they work with adaptive learning companies.
Something else I got from being in the private sector is that one of the core tenets of any business is that you do not outsource your core business process. And I think we have, in higher education, gotten into a pattern of outsourcing material development. Textbooks are the classic example. People are perfectly fine developing their courses by buying large textbooks, or asking their students to buy large textbooks, and then just selecting the pieces they're going to use as the foundational material to teach from.
So at that point, when you're using textbooks in that way, the pedagogical decision making is still being made by the institution or by the faculty member. The challenge with these new environments is that they are, in essence, making pedagogical decisions. If you design an environment to support a learner in achieving some outcome, and you're collecting data on how the student interacts with that material, and then hopefully doing something more sophisticated than just asking "Did they get it right or wrong?" by actually modeling it in an appropriate way to make a fairly accurate prediction about the learner's knowledge state, then you can do one of two things with that piece of information if you're the system developer.
You can take it and feed it back into the system so that the system can start making autonomous decisions about what's best for the learner next. Or, you can do what we were doing at OLI, which was take that information, model it, and then present it to the faculty member who's teaching the course so they can get insight into their learners or their class's knowledge state, and then use that to make a pedagogical decision.
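To make those two paths concrete, here is a minimal sketch, in Python, of the kind of knowledge-state estimation Thille describes. It uses classic Bayesian Knowledge Tracing as a stand-in; it is not OLI's actual model, and the parameter values, threshold, and skill names are invented for illustration.

```python
# A minimal sketch of knowledge-state estimation using Bayesian Knowledge
# Tracing (BKT) as a stand-in. Parameters and skill names are illustrative.

def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """Update the estimated probability that a learner knows a skill after one attempt."""
    if correct:
        knew_it = p_know * (1 - p_slip)
        p_observed = knew_it + (1 - p_know) * p_guess
    else:
        knew_it = p_know * p_slip
        p_observed = knew_it + (1 - p_know) * (1 - p_guess)
    posterior = knew_it / p_observed
    # Allow for the chance the learner acquired the skill on this step.
    return posterior + (1 - posterior) * p_learn


def choose_next_activity(p_know, threshold=0.95):
    """Path one: feed the estimate back so the system decides autonomously."""
    return "advance to the next skill" if p_know >= threshold else "assign more practice"


def instructor_report(class_estimates):
    """Path two: surface the estimates so the instructor makes the pedagogical call."""
    return {skill: round(sum(ps) / len(ps), 2) for skill, ps in class_estimates.items()}


p = 0.3  # prior belief that the learner already knows the skill
for outcome in (True, False, True, True):
    p = bkt_update(p, outcome)

print(round(p, 2), "->", choose_next_activity(p))
print(instructor_report({"proportional reasoning": [p, 0.8, 0.55]}))
```

Either way, the same underlying estimate is doing the work; the design choice is whether the system acts on it or a human does.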
Whichever way it is, essentially the system is making a diagnosis. It is making a pedagogical decision. And I would argue, and people are probably going to shout at me for this, that the science isn't robust enough yet to make really good diagnoses and really good pedagogical decisions on its own.
And what I mean by that is, learning is really complex. Think about recommendation systems like those from Netflix or Amazon. I'd be interested to hear from the audience out there how often you think they get it right.
What are your hopes for the technology, if done the way you think is best?
Our brains are not really good at handling multiple dimensions of information to try to make a decision. Our brains aren't that great at handling real complexity. So we manage complexity by reducing it to something we can manage, so we can represent it to ourselves in a way that we can manage it.
One of the big powers of the new data-science tools is a computer can manage many, many, many, many more dimensions of complexity, and represent it in a way that we can understand, so we can manage the complexity without having to reduce it. But that decision about how you go about taking many, many, many, many, many pieces of information and creating some kind of model that makes it manageable, that is a decision. And that's where a lot of bias can be introduced into these algorithms.
A lot of people make the mistake of thinking, "Oh, if a computer's doing it, it's objective, it's unbiased." No. It's going to be biased by the person who made the design decisions: what features to put into the algorithm, how to weight them, which algorithms to use. Those are all decisions that have an impact on what that algorithm is going to produce.
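A toy example makes that point visible. The students, features, and weights below are entirely hypothetical; the only claim is that two defensible weighting schemes, applied to the same data, flag different learners.

```python
# Hypothetical illustration: the same data, scored under two different design
# choices about which features matter and how much, flags different learners.

students = [
    {"name": "A", "quiz_avg": 0.55, "time_on_task": 0.9, "forum_posts": 0.1},
    {"name": "B", "quiz_avg": 0.70, "time_on_task": 0.2, "forum_posts": 0.8},
    {"name": "C", "quiz_avg": 0.60, "time_on_task": 0.5, "forum_posts": 0.4},
]

def risk_score(student, weights):
    # Higher score = flagged as "at risk"; the weights encode the designer's judgment.
    return sum(w * (1 - student[feature]) for feature, w in weights.items())

design_1 = {"quiz_avg": 0.8, "time_on_task": 0.1, "forum_posts": 0.1}  # grades-centric
design_2 = {"quiz_avg": 0.3, "time_on_task": 0.4, "forum_posts": 0.3}  # engagement-centric

for weights in (design_1, design_2):
    flagged = max(students, key=lambda s: risk_score(s, weights))
    print(weights, "-> flags student", flagged["name"])
```

Neither weighting is "objective"; each reflects a designer's assumptions about what signals matter.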
So it seems like the stakes are high and the problem is hard. Is it still just too soon? Do you think there just needs to be more research before we get to products?
That isn't to say that we should stop making products. It's just that we shouldn't make products with false assurances. And we should acknowledge that this is a really active and interesting frontier of science we're in. That's why I said what I think we need to do is figure out a more functional, productive relationship between not-for-profit higher education and commercial developers—and even a much more functional relationship between researchers and practitioners within not-for-profit higher education. And I don't think we have that figured out.
But to any edtech developer out there, any researcher out there, any practitioner out there, I am interested in talking to you so we can try and figure this out.
We've all had a great teacher—someone who may have even changed our lives with some inspiring moment in the classroom. Do you think that in some distant future, someone might have one of those types of experiences with an AI adaptive-learning machine?
I guess it depends on the experience you're talking about. The other thing is that when people write about my work, they often put as a big headline, "She's trying to replace us with computers," the "us" being faculty or teachers. And I just want to assure everybody, I am not trying to replace anybody with a computer.
When I think about designing a learning environment, I think about the multiple resources you can bring to bear. There's what a learner can do with a computer, and I'm trying to make that as good as it can be. There's what a learner can do with their peers. There's what a learner can do with an instructor, or an expert, or a human who knows more than they do who's trying to support their learning.
All of those resources have their affordances and their limitations. And so when designing a learning environment, I always try and think about how can we blend those affordances to best support that learner or that group of learners in that context at that point in time?
And the technology piece should be designed not just to support what the learner is doing with the technology, but also support the other human actors in the system. So giving the expert or the teacher or the faculty member or the caretaker the best information we can give them so that they can best do what they do best to support the learner.
Do I think that a computer will completely take away any need for any kind of human, whether that be a peer or a teacher? No. But I do think that we can use the technology to make the role of the other humans much more effective.
I will say, though, that when we were designing the OLI courses back when I was at Carnegie Mellon, one of the things we did was really unpack, when a student made a move, what we could infer based on what we had collected about their knowledge state at that point in time. And we could make a pretty good inference about what they were thinking when they made an incorrect move. So we would create these alerts and then give them feedback that would say something to the effect of, "Oh, so this part of your thinking was square on. And you were doing this because of this. But you probably did this because you were thinking that. And this is where that's not what you should be thinking about this." And I would have students come up to me and say, "How did the computer know what I was thinking?"
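As a rough illustration of that kind of misconception-targeted feedback, here is a hypothetical sketch; the item, the buggy answers, and the messages are invented and are not drawn from OLI's actual rules.

```python
# Hypothetical sketch: map a student's incorrect move to an inferred
# misconception and return feedback targeted at that misconception.
# Item: simplify 2/4 + 1/4 (correct answer: 3/4).

MISCONCEPTIONS = {
    "3/8": ("added the denominators as well",
            "Your addition of the numerators was square on, but when the "
            "denominators match, they stay the same: 2/4 + 1/4 = 3/4."),
    "2/16": ("multiplied the fractions instead of adding them",
             "This item asks for a sum, so add the numerators over the shared "
             "denominator rather than multiplying."),
}

def feedback(answer: str) -> str:
    if answer == "3/4":
        return "Correct."
    inferred = MISCONCEPTIONS.get(answer)
    if inferred:
        diagnosis, message = inferred
        return f"(Inferred: you probably {diagnosis}.) {message}"
    return "Not quite. Try adding the numerators and keeping the denominator."

print(feedback("3/8"))
print(feedback("2/16"))
```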