It all started with a seemingly playful personality quiz called “thisisyourdigitallife,” coded by a researcher named Aleksandr Kogan from Cambridge University. It has since grown into one of the most prominent data scandals in Facebook’s history, after Kogan passed personal information from some 87 million Facebook users along to the political consulting firm Cambridge Analytica.
Kogan apologized Sunday in an interview on 60 Minutes, saying he was unaware of the implications of his actions. But plenty of other academics said they saw the potential for such abuse clearly, and that it could have been avoided if Facebook had spent more time consulting privacy experts and scholars.
One of those scholars is Jennifer Golbeck, a professor at the University of Maryland's College of Information Studies, who has been talking about the privacy risks of Facebook data for years. In a 2013 TED talk, she warned that algorithms can deduce a range of personal traits from just a few Facebook “likes,” and that privacy policies and laws need to be reformed to protect users in today’s digital era. Today, she feels like a Cassandra, except that people are starting to listen to her more carefully.
Golbeck has plenty of suggestions for the social media giant, and she hopes the company will turn more often to researchers and privacy experts to advise them on how to better protect users.
EdSurge sat down with Golbeck last week as part of our EdSurge Live video discussion series. She predicted that hundreds, if not thousands, of educational apps might have grabbed extensive data from Facebook users before the company changed its policies several years ago. The conversation has been edited and condensed for clarity.
You can listen to a complete version below, or on your favorite podcast app (like iTunes or Stitcher). Or watch a video version.
EdSurge: How did you get into studying issues of data privacy and social media?
Golbeck: I do research, and I build these algorithms that take your social media data and find out things about you, which is what Cambridge Analytica was doing. They cited my research papers, which kind of creeps me out.
But as I was doing that work, probably in like 2011 or 2012, what we found is that anything we wanted to find out about people, we could.
You mean by just looking at what they “liked” on Facebook and guessing other factors from that?
In the TED talk, as you mention, I cite this study that was actually done at Cambridge by some of the people who overlapped in the Cambridge Analytica space. Using Facebook “likes,” they can find out personality traits, political preferences, demographics like race, religion, gender and sexual orientation, and behavioral things like whether you drink, smoke or use drugs. All of this just from “likes.”
But it's not only “likes.” We can find these things out by analyzing the structure of the text that you write. So not the content, but the verbs and adverbs you use, and what your sentences look like. There's a great study that uses your Instagram profile picture to diagnose whether you're clinically depressed. Basically any data source that we found has had enough signal that we can identify these kinds of attributes.
And that's super powerful, because you can't hide from it. It's not a logical connection; it's very statistical.
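To make the mechanics concrete, here is a minimal sketch of the kind of inference Golbeck is describing. It is not her code or the Cambridge study's pipeline: each user becomes a binary vector of page “likes,” and a simple classifier learns to predict a trait from that vector. The data, page indices and trait labels below are synthetic and purely illustrative.

```python
# Minimal sketch of trait prediction from "likes" (synthetic data, hypothetical trait).
# Each user is a binary vector: 1 if they liked a given page, 0 otherwise.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_pages = 5000, 200

# Synthetic likes matrix: rows are users, columns are Facebook pages.
likes = rng.binomial(1, 0.05, size=(n_users, n_pages))

# A hidden trait that happens to correlate with a handful of pages (the
# "curly fries" effect): the model never sees why, only the correlation.
signal_pages = [3, 17, 42, 99]
trait = (likes[:, signal_pages].sum(axis=1) + rng.normal(0, 0.5, n_users)) > 1

X_train, X_test, y_train, y_test = train_test_split(likes, trait, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
# The largest coefficients point back at the predictive pages, whether or not
# the connection looks "logical" to a human.
print("most predictive pages:", np.argsort(model.coef_[0])[-4:])
```

Real studies use richer models and vastly larger like matrices, but the basic pipeline is the same: correlations do the work, no logical link required.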
You mention in your TED talk that you can tell a person’s intelligence level by whether they “like” curly fries. How does that work?
In that Cambridge paper that I mentioned where they were looking at the “likes,” they listed the four “likes” that were the strongest predictors of intelligence. And they were “liking” the pages for science, thunderstorms, The Colbert Report, and curly fries. Yeah, and you go, "Why curly fries?" And the answer is that we don't know. But what it is, on a bigger level, is that when we like stuff on Facebook, it's not just these inherent personal things about us, it's a social process.
Even if you love curly fries, you might not go say, "Oh, let's go find a curly fries page so I can ‘like’ it." You come across it because a friend “likes” it, or you stumble into it, and then you do that. And these algorithms can pick up these little remnants of social interaction. They don't actually see your network interaction, but they see the things you do because of it, and that allows them to make these inferences that seem completely illogical. But essentially they're pulling up little traces of social interaction. And you can't hide that.
So that made me super concerned. Because obviously I think there are good ways these algorithms can be used, that's why I make them, but we have now seen that people are going to use them in really bad ways, too. So I did that TED talk in October of 2013, and I feel like I have been kind of screaming, not necessarily into the void, because a lot of people have listened to me over the last four years. But the last month has really been this Cassandra moment for me: I've been saying this was going to happen, and now finally people have to believe me.
Yeah, as the world was watching Mark Zuckerberg testify before Congress, they were asking him questions like, “What should Facebook do to better protect users?” But it sounds like folks at Facebook knew this was possible, since they probably read your research and others like it before this incident came to light.
Oh, absolutely. I was building Facebook apps at the same time this stuff was going on, when Cambridge Analytica was grabbing all the data that we've seen in the news, in what has been called a data breach. I was building Facebook apps to do the same things, but with university ethical approval and really strict rules about how to use the data. And Facebook had rules that Cambridge Analytica violated.
So they said if you're an academic and you collect this data, you're not allowed to hand it over to somebody else. I followed those rules. But there's no way they would know if I followed those rules or not. I remember thinking at the time, "It would be so easy just to grab this massive amount of data. Facebook would never know that I kept a copy of it, and I could do whatever I wanted with it."
So we're hearing about it now with Cambridge Analytica, and they definitely broke some rules, but I'm sure there are hundreds of companies, if not thousands of apps, that did the same kind of thing. The smart people at Facebook have obviously thought about this. And every time something happens, they change the rules and fix it, which is good, but they haven't really been as proactive as they need to be in thinking about the ways people could misuse the data that we're handing over to them. And I think that has been irresponsible of them. They don't hold all the responsibility for people breaking their rules, but you have to think about that if you're the steward of all this personal information.
But since you are a researcher using Facebook data, I’m guessing you probably wouldn't advocate cutting you and other researchers off. How can Facebook protect users while also allowing genuine research to happen?
Yeah, this is such an important point, because it's a real risk of the moment that we're in: that they might just cut off this access. You know, researchers aren't the problem. So this guy working with Cambridge Analytica broke the rules, but generally we're not the ones collecting all of these huge amounts of data, and we're not making any money off it. All we want to do is science with it. Facebook, Twitter, they have their own research teams internally that are doing this, but if you only have the research done internally, you're not getting this critical eye on it.
So, sure there's some risk in having academics have this data, but we are all governed by very strict ethics rules within the university. I need a board to approve every experiment I do with personal data. It has a schedule for when I destroy it. It requires me to say how I'm going to encrypt it, what computer it's going to be on, who's going to have it, is the office going to be locked where the computer is? I mean, it really gets detailed about protecting that data.
I mean, there's a very strong infrastructure in place that grew out of really terrible things like the Tuskegee experiments, right? Non-consensual medical experiments. That's the foundation of our ethics rules.
We are less of a worry than, say, random companies building apps that you're going to give the data to, who could then sell it and aggregate it and do all these other things. So cutting researchers out is sort of the worst step, because they're the ones who are going to figure out the bad ways this could be used, and the good ways it could be used that aren't just making money for companies.
We've been trying to figure out how many educational apps might have done what you and others did with the data access Facebook allowed several years ago, the same access that let Cambridge Analytica do its data grab. Do you think there are educational apps out there that might have been doing some of this as well?
I don't know any in particular, but I wouldn't be surprised. The concerning thing wasn't so much that if you installed my app I could grab your data. You knew that something was going on there. But the problem was if you installed my app, I could grab all of your data and all of your friends’ data. And I used to show this to my students to be like, "Hey, you guys, you gotta be careful about this." But it was so easy to do. I mean, it was literally one line of code, you could grab all of that.
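To illustrate how little code that took, here is a sketch of the shape of a friend-data request under the old, pre-2015 Graph API. This is not any particular app's code: the access token is a placeholder, and exactly which fields came back depended on the friend permissions the installing user had granted.

```python
# Illustrative only: a friend-data request of the kind the old (pre-2015)
# Facebook Graph API allowed. ACCESS_TOKEN is a placeholder.
import requests

ACCESS_TOKEN = "USER_ACCESS_TOKEN"  # token of the one user who installed the app

# A single request could return the installing user's friends, plus whatever
# profile fields the app's friend permissions covered, even though those
# friends never installed the app themselves.
resp = requests.get(
    "https://graph.facebook.com/me/friends",
    params={"fields": "name,likes", "access_token": ACCESS_TOKEN},
    timeout=10,
)
resp.raise_for_status()

for friend in resp.json().get("data", []):
    friend_likes = friend.get("likes", {}).get("data", [])
    print(friend.get("name"), "-", len(friend_likes), "likes visible to the app")
```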
I would not at all be surprised to find out that some educational apps were grabbing that friend data and using it, maybe not in a super-sophisticated way, but using it to personalize the experience of the users they did know.
But that, of course, means then they're holding data from lots of people who never consented for them to have it. So hopefully all these apps are going to start going back through their data and figuring out how they got it and cleaning up some of that stuff that they really shouldn't have.
And what do you recommend that Facebook do, in your ideal world?
Oh, I've got so many to-dos for Facebook. Let me give you like a top three that come to mind first:
One is that I think posts should be able to have an expiration date. I actually think ephemerality, if that's a word, is super critical going forward with social media. And we're seeing a lot of people shift to using things like Snapchat for a lot of interactions, where the stuff goes away. I actually do this in my own social media use. On Facebook I use a tool called Facebook timeline cleaner. Every two weeks I run it and delete anything that's more than two weeks old, so there's not much data there. None of these algorithms work on me.
If you look at social media posts that you wrote six months ago, they're usually mortifying, especially on Facebook. A second thing is letting people see the data that Facebook has about them, and letting them insist that it be deleted, which is something Europeans are going to get under the GDPR. I think that's really important.
And then a third thing is that I’d really like to see [Facebook] have a board of technically educated advisors on these privacy issues. To really push them, to be like, "This is a bad idea, I don't care about how it affects your bottom line, I care how it affects your users and their privacy, and you need to do better here.”
Because it seems from the stuff that they've been saying that they really haven't been considering how this could be misused, or how it's going to be perceived by people. And if they're not considering that well enough internally, get together a board of people like me, people from other fields, to come in and really have those conversations and talk to their developers about it, so they can't have any more claims of, "Gosh, we didn't know that this would upset people," or, "We didn't know that this would be bad."
I'm happy to volunteer to be that person.