Can a Mathematical Model Detect Fake News? Two Penn State Professors Want to Find Out

Penn State professors Dongwon Lee and S. Shyam Sundar initially wanted to know more about the fake news epidemic. Now, with $300,000 in funding from the National Science Foundation, the duo is working to build a model that could be used to detect fake news before it hits social media newsfeeds.

As part of their interdisciplinary research project, the Penn State professors will use a dataset of news stories that have been verified as either legitimate or fake. Lee says he and Sundar assume stories that have been debunked by sites such as Snopes and Politifact are fake, while stories from major outlets such as the New York Times and the Washington Post are considered legitimate.

Sundar is taking the lead on identifying the traits of articles that indicate whether they’re legitimate or fake. Lee will then break the articles down by those traits so that a supervised machine learning algorithm can be trained to predict whether an article is likely real or fake.
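For readers curious what that division of labor might look like in practice, here is a minimal sketch, not the researchers' actual model. The two traits (share of all-caps words, exclamation marks per word), the toy headlines and the choice of logistic regression are all assumptions made for illustration; the article does not say which traits or which algorithm Lee and Sundar use.

```python
import math

# Hypothetical traits for illustration only; the article does not say which
# traits the researchers actually use.
def extract_traits(text):
    """Return [share of ALL-CAPS words, exclamation marks per word]."""
    words = text.split()
    if not words:
        return [0.0, 0.0]
    caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return [caps / len(words), text.count("!") / len(words)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=1.0, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this labeled example.
            err = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def fake_probability(text, weights, bias):
    """Estimated probability that a text is fake, under the toy model."""
    traits = extract_traits(text)
    return sigmoid(sum(w * v for w, v in zip(weights, traits)) + bias)

# Invented toy examples; label 1 = fake, 0 = legitimate.
labeled = [
    ("SHOCKING!!! Doctors HATE this one weird trick!", 1),
    ("You WON'T BELIEVE what the senator did!!!", 1),
    ("The city council approved the budget on Tuesday.", 0),
    ("Researchers published the study in a peer-reviewed journal.", 0),
]
X = [extract_traits(text) for text, _ in labeled]
y = [label for _, label in labeled]
weights, bias = train(X, y)
```

Calling `fake_probability("BREAKING!!! Aliens LAND in Ohio!", weights, bias)` on this toy model returns a value above 0.5, while a plainly worded sentence scores below it. A real system would need far richer traits and a much larger verified dataset.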

Lee tells EdSurge that some of the earlier news coverage of his and Sundar’s project confused some readers. The researchers’ goal is not to build a physical device, but rather an algorithm that will catch fake news articles. They also do not intend, nor have the power, to delete fake news articles from the internet. The Collegian, Penn State’s student newspaper, reports that during a recent presentation, Lee “amusingly recounted” that “each article detailing the $300,000 grant to study fake news became further distorted.”

Melissa Zimdars is an assistant professor of communication at Merrimack College who’s currently teaching a class on fake news. She also works on OpenSources.co, which says on its website that it provides “a continuously updated database of information sources for developers to leverage in the fight against fake, false, conspiratorial, and misleading news.” Last year, her list of “False, Misleading, Clickbait-y, and/or Satirical ‘News’ Sources” went viral.

Zimdars and her colleagues have analyzed almost a thousand different websites to determine if they’re fake news, misleading or highly partisan. But it’s a time-consuming effort that doesn’t always pay off: she says about a third of the websites they’ve spent hours analyzing have changed domains or no longer exist.

“It’s like playing whack-a-mole,” Zimdars says.

If an algorithm like the one Lee and Sundar are working on could automate that work to detect patterns in fake news, she says, “clearly that process will go much faster.”

Zimdars says detecting fake news via automation could have several downsides, though. Designers of fake news websites could figure out what algorithms are looking for, and adjust how their content is created to get around those systems set up to prevent their circulation.

Lee says the problem of reverse engineering is a fundamental limitation of any machine learning-based approach. He points out that if the mathematical model he and Sundar are working on is deployed in the real world, they would have to keep updating the system as the creators of fake news content learn new tricks.

Another of Zimdars’ concerns is that many websites that aren’t categorized as “news” occupy a gray area. “It’s really hard to determine overall that a source of information is fake news, or if they sometimes circulate fake news articles but other times circulate maybe partisan information,” she says.

Lee says if the algorithm gets something wrong and labels a legitimate news source as fake, it would be “quite detrimental.” The solution is to tune the algorithm so it doesn’t become too aggressive. He says he and Sundar will eventually have to handle ambiguous cases, but for now, they’re aiming at obvious legitimate and fake cases.
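One common way to keep a classifier from becoming "too aggressive" is to raise the score an article must reach before it is flagged. The sketch below illustrates that general idea under stated assumptions: the scores and labels are invented, and `pick_threshold` is a hypothetical helper, not something the researchers have described.

```python
import math

def false_positive_rate(scores, labels, threshold):
    """Share of legitimate items (label 0) wrongly flagged as fake."""
    legit = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in legit if s >= threshold) / len(legit)

def recall(scores, labels, threshold):
    """Share of fake items (label 1) correctly flagged."""
    fake = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in fake if s >= threshold) / len(fake)

def pick_threshold(scores, labels, max_fpr):
    """Lowest threshold whose false positive rate stays within max_fpr.

    Hypothetical helper: the lowest qualifying threshold flags as many
    fake items as possible while respecting the cap on false alarms.
    """
    # math.inf flags nothing, so a qualifying threshold always exists.
    for t in sorted(set(scores)) + [math.inf]:
        if false_positive_rate(scores, labels, t) <= max_fpr:
            return t

# Invented model scores (higher = more likely fake); 1 = fake, 0 = legitimate.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

strict = pick_threshold(scores, labels, max_fpr=0.0)
lenient = pick_threshold(scores, labels, max_fpr=0.25)
print(strict, recall(scores, labels, strict))
print(lenient, recall(scores, labels, lenient))
```

On this toy data, allowing no false alarms means catching only half of the fake items, while tolerating one mislabeled legitimate story in four catches all of them. That trade-off is the tuning decision Lee describes.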

Zimdars sees another issue with using automation to detect fake news, one that is arguably out of Lee’s hands: unless major tech companies “buy into” it, automation probably won’t have the desired effect.

“Studies are already showing that finding, flagging and categorizing barely makes a dent,” Zimdars says. “The problem is that once it’s circulated on Facebook, there’s kind of no going back from there. So you have to prevent it from circulating in the first place, it can’t really be an after-the-fact situation.”

She thinks Facebook and Google already determine what people see and don’t see in their newsfeeds—they just need to do so in a better and more transparent way.

“If there are known web sites which could be determined through this automated detection that are repeatedly circulating fake news or propagandistic information, they should be pushed down in what’s showing up on our feeds.”