In higher education, predictive analytics often draw from data that an institution has readily available about its students: grades, attendance, online school-related activity, and even historical and demographic information.
But one university is trying to incorporate unstructured data, in particular college admissions essays, to predict how likely a student is to persist and graduate on time.
“We always felt there could be intelligence in narrative data that we weren’t tapping,” says Ron Mitchelson, provost and senior vice chancellor of East Carolina University.
East Carolina University is working with IBM's artificial intelligence platform, Watson, to explore how natural-language processing technologies can enhance its existing efforts to increase graduation rates.
According to the National Center for Education Statistics, 37 percent of full-time students graduate from ECU within four years. The school's overall retention rate, meaning the share of first-time students who return to continue their studies the following fall, is 83 percent. Mitchelson says the school hopes to raise that figure to between 85 and 90 percent.
The university already looks at student data such as grades or how frequently a student interacts with Blackboard, the school’s instructional platform. These systems capture structured data that Mitchelson believes the school is “probably well equipped to analyze.”
But when it comes to more nuanced, language-based data, the provost says, "there is a lot of unstructured narrative data that we are not well-positioned to retrieve and analyze, and IBM will help us with that." In particular, Mitchelson says the school is interested in finding out whether the language used in admissions essays "might indicate something about a student's feelings about going to school and their level of commitment."
To capture data from admissions essays, Watson will analyze the text and look for patterns, such as keywords that carry emotional connotations and could suggest how likely a student is to persist through school, says Richard Strasser, an associate partner at IBM who works on higher education consulting. According to Strasser, that information is then converted into structured data and assigned a "correlations risk."
Researchers who study predictive analytics in higher education say admissions essays and natural-language processing could help identify students who may be at risk of dropping out, particularly during their first year.
“Colleges really struggle when students first come on campus because they only have so much information on them,” says Iris Palmer, a senior policy analyst with the Education Policy program at New America. “They look at demographics or what high school or neighborhood a student came from. But this information is very limited and it doesn’t predict [retention] very well.”
She says that if school officials and technologists carefully design student interventions and train staff to recognize implicit bias, applying language-processing technology to admissions data could offer valuable insights about students.
Palmer isn't without concerns, however. In particular, she warns that the predictive data gleaned from admissions essays could affect a student's likelihood of being admitted or how much financial aid is awarded, further exacerbating inequality in higher education. She has also authored multiple reports on the risks of using predictive analytics, including how it "can aid in discriminatory practices" and "make institutional practices less transparent," and how it can put vulnerable individuals' data privacy and security at risk.
ECU’s Mitchelson says the technology will not be used to make enrollment decisions, and he’s aware that “there is tremendous sensitivity to the deployment of predictive analytics.”
He also recognizes that there are concerns about IBM's technology itself. For example, the Wall Street Journal recently reported that after billions of dollars were poured into using Watson for cancer research and treatment, "no published research shows Watson improving patient outcomes."
When asked whether the technology’s shortcomings in healthcare might impact its education efforts, the company replied in an email: “IBM has never shied away from grand challenges, and we know they don’t happen overnight and are not easy.”
Mitchelson says his team has been in conversations with other universities about how they are deploying predictive analytics, and he has seen examples of the technology helping students as well as cases where it has made problems worse.
Much work remains to be done. The university has not yet decided how advisors will access the data or how they will intervene with students who are flagged as at risk. Campus officials and IBM are first working to analyze the essay data, and the provost says the university will add more human advisors once the data is available. The school currently enrolls nearly 29,000 students and has around 60 full-time academic advisors to support them.
“We do not view these tools as a substitute [for human advisors],” says Mitchelson. “We will increase the number of advisors, but also restructure it.”
Strasser adds that IBM tries “to be as agnostic as possible when it comes to solutions” for how advisors should act on the data.
The IBM partnership is part of East Carolina University’s Finish in Four program, a campus-wide initiative that aims to increase the percentage of students who complete their majors in four years. Other efforts have included easing registration holds and restructuring degree programs.
It could be at least another year before East Carolina University begins designing interventions around admissions essays and other unstructured data. "Our approach to this is pretty slow and deliberate," says Mitchelson. "We are bringing this along at a pace that makes sense to the campus."