It can be easy to think of data science as cut-and-dried analysis consisting solely of numbers. But according to Economics major Leah Giacalone ’17, if people think of it that way, it’s just because they haven’t tried it yet. “Personally, I’ve always found being able to code super exciting,” she said. “The first time I wrote code and then it worked was the most exciting thing ever. I always tell people that and they don’t believe me.”
If you are someone who doesn’t believe in the passion underlying data science, then maybe it’s time to give it a go, because an increasing number of companies are harnessing that passion to solve their problems. One example is Kaggle, a website founded in 2010 that allows companies to post their data and research problems online so that people from around the world can compete to create the best solution. Kaggle is using the overflow of big data to its advantage to create a sort of Kickstarter for data science. It’s engaging, fresh, and possibly a good way for data analysis hopefuls to break the ice with coding.
Recently, Kaggle was used at Wesleyan by Professor Pavel Oleinikov in his class “Introduction to Text Mining.” Pavel asked his students to use customer reviews from Yelp to create an algorithm that could best predict whether a Yelp review was positive (4 or 5 stars) or negative (1 or 2 stars). The students used reviews of Arizona hotels as a guide for creating dictionaries of 20 positive and 20 negative words that would then be built into their models. The accuracy of the students’ models was then tested on Pennsylvania hotel reviews and judged by a mean utility metric that assigned a score between -1 (wrong in every case) and 1 (correct in every case). Narin Luangrath ’17, a Math and Computer Science double major, won the class competition with a mean utility score of 0.72.
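For readers curious what such a model might look like, here is a minimal Python sketch of a dictionary-based classifier and a mean utility score. It is not the code any student submitted. The word lists are hypothetical stand-ins (the class dictionaries held 20 words each and are not given here), the sample reviews are invented, and the scoring rule of +1 for a correct call and -1 for a wrong one is an assumption that simply matches the -1 to 1 range described above.

```python
import re

# Hypothetical word lists for illustration only; the students' actual
# dictionaries (20 positive and 20 negative words) are not known here.
POSITIVE_WORDS = {"great", "clean", "friendly", "comfortable", "excellent"}
NEGATIVE_WORDS = {"dirty", "rude", "noisy", "broken", "terrible"}


def star_label(stars):
    """Map a star rating to +1 (4 or 5 stars) or -1 (1 or 2 stars)."""
    return 1 if stars >= 4 else -1


def predict(review):
    """Return +1 or -1 by comparing counts of dictionary words in the review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    # Assumption: a tie (including no dictionary hits) counts as positive.
    return 1 if pos >= neg else -1


def mean_utility(predictions, labels):
    """Average of +1 per correct prediction and -1 per wrong one (assumed rule)."""
    scores = [1 if p == y else -1 for p, y in zip(predictions, labels)]
    return sum(scores) / len(scores)


# Invented sample reviews with star ratings, not real Yelp data.
SAMPLE_REVIEWS = [
    ("The room was clean and the staff were friendly.", 5),
    ("Dirty carpet and rude front desk.", 1),
    ("Great pool, but the room was noisy and the shower was broken.", 2),
    ("Not great; I would not stay again.", 1),
]

predictions = [predict(text) for text, _ in SAMPLE_REVIEWS]
labels = [star_label(stars) for _, stars in SAMPLE_REVIEWS]
score = mean_utility(predictions, labels)
print(score)  # 0.5: three correct calls, one wrong
```

The last sample review shows the limits of simple word counting: “not great” still registers as a positive hit. Under the assumed scoring rule, the mean utility equals twice the accuracy minus one, so a score of 0.72 would correspond to roughly 86 percent of reviews classified correctly.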
Leah is pretty competitive, so she was excited when she learned about this aspect of the assignment. “When I took a data mining course we didn’t use Kaggle, and because you couldn’t see how your classmates were doing I think you were a little more okay with getting a lower score,” she said, explaining that seeing classmates move up and down the rankings throughout the week kept people on their toes.
“There were definitely people I would talk to and we would joke around like, ‘I’m coming for you,’” said Joseph Bongo ’18, an Economics and Math double major. But he countered that the competition never became intimidating, because Pavel graded the students’ assignments independently of their Kaggle ranking. “You don’t want to make it stressful. You want people to be engaged, but you also don’t want to freak people out if it’s the case where they could be doing everything right and still not getting a model that’s too accurate.” Pavel’s choice to separate grades from the competition is important, because it allows him to avoid creating a toxic atmosphere while still tapping into the passion in the class – passion that can sometimes get buried under the humdrum of the daily college workload.
“We’re all particularly quantitative majors, so we’re used to cookie-cutter assignments and doing things the exact same way every time,” said Joe.
Narin agreed, adding, “With all the QAC courses it’s more like: here’s what we want to get done. And then we learn only enough theory so that we can solve these problems. It’s nice to have the opposite approach.” In Narin’s opinion, this is the kind of experience with data and coding that first-time students want, but they don’t always know where to go to get it. “I TA-ed ‘Intro to Programming.’ It’s full of people who want to get some experience, and I feel like a lot of those people need to be coming to QAC classes, because they’re much more practical.”
Advertising has always been a point of experimentation and improvement for the QAC, and while the focus is often on how to reach the most people, perhaps a better model would be creating activities that attract people’s passion. The Kaggle website has clearly enthused the students in Pavel’s class in this way. “I think there is an essential foundation of passion in coding and data science. But then when you add the competition, it adds another layer,” Leah explained.
“It would be funny if there was a Wesleyan University leaderboard,” Narin laughed. “Or a QAC leaderboard.” Everyone agreed: That would be really cool.