Crowdsourcing Data Analysis: The complexities of free data labor in a data-hungry market

Companies don’t know where to look to find the data analysts they need. A February 2017 article reported that 40% of major companies are struggling to find reliable data analysts to hire. According to TechTarget, “a lack of skills remains one of the biggest data science challenges,” and many tech magazines have reported something similar. This shortage has led companies to sponsor campaigns encouraging people to learn to code and universities to create comprehensive data analysis training programs. But it has also led to the widespread use of crowdsourced data analysis. Crowdsourcing, while not a new tool in data science, has recently become extremely popular as a way for companies to fulfill their data analysis needs, from gritty data cleaning to full-blown model creation. Last month DataCrunch reported on Kaggle, a website that allows companies to host competitions around a dataset they need analyzed in some way. Another example is DrivenData, which does activism work itself but runs its projects through a similar competition layout. In the competition model, the participant or group whose model the company chooses as the best receives a cash prize. However, these competitions attract so many submissions that the chance of winning the prize is rather low.


Making Books Unfamiliar: The Art of Novel Analytics

On March 2nd, Matthew Jockers gave a talk at Wesleyan about his research on using quantitative methods for analysis in literature. His talk was titled “Novel Analytics: From James Joyce to the Bestseller Code.” The following article is an exploration of his talk and the ideas he brought forward.

What makes something a piece of art? This might sound like a pretty theoretical question, but English professor Matthew L. Jockers believes that it is possible to take a technical approach.

“Art shows how things are perceived, not known,” Jockers explained. This is a definition that could cause tension in the literary world. After all, writing is messy, personal, and painfully subjective. And yet – “We tend to emphasize the idea that the text is withholding an ‘essential truth,’” Jockers said. In this way, a literary critic wants to be able to anticipate a certain meaning, causing an endless tug-of-war between objectivity and subjectivity. Jockers does not wrestle with this tension, as evidenced by his book The Bestseller Code, in which he uses quantitative analysis to tackle that all-elusive question: What makes a bestselling novel?


Can We Utilize Passion in Data Science?

It can be easy to think of data science as cut-and-dried analysis consisting solely of numbers. But according to Economics major Leah Giacalone ’17, if people think of it that way, it’s just because they haven’t tried it yet. “Personally, I’ve always found being able to code super exciting,” she said. “The first time I wrote code and then it worked was the most exciting thing ever. I always tell people that and they don’t believe me.”

If you are someone who doesn’t believe in the passion underlying data science, then maybe it’s time to give it a go, because an increasing number of companies are using that passion as a power source for solving their problems. An example of this is Kaggle, a website founded in 2010 that allows companies to post their data and research problems online so that people from around the world can compete to create the best solution. Kaggle is using the overflow of big data to its advantage to create a sort of Kickstarter for data science. It’s engaging, fresh, and possibly a good way for data analysis hopefuls to break the ice with coding.


Racism and Diversity in Data Analysis

In light of the recent election, it is more important than ever to look at how and where we are responsible for perpetuating prejudice. In a previous article, DataCrunch introduced the concept of “Weapons of Math Destruction,” which are data models built from a limited or biased sample of data that result in toxic feedback loops. Since this explanation is most often applied to artificial intelligence, there is little discussion about how this description could also illuminate the workings of the human mind. While many might want to think of this narrow-mindedness as beneath the mental capacity of human beings, such a viewpoint is dangerous in that it makes having a conversation about prejudice difficult.


Seasons of Internships

For a long time, I’ve feared the moment that my summers would be turned over to internships. I can’t remember how long I’ve known internships are important – probably for as long as I’ve known about applying for college. My relationship with the idea of internships has gone through stages: almost every day, I slide between thinking that they are silly resume builders and thinking that they are valuable and necessary work experience. I recently decided that I wanted to pursue some sort of consulting internship, and then felt a drop in my stomach similar to when I decided to apply for Wesleyan. But while there is a large and personalized application process still ahead of me, I don’t want to feel as scared as I did then. With this in mind, I sat down with Asie Makarova ’17 and Taylor Chin ’18 to discuss two of the main myths about internships and what truths, based on their experience, lie beneath.
