Going Back to the Basics?

With data science languages, sometimes learning the basics can be the hardest part. The QAC offers several .25 credit classes that introduce students to the necessities of different languages, but even fitting all the necessary information into half a semester can be difficult. This past quarter, Professor Pavel Oleinikov used a website called DataCamp to help his students get comfortable with the basics of Python. DataCamp is an online collection of data science lessons that teaches users through videos and repetitive exercises. The website has an in-browser code box that allows users to code right on the website without having to download any software. Each lesson takes roughly 30 minutes to 1 hour to complete, making it a convenient way to nail down a specific skill.

Students in Pavel’s Working with Python class really enjoyed being assigned DataCamp lessons as homework. “We only have 3 hours, and that may seem long but that’s not a lot of time considering the concepts that we’re learning,” said Anthony Price. These .25 credit classes move quickly, so there isn’t much time to backtrack if students are lost. Students can always Google around for answers, but the sheer amount of material returned can be overwhelming. This is why it is important to have resources in place so that students don’t give up before they get comfortable.


Detecting Trends in Community Engagement: At Wesleyan and Beyond

When it comes to activism and community service, Wesleyan has always tried to stay ahead of the curve. But this can be difficult, as the concerns and trends of community engagement are constantly shifting. Often, new topics will seemingly erupt out of nowhere, and it will take a while for word to spread. There are so many existing concerns that it can be difficult for new voices to be heard and for old voices to catch on to the changes. It might seem as though the trends in community engagement are shifting constantly, without any pattern. But can technology detect one?

Wesleyan’s Text Mining class was assigned the task of investigating this dilemma. They were asked to analyze the relationship between approaches to community engagement in the past and what people desire from it in the future. For past data, they collected the text of old Argus articles tagged “community engagement.” These articles were meant to illuminate what kinds of activities were most popular. Present data was collected through focus groups that were asked about the current state of community engagement at Wesleyan and how it could be improved. From this data, class groups hoped to discover how much the current activities overlap with the desires of the focus groups, as well as identify which community engagement topics are popular and which ones are new.


Crowdsourcing Data Analysis: The complexities of free data labor in a data hungry market

Companies don’t know where to look to find the data analysts they need. A February 2017 article reported that 40% of major companies are struggling to find reliable data analysts to hire. According to TechTarget, “a lack of skills remains one of the biggest data science challenges,” and many tech magazines have reported something similar. This has led companies to sponsor campaigns encouraging people to learn coding, and universities to create comprehensive data analysis training programs. But it has also led to the widespread use of crowdsourced data analysis. Crowdsourcing, while not a new tool in data science, has recently become extremely popular as a way for companies to fulfill their data analysis needs, from gritty data cleaning to full-blown model creation. Last month DataCrunch reported on Kaggle, a website that allows companies to host competitions around a dataset they need analyzed in some way. Another example is DrivenData, which does activism work itself but runs its projects through a similar competition format. In the competition model, the participant or group whose model the company chooses as the best receives a cash prize. However, these competitions attract so many submissions that the chance of winning the prize is rather low.


Making Books Unfamiliar: The Art of Novel Analytics

On March 2nd, Matthew Jockers gave a talk at Wesleyan about his research on using quantitative methods for analysis in literature. His talk was titled “Novel Analytics: From James Joyce to the Bestseller Code.” The following article is an exploration of his talk and the ideas he brought forward.

What makes something a piece of art? This might sound like a pretty theoretical question, but English professor Matthew L. Jockers believes that it is possible to take a technical approach.

“Art shows how things are perceived, not known,” Jockers explained. This is a definition that could cause tension in the literary world. After all, writing is messy, personal, and painfully subjective. And yet – “We tend to emphasize the idea that the text is withholding an ‘essential truth,’” Jockers explained. In this way, a literary critic wants to be able to anticipate a certain meaning, causing an endless tug-of-war between objectivity and subjectivity. Jockers does not wrestle with this tension, as evidenced by his book The Bestseller Code, in which he uses analysis to tackle that all-elusive question: What makes a bestselling novel?

Jockers and his co-author Jodie Archer nailed down the qualities that make a book a bestseller by analyzing 30 years’ worth of New York Times bestselling novels. The idea of taking an analytical look at novels was appealing to Jockers because, while a literature enthusiast, he is largely interested in the way parts fit together. Rather than focusing on the novel itself, Jockers believes we should focus on the relationship readers have with the literal words on the page. “Books are a map of grammar, syntax, and word order used to direct our attention to a certain meaning,” he explained. This transforms the previously abstract question into a new, more concrete one: How do best-selling writers write?


Can we Utilize Passion in Data Science?

It can be easy to think of data science as cut-and-dried analysis consisting solely of numbers. But according to Economics major Leah Giacalone ’17, if people think of it that way, it’s just because they haven’t tried it yet. “Personally, I’ve always found being able to code super exciting,” she said. “The first time I wrote code and then it worked was the most exciting thing ever. I always tell people that and they don’t believe me.”

If you are someone who doesn’t believe in the passion underlying data science, then maybe it’s time to give it a go, because an increasing number of companies are utilizing passion as a power source for their problems. An example of this is Kaggle, a website founded in 2010 that allows companies to post their data and research problems online so that people from around the world can compete to create the best solution. Kaggle is using the overflow of big data to its advantage to create a sort of Kickstarter for data science. It’s engaging, fresh, and possibly a good way for data analysis hopefuls to break the ice with coding.

Recently, Kaggle was used at Wesleyan by Professor Pavel Oleinikov in his class “Introduction to Text Mining.” Pavel asked his students to use customer reviews from Yelp to create an algorithm that could best predict whether a Yelp review was positive (4 or 5 stars) or negative (1 or 2 stars). The students used reviews of Arizona hotels as a guide for creating dictionaries of 20 positive and 20 negative words that would then be built into their model. The accuracy of the students’ models was then tested on Pennsylvania hotel reviews and judged by a mean utility metric that assigned a score between -1 (wrong in every case) and 1 (correct in every case). Narin Luangrath ’17, a Math and Computer Science double major, won the class competition with a mean utility score of 0.72.
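For readers curious what a dictionary-based model like this might look like, here is a minimal Python sketch. It is not the students' code: the word lists are short, made-up stand-ins for the 20-word dictionaries, and the scoring rule (+1 for a correct call, -1 for a wrong one, 0 when the model finds no dictionary words) is only an assumption about how a mean utility between -1 and 1 could be computed.

```python
import re

# Hypothetical mini-dictionaries. The class used 20 words per list;
# these particular words are illustrative assumptions, not the students' choices.
POSITIVE_WORDS = {"clean", "friendly", "comfortable", "great", "helpful"}
NEGATIVE_WORDS = {"dirty", "rude", "noisy", "terrible", "broken"}


def predict(review):
    """Return 1 (positive), -1 (negative), or 0 (no decision) for a review."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return 1
    if negatives > positives:
        return -1
    return 0


def mean_utility(predictions, labels):
    """Average utility under an assumed rule: +1 correct, -1 wrong, 0 abstained."""
    if not labels:
        raise ValueError("need at least one labeled review")
    total = 0
    for predicted, actual in zip(predictions, labels):
        if predicted != 0:
            total += 1 if predicted == actual else -1
    return total / len(labels)


# Made-up reviews with their true labels (1 = positive, -1 = negative).
reviews = [
    ("The room was clean and the staff were friendly.", 1),
    ("Dirty sheets and a rude front desk.", -1),
    ("Not great.", -1),
    ("We stayed two nights.", 1),
]

predictions = [predict(text) for text, _ in reviews]
labels = [label for _, label in reviews]
print(mean_utility(predictions, labels))
```

The third review shows the weakness of simple word counting: "Not great." contains a positive word, so the sketch gets it wrong, which is part of why choosing the dictionary words carefully mattered in the competition.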


Racism and Diversity in Data Analysis

In light of the recent election, it is more important than ever to look at how and where we are responsible for perpetuating prejudice. In a previous article, DataCrunch introduced the concept of “Weapons of Math Destruction,” which are data models built from a limited or biased sample of data that result in toxic feedback loops. Since this explanation is most often applied to artificial intelligence, there is little discussion about how this description could also illuminate the workings of the human mind. While many might want to think of this narrow-mindedness as below the mental capacity of human beings, such a viewpoint is dangerous in that it makes having a conversation about prejudice difficult. Like artificial intelligence, humans create internal models in their minds, which often lead to the creation of ideals such as stereotypes. The flaw of these models is that they are based on the past, and if not updated or addressed they will continue to run off of the same data they were originally fed – even if times have changed.

At an individual level, racism is a toxic feedback loop. People want to be able to predict how other people will behave, and it can be far too easy to create “a binary prediction that all people of that race [or gender, sexuality, religion, political group, etc.] will behave that same way” (O’Neil, 23). If an issue like this develops in a technical model, it isn’t too difficult to go back and manually adjust the data input or change the important factors. But people with racist beliefs “don’t spend a lot of time hunting down reliable data to train their twisted models” (23). They will gladly continue to absorb the data that seems to confirm their beliefs, and will reject data that tests them.


Seasons of Internships

I’ve feared the moment that my summers would be turned over to internships for a long time. I can’t remember for how long I’ve known internships are important – probably for as long as I’ve known about applying for college. My relationship with the idea of internships has gone through stages: almost every day, I slide between thinking that they are silly resume builders and thinking that they are valuable, necessary work experience. I recently decided that I wanted to pursue some sort of consulting internship, and then felt a drop in my stomach similar to when I decided to apply to Wesleyan. But while there is a large and personalized application process still ahead of me, I don’t want to feel as scared as I did then. With this in mind, I sat down with Asie Makarova ’17 and Taylor Chin ’18 to discuss two of the main myths about internships and what truths, based on their experience, lie beneath.

Asie and Taylor had similar beginnings to their internship journeys. “I started pretty early applying to things Junior fall,” Asie remembered. She found her connection through LinkedIn, by reaching out to a friend’s dad who then put her in contact with FTI Consulting. Taylor also came across his internship on LinkedIn when he noticed that an old friend from high school had connections at an energy intelligence software company called EnerNOC. From there, both Taylor and Asie were offered interviews at their respective companies.


Peeking Into Design’s Toolbox: Design, Data, and the Liberal Arts Education

In 1994, a small company called Marvel acquired the rights to sell children’s toys and comic books based on their characters. During this time they were riding the wave of the comic book boom, a time when comic book consumption and production reached a sudden high. Marvel entered this period of success with high hopes, and followed the lead of other comic book companies to find success. This follow-the-leader approach turned against them when the market collapsed in 1997, forcing Marvel to declare bankruptcy.

All of this happened before Marvel Entertainment was the media powerhouse we know today. Now, it seems as if Marvel is expanding into every corner of product design, churning out movies and TV series with a built-in comic book and merchandise market at such a pace that some are calling this Marvel’s Golden Age. This approach is startlingly different from the company’s mantra in 1997, leading many Marvel enthusiasts to ask themselves what has changed between then and now.

Wesleyan alum Peter Olson ’97 was hired by Marvel in 2004, the year before Marvel changed their name from Marvel Enterprises to Marvel Entertainment – a move that made their expansionist dreams quite clear. Peter’s main assignment was to re-launch Marvel’s website, in the hope that they could rebuild through better online communication with fans. But Peter knew that, in order to really reach their full potential, Marvel needed to become a business. While working there, he landed on a golden question for Marvel’s future: “How can we take Marvel’s data and turn it into something useful for fans?” One of the results that came from this line of thinking was a visualization of all the character relations in the Marvel Universe, color-coded by the major franchises. In the visualization, each node represents a character, and the thickness of each edge corresponds to the number of interactions between the two characters it connects. Peter was only a cog in a large mechanical shift within Marvel, but the thinking that led to the creation of this data visualization is very representative of the change that took place after Marvel’s bankruptcy in 1997 – they stopped thinking about how they could use their data to merely market products, and instead focused on a way to draw in customers by using their data for interactive and proactive design.
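To make the node-and-edge idea concrete, here is a small Python sketch of how edge weights for a character network could be tallied. It is not Marvel's method: the appearance lists are invented, and treating "two characters appear in the same issue" as one interaction is an assumption made purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented appearance lists: each inner list stands for the cast of one
# hypothetical issue. Real data would come from Marvel's own records.
appearances = [
    ["Spider-Man", "Iron Man", "Captain America"],
    ["Spider-Man", "Iron Man"],
    ["Wolverine", "Storm"],
    ["Iron Man", "Captain America"],
]


def build_edges(issues):
    """Count how often each pair of characters shares an issue.

    Each key is an alphabetically ordered pair (an edge between two nodes);
    each value is the edge weight, which a drawing tool could map to thickness.
    """
    edges = Counter()
    for cast in issues:
        # sorted(set(...)) removes duplicates and keeps pair order consistent
        for pair in combinations(sorted(set(cast)), 2):
            edges[pair] += 1
    return edges


edges = build_edges(appearances)
for (first, second), weight in edges.most_common():
    print(f"{first} -- {second}: {weight}")
```

A dictionary mapping each character to a franchise could then supply the node colors, and the pair counts could be handed to any graph-drawing tool to set edge thickness.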


The Invisible and Pervasive Power of College Rankings

This article is inspired by and quotes from Weapons of Math Destruction by Cathy O’Neil, a book about O’Neil’s growing disillusionment with the data economy as she learned that data can be used to fuel toxic feedback loops. This post is the first in a series DataCrunch will be doing based on the examples cited in her book.

 

When preparing to apply to college, one of the first references that people turn to is a list of college rankings. Almost every newspaper or journal has one – Forbes, Princeton Review, U.S. News. They are a big deal within higher education, with students and parents often using the lists as a point of reference when choosing where to apply. But the scope of influence goes beyond that. Alumni and teachers also look at these lists to decide whether they want to donate money or apply. These simple rankings of colleges have become something of a bible in higher education that destines a school to fly or flop – all based on its ranking.

Does this sound scary to you? It should. It’s hard to truly understand the amount of power we give to these lists until you step back and look at how far the cycle of impact spans: The process of applying for college has become so much more than just “applying.” High schools will start prepping students their freshman year to be mindful of their grades, ranked GPA, AP scores, extracurriculars, volunteer work, honor societies, SAT scores, ACT scores…. And when high schoolers are stressing out about how much there is to do, they surely don’t think back to those college rankings that they started reading with their parents for fun. But the truth is that those rankings are the center point of a vicious feedback loop that now controls our higher education system.


Why are Textbooks so Expensive?

As Drop/Add week comes to an end, students are finishing up one of the most dreaded activities of the semester: acquiring textbooks. Whether you have already purchased all your textbooks or are heading to Broad Street to pick up the final ones, you will end the week having dealt with one of the worst cases of sticker shock possible. Because while the mile-long line is annoying, nothing is more horrifying than seeing your purchase total turn to a three-digit number for the nine textbooks your English class requires. Of course, there are cheaper options: you can buy used copies, rent new or used copies, borrow from a friend, sell books back after buying them, or ask your parents to pay part or all of the cost. But selling books back can be difficult, and asking parents can be hard to navigate, especially if the money situation is tight. And with the prices sitting at a cringe-worthy level no matter what, paying for textbooks has become a serious concern for most college students.

Have you ever wondered why your textbooks are so expensive? Even normal books aren’t always cheap, but textbook prices have soared far above that level. The data vary somewhat, but on average it is reported that textbook prices have risen 800% in the last 30 years. And since choosing not to buy a textbook could hurt your grade, this raises questions about a rigged college system that favors those with money, even after you get past the golden gates. So what’s causing prices to be so high?
