We hope you’ve all had an okay time settling in for the 2016-2017 year. We’re excited you’re here, and for all the great things you can expect on DataCrunch this year. Stay tuned!
In case you have not noticed from the multiple TV ads, IBM has for a few years now been positioning itself as a Big Data company, with its Watson platform and cloud-based services. One of these services is the AlchemyLanguage API, which bundles together functions for text analysis and information retrieval. As part of learning how to work with this API from R, I tried it on a news story about a sci-fi book publishing business. Overall, the results were strong, although not without some amusing quirks…
As Drop/Add week comes to an end, students are finishing up one of the most dreaded activities of the semester: acquiring textbooks. Whether you have already purchased all your textbooks or are heading to Broad Street to pick up the final ones, you will end the week having dealt with one of the worst cases of sticker shock possible. The mile-long line is annoying, but nothing is more horrifying than seeing your purchase total climb to a three-digit number for the nine textbooks your English class requires. Of course, there are cheaper options: buying used, renting new or used copies, borrowing, lending, selling back after buying, or asking parents to pay in part or in full. But selling back can be difficult, and asking parents can be hard to navigate, especially if money is tight. With prices sitting at a cringe-worthy level no matter what, paying for textbooks has become a serious concern for most college students.
Have you ever wondered why your textbooks are so expensive? Even ordinary books aren’t always cheap, but textbook prices have soared far above that level. There is some disparity in the data, but on average it is reported that textbook prices have risen 800% in the last 30 years. And since choosing not to buy a textbook can hurt your grade, the trend raises questions about a college system rigged in favor of those with money, even after you get past the golden gates. So what is driving prices so high?
In 2013, Netflix came out with Orange Is the New Black, one of the first original series to debut on an online streaming network. It was an immediate success, and it ushered in years of Netflix continuously “getting it right”: House of Cards, Arrested Development, Unbreakable Kimmy Schmidt, numerous Marvel shows, Sense8 – and, most recently, Stranger Things.
What’s fascinating is that there seems to be a pattern of streaming networks releasing great original shows while cable TV shows decline in quality and originality. When Stranger Things came out in early July of 2016, Netflix had another hit, and I heard many people asking in awe, “How does Netflix keep getting it right?”
It turns out there’s a secret to their success: Big Data.
When seen through a news report or a computer screen, the impact of current political research can seem very disconnected from what is really taking place. It can be hard to understand the results and implications of politicians’ behaviors and opinions without an already-written history book. But with this new age of media presentation comes the new-age digging tool of data analysis, which is once again proving to be the key to decoding today’s political discourse.
And that’s not its only use. Once again, Wes students are proving it possible not only to use online politics for research purposes, but also to get your foot in the door of Data in the Real World. In April, John Murchison ‘16, Grace Wong ‘18, and Joli Holmes ’17 attended the Midwest Political Science Association conference in Chicago to present a poster about their research on congressional politics.
With the rainy days finally over, students from the Introduction to Text Mining course (QAC 386) decided to hold class outside, which they successfully did on the lawn near Allbritton Hall. The topic of the day was tree parsing using the openNLP package in R.
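“Tree parsing” here means constituency parsing: breaking a sentence into a nested tree of phrases (sentence, noun phrase, verb phrase, and so on). The class used R’s openNLP package, which relies on trained statistical models; purely as an illustration of what a parse tree looks like, here is a toy sketch in Python with a small made-up grammar and lexicon (these rules are invented for the example and are not part of any openNLP workflow):

```python
# A toy constituency parser: expand grammar rules recursively over the words
# of a sentence and return the resulting parse tree as nested tuples.

GRAMMAR = {                       # non-terminal -> list of production rules
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {                       # part-of-speech tag -> words it can cover
    "Det": {"the"},
    "N":   {"students", "class"},
    "V":   {"held"},
}

def parse(symbol, words, start):
    """Try to expand `symbol` over words[start:]; return (tree, next_pos) or None."""
    # Terminal case: the symbol is a part-of-speech tag covering one word.
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            return (symbol, words[start]), start + 1
        return None
    # Non-terminal case: try each production rule, expanding children in order.
    for rule in GRAMMAR[symbol]:
        children, pos = [], start
        for child in rule:
            result = parse(child, words, pos)
            if result is None:
                break
            subtree, pos = result
            children.append(subtree)
        else:                     # every child matched: the rule succeeded
            return (symbol, *children), pos
    return None

tree, end = parse("S", "the students held the class".split(), 0)
print(tree)
# -> ('S', ('NP', ('Det', 'the'), ('N', 'students')),
#          ('VP', ('V', 'held'), ('NP', ('Det', 'the'), ('N', 'class'))))
```

The nested tuples mirror the bracketed tree structures the openNLP parser prints; real parsers differ mainly in that the grammar and its probabilities are learned from annotated corpora rather than written by hand.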
In the photo, left to right: first row: Trisha Arora ’16, Taran Carr ’16, Antonio Robayo ’16, and Jack Trowbridge ’16; second row: Grace Wong ’18; third row: Sara Eismont ’18.
As interest in data and data analysis grows, students drawn to the field have to work harder to understand its boundaries and guidelines. It won’t always be as simple as it was for Evan Thorne ‘15, who came to Wesleyan thinking he wanted to study Economics before discovering the QAC department. “I was starting to work with data sets in my math and computer science classes when I heard about big data,” Evan said, explaining that soon after he began taking classes with the QAC, he realized he wanted this as a career.
But what is “this”? Data science? Data analysis? Data manipulation? Sometimes it can be hard to define. But Evan did not flounder when I asked him for a definition of his job at CKM Advisors, the company where he was hired right after graduation. He began by explaining that an analyst is someone who can take in what’s readily available and dissect it to look at basic stats and trends. Data scientists, however, are able to find things that aren’t readily available – unstructured data – and take them in raw. “Every data scientist is an analyst in a way,” Evan explained, “but it’s at a much bigger level.” At CKM, Evan is a data scientist, and he is responsible for every step of the analytic process: data ingestion, wrangling, manipulation, analysis, and visualization.
Sundays seem natural for large TV events. Why wouldn’t they? The NFL’s Super Bowl has been on a Sunday forever. It feels like the proper order of things that the Academy Awards ceremony is also held on a Sunday every year, somewhere near the end of February or the start of March. Yet a simple dataset of telecast dates shows that this practice is a relatively recent phenomenon, and that for a long while things were quite different. For a quick summary of the data, look at the chart below: it shows the progression of the ceremony dates from the most distant (1953) to the most recent (2014). For more details on why the changes occurred, keep reading.
With the Oscars ceremony just two days away, it is nearly impossible to escape the media buzz around the potential winners. Many commentators believe that the Oscar for Best Actor in a Leading Role should go to Leonardo DiCaprio. Analysis of data from the Academy’s database shows that, even for superstars, nothing is written in stone. A little background reading, aided by the New York Times Article API, revealed a history of intricate balancing among the Academy, the studios, and the public.
Transparency is a hot issue: in politics, in business, and in journalism, people are all itching to know how truthful the truths they’re being fed really are. However, truth is no longer as easy a thing to gauge as it once was. It turns out the public can be fed information that is technically true, yet at the same time only one version of the truth.
A good example is weather forecasts. Most people think of weather as easy, straightforward data to access. There are tons of websites that allow search by location (e.g., www.weather.com), and on TV news we are given an explanation of a weather chart, as seen below: