Many know that Wesleyan has a very large collection of prints dating back to the 15th century, stored in the Davison Art Center (DAC). Not many are aware that, through the efforts of the DAC staff, the collection comes with an extensive dataset containing metadata for all records. In the fall of 2017, students from the Introduction to Network Analysis course (QAC 241) got a chance to view some of the famous prints and then search for new insights in art history using their quantitative skills. This post describes the experiences, accomplishments, and challenges of working with art history data.
In case you have not noticed from the multiple TV ads, for a few years now IBM has been positioning itself as a Big Data company, with its Watson platform and cloud-based services. One of them is the Alchemy Language API, which packs together functions for text analysis and information retrieval. As part of learning how to handle this API from R, I tried it on a news story about a sci-fi book publishing business. Overall, the results were strong, although not without some amusing quirks…
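For readers who want to try something similar, here is a minimal sketch of sending text to an AlchemyAPI-style endpoint from R with the httr and jsonlite packages. The endpoint URL, parameter names, and response fields below are assumptions for illustration, not the exact calls used in the original analysis.

```r
# Minimal sketch: send a block of text to an AlchemyAPI-style endpoint and
# parse the JSON response. The URL, the parameter names ("apikey", "text",
# "outputMode"), and the "entities" field are assumptions for illustration.
library(httr)
library(jsonlite)

api_key  <- "YOUR_API_KEY"   # placeholder credential
endpoint <- "https://gateway-a.watsonplatform.net/calls/text/TextGetRankedNamedEntities"

article_text <- "Paste the text of the news story here."

res <- POST(endpoint,
            body = list(apikey     = api_key,
                        text       = article_text,
                        outputMode = "json"),
            encode = "form")

parsed <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
parsed$entities   # assumed field holding the extracted entities
```

The same pattern, POST the text and parse the JSON that comes back, carries over to the service's other text-analysis functions once you substitute the appropriate endpoint.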
With rainy days finally over, students from the Introduction to Text Mining course (QAC 386) decided to hold the class outside, which they successfully did on the lawn near Allbritton Hall. The topic of the day was tree parsing using the openNLP package in R. In the photo, left to right: first row: Trisha Arora ’16, Taran …
Sundays seem natural for large TV events. Why wouldn’t they? NFL’s Super Bowl has been on Sundays forever. It feels like the proper order of things that the Academy Awards ceremony is also on a Sunday. Every year, somewhere near the end of February or the start of March. Yet a simple dataset of telecast dates shows that this practice is a relatively recent phenomenon and that for a long while things were quite different. For a quick summary of the data, look at the chart below: it shows the progression of the ceremony dates from the earliest (1953) to the most recent (2014). For more details on why the changes occurred, keep reading.
Twitter has emerged as a convenient source of data for those who want to explore social media. The company provides several access endpoints through APIs: a REST API for collecting past tweets and a streaming API for collecting tweets in real time. R has libraries for working with both. As is usual in data collection, the catchphrase is “more” – we want more tweets, ideally all that are relevant to our research question. While the REST API is rate-limited (a user can submit 180 requests per 15 minutes, with each request returning up to 100 tweets, or roughly 18,000 tweets per window), the streaming API holds the promise of delivering much more. The nagging question, though, is “how much?”
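To make the comparison concrete, here is a minimal sketch of collecting tweets both ways in R. It assumes the rtweet package and a configured Twitter token; the package choice, the search term, and the file name are illustrative assumptions, since the post does not name a specific library.

```r
# Minimal sketch: collect tweets on the same term through both APIs and
# compare the yield. Assumes the rtweet package and valid Twitter credentials;
# the search term and file name are placeholders.
library(rtweet)

# REST (search) API: rate-limited to 180 requests per 15 minutes,
# up to 100 tweets per request, so about 18,000 tweets per window.
rest_tweets <- search_tweets("dataviz", n = 18000, retryonratelimit = TRUE)

# Streaming API: collect everything matching the term for 15 minutes.
stream_tweets("dataviz", timeout = 15 * 60,
              file_name = "stream_sample.json", parse = FALSE)
stream_sample <- parse_stream("stream_sample.json")

# How much did each approach deliver?
nrow(rest_tweets)
nrow(stream_sample)
```

Here `retryonratelimit = TRUE` asks the search call to wait out the 15-minute windows, while the streaming call simply keeps whatever matches during the timeout, which is exactly the trade-off the post goes on to examine.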