The hidden power of calendars: the history of the Oscars told through their telecast schedules

Sundays seem natural for large TV events. Why wouldn't they be? The NFL's Super Bowl has always been played on a Sunday, and it feels like the proper order of things that the Academy Awards ceremony is also held on a Sunday every year, somewhere around the end of February or the start of March. Yet a simple dataset of telecast dates shows that this practice is relatively recent and that for a long while things were quite different. For a quick summary of the data, see the chart below: it shows how the ceremony dates progressed from the earliest (1953) to the most recent (2014). For more details on why the changes occurred, read on.

[Chart: Academy Awards ceremony dates, 1953–2014]

How many tweets would your code crunch if it could crunch Twitter, or why holidays are bad for using Twitter’s streaming API

Twitter has emerged as a convenient source of data for those who want to explore social media. The company provides access to its data through several APIs: a REST API for collecting past tweets and a streaming API for collecting tweets in real time. R has libraries for working with both. As is usual in data collection, the catchphrase is "more": we want more tweets, ideally all that are relevant to our research question. While the REST API is rate-limited (a user can submit 180 requests per 15 minutes, with each request returning 100 tweets), the streaming API holds the promise of delivering much more. The nagging question, though, is "how much?"
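
To put the REST API limit in perspective, the arithmetic below computes the best-case ceiling implied by the figures above. It is a minimal Python sketch, not code from any Twitter library; the constant and function names are illustrative, and it assumes every request returns a full page of 100 tweets, the limit is used in every window, and only complete 15-minute windows count.

```python
# Back-of-the-envelope ceiling on REST API collection, using the limits
# quoted above: 180 requests per 15-minute window, 100 tweets per request.
# Names are illustrative; they are not part of any Twitter or R library.
REQUESTS_PER_WINDOW = 180
TWEETS_PER_REQUEST = 100
WINDOW_MINUTES = 15


def max_tweets(hours: float) -> int:
    """Upper bound on tweets retrievable in `hours`, assuming every request
    returns a full page and the limit is hit in every complete window."""
    windows = int(hours * 60 // WINDOW_MINUTES)  # complete windows only
    return windows * REQUESTS_PER_WINDOW * TWEETS_PER_REQUEST


print(max_tweets(0.25))  # one window: 18000
print(max_tweets(1))     # 72000
print(max_tweets(24))    # 1728000
```

At these limits, a full day of continuous collection tops out at about 1.7 million tweets, which is the baseline the streaming API has to beat.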

FiveThirtyEight’s Riddler #1 – using R to evaluate the answers

Last week, the prominent data journalism site FiveThirtyEight.com launched The Riddler, a section dedicated to math and probability puzzles. The deadline for submitting solutions to the first riddle has passed, and this post illustrates how you can use R to evaluate potential answers without doing any analytic derivations.
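
The general idea of checking an answer by simulation rather than derivation can be sketched as follows. This Python snippet is an illustration under stated assumptions, not the post's R code: the helper `estimate_probability` and the two-dice question are hypothetical stand-ins for the actual riddle. It simulates many trials and compares the observed frequency with a candidate answer; a close match supports the answer, while a large gap flags it as likely wrong.

```python
import random


def estimate_probability(trial, n_trials=100_000, seed=1):
    """Monte Carlo estimate: the fraction of simulated trials for which
    `trial(rng)` returns True. (Hypothetical helper for illustration.)"""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(trial(rng) for _ in range(n_trials))
    return hits / n_trials


# Hypothetical stand-in puzzle (not the actual Riddler question):
# what is the probability that two fair dice sum to 7?
def two_dice_sum_to_seven(rng):
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


candidate_answer = 1 / 6  # a submitted answer we want to check
estimate = estimate_probability(two_dice_sum_to_seven)
print(f"simulated {estimate:.4f} vs candidate {candidate_answer:.4f}")
```

With 100,000 trials the standard error for a probability near 1/6 is roughly 0.001, so agreement to two or three decimal places is about as much as the simulation can confirm.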