FiveThirtyEight’s Riddler #1 – using R to evaluate the answers – DataCrunch

Last week a prominent data journalism blog FiveThirtyEight.com has launched The Riddler – a section dedicated to math and probability related puzzles. The deadline for submitting solutions to the first riddle is over and this post will illustrate how you can use R to evaluate potential answers without doing any analytic derivations.

As a caveat, at least two other solutions have been already posted online (Barron Wasteland – here and BlunderBear on Reddit – here ). Please stop reading this post if you would like to avoid spoilers.

The puzzle was formulated in the following way (source: What’s the Best Way to Drop a Smartphone):

You work for a tech firm developing the newest smartphone that supposedly can survive falls from great heights. Your firm wants to advertise the maximum height from which the phone can be dropped without breaking.

You are given two of the smartphones and access to a 100-story tower from which you can drop either phone from whatever story you want. If it doesn’t break when it falls, you can retrieve it and use it for future drops. But if it breaks, you don’t get a replacement phone.

Using the two phones, what is the minimum number of drops you need to ensure that you can determine exactly the highest story from which a dropped phone does not break? (Assume you know that it breaks when dropped from the very top.) What if, instead, the tower were 1,000 stories high?

The simplest approach would be to test-drop a phone from every floor: start at 1 and go on until we reach the floor at which the phone breaks. For instance, if the 2nd floor were the highest floor where the phone does not break, we would need to drop the phone from the 1st, 2nd, and 3rd floors. We need the drop from the 3rd floor because, when we are on the 2nd floor, we do not know yet that this is the highest one that is safe.

Very quickly, though, two thoughts come to mind. First, this is slow. Second, we are using only one phone. We could probably do better if we were to use a ‘leapfrog’ strategy: set a leap/step size (for instance, 3) and go in increments of this step. Once we encounter a floor where the phone breaks, we backtrack one step (to the floor we know is safe) and then start doing drops in increments of 1. For instance, suppose 7 is the highest safe floor and the step size we chose is 3. We would do the drops from the 3nd and 6th floors, and then we would do a drop from the 9th floor. At this point the first phone breaks, we backtrack to the 6th floor, and then go to 7th where we drop the second phone (safely), and we go to the 8th floor, at which point the second phone also breaks. In total, we made 5 drops (at floors 3, 6, 9-break, 7, 8-break). Interestingly, if 8th were the highest safe floor, then we would still need only 5 drops, taking advantage of our recollection that the first phone broke at floor 9.

How big should the step size be? We can use R to examine this: write a function that will simulate the process and count the number of drops we make for a set size of the step. The code uses while- and for-loops. You can read more about them here.

step_model = function(x, step) {
  steps_made = 0
  pos = 0
  go_on = TRUE
  while (go_on) {
    pos = pos + step
    steps_made = steps_made + 1
    if (pos > x) {
      pos = pos - step
      for (j in 1:(step-1)) {
        steps_made = steps_made + 1
        if ((pos + j) > x | j == (step - 1)) {
          go_on = FALSE
          break
        } 
      }
    }
  }
  return(steps_made)

The while-loop simulates the ‘leaps’ and the for-loop – the steps of increment 1 that occur after we backtrack. Now we can call the function with various step sizes and obtain the number of drops for each possible value of x – the highest safe floor.

For instance, for the earlier examples of 7 and 8 as the highest safe floors and step size = 3 we get

> step_model(x=7, step=3)
[1] 5
> step_model(x=8, step=3)
[1] 5

We can write x=7 and step=3 because R allows specifying names of arguments in the function call. Instead of doing a for loop to find the number of drops for every x, we can apply the function to a vector of values by using sapply(vector, function) – it will call the function with one element of the vector at a time and combine the results into a vector.

> x = 1:10
> sapply(x, step_model, step=3)
 [1] 3 3 3 4 4 4 5 5 5 6

The first line declared x as a vector containing values from 1 to 10. The second line applied the function step_model to each element of x and passed additional argument step=3. The output is as expected: values 7, 8, and 9 as the highest safe floors would take 5 drops to discover. Value 1 will take 3 drops: the ‘leap’ to the 3rd floor – we drop the first phone and it breaks, backtrack to 0, safe drop from the 1st floor, and the destructive drop from the 2nd floor.

To generate charts we use the code below: first we use plot() to initialize the graphing area and place the first chart, then we use points() with argument type=”l” to superimpose additional charts.

x = 1:100
plot(x=x, y=sapply(x, step_model, step=2), type="l", lty=1, col=1)
for (i in c(3, 33, 50)) {
  points(x = x, y=sapply(x, step_model, step=i), type="l", lty=1)
}

The charts below use the same data but were made using d3js – a JavaScript library for web-based vector graphics.

[d3-source canvas=”wpd3-7041-0″]

The dark red line corresponds to step size = 2, gray – to step size of 3. Strictly speaking, neither of the strategies dominates for all values of x: e.g., when x = 1 step size 2 produces a better result than step size 3, but for large values of x step size 3 is better.

As an afterthought, the chart shows the values for two other step sizes: 50 and 33. They represent the ‘reciprocal’ of sizes 2 and 3: for the worst cases (e.g., safe floor is 98th) they produce same values. For instance, step size 2 will determine that 98th is the safe floor after 51 drops (49 to reach 98th floor, plus one to test 100th floor, and another one to test 99th floor); step size 50 will produce the same value (2 steps with drops to reach 100th floor, plus 48 to reach 98th, plus one more to test that the phone breaks on 99th).

Such is the outcome for the deterministic case: no single strategy consistently outperforms other strategies. However, we can also examine the probabilistic case: if the highest safe floor was a uniformly distributed random variable, what would be the expected value for each strategy (that is, for each step size)?

The expected value is calculated from the formula:

$\bar{x} = \sum_{i}^{n} {p(x_i) \cdot stepmodel(x_i, j)}$

The probability here is the same for every value: for 100 story building it is $p(x_i)=\frac{1}{100}$ and for 1000 story building: $p(x_i) = \frac{1}{1000}$

We can use R to iterate over the values of $x_i$ and find the expected value of phone drops for each $x$ . First we do it for the 1000-story building, then for 100-story one.

x = 1:1000
exp_value = numeric(length(x))
for (i in 1:length(x)) {
  print(paste("Working on value", i))
  exp_value[i] = sum((1/length(x))*sapply(x, step_model, step=i))
}

x2 = 1:100
exp_value2 = numeric(length(x2))
for (i in 1:length(x2)) {
  exp_value2[i] = sum((1/length(x2))*sapply(x2, step_model, step=i))
}

The results are below: you can see the expected values for the varying step sizes for a 100-story building and 1000-story building. If you let the mouse cursor rest on a data point for a second, you will see information about the step size and the exact expected value. (This useful feature is based on the fact that each circle has tag <title> and the text from this tag is displayed when the mouse hovers over.)

[d3-source canvas=”wpd3-7041-1″]

The chart for 100-story building shows that the most optimal (from a probabilistic point of view) size of a step is 10: the square root of the height of the building.

[d3-source canvas=”wpd3-7041-2″]

The chart for the 1000-story building confirms this point: the minimal expected value of the drops occurs around step size 33, which is close to the square root of 1000. So, if we assume that the minimal number of drops meant minimal expected value of the number of drops, the answer is $\sqrt{s}$ where $s$ is the number of stories in the building.

Authors of the blog Barren Wasteland posted their solution to this riddle on December 12th: The Riddler 1… The solution involves a more complicated strategy: after choosing the size for the initial ‘leap’, each consecutive ‘leap’ is decremented by 1. The authors find that for the 100 story building, the starting ‘leap’ should be 14, and for 1000 story building – 45. Verification of this strategy for the probabilistic case is left, as mathematics textbooks like to put it, as an exercise for the reader.