Monday, December 9, 2013

What's in a name

So, seeing as this blog is pretty new, I figured I'd explain where I got the name from. It's a pun on "fails to reject", a phrase which will be familiar to anyone with any statistics training. For those of you who haven't gotten around to taking a statistics course yet, this feels like an appropriate time to blog about hypothesis testing.

Imagine you find a coin. You decide to keep it so that you can flip it whenever you have a decision to make but don't want to think too hard about said decision. Sadly you're somewhat paranoid and worry that the coin might be biased, which is to say that it might give you heads more often than tails or tails more often than heads.

Despite the worry, you figure that it's probably a fair coin. (Most of us actually do assume that coins we find on the street/beach/fires of Mount Doom are fair.) Heroically ignoring anything even remotely resembling a discussion of the epistemology of this assumption for the moment, I'm just going to say that we provisionally accept that the coin is fair because this is a "natural thing to do". The key word here is provisionally.

Now we'd like to see if the coin is actually fair. How do we do this? By actually flipping it, of course. Imagine, for example, that your first five flips turn out to be heads, tails, tails, heads, heads. This is pretty unremarkable and seems consistent with the idea that our coin is fair, so we might be tempted to accept that our coin is fair. The trouble is that at this point we meet Alf, who stares at our coin and decides (just because he feels like being contrary) that it's a coin that comes up heads 65% of the time. (Don't we all know an Alf?) As it turns out, you've gotten 3 heads out of 5 flips, or 60% heads, which seems more or less as consistent with his 65% as with your 50%. Of course 50% feels a lot more natural to you than 65%, but this joker at least claims that 65% seems more natural to him. Sadly, 3 heads out of 5 is also consistent with 60% heads or 61% heads, so five flips can't settle much of anything.
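To put rough numbers on "more or less as consistent", here is a small Python sketch that computes how likely exactly 3 heads in 5 flips is under each candidate coin. The function name `prob_heads` is mine, and it assumes the flips are independent, which is the usual (but not guaranteed) assumption for coin flips.

```python
from math import comb

def prob_heads(k, n, p):
    """Probability of exactly k heads in n independent flips
    of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

flips = "HTTHH"  # the five flips from the story
k, n = flips.count("H"), len(flips)

# Compare your 50% coin, Alf's 65% coin, and the 60% coin in between.
for p in (0.50, 0.60, 0.65):
    print(f"P({k} heads in {n} flips | p = {p:.2f}) = {prob_heads(k, n, p):.4f}")
```

All three probabilities come out between roughly 0.31 and 0.35, so five flips really don't favour anyone.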

Anyway, you and Alf then flip the coin another 495 times for a total of 500 flips. It turns out that your total number of heads is now 254. This is consistent with your 50% idea and not consistent with his 65% idea. (Yes, I'm being intentionally vague here about when exactly Alf should become overwhelmed by the data.) In the face of the data, Alf now needs to reject his 65% hypothesis.
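One rough way to see why Alf is in trouble, without committing to an exact cutoff, is to ask how many standard deviations 254 heads sits from what each hypothesis expects in 500 flips. This is only a sketch: `z_score` is a name I made up, and it assumes independent flips so that the head count is binomial.

```python
from math import sqrt

def z_score(heads, flips, p):
    """Number of standard deviations between the observed head count
    and the count expected if the coin lands heads with probability p."""
    expected = flips * p
    std_dev = sqrt(flips * p * (1 - p))  # binomial standard deviation
    return (heads - expected) / std_dev

heads, flips = 254, 500
for name, p in (("you", 0.50), ("Alf", 0.65)):
    z = z_score(heads, flips, p)
    print(f"{name}: expected {flips * p:.0f} heads, z = {z:+.2f}")
```

Your 50% coin expects 250 heads and the data lands well within one standard deviation of that. Alf's 65% coin expects 325 heads and the data is more than six standard deviations short.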

Which means you get to accept yours!!! Not so fast. Along comes the even more annoying Bob (yes, we're going alphabetical; you should have figured this out when I named a character Alf). Bob has watched the whole exchange and has been quietly muttering to anyone who'll listen that it's actually a 51% heads coin. He won't let you make the group accept your idea of a 50% coin. You can't accept the idea of the coin being fair yet, because Bob could be right about the 51%, but as the first 500 flips haven't been very different from what you expected, you can hold onto your 50% hypothesis. That is, you can fail to reject your initial 50% estimate. As Bob is snarkily pointing out, this isn't the same thing as accepting the 50% hypothesis.
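Bob's point can be made concrete with a two-sided tail probability (what a statistics course would call a p-value): under a given hypothesis, how likely is a result at least as surprising as 254 heads in 500 flips? The sketch below is my own illustration rather than anything official; the helper names are invented, it assumes independent flips, and it counts an outcome as "at least as surprising" if it is no more probable than the observed one.

```python
from math import comb

def pmf(k, n, p):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(heads, flips, p):
    """Total probability, under p, of every head count that is
    no more likely than the observed one."""
    observed = pmf(heads, flips, p)
    cutoff = observed * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(q for q in (pmf(k, flips, p) for k in range(flips + 1)) if q <= cutoff)

heads, flips = 254, 500
for name, p in (("you", 0.50), ("Bob", 0.51), ("Alf", 0.65)):
    print(f"{name} (p = {p:.2f}): p-value = {two_sided_p_value(heads, flips, p):.3g}")
```

Both the 50% and the 51% hypotheses give large values, so the data can't tell you and Bob apart, while Alf's value is vanishingly small. Neither of you gets to accept anything; you both merely fail to reject.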

Anyway, you and Bob spend the next year locked in a coin-flipping data collection battle and finally get one million total flips. These include exactly 501,218 heads. Bob is wrong (haha Bob). So Bob rejects his null hypothesis and you once again fail to reject yours. Finally Carl appears and claims the coin comes up heads 50.01% of the time. After using a hyperbolic time chamber for several millennia you manage to flip the coin 10^50 times, and it turns out that you have 5.0010001*10^49 heads. Yeah, Carl seems right, so you reject your null hypothesis (which is why you failed to reject instead of accepting) and Carl decides to fail to reject HIS null hypothesis. Later Daisy comes along and suggests it's actually a 50.00983839% coin, and hilarity (or at least a lot of time) ensues testing both of these hypotheses.

Now we call each of these initial hypotheses a "null hypothesis", and we speak of rejecting or failing to reject the null hypothesis. Notice that we always fail to accept it (hence the blog title).

Of course at some point the effective difference between 50.01% and 50.0099% is something no-one actually cares about (hopefully).

Again I've left out the description of exactly how much data is needed to reject a null hypothesis, but that's probably a post for another day.

