So I haven't used this blog in a while, but I'm back-ish. Which means I'm planning to start blogging again but may or may not actually get around to it this time.
In my first ever post I discussed the idea of a hypothesis test. However, I didn't get into any of the technical details in that post. For example, suppose my null hypothesis is that my coin comes up heads 50% of the time, and for data I've flipped it 30 times and gotten 20 heads and 10 tails.
Does this mean that I need to give up the idea of a coin which produces heads exactly half the time? In the last post we spoke about the idea of rejecting a hypothesis which is inconsistent with the data. If we had 29 out of 30 heads we'd chuck the idea of a 50-50 coin; if we had 16 heads (again out of 30) we'd keep it. For 20, however, it's less obvious. Where is the cutoff?
More generally, where should the cutoff be? The usual answer is to ask "How likely would I be to see data this strongly or more strongly against the null hypothesis if the null hypothesis were indeed true?". We then reject the null hypothesis if this probability, which we call a p-value, is "small" and fail to reject it otherwise.
By tradition "small" is usually taken to mean less than 0.05. Sometimes this tradition is broken, some fields have different traditions, and some statisticians absolve themselves of this by just saying "the p-value is... ".
Going back to our original question of 20 heads in 30 flips, how likely would I be to see data this strongly or more strongly against the null hypothesis if the null hypothesis were indeed true? The surprising answer is: "It depends". A fuller version of this answer is "It depends on what you mean by more strongly against".
The probability of getting 20 or more heads from 30 flips works out to about 0.04937, just under the traditional 5% (0.05). However, the probability of getting 20 or more heads OR 10 or fewer heads is twice that, about 0.0987. Would 3 heads out of 30 be stronger evidence that our null hypothesis is wrong than getting 20? Before you say "yes of course", what if we had exactly 15 heads but they alternated? i.e. HTHTHTHTHTHTHTHTHTHTHTHTHTHTHT.
Again, the answer here seems like it should be "yes of course that's different", but this dependency isn't something we were thinking about beforehand. Worse, it's pretty easy to find patterns in a lot of sequences, so we could in principle keep adding in things more surprising than our (unnamed) string of 20 heads in 30 flips.
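To make the "it depends" concrete, here is a small Python sketch that recomputes the tail probabilities for 20 heads in 30 flips of a fair coin. It's just one way to do the arithmetic: the helper names (binom_pmf, upper_tail, lower_tail) are made up for this post, and it only uses the standard library.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p=0.5):
    """Probability of k or more heads in n flips."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n, p=0.5):
    """Probability of k or fewer heads in n flips."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, heads = 30, 20

# "More strongly against" means "even more heads".
one_sided = upper_tail(heads, n)

# "More strongly against" means "at least this far from 15 in either direction".
two_sided = upper_tail(heads, n) + lower_tail(n - heads, n)

print(f"{one_sided:.5f}")  # 0.04937
print(f"{two_sided:.5f}")  # 0.09874
```

Same data, same null hypothesis, and one number lands just under 0.05 while the other lands well over it. The only thing that changed is what we counted as "more extreme".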
For this reason we need to specify in advance what counts as stronger evidence against our null hypothesis. This specification is called an alternative hypothesis. Writing p for the probability of heads, sometimes it's appropriate to make the alternative hypothesis "p>1/2" (or "p<1/2"), sometimes it's appropriate to say p is not 1/2, and sometimes it's appropriate to say p depends on the last flip or two in some way.
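As a rough illustration of that last kind of alternative, suppose (and this is my assumption for the example, not the only choice) that we decided in advance to look for a coin that switches between heads and tails more often than a fair one would. If the flips really are independent and fair, each adjacent pair differs with probability 1/2, independently of the others, so the number of switches in 30 flips is binomial with 29 trials. The function names below are made up for this sketch.

```python
from math import comb

def count_switches(flips):
    """Count adjacent positions where the outcome changes (H->T or T->H)."""
    return sum(a != b for a, b in zip(flips, flips[1:]))

def switch_p_value(flips):
    """P(at least this many switches) if the flips are independent and fair.

    Under that null, each of the len(flips) - 1 adjacent pairs is a switch
    with probability 1/2, independently of the other pairs.
    """
    n_pairs = len(flips) - 1
    observed = count_switches(flips)
    favourable = sum(comb(n_pairs, k) for k in range(observed, n_pairs + 1))
    return favourable / 2**n_pairs

alternating = "HT" * 15  # 30 flips, perfectly alternating

print(alternating.count("H"))        # 15
print(count_switches(alternating))   # 29
print(switch_p_value(alternating))   # 1 / 2**29, roughly 1.9e-09
```

So a sequence with exactly 15 heads, which looks as consistent with p = 1/2 as it possibly could when we only count heads, is wildly surprising under this other alternative. That's only legitimate if we picked the alternative before looking at the data.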
Of course this isn't the only way to do things. There are others which perform better in some situations and worse in others. I'll (maybe) discuss these in a later post.