Odds of a result
Posted: Sun Nov 24, 2013 5:53 am
Stats 101 Exam, essay question:
Q. Boris and Viktor play a best-of-12 chess match that ends up 6.5 to 3.5 as the final score. Furthermore, when Boris and Viktor play each other, one can expect at least half the games to be draws. From these data how certain might you be that Boris is better than Viktor? Discuss.
A. [Other answers may also get full credit]
The independence hypothesis is that the 10 played games are independent. This, however, is likely to be false in practice, so the final numbers may need some fiddling, but we shall take it as a starting point.
The null hypothesis is that Boris and Viktor are of equal strength. However, one must also take the draw rate into account. One way to do this is to consider hypotheses H(x): Boris and Viktor are of equal strength, each wins a fraction x of the games, and the draw rate is 1-2x. Here x ranges from 0 to 0.25 [the condition that no more than half the games are expected to be decisive]. For each x, compute the probability that Boris beats Viktor by at least 6.5 to 3.5, and take the maximum over x (other choices are possible, but the maximum gives a bound if nothing else). In other words, compute the sum S(x) of 10! / (W! D! L!) * x^W * x^L * (1-2x)^D over nonnegative integers W,D,L with W+L+D=10 and W+D/2 >= 6.5. This trinomial sum covers every outcome at least as lopsided as the observed result (which does not specify how many games were draws) under the null hypothesis. Its maximum over x bounds the one-sided p-value for the null hypothesis.
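For reference, the same sum written out in display form, together with a restatement of the score condition. The restatement uses only D = 10 - W - L; nothing else is assumed.

```latex
S(x) \;=\; \sum_{\substack{W+D+L=10 \\ W + D/2 \,\ge\, 6.5}}
      \frac{10!}{W!\,D!\,L!}\; x^{W+L}\,(1-2x)^{D},
\qquad 0 \le x \le 0.25 .

% Substituting D = 10 - W - L into the score condition:
W + \tfrac{1}{2}(10 - W - L) \ge 6.5
\;\Longleftrightarrow\;
W - L \ge 3 .
```

So the sum runs over outcomes in which Boris has at least three more wins than losses, and every term carries a factor of at least x^3.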
Computing, we find that S(x) is x^3 times a degree 7 polynomial. The critical point (root of the derivative) is at x approximately 0.4519, which lies outside the range 0 to 0.25, so the x=0.25 endpoint is the relevant maximum. This gives S(0.25) as approximately 0.13.
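A minimal sketch of how this computation might be checked numerically. The function name tail_prob and the grid step of 0.001 are illustrative choices, not part of the original answer; the sum itself is the trinomial sum S(x) defined above.

```python
from math import factorial


def tail_prob(x, games=10, min_points=6.5):
    """Trinomial sum S(x): probability that one of two equal players scores
    at least min_points, when each wins with probability x and the draw
    rate is 1 - 2x."""
    total = 0.0
    for wins in range(games + 1):
        for losses in range(games - wins + 1):
            draws = games - wins - losses
            # A win counts 1 point, a draw counts half a point.
            if wins + draws / 2 >= min_points:
                coeff = factorial(games) // (
                    factorial(wins) * factorial(draws) * factorial(losses)
                )
                total += coeff * x**wins * x**losses * (1 - 2 * x) ** draws
    return total


# Scan x over [0, 0.25] on a grid (step 0.001 is an arbitrary choice).
grid = [i / 1000 for i in range(0, 251)]
values = [tail_prob(x) for x in grid]
best_value = max(values)
best_x = grid[values.index(best_value)]

print(best_x, round(best_value, 4))
```

On this grid the largest value occurs at the x=0.25 endpoint, where S(0.25) = 137980/1048576, roughly 0.1316.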
Conclusion: Assuming (rashly) that all the games are independent, a result as lopsided as 6.5-3.5 or more under the stated hypothesis (no less than 50% expected draw rate) from equal players could be as insignificant as a 13% happenstance (one-sided p-value). Dropping the independence assumption should make the null hypothesis even harder to reject. Even assuming that the expected draw rate is no less than 75% still leads to a p-value exceeding 0.05, as does a 7-3 result with at least 50% expected draws. The null hypothesis should (definitely) not be rejected on these data.
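The two variants mentioned in the conclusion can be checked the same way. This is a sketch under the same independence assumption; tail_prob, p_draws_75 and p_seven_three are hypothetical names, and the maximum over x is taken on a grid with step 0.001 rather than analytically.

```python
from math import factorial


def tail_prob(x, games=10, min_points=6.5):
    """Probability that one of two equal players scores at least min_points,
    with win probability x each and draw probability 1 - 2x."""
    total = 0.0
    for wins in range(games + 1):
        for losses in range(games - wins + 1):
            draws = games - wins - losses
            if wins + draws / 2 >= min_points:
                coeff = factorial(games) // (
                    factorial(wins) * factorial(draws) * factorial(losses)
                )
                total += coeff * x**wins * x**losses * (1 - 2 * x) ** draws
    return total


# Draw rate at least 75%: x ranges over [0, 0.125], result 6.5-3.5 or better.
p_draws_75 = max(tail_prob(i / 1000) for i in range(0, 126))

# Draw rate at least 50%: x ranges over [0, 0.25], result 7-3 or better.
p_seven_three = max(tail_prob(i / 1000, min_points=7.0) for i in range(0, 251))

print(p_draws_75 > 0.05, p_seven_three > 0.05)
```

Both bounds come out a little above 0.05, in line with the statement above.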
Caveat: Colour was ignored; this should have less of an effect on the conclusion than, say, the independence assumption.