Odds of a result

Posted: Sun Nov 24, 2013 5:53 am
by BB+
Stats 101 Exam, essay question:
Q. Boris and Viktor play a best-of-12 chess match that ends up 6.5 to 3.5 as the final score. Furthermore, when Boris and Viktor play each other, one can expect at least half the games to be draws. From these data how certain might you be that Boris is better than Viktor? Discuss.

A. [Other answers may also get full credit]
The independence hypothesis is that the 10 played games are independent. This, however, is likely to be false in practice, so the final numbers may need some fiddling, but we shall take it as a starting point.

The null hypothesis is that Boris and Viktor are of equal strength. However, one must also take the draw rate into account. One way to do this is to consider hypotheses H(x), that Boris and Viktor are of equal strength and each wins a fraction x of the games, with draw rate 1-2x, for each x from 0 to 0.25 [the condition that no more than half the games are expected to be decisive], and to take the maximum over x (other choices are possible, but the maximum gives a bound if nothing else) of the probability that Boris beats Viktor by at least 6.5 to 3.5. In other words, compute the sum S(x) of 10! / (W! D! L!) * x^W * x^L * (1-2x)^D over nonnegative integers W,D,L with W+L+D=10 and W+D/2 >= 6.5. This trinomial sum serves as a surrogate for the observed result (which does not specify how many games were draws) given the null hypothesis, and so gives a likelihood measure for the null hypothesis.

Computing, we find that S(x) is x^3 times a degree 7 polynomial. The critical point is at x approximately 0.4519, which lies outside the interval from 0 to 0.25, so the x=0.25 endpoint is indeed the relevant maximum. This gives S(0.25) as approximately 0.13.

Conclusion: Assuming (rashly) that all the games are independent, a result as lopsided as 6.5-3.5 or more under the stated hypothesis (no less than 50% expected draw rate) from equal players could be as insignificant as a 13% happenstance (one-sided p-value). Dropping the independence assumption should make the null hypothesis even more likely. Even assuming that the expected draw rate should be no less than 75% still leads to a p-value exceeding 0.05, as does a 7-3 result with at least 50% expected draws. The null hypothesis should (definitely) not be rejected on these data.
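
For anyone who wants to check the figures, here is a minimal sketch of the trinomial sum S(x) in Python. The function name tail_probability and its parameters are my own choices for illustration, not anything from the original computation, and the games are assumed independent as above.

```python
from math import factorial


def tail_probability(x, games=10, min_score=6.5):
    """S(x): probability that one of two equal players scores >= min_score.

    Each game is a win with probability x, a loss with probability x,
    and a draw with probability 1 - 2x (independent games assumed).
    """
    draw = 1.0 - 2.0 * x
    # Compare doubled scores (2W + D) as integers to avoid half-point floats.
    target = round(2 * min_score)
    total = 0.0
    for wins in range(games + 1):
        for losses in range(games - wins + 1):
            draws = games - wins - losses
            if 2 * wins + draws >= target:
                # Multinomial coefficient games! / (W! D! L!)
                ways = factorial(games) // (
                    factorial(wins) * factorial(draws) * factorial(losses)
                )
                total += ways * x ** (wins + losses) * draw ** draws
    return total


print(round(tail_probability(0.25), 4))                 # 0.1316
print(round(tail_probability(0.25, min_score=7.0), 4))  # 0.0577
print(tail_probability(0.125))  # the 75% expected draw rate case
```

As a cross-check on the x=0.25 value: with win/draw/loss at 1/4, 1/2, 1/4, each game behaves like two fair coin flips worth half a point each, so S(0.25) is the chance of at least 13 heads in 20 flips, which is 137980/2^20.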

Caveat: Colour was ignored; this should have less of an effect on the conclusion than, say, the independence assumption.

Re: Odds of a result

Posted: Sun Nov 24, 2013 10:27 pm
by ernest
BB+ wrote: Stats 101 Exam, essay question:
Is that really Stats 101 level? ;)

Re: Odds of a result

Posted: Tue Nov 26, 2013 1:50 am
by BB+
OK: 10 coins are flipped, 3 of them land heads, the other 7 land on their edge. What is the probability of this event? :mrgreen:

http://search.dilbert.com/comic/Coin%20Flip

Re: Odds of a result

Posted: Tue Nov 26, 2013 2:49 am
by User923005
I doubt almost every premise {to some degree, small, but enough to be important}.
Given "one can expect at least half the games to be draws" this could be for a number of reasons.
First, the players might be equal.
Second, the players might both have a propensity to play for draws, though one is clearly stronger.
Third, a clearly inferior player might have special preparation to try to achieve draws. Consider, for instance, a player who is trying just to get a draw even when playing white.
Fourth, it could be a thematic tournament where a given opening is very drawish, so that even superior players might have trouble winning.
There are many other possibilities that I did not consider.
I think that each of these demands a different model.

You mentioned the assumption of independence, but I think it is fair to say that in a collection of games between the same opponents the games are clearly not independent. The players will use whatever they learn against their opponent in subsequent games. An obvious example is that if I win convincingly with some particular line, and I start out again to play that opening, the opponent is very likely to deviate unless they have found a brilliant counter-punch in study between the games.

This is a general stumbling block with any sort of experiment that involves biological entities. There are always a large number of unknown parameters to the experiment. And a very big difficulty with these sorts of things is also that it is very difficult to implement control experiments effectively.

On the other hand, we have to start somewhere with simplifying assumptions. Then again, there was a study some time ago that found a correlation between the number of tarantula spiders found in shipments of bananas and the probability of election of Democrats versus Republicans in US elections. My guess is that there really is no causal relationship.

Aside:
Psychics in news:rec.games.chess.computer back in the Deep Blue / Kasparov match time frame:
https://groups.google.com/forum/#!searc ... iFpmWyT7cJ