Don wrote:
mwyoung wrote:
Odeus37 wrote:
The CCRL 40/40 rating you are referring to is irrelevant: only 53 games yet for Houdini Tactical...
http://www.computerchess.org.uk/ccrl/40 ... t_all.html
Name                       Rating  Elo+  Elo-  Score  Average Opponent  Draws  Games
Houdini 3 Tactical 64-bit  3252    +70   −68   65.1%  −98.0             50.9%  53
Houdini 3 64-bit           3221    +20   −19   64.4%  −106.7            41.3%  750
It is not irrelevant to CCRL; they publish the data and the rating.
The results themselves are not irrelevant to the question of the thread, "Is Houdini 3 Tactical the Strongest Chess Engine?", because you do not need hundreds of games to answer that question. You only need hundreds or thousands of games if the Elo ratings are very close, or if you are trying to get very exact Elo ratings.
Actually, you need tens of thousands of games if the Elo ratings are close. For tactics you probably need thousands of positions unless the difference is very clear.
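To see why the number of games grows so fast as the gap shrinks, here is a rough back-of-the-envelope sketch. It assumes a simple head-to-head match, the standard logistic Elo curve, and a per-game score variance of (1 − draw rate)/4 near equal strength. The draw rate of 0.4 and the 1.96 (about 95%) threshold are illustrative choices, and `games_needed` is a hypothetical helper, not anything CCRL itself uses.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score of the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff: float, draw_rate: float = 0.4, z: float = 1.96) -> int:
    """Hypothetical helper: rough number of head-to-head games before an
    Elo gap of elo_diff sits about z standard errors above a 50% score."""
    # Per-game score variance near equal strength (win=1, draw=0.5, loss=0),
    # with wins and losses each occurring with probability (1 - draw_rate) / 2.
    sd_per_game = math.sqrt((1.0 - draw_rate) / 4.0)
    edge = expected_score(elo_diff) - 0.5  # expected score above 50%
    return math.ceil((z * sd_per_game / edge) ** 2)

if __name__ == "__main__":
    for gap in (3, 5, 30, 100):
        print(f"{gap:>3} Elo: about {games_needed(gap):,} games")
```

Under these assumptions a gap of a few Elo needs on the order of ten thousand games or more, while a 30 Elo gap needs only a few hundred.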
What we are asking is: "Is Houdini 3 Tactical the Strongest Chess Engine?" If we don't care about the exact margin, the data is very relevant.
Even with only about 50 games, since the Elo ratings are not very close, you can say that it is much more likely that Houdini 3 Tactical is stronger at 40/40 than Houdini 3. We just can't have high confidence in the exact value after 50 games.
I seriously doubt we can get a legitimate answer to the question of which program is best tactically. We would first have to define tactics, and we usually go by how a program does on some tactical test suite, whose value I have doubts about. It's probably a good starting guess, however. In other words, it is hard to get a "number" that we can all agree means something.
No one is asking which program is best tactically, whatever people think that means.
We are asking which program plays the overall strongest chess game at longer time controls.
Right now the CCRL rating data suggest that it is more likely than not that Houdini 3 Tactical plays the stronger game at the longer 40/40 time control, using the CCRL testing protocol, and that is all that can be said after 53 games.
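As a rough check on "more likely than not", the sketch below estimates the probability that Houdini 3 Tactical's true rating is higher than Houdini 3's, using the two CCRL 40/40 rows quoted earlier. It assumes the Elo+/Elo− columns are symmetric 95% error bars (averaged here, e.g. (70 + 68)/2), that each rating estimate is normally distributed, and that the two estimates are independent. None of that is exactly how CCRL's rating tool works, and `prob_a_stronger` is a hypothetical name.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_a_stronger(rating_a: float, err_a: float,
                    rating_b: float, err_b: float,
                    z95: float = 1.96) -> float:
    """Probability that A's true rating exceeds B's, assuming independent
    normal estimates whose +/- errors are 95% half-widths."""
    sd_a = err_a / z95
    sd_b = err_b / z95
    # The difference of two independent normals has sd = sqrt(sd_a^2 + sd_b^2).
    return normal_cdf((rating_a - rating_b) / math.hypot(sd_a, sd_b))

# Figures from the CCRL 40/40 rows quoted in the thread (errors averaged).
p = prob_a_stronger(3252, (70 + 68) / 2, 3221, (20 + 19) / 2)

if __name__ == "__main__":
    print(f"P(Houdini 3 Tactical stronger) is about {p:.2f}")
```

With these numbers the result comes out around 0.8: better than a coin flip, but the 31-point gap is still well inside the Tactical version's error bars after 53 games.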
Statistics is not well understood by many here. Some think you cannot say anything about the rating data unless many hundreds or thousands of games have been played.
What is important is the question you are trying to answer with the existing data.
An extreme example: suppose I want to know whether program A is stronger than program B, and I don't care by how much. I can get an answer with very good confidence in as few as 7 games, for example if program A beats program B 7-0. That is a statistically meaningful result for that question, and in this example I can say with high confidence that program A is the stronger program.
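The 7-0 example can be checked with a simple one-sided sign test: if the two programs were equally strong and draws are set aside, each decisive game is a coin flip, so the chance of a result at least as lopsided is a binomial tail. A minimal sketch, where the function name is hypothetical and ignoring draws is a simplification:

```python
from math import comb

def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided sign test: probability of at least `wins` wins out of
    wins + losses decisive games if both programs are equally strong.
    Draws are assumed to have been excluded beforehand."""
    n = wins + losses
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return tail / 2 ** n

if __name__ == "__main__":
    for w, l in [(7, 0), (5, 2)]:
        print(f"{w}-{l}: p = {sign_test_p_value(w, l):.4f}")
```

For 7-0 this gives 1/128, about 0.008, so equal strength is very hard to square with that result; a 5-2 score gives 29/128, about 0.23, which on its own says much less.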
But many here seem to think it takes many hundreds or thousands of games to answer any rating question.
In the human chess rating world this would be laughable: no one goes around saying GM Carlsen is not proven to be the highest-rated player because he has not played many thousands of games. And I certainly don't go around saying I could be the strongest player in the world because I have not played GM Carlsen enough games yet. The rating system was never intended to work with everything at 99.99...% confidence.
"Fun fact: Over 10 million chess games were played for the development and tuning of Houdini 3!"
Except maybe for Robert Houdart, who tested Houdini 3 with over 10 million games.