Don wrote: Which is why I said, "Actually you need tens of thousands of games if the Elos are close."
mwyoung wrote: No one is asking which program is best tactically, for whatever people think that means.
Don wrote: Actually you need tens of thousands of games if the Elos are close. For tactics you probably need thousands of positions unless the difference is very clear.
mwyoung wrote:
Odeus37 wrote: The CCRL 40/40 rating you are referring to is irrelevant: only 53 games yet for Houdini Tactical...
http://www.computerchess.org.uk/ccrl/40 ... t_all.html
Name                        Rating  Elo+  Elo-  Score  Average Opponent  Draws  Games
Houdini 3 Tactical 64-bit   3252    +70   −68   65.1%  −98.0             50.9%  53
Houdini 3 64-bit            3221    +20   −19   64.4%  −106.7            41.3%  750
It is not irrelevant to CCRL; they publish both the data and the rating.
The results themselves are not irrelevant to the question of this thread: is Houdini 3 Tactical the strongest chess engine? You do not need hundreds of games to answer that question. You only need hundreds or thousands of games if the Elo ratings are very close, or if you are trying to get very exact Elo ratings.
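To see why the two error bars in the table are so different, here is a rough sketch that recomputes a 95% error bar from each engine's score, draw rate and game count. It is a normal approximation, not CCRL's actual rating tool; treating the average opponent's rating as exact is an assumption, and `elo_error_95` is a hypothetical helper name. With the table's inputs it lands near the published +70/−68 and +20/−19.

```python
import math

def elo_error_95(score: float, draw_rate: float, games: int) -> float:
    """Approximate 95% error bar (in Elo) for a performance measured over `games` games.

    Assumption: normal approximation with the opponents' ratings treated as exact.
    Each game scores 1, 0.5 or 0 points; the standard error of the mean score
    shrinks like 1/sqrt(games) and is converted to Elo via the slope of the
    logistic Elo curve at the observed score.
    """
    win_rate = score - draw_rate / 2              # because score = W + D/2
    mean_sq = win_rate * 1.0 + draw_rate * 0.25   # E[x^2] for x in {1, 0.5, 0}
    per_game_sd = math.sqrt(mean_sq - score ** 2)
    score_se = per_game_sd / math.sqrt(games)
    # Elo = 400 * log10(s / (1 - s)), so dElo/ds = 400 / (ln(10) * s * (1 - s))
    slope = 400 / (math.log(10) * score * (1 - score))
    return 1.96 * score_se * slope

# Score, draw rate and game count from the CCRL 40/40 table.
tactical = elo_error_95(0.651, 0.509, 53)
regular = elo_error_95(0.644, 0.413, 750)
print(f"Houdini 3 Tactical, 53 games: +/- {tactical:.0f} Elo")
print(f"Houdini 3, 750 games:         +/- {regular:.0f} Elo")
```

Because the bar shrinks with the square root of the number of games, quadrupling the games only halves it, which is why pinning down differences of a few Elo takes thousands of games.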
I seriously doubt we can get a legitimate answer to the question of which program is best tactically. We have to define tactics, and we usually go by how a program does on some tactical test suite, the value of which I doubt. It's probably a good starting guess, however. In other words, it is hard to get a "number" that we can clearly agree means something.
What we are asking is: is Houdini 3 Tactical the strongest chess engine? If we don't care about the exact amount, the data is very relevant.
Even with only 50 games, and since the Elo difference is not very close, you can already say that it is much more likely that Houdini 3 Tactical is stronger than Houdini 3 at 40/40. We just can't have high confidence in the exact value after 50 games.
We are asking which program plays the strongest chess overall at longer time controls.
Right now the CCRL rating data suggest that it is more likely than not that Houdini 3 Tactical plays the stronger game at the longer 40/40 time control, under the CCRL testing protocol, and that is all that can be said after 53 games.
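One rough way to put a number on "more likely than not" is the likelihood of superiority (LOS). The sketch below assumes the two CCRL ratings are independent normal estimates and that the published bars are 95% intervals; both are simplifications (the two Houdini versions are rated from a shared pool of games), and the helper names are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def likelihood_of_superiority(r_a: float, err_a: float, r_b: float, err_b: float) -> float:
    """P(A is truly stronger than B), assuming each rating is an independent
    normal estimate whose 95% error bar is +/- err (so sigma = err / 1.96)."""
    sigma_diff = math.hypot(err_a / 1.96, err_b / 1.96)
    return normal_cdf((r_a - r_b) / sigma_diff)

# CCRL 40/40 figures; the asymmetric +70/-68 and +20/-19 bars are
# averaged to 69 and 19.5 for this rough sketch.
los = likelihood_of_superiority(3252, 69.0, 3221, 19.5)
print(f"LOS(Houdini 3 Tactical > Houdini 3) = {los:.0%}")
```

With these inputs the result is roughly 80%: more likely than not, but well short of certainty.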
Statistics are not well understood by many here. Some think you cannot say anything about the rating data unless many hundreds or thousands of games have been played.
What is important is the question you are trying to answer with the existing data.
An extreme example of this would be if I wanted to know whether program A is stronger than program B, and I don't care by how much. I can get an answer to this question in as few as 7 games with very good confidence, if program A beats program B 7-0, for example. That is a statistically meaningful result for answering that question, and in this example I can say with high confidence that program A is the stronger program.
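The 7-0 example amounts to a one-sided sign test. The sketch below counts only decisive games (treating draws as uninformative is an assumption) and asks how likely a result at least this lopsided would be if the two programs were equally strong. `sign_test_p` is a hypothetical helper name.

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """One-sided sign test: probability of at least `wins` wins out of
    `wins + losses` decisive games if both programs were equally strong."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

print(sign_test_p(7, 0))  # 0.0078125 (1 in 128): a 7-0 sweep
print(sign_test_p(5, 2))  # 0.2265625: 5-2 over the same number of decisive games
```

A 7-0 sweep would happen well under 1% of the time between equal programs, whereas a 5-2 result over the same seven decisive games is unremarkable.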
But many here only see that it takes many hundreds or thousands of games to answer any rating question.
I don't want to burst your bubble here, but Carlsen is not the best with any serious confidence. It's clear statistically that he is among the very top, but that is all you can say. That doesn't mean that he is NOT the best player in the world; it is just to say we cannot say that with a lot of confidence. Few people will deny he is the best, but that is because most people go by the hype. He has had a nice record, winning tournaments in grand style (and all sorts of noise is made over this), but there are still 2 or 3 players within 20 Elo of him. Statistically you just cannot say. Once you appear on the FIDE list as number 1 and hold it for a little while, it's assumed that you "must" be the strongest player in the world. But statistically that is a nonsense claim. It's sort of like the winner of Wimbledon or the Super Bowl. They are clearly declared the "best of the best" and it's believed, but in reality, to win these things you have to be among the best and have some good fortune too, because the second or third guy or team had a real shot too.
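To put "within 20 Elo" in perspective, the logistic form of the Elo formula gives the expected per-game score of the higher-rated player. This is a sketch; some rating systems use a slightly different curve, and `expected_score` is a hypothetical helper name.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game for the player rated `elo_diff` points higher,
    using the logistic Elo curve."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

for diff in (20, 50, 100):
    print(f"{diff:>3} Elo ahead -> expected score {expected_score(diff):.1%}")
```

A 20-point edge is worth under 53% per game, a margin that one or two tournaments can easily hide or reverse.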
In the human chess rating world this would be laughable; no one goes around saying GM Carlsen is not proven to be the best player because he has not played many thousands of games. And I certainly don't go around saying I could be the strongest player in the world because I have not played GM Carlsen enough games yet. That is not how the rating system was intended to work, with everything at 99.99...% confidence.
Below Carlsen there is a sudden drop after the top 3 or 4. So it's VERY clear he is in the top 5. It's almost a sure thing he belongs in the top 3 and there is a "good chance" he deserves to be called the best player. But 1 or 2 tournaments could easily turn this around and he could swap places with Anand for example. We are simply victims of human perception and when things happen slowly we assign permanence to them, especially when they are hyped and hailed as being true. If there are 2 tournaments in a row and Carlsen wins them both over Anand it's easy to say that Carlsen proved his superiority but the best we can say is that yes, in these 2 events he got better results.
I would also mention that Carlsen is still improving, but he is not world champion. Does that prove he is not the best? Nope, for the same reason.
"Fun fact: Over 10 million chess games were played for the development and tuning of Houdini 3!"
Except maybe for Robert Houdart, who tested Houdini 3 with over 10 million games.
Don, you're not bursting my bubble; I agree 100%.
But if you are going to be extreme with the data, you are not going to be able to say anything.
In the example of GM Carlsen, it is possible that any of the top 100 or more players is truly the strongest player, if you want to work only with extreme confidence levels.
Here is what can be said from the rating data, and this applies to any Elo rating list once you have an established rating: when you are the higher-rated player, you are more likely than not the stronger player, and the wider the rating spread, the more likely this is to be true.
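The claim that a wider rating spread makes the ranking more trustworthy can be sketched by modelling each established rating as an independent normal estimate with the same standard error (25 Elo here, a purely hypothetical figure) and computing the probability that the higher-rated player is truly stronger. `los_from_gap` is a hypothetical helper name.

```python
import math

def los_from_gap(rating_gap: float, sigma_each: float) -> float:
    """Probability that the higher-rated player is truly stronger, assuming
    both ratings are independent normal estimates with the same standard
    error `sigma_each` (a hypothetical value, not taken from any rating list)."""
    sigma_diff = sigma_each * math.sqrt(2)  # std. dev. of the difference of two ratings
    # Standard normal CDF evaluated at rating_gap / sigma_diff
    return 0.5 * (1 + math.erf(rating_gap / (sigma_diff * math.sqrt(2))))

for gap in (0, 10, 20, 50):
    print(f"gap {gap:>2} Elo -> LOS {los_from_gap(gap, 25.0):.0%}")
```

At equal ratings the result is a coin flip, and it rises steadily toward certainty as the gap grows relative to the rating uncertainty.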
In the GM Carlsen example, it is more likely than not that GM Carlsen is the strongest chess player in the human world.
This is how people read the ratings, and they are not wrong in saying GM Carlsen is the best human chess player in the world, because that is the most likely scenario given the data we have.
Holding the WC title in chess is not relevant.