Is Houdini 3 Tactical the Strongest Chess Engine?

mwyoung
Posts: 43
Joined: Thu Jan 05, 2012 1:13 am
Real Name: Mark Young

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by mwyoung » Wed Dec 26, 2012 10:41 pm

Don wrote:
mwyoung wrote:
Don wrote:
mwyoung wrote:
Odeus37 wrote: The CCRL 40/40 rating you are referring to is irrelevant: only 53 games so far for Houdini Tactical...

http://www.computerchess.org.uk/ccrl/40 ... t_all.html

Name                         Rating   Elo+    Elo-   Score   Average Opponent   Draws   Games	
Houdini 3 Tactical 64-bit    3252     +70     −68    65.1%   −98.0              50.9%   53
Houdini 3 64-bit             3221     +20     −19    64.4%   −106.7             41.3%   750

It is not irrelevant to CCRL, they publish the data and rating.

The results themselves are not irrelevant to the question of the thread: Is Houdini 3 Tactical the Strongest Chess Engine? You do not need hundreds of games to answer this question. You only need hundreds or thousands of games if the Elo ratings are very close, or if you are trying to get very exact Elo ratings.
Actually you need tens of thousands of games if the Elo ratings are close. For tactics you probably need thousands of positions unless the difference is very clear.

What we are asking is: Is Houdini 3 Tactical the Strongest Chess Engine? If we don't care about the exact amount, the data is very relevant.

Even with only 50 games, since the Elo difference is not very close, you can say that it is much more likely that Houdini 3 Tactical is stronger at 40/40 than Houdini 3. We just can't have high confidence in the exact value after 50 games.
I seriously doubt we can get a legitimate answer to the question of which program is best tactically. We have to define tactics, and we usually go by how a program does on some tactical test suite, the value of which I doubt. It's probably a good starting guess, however. In other words, it is hard to get a "number" that we can clearly agree means something.
No one is asking which program is best tactically, whatever people think that means.

We are asking which program plays the overall strongest chess game at longer time controls.

CCRL rating data right now suggest it is more likely than not that Houdini 3 Tactical plays the stronger game at the longer time control of 40/40, using the CCRL rating test protocol. That is all that can be said after 53 games.
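As a rough illustration of the "more likely than not" claim, the sketch below turns the two CCRL lines quoted above into an approximate likelihood that Houdini 3 Tactical is the stronger engine. It assumes that the Elo+/Elo- columns are 95% intervals and that the two rating errors are independent and roughly normal. None of this is stated in the thread, and the function name is made up for the example.

```python
import math

def prob_a_stronger(rating_a, margin_a, rating_b, margin_b, z95=1.96):
    """Approximate P(true rating of A > true rating of B).

    Assumes each margin is the half-width of a 95% interval and that
    the two rating errors are independent and normally distributed.
    """
    sigma_a = margin_a / z95
    sigma_b = margin_b / z95
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z = (rating_a - rating_b) / sigma_diff
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Houdini 3 Tactical: 3252 (+70/-68, averaged to 69)
# Houdini 3:          3221 (+20/-19, averaged to 19.5)
p = prob_a_stronger(3252, 69.0, 3221, 19.5)
print(f"P(Tactical stronger) ~ {p:.2f}")
```

Under these assumptions the figure comes out at roughly 80%: more likely than not, but far from settled, which is about what both sides of the discussion are saying.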

Statistics is not well understood by many here. Some think you cannot say anything about the rating data unless many hundreds or thousands of games have been played.

What is important is the question you are trying to answer with the existing data.

An extreme example of this would be if I wanted to know whether program A is stronger than program B, and I don't care by how much. I can get an answer to this question in as few as 7 games with very good confidence, for example if program A beats program B 7-0. This is a statistically meaningful result for answering that question, and in this example I can say with high confidence that program A is the stronger program.
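The 7-0 case can be checked with a simple one-sided sign test. The sketch below assumes all games are decisive (no draws) and independent, which is a simplification not made explicit in the post; the function name is hypothetical.

```python
from math import comb

def sweep_p_value(wins, games):
    """One-sided p-value: chance of at least `wins` wins in `games`
    decisive games if both programs were equally strong (p = 0.5)."""
    tail = sum(comb(games, k) for k in range(wins, games + 1))
    return tail / 2 ** games

p = sweep_p_value(7, 7)
print(f"P(7-0 | equal strength) = {p:.4f}")  # 1/128
```

A 1-in-128 chance under equal strength is below the usual 1% threshold, so a 7-0 sweep does support the claim that A is stronger, without saying anything about by how much.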

But many here only see that it takes many hundreds or thousands of games to answer any rating questions.
Which is why I said, "Actually you need tens of thousands of games if the Elo ratings are close."
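To put rough numbers on how the required sample grows as the gap shrinks, here is a minimal sketch. It assumes the logistic Elo expected-score curve, a fixed 40% draw rate (in the neighbourhood of the CCRL draw figures quoted above), independent games, and a target of about two standard errors. The function names and defaults are illustrative only.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_rate=0.4, z=1.96):
    """Rough number of games for the score edge to reach z standard errors."""
    score = expected_score(elo_diff)
    win = score - draw_rate / 2.0
    # Per-game variance of the result (1, 0.5 or 0 points).
    variance = win + draw_rate / 4.0 - score ** 2
    edge = score - 0.5
    return math.ceil((z * math.sqrt(variance) / edge) ** 2)

for diff in (50, 20, 5, 2):
    print(f"{diff:>3} Elo -> about {games_needed(diff)} games")
```

With these assumptions a 50 Elo gap needs on the order of a hundred games, while a gap of 2 Elo needs tens of thousands, so both posters are describing the same curve from different ends.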

In the human chess rating world this would be laughable. No one goes around saying GM Carlsen is not proven to be the best rated player because he has not played many thousands of games. And I certainly don't go around saying I could be the strongest player in the world because I have not played GM Carlsen enough games yet. That is not how the rating system was intended to work, with everything at 99.99...% confidence :)
I don't want to burst your bubble here, but Carlsen is not best with any serious confidence. It's clear statistically that he is among the very top, but that is all you can say. That doesn't mean that he is NOT the best player in the world; it is just to say we cannot say that with a lot of confidence. Few people will deny he is best, but that is because most people go by the hype. He has had a nice record, winning tournaments in grand style (and all sorts of noise is made over this), but there are still 2 or 3 players within 20 Elo of him. Statistically you just cannot say. Once you appear on the FIDE list as rank number 1 and hold it for a little time, it's assumed that you "must" be the strongest player in the world. But statistically that is a nonsense claim. It's sort of like the winner of Wimbledon or the Super Bowl. They are clearly declared the "best of the best" and it's believed, but in reality to win these things you have to be among the best and have some good fortune too, because the second or third guy or team had a real shot too.

Below Carlsen there is a sudden drop after the top 3 or 4. So it's VERY clear he is in the top 5. It's almost a sure thing he belongs in the top 3 and there is a "good chance" he deserves to be called the best player. But 1 or 2 tournaments could easily turn this around and he could swap places with Anand for example. We are simply victims of human perception and when things happen slowly we assign permanence to them, especially when they are hyped and hailed as being true. If there are 2 tournaments in a row and Carlsen wins them both over Anand it's easy to say that Carlsen proved his superiority but the best we can say is that yes, in these 2 events he got better results.

I would mention that Carlsen is still improving but he is not world champion. Does that prove he is not best? Nope, for the same reason.

"Fun fact: Over 10 million chess games were played for the development and tuning of Houdini 3!"

Except for maybe Robert Houdart who tested Houdini 3 with over 10 million games :)

Don, you're not bursting my bubble, I agree 100%.

But if you are going to be extreme with the data, you are not going to be able to say anything.

In the example of GM Carlsen, it is possible that any of the top 100 or more players is truly the strongest player, if you want to work only with extreme confidence levels.

What can be said from the ratings data, and this applies to any Elo rating list when you have an established rating, is that when you are the higher rated player, you are more likely than not the stronger player. And the wider the rating spread, the more likely this is to be true.

In the GM Carlsen example it is more likely than not that GM Carlsen is the strongest chess player in the human world.

This is how the ratings are seen by people, and they are not wrong in saying GM Carlsen is the best human chess player in the world, because this is the most likely scenario given the data we have.

Holding the WC title in chess is not relevant.

Don
Posts: 42
Joined: Thu Dec 30, 2010 12:28 am
Real Name: Don Dailey

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by Don » Wed Dec 26, 2012 11:45 pm

mwyoung wrote:
Don, you're not bursting my bubble, I agree 100%.

But if you are going to be extreme with the data, you are not going to be able to say anything.

In the example of GM Carlsen, it is possible that any of the top 100 or more players is truly the strongest player, if you want to work only with extreme confidence levels.
It's a question of being reasonable here. The error margin with Carlsen is huge and it's not reasonable to pretend he is best. I don't work with extreme confidence levels, only what is reasonable for the situation at hand.

What can be said from the ratings data, and this applies to any Elo rating list when you have an established rating, is that when you are the higher rated player, you are more likely than not the stronger player. And the wider the rating spread, the more likely this is to be true.

In the GM Carlsen example it is more likely than not that GM Carlsen is the strongest chess player in the human world.
Yes, of course. What I'm saying is that it is wrong to consider him the strongest player - there is not enough reasonable evidence to say that. I'm not talking about extreme confidence levels but just reasonable ones.

This is how the ratings are seen by people, and they are not wrong in saying GM Carlsen is the best human chess player in the world, because this is the most likely scenario given the data we have.
Yes, people usually call him the best player in the world because they define that by rating. I follow tennis and sometimes there will be a tournament where the top ranked player slides down a notch. I have even seen a player become number one even though he didn't play in the same tournament - just because he was close and the top player lost. They will then say that this player is the new number one player.

Most people will not think twice and just say that this player is the "best" player in the world - not making a distinction between being "called" number one because of a number and actually being best.

In testing I will sometimes see the better program trailing a match for a long period of time - it reminds me that having a slightly higher ELO doesn't mean you are best.
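The observation about a better program trailing in a match can be quantified with a small exact calculation. This sketch computes the chance that the stronger side is behind at a given snapshot of the match (not how long it stays behind), assuming the logistic Elo model, a fixed 40% draw rate and independent games. The 10 Elo edge and the function name are chosen only for illustration.

```python
def prob_trailing(elo_diff, games, draw_rate=0.4):
    """Exact chance that the stronger side has fewer wins than losses
    after `games` independent games (logistic Elo model, fixed draw rate)."""
    score = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    win = score - draw_rate / 2.0
    loss = 1.0 - win - draw_rate
    # dist[d] = probability that (wins - losses) equals d so far.
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for d, p in dist.items():
            for step, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[d + step] = nxt.get(d + step, 0.0) + p * q
        dist = nxt
    return sum(p for d, p in dist.items() if d < 0)

for n in (50, 200, 500):
    print(f"+10 Elo, {n} games: P(trailing) = {prob_trailing(10, n):.2f}")
```

Under these assumptions a program that is 10 Elo better is still behind a substantial fraction of the time after a few hundred games, and the probability falls only slowly as the match gets longer.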

So if you call Carlsen the number one player that is accurate and not subject to doubt - he has the highest rating. If you call him the "best" player in the world that is subject to a lot of noise, and not just nitpicking a little noise but a lot of noise.

Holding the WC title in chess is not relevant.
Agreed. But a lot of people would not agree with us on this.

Don

mwyoung
Posts: 43
Joined: Thu Jan 05, 2012 1:13 am
Real Name: Mark Young

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by mwyoung » Thu Dec 27, 2012 1:54 am

If this is only noise, I wish my speakers were this loud :) I could run the rating PGN file myself to see the stats, but with a rating spread of over 51 points it is kind of pointless. It would be an extreme argument to try to claim that GM Carlsen is anything other than the best player by rating and test. IMO.

Rank  Rank change  Name      Country     Rating   Change  Games  Age (born)
1                  Carlsen   Norway      2861.4   +13.4   8      22 (30.11.1990)
2     1            Kramnik   Russia      2809.7   +14.7   8      37 (25.06.1975)
3     1            Aronian   Armenia     2802.2   −12.8   8      30 (06.10.1982)
4                  Radjabov  Azerbaijan  2793.0   0.0     0      25 (12.03.1987)
5                  Caruana   Italy       2780.6   −1.4    11     20 (30.07.1992)

Don
Posts: 42
Joined: Thu Dec 30, 2010 12:28 am
Real Name: Don Dailey

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by Don » Thu Dec 27, 2012 3:41 am

mwyoung wrote: If this is only noise, I wish my speakers were this loud :) I could run the rating PGN file myself to see the stats, but with a rating spread of over 51 points it is kind of pointless. It would be an extreme argument to try to claim that GM Carlsen is anything other than the best player by rating and test. IMO.
Why are you making such a big deal out of this? Whatever the current rating is, it is what it is. The December 2012 FIDE list shows a 33 Elo advantage, so I don't know what list you are looking at. But 33 Elo, with error margins for him as well as for Kramnik and Aronian, makes it not completely clear who is best. It is enough to indicate that he probably is the best player, though. It's not more complicated than that.

http://ratings.fide.com/top.phtml

I'm not trying to disagree with you, but you seem to want to turn it into a disagreement or an argument that you must win. So just relax.

mwyoung
Posts: 43
Joined: Thu Jan 05, 2012 1:13 am
Real Name: Mark Young

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by mwyoung » Thu Dec 27, 2012 4:40 pm

Don,

Here is the web site, aka the live chess ratings list, where all the 2700+ players' ratings are calculated as if FIDE were putting out a list daily.
Ratings are calculated per game using the FIDE rating protocols.

http://www.2700chess.com/
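For readers unfamiliar with per-game rating updates, a minimal sketch of the idea follows. It uses the logistic expected-score curve as an approximation of FIDE's lookup table and assumes a K-factor of 10 for players at this level; the site's exact procedure may differ in detail, and the function name is invented for the example.

```python
def live_rating_update(rating, opponent, result, k=10.0):
    """New rating after one game; `result` is 1, 0.5 or 0.

    Uses the logistic expected-score curve as an approximation of
    FIDE's lookup table, and K = 10 (assumed here for 2700+ players).
    """
    expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
    return rating + k * (result - expected)

# Hypothetical game: the two ratings come from the live list quoted
# earlier in the thread; the pairing and results are invented.
after_win = live_rating_update(2861.4, 2809.7, 1.0)
after_draw = live_rating_update(2861.4, 2809.7, 0.5)
print(f"win: {after_win:.1f}, draw: {after_draw:.1f}")
```

Because the higher rated player is expected to score more than half a point, a draw costs a little rating and a win gains less than half of K, which is why live ratings move by only a few points per game.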

Don
Posts: 42
Joined: Thu Dec 30, 2010 12:28 am
Real Name: Don Dailey

Re: Is Houdini 3 Tactical the Strongest Chess Engine?

Post by Don » Thu Dec 27, 2012 5:09 pm

mwyoung wrote:Don,

Here is the web site, aka the live chess ratings list, where all the 2700+ players' ratings are calculated as if FIDE were putting out a list daily.
Ratings are calculated per game using the FIDE rating protocols.

http://www.2700chess.com/
I have to ask, what is the point of this? You are ignoring my primary point simply because you found some very recent data that indicates that Carlsen may be pulling ahead, something I have expected for a long time in view of his young age. I'm not trying to dispute the fact that he has the highest rating or even that he MAY be the best player. Maybe that changed in the past 2 or 3 weeks since the FIDE list as your data indicates a recent surge - but it would not invalidate my point. If you wait long enough it is likely that Carlsen WILL be another Fischer or Kasparov, one of those players that undisputedly surpasses everyone else. I think because everyone expects this to happen they believe it happened already a year ago.

I have followed Carlsen for the past 2 years and I expect him to surpass everyone too. What I would LIKE to see and what is actually there are two different things, however. Your data makes me happy. But you have to know I'm right, because if he really were so dominant he should be world champion, which he is not. A truly dominant player will become world champion; one who is not only has a chance to be. Maybe he is now at the point where he can do it, I hope that is the case. But I need more than one "hot month" for me to say so.
