
Houdini routs Rybka to start, routs Rybka to end match

Posted: Mon Feb 07, 2011 9:40 pm
by notyetagm
Another tremendous example of the rather large strength disparity between Houdini 1.5a and Rybka 4.

Game 40 of their TCEC S1 Elite Match, won by Houdini 23.5-16.5.

http://chessbomb.com/o/2011-tcecs1e/40- ... Houdini_a/
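For a sense of scale, a match score can be converted into an approximate rating gap. The sketch below assumes the standard logistic Elo model and ignores the draw rate and the uncertainty of a 40-game sample; the function name `elo_difference` is made up for illustration.

```python
import math


def elo_difference(points_a: float, points_b: float) -> float:
    """Approximate Elo gap (A minus B) implied by a match score.

    Assumes the standard logistic Elo model; this is an estimate only,
    with no error bars for the small number of games.
    """
    if points_a <= 0 or points_b <= 0:
        # A 100% or 0% score implies an unbounded gap under this model.
        raise ValueError("both sides need a positive score")
    score = points_a / (points_a + points_b)  # A's share of the points
    return 400.0 * math.log10(score / (1.0 - score))


# Final match score from the post above: Houdini 23.5, Rybka 16.5.
print(round(elo_difference(23.5, 16.5)))  # roughly 61
```

Under these assumptions, 23.5-16.5 corresponds to a gap of roughly 60 Elo on the hardware and openings used in the match.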

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Thu Feb 10, 2011 8:04 pm
by BB+
The link name says it all, from ChessVibes:
Free Houdini beats commercial Rybka 23.5-16.5
Not that I would have expected such a report if "dog had bitten man" instead, as it were. [Perhaps the best part is the ChessVibes link to a "ChessBase admits Fritz no longer best" post from 2008 :lol: ].

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Thu Feb 10, 2011 8:27 pm
by Martin Thoresen
Yes, ChessVibes put their article up today. :)

Best,
Martin

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Thu Feb 10, 2011 9:52 pm
by orgfert
notyetagm wrote:Another tremendous example of the rather large strength disparity between Houdini 1.5a and Rybka 4.

Game 40 of their TCEC S1 Elite Match, won by Houdini 23.5-16.5.

http://chessbomb.com/o/2011-tcecs1e/40- ... Houdini_a/
I understand the reason for these matches, but I'm not a fan of them. For example, would you accept a result if Cray Blitz had been obliged to compete against a challenger, not running on a Cray, but on a PDP-11? No. So open up the hardware as far as the software can accommodate (and the tester can afford). That used to be the reason for WCCC and computer participation in human tournaments, i.e. to play one's best. But even this seems to have been taken over at ICGA by the grass-roots contra-design testing method, now the norm in cripple-ware rating lists.

You are taking another person's program and crippling it and then broadcasting to the world that it is weaker than another program. As much as one might think it a kind of justice (considering the diminished respect for the programmer of Rybka), it smells just as bad as the former status quo, IMHO.

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 12:00 am
by Jeremy Bernstein
orgfert wrote:
notyetagm wrote:Another tremendous example of the rather large strength disparity between Houdini 1.5a and Rybka 4.

Game 40 of their TCEC S1 Elite Match, won by Houdini 23.5-16.5.

http://chessbomb.com/o/2011-tcecs1e/40- ... Houdini_a/
I understand the reason for these matches, but I'm not a fan of them. For example, would you accept a result if Cray Blitz had been obliged to compete against a challenger, not running on a Cray, but on a PDP-11? No. So open up the hardware as far as the software can accommodate (and the tester can afford). That used to be the reason for WCCC and computer participation in human tournaments, i.e. to play one's best. But even this seems to have been taken over at ICGA by the grass-roots contra-design testing method, now the norm in cripple-ware rating lists.

You are taking another person's program and crippling it and then broadcasting to the world that it is weaker than another program. As much as one might think it a kind of justice (considering the diminished respect for the programmer of Rybka), it smells just as bad as the former status quo, IMHO.
Houdini isn't running on its strongest supported hardware either. I don't understand why people have a problem with the idea of reasonably equal conditions for "competitors" in a chess tournament. I mean, it's essentially a benchmark. If you have a 6-core machine, this is how Houdini or Rybka, respectively, will likely perform.

Jeremy

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 12:30 am
by hyatt
That is not a given. The parallel search itself influences the games. The program with the better parallel search will gain more from additional cores which will certainly produce an altered result...

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 12:41 am
by Jeremy Bernstein
hyatt wrote:That is not a given. The parallel search itself influences the games. The program with the better parallel search will gain more from additional cores which will certainly produce an altered result...
But that's sort of the point, right? Choose a reasonable "representative" platform and run everyone on it. If Crafty has kick-ass parallelization, we'll see how well it performs against a Komodo, which is currently SP, or a Houdini (which has a very strange parallelization scheme as far as I can tell) or a Rybka. But comparing 6-core parallel to 6-core parallel is more reasonable than comparing 6-core parallel against 200-core parallel.

Presuming that the core engine attributes are search and evaluation, how fast and how accurate given equivalent hardware chances, all other features are icing, and should certainly be allowed. But the idea that "Rybka" as a brand is only a 200-core monster is total bullshit, at least as long as Vas charges 50 dollars for a SP version of it.

There is a reason why Rybka doesn't compete in the WCSC. From the software perspective, it's not advancing by leaps and bounds.

Jeremy

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 1:09 am
by orgfert
Jeremy Bernstein wrote: But the idea that "Rybka" as a brand is only a 200-core monster is total bullshit, at least as long as Vas charges 50 dollars for a SP version of it. There is a reason why Rybka doesn't compete in the WCSC. From the software perspective, it's not advancing by leaps and bounds.
It is arbitrary to rip out the author's book, and impose your own, to limit the hardware to what you can afford, and then declare to the world one project is better than another on that basis. It's not right.

It would be better if the "list kiddies" would devote their hardware resources to a rating FICS. Then let any project that cares to participate submit its credentials for a free membership and obtain as many user IDs, for as many versions of the project, as it cares to run. Hardware would be open, of course. The rating FICS should set up mandatory participation in rated matches and rated Swiss events for accounts as long as they are logged in. If the account is logged in, it must play. This would be understood up front. All other FICS rules for games, adjournments, etc. would apply.

How many projects would voluntarily participate is unknown. But it would at least be a real rating based on the designer's intent instead of arbitrary and random according to some stranger's idea of egalitarianism.

As things stand now, list kiddies are arbitrarily crippling projects in favor of less sophisticated projects and then calling it a CC rating list.

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 1:58 am
by kingliveson
You could not get a fairer tournament: equal hardware, the same openings (with colors reversed), equal time, etc.

There is no reason to believe that Rybka 4.0 would have come out ahead of Houdini 1.5a had both engines used 200+ cores, though it is probably reasonable to say that Rybka has had a considerable head start in developing parallel search, given the resources available for testing it.

Re: Houdini routs Rybka to start, routs Rybka to end match

Posted: Fri Feb 11, 2011 2:15 am
by hyatt
Jeremy Bernstein wrote:
hyatt wrote:That is not a given. The parallel search itself influences the games. The program with the better parallel search will gain more from additional cores which will certainly produce an altered result...
But that's sort of the point, right? Choose a reasonable "representative" platform and run everyone on it. If Crafty has kick-ass parallelization, we'll see how well it performs against a Komodo, which is currently SP, or a Houdini (which has a very strange parallelization scheme as far as I can tell) or a Rybka. But comparing 6-core parallel to 6-core parallel is more reasonable than comparing 6-core parallel against 200-core parallel.

Presuming that the core engine attributes are search and evaluation, how fast and how accurate given equivalent hardware chances, all other features are icing, and should certainly be allowed. But the idea that "Rybka" as a brand is only a 200-core monster is total bullshit, at least as long as Vas charges 50 dollars for a SP version of it.

There is a reason why Rybka doesn't compete in the WCSC. From the software perspective, it's not advancing by leaps and bounds.

Jeremy

The problem is, "which platform?" And then what about a program that uses GPUs? Or one with custom hardware à la Deep Blue? Or one that runs on MIPS or something other than Intel? Or one that is NUMA-capable running on a bigger AMD box? The list goes on and on, so exactly which platform do you settle on?

There used to be an annual "uniform platform computer chess championship" run by Don Beal at Queen Mary in London. It died due to "lack of interest". It is interesting to test that way if you want to only compare the search/eval of a program. But not very fair if the programmer spent considerable effort on something else that doesn't get tested. Parallel search. Unusual architectural features. Special-purpose hardware. Etc...

The WCCC, in whatever form it ultimately survives, should be completely open with respect to hardware, as the CCTs and ACCA events are. Rating lists I don't care about. They can test on an abacus if they want.