Houdini routs Rybka to start, routs Rybka to end match

Jeremy Bernstein
Site Admin
Posts: 1226
Joined: Wed Jun 09, 2010 7:49 am
Real Name: Jeremy Bernstein
Location: Berlin, Germany

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by Jeremy Bernstein » Fri Feb 11, 2011 10:16 pm

orgfert wrote:What can be afforded isn't interesting and doesn't matter if things are done properly. A rating FICS as suggested above could provide real ratings on diverse hardware with configuration accounts like Sjeng200cpu, Sjeng-1cpu, crafty23.4-16cpu, crafty23.4-2cpu, houdini2.0-2cpu, etc. No crippleware, no arbitrary settings. Real games between design-intended setups. Who cares if program A couldn't afford a cluster? It doesn't matter when Sjeng-1cpu can be compared to program-A-1cpu and then they can both be compared to sjeng-200cpu, because they are all in the same rating pool on the same rating FICS.

THAT would be a rating list.
I agree. Totally. Let's keep on dreaming that dream, then...

Jeremy

orgfert
Posts: 183
Joined: Fri Jun 11, 2010 5:35 pm
Real Name: Mark Tapley

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by orgfert » Fri Feb 11, 2011 10:23 pm

Martin Thoresen wrote:I think your posts don't make sense at all. Dumbed-down Rybka? Your analogy is the same as saying that a car with 300 hp which is driven legally at 100 km/h on the highway is dumbed down because it can theoretically run at 250 km/h.
I suppose a better analogy would be putting the same motor in two different automobile designs and then racing them. It makes no sense. The goal is the best chess possible with an intended design. For a match to mean something, the programmers should put forward their best setup. This is probably why interest was minimal in a uniform platform world championship. After all, why bother to win a crippleware world title? Ergo, why bother to take interest in yet another crippleware chess match? Such matches are done ad infinitum in CCRL and other places, and the resulting Elos touted as illumination. Dingo's kidneys.

People are rejoicing in Rybka's losses thinking that relative strength is being accurately represented when the Rybka designers have obviously spent much of their design capital in massive hardware capability. Your match is negating that capital and calling it all fair. And the limits of practicality aren't really an excuse to call it fair either.

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by Odeus37 » Fri Feb 11, 2011 10:30 pm

Orgfert, you are impressive! :D But I am almost sure that if you bought Martin one month of Rybka cluster time, he would agree to organize a nice match against Houdini on his hardware! :lol:

Now, if you really can't understand TCEC's goal, we can't really do much for you...

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by hyatt » Fri Feb 11, 2011 10:34 pm

kingliveson wrote:
hyatt wrote:
kingliveson wrote:You could not get a fairer tournament: equal hardware, the same openings (reverse color), equal time, etc.

There is no reason to believe that Rybka 4.0 would have still come out ahead of Houdini 1.5a with both engines using 200+ cores -- though it would probably be reasonable to say that Rybka has had a considerable development head start in parallel search, given the resources available to test it.
Unfortunately, this is a classic "non-author" claim. As a simple counterpoint: for Cray Blitz, and for Crafty today, we _know_ that there are some openings we don't play particularly well, for several reasons. We could stop and try to solve those, or we can just avoid those kinds of positions for the moment and keep working. In my testing of Crafty, I don't pick and choose openings; I am trying to test/tune for "all" positions. But in a tournament, playing "all" positions can cost you 200 Elo or more. It would not be unexpected for a program outside the top ten on a rating list using a "single book" to win a WCCC event due to superior opening choices that led the programs into positions where it plays exceptionally well, while avoiding those where it plays exceptionally badly.
I don't think that you can look at it that way, because here we are dealing with A.I. and its ability to adjust, with both engines given the same legal chess starting positions. I see it as more or less a generic test of the programs. It simply asks what engine A or B can do under a given set of conditions.

Certainly, as a human chess player, I don't play every opening when I play the game. I have favourites that I have studied and understand, and I play those when the games mean something, as in a tournament as opposed to a club meeting playing 5 minute chess. If a human doesn't let you tell -him- which openings he must play, why is this then OK for a computer event? But wait, humans _do_ let you tell them if you decide to organize a "thematic tournament." But then everyone knows that is not a true indicator of overall chess skill. What about a program that has a new way of pondering, so that it spends time on several moves? If you turn that off in a "no-ponder" tournament, you disable a new idea that might be worth a significant number of Elo points.


These are programs capable of what no human can do today, so you can't really make that comparison. In computer tournaments, as you mentioned, using an opening book is allowed, but such is not the case with human chess. Besides, it would be a great idea to set up a tournament in which starting positions are randomly selected and watch top GMs battle it out.
Last time I played, I had memorized thousands of opening moves. How is that different from what the computer does? You certainly can't tell me "hey, I am white, I am going to play the fried liver attack, and you can't use the various lines you have memorized, to avoid the deep traps, so you have to play on your own..."



Those who would argue that the test is as fair as can be for generic strength could definitely say the engines are using default settings. For example, pondering is not on by default according to the UCI protocol. So we have to take the tournament for what it is, which is not the optimal configuration for either program.
So now a protocol gets to decide default? Default in Crafty is _on_. Why would I add a feature and then have it default to "off"? That would imply the feature makes it play worse... Wait. I do have such a feature. The skill command. Defaults to "full strength" of course.



While at it, why not arbitrarily change the value of a queen to 8 for all programs. That would still be "equal", correct? This idea that it is ok to arbitrarily disable something if you disable it for all is basically flawed. And that is why many criticize the approach.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by hyatt » Fri Feb 11, 2011 10:38 pm

Jeremy Bernstein wrote:
orgfert wrote:A rating list of arbitrarily non-optimally set-up programs yields nothing more significant than a sequence of artificially random numbers.
That's a reductio ad absurdum. They just aren't testing what you think is important.

I agree with you, to a point -- engines should be able to showcase their strengths beyond the realm of search and evaluation. But it's decidedly unfair to allow Engine A to compete with 200 cores against Engine B on 8 and expect the results to be significant, either. In the end, you don't know whether it was the engine or the hardware that won.

Play with own opening book (if available)? Sure.
Play with ponder? Sure.
Play using best software settings? Sure (presuming the manufacturer can provide them, why not).

But it's obviously unfair to allow developers who can afford massive hardware to use it against developers forced to run on ordinary machines. Why is that so hard to swallow? "Magnus Carlsen doesn't leave half of his brain at home when he goes to Wijk aan Zee" is sophistry: Magnus Carlsen is not configurable software, capable of running on a wide variety of hardware. I understand wanting to show your best stuff in a competition, but there should be limits, because it's highly disadvantageous to those developers who cannot afford a private or university cluster to play against those who can.

From the perspective of engine authors, I can understand not wanting their engine to be reduced to a benchmarking tool. But I think that there are ways to accommodate this wish, without destroying the potential for reasonably equal chances for all contenders.

Jeremy

Do you _really_ believe that all human brains are equal? Hint: look up the cellular analysis on Einstein's brain tissue (which was preserved, by the way). Do you have to make such a player play without his left brain, or without a frontal lobe? You just play with the tools you are given. Not all sprinters have the same muscle make-up. Some have better angles of tendons to muscles to bone attachment points to give them more leverage, or more speed, depending.

Everything on this planet is _not_ created equal. And it is actually impossible to design a completely equal playing field for computer chess, since you would have to turn off every feature that any single program lacks, leaving you with a very vanilla (and very uninteresting) chess match.

Jeremy Bernstein
Site Admin
Posts: 1226
Joined: Wed Jun 09, 2010 7:49 am
Real Name: Jeremy Bernstein
Location: Berlin, Germany

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by Jeremy Bernstein » Fri Feb 11, 2011 11:28 pm

hyatt wrote:
Jeremy Bernstein wrote:
orgfert wrote:A rating list of arbitrarily non-optimally set-up programs yields nothing more significant than a sequence of artificially random numbers.
That's a reductio ad absurdum. They just aren't testing what you think is important.

I agree with you, to a point -- engines should be able to showcase their strengths beyond the realm of search and evaluation. But it's decidedly unfair to allow Engine A to compete with 200 cores against Engine B on 8 and expect the results to be significant, either. In the end, you don't know whether it was the engine or the hardware that won.

Play with own opening book (if available)? Sure.
Play with ponder? Sure.
Play using best software settings? Sure (presuming the manufacturer can provide them, why not).

But it's obviously unfair to allow developers who can afford massive hardware to use it against developers forced to run on ordinary machines. Why is that so hard to swallow? "Magnus Carlsen doesn't leave half of his brain at home when he goes to Wijk aan Zee" is sophistry: Magnus Carlsen is not configurable software, capable of running on a wide variety of hardware. I understand wanting to show your best stuff in a competition, but there should be limits, because it's highly disadvantageous to those developers who cannot afford a private or university cluster to play against those who can.

From the perspective of engine authors, I can understand not wanting their engine to be reduced to a benchmarking tool. But I think that there are ways to accommodate this wish, without destroying the potential for reasonably equal chances for all contenders.

Jeremy

Do you _really_ believe that all human brains are equal? Hint: look up the cellular analysis on Einstein's brain tissue (which was preserved, by the way). Do you have to make such a player play without his left brain, or without a frontal lobe? You just play with the tools you are given. Not all sprinters have the same muscle make-up. Some have better angles of tendons to muscles to bone attachment points to give them more leverage, or more speed, depending.

Everything on this planet is _not_ created equal. And it is actually impossible to design a completely equal playing field for computer chess, since you would have to turn off every feature that any single program lacks, leaving you with a very vanilla (and very uninteresting) chess match.
Nope, but I bet that Lance Armstrong can kick your ass, whether he rides a Huffy, a Cannondale, a tricycle or a unicycle. Assuming that you're on the same hardware. I'm not interested in suppressing Lance's ability, just saying -- everyone rides the same bike.

Your position is one of exaggeration -- it's the CC equivalent of "there goes the neighborhood".

Jeremy

orgfert
Posts: 183
Joined: Fri Jun 11, 2010 5:35 pm
Real Name: Mark Tapley

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by orgfert » Sat Feb 12, 2011 12:10 am

Jeremy Bernstein wrote:Nope, but I bet that Lance Armstrong can kick your ass, whether he rides a Huffy, a Cannondale, a tricycle or a unicycle. Assuming that you're on the same hardware. I'm not interested in suppressing Lance's ability, just saying -- everyone rides the same bike.
Bikes today are custom built for particular physiques at the top level. Your idea to put the best physiques on identical bikes that only amateurs can afford would be considered unfair by the riders and uninteresting by people who want to see top cycling.

orgfert
Posts: 183
Joined: Fri Jun 11, 2010 5:35 pm
Real Name: Mark Tapley

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by orgfert » Sat Feb 12, 2011 12:18 am

Odeus37 wrote:Now, if you really can't understand TCEC's goal, we can't really do much for you...
The goal of TCEC is to wave Houdini in Vas's face. I grok that motive. I don't need your help, which is not to say you all don't need some. ;)

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by Odeus37 » Sat Feb 12, 2011 1:58 am

If you really want to see the overpriced (and not sold) Rybka cluster in TCEC, like I said, buy Martin a few weeks of it! I dunno why, but I am so sure you won't, Mr. Angry! :D I doubt he would refuse to set up a match if you did.

If you don't want to buy those, then don't come and whine that Rybka has to play on the same hardware (which is excellent) as all the other engines in TCEC...

But the worst part is that I am not even sure the 40-core Rybka cluster would win against the overclocked 6-core i7 980X running Houdini. At least this way you could give your god Vas some $! ;)

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair

Re: Houdini routs Rybka to start, routs Rybka to end match

Post by Adam Hair » Sat Feb 12, 2011 3:01 am

orgfert wrote:
Martin Thoresen wrote:I think your posts don't make sense at all. Dumbed-down Rybka? Your analogy is the same as saying that a car with 300 hp which is driven legally at 100 km/h on the highway is dumbed down because it can theoretically run at 250 km/h.
I suppose a better analogy would be putting the same motor in two different automobile designs and then racing them. It makes no sense. The goal is the best chess possible with an intended design. For a match to mean something, the programmers should put forward their best setup. This is probably why interest was minimal in a uniform platform world championship. After all, why bother to win a crippleware world title? Ergo, why bother to take interest in yet another crippleware chess match? Such matches are done ad infinitum in CCRL and other places, and the resulting Elos touted as illumination. Dingo's kidneys.
Perhaps you should keep looking for the proper analogy. Try putting two drivers in identical cars and then let them
race. Obviously, that is nowhere near as exciting to you. Yet, it may illuminate the drivers' ability better than
having them race in their own cars. In the latter scenario, there is the question of what was most important in
winning the race. Was it the car or the driver?

Obviously, I am in the minority as far as the purpose of a rating list. I keep seeing rating lists and
competitions being lumped together. Perhaps the term rating list should be abandoned and everybody
start using the term ranking list. That would be more consistent with most people's idea of what the lists
are. Pity that some of the people involved with the lists don't see it that way. Of course, if the true purpose of
a list is to help show the level an engine has attained rather than to show which engine is better than other
engines, then the current terminology is appropriate.
