OpenChess

Posted: **Fri Jul 23, 2010 6:33 am**

Usual testing conditions using a time control of 4" CTPM.

   1 Houdini 1.03     9 1603.0 (852.0 : 751.0)
                        1603.0 (852.0 : 751.0) Rybka 4         -9
   2 Rybka 4         -9 1603.0 (751.0 : 852.0)
                        1603.0 (751.0 : 852.0) Houdini 1.03     9

Rank Name           Elo    +    - games score oppo. draws 
   1 Houdini 1.03     9    7    7  1603   53%    -9   52% 
   2 Rybka 4         -9    7    7  1603   47%     9   52%

This gives a Likelihood of Superiority (LOS) of 99.95% for Houdini 1.03.
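For readers who want to see how a likelihood of superiority can be estimated, here is a minimal sketch using a common normal approximation based only on decisive games. This is an assumption on my part: the tool that produced the 99.95% figure above may use a different model, so the output need not match it. The function name is hypothetical, and the win and loss counts have to come from the game record, since the tables above report only points and a rounded draw percentage.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation LOS. Draws do not enter this formula."""
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: no evidence either way.
        return 0.5
    z = (wins - losses) / math.sqrt(2.0 * decisive)
    return 0.5 * (1.0 + math.erf(z))
```

The more the win/loss gap grows relative to the square root of the number of decisive games, the closer the value gets to 100%.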

From the days of Houdini 1.02 I have:

Rank Name               Elo    +    - games score oppo. draws 
   1 Deep Rybka 4 x64     1    8    8  1181   50%    -1   52% 
   2 Houdini_x64_4CPU    -1    8    8  1181   50%     1   52%

where Houdini 1.02 was indistinguishable from R4 in terms of strength.

If you calculate the raw Elo numbers and compare, it looks like Houdini 1.03 has improved by ~19.57 Elo.
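A raw Elo difference can be derived from a match score with the standard logistic Elo formula. The sketch below is only an illustration of that formula, with a hypothetical function name; it is not necessarily the exact calculation behind the figure quoted above, and rating tools that model draws explicitly will report somewhat different numbers.

```python
import math


def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a score, using the logistic Elo model."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0% and 100%")
    return 400.0 * math.log10(p / (1.0 - p))


# Houdini 1.03 scored 852.0 points out of 1603 games against Rybka 4.
print(round(elo_diff(852.0, 1603), 2))
```

Running the same function on the Houdini 1.02 match score and subtracting gives the raw improvement between versions.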

Posted: **Fri Jul 23, 2010 3:05 pm**

I get a much closer result -- and I am pretty certain Houdini 1.03 has not improved by more than 10 Elo.

Using a suite of positions (ACH basic):
128HHash, 4 CPU, ponder=off, tc=0+3s, on Arena in WinXP 32-bit. Rybka 4 with 3-4 and select 5-man tablebases.
Houdini_w32_4CPU_1.03a - Rybka 4 w32 : 45.5/92 20-21-51 Houdini is behind by 1 game

I set split depth to 12 since that seemed to give better performance:

Same conditions but using Nunn 20 position suite:

Houdini_w32_4CPU_1.03a - Rybka 4 w32 : 23.5/40 13-6-21 (=1011==1==0======1111011==0====100===11=) 59%

The result above used split depth 12; the result below used the default split depth:

Rybka 4 w32 - Houdini_w32_4CPU_1.03a : 21.0/40 12-10-18 (1===1==01==10=10==1=01=01=0=11=00==0=110) 53%

It looks like the Nunn suite favors Houdini with split depth=12.

I like the fact that Houdini is multi-PV and still stronger, but R4 still has not been convincingly beaten.
And let's not forget that R4 is VERY MUCH stronger at long time controls.

Posted: **Fri Jul 23, 2010 3:24 pm**

Charles,

Way too few games. If you look at gaard's post, his game count is close to two thousand.
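To put a rough number on "too few games", here is a sketch of an approximate 95% interval for the Elo difference from a win/loss/draw record. It assumes independent games and a normal approximation of the mean score, which is a simplification; the function names are hypothetical.

```python
import math


def to_elo(p: float) -> float:
    """Convert an expected score to an Elo difference (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Approximate (low, high) Elo bounds from a W-L-D record."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    half = z * math.sqrt(var / n)
    eps = 1e-9  # keep the bounds strictly inside (0, 1)
    low = max(eps, mean - half)
    high = min(1.0 - eps, mean + half)
    return to_elo(low), to_elo(high)


# The 92-game match above finished 20-21-51 from Houdini's side.
print(elo_interval(20, 21, 51))
```

Because the interval width shrinks with the square root of the number of games, a match of about 90 games cannot resolve a difference of 10 to 20 Elo.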

Posted: **Fri Jul 23, 2010 7:11 pm**

Yes, it could be a number of other factors too. Different time control, different openings.

Is 4" CTPM 4 seconds per move with an increment at each move, or just 4 seconds fixed?

Was it an opening suite or an opening book that was used?

The ach-48 suite I used covers a lot more openings than, say, a perfect 12 book.

Additionally, mine was 32-bit -- so that too could be a reason.

In any case, my final match finished dead even at 48/96 games ... so Houdini could have a slight lead on R4, at least at short time controls.

Posted: **Sat Jul 24, 2010 9:09 am**

Charles wrote:Yes, it could be a number of other factors too. Different time control, different openings.

Is 4" CTPM 4 seconds per move with an increment at each move, or just 4 seconds fixed?

Was it an opening suite or an opening book that was used?

The ach-48 suite I used covers a lot more openings than, say, a perfect 12 book.

Additionally, mine was 32-bit -- so that too could be a reason.

In any case, my final match finished dead even at 48/96 games ... so Houdini could have a slight lead on R4, at least at short time controls.

Just 4 seconds fixed for each move, no increment, with an opening suite of over 7000 positions. Only the first 800 or so were used for this test. One engine is probably more optimized for a 32-bit platform, and for SMP. You are also including time management in your ratings, whereas in mine the entire mechanism is precluded. Besides that, I don't think there is a significant difference in our time controls.

There is little doubt in my mind that Houdini 1.03 has surpassed Rybka 4 in all areas except perhaps SMP efficiency where she might still hold a slight edge on monster hardware at long time controls. However, the time and resources required to prove such a claim make it virtually untestable.

Posted: **Sat Jul 24, 2010 10:02 am**

Hi gaard,

Did you test on a single core?

You say you take only the first 800 of a 7000-position suite. Is the 800-position subset diverse, too? This is only the case if the 7000-position suite is mixed.

Posted: **Sat Jul 24, 2010 1:39 pm**

Stefan wrote:Hi gaard,

Did you test on a single core?

You say you take only the first 800 of a 7000-position suite. Is the 800-position subset diverse, too? This is only the case if the 7000-position suite is mixed.

The results posted here are all for one core. The suite is representative of chess as it is played in high level games; this was much more likely achieved using so many positions as opposed to my usual 60.

Posted: **Sat Jul 24, 2010 6:19 pm**

gaard wrote:There is little doubt in my mind that Houdini 1.03 has surpassed Rybka 4 in all areas except perhaps SMP efficiency where she might still hold a slight edge on monster hardware at long time controls. However, the time and resources required to prove such a claim make it virtually untestable.

First of all, I think it's important to keep in mind that your results apply to blitz chess on a single core, and not necessarily to tournament-level chess on more powerful hardware. For me, performance at tournament level is more interesting and important than blitz chess ratings.

I'm not saying that your efforts are useless by any means, but I think a multicore match would've been more interesting, since most people run their engines on multicore CPUs nowadays. Rybka 4 also has certain problems with its time control in shorter games. If you look at this list, you'll see that the TC3100150-version is significantly stronger:

http://computerchess.org.uk/ccrl/404/ra ... t_all.html

Rybka 4 doesn't seem to have any issues with time controls at tournament level (classic time controls), and I do believe that Rybka 4 is slightly stronger than Houdini 1.03 at tournament level. In any case, the difference in strength between these two programs seems to be rather small, regardless of time control. I still prefer Rybka 4 for analysis, though, because she supports endgame tablebases (which I've found to be of practical value numerous times).

Posted: **Sat Jul 24, 2010 6:35 pm**

I forgot to mention that your result is only based on Houdini's performance against Rybka, not other engines. For an overall, general measure of strength, you need several different opponents.

gaard wrote:However, the time and resources required to prove such a claim make it virtually untestable.

Not really. Martin has already conducted a 48-game match between R4 and Houdini 1.02. Although 48 games isn't that much, statistically speaking, you'll eventually get a good idea of its strength after comparing several results over time. I do agree, though, that these kinds of matches take a long time to complete. SSDF has tested hundreds of engines at tournament level for several decades, but they probably won't test Houdini due to its controversial nature.

Posted: **Sat Jul 24, 2010 7:48 pm**

gaard wrote:The suite is representative of chess as it is played in high level games

Sorry for my bad English, so you did not understand my question. It is clear that the 7000 position suite is representative. But if you take the first 800 positions this is only representative if the 7000 position suite is randomly shuffled or unsorted (and not ordered by openings). Otherwise you could get 800 Sicilian openings.
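One way to avoid that problem regardless of how the file is ordered is to draw the 800 positions at random instead of taking the first 800. A minimal sketch, assuming the suite is already loaded as a list with one position per entry; the function name and the seed value are hypothetical:

```python
import random


def sample_positions(positions, k, seed=2010):
    """Return k distinct positions drawn uniformly at random.

    A fixed seed keeps the subset reproducible between test runs.
    """
    rng = random.Random(seed)
    return rng.sample(positions, k)


# Stand-in data: 7000 placeholder entries in place of the real suite.
suite = [f"position-{i}" for i in range(7000)]
subset = sample_positions(suite, 800)
print(len(subset))
```

If the original suite is sorted by opening, a random subset keeps roughly the same mix of openings as the full 7000.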

The Final Showdown: Rybka 4 vs Houdini 1.03
