As in chess tournaments and matches...
gaard
- Posts: 127
- Joined: Thu Jun 10, 2010 1:39 am
- Real Name: Martin Wyngaarden
- Location: Holland, Michigan
by gaard » Sun Jul 25, 2010 12:19 am
DeepRybka wrote:
gaard wrote:
There is little doubt in my mind that Houdini 1.03 has surpassed Rybka 4 in all areas except perhaps SMP efficiency, where she might still hold a slight edge on monster hardware at long time controls. However, the time and resources required to prove such a claim make it virtually untestable.
First of all, I think it's important to keep in mind that your results apply to blitz chess on single-core CPUs, and not necessarily to tournament-level chess on more powerful hardware. For me, performance at tournament level is more interesting and important than blitz ratings.
I'm not saying that your efforts are useless by any means, but I think a multicore match would have been more interesting, since most people run their engines on multicore CPUs nowadays. Rybka 4 also has certain problems with its time management in shorter games. If you look at this list, you'll see that the TC3100150 version is significantly stronger:
http://computerchess.org.uk/ccrl/404/ra ... t_all.html
Rybka 4 doesn't seem to have any issues with time management at tournament level (classical time controls), and I do believe that Rybka 4 is slightly stronger than Houdini 1.03 at tournament level. In any case, the difference in strength between these two programs seems to be rather small, regardless of time control. I still prefer Rybka 4 for analysis, though, because she supports endgame tablebases (which I've found to be of practical value numerous times).
That my results apply to faster time controls on 1-core and do not exactly extrapolate to ratings between the two on larger hardware with long time controls was exactly my point. We can speculate and make conjectures all day long about the relative performance of Rybka and Houdini on larger hardware with long time controls, but how long would that match take, and who is going to run it?
I am aware that TC3100150 is stronger than the default under CCRL conditions. That is comparing apples and oranges, however. You should be aware that my testing excludes time management altogether, so the difference between TC3100150 and the default is not relevant here.
Last edited by gaard on Sun Jul 25, 2010 12:29 am, edited 2 times in total.
by gaard » Sun Jul 25, 2010 12:28 am
DeepRybka wrote:
I forgot to mention that your result is based only on Houdini's performance against Rybka, not against other engines. For an overall, general measure of strength, you need several different opponents.
gaard wrote:
However, the time and resources required to prove such a claim make it virtually untestable.
Not really. Martin has already conducted a 48-game match between R4 and Houdini 1.02. Although 48 games isn't that much, statistically speaking, you'll eventually get a good idea of its strength after comparing several results over time. I do agree, though, that these kinds of matches take a long time to complete. SSDF has tested hundreds of engines at tournament level for several decades, but they probably won't test Houdini due to its controversial nature.
You are only looking at one result here in this thread. Find my "New Rating List" thread and you will see that Houdini 1.03 is still the leader, and better than Rybka 4, in a match with more than ten other opponents. It's a work in progress; whenever I have spare CPU cycles, I am running more games.
Where can I find his results relating to Houdini, so that I could "get a good idea of its strength after comparing several results over time"?
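To put a number on "48 games isn't that much, statistically speaking", here is a minimal sketch that converts a match score into an Elo difference with an approximate 95% range. It assumes the standard logistic Elo model and a simple normal approximation over per-game scores; the function names and the 14/22/12 win/draw/loss split below are hypothetical and are not Martin's actual result.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_estimate(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (estimate, low, high) Elo for a match, using a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game scores 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the Elo conversion stays finite at the extremes.
    lo = max(score - z * se, 1e-9)
    hi = min(score + z * se, 1.0 - 1e-9)
    return elo_diff(score), elo_diff(lo), elo_diff(hi)


# Hypothetical 48-game match: 14 wins, 22 draws, 12 losses.
est, low, high = match_estimate(14, 22, 12)
print(f"{est:+.0f} Elo (95% range {low:+.0f} to {high:+.0f})")
```

With a split like this, the point estimate is only a few Elo above zero, while the range spans well over 100 Elo. That is why one short match says little on its own, and why pooling several results or opponents helps.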
by gaard » Sun Jul 25, 2010 12:32 am
Stefan wrote:
gaard wrote:
The suite is representative of chess as it is played in high-level games
Sorry for my bad English; because of it you did not understand my question. It is clear that the 7000-position suite is representative. But if you take only the first 800 positions, they are representative only if the 7000-position suite is randomly shuffled or otherwise unsorted (not ordered by opening). Otherwise you could get 800 Sicilian openings.
They are shuffled.
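The point about taking the first 800 of 7000 positions can be illustrated with a small sketch. It assumes a hypothetical suite sorted by opening family, with positions reduced to opening labels only. Taking the first 800 of the sorted list yields a single opening, whereas taking the first 800 after a seeded random shuffle gives a sample that covers the whole suite.

```python
import random
from collections import Counter


def first_n_after_shuffle(positions, n, seed=2010):
    """Shuffle a copy of the suite with a fixed seed, then take the first n."""
    shuffled = list(positions)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n]


# Hypothetical 7000-position suite, sorted by opening family (labels only).
families = ["Sicilian", "French", "Ruy Lopez", "Queen's Gambit",
            "King's Indian", "Caro-Kann", "English"]
suite = [fam for fam in families for _ in range(1000)]  # 7 x 1000 = 7000

unshuffled = Counter(suite[:800])
shuffled = Counter(first_n_after_shuffle(suite, 800))
print(unshuffled)  # all 800 are Sicilian
print(shuffled)    # counts spread across all seven families
```

Fixing the seed keeps the subset reproducible, so the same 800 positions can be reused when new engines are added to the test.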