New Rating List

TPJR
Posts: 14
Joined: Thu Jul 08, 2010 1:42 pm
Location: Switzerland

Re: New Rating List

Post by TPJR » Mon Aug 09, 2010 12:10 pm

Hi gaard,

can you make the same test with Firebird DD, but using PawnHash = 1/8 * Hash? (If you use Hash=512MB, you have already done the test.) In another thread it was explained that the PawnHash should be assigned relative to Hash.
http://www.open-chess.org/viewtopic.php ... t=40#p5025

Thomas

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: New Rating List

Post by gaard » Mon Aug 09, 2010 12:35 pm

TPJR wrote:Hi gaard,

can you make the same test with Firebird DD, but using PawnHash = 1/8 * Hash? (If you use Hash=512MB, you have already done the test.) In another thread it was explained that the PawnHash should be assigned relative to Hash.
http://www.open-chess.org/viewtopic.php ... t=40#p5025

Thomas
Sure. I ran the last test with 1/4 PH, but I will go ahead and run another with 1/8. It is not obvious to me why this would make a difference, though I am always open to suggestions :) I'll report the first 120 games and then again once the 720-game gauntlet has concluded.

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: New Rating List

Post by gaard » Mon Aug 09, 2010 4:28 pm



   4 FireBird 1.1 DD       64 120.0 ( 65.5 :  54.5)
                               15.0 (  7.0 :   8.0) Houdini 1.03         134
                               19.0 (  6.0 :  13.0) Rybka 4              130
                               26.0 ( 14.0 :  12.0) Stockfish 1.8         61
                               14.0 (  5.5 :   8.5) Critter 0.80           2
                               27.0 ( 18.0 :   9.0) Naum 4.2              -3
                               19.0 ( 15.0 :   4.0) Spark 0.4           -125


Rank Name                 Elo    +    - games score oppo. draws 
   1 Houdini 1.03         134   19   19   984   72%   -27   35% 
   2 Rybka 4              130   19   19   999   70%   -24   32% 
   3 IvanHoe 9.52a        119   22   22   708   68%   -16   36% 
   4 FireBird 1.1 DD       64   51   51   120   55%    31   39% 
   5 Stockfish 1.8         61   19   19  1006   60%   -15   33% 
   6 Critter 0.80           2   34   34   302   51%   -12   32% 
   7 Naum 4.2              -3   18   18  1009   50%    -8   35% 
   8 Shredder 12 32-bit   -48   20   20   876   45%   -15   32% 
   9 Spark 0.4           -125   19   19   987   33%     5   30% 
  10 Zappa Mexico II     -142   20   20   969   30%     6   31% 
  11 Toga II 1.4beta5c   -193   21   21   968   24%    12   25% 

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: New Rating List

Post by gaard » Tue Aug 10, 2010 6:31 pm

Match concluded!


Ratings:

Rank Name                 Elo    +    - games score oppo. draws 
   1 Houdini 1.03         134   18   18  1089   70%   -17   37% 
   2 Rybka 4              129   18   18  1100   69%   -15   33% 
   3 IvanHoe 9.52a        118   22   22   708   68%   -17   36% 
   4 FireBird 1.1 DD       86   21   21   720   58%    31   42% 
   5 Stockfish 1.8         60   18   18  1100   59%    -8   35% 
   6 Naum 4.2              -2   17   17  1102   49%    -1   36% 
   7 Critter 0.80          -9   29   29   408   46%    13   33% 
   8 Shredder 12 32-bit   -49   20   20   876   45%   -17   32% 
   9 Spark 0.4           -127   19   19  1088   31%    11   30% 
  10 Zappa Mexico II     -144   20   20   969   30%     4   31% 
  11 Toga II 1.4beta5c   -195   21   21   968   24%    11   25% 


Details:

   4 FireBird 1.1 DD       86 720.0 (415.5 : 304.5)
                              120.0 ( 49.5 :  70.5) Houdini 1.03         134
                              120.0 ( 50.5 :  69.5) Rybka 4              129
                              120.0 ( 65.5 :  54.5) Stockfish 1.8         60
                              120.0 ( 75.5 :  44.5) Naum 4.2              -2
                              120.0 ( 78.5 :  41.5) Critter 0.80          -9
                              120.0 ( 96.0 :  24.0) Spark 0.4           -127


Likelihood of Superiority:

                      Hou  Ryb  Iva  Fir  Sto  Nau  Cri  Shr  Spa  Zap  Tog
Houdini 1.03               656  875  999  999 1000 1000 1000 1000 1000 1000
Rybka 4               343       788  999  999 1000  999 1000 1000 1000 1000
IvanHoe 9.52a         124  211       976  999 1000  999 1000 1000 1000 1000
FireBird 1.1 DD         0    0   23       975  999  999 1000 1000 1000 1000
Stockfish 1.8           0    0    0   24       999  999 1000 1000 1000 1000
Naum 4.2                0    0    0    0    0       655  999 1000 1000 1000
Critter 0.80            0    0    0    0    0  344       987  999  999 1000
Shredder 12 32-bit      0    0    0    0    0    0   12       999  999 1000
Spark 0.4               0    0    0    0    0    0    0    0       900  999
Zappa Mexico II         0    0    0    0    0    0    0    0   99       999

Under these conditions, the chance of FireBird 1.1 DD being better than Rybka or Houdini is less than 0.001. It rates very close to the default version and almost identically to RobboLito. IMO this version of FireBird is so close to RobboLito, and scales just as poorly with more time, that I seriously doubt it would do better at a longer time control.
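
For anyone who wants to sanity-check the rating against the raw score, here is a rough sketch. It assumes the plain logistic Elo formula and uses the "oppo." column as the average opponent rating. The function name is made up for illustration, and this is not how the rating tool computes its numbers (it fits all games jointly and models draws), so treat it only as a back-of-the-envelope cross-check.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a fractional score under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Figures copied from the Details and Ratings tables in this post.
points, games = 415.5, 720
avg_opponent_elo = 31  # 'oppo.' column for FireBird 1.1 DD

diff = elo_diff_from_score(points / games)
performance = avg_opponent_elo + diff

print(f"score {points / games:.3f} -> {diff:+.1f} Elo vs. the field, "
      f"performance about {performance:.0f}")
```

With the 720-game result this lands at about 54 Elo above the field, so roughly 85 overall, which is in line with the 86 in the list.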
