Not getting the same results as everyone else...

teh_pwnerer
Posts: 2
Joined: Mon Aug 16, 2010 11:19 am

Not getting the same results as everyone else...

Post by teh_pwnerer » Mon Aug 16, 2010 11:42 am

Hello all, I'm a bit new to engine testing and whatnot, but I am definitely enjoying it so far.

I'm just posting to get some input on what seems to be a problem with some of my engines.

I keep seeing Naum get extremely high Elo ratings and generally being considered the third-best engine, yet my Naum can't compete with any of my other engines under any time control, although it does slightly better in longer games.

To keep the story short, I will simply post the approximate Elo ratings I would give each of my engines after several hundred games.

Fast games (1:0):
Firebird 1.1 (with changed configurations): 3300
Houdini 1.03a: 3300
Fire 1.3: 3275
Rybka 4: 3200
Stockfish 1.8: 3075
Naum 4.2: 2950

Slower games (3:2 through 15:0):
Rybka 4: 3300
Firebird 1.1 (with changed configurations): 3275
Fire 1.3: 3200
Houdini 1.03a: 3100
Stockfish 1.8: 3100
Naum 4.2: 3050

I'm using Aquarium + Narrow book in all tests to limit the number of book moves.

I guess my main question is why are Stockfish and Naum so weak for me when they're supposedly as strong or stronger than Rybka 4?
And why does Houdini do so poorly in slower games?

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: Not getting the same results as everyone else...

Post by gaard » Mon Aug 16, 2010 12:39 pm

It's hard to say, but something is definitely wrong... What are your exact testing conditions? Could you post the BayesElo ratings instead of your approximations?

teh_pwnerer
Posts: 2
Joined: Mon Aug 16, 2010 11:19 am

Re: Not getting the same results as everyone else...

Post by teh_pwnerer » Mon Aug 16, 2010 1:27 pm

I'm not sure what BayesElo ratings are.

And the testing conditions are again:
Aquarium GUI
1 minute, no increment for fast games
3 minute, 2 sec increment for slower games
NarrowBook Opening Book (fairly basic book, all engines were using it)
No ponder
All are 64-bit versions, on a quad-core i5 at 2.66 GHz

Gerold
Posts: 73
Joined: Thu Jun 10, 2010 1:32 am

Re: Not getting the same results as everyone else...

Post by Gerold » Mon Aug 16, 2010 2:25 pm

Do a head-to-head test at TC 5/3 or longer, with the same book and at least 500 games each,
to get more reliable results.

How many games were played for each engine? Even over 100 games one engine may come out 50 Elo higher than another, and the
next 100 games may give a different result.

With one-minute games between these engines, which have very poor endgame knowledge, you will get all kinds of scores.
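To put a rough number on how noisy 100 games can be, here is a minimal Python sketch. It assumes the standard logistic Elo model and a simple normal approximation for the error margin, which is not how BayesElo computes its intervals. The function names and the sample win/draw/loss counts are made up for illustration.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) Elo difference for one engine's match result.

    Uses a normal approximation on the score fraction (z=1.96 is roughly 95%).
    This is a back-of-the-envelope check, not a replacement for BayesElo.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided results.
    low = elo_diff(max(score - z * se, 1e-6))
    high = elo_diff(min(score + z * se, 1.0 - 1e-6))
    return elo_diff(score), low, high

if __name__ == "__main__":
    # Hypothetical 100-game match: 40 wins, 30 draws, 30 losses.
    est, low, high = elo_with_margin(40, 30, 30)
    print(f"Elo difference: {est:+.0f} (approx. range {low:+.0f} to {high:+.0f})")
```

With a result like this the interval is wider than 100 Elo and includes zero, so a 100-game match cannot separate engines that are 50 Elo apart.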

Good luck,
Gerold.

P.S. You could download the Arena 1.1 GUI, which is free, and test engine vs. engine with the same book at a longer TC.
This should give you much better results. Arena is the best for testing at longer time controls.

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: Not getting the same results as everyone else...

Post by gaard » Mon Aug 16, 2010 2:37 pm

Could you put the games in PGN and upload? Calculating ratings with BayesElo from a .pgn file is fairly straightforward.

http://remi.coulom.free.fr/Bayesian-Elo/

With the .pgn file in the same directory as bayeselo.exe, run and type:
> readpgn database.pgn
> elo
> mm
> ratings
or to send to a file
> ratings >somefile.txt
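Before running BayesElo, it can help to check that the PGN export contains what you expect. Below is a minimal Python sketch that tallies points and games per engine from the White, Black and Result header tags. It assumes every game carries all three tags, as the PGN standard requires. The `tally` function is a hypothetical helper, not part of BayesElo or Aquarium.

```python
import re
from collections import defaultdict

# Matches header lines such as: [White "Naum 4.2"]
TAG = re.compile(r'^\[(White|Black|Result)\s+"(.*)"\]$')

# Points for (White, Black) for each finished result; "*" games are skipped.
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}

def tally(pgn_text):
    """Return {engine name: (points, games)} from the header tags of a PGN string."""
    totals = defaultdict(lambda: [0.0, 0])
    game = {}
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue
        game[match.group(1)] = match.group(2)
        if len(game) == 3:  # White, Black and Result all seen for this game
            points = POINTS.get(game["Result"])
            if points is not None:
                for name, p in zip((game["White"], game["Black"]), points):
                    totals[name][0] += p
                    totals[name][1] += 1
            game = {}
    return {name: (pts, games) for name, (pts, games) in totals.items()}

if __name__ == "__main__":
    # "database.pgn" is the file name used in the commands above.
    with open("database.pgn", encoding="utf-8", errors="replace") as handle:
        for name, (pts, games) in sorted(tally(handle.read()).items()):
            print(f"{name}: {pts}/{games}")
```

If an engine shows far fewer games than the others, or an unexpected name variant appears, the ratings will be skewed no matter which tool calculates them.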

AnthonyTheSage
Posts: 92
Joined: Wed Jun 16, 2010 12:31 am
Real Name: Anthony

Re: Not getting the same results as everyone else...

Post by AnthonyTheSage » Tue Aug 17, 2010 2:38 am

There's no problem. Rybka 4, Houdini, Stockfish 1.8, and Fire 1.1 (custom) are all stronger than Naum 4.2. Fire 1.3 should be fairly close with more games. Longer time controls will bring Naum's Elo up slightly, because Houdini and Fire use move-on-ponderhit in blitz, which wins them a lot of games on time. Naum also does better in longer games because it really sucks at openings. For confirmation, look at the CCRL ratings: you'll see that Stockfish and Rybka are both stronger than Naum. Since Houdini is stronger than Rybka, and Fire 1.1 has about the same rating, there you have it.

gaard
Posts: 127
Joined: Thu Jun 10, 2010 1:39 am
Real Name: Martin Wyngaarden
Location: Holland, Michigan

Re: Not getting the same results as everyone else...

Post by gaard » Tue Aug 17, 2010 4:03 am

I think Naum defaults to one thread. Did you configure it to use four?

Dr.Wael Deeb
Posts: 104
Joined: Thu Jun 10, 2010 8:29 pm
Real Name: Dr.Wael Deeb

Re: Not getting the same results as everyone else...

Post by Dr.Wael Deeb » Tue Aug 17, 2010 5:42 am

AnthonyTheSage wrote: There's no problem. Rybka 4, Houdini, Stockfish 1.8, and Fire 1.1 (custom) are all stronger than Naum 4.2. Fire 1.3 should be fairly close with more games. Longer time controls will bring Naum's Elo up slightly, because Houdini and Fire use move-on-ponderhit in blitz, which wins them a lot of games on time. Naum also does better in longer games because it really sucks at openings. For confirmation, look at the CCRL ratings: you'll see that Stockfish and Rybka are both stronger than Naum. Since Houdini is stronger than Rybka, and Fire 1.1 has about the same rating, there you have it.
Good analysis which I totally agree with 8-)
Dr.D
