Some Notes About Engine Settings Testing

Sedat Canbaz · Post by **Sedat Canbaz** » Wed Jan 31, 2024 2:12 pm

Hello there )

First of all, the goal of the current New Engine Settings Testing
was to check the performance of the engine settings,
I mean those that are supposed to play near GM level.

For example,
in the 1st test: CM11 Archangel vs CM11 Fischer

Strange results, really: a difference of about +190 Elo.
Normally the difference in strength should not be that large!

1   CM11 Archangel  +59/-9/=32 75.00%   75.0/100
2   CM11 Fischer    +9/-59/=32 25.00%   25.0/100
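
The +190 Elo figure follows from the usual logistic Elo model, which maps a match score to a rating difference. Here is a minimal Python sketch, assuming that standard formula; the function name `elo_diff` is only for illustration and is not taken from CuteChess or any rating tool:

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic model: D = 400 * log10(s / (1 - s))."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Archangel vs Fischer: the winner scored 75.0/100
print(round(elo_diff(0.75)))  # about 191
```

A 100% or 0% score has no finite Elo difference under this model, which is why the function rejects those inputs.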

----------------------------------------------------------
Or, as another example,
the test below is closer to reality,
but I still cannot say it is very close.
According to calculations, Bobby Fischer should
be roughly 100 Elo points better, whereas in this
test, as we see, the difference is only about 25 Elo.

1   CM11 Fischer  +40/-33/=27 53.50%   53.5/100
2   CM11 Euwe     +33/-40/=27 46.50%   46.5/100

---------------------------------------------------------
And sure, some may say there are not enough games, and so on.
Actually, they are right!
100 games can lead to wrong conclusions.
But I have some experience; not much, but I have some.
Btw, what about you? Could you please share your data?
If possible, with games. Comments alone are too boring,
and I find it 'terrible' to see 'standings' published without games!
Really, and many people pay attention to them, which is sad indeed.
Nowadays everyone is a hero, but under SCCT conditions it is not
like that. Those are hard, hard conditions for sure, right?)
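
On the point that 100 games can mislead: a rough way to see it is to put an error margin on the result. The sketch below is only an approximation under stated assumptions: it treats the games as independent, uses a normal approximation for the mean score, and converts the interval ends to Elo with the standard logistic formula. The function names are made up for this example.

```python
import math

def elo_from_score(score):
    """Standard logistic conversion from score fraction to Elo difference."""
    # Clamp so that extreme interval ends (<= 0 or >= 1) do not break log10.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, losses, draws, z=1.96):
    """Return (low, mid, high) Elo estimates; z=1.96 gives roughly 95%."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(mean - z * se),
            elo_from_score(mean),
            elo_from_score(mean + z * se))

# CM11 Fischer vs CM11 Euwe: +40/-33/=27
low, mid, high = elo_interval(40, 33, 27)
print(f"{mid:+.0f} Elo, interval roughly {low:+.0f} to {high:+.0f}")
```

For the Fischer vs Euwe result the interval is more than 100 Elo wide, so a 100-game match cannot pin the difference down very precisely.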
-----------------------------------------------------------------
Btw, what about this test? A +346 Elo difference!
But in reality, at their best, the average peak Elo difference
between G. Kasparov and J. Polgar was about 150 Elo.

In other words,
it's clear that Prodeo 1.86's Polgar setting does not
reflect the actual (real) strength of Judit Polgar!
It is far off, and even after 1000+ games (per player)
I seriously doubt that the situation would change much.

1   Prodeo186 Kasparov +82/-6/=12 88.00%   88.0/100
2   Prodeo186 Polgar   +6/-82/=12 12.00%   12.0/100

Note: For the above test, the Prodeo 1.86 engine is used.
As usual, a separate folder is installed for each player,
because, as you may know, each one has different settings.
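
To put the Kasparov vs Polgar numbers side by side, one can also go the other way and ask what score a 150 Elo gap would predict. A small sketch, assuming the standard logistic expected-score formula (the name `expected_score` is just illustrative):

```python
def expected_score(elo_gap: float) -> float:
    """Expected score fraction for the stronger side, given its Elo advantage,
    under the standard logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

# A 150 Elo gap, the approximate real-life peak difference quoted above
print(f"{expected_score(150):.1%}")  # about 70%
```

So a setting pair that really reflected a 150 Elo gap would be expected to score around 70% for the stronger side, compared with the 88% seen in the test.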
--------------------------------------------------------------
Another test was played with the Rodent III engine, this time using various
Rodent III settings too, plus Fruit 2.1 for comparison.
As in all the other tests, the results are clearly far from reality!
Forget Fruit 2.1, which should be close in strength; at the very least,
I say that Rodent's Anand should do better than Rodent's Spassky.

E.g. according to chessmetrics.com, the difference is about 60 Elo:
http://www.chessmetrics.com/cm/CM2/Peak ... 0024310100
But unfortunately, in my test both settings are very close in strength.
RIII Spassky even seems to be stronger than RIII Anand, which is sad.

1   RIII Default   61.5/90
2   RIII Spassky   52.0/90
3   RIII Anand     50.5/90
4   Fruit 2.1      16.0/90

---------------------------------------------------------------

Meanwhile,
I have a note for the BIG brains too, e.g. the guys who often call
such tests meaningless, or the engines old, bad, poor, etc.
OK, here is a test played with a much newer engine: Shashchess 34.5

What more can I say?
Very likely SH Capablanca came from another planet )
Definitely not from the planet where we live now ))
Because the difference from reality (in Elo points) is too large!

NON-NNUE test with the Capablanca setting enabled, plus
the Elo setting configured to play at 2700 Elo:

1   Shashchess Capablanca  +97/-0/=3 98.50%   98.5/100
2   Prodeo186 Kasparov     +0/-97/=3 1.50%     1.5/100

----------------------------------------------------------
Another series of tests, for example:
Shashchess's NN + Capablanca setting is enabled,
but configured (reduced) to play at 2600 Elo level.

1   Shashchess Capablanca NN +96/-0/=4 98.00%   98.0/100
2   Prodeo186 Kasparov       +0/-96/=4 2.00%     2.0/100

---------------------------------------------------------
Here is the last test, using the Fruit 2.1 engine as an alternative,
where Shashchess is configured to play at 2000 Elo level.
But as we see, here too, unfortunately, nothing is correct!
In case it is still not clear: Fruit 2.1 is close to GM level.

1   Shashchess Capablanca  +98/-0/=2 99.00%   99.0/100
2   Fruit 2.1               +0/-98/=2 1.00%    1.0/100

Conditions (for all tests):
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 2m+1s, 128 MB Hash, 4-MEN

-----------------------------------------------------------

OK, as final words:
it's clear that most of the tests are not close to GM strength levels!

I also hope that the next engine releases will be better in quality!
If nothing else, all chess engines should at least have fewer bugs!

GAMES:
https://mega.nz/file/T1hXVY7Y#GMTHgsfbJ ... QSMGE_MZaQ

Best Wishes,
Sedat

Homayoun · Post by **Homayoun** » Wed Jan 31, 2024 4:04 pm

Hi dear Sedat,
I think these kinds of tests are something new. For now we can only consider the results as fun tests and nothing more. As I understand it, when we use the same engine for both personalities but with different settings, the results are more acceptable and at least closer to reality. But when we use two different engines with adjusted settings (Shashchess Capablanca and Prodeo Kasparov), the results are terrible and completely misleading: not reliable and far from reality. If there had been a chance for a real match between the two greats, Kasparov and Capablanca, Kasparov is the one who would have had the better chance of victory. As I said, this is something new. It should be discussed and tested more, and the issues should be solved gradually. But many thanks, as always, for the interesting idea and the nice, interesting test.
Greetings

Sedat Canbaz · Post by **Sedat Canbaz** » Wed Jan 31, 2024 5:44 pm

Hello dear Homayoun,

You are welcome,
and thanks for your useful post.

Well, the idea was born for my next planned tour!
E.g. I plan to organize a new human-like tour, which
will be based on engine players, and which will also use
openings played by the real World Chess Champions.

You know, I will do my best to make it more human-like )
At least I can guarantee the openings.

Btw, based on my latest strength tests,
as you can see too, it's clear that some engines are far from reality.
I mean, not really close to GM playing strength.

But I also admit that,
according to various data, e.g. Man vs Machine matches,
some of the engines seem to be closer to GM level!

And if you ask me which engines those are:
engines such as the older Fritz and Junior series, some
Fruit versions, ProDeo versions, older Hiarcs engines, etc.
I even believe that the ancient Stockfish 1.x, Rybka 1b, etc.
are among them. Sure, we cannot say that
their strength or playing style is identical to that of GMs;
that would be madness. My target is simply
to pick engine players that are close in strength to
the greatest human players of all time!

Btw, at the moment I am running a new strength test.
We will see which candidate engines are more
suitable. More info very soon )

Greetings

Homayoun · Post by **Homayoun** » Wed Jan 31, 2024 7:09 pm

Thanks, Waiting for tournament results.
Best regards

Sedat Canbaz · Post by **Sedat Canbaz** » Wed Jan 31, 2024 11:17 pm

Sure, but first I have to create the opening books.
In a few days, I think everything will be ready to start.

Greetings

Sedat Canbaz · Post by **Sedat Canbaz** » Wed Jan 31, 2024 11:19 pm

Update

Be aware of this too:
the Chessmetrics site has not been updated since 2006,
so many of the latest players are not calculated.

But I have good news:
according to my calculations (5-year peak range),
M. Carlsen is close to 2857 and D. Liren to 2796 Elo.

Meanwhile, the best of all time is Garry Kasparov!
Btw, Carlsen is quite close, just 18 Elo behind.
Impressively, Lasker, Capablanca, Botvinnik, etc.
are also not too far from the 2850 Elo level!

And here are the Classical World Champions'
peak average ratings (5-year peak range),

where on the right side of the rankings are
the planned engines that will play.

Btw, as you may see too,
most of the World Champions' Elo points are almost
identical or close to those of the planned chess engines!
Very soon I'll share the latest engine ratings,
and then you will be able to check that
the pairings in the ranking below are very close.

Champions   Elo  /  Chess Engines   Elo   
Kasparov   2875  -  Stockfish 1.4  2878    
Carlsen    2857  -  Rybka 1.01b    2860     
Lasker     2854  -  Sjeng 2008     2845         
Capablanca 2843  -  Tornado 4.88   2841   
Botvinnik  2843  -  Rodent II 087  2827   
Fischer    2841  -  Gaviota 0.86   2826
Karpov     2829  -  Shredder 11    2825
Alekhine   2827  -  Hannibal 1.0a  2819
Anand      2818  -  Fruit 2.3.1    2817
Kramnik    2812  -  Junior 2010    2816
Liren      2796  -  Naum 2.2       2808
Steinitz   2789  -  Zappa Mexico   2789
Smyslov    2788  -  Prodeo 3.1     2787
Petrosian  2782  -  Toga II 121a   2783
Tal        2773  -  Shield 2.1     2783
Spassky    2761  -  Hiarcs X54     2768
Euwe       2741  -  Gambit Fruit   2740
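
To check how close the pairings in this table really are, here is a small Python sketch that computes the Elo gap for each champion/engine pair. The numbers are copied from the table above; the variable names are just for illustration.

```python
# (champion, champion Elo, engine, engine Elo), copied from the table above
pairs = [
    ("Kasparov", 2875, "Stockfish 1.4", 2878),
    ("Carlsen", 2857, "Rybka 1.01b", 2860),
    ("Lasker", 2854, "Sjeng 2008", 2845),
    ("Capablanca", 2843, "Tornado 4.88", 2841),
    ("Botvinnik", 2843, "Rodent II 087", 2827),
    ("Fischer", 2841, "Gaviota 0.86", 2826),
    ("Karpov", 2829, "Shredder 11", 2825),
    ("Alekhine", 2827, "Hannibal 1.0a", 2819),
    ("Anand", 2818, "Fruit 2.3.1", 2817),
    ("Kramnik", 2812, "Junior 2010", 2816),
    ("Liren", 2796, "Naum 2.2", 2808),
    ("Steinitz", 2789, "Zappa Mexico", 2789),
    ("Smyslov", 2788, "Prodeo 3.1", 2787),
    ("Petrosian", 2782, "Toga II 121a", 2783),
    ("Tal", 2773, "Shield 2.1", 2783),
    ("Spassky", 2761, "Hiarcs X54", 2768),
    ("Euwe", 2741, "Gambit Fruit", 2740),
]

# Absolute gap between each champion's rating and the paired engine's rating
gaps = [(champ, engine, abs(c_elo - e_elo)) for champ, c_elo, engine, e_elo in pairs]
worst = max(gaps, key=lambda g: g[2])
mean_gap = sum(g[2] for g in gaps) / len(gaps)

print("largest gap:", worst)          # the Botvinnik pairing, 16 Elo
print("mean gap:", round(mean_gap, 1))
```

The largest gap is 16 Elo and the average is under 6 Elo, so on these figures the pairings are indeed close.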

Have fun,
Sedat

Sedat Canbaz · Post by **Sedat Canbaz** » Wed Jan 31, 2024 11:21 pm

And here is the mentioned latest strength results/ranking.

Sure, all games will be shared later. I need to take a rest )

Note also that the engines picked are the most suitable,
stable ones, I mean those without time forfeits etc.
Besides, the players picked are close in strength!

E.g. Deuterium 14.2 is out because it lost several games on time.
Arasan and Pedone are not so stable either, e.g. they lost on time.
Not frequently, of course, just 1 or 2 games, but lost.

Rank Name                Elo    +    - games score oppo. draws 
   1 Komodo 1.0         2910   49   47   120   67%  2796   41% 
   2 Stockfish 1.4      2899   47   45   140   68%  2772   34% 
   3 Stockfish 1.4 w32  2878   19   19   820   61%  2799   34% 
   4 Pedone 1.3         2876   20   20   770   62%  2790   35% 
   5 Rybka 1.01b        2860   18   18   890   58%  2806   36% 
   6 Deuterium 14.2     2857   25   25   470   59%  2795   33% 
   7 Sjeng 2008         2845   20   20   740   56%  2804   36% 
   8 Tornado 4.88       2841   19   19   840   56%  2800   34% 
   9 Rodent II 0.8.7    2827   18   18   860   53%  2802   36% 
  10 Gaviota 0.86       2826   18   18   890   53%  2807   34% 
  11 Shredder 11        2825   19   18   850   53%  2807   34% 
  12 Hannibal 1.0a      2819   17   17  1040   52%  2803   34% 
  13 Fruit 2.3.1        2817   18   18   900   52%  2803   40% 
  14 Junior 2010        2816   18   18   890   51%  2808   31% 
  15 Naum 2.2           2808   18   18   860   51%  2803   40% 
  16 Zappa Mexico       2789   18   18   860   48%  2805   37% 
  17 Prodeo 3.1         2787   18   18   870   47%  2805   35% 
  18 Toga II 1.2.1a     2783   18   18   900   47%  2805   36% 
  19 Shield 2.1         2783   17   17  1020   47%  2804   30% 
  20 Rodent II Fischer  2777   42   42   160   46%  2806   36% 
  21 Smarthing 1.2.0    2771   33   33   260   47%  2792   36% 
  22 Pawny 1.2          2769   18   18   890   44%  2811   36% 
  23 Hiarcs X54         2768   18   18   890   44%  2811   33% 
  24 Spike 1.2 Turin    2763   31   31   280   46%  2789   38% 
  25 Stockfish 1.3.1    2761   33   33   260   46%  2787   33% 
  26 Prodeo 3.1 Tal     2757   42   42   160   43%  2807   34% 
  27 Arasan 16.2 pol    2745   33   34   260   43%  2794   33% 
  28 Stockfish 1.3      2743   37   37   200   44%  2783   38% 
  29 Gambit Fruit       2740   21   21   680   42%  2800   32% 
  30 Bobcat 3.2.5       2721   32   32   280   42%  2779   33% 
  31 TL20090105         2718   32   33   280   41%  2779   32% 
  32 Marvin 3.0.0       2707   19   19   860   36%  2805   32% 
  33 ECE 20.1           2697   33   33   280   39%  2781   27% 
  34 Renegade 0.11      2674   43   44   150   35%  2777   35%

ZamChess · Post by **ZamChess** » Thu Feb 01, 2024 8:49 am

Hi Sedat, a suggestion:
Play all games with the same engine, for example Komodo Dragon 3.3 with 1 CPU, select "Human", and enter the players' Elos as well.

All the best.

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Feb 01, 2024 3:35 pm

Hello Zam,

Thanks...

Note that for the planned new competition,
the main target is to use various classical engines!
In this way,
I believe that we will see more variety of play.

Btw, in one of my older tours (limited playing strength),
many of the tested engines were far from their real Elo points:
https://sites.google.com/site/computers ... g-strength

Anyhow, just out of curiosity,
I will check and test Komodo Dragon 3.3 too.
I wonder how Komodo Dragon will perform at 2880 Elo
and also at 2740 Elo.

All this mystery will be solved as soon as possible.

Greetings )

Sedat Canbaz · Post by **Sedat Canbaz** » Thu Feb 01, 2024 5:38 pm

UPDATE 2

Rank Name                Elo    +    - games score oppo. draws 
   1 KD 2880 Elo        2890   27   26   500   75%  2672   14% 
   2 Stockfish 1.4 w32  2878   26   26   500   75%  2674   19% 
   3 Gambit Fruit       2735   24   24   500   54%  2703   25% 
   4 KD 2740 Elo        2715   24   24   500   51%  2707   18% 
   5 Shredder 5         2547   26   27   500   26%  2741   16% 
   6 KD 2600 Elo        2485   28   29   500   20%  2753   11% 

 • Stockfish 1.4 is fixed to 2878 Elo / KD = KomodoDragon 3.3

ALL GAMES (I included the recent candidate games as well):
https://mega.nz/file/flgW1YqY#mVpWgoCYs ... xquaXi2fWg

Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 2m+1s, 128 MB Hash, 4-MEN

More Details:
It's clear that Komodo Dragon is optimized for limited playing strength.
In particular, Komodo at 2880 Elo is close in strength to Stockfish, nice!
But what about Komodo at 2600 Elo? As we see, it is far from reality.
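
The comparison above can be made explicit by checking each configured Elo target against the measured rating and its error bars from the UPDATE 2 table. This is only a sketch: it assumes the rating scale anchored on Stockfish 1.4 at 2878 is comparable to the scale of Komodo Dragon's Elo setting, and the helper name `deviation_report` is made up for this example.

```python
# name -> (configured target, measured Elo, plus margin, minus margin),
# copied from the UPDATE 2 table
results = {
    "KD 2880 Elo": (2880, 2890, 27, 26),
    "KD 2740 Elo": (2740, 2715, 24, 24),
    "KD 2600 Elo": (2600, 2485, 28, 29),
}

def deviation_report(results):
    """For each setting, return (measured - target, target inside error bars?)."""
    report = {}
    for name, (target, measured, plus, minus) in results.items():
        within = measured - minus <= target <= measured + plus
        report[name] = (measured - target, within)
    return report

for name, (delta, within) in deviation_report(results).items():
    print(f"{name}: {delta:+d} Elo vs target, inside error bars: {within}")
```

On these numbers only the 2880 setting lands inside its error bars; the 2740 setting is just outside, and the 2600 setting is more than 100 Elo too weak.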

And once more we noticed that
the best, ideal way is to use various chess engines.
Sure, before starting, I suggest serious strength testing!
Otherwise, our tours may lead to wrong, unfair results!

The same goes for book competitions as well.
I mean, run beta engine strength tests (before serious tours),
because some engines are too DRAWISH, and some are buggy,
e.g. with book settings, or some produce too many DOUBLES.

In other words,
the BIG effort that we spend should not go to waste!

If nothing else, in most cases I do this job in that way,
and I feel much better!)

Greetings )
