Some Notes About Engine Settings Testing
Posted: Wed Jan 31, 2024 2:12 pm
Hello there )
First of all, about the current New Engine Settings Testing:
The goal was to check the performance of the engine settings,
I mean the ones which are supposed to be close to GM playing level.
For example,
In the 1st test: CM11 Archangel vs CM11 Fischer
Really strange results... e.g. about +190 Elo difference...
Normally the difference in strength should not be that large!
1 CM11 Archangel +59/-9/=32 75.00% 75.0/100
2 CM11 Fischer +9/-59/=32 25.00% 25.0/100
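For anyone who wants to check where a figure like +190 comes from: below is a minimal sketch, assuming the standard logistic Elo model (the function name is my own, it is not taken from CuteChess or any other tool). It converts a match score fraction into the implied Elo difference.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1),
    assuming the usual logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# CM11 Archangel scored 75.0/100 against CM11 Fischer.
print(f"{elo_diff_from_score(75.0 / 100):+.1f} Elo")
```

With a 75% score this gives roughly 190 Elo, which matches the difference quoted above.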
Or as another example,
The test below is closer to reality...
But anyhow I still can not call it very close... e.g.
According to calculations, Bobby Fischer should
Be about 100 Elo points better (give or take), whereas in this
Test, as we see, the difference is only about 25 Elo..
1 CM11 Fischer +40/-33/=27 53.50% 53.5/100
2 CM11 Euwe +33/-40/=27 46.50% 46.5/100
And sure, some may say there are not enough games, etc.
Actually they are right...!!
Because 100 games can lead to wrong conclusions!
But I have some experience... not much, but I have some..
Btw, what about you? Could you please share your data..?
If possible, with games... comments alone are too boring,
I mean it is 'terrible' to see 'standings' without games!
Really... and many people pay attention to them... sad indeed..
Nowadays everyone is a hero... but under SCCT conditions it is not
Like that... these are really hard conditions for sure, right...?!)
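On the point that 100 games can mislead: here is a rough sketch of an error margin for a match result. It assumes independent games and a normal approximation for the mean score, which is only a simplification, and the function names are hypothetical helpers of mine.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by score fraction p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (low, point, high) Elo estimates for a W/L/D record.

    Assumption: games are independent and the mean score is roughly
    normal, so z = 1.96 gives an approximate 95% interval.
    """
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    margin = z * math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided matches.
    lo = max(p - margin, 1e-6)
    hi = min(p + margin, 1.0 - 1e-6)
    return elo_from_score(lo), elo_from_score(p), elo_from_score(hi)

# CM11 Fischer vs CM11 Euwe: +40/-33/=27
low, point, high = elo_interval(40, 33, 27)
print(f"{point:+.0f} Elo, approx. 95% interval {low:+.0f} .. {high:+.0f}")
```

For the Fischer vs Euwe match this gives an interval of roughly -34 to +84 Elo around the measured +24, so a single 100-game match can not pin the difference down very tightly.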
-----------------------------------------------------------------
Btw, I also wonder about this test ?) +346 Elo difference!
But in reality, at their best, the average peak Elo difference
Between G. Kasparov and J. Polgar was about 150 Elo
In other words,
It's clear that Prodeo 1.86's Polgar setting does not
Reflect the actual (real) strength of Judit Polgar!!
It is far away from it..... and even after 1000+ games (per player)
I highly doubt that the situation would change much...
1 Prodeo186 Kasparov +82/-6/=12 88.00% 88.0/100
2 Prodeo186 Polgar +6/-82/=12 12.00% 12.0/100
Note: For the above test, the Prodeo 1.86 engine was used...
As usual, a separate folder was installed for each player
As you may know, the settings are different, etc... that's why...
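To put the Kasparov vs Polgar result next to the real-life gap: a small sketch, again assuming the standard logistic Elo model, with a helper name of my own. It gives the score one would expect from a given rating difference.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score fraction for the stronger side, given its Elo
    advantage, under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# About 150 Elo is the real-life peak gap quoted above.
expected = expected_score(150)
observed = 88.0 / 100  # Prodeo186 Kasparov's score in the test
print(f"expected {expected:.1%}, observed {observed:.1%}")
```

A gap of about 150 Elo corresponds to roughly a 70% score, well below the 88% seen in the test.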
--------------------------------------------------------------
Another test was played with the Rodent III engine, this time using various
Rodent III engine settings too.. plus Fruit 2.1 was used for comparison
So it is clear that, as in all the other tests, this is far away from reality!
Forget Fruit 2.1, which should be close in strength... at the least
I would say that Rodent's Anand should do better than Rodent's Spassky
E.g. according to chessmetrics.com the difference is about 60 Elo:
http://www.chessmetrics.com/cm/CM2/Peak ... 0024310100
But unfortunately in my test both settings are very close in strength...
RIII Spassky even seems to be stronger than RIII Anand... sad...
1 RIII Default 61.5/90
2 RIII Spassky 52.0/90
3 RIII Anand 50.5/90
4 Fruit 2.1 16.0/90
Meanwhile,
I have a note for the BIG brains too, e.g. these guys often say
Things like "meaningless test" or "old, bad, poor engines" etc.
Ok, here is a test played with a much newer engine: Shashchess 34.5
What more can I say,
Very likely... SH Capablanca came from another planet )
Definitely not from our planet, where we live now...))
Because the difference (in Elo points) from reality is too big!
NON-NNUE test + Capablanca setting is enabled... plus
The Elo setting is configured to play at 2700 Elo:
1 Shashchess Capablanca +97/-0/=3 98.50% 98.5/100
2 Prodeo186 Kasparov +0/-97/=3 1.50% 1.5/100
Another series of tests, for example:
Shashchess's NN + Capablanca setting is enabled,
But configured/reduced to play at the 2600 Elo level...
1 Shashchess Capablanca NN +96/-0/=4 98.00% 98.0/100
2 Prodeo186 Kasparov +0/-96/=4 2.00% 2.0/100
Here is the last test, with the Fruit 2.1 engine used as an alternative opponent,
Where Shashchess is configured to play at the 2000 Elo level,
But as we see, here too.. unfortunately nothing is correct!
In case it is still not clear: Fruit 2.1 is close to GM level
1 Shashchess Capablanca +98/-0/=2 99.00% 99.0/100
2 Fruit 2.1 +0/-98/=2 1.00% 1.0/100
Conditions (for all tests):
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 2m+1s, 128 MB Hash, 4-MEN
-----------------------------------------------------------
Ok.. as final words,
It's clear that most of the tests are not close to GM strength levels!
And I also hope that the next engine releases will be better in quality!
If nothing else, at least all chess engines should have fewer bugs...!
GAMES:
https://mega.nz/file/T1hXVY7Y#GMTHgsfbJ ... QSMGE_MZaQ
Best Wishes,
Sedat