EN-Test 2022 - new testsuite

Posted: Wed Oct 19, 2022 4:06 pm
by Eduard Nemeth
I have created a new test suite for engines. Neither the ERET test nor the Stockfish 2021 test suite satisfied me.

The Stockfish 2021 test suite contains many nonsensical positions. What is the use of a position where the best move scores +10 and the second-best move scores +7 (tested with Stockfish)? It doesn't really matter whether the engine wins with +10 or only with +7. The ERET test contains positions that are irrelevant in practice, and it also contains positions with a secondary solution.

Examples:
8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1

This position is even solved by some engines (including mine), but I still think it's useless in practice.

1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1

This position is also pointless.

2b1r3/r2ppN2/8/1p1p1k2/pP1P4/2P3R1/PP3PP1/2K5 w - - 0 2

Or this.

Some positions have good secondary solutions:

4r1k1/1r1np3/1pqp1ppB/p7/2b1P1PQ/2P2P2/P3B2R/3R2K1 w - - 0 28

Here Bg5 is just as good as Bg7 (ERET).
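
The two objections above (both moves win anyway, or a second move is just as good) can be written down as a simple filter over the top two engine scores of a position. This is only a sketch: the thresholds `WINNING_CP` and `MIN_GAP_CP` and the function name are hypothetical and not taken from the post, and the scores in the usage lines are made-up illustration values.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    move: str      # move in SAN, e.g. "Bg7"
    score_cp: int  # evaluation in centipawns, from the side to move


# Hypothetical thresholds (assumptions, not values from the post).
WINNING_CP = 500  # at or above this, a move is treated as clearly winning
MIN_GAP_CP = 100  # the best move must beat the runner-up by at least this


def reject_reason(candidates: List[Candidate]) -> Optional[str]:
    """Return why a position is unsuitable, or None if it passes both checks."""
    ranked = sorted(candidates, key=lambda c: c.score_cp, reverse=True)
    if len(ranked) < 2:
        return None  # nothing to compare against
    best, second = ranked[0], ranked[1]
    if second.score_cp >= WINNING_CP:
        # The "+10 versus +7" case: the second move wins as well.
        return f"{second.move} also wins"
    if best.score_cp - second.score_cp < MIN_GAP_CP:
        # The "Bg5 is just as good as Bg7" case.
        return f"{second.move} is a secondary solution"
    return None


# Illustration values only.
print(reject_reason([Candidate("m1", 1000), Candidate("m2", 700)]))
print(reject_reason([Candidate("Bg7", 150), Candidate("Bg5", 140)]))
```

The first call reports that the second move also wins, the second call reports a secondary solution.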

I wanted a test in which every position can be solved and corresponds to normal practice. So I collected the best positions from various test suites and added some interesting positions of my own that I had seen in games on the server. The result is a test suite with 120 positions.

Every one of these positions was solved on my PC by at least one engine! The only question is: how much time do I give the engine? The test is intended to provide a rough estimate of practical playing strength. That's why I won't test an engine with special settings like "GoldDigger", not even in MV mode. I myself will test with 30s and 60s per position.

Download EN-Test 2022 (CBH and PGN format)

https://filehorst.de/d/eefonGnl

and on my home page.

I myself use the CBH format. If you prefer EPD, you have to convert the PGN to EPD.
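
For anyone doing the conversion by hand: per position it comes down to turning a FEN into an EPD record, which keeps the first four FEN fields and drops the two move counters. A minimal sketch, assuming the PGN stores each position in a `FEN` header tag (reading the tag from the file is left out here). The function name and its parameters are my own, not part of the download.

```python
def fen_to_epd(fen: str, best_move: str = "", position_id: str = "") -> str:
    """Convert a FEN string to an EPD record.

    EPD keeps piece placement, side to move, castling rights and the
    en passant square; the halfmove clock and move number are dropped.
    Optional 'bm' (best move) and 'id' opcodes are appended if given.
    """
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError(f"not a FEN string: {fen!r}")
    epd = " ".join(fields[:4])
    if best_move:
        epd += f" bm {best_move};"
    if position_id:
        epd += f' id "{position_id}";'
    return epd


print(fen_to_epd("8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1"))
# prints: 8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - -
```

The solution move has to come from the PGN as well; it would be passed as `best_move` in SAN.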

Eduard Nemeth

Re: EN-Test 2022 - new testsuite

Posted: Wed Oct 19, 2022 7:56 pm
by Eduard Nemeth
Results on AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3456men Syzygy, 30s per position:

Solista Attack v2 (default), Result: 107 out of 120 = 89.1%. Solista Attack v2.txt (ZIP)
Blue Marlin 15.3a, Result: 105 out of 120 = 87.5%. BlueMarlin 15.3a.txt (ZIP)
Dark Sister 1.9a, Result: 99 out of 120 = 82.5%. DarkSister 1.9a.txt (ZIP)
Shashchess 25 (default), Result: 98 out of 120 = 81.6%. Shashchess 25 .txt (ZIP)
Stockfish 161022, Result: 97 out of 120 = 80.8%. Stockfish.txt (ZIP)
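
A note on the percentages: they appear to be cut off after one decimal place rather than rounded (107 out of 120 is 89.17%, listed as 89.1%). A short sketch that reproduces the listed values with integer arithmetic; the function name is only for illustration.

```python
def solved_percent(solved: int, total: int = 120) -> str:
    """Percentage of solved positions, truncated (not rounded) to one decimal."""
    tenths = solved * 1000 // total  # percentage in tenths of a percent
    return f"{tenths // 10}.{tenths % 10}%"


for solved in (107, 105, 99, 98, 97):
    print(solved, "out of 120 =", solved_percent(solved))
# prints 89.1%, 87.5%, 82.5%, 81.6%, 80.8% for the five results
```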

The TXT files can be downloaded from my homepage:
https://solistachess.jimdosite.com/testing/

Re: EN-Test 2022 - new testsuite

Posted: Wed Oct 19, 2022 8:09 pm
by Eduard Nemeth
Alternative download of the EN-Test 2022 test suite (available for 60 days):
https://pixeldrain.com/u/cEPxDG84

Re: EN-Test 2022 - new testsuite

Posted: Thu Oct 20, 2022 7:29 pm
by Eduard Nemeth
Results so far on AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:

Solista Classic, Result: 109 out of 120 = 90.8%. Solista-Classic.txt (ZIP)
Solista Attack v2 (default), Result: 107 out of 120 = 89.1%. Solista Attack v2.txt (ZIP)
Blue Marlin 15.3a, Result: 105 out of 120 = 87.5%. BlueMarlin 15.3a.txt (ZIP)
Swordfish 15.3a, Result: 104 out of 120 = 86.6%. Swordfish 15.3a.txt (ZIP)
Corchess 3 171022, Result: 100 out of 120 = 83.3%. Corchess 3 171022.txt (ZIP)
Dark Sister 1.9a, Result: 99 out of 120 = 82.5%. DarkSister 1.9a.txt (ZIP)
Shashchess 25 (default), Result: 98 out of 120 = 81.6%. Shashchess 25 .txt (ZIP)
Stockfish 161022, Result: 97 out of 120 = 80.8%. Stockfish.txt (ZIP)
Eman 8.40, Result: 96 out of 120 = 80.0%. Eman 8.40.txt (ZIP)
Stockfish_FF2 150521, Result: 88 out of 120 = 73.3%. Stockfish_FF2.txt (ZIP)

Text files can be downloaded from my homepage:
https://solistachess.jimdosite.com/testing/

Re: EN-Test 2022 - new testsuite

Posted: Fri Oct 28, 2022 4:36 pm
by Eduard Nemeth
EN-Test 2022 under Fritz GUI, step by step:

I use Fritz 18 for my test. The test runs automatically.

Step 1: Load Engine

Step 2: Select "Process Test Set"

Step 3: Select the database (CBH format)

Step 4: Select the time and click OK

Step 5: The test will now start automatically

The GUI stops the process after the last position, and you can copy the result as text.
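
Outside a GUI, the same kind of run can be pictured as a loop over the positions with a fixed time per position. The sketch below is not how Fritz does it internally; it only shows the idea. `find_best_move` is a hypothetical callback standing in for a real engine, and the stub engines in the usage lines just return a fixed move.

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple


class Position(NamedTuple):
    fen: str
    solutions: FrozenSet[str]  # accepted best moves in SAN


def run_suite(
    positions: Iterable[Position],
    find_best_move: Callable[[str, float], str],  # hypothetical engine hook
    seconds: float = 30.0,
) -> List[int]:
    """Return the 1-based numbers of the positions the engine solved."""
    solved = []
    for number, pos in enumerate(positions, start=1):
        move = find_best_move(pos.fen, seconds)
        if move in pos.solutions:
            solved.append(number)
    return solved


# The ERET position from the first post, with Bg7 as the listed solution.
suite = [
    Position(
        "4r1k1/1r1np3/1pqp1ppB/p7/2b1P1PQ/2P2P2/P3B2R/3R2K1 w - - 0 28",
        frozenset({"Bg7"}),
    )
]

# Stub engines for illustration: each always answers with one fixed move.
print(run_suite(suite, lambda fen, seconds: "Bg7"))  # prints [1]
print(run_suite(suite, lambda fen, seconds: "Bg5"))  # prints []
```

The second call also shows why secondary solutions hurt a suite: an engine that plays the equally good Bg5 is counted as having failed.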

Re: EN-Test 2022 - new testsuite

Posted: Wed Nov 02, 2022 12:34 am
by Eduard Nemeth
Leptir 1 (my newest engine) is now #1. I don't think it gets any better. More than 112 solutions is not feasible on my machine at only 30s.

Results on AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3456men Syzygy, 30s per position:

Leptir 1, Result: 112 out of 120 = 93.3%. Leptir 1.txt (ZIP)
Solista Attack v3 (default), Result: 108 out of 120 = 90.0%. Solista Attack v3.txt (ZIP)
Blue Marlin 15.4, Result: 106 out of 120 = 88.3%. BlueMarlin 15.4.txt (ZIP)
Shashchess 25.2 GoldDigger, Result: 105 out of 120 = 87.5%. GoldDigger.txt (ZIP)
Swordfish 15.3a, Result: 104 out of 120 = 86.6%. Swordfish 15.3a.txt (ZIP)
Corchess 3 241022, Result: 102 out of 120 = 85.0%. Corchess 3 241022.txt (ZIP)
Stockfish dev 271022, Result: 99 out of 120 = 82.5%. Stockfish dev 271022.txt (ZIP)
ProteusSF-Piranha 220904, Result: 99 out of 120 = 82.5%. ProteusSF-Piranha.txt (ZIP)
Dark Sister 1.9a, Result: 99 out of 120 = 82.5%. DarkSister 1.9a.txt (ZIP)
Shashchess 25 (default), Result: 98 out of 120 = 81.6%. Shashchess 25 .txt (ZIP)
Crystal 040722, Result: 98 out of 120 = 81.6%. Crystal 040722.txt (ZIP)
BrainLearn 19, Result: 97 out of 120 = 80.8%. BrainLearn 19.txt (ZIP)
Eman 8.40, Result: 96 out of 120 = 80.0%. Eman 8.40.txt (ZIP)
Kookaburra 1.01, Result: 95 out of 120 = 79.1%. Kookaburra 1.01.txt (ZIP)
Fat Titz 2, Result: 92 out of 120 = 76.6%. Fat Titz 2.txt (ZIP)
Cfish 250621, Result: 90 out of 120 = 75.0%. Cfish 250621.txt (ZIP)
Stockfish_FF2 150521, Result: 88 out of 120 = 73.3%. Stockfish_FF2.txt (ZIP)
RubiChess 2022 (1013), Result: 78 out of 120 = 65.0%. RubiChess.txt (ZIP)
Berserk 20220725, Result: 77 out of 120 = 64.1%. Berserk 20220725.txt (ZIP)
Koivisto 8.16, Result: 75 out of 120 = 62.5%. Koivisto 8.16.txt (ZIP)
Wasp 6.00, Result: 72 out of 120 = 60.0%. Wasp 6.00.txt (ZIP)
Powerfritz 18, Result: 69 out of 120 = 57.5%. Powerfritz 18.txt (ZIP)
Seer 2.6.0, Result: 69 out of 120 = 57.5%. Seer 2.6.0.txt (ZIP)
Fire NN 1072022, Result: 68 out of 120 = 56.6%. Fire NN 1072022.txt (ZIP)
rofChade 3.0, Result: 67 out of 120 = 55.8%. rofChade 3.0.txt (ZIP)
Rebel 15x2, Result: 66 out of 120 = 55.0%. Rebel 15x2.txt (ZIP)
Igel 3.1.0, Result: 56 out of 120 = 46.6%. Igel 3.1.0.txt (ZIP)

The text files can be downloaded from my homepage:
https://solistachess.jimdosite.com/testing/

Re: EN-Test 2022 - new testsuite

Posted: Thu Dec 01, 2022 11:27 am
by Eduard Nemeth
EN-Test 2022 - Starts on 06 Nov 2022:
I have now started testing with a thinking time of 60 seconds per position. Only the top ten engines are included in this list, and only one engine per author. Hardware: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3456men Syzygy.

New in the list is Brainlearn 20.1 vulkan (download on my homepage under Solista News, or here in the forum).

Top 10:

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Crystal 5 KWK, Result: 113 out of 120 = 94.1%. Crystal 5 KWK.txt (ZIP)
3-4) Brainlearn 20.1 vulkan (AM/EN), Result: 110 out of 120 = 91.6%. Brainlearn 20.1 vulkan.txt (ZIP)
3-4) Corchess 3 201122, Result: 110 out of 120 = 91.6%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.1%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDigger, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
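
The shared places in the list ("3-4)", "6-8)", "9-10)") follow from the solved counts alone: engines with the same count share the range of places they occupy together. A small sketch of that numbering, with a function name chosen only for illustration:

```python
from typing import List, Tuple


def rank_labels(results: List[Tuple[str, int]]) -> List[str]:
    """Label engines by place; engines with equal counts share a place range."""
    # sorted() is stable, so tied engines keep their input order.
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    labels = []
    i = 0
    while i < len(ordered):
        # Extend j to the last engine with the same solved count as engine i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        place = f"{i + 1}" if i == j else f"{i + 1}-{j + 1}"
        for name, solved in ordered[i : j + 1]:
            labels.append(f"{place}) {name}: {solved}")
        i = j + 1
    return labels


top = [
    ("Leptir 4", 115),
    ("Crystal 5 KWK", 113),
    ("Brainlearn 20.1 vulkan", 110),
    ("Corchess 3 201122", 110),
    ("Blue Marlin 15.4", 107),
]
for line in rank_labels(top):
    print(line)
```

For these five entries the labels come out as 1), 2), 3-4), 3-4) and 5), matching the list above.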

The text files can be downloaded from my homepage:
https://solistachess.jimdosite.com/testing/

Re: EN-Test 2022 - new testsuite

Posted: Thu Dec 01, 2022 4:07 pm
by dorsz
Thx.