IvanHoe T52 testing

As in chess tournaments and matches...
hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: IvanHoe T52 testing

Post by hyatt » Tue Aug 31, 2010 10:40 pm

robbolito wrote:
hyatt wrote:
JackStraw wrote:4'2" / i7-940 @ 3.75 / 3 cores ea / ponder off / Vista x64 / F12 / 512 mg / Nunn2 openings /

Rybka 4 default settings + 3-4-5 egtb

Ivanhoe - default except 64 mg pawn hash + Robbobases

Testing : IvanHoe 52gU + High Priority Search on , diff styles

IvanHoe(custom style).............50.5.............+19,-18,=63
Rybka4...............................49.5

IvanHoe(ivanhoe style)............49................+20,-22,=58
Rybka4...............................51

Tom Glenn

I am seriously hoping nobody is drawing conclusions from these kinds of results??? The error bar is huge. There's not enough information to prove which is better, much less by how much.
Dr. Hyatt,
these are just scores from the games played. No conclusions are drawn, because the number of games is small. It also takes time to collect enough games for a meaningful conclusion: at 10 min. per engine, just 30 games take more than 9 hours to finish. But if more people took part in the testing, a decent number of games could be collected for a reasonable conclusion.
These games are also played for the fun of testing new versions that are released in an attempt to make the engines better. Some are successful and some are not, but the IH engines are a group project that is still in full swing.

It is difficult to use multiple testers. Which book (if any). Which time control? Which hash size? Which processor and speed? Changing any of those causes accuracy issues.

100 games between two programs that are very close in rating is effectively useless. When you look more carefully and see a bunch of draws, and one or two extra wins by one program, that is simply random luck. If two programs are within 3-4 elo of each other, you need tens of thousands of games, under identical test conditions, to reach a conclusion. Probably closer to 100,000 games, than to 50,000, in fact...
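
To put a number on that error bar, here is a minimal Python sketch. It assumes the usual logistic Elo model and a normal approximation to the match score; the function names are made up for illustration and do not come from any rating tool. It takes a win/loss/draw record and returns the Elo estimate with an approximate 95% interval.

```python
import math

def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, losses, draws, z=1.96):
    """Return (elo, low, high) for a match record.

    Assumption: games are independent and the mean score is roughly normal,
    so this is only a rough interval, not an exact one.
    """
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    margin = z * math.sqrt(var / n)
    eps = 1e-9  # keep scores strictly inside (0, 1) before converting
    clamp = lambda s: min(1.0 - eps, max(eps, s))
    return (score_to_elo(clamp(mean)),
            score_to_elo(clamp(mean - margin)),
            score_to_elo(clamp(mean + margin)))

if __name__ == "__main__":
    # The 100-game "custom style" result quoted above: +19, -18, =63.
    elo, low, high = elo_interval(19, 18, 63)
    print(f"Elo {elo:+.1f}, approx. 95% interval {low:+.1f} to {high:+.1f}")
```

For the +19, -18, =63 result the estimate is a few Elo, with an interval of roughly 40 Elo on either side, so the sign of the difference is not determined.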

Vael Jean-Paul
Posts: 78
Joined: Thu Jun 10, 2010 7:59 am

Re: IvanHoe T52 testing

Post by Vael Jean-Paul » Wed Sep 01, 2010 12:51 pm

So to get an idea of how strong one engine is, I have this data: we need 100,000 games, and 30 games take 9 hours!!

If I calculate correctly, that gives (100,000 / 30) * 9 = 30,000 hours, and 30,000 / 24 = 1,250 days of running your computer non-stop :shock:
And we have a few hundred engines here. So can anybody here tell which engine is the strongest? Because no one in the world has
tested each of these engines over 100,000 games to be able to say that this engine is the best one :D

Come on, Prof. Hyatt, with all respect, you are dreaming. Is everything people have done since engine testing began then useless?? Because in your view we can never come to a conclusion: nobody can ever have enough games!

I agree we can never have enough games. But I have tested for more than 20 years, starting a new list every time the older engines become too weak or I buy a new, much faster computer system. Now, on my i7, I am doing it for the third time, with 5,000 games each, and I think that gives you a picture of which engine plays better!

You don't need 100,000 games. You need as many engines as possible playing against each other, with the same total number of games each, and you quickly get an idea of how strong an engine is!

Kind regards,
JP.

robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

Re: IvanHoe T52 testing

Post by robbolito » Wed Sep 01, 2010 6:40 pm

Intel(R) Core(TM)2 Quad Q9550 2.83GHz4x @3.5 GHz 4,096 MB Memory
Microsoft Windows XP 64 Bit Professional Service Pack 2 (Build 3790)
Fritz Benchmark:
Speed: 20.90
KNS: 10032
GUI: CB Rybka 3
Book: Perfect 2009-10 moves
Hash: 256
RB and TB: ON
Ponder: OFF

Ivanhoe lost one game to no move bug against Houdini.
Very close score that shows similar strength of the programs.
[Attachment: 2010-09-01_123510.png]

robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

Re: IvanHoe T52 testing

Post by robbolito » Thu Sep 02, 2010 9:11 pm

Intel(R) Core(TM)2 Quad Q9550 2.83GHz4x @3.5 GHz 4,096 MB Memory
Microsoft Windows XP 64 Bit Professional Service Pack 2 (Build 3790)
Fritz Benchmark:
Speed: 20.90
KNS: 10032
GUI: CB Rybka 3
Book: Power 2009-10 moves
Hash: 256
RB and TB: ON
Ponder: OFF
[Attachment: 2010-09-02_150922.png]

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: IvanHoe T52 testing

Post by hyatt » Fri Sep 03, 2010 3:38 am

Vael Jean-Paul wrote: So to get an idea of how strong one engine is, I have this data: we need 100,000 games, and 30 games take 9 hours!!

If I calculate correctly, that gives (100,000 / 30) * 9 = 30,000 hours, and 30,000 / 24 = 1,250 days of running your computer non-stop :shock:
And we have a few hundred engines here. So can anybody here tell which engine is the strongest? Because no one in the world has
tested each of these engines over 100,000 games to be able to say that this engine is the best one :D

Come on, Prof. Hyatt, with all respect, you are dreaming. Is everything people have done since engine testing began then useless?? Because in your view we can never come to a conclusion: nobody can ever have enough games!

I agree we can never have enough games. But I have tested for more than 20 years, starting a new list every time the older engines become too weak or I buy a new, much faster computer system. Now, on my i7, I am doing it for the third time, with 5,000 games each, and I think that gives you a picture of which engine plays better!

You don't need 100,000 games. You need as many engines as possible playing against each other, with the same total number of games each, and you quickly get an idea of how strong an engine is!

Kind regards,
JP.

The math is simple, as I said. If you have two engines that are within 2-3 elo of each other, you are going to need way over 100,000 games for each to get a rating down to the +/- 1-2 elo range. Can you draw conclusions with 1,000 games? Of course. You can also draw conclusions by flipping a coin. Neither is more accurate than the other if the engines are that close. And the engines being discussed here are _very_ close, in general.

So believe what you want. And continue to see reports where A is better than B, B is better than A, and A/B are equal. Because it takes a _ton_ of games. There is no sense in whining about how long it takes. It takes what it takes if you want accuracy. Otherwise, a coin toss will do just as well.
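
A small simulation makes the same point. This is only a sketch under stated assumptions: a logistic Elo model, independent games, a fixed 60% draw rate, and a true gap of 3 Elo. None of these numbers come from the tests in this thread, and the function names are hypothetical.

```python
import random

def expected_score(elo_diff):
    """Expected score of the stronger engine under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def play_match(rng, games, elo_diff, draw_rate):
    """Simulate one match and return the stronger engine's points."""
    p_win = expected_score(elo_diff) - draw_rate / 2.0
    assert p_win >= 0.0, "draw rate too high for this Elo difference"
    points = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            points += 1.0          # win
        elif r < p_win + draw_rate:
            points += 0.5          # draw
    return points

def fraction_misleading(games, elo_diff=3.0, draw_rate=0.6, trials=500, seed=1):
    """Fraction of simulated matches in which the stronger engine
    does NOT finish ahead (it ties or loses the match)."""
    rng = random.Random(seed)
    bad = sum(1 for _ in range(trials)
              if play_match(rng, games, elo_diff, draw_rate) <= games / 2.0)
    return bad / trials

if __name__ == "__main__":
    for n in (100, 1000):
        print(n, "games:", fraction_misleading(n))
```

Under these assumptions the stronger engine fails to finish ahead in close to half of the 100-game matches, and still in a substantial share of the 1,000-game matches.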

Vael Jean-Paul
Posts: 78
Joined: Thu Jun 10, 2010 7:59 am

Re: IvanHoe T52 testing

Post by Vael Jean-Paul » Fri Sep 03, 2010 10:15 am

Yes, everybody can understand that, and I don't see many people drawing a conclusion after engine A has played against B!
You can choose two engines and play a million games, and you still don't know whether one engine is stronger. By luck you might choose an engine A that likes B's way of play: A can win easily against B, but I can't say that A is stronger, because it can lose to C, and so on.
So I let A play against B, C, D ... Z, and every engine plays against all of them! For me this is the fastest way to know which engine is stronger with the time and system I have for running games! Everybody wants more games and more time!

JP.

robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

Re: IvanHoe T52 testing

Post by robbolito » Fri Sep 03, 2010 1:16 pm

Intel(R) Core(TM) i7 Q- 960 3.2 GHz8x @ 4.005 GHz 4,096 MB Memory
Windows 7 Professional (Build 7600)
Fritz Benchmarks:
Speed: 24.89
KNS: 11946
GUI: CB Rybka 3
Hash: 256
Book: Power Book 2010-10 moves
RB and TB: ON
Ponder: OFF


Blitz:10' 0


1   Houdini 1.03a x64 4_CPU  3190  +5/-2/=33 53.75%   21.5/40
2   IvanHoeT0.3A.x64         3190  +2/-5/=33 46.25%   18.5/40


robbolito
Posts: 601
Joined: Thu Jun 10, 2010 3:48 am

Re: IvanHoe T52 testing

Post by robbolito » Fri Sep 03, 2010 1:28 pm

AMD Phenom(tm) II X6 1090T Processor6x @ 3.2 GHz 4,096 MB Memory
Windows 7 Home Premium Edition (Build 7600)
Fritz Benchmark:
Speed: 23.33
KNS: 11196
GUI: CB Rybka 3
Book: Power Book 2009- 10 moves
Hash: 256
RB and TB: ON
Ponder: OFF

Blitz:10' 0


1   Houdini 1.03a x64 8_CPU  3190  +5/-5/=40 50.00%   25.0/50  625.00
2   IvanHoeT0.3A.x64         3190  +5/-5/=40 50.00%   25.0/50  625.00

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: IvanHoe T52 testing

Post by hyatt » Fri Sep 03, 2010 4:56 pm

Vael Jean-Paul wrote: Yes, everybody can understand that, and I don't see many people drawing a conclusion after engine A has played against B!
You can choose two engines and play a million games, and you still don't know whether one engine is stronger. By luck you might choose an engine A that likes B's way of play: A can win easily against B, but I can't say that A is stronger, because it can lose to C, and so on.
So I let A play against B, C, D ... Z, and every engine plays against all of them! For me this is the fastest way to know which engine is stronger with the time and system I have for running games! Everybody wants more games and more time!

JP.

Sorry, but I do not understand your comment. It doesn't matter whether you play 100K games of A vs B, or you play 100K games of A vs everybody else and then 100K games of B vs everybody else. To get a rating for A or B that is within 1-2 Elo of the actual value, you need 100K games, period.

There is no "fastest" way to test. And if the engines are 1 elo apart in _real_ strength, you do not need millions of games. But you may well need 1/4 million games to get the error bar down to an acceptable level.

There is no way to "cheat" the time required. All games against one opponent. One game against 100,000 opponents. You still need the 100K games if the programs are within a 5-8 Elo window.
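
The size of that requirement can be estimated with a back-of-the-envelope formula. The sketch below assumes two nearly equal engines, independent games, a normal approximation, and a 95% interval (z = 1.96). The draw rate is a parameter because draws reduce the variance per game. The function name is hypothetical.

```python
import math

def games_needed(margin_elo, draw_rate=0.0, z=1.96):
    """Approximate number of games so that the interval on the measured
    Elo difference is +/- margin_elo, for two nearly equal engines."""
    # Std. dev. of one game's score when wins and losses are each (1 - d) / 2.
    sigma = math.sqrt((1.0 - draw_rate) / 4.0)
    # Slope of the Elo curve at a 50% score: 400 / (ln 10 * 0.5 * 0.5).
    elo_per_score = 400.0 / (math.log(10.0) * 0.25)
    return math.ceil((z * sigma * elo_per_score / margin_elo) ** 2)

if __name__ == "__main__":
    for margin in (1.0, 2.0, 5.0):
        for draw_rate in (0.0, 0.6):
            print(f"+/-{margin} Elo, draw rate {draw_rate}: "
                  f"{games_needed(margin, draw_rate)} games")
```

Under these assumptions, +/- 2 Elo with no draws needs on the order of 100,000 games, and +/- 1 Elo needs about four times as many, because halving the margin quadruples the game count. That is the same ballpark as the figures above. A high draw rate lowers the count but does not change the order of magnitude.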

Vael Jean-Paul
Posts: 78
Joined: Thu Jun 10, 2010 7:59 am

Re: IvanHoe T52 testing

Post by Vael Jean-Paul » Fri Sep 03, 2010 7:17 pm

Of course you don't understand my comment, I'm not a professor :lol:

But okay, nobody in the world can tell which engine is the strongest. It can be any engine that we know, and all the rating lists we see
or have are useless! Why do we waste our time testing, then :?:
Luckily we do it for fun... or maybe you know the strongest engine!

JP.
