robbolito wrote:
Dr. Hyatt,
hyatt wrote:
JackStraw wrote:
4'2" / i7-940 @ 3.75 GHz / 3 cores each / ponder off / Vista x64 / F12 / 512 MB / Nunn2 openings
Rybka 4 default settings + 3-4-5 egtb
IvanHoe: default settings except 64 MB pawn hash, plus Robbobases
Testing: IvanHoe 52gU with High Priority Search on, different styles
IvanHoe (custom style) 50.5 - 49.5 Rybka 4 (IvanHoe: +19, -18, =63)
IvanHoe (ivanhoe style) 49 - 51 Rybka 4 (IvanHoe: +20, -22, =58)
Tom Glenn
I am seriously hoping nobody is drawing conclusions from these kinds of results??? The error bar is huge. There's not enough information to prove which is better, much less by how much.
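To put a number on "the error bar is huge", here is a minimal Python sketch (not taken from any poster) that turns a win/loss/draw record into an Elo estimate with an approximate 95% interval. It assumes the standard logistic Elo model and a normal approximation on the per-game score (1, 0.5, 0); the function names are hypothetical.

```python
import math

def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_with_error_bar(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high) for a match result, using a normal
    approximation on the mean per-game score. Assumes the record has at
    least one non-loss and one non-win, so the score is strictly in (0, 1)."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    # Clamp so the bounds stay inside (0, 1), where the Elo formula is defined.
    lo_p = max(p - z * se, 1e-9)
    hi_p = min(p + z * se, 1.0 - 1e-9)
    return elo_from_score(p), elo_from_score(lo_p), elo_from_score(hi_p)

if __name__ == "__main__":
    for label, w, l, d in [("custom style", 19, 18, 63),
                           ("ivanhoe style", 20, 22, 58)]:
        elo, lo, hi = elo_with_error_bar(w, l, d)
        print(f"IvanHoe ({label}): {elo:+.1f} Elo, 95% CI [{lo:+.1f}, {hi:+.1f}]")
```

For the +19, -18, =63 result this gives about +3.5 Elo, with an interval of roughly -38 to +45 Elo, so either engine could be the stronger one.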
These are just the scores from the games played. No conclusions are drawn, because of the small number of games. It also takes time to collect enough games for a meaningful conclusion: at 10 minutes per engine, just 30 games take more than 9 hours to finish. But if more people participated in the testing, a decent number of games could be collected for a reasonable conclusion.
These games are also for the fun of testing new versions that are released in an attempt to make the engines better. Some attempts are successful and some are not, but the IH engines are a group project that is still in full swing.
It is difficult to use multiple testers. Which book (if any)? Which time control? Which hash size? Which processor and speed? Changing any of those causes accuracy issues.
A 100-game match between two programs that are very close in rating is effectively useless. When you look more carefully and see a bunch of draws, and one or two extra wins by one program, that is simply random luck. If two programs are within 3-4 Elo of each other, you need tens of thousands of games, under identical test conditions, to reach a conclusion. Probably closer to 100,000 games than to 50,000, in fact...
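A back-of-the-envelope way to see why a 3-4 Elo gap needs so many games is sketched below in Python. It assumes two roughly equal engines (expected score near 0.5), the logistic Elo model, and a normal approximation; `games_needed` is a hypothetical helper that asks when the z-sigma error bar on the measured Elo difference shrinks to the size of the gap itself.

```python
import math

def games_needed(elo_diff: float, draw_rate: float, z: float = 1.96) -> int:
    """Approximate number of games for the z-sigma error bar on a measured
    Elo difference to shrink to `elo_diff`, for two nearly equal engines
    with the given draw rate."""
    # Per-game score standard deviation at p = 0.5: wins and losses each
    # occur with probability (1 - draw_rate) / 2 and deviate by 0.5.
    sigma = 0.5 * math.sqrt(1.0 - draw_rate)
    # Slope of the logistic Elo curve at p = 0.5: 400 / (ln 10 * p * (1 - p)).
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    # Require z * elo_per_score * sigma / sqrt(n) <= elo_diff and solve for n.
    n = (z * elo_per_score * sigma / elo_diff) ** 2
    return math.ceil(n)

if __name__ == "__main__":
    for diff in (3, 4):
        for draws in (0.0, 0.3, 0.6):
            print(f"{diff} Elo, draw rate {draws:.0%}: {games_needed(diff, draws)} games")
```

Under this approximation, a 3 Elo gap with no draws already needs a little over 50,000 games just to bring the 95% error bar down to 3 Elo. A higher draw rate lowers the count, while demanding an error bar clearly smaller than the gap raises it quickly.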