Here are my latest complaints and results:
Rybka 2.3.2a also has "movetime" problems (of course, it is oodles better than Rybka 1.0 Beta,
which typically uses 5 or so times the requested amount of time). An example:
Code:
Rybka 2.3.2a
go movetime 1000
[...]
info time 814 nodes 186010 nps 233997
bestmove b1c3 ponder g8f6
For comparison, the UCI protocol description says:
Code:
* movetime <x>
search exactly x mseconds
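The gap between the requested and the reported time can be checked mechanically. Below is a minimal Python sketch, assuming the last "info" line of a search carries a "time" field in milliseconds, as in the Rybka output quoted earlier. The function names `parse_info_line` and `movetime_deviation` are made up for illustration and are not part of any engine or GUI.

```python
import re

# Matches the numeric "time", "nodes" and "nps" fields of a UCI "info" line.
_FIELD = re.compile(r"\b(time|nodes|nps)\s+(\d+)")


def parse_info_line(line):
    """Return the time/nodes/nps fields of a UCI 'info' line as integers."""
    return {key: int(value) for key, value in _FIELD.findall(line)}


def movetime_deviation(requested_ms, info_line):
    """Reported search time minus requested movetime, in milliseconds.

    Negative means the engine stopped early, positive means it overran.
    """
    fields = parse_info_line(info_line)
    if "time" not in fields:
        raise ValueError("info line has no 'time' field")
    return fields["time"] - requested_ms


if __name__ == "__main__":
    line = "info time 814 nodes 186010 nps 233997"
    print(parse_info_line(line))
    print(movetime_deviation(1000, line))
```

Note that this only measures the time the engine itself reports; wall-clock time measured by the tester may differ.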
One of the advantages of fixed depth and no SMP is (exact) reproducibility; another is that you can test engines in parallel with no worries. However, one problem with "go depth X" is that my (current) positional suite includes some positions where one side is up by a lot, and so the +5.07 hash bug/design of some Rybkas makes it painfully slow to reach even depth 10 in many of them. So that's another thing to worry about when constructing a test suite. [Given the number of problems that already exist with testing specifications for a small number of engines, I hesitate to consider everything that could go wrong when a more numerous comparison is made.]
I might also add that back when these "clone detectors" were first discussed many months ago, I had actually isolated the eval() function in Rybka 3, etc., and Alan Sassler had done some correlation analysis on the numbers generated (I think I had 1 million positions, as it takes so little time, but I forget, and the Rybka forum has it all hidden by now). Again, this would be a superior method for determining the correlation of evaluation output (whether at the level of "framework" or "numerology" is a different question), though it requires some work to achieve a functional set-up for engines that do not provide source code.
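As an illustration of what correlating evaluation output could look like, here is a minimal Python sketch. It assumes each engine's isolated eval() has already been run over the same list of positions, giving two equal-length lists of scores. The choice of Pearson correlation is an assumption; the post does not say which statistic was used in the original analysis, and the function name `pearson` is purely illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical centipawn scores from two engines on the same positions.
    engine_a = [12, -35, 80, 5, -120]
    engine_b = [10, -40, 95, 0, -110]
    print(pearson(engine_a, engine_b))
```

A high coefficient by itself does not distinguish a shared "framework" from shared "numerology"; it only says the two sets of numbers move together.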