A Talkchess thread: Misinformation being spread

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair

Re: A Talkchess thread: Misinformation being spread

Post by Adam Hair » Wed Jan 19, 2011 9:03 pm

orgfert wrote:
Adam Hair wrote:
orgfert wrote:
Adam Hair wrote:I was merely referring to the loss of statistical information with regard to the engines that are static.

Again, I have to say that one major goal of the CCRL is to test as many engines as possible. To accomplish this, some concessions are made. Ponder is turned off. Some information for each engine is sacrificed in order to acquire some information on a larger group of engines. It is as simple as that.
Under the current regimen, information is lost regarding the real differences between static and dynamic AI. And as already mentioned, turning pondering off doesn't fit the rationale of your methodology. Your reply to Bob indicated that the rating list was not a competition, which doesn't seem a valid argument, given that human rating lists are composed entirely of games from competitions.
The intent of the creators of the rating list has some role in this, don't you think? The insistence that any chess rating list must have the same intent as human rating lists is monomaniacal. Our intent has been to provide the relative rating for as many engines as possible, with some attempt at statistical significance. Does the attempt fall short? Yes. Are some aspects of some engines ignored? Yes. Does that invalidate the results from our testing? I argue that it does not, as long as the weaknesses in this approach are kept in mind.

This is the list that 3 of us tend to concentrate on: http://www.computerchess.org.uk/ccrl/40 ... e_cpu.html

How many of these engines can be found on other lists with this many games played? If it has no significance to you, so be it. It does have some significance to others. To me, its biggest failing is that it does not include more engines at this point.
But according to this method, it seems to me that even a billion "non-competitive" games between all the AI one might find to include could never produce anything more significant than a rating list of arbitrarily crippled AI. I think this could objectively be the list's biggest failing.
All I can say is that if the focus of testing is on a particular aspect of chess engines (effects of search and evaluation on playing strength), then certain other aspects should be minimized. You have made clear that you disagree with this focus. I don't think that the results from this focus are irrelevant. We disagree.
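
As an aside on what "some attempt at statistical significance" means in practice: the error bar on a rating estimate shrinks roughly with the square root of the number of games, which is why the games-played column matters. Below is a minimal, illustrative Python sketch of that relationship. It is not the CCRL's actual tooling (rating lists typically rely on dedicated programs such as BayesElo), and the figures in the comments are only rough examples.

import math

def elo_estimate(wins, draws, losses, z=1.96):
    # Back-of-the-envelope Elo difference and ~95% error margin
    # from a single head-to-head result. Illustrative only.
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n              # fraction of points scored
    # Per-game variance of the score, then standard error of the mean.
    var = (wins * (1.0 - score) ** 2 +
           draws * (0.5 - score) ** 2 +
           losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(p):
        p = min(max(p, 1e-6), 1 - 1e-6)           # keep the log finite
        return -400 * math.log10(1 / p - 1)

    margin = (to_elo(score + z * se) - to_elo(score - z * se)) / 2
    return to_elo(score), margin

# Same scoring rate, ten times the games, roughly one third the error bar:
print(elo_estimate(300, 450, 250))   # about (+17, +/-16) Elo over 1000 games
print(elo_estimate(30, 45, 25))      # about (+17, +/-51) Elo over 100 games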
