A Talkchess thread: Misinformation being spread

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: A Talkchess thread: Misinformation being spread

Post by BB+ » Wed Jan 12, 2011 8:16 am

As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.
As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features.
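For readers unfamiliar with the term, "book learning" generally means adjusting the weights of opening-book moves according to game results, so the engine gradually steers away from lines that have lost. A minimal sketch of the idea in Python follows; the data structures, function names (book_move, learn), and update rule are illustrative assumptions, not any particular engine's implementation.

Code:

import random

# Hypothetical opening book: position key -> {move: [wins, draws, losses]}
book = {"startpos": {"e2e4": [10, 5, 3], "d2d4": [8, 6, 4]}}

def book_move(key):
    """Pick a book move, weighted by past results (win=1, draw=0.5)."""
    entries = book.get(key)
    if not entries:
        return None
    def score(stats):
        wins, draws, losses = stats
        games = wins + draws + losses
        return (wins + 0.5 * draws) / games if games else 0.5
    moves = list(entries)
    # Floor the weight so a losing line is discouraged, never erased outright.
    weights = [max(score(entries[m]), 0.01) for m in moves]
    return random.choices(moves, weights=weights)[0]

def learn(line, result):
    """After a game, update the stats of every book move actually played.
    `line` is [(position_key, move), ...]; `result` is 'win'/'draw'/'loss'."""
    index = {"win": 0, "draw": 1, "loss": 2}[result]
    for key, move in line:
        stats = book.setdefault(key, {}).setdefault(move, [0, 0, 0])
        stats[index] += 1

# Example: the engine lost a game that began 1.e4, so that line is now
# slightly less likely to be chosen next time.
learn([("startpos", "e2e4")], "loss")
print(book["startpos"]["e2e4"])  # [10, 5, 4]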

orgfert
Posts: 183
Joined: Fri Jun 11, 2010 5:35 pm
Real Name: Mark Tapley

Re: A Talkchess thread: Misinformation being spread

Post by orgfert » Wed Jan 12, 2011 6:21 pm

BB+ wrote:
As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.
As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features.
It would be interesting if a strong engine were released that had no facility for custom books or switching off pondering. Would it wag the dog of current rating methodology or would it be excluded from the lists?

orgfert
Posts: 183
Joined: Fri Jun 11, 2010 5:35 pm
Real Name: Mark Tapley

Re: A Talkchess thread: Misinformation being spread

Post by orgfert » Wed Jan 12, 2011 10:50 pm

Adam Hair wrote:Try thinking through the consequences of the objections you make, in regard to constructing a rating list, and
get back to me on that. Ease for the testers is not the only concern and is, in fact, a small one.
I got back to you here.

I'd be interested in your response, if any.

I don't know why the rating clubs could not ask authors to supply what they consider optimal settings and books for their designs, with learning and pondering switched on, and let those programs that can grow do so and those that can't remain static. Then it would be like a regular player rating list. If it's good enough for biological intelligence, it should be good enough for artificial intelligence.

Comments? Criticisms?

Anyone?

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: A Talkchess thread: Misinformation being spread

Post by BB+ » Thu Jan 13, 2011 8:26 am

HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it.
Another (ostentatiously provocative) way of stating this is: every commercial engine in the TCEC Premier Division not named "Rybka" has book learning at the very least! :!: According to the conspiracy theorists, this must be why CCRL disabled it... :mrgreen:

Other examples that are claimed to have some sort of learning include Genius 5, Fritz 10 (maybe just "persistent hash"), and Spike. http://www.rybkaforum.net/cgi-bin/rybka ... l?tid=2401
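For reference, "persistent hash" usually means the engine saves its transposition table to disk between sessions, so earlier analysis carries over into later ones. A rough sketch of the mechanism, in the same vein as the example above (the entry layout, function names, and file format here are made up for illustration, not Fritz's actual implementation):

Code:

import pickle

# Hypothetical transposition table: position hash -> (depth, score, best_move)
transposition_table = {}

def store(pos_hash, depth, score, best_move):
    """Keep the deepest analysis seen so far for each position."""
    old = transposition_table.get(pos_hash)
    if old is None or depth >= old[0]:
        transposition_table[pos_hash] = (depth, score, best_move)

def save_hash(path):
    """Write the table to disk when the engine shuts down."""
    with open(path, "wb") as f:
        pickle.dump(transposition_table, f)

def load_hash(path):
    """Reload the table at startup; yesterday's analysis is instantly reusable."""
    global transposition_table
    try:
        with open(path, "rb") as f:
            transposition_table = pickle.load(f)
    except FileNotFoundError:
        transposition_table = {}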

Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: A Talkchess thread: Misinformation being spread

Post by Uly » Thu Jan 13, 2011 6:38 pm

What kind of learning do Hiarcs and Naum have? I've never seen it (unless Hiarcs added it in the 13 series).

Tony Mokonen
Posts: 16
Joined: Mon Jun 14, 2010 7:11 pm

Re: A Talkchess thread: Misinformation being spread

Post by Tony Mokonen » Thu Jan 13, 2011 7:29 pm

kingliveson wrote:Who ever thought there could be so much drama in computer chess?! :)
Were you not around in the good old days, when Bobby Fischer was requesting special chairs and complaining about movie camera noise?

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair
Contact:

Re: A Talkchess thread: Misinformation being spread

Post by Adam Hair » Fri Jan 14, 2011 4:20 am

orgfert wrote: I don't see the relevance of testing programs that have been stripped to the least common denominator. I thought the point would be to discover the relative strengths of competing designs. Listing the strength of a lobotomized design looks meaningless.
You make it sound as if there is not very much left after turning off books, ponder, and learning. I believe many
authors would disagree with you.
orgfert wrote: So you will admit that the few that use these techniques are put at a disadvantage? This would invalidate their standings in the lists, yes?


Put at a disadvantage? No.
Invalidate their standing? No.
I would think some of the extras are there in order to make the chess engine play more interesting, not because
they would help it place higher on any rating list.
orgfert wrote: Not to mention that proliferation of these methods gives no incentive for programmers to design outside the boundaries of the testing methods, since they know anything else will be disabled by the list makers. And it seems a tacit opinion inherent in the testing process that chess on computers should not be treated as AI, since its treatment as "intelligent behavior" is subject to surgery to dumb down designs to a common level. We would never do this to biological intelligence.

IOW, computer chess is not considered AI by the masses; only the stripped-down designs, reduced to the search itself (tossing learning, full time management, self-tuned books, etc.), get measured, even though the testers are striving for a scientific process. But they aren't discovering the relative strengths of each AI design at all.


Do you really think that list makers determine what should be in a chess program? You give the whole group too
much credit. Some authors undoubtedly strive to climb the lists. Others pay more attention to giving their program
a full set of features.

Anybody who does not understand that computer chess is artificial intelligence needs to do some reading. Yet,
simply testing for engine strength does not dismiss that connection. How do you think Bob Hyatt tests Crafty?
With books, ponder, and learning on? No. When he competes with Crafty, then yes. But when he wants to find
out if some changes in the code make Crafty stronger, all of that is turned off. The same goes for other authors.
And the rating lists serve as a check for them.
orgfert wrote:
Adam Hair wrote:6) Each rating list is an attempt at something approaching a scientific measurement of engine strength. How close
the approach comes is open to opinion. :) In each case, there is an attempt to eliminate sources of variation.
Sometimes there are some trade-offs (more testers allow for more games and engines but create more statistical
noise), but at least there is some idea of each engine's strength (there are many more that should have been tested).
orgfert wrote: This approach fundamentally destroys many design elements of a computer chess player's strength. Even if you discover that specific elements tend to make little difference, you are blinding the test to potentially effective strategies when they arrive in newer, more innovative versions.

Therefore, testing should be careful to include all design elements in a system for evaluation, whether they are deemed to differentiate or not. This is a fundamental principle that should never be violated.
orgfert wrote:
Adam Hair wrote: This is done quite often in science: Define what you are trying to measure, try to eliminate sources of variation, then measure it.
This is not a concern in chess player rating lists. It is strange to test a program in a way it will not be used by the consumer, much less in the way its designer intended for real competitive play. The usefulness of the lists, while accepted by almost everyone, cannot be considered accurate to each program's design. People are essentially looking at inaccurate results, with most apparently not realizing it.
We are not giving any program a UL listing. The fact is this: we are testing the chess engine, not the chess program.
orgfert wrote: What about the design goal of the designer? How about testing the designs? CCRL-type testing looks exactly like taking race cars, removing their engines, and just testing the engines in a lab, as though nobody cares about the transmission, suspension, or aerodynamics. It's like saying that racing is all about engines.


Start testing all the bells and whistles yourself.
Adam Hair wrote:And there are a lot of engines out there, many being updated and new engines arriving each month. The
CCRL has been trying to test as many of them as possible. This goal may be at odds with what you would like to see done.
It has been helpful to others.

Are you also caught up in the notion that the CCRL is some kind of accreditation organization? :roll:
If we were, then our tests should include all design elements. Well, we are not and do not pretend to be.
orgfert wrote: I've no such notions. It might be one thing if this were but one of several kinds of lists. But for it to be the principal bellwether for chess AI, while very well intentioned, is nevertheless a Wrong Thing.

Wrong Thing: n. A design, action, or decision that is clearly incorrect or inappropriate. Often capitalized; always emphasized in speech as if capitalized. The opposite of the Right Thing; more generally, anything that is not the Right Thing. In cases where ‘the good is the enemy of the best’, the merely good — although good — is nevertheless the Wrong Thing. “In C, the default is for module-level declarations to be visible everywhere, rather than just within the module. This is clearly the Wrong Thing.”
You certainly feel strongly about this. However, the strength of your convictions does not determine whether you are
right or wrong about an issue.

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair
Contact:

Re: A Talkchess thread: Misinformation being spread

Post by Adam Hair » Fri Jan 14, 2011 4:26 am

Prima wrote:
Adam Hair wrote:Prima,

Which CCRL list are you referring to, and are you talking about 1 CPU, 2 CPU, or 4 CPU versions? If you look at the various
lists available at CCRL, you will see that, depending on the conditions, the results differ. Also keep in mind that there
is a lag between the release of an engine and when it gets tested by the CCRL. And it may not get tested under all
of the various conditions at the same time. Taking your recollection to be true, there are multiple reasons why you
could have seen the ratings you saw and the ratings as they are now without involving some conspiracy.
I was referring to the entire CCRL lists, specifically the 4CPU lists at various time controls.

The "lag or periods between releases" excuse given by CCRL really applies to engines stronger than Rybka.....in this case and time-frame of Rybka 2.3.2a, Naum 4.1 and DS12. Their "lag between" versions on both Naum 4.1 and Deep Shredder 12 was not just mere weeks or couple of months. The so call "lag" conveniently spanned close to the release of Rybka 3. In the meantime, both Naum 4.1 and Deep Shredder 12 were still placed under Rybka 2.3.2a.

For me, I took both authors' word about the massive improvements in their respective engines. It's really not hard to figure out whether their engines would do better than the then champion, Rybka 2.3.2a. All I did was read each author's estimated (or minimum expected) Elo gain, add it to the respective previous engine's Elo, and compare the result against Rybka 2.3.2a... and it was easy to deduce that Rybka 2.3.2a was surpassed in strength by both engines. But of course, people weren't allowed to see that at all costs, at least till Rybka 3 was out and reigning once again. No wonder some people didn't know this.

Now the "innocence/Misinformation" game is played by a personnel of CCRL. Who's kidding who? I may not be able to produce pages in which, after Mr. Naumov stated the massive ELO increase in Naum 4.1(or 4.x?), which indicated it overtook Rybka 2.3.2a (my interpretation here), it was still not reflected in CCRL...for months!. Same applies for Deep Shredder 12, albeit it was released later in 2009.
There are several people who have noted that they do not recollect events the same way you do, several of whom have no
connection with the CCRL. Seeing as I am connected, I will not dispute you on this. However, I will say that
if the CCRL were sending out misinformation, I would be a severely pissed-off person. But I have not seen a shred
of evidence myself.

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair
Contact:

Re: A Talkchess thread: Misinformation being spread

Post by Adam Hair » Fri Jan 14, 2011 4:32 am

orgfert wrote:
BB+ wrote:
As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.
As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features.
It would be interesting if a strong engine were released that had no facility for custom books or switching off pondering. Would it wag the dog of current rating methodology or would it be excluded from the lists?
Most likely it would not be tested.

Adam Hair
Posts: 104
Joined: Fri Jun 11, 2010 4:29 am
Real Name: Adam Hair
Contact:

Re: A Talkchess thread: Misinformation being spread

Post by Adam Hair » Fri Jan 14, 2011 4:42 am

orgfert wrote:
Adam Hair wrote:Try thinking through the consequences of the objections you make, in regard to constructing a rating list, and
get back to me on that. Ease for the testers is not the only concern and is, in fact, a small one.
I got back to you here.

I'd be interested in your response, if any.
Sorry that I did not reply within the 15 hours between the last post and this post.
orgfert wrote: I don't know why the rating clubs could not ask authors to supply what they consider optimal settings and books for their designs, with learning and pondering switched on, and let those programs that can grow do so and those that can't remain static. Then it would be like a regular player rating list. If it's good enough for biological intelligence, it should be good enough for artificial intelligence.
Actually, no human is static. So, the list you want would not really resemble a regular player list.
orgfert wrote: Comments? Criticisms?

Anyone?
You do realize that a person isn't always able to respond immediately. Life gets in the way sometimes.
