As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features. As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.
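A minimal sketch of the "plug-and-play" mechanism mentioned above, assuming a generic UCI engine: a GUI or tester lists the engine's options and switches its internal book off. The binary name here is hypothetical; "OwnBook" is the option name the UCI specification reserves for an engine's internal book, though not every engine exposes it.

    import subprocess

    # Hypothetical UCI engine binary; any UCI engine would do.
    engine = subprocess.Popen(["./some_uci_engine"],
                              stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE,
                              text=True)

    def send(cmd):
        engine.stdin.write(cmd + "\n")
        engine.stdin.flush()

    send("uci")  # engine answers with its id and its option list
    while True:
        line = engine.stdout.readline().strip()
        if line.startswith("option"):
            print(line)  # e.g. "option name OwnBook type check default true"
        if line == "uciok":
            break

    # A tester supplying its own external book simply turns the internal one off:
    send("setoption name OwnBook value false")
    send("isready")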
A Talkchess thread: Misinformation being spread
Re: A Talkchess thread: Misinformation being spread
BB+ wrote: As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features. As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.

It would be interesting if a strong engine were released that had no facility for custom books or switching off pondering. Would it wag the dog of current rating methodology, or would it be excluded from the lists?
Re: A Talkchess thread: Misinformation being spread
Adam Hair wrote: Try thinking through the consequences of the objections you make, in regards to constructing a rating list, and get back to me on that. Ease for the testers is not the only concern and is, in fact, a small concern.

I got back to you here. I'd be interested in your response, if any.
I don't know why the rating clubs could not ask authors to supply what they consider optimal settings and books for their designs, with learning and pondering switched on, and let those programs that can grow do so and those that can't remain static. Then it would be like a regular player rating list. If it's good enough for biological intelligence, it should be good enough for artificial intelligence.
Comments? Criticisms?
Anyone?
Re: A Talkchess thread: Misinformation being spread
BB+ wrote: HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it.

Another (ostentatiously provocative) way of stating this is: every commercial engine in the TCEC Premier Division not named "Rybka" has book learning at the very least! According to the conspiracy theorists, this must be why CCRL disabled it...

Other examples that are claimed to have some sort of learning include Genius 5, Fritz 10 (maybe just "persistent hash"), and Spike. http://www.rybkaforum.net/cgi-bin/rybka ... l?tid=2401
Re: A Talkchess thread: Misinformation being spread
What kind of learning do Hiarcs and Naum have? I've never seen it (unless Hiarcs got to it in the 13 series).
Re: A Talkchess thread: Misinformation being spread
kingliveson wrote: Who ever thought there could be so much drama in computer chess?!

Were you not around in the good old days, when Bobby Fischer was requesting special chairs and complaining about movie camera noise?
Re: A Talkchess thread: Misinformation being spread
orgfert wrote: I don't see the relevance of testing programs that have been stripped to the least common denominator. I thought the point would be to discover the relative strengths of competing designs. Listing the strength of a lobotomized design looks meaningless.

You make it sound as if there is not very much left after turning off books, ponder, and learning. I believe many authors would disagree with you.
orgfert wrote: So you will admit that the few that use these techniques are put at a disadvantage? This would invalidate their standings in the lists, yes?
Put at a disadvantage? No.
Invalidate their standing? No.
I would think some of the extras are there in order to make the chess engine play more interesting, not because they would help it place higher on any rating list.
orgfert wrote: Not to mention that proliferation of these methods gives no incentive for programmers to design outside the boundaries of the testing methods, since they know anything else will be disabled by the list makers. And it seems a tacit opinion inherent in the testing process that chess on computers should not be treated as AI, since its treatment as "intelligent behavior" is subject to surgery to dumb down designs to a common level. We would never do this to biological intelligence.

IOW, the masses do not consider computer chess to be AI, only the designs stripped down to the search itself, tossing learning, full time management, self-tuned books, and so on, even though the testers are striving for a scientific process. But they aren't discovering the relative strengths of each AI design at all.
Do you really think that list makers determine what should be in a chess program? You give the whole group too much credit. Some authors undoubtedly strive to climb the lists. Others pay more attention to giving their program a full set of features.

Anybody who does not understand that computer chess is artificial intelligence needs to do some reading. Yet simply testing for engine strength does not dismiss that connection. How do you think Bob Hyatt tests Crafty? With books, ponder, and learning on? No. When he competes with Crafty, then yes. But when he wants to find out if some changes in the code make Crafty stronger, all of that is turned off. The same goes for other authors. And the rating lists serve as a check for them.
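To make that testing regime concrete: below is a rough sketch, not Hyatt's actual harness, of a stripped-down engine-vs-engine test, assuming two hypothetical builds of a UCI engine and the third-party python-chess library. No book, no pondering, no learning, just search and evaluation under a fixed time limit.

    import chess
    import chess.engine

    def play_game(white_path, black_path, movetime=0.1):
        """Play one no-book, no-ponder game; return '1-0', '0-1', or '1/2-1/2'."""
        board = chess.Board()
        engines = {
            chess.WHITE: chess.engine.SimpleEngine.popen_uci(white_path),
            chess.BLACK: chess.engine.SimpleEngine.popen_uci(black_path),
        }
        try:
            while not board.is_game_over():
                result = engines[board.turn].play(
                    board, chess.engine.Limit(time=movetime))
                board.push(result.move)
            return board.result()
        finally:
            for e in engines.values():
                e.quit()

    # Pondering is off by default here, and with no book both sides think from
    # move one, so the score difference reflects the code change being tested.
    # A real test would alternate colors and play thousands of games.
    score = {"1-0": 0, "0-1": 0, "1/2-1/2": 0}
    for _ in range(10):
        score[play_game("./engine_old", "./engine_new")] += 1
    print(score)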
Adam Hair wrote: 6) Each rating list is an attempt at something approaching a scientific measurement of engine strength. How close the approach comes is open to opinion. In each case, there is an attempt to eliminate sources of variation. Sometimes there are trade-offs (more testers allow for more games and engines but create more statistical noise), but at least there is some idea of each engine's strength (there are many more that should have been tested).

orgfert wrote: This approach fundamentally destroys many design elements of a computer chess player's strength. Even if you discover that specific elements tend to make little difference, you are blinding the test to potentially effective strategies when they arrive in newer, more innovative versions. Therefore, testing should be careful to include all design elements in a system for evaluation, whether they are deemed to differentiate or not. This is a fundamental principle that should never be violated.
Adam Hair wrote: This is done quite often in science: define what you are trying to measure, try to eliminate sources of variation, then measure it.

orgfert wrote: This is not a concern in chess player rating lists. It is strange to test a program in a way that it will not be used by the consumer, much less intended by its designer for real competitive play. The usefulness of the lists, while accepted by almost everyone, cannot be considered accurate to each program's design. People are essentially looking at inaccurate results, with most apparently not realizing it.

We are not giving any program a UL listing. The fact is this: we are testing the chess engine, not the chess program.

orgfert wrote: What about the design goal of the designer? How about testing the designs? CCRL-type testing looks exactly like taking race cars and removing their engines and just testing the engines in a lab, as though nobody cares about the transmission, suspension, or aerodynamics. It's like saying that racing is all about engines.

Start testing all the bells and whistles yourself.
Adam Hair wrote: And there are a lot of engines out there, many being updated and new engines arriving each month. The CCRL has been trying to test as many of them as possible. This goal may be at odds with what you would like to see done. It has been helpful to others. Are you also caught up with the notion that the CCRL is some kind of accreditation organization? If we were, then our tests should include all design elements. Well, we are not and do not pretend to be.

orgfert wrote: I've no such notions. It might be one thing if this was but one of several kinds of lists. But for it to be the principal bellwether for chess AI, while very well intentioned, is nevertheless a Wrong Thing.

Wrong Thing: n. A design, action, or decision that is clearly incorrect or inappropriate. Often capitalized; always emphasized in speech as if capitalized. The opposite of the Right Thing; more generally, anything that is not the Right Thing. In cases where ‘the good is the enemy of the best’, the merely good — although good — is nevertheless the Wrong Thing. “In C, the default is for module-level declarations to be visible everywhere, rather than just within the module. This is clearly the Wrong Thing.”

You certainly feel strongly about this. However, the strength of your convictions does not determine whether you are right or wrong about an issue.
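The statistical-noise trade-off quoted in this exchange has a concrete shape. A back-of-the-envelope sketch, using the standard logistic Elo formula and a normal approximation (nothing CCRL-specific), shows why testers chase large game counts:

    import math

    def elo_from_score(score):
        """Elo difference implied by a match score strictly between 0 and 1."""
        return 400 * math.log10(score / (1 - score))

    def elo_margin(score, games):
        """Rough 95% margin of error. Treats every game as a win or loss;
        draws shrink the variance further."""
        stderr = math.sqrt(score * (1 - score) / games)
        slope = 400 / (math.log(10) * score * (1 - score))  # d(Elo)/d(score)
        return 1.96 * stderr * slope

    for n in (100, 1000, 10000):
        print(n, "games:", round(elo_from_score(0.55)),
              "+/-", round(elo_margin(0.55, n)), "Elo")
    # 100 games:   35 +/- 68  (the error bar dwarfs the measurement)
    # 1000 games:  35 +/- 22
    # 10000 games: 35 +/- 7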
Re: A Talkchess thread: Misinformation being spread
Adam Hair wrote: Prima, which CCRL list are you referring to, and are you talking about 1 CPU, 2 CPU, or 4 CPU versions? If you look at the various lists available at CCRL, you will see that, depending on the conditions, the results differ. Also keep in mind that there is a lag between the release of an engine and when it gets tested by the CCRL. And it may not get tested under all of the various conditions at the same time. Taking your recollection to be true, there are multiple reasons why you could have seen the ratings you saw and the ratings as they are now without involving some conspiracy.

Prima wrote: I was referring to the entire CCRL lists, specifically the 4CPU lists at various time controls. The "lag or periods between releases" excuse given by CCRL really applies to engines stronger than Rybka; in this case and time frame, to Rybka 2.3.2a, Naum 4.1, and DS12. Their "lag" on both Naum 4.1 and Deep Shredder 12 was not just mere weeks or a couple of months. The so-called "lag" conveniently spanned close to the release of Rybka 3. In the meantime, both Naum 4.1 and Deep Shredder 12 were still placed under Rybka 2.3.2a.

For me, I took both authors' words about the massive improvements in their respective engines. It's really not hard to figure out whether their respective engines would do better than the then champion, Rybka 2.3.2a. All I did was read each author's estimated (or minimum expected) Elo gain, add it to the respective previous engine's Elo, and compare it against Rybka 2.3.2a, and it was easy to deduce that Rybka 2.3.2a was surpassed in strength by both engines. But of course, people weren't allowed to see that at all costs, at least till Rybka 3 was out and reigning once again. No wonder some people didn't know this.

Now the "innocence/misinformation" game is played by CCRL personnel. Who's kidding who? I may not be able to produce the pages in which, after Mr. Naumov stated the massive Elo increase in Naum 4.1 (or 4.x?), which indicated it overtook Rybka 2.3.2a (my interpretation here), it was still not reflected in CCRL for months! The same applies to Deep Shredder 12, albeit it was released later in 2009.

There are several people who have noted that they do not recollect the same as you do, several who have no connection with the CCRL. Seeing as how I am connected, I will not dispute you on this. However, I will say that if the CCRL was sending out misinformation, I would be a severely pissed-off person. But I have not seen a shred of evidence myself.
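As an aside, the bookkeeping Prima describes is easy to make explicit, and making it explicit shows the gap in it. Every number below is a hypothetical placeholder, not an actual rating or an author's actual claim:

    # Hypothetical placeholder numbers only.
    rating_previous_naum = 3100   # list rating of the earlier Naum version
    claimed_gain = 60             # author's self-estimated Elo improvement
    rating_rybka_232a = 3120      # list rating of the reigning engine

    estimate = rating_previous_naum + claimed_gain
    print(estimate, ">", rating_rybka_232a, "->", estimate > rating_rybka_232a)

    # The catch: a self-estimated gain is a prediction, not a measured rating,
    # so a list that waits for actual games is not thereby hiding anything.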
Re: A Talkchess thread: Misinformation being spread
BB+ wrote: As I mentioned above, there might be a generation gap here. Many engines of 10 years ago, particularly if they had their own book, had at least "book learning". HIARCS is a modern one that has kept this. Shredder has such capabilities too, as does Naum. So there are still some notables (3 of 8 in TCEC Division 1) with it. I think I've noted before that the UCI plug-and-play of books has decreased the incentive for engines in the lower Elo realms to have their own book, and they are correspondingly less likely to have "learning" features. As far as I have seen, very few have been. Crafty, ProDeo, and (I think) RomiChess come to mind. I am sure that there are some others.

orgfert wrote: It would be interesting if a strong engine were released that had no facility for custom books or switching off pondering. Would it wag the dog of current rating methodology, or would it be excluded from the lists?

Most likely it would not be tested.
Re: A Talkchess thread: Misinformation being spread
Adam Hair wrote: Try thinking through the consequences of the objections you make, in regards to constructing a rating list, and get back to me on that. Ease for the testers is not the only concern and is, in fact, a small concern.

orgfert wrote: I got back to you here. I'd be interested in your response, if any.

Sorry that I did not reply within the 15 hours between the last post and this post.

orgfert wrote: I don't know why the rating clubs could not ask authors to supply what they consider optimal settings and books for their designs, with learning and pondering switched on, and let those programs that can grow do so and those that can't remain static. Then it would be like a regular player rating list. If it's good enough for biological intelligence, it should be good enough for artificial intelligence.

Actually, no human is static. So the list you want would not really resemble a regular player list.

orgfert wrote: Comments? Criticisms? Anyone?

You do realize that sometimes a person isn't always able to respond immediately. Life gets in the way sometimes.