orgfert wrote: The real test is if you could do that to a human chess player, would he disagree with you?
You make it sound as if there is not very much left after turning off books, ponder, and learning. I believe many authors would disagree with you.
Adam Hair wrote: Put at a disadvantage? No. Invalidate their standing? No. I would think some of the extras are there in order to make the chess engine play more interesting, not because it would help it be higher on any rating list.
Ask a human intelligence whether the things you turn off in an artificial intelligence are extras he can do without.

Adam Hair wrote: Do you really think that list makers determine what should be in a chess program?
They determine how to limit and hobble all the AI, which must surely affect differing designs by differing and unknowable quanta. The result is then considered a scientific effort at objective measurement, though how it could be considered so with such arbitrary interference is puzzling.

Adam Hair wrote: You give the whole group too much credit. Some authors undoubtedly strive to climb the lists. Others pay more attention to giving their program a full set of features.
I think you'll find I give them almost no credit (no offense intended), due to arbitrary tampering with the designs. I think this is done innocently, in ignorance.
Adam Hair wrote: Anybody who does not understand that computer chess is artificial intelligence needs to do some reading. Yet, simply testing for engine strength does not dismiss that connection. How do you think Bob Hyatt tests Crafty?
With books, ponder, and learning on? No. When he competes with Crafty, then yes.
hyatt wrote: This is _really_ mixing apples and oranges. In my testing, I am not trying to find out how much better or worse my program is when compared to another program. I am testing different versions of the _same_ program to see if the changes are good or bad. That is far different than the intent of a chess tournament or a rating list.
Your intent really is not that much different than the intent of a rating list. One purpose of the CCRL lists is to compare engine versions. When I test a new version of an engine, I set up a gauntlet against the opponents the older version played against, in order to have some comparison between the two versions. The main difference between your intent and mine is that many of your test runs check individual changes, whereas I am checking the end result of all the changes made. Other than that, there does not appear to be much difference in intent. You check to see whether some change in the code results in an Elo increase over the previous test version, measured against the gauntlet of engines each version played against.
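To make that concrete, here is a rough sketch of the arithmetic behind such a gauntlet comparison, assuming the standard logistic Elo model; the function name and the win/draw/loss totals are invented for illustration and are not from either testing setup.

Code:
# Sketch: estimate each version's Elo relative to a common gauntlet from its
# win/draw/loss totals, using the logistic model d = 400 * log10(s / (1 - s)),
# where s is the score fraction. All numbers below are hypothetical.
import math

def elo_vs_gauntlet(wins, draws, losses):
    """Elo of one engine version relative to the average of its gauntlet opponents."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games      # score fraction in (0, 1)
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical totals for the same gauntlet played by two versions of one engine.
old_elo = elo_vs_gauntlet(wins=380, draws=320, losses=300)   # about +28 Elo
new_elo = elo_vs_gauntlet(wins=400, draws=320, losses=280)   # about +42 Elo

print(f"old {old_elo:+.1f}, new {new_elo:+.1f}, change {new_elo - old_elo:+.1f} Elo vs the same gauntlet")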
hyatt wrote: I eliminate pondering, because it increases randomness. I eliminate the book because I don't want to have to test every opening, and I don't want to deal with the interference learning can cause. I don't use SMP because that is a performance issue that has nothing to do with modifying the program's search or evaluation to improve them. And all I am trying to measure is the change(s) we make to the evaluation or search. Was the change good or bad?
Actually, rating lists are trying to do the same sort of measurement, just for many more engines.
hyatt wrote: A rating list or a tournament is a different thing. There, the "whole system" is under scrutiny: book, search, eval, speed (SMP included), endgame tables, learning, whatever else your engine does to make it play better.
Definitely for a tournament. If I were competing in a tournament, I would want to use anything associated with my program that could help me win. However, a rating list is not a competitive event.
hyatt wrote: So, as I said, this is apples and oranges. How I test has nothing to do with how one should conduct a tournament, nor a rating list...
I used you as an example because how you test is widely known. How I conduct tests for the CCRL is similar, yet it is not modeled on your method; it is simply the best way to conduct the testing, given the goals and constraints I have.
hyatt wrote: I agree with your comments below. We already know that a book can make a huge difference in real games. A good book will both (a) guide the program into openings where it plays well and (b) guide the program away from openings that its eval or search seems ill-suited to handle. This means that one can either try to fix a hole in the evaluation, or use the book to avoid that hole. If you graft an odd book onto a program, you deny it this protection that the author depended on, and the results can be artificially worse. Or if your opponent uses a book that is ill-suited to it, your program might look better if it forces the opponent into openings it would normally avoid.
The only question is, which is better? If you only want to compare engines, no books could work, assuming you expect all engines to play all openings equally skillfully. But none of the authors really believe that is possible. Humans don't play that way; we avoid that which we don't understand or are unfamiliar with.
Here you briefly take my point, but then immediately toss it aside with little attempt at explanation.
Adam Hair wrote: But when he wants to find out if some changes in the code make Crafty stronger, all of that is turned off. The same for other authors. And the rating lists serve as a check for them.
But in this case, he is tuning search and eval only. To do this, he must isolate it from its dynamic AI functions. Why rating lists would only be interested in a subset of the total AI seems strange. Why is no one interested in the total AI?
Adam Hair wrote: We are not giving any program a UL listing. The fact is this: we are testing the chess engine, not the chess program.
Ok, but this has been clear from the start. What is not so clear is the reason why no one wants to know the relative strength of the AI.
Adam Hair wrote: Start testing all the bells and whistles yourself.
What you are calling bells and whistles are the holy grail of AI. One wonders what we are endeavoring to discover by crippling whatever abilities have been achieved. I don't understand the answers that have been given to this so far.
Adam Hair wrote: You certainly feel strongly about this. However, the strength of your convictions does not determine whether you are right or wrong about an issue.
I'm somewhat at a loss, since it seems completely obvious. Testing competing AIs would seem to be a goal with no shortage of champions, yet one finds it a goal of almost no one. And when it is suggested, eyebrows are raised as if the suggestion were utterly ludicrous (complete with laughing emoticons).