
Rating list question

Posted: Sat Jun 19, 2010 7:12 pm
by Hood
Hi,

Are open-chess forum members able to provide an independent rating list?

I think it would be the shortest way to settle the 'known problem' :-).

Rgds Hood.

Re: Rating list question

Posted: Sat Jun 19, 2010 7:58 pm
by LetoAtreides82
I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.

Re: Rating list question

Posted: Sat Jun 19, 2010 10:18 pm
by LucenaTheLucid
I posted this idea in another topic. Currently I am gathering up some old CPUs to start with. This is going to be no easy task.

Re: Rating list question

Posted: Sun Jun 20, 2010 9:37 am
by thorstenczub
I don't think that one more rating list will make the results and Elo numbers any more accurate.

Re: Rating list question

Posted: Sun Jun 20, 2010 9:58 am
by yanquis1972
I talked about something like this as well. We finally have an Ippolit engine being 'officially tested' by Ingo, but all the other rating lists ignore these monsters, as far as I know. Ingo has a particular way of testing (ponder on, short time controls), and another 'open' rating list of some kind would be complementary rather than redundant, I think. I myself would like to see one that puts as much emphasis as possible on simulating basic human analysis. I'm not exactly sure what that would look like; I've thought about time controls like 2+12 with ponder off, but that may be too offbeat.

Re: Rating list question

Posted: Sun Jun 20, 2010 10:35 am
by Chris Whittington
Rating lists as they are currently compiled are worse than useless: through positive feedback they perpetuate the cul-de-sac that is chess program development.

Fundamentally, a rating list should be independent of the development process; in other words, it should measure something that can't easily be predicted by developers.

Think about a parallel system for a minute: education and exams. If the exam is known to teachers beforehand, and teachers are rewarded by class exam results, then they will teach "to the exam", even, at worst, teaching known answers to the known questions. This is no education that anyone could want, but it leads to successful exam grades (teachers like it; their pay depends on it), continued use of the same exam boards that set the questions (exam boards like that), and collusion between teachers and exam boards, which can result in payoffs, manipulation, fraud and worse.

Now, translate this back to computer chess. Developers all work in basically the same way at present: they make a tweak or write some new code, then autoplay-test several tens of thousands of games to detect whether there is a better win/draw/loss statistic. If there is, they keep the version; otherwise they tweak again. That is the education process for a chess engine.
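
(As a concrete illustration of that cycle, and not anyone's actual test harness: a minimal Python sketch, where make_tweak and play_match are hypothetical stand-ins, and the acceptance test is the standard likelihood-of-superiority approximation over decisive games.)

[code]
import math

def likelihood_of_superiority(wins, losses):
    # Probability that the candidate is genuinely stronger, estimated
    # from decisive games only via the usual normal approximation.
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def development_cycle(current_best, make_tweak, play_match, games=20000):
    # play_match is assumed to run an autoplay match and return
    # (wins, draws, losses) for the candidate against the current best.
    candidate = make_tweak(current_best)
    wins, draws, losses = play_match(candidate, current_best, games)
    if likelihood_of_superiority(wins, losses) > 0.95:
        return candidate   # keep the tweaked version
    return current_best    # discard it and tweak again
[/code]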

Testers then autoplay several tens of thousands or more games and present a list of win/draw/loss statistics (the exam). These statistics tell the developers nothing they didn't already know; instead, they provide positive feedback for developers to continue doing what they do. The exam is known, and developers work to the exam.
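
(For reference, the win/draw/loss statistics in such lists map onto an Elo difference through the usual logistic model; a one-function sketch:)

[code]
import math

def elo_diff_from_score(wins, draws, losses):
    # Elo difference implied by a match score under the standard
    # logistic model: diff = -400 * log10(1/score - 1).
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return -400.0 * math.log10(1.0 / score - 1.0)

# e.g. a 75% score implies roughly +191 Elo:
# elo_diff_from_score(1100, 800, 100) -> 190.8...
[/code]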

We get continued use of favoured exam boards and continued use of a flawed (I think) development process that emphasises playing in the comp-comp pool only, which favours tactical developments over strategic developments, and this opens up every possibility for collusion, manipulation and corruption between examiners (rating lists), teachers (programmers) and others too. Meanwhile the feedback loop between rating lists and development pushes computer chess in one particular direction and pushes the rating lists to ever new and unrealistic highs as they get further and further out of tune with reality.

The solution is to dump rating lists and promote a different way to measure excellence, perhaps even to find a new measure of excellence.

Re: Rating list question

Posted: Sun Jun 20, 2010 11:23 am
by BTO7
[quote="LetoAtreides82"]I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.[/quote

You need to talk the guys around there into just being testers, independent of engine controversy. Being neutral and testing all engines is in the best interest of chess players. When you test only for the engine makers, we chess players lose out.

Regards
BT

Re: Rating list question

Posted: Sun Jun 20, 2010 1:33 pm
by Hood
BTO7 wrote:
You need to talk the guys around there into just being testers, independent of engine controversy. Being neutral and testing all engines is in the best interest of chess players. When you test only for the engine makers, we chess players lose out.

Regards
BT
That is what I wanted to hear and point out.
Time for an independent rating list :-). All programs on similar, i.e. equal, hardware, not necessarily the newest; we need a comparison of the programs, not the hardware.
rgds Hood

Re: Rating list question

Posted: Sun Jun 20, 2010 2:45 pm
by thorstenczub
yanquis1972 wrote: 'officially tested'
There is nothing official about rating lists. They are not scientific.

Re: Rating list question

Posted: Sun Jun 20, 2010 2:54 pm
by Gerold
LetoAtreides82 wrote:I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.
I have been testing Houdini and that family for six months. It's interesting to see how some have progressed and others have not.
Lately I have been recording the results but keeping them private.

Best,
Gerold.