Rating list question
Hi,
Are open-chess forum members able to provide an independent rating list?
I think it would be the quickest way to settle the 'known problem'.
Rgds Hood.
Smolensk 2010. Murder or accident... Cui bono ?
There are no bug-free programs. There are programs with undiscovered bugs.
Alleluia.
-
- Posts: 32
- Joined: Thu Jun 10, 2010 12:46 am
Re: Rating list question
I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.
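For what it's worth, here is a minimal Python sketch of how such a private list could be compiled from a cross-table of results: it fits Elo ratings by a damped fixed-point iteration until each engine's expected score matches its actual score. The engine names, scores, and helper names below are invented for illustration, not taken from CEGT:
[code]
import math

def expected_score(r_a, r_b):
    """Standard Elo expectation for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def fit_ratings(crosstable, iterations=2000, step=8.0):
    """Fit zero-centred Elo ratings to a cross-table.

    crosstable: {(engine_a, engine_b): (wins_a, draws, wins_b)}
    """
    engines = {e for pair in crosstable for e in pair}
    ratings = {e: 0.0 for e in engines}
    for _ in range(iterations):
        for e in engines:
            actual = expect = games = 0.0
            for (a, b), (w, d, l) in crosstable.items():
                if e == a:
                    actual += w + 0.5 * d
                    expect += (w + d + l) * expected_score(ratings[a], ratings[b])
                    games += w + d + l
                elif e == b:
                    actual += l + 0.5 * d
                    expect += (w + d + l) * expected_score(ratings[b], ratings[a])
                    games += w + d + l
            if games:
                # nudge the rating toward the score actually achieved
                ratings[e] += step * (actual - expect) / games
        # re-centre so the scale stays anchored at a zero mean
        mean = sum(ratings.values()) / len(ratings)
        ratings = {e: r - mean for e, r in ratings.items()}
    return ratings

# invented mini cross-table, for illustration only
table = {("Houdini 1.02", "EngineX"): (60, 80, 40),
         ("Houdini 1.02", "EngineY"): (55, 70, 35),
         ("EngineX", "EngineY"): (45, 90, 45)}
for name, r in sorted(fit_ratings(table).items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {r:+6.1f}")
[/code]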
-
- Posts: 160
- Joined: Thu Jun 10, 2010 2:14 am
- Real Name: Luis Smith
Re: Rating list question
I posted this idea in another topic. Currently I am gathering up some old CPUs to start out with. This is going to be no easy task.
- thorstenczub
- Posts: 593
- Joined: Wed Jun 09, 2010 12:51 pm
- Real Name: Thorsten Czub
- Location: United States of Europe, germany, NRW, Lünen
- Contact:
Re: Rating list question
I don't think one more rating list will make the results and Elo any more accurate.
-
- Posts: 36
- Joined: Wed Jun 09, 2010 9:15 pm
Re: Rating list question
I talked about something like this as well. We finally have an Ippolit engine being 'officially tested' by Ingo, but all the other rating lists ignore these monsters, AFAIK. Ingo has a particular way of testing (ponder on, short time controls), so another 'open' rating list of some kind would be complementary, not redundant, I think. I myself would like to see one that puts as much emphasis as possible on simulating basic human analysis. I'm not exactly sure what that would mean; I've thought about time controls like 2+12 with ponder off, but that may be too offbeat.
- Chris Whittington
- Posts: 437
- Joined: Wed Jun 09, 2010 6:25 pm
Re: Rating list question
Rating lists as they are currently compiled are worse than useless; they perpetuate, by positive feedback, the cul-de-sac that is chess program development.
Fundamentally a rating list should be independent of the development process. In other words it should measure something that can't be easily predicted by developers.
Think about a parallel system for a minute: education and exams. If the exam is known to teachers beforehand, and teachers are rewarded by class exam results, then they will teach "to the exam", even, at worst, teaching known answers to the known questions. This is no education that anyone could want, but it leads to successful exam grades (teachers like that; their pay depends on it), continual use of the same exam boards that set the questions (exam boards like that), and collusion between teachers and exam boards, which can result in payoffs, manipulations, fraud and worse.
Now, translate this back to computer chess. At present developers basically all work the same way: they make a tweak or write some new code, then autoplay several tens of thousands of test games to detect whether there is a better win/draw/loss statistic. If there is, they keep the version; otherwise they tweak again. That is the education process for a chess engine.
Testers then autoplay several tens of thousands or more games and present a list of win/draw/loss statistics (the exam). These statistics tell the developers nothing they didn't already know; instead they provide positive feedback to developers to continue doing what they do. The exam is known, and developers work to the exam.
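To put rough numbers on that exam, here is a minimal Python sketch (the win/draw/loss figures are invented, not taken from any actual list) of how a match record maps to an Elo difference and its error margin. It also shows why a tweak worth only a few Elo takes tens of thousands of games to detect:
[code]
import math

def wdl_to_elo(wins, draws, losses):
    """Turn a win/draw/loss record into an Elo-difference estimate
    with a rough 95% margin (normal approximation)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # per-game variance of the score around its mean
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    margin = 1.96 * math.sqrt(var / n)   # ~95% interval on the score

    def to_elo(p):
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score + margin) - to_elo(score)

# an invented 40,000-game match: the edge resolves to a few Elo either way
elo, err = wdl_to_elo(10300, 18200, 11500)
print(f"{elo:+.1f} Elo, margin about {err:.1f}")
[/code]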
We get continued use of favoured exam boards; continued use of a flawed (I think) development process that emphasises playing in the comp-comp pool only, which favours tactical developments over strategic developments; and every opportunity for collusion, manipulation and corruption between examiners (rating lists), teachers (programmers) and others too. Meanwhile the feedback loop between rating lists and development pushes computer chess in a particular direction and pushes the rating lists to ever new and unrealistic highs as they get more and more out of tune with reality.
The solution is to dump rating lists and promote a different way to measure excellence, even to find a new measure of excellence.
Re: Rating list question
[quote="LetoAtreides82"]I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.[/quote
You need to talk the guys around there into just being testers, independent of engine controversy. Being neutral and testing all engines is in the best interest of chess players. When you test only for the engine makers, we chess players lose out.
Regards
BT
Re: Rating list question
BTO7 wrote:
You need to talk the guys around there into just being testers, independent of engine controversy. Being neutral and testing all engines is in the best interest of chess players. When you test only for the engine makers, we chess players lose out.
Regards
BT
That is what I wanted to hear and to point out.
Time for an independent rating list. All programs on the same hardware, not necessarily the newest; we need a comparison of the programs, not of the hardware.
rgds Hood
Smolensk 2010. Murder or accident... Cui bono ?
There are no bug-free programs. There are programs with undiscovered bugs.
Alleluia.
- thorstenczub
- Posts: 593
- Joined: Wed Jun 09, 2010 12:51 pm
- Real Name: Thorsten Czub
- Location: United States of Europe, germany, NRW, Lünen
- Contact:
Re: Rating list question
yanquis1972 wrote: 'officially tested'
There is nothing official about rating lists. They are not scientific.
Re: Rating list question
LetoAtreides82 wrote:
I've been planning on testing engines like Houdini privately and in the same manner that I carry out the tests for CEGT Blitz, and compiling a private rating list. I might start testing Houdini 1.02 as soon as tomorrow.
I have been testing Houdini and that family for six months. It's interesting to see how some have progressed and others have not.
Lately I have been recording the results but keeping them private.
Best,
Gerold.