To kick off some technical discussions

Code, algorithms, languages, construction...
hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: To kick off some technical discussions

Post by hyatt » Thu Jun 10, 2010 6:41 pm

Chan Rasjid wrote: Shouldn't dumb captures be deleted?

A QxN with the N protected by a pawn is a very bad capture for the Q to make. It might be possible to delete such captures in QS, or deeper in QS. In the full search, I can't see why they could not be reduced.

Of course, it is easier to detect protection by a P/N; protection by a B/R is more costly to detect.

Rasjid
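As a rough illustration of the cheap pawn-protection test Rasjid mentions, something along these lines works; this is only a sketch with made-up bitboard names, not code from Crafty or any other engine discussed here:

    /* Hypothetical layout, for illustration only:
       pawn_attacks[c][sq] = squares from which a pawn of colour c attacks sq
       pawns[c]            = occupancy of colour c's pawns                    */
    typedef unsigned long long Bitboard;
    extern Bitboard pawn_attacks[2][64];
    extern Bitboard pawns[2];

    /* Is 'to' defended by an enemy pawn?  One AND instead of a full SEE;
       if the capturing piece is worth much more than the victim, the capture
       can be flagged as "dumb" immediately.                                  */
    static int defended_by_pawn(int to, int enemy)
    {
        return (pawn_attacks[enemy][to] & pawns[enemy]) != 0ULL;
    }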

I have _always_ discarded losing captures (according to SEE) in the q-search. In Crafty and in Cray Blitz. But discarding them in the main search is dangerous because you would never see the result of a sound sacrifice since the sacrifice would not be searched. But you can expend less effort to dismiss it, without giving up on it completely.
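A minimal sketch of that distinction (helper names such as see(), Capture(), GivesCheck() and Search() are assumptions for illustration, not Crafty's actual routines):

    /* In the q-search move loop: a capture that SEE scores as losing is
       never searched at all.                                                 */
    if (see(move) < 0)
        continue;                 /* discard the losing capture outright      */

    /* In the main-search move loop: the same capture stays in the list but
       is searched to reduced depth, so a sound sacrifice can still surface,
       just with less effort spent on it.                                     */
    reduction = (Capture(move) && !GivesCheck(move) && see(move) < 0) ? 1 : 0;
    value = -Search(-beta, -alpha, depth - 1 - reduction);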

Mincho Georgiev
Posts: 31
Joined: Thu Jun 10, 2010 7:30 am
Contact:

Re: To kick off some technical discussions

Post by Mincho Georgiev » Thu Jun 10, 2010 8:54 pm

hyatt wrote:
Mincho Georgiev wrote: I thought about it a while ago. Let's hope that my SEE inefficiency will not eat up the real gain. Thanks for pointing that out, Bob!
A couple of key points.

(1) I always sort captures based on SEE (except for obvious winning moves like PxQ or BxR which win no matter what). Those captures with expected gain >= 0 are searched right after the hash table move.

(2) If you look at the Crafty source, I have a "phase" variable that tells me what phase of the move selection I am in, from "HASH_MOVE" to "CAPTURE_MOVES" to "KILLER_MOVES" to "REMAINING_MOVES". I do not reduce moves until I get to REMAINING_MOVES (effectively the L (late) in LMR). So for me, there are no extra SEE calls. I have already used SEE to choose which captures are searched in CAPTURE_MOVES, leaving the rest for REMAINING_MOVES. I therefore reduce anything in REMAINING_MOVES (except for moves that give check). So there is really no extra SEE usage at all.

The net effect of this is to simply expend less effort on bad captures, just as we expend less effort on moves that appear to be ineffective for other reasons.
Thanks, I will look into that!
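A rough sketch of the phase scheme Bob describes above; the enum mirrors the names in the post, while NextPhaseMove() and the other helpers are assumptions, not Crafty's actual code:

    /* Selection phases, in the order they are tried.                        */
    typedef enum { HASH_MOVE, CAPTURE_MOVES, KILLER_MOVES,
                   REMAINING_MOVES } Phase;

    /* Assumed helper: returns the next move and reports which phase it came
       from.  SEE was already paid for once, when deciding which captures
       belong in CAPTURE_MOVES, so no additional SEE call is needed here.     */
    int NextPhaseMove(Phase *phase);

    /* Inside the move loop: only REMAINING_MOVES are candidates for
       reduction, and moves that give check are exempt.                       */
    move   = NextPhaseMove(&phase);
    reduce = (phase == REMAINING_MOVES) && !GivesCheck(move);
    new_depth = depth - 1 - (reduce ? 1 : 0);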

mcostalba
Posts: 91
Joined: Thu Jun 10, 2010 11:45 pm
Real Name: Marco Costalba

Re: To kick off some technical discussions

Post by mcostalba » Thu Jun 10, 2010 11:51 pm

hyatt wrote: I have just finished testing a small modification to Crafty's LMR algorithm. When everyone started this, conventional wisdom said that certain moves should not be reduced, captures in particular. I've changed this idea just a bit. Since I always use SEE (or a faster equivalent when possible) to order captures, so that winning or equal captures are searched early while losing captures are delayed until after the hash move, good captures, and killer moves, I wondered: why would I not want to reduce dumb captures? A simple fix to address this, and I got a quick +15 Elo, verified with my usual 30,000-game cluster tests.

Easy enough for anyone to try. One other idea, worth another 5-10 Elo, is that I also no longer extend checks that appear unsafe according to SEE.

Roughly +25 Elo for very little effort.
They look like two interesting ideas. I like the first one, although a bad-capture sub-tree has such a low score (much below beta) that as soon as it arrives at razoring depths it is almost instantly discarded anyway.

The second one is interesting; we already do something similar in qsearch, where we actually prune negative-SEE check moves altogether.

Anyhow thanks for the ideas :-)
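The second idea, the SEE-gated check extension, looks roughly like this in sketch form (assumed helper names again; the one-ply extension is just a placeholder value):

    /* Check extension gated by SEE: a check that loses material according
       to SEE no longer earns the extra ply.                                  */
    extension = 0;
    if (GivesCheck(move) && see(move) >= 0)
        extension = 1;            /* extend only checks that look safe        */
    new_depth = depth - 1 + extension;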

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: To kick off some technical discussions

Post by hyatt » Fri Jun 11, 2010 2:11 am

mcostalba wrote:
hyatt wrote: I have just finished testing a small modification to Crafty's LMR algorithm. When everyone started this, conventional wisdom said that certain moves should not be reduced, captures in particular. I've changed this idea just a bit. Since I always use SEE (or a faster equivalent when possible) to order captures, so that winning or equal captures are searched early while losing captures are delayed until after the hash move, good captures, and killer moves, I wondered: why would I not want to reduce dumb captures? A simple fix to address this, and I got a quick +15 Elo, verified with my usual 30,000-game cluster tests.

Easy enough for anyone to try. One other idea, worth another 5-10 Elo, is that I also no longer extend checks that appear unsafe according to SEE.

Roughly +25 Elo for very little effort.
They look like two interesting ideas. I like the first one, although a bad-capture sub-tree has such a low score (much below beta) that as soon as it arrives at razoring depths it is almost instantly discarded anyway.

The second one is interesting; we already do something similar in qsearch, where we actually prune negative-SEE check moves altogether.

Anyhow thanks for the ideas :-)

I also cull q-search checks with SEE < 0, although I don't do the SEE test until the very last instant, hoping that I get a cutoff before I have to do that test...
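In sketch form, the point is simply where the SEE call sits in the loop (helper names are assumptions, not Crafty's actual code):

    /* Q-search loop over checking moves: the SEE test is postponed until a
       move is actually about to be searched, so a cutoff earlier in the loop
       means the test is never paid for at all.                               */
    while ((move = NextCheckingMove()) != 0) {
        if (see(move) < 0)        /* the test happens only at the last moment */
            continue;
        MakeMove(move);
        value = -Quiesce(-beta, -alpha);
        UnmakeMove(move);
        if (value >= beta)
            return value;         /* cutoff: remaining checks never get SEEd  */
        if (value > alpha)
            alpha = value;
    }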

Uri_Blass
Posts: 2
Joined: Fri Jun 11, 2010 7:03 am
Real Name: Uri Blass

Re: To kick off some technical discussions

Post by Uri_Blass » Fri Jun 11, 2010 7:56 am

hyatt wrote: I have just finished testing a small modification to Crafty's LMR algorithm. When everyone started this, conventional wisdom said that certain moves should not be reduced, captures in particular. I've changed this idea just a bit. Since I always use SEE (or a faster equivalent when possible) to order captures, so that winning or equal captures are searched early while losing captures are delayed until after the hash move, good captures, and killer moves, I wondered: why would I not want to reduce dumb captures? A simple fix to address this, and I got a quick +15 Elo, verified with my usual 30,000-game cluster tests.

Easy enough for anyone to try. One other idea, worth another 5-10 Elo, is that I also no longer extend checks that appear unsafe according to SEE.

Roughly +25 Elo for very little effort.
I wonder if you tried reducing promotions (this is an idea that I read in the past, and I believe it is a good one).
I know that it is not LMR, because you do not search promotions early, but I believe that reducing promotions can help chess programs, because you may want to search to a relatively greater depth in positions where there is a pawn on the 7th and there is high uncertainty about whether the pawn is going to promote.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: To kick off some technical discussions

Post by hyatt » Fri Jun 11, 2010 2:02 pm

Uri_Blass wrote:
hyatt wrote: I have just finished testing a small modification to Crafty's LMR algorithm. When everyone started this, conventional wisdom said that certain moves should not be reduced, captures in particular. I've changed this idea just a bit. Since I always use SEE (or a faster equivalent when possible) to order captures, so that winning or equal captures are searched early while losing captures are delayed until after the hash move, good captures, and killer moves, I wondered: why would I not want to reduce dumb captures? A simple fix to address this, and I got a quick +15 Elo, verified with my usual 30,000-game cluster tests.

Easy enough for anyone to try. One other idea, worth another 5-10 Elo, is that I also no longer extend checks that appear unsafe according to SEE.

Roughly +25 Elo for very little effort.
I wonder if you tried reducing promotions (this is an idea that I read in the past, and I believe it is a good one).
I know that it is not LMR, because you do not search promotions early, but I believe that reducing promotions can help chess programs, because you may want to search to a relatively greater depth in positions where there is a pawn on the 7th and there is high uncertainty about whether the pawn is going to promote.
Promotions are part of the capture search. Those that appear safe are not reduced. Those that appear to lose material are reduced. So the answer is "yes". Normally promotions are searched early, but only those that don't drop the pawn outright.
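In sketch form (assumed helpers once more), a promotion simply rides the same SEE-ordered capture machinery, and the pawn's fate decides the treatment:

    /* Promotions are handled like captures: SEE decides early vs. reduced.   */
    if (Promotion(move)) {
        if (see(move) >= 0)       /* pawn promotes safely                      */
            reduce = 0;           /* searched early, never reduced             */
        else                      /* pawn (or the new queen) is simply lost    */
            reduce = 1;           /* treated like any other losing capture     */
    }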

User avatar
Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: To kick off some technical discussions

Post by Chris Whittington » Fri Jun 11, 2010 2:56 pm

hyatt wrote:
Chris Whittington wrote:
hyatt wrote:I have just finished testing a small modification to Crafty's LMR algorithm. When everyone started this, conventional wisdom said that certain moves should not be reduced, in particular captures. I've changed this particular idea just a bit. Since I always use SEE (or a faster equivalent when possible) to order captures, so that winning or equal captures are searched early, while losing captures are delayed until after the hash move, good captures, and killer moves, I wondered "Why would I not want to reduce dumb captures?" A simple fix to address this and I got a quick +15 Elo verified with my usual 30,000 game cluster tests.

Easy enough for anyone to try. One other idea for another 5-10 Elo is that I also now no longer extend checks that appear unsafe according to SEE.

Roughly +25 elo for a very easy effort.
Hi Bob,

As I read you here, and also many times in the past, your objective is to raise ELO. In practice, for you and also for the others, this means raising ELO against a pool of other (similarly minded?) machines. I know you and others also play and test against all comers on servers, but in practice I suspect that coding changes are realistically and more or less exclusively tested by large volumes of machine-vs-machine games.

My assertion has always been that machine-pool testing runs the danger of leading up a cul-de-sac, but I have another criticism .....

What the technique does is optimise statistically: try to win more often than lose over a large game set and a decent-sized pool of machine opponents, of course. Any statistical optimisation means you win some, you lose some; but, more importantly, it means you do well in a large pool of positions/games (caveat: against like-behaving opponents), OK in a large pool of positions, and badly in a (small?) pool of positions. For an intelligent opponent, therefore, the strategy should be to steer into those regions where your machine statistically does badly, and although those regions may be small in your testing, they may be larger in reality (whatever that might mean!) and larger against intelligently organised opposition.

I guess what I'm saying is that although your testing methodology appears smart and appears to work, it is in fact, due to its all-encompassing statistical nature, really quite dumb. I bet there is no looking at individual games/positions from a chess perspective, looking for what went wrong or for weaknesses, because the methodology isn't interested in individual games or positions.

And every time you prune something, a bad capture, a bad check (which may not be bad, of course), you are opening a window for a smart entity to step through and take you on a journey you never expected.
As the old 14th-century world maps used to have marked on them in various places, "Here there be dragons..." That applies here as well. I am playing against a pool of strong programs, including Stockfish, which may well be the best there is right now or at least very close to it. If I statistically win more games against a pool of good players, using a large set of very equal starting positions, I'm confident that we are getting better and better. Other independent testing has verified this, as we continue to do better in every rating list that exists.

We _do_ look at individual games, just not the games we play on the cluster (at 30,000 games every couple of hours or so, that is intractable). But we still play lots of games vs Rybka, Robo* and such on ICC, and the longer games we spend a lot of time examining. Others in our group play slow games on local hardware and look at the games carefully there as well.

There is no doubt that every "trick" opens a hole. But if it improves Elo, it also must be closing an even larger hole. I see no other way to make real progress. We tried, for years, to look at games and make judgements. And when we started the cluster stuff, we did both to see how they compared. And quite often a "fix" that improved the play in a single game we were examining would cause a drastic drop in Elo overall. Now we are fairly confident that any fix we incorporate actually works in a large set of different circumstances without breaking anything.
The above just confirms the point ....

you are hill climbing, together with a bunch of relatively similar machines, all competing against the same metric (statistical win rate, aka ELO)

there's no guarantee, and in fact it is very likely, that the hill being climbed is in no way the highest hill around, and when you get to the top, or near the top, there's no way to jump over onto another, higher hill to repeat the process, because, as you say, "quite often a 'fix' that improved the play in a single game we were examining would cause a drastic drop in Elo overall", which has the effect of keeping you on the same hill.

In fact, I think a case can be made that your ELO can continue to rise through the methodology used even when you have already reached the top of the (non-optimal) hill and there is nowhere higher to go.

zamar
Posts: 16
Joined: Thu Jun 10, 2010 1:28 am

Re: To kick off some technical discussions

Post by zamar » Fri Jun 11, 2010 6:02 pm

Chris Whittington wrote: there's no guarantee, and in fact it is very likely, that the hill being climbed is in no way the highest hill around, and when you get to the top, or near the top, there's no way to jump over onto another, higher hill to repeat the process, because, as you say, "quite often a 'fix' that improved the play in a single game we were examining would cause a drastic drop in Elo overall", which has the effect of keeping you on the same hill.

In fact, I think a case can be made that your ELO can continue to rise through the methodology used even when you have already reached the top of the (non-optimal) hill and there is nowhere higher to go.
It's easy to agree with you in theory. The practical problem is that there is no point in claiming "your testing method is sub-optimal" if you aren't able to propose a better testing method.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: To kick off some technical discussions

Post by hyatt » Sat Jun 12, 2010 2:36 am

Chris Whittington wrote:
The above just confirms the point ....

you are hill climbing, together with a bunch of relatively similar machines, all competing against the same metric (statistical win rate, aka ELO)

there's no guarantee, and in fact it is very likely, that the hill being climbed is in no way the highest hill around, and when you get to the top, or near the top, there's no way to jump over onto another, higher hill to repeat the process, because, as you say, "quite often a 'fix' that improved the play in a single game we were examining would cause a drastic drop in Elo overall", which has the effect of keeping you on the same hill.

In fact, I think a case can be made that your ELO can continue to rise through the methodology used even when you have already reached the top of the (non-optimal) hill and there is nowhere higher to go.
The alternative is witchcraft/voodoo/magic/etc. I personally believe that with a large set of opening positions, and opponents that are stronger than me, so long as I can close the gap I am getting better overall. This is a far sounder assumption than trying to look at a specific game, isolate a particular move, and adjust either the search or evaluation to choose a better move. Been there. Done that. Got the T-shirt. It is a flawed methodology.

As I mentioned, we tried a few of these early on, just to get a feel for what we _had_ been doing. I would see a game played on ICC where I could analyze and determine some point where a losing move was made. And with (sometimes) some GM help, we'd look at the good move vs the bad move and try to determine whether it was depth or knowledge that caused the error. And in the normal cases, after we came up with a fix that made us switch from the bad move to the good one, cluster testing would often show that the "fix" hurt overall. It is _very_ difficult to envision how a change in the evaluation for this position will affect all the other similar but subtly different positions we have to play through.

We often find new ideas by looking at individual games, but this is usually in the form of "we are just not evaluating this very well" or "we have no term that attempts to quantify this particular positional concept". But as we fix those things, we don't just use the game where we made a boo-boo; we play 30,000 games to make sure that it helps in more cases than it hurts. Which guarantees upward progress. I had way too many steps backward with Crafty prior to cluster-testing. Others seem to be doing the same, although with different approaches. Rybka apparently plays about 40,000 games at one second per game to tune things. I prefer to occasionally vary the time control to make sure that something that helps at fast games doesn't hurt at slow games.

But the point is that this is an objective mechanism, not a subjective one. Lots of "good ideas" have been tossed out because even though they sounded reasonable, we could not find any implementation that didn't hurt overall results. I like the idea of making a change, then running a quick test, and in an hour having a really good idea of whether the idea as implemented worked or not. If not, we try to figure out why, as on quite a few occasions the idea was good but the implementation had a bug. Humans think too highly of their subjective abilities. I've drifted away from that approach after proving over and over that my subjective opinion was quite a bit away from the real truth.

Is it possible to reach a local maximum? Of course. But we are not doing automated tuning; we are making changes and testing the resulting programs. Which means that, as humans, we can recognize a trend that needs attention and do something about it, even if it requires a complete rewrite of something such as king safety or pawn structure or whatever.

User avatar
thorstenczub
Posts: 593
Joined: Wed Jun 09, 2010 12:51 pm
Real Name: Thorsten Czub
Location: United States of Europe, germany, NRW, Lünen
Contact:

Re: To kick off some technical discussions

Post by thorstenczub » Sat Jun 12, 2010 9:09 am

zamar wrote:
Chris Whittington wrote: there's no guarantee, and in fact it is very likely, that the hill being climbed is in no way the highest hill around, and when you get to the top, or near the top, there's no way to jump over onto another, higher hill to repeat the process, because, as you say, "quite often a 'fix' that improved the play in a single game we were examining would cause a drastic drop in Elo overall", which has the effect of keeping you on the same hill.

In fact, I think a case can be made that your ELO can continue to rise through the methodology used even when you have already reached the top of the (non-optimal) hill and there is nowhere higher to go.
It's easy to agree with you in theory. The practical problem is that there is no point in claiming "your testing method is sub-optimal" if you aren't able to propose a better testing method.
I was recently involved in helping Don play out some engine-engine games on my computers.
Others did so too; Don invited people publicly to help him. The engines, all different
Komodo versions, played on my computers against each other, all done automatically.
The time control was 30 seconds for the whole game, with a Fischer bonus of 0.5 seconds.
E.g. on my quad-core, 8 different Komodo versions played 4 games.
This way a large number of games was "generated" easily.

The idea is that this large number of games makes it possible to
measure the ELO difference between these "similar" versions of Komodo,
to differentiate between good, worse and better versions.
A kind of evolution process: first mutation (done by the programmer's ideas and newly implemented stuff), then selection by fighting against each other; the result is that THE BEST CREATURE wins. So, evolution :-)

The engines played unattended, without operators watching the games.
Only the results were used.

The question for me was: does this incestuous testing really allow real progress?
In the old days of computer chess we generated games (at much longer time controls, of course)
and WATCHED the games, looking for errors.
Then the programmer tried to fix them, and the new engine went back onto the autoplayers.

Post Reply