Strange Stockfish behavior?

Uly · Post by **Uly** » Mon Mar 28, 2011 4:12 pm

I can't post the position, but here's what is happening:

 25/10	 3:40 	-0.78--	31.Rg2 a5 32.b3
 25/03	 5:15 	-0.56++	31.Qe3 Nc4 32.Qc1 
 25/04	 7:17 	-0.89--	31.Rg2 a5 32.b3

Qe3 looks like a false fail high. Rg2 ended with a -0.82 score. Then the user forces Qe3 and:

24/23	 0:55 	-0.79 	31...Rf8 32.b3

So Qe3 was indeed better: a false false fail high (the fail high looked false but was real).

I commented to Jeremy that I dislike how Stockfish handles fail lows. Instead of resolving an exact score when a fail low is hit, Stockfish looks through all the alternative moves for one that goes over the margin, and only after finding nothing does it return to the mainline. This behavior was introduced by some Rybka version, and engines like Naum and Critter have since copied it.

It's at this stage that the false false fail high occurs. If Stockfish behaved differently, 31.Rg2 would have been scored -0.82 before 31.Qe3 was examined, Qe3 would have beaten the margin (which would have been -0.82 instead of -0.78), and Stockfish would have ended up with the better move without requiring user interaction.

Jeremy Bernstein · Post by **Jeremy Bernstein** » Mon Mar 28, 2011 4:17 pm

I spent a little time looking at this today, but didn't figure out the problem yet. I did run into an assertion, though, so I'm looking into that, as well... Will keep you posted.

Jeremy Bernstein · Post by **Jeremy Bernstein** » Tue Mar 29, 2011 7:02 pm

OK, the issue appears to be with the razoring code in search(). There is even a comment there:

// Logically we should return (v + razor_margin(depth)), but
// surprisingly this did slightly weaker in tests.

This "departure from logic" a) causes a couple of assertions to fire in a debug build (in particular the two "assert(tte->static_value() != VALUE_NONE)" in search() and qsearch()) and b) leads to the strange fail high/fail low behavior that Uly was observing. If I change the line to read

return (v + razor_margin(depth)); // was previously returning 'v'

the assertions disappear, and the engine no longer throws out unlikely high scores for low-score moves, as far as I can tell in my limited testing. I tend to think that the strength of an engine should be based on deliberately functioning code, rather than buggy luck, so I'll probably check this code into the PA_GTB repository and release some builds for testing in the next 24 hours, if anyone is interested.

I am not by any means an experienced engine developer, and I freely admit that I might be misunderstanding the significance of this code. However, it does lead to assertions and is clearly not correct.

Jeremy

Jeremy Bernstein · Post by **Jeremy Bernstein** » Tue Mar 29, 2011 7:14 pm

Spoke too soon. 10 minutes later, I got the assertion. Crap. However, this does appear to solve the fail high/low issue. The assertion code can be easily rewritten to "do the right thing" if there is no static_value.

jb

Uly · Post by **Uly** » Wed Mar 30, 2011 3:13 am

Thanks Jeremy.

Jeremy Bernstein wrote:and the engine no longer throws out unlikely high scores for low-score moves

Wait, what I was experiencing was that those high scores were correct if the user forced the move. I hope to have time soon to create a test case that is reproducible on 1 CPU, from a position that is not from my ongoing games (I really hit a brick wall here and couldn't give useful feedback). Basically, one would like Stockfish to find the move that was best at the end of a given iteration, instead of showing a fail high and then abandoning the move, or not considering the move at all.

Jeremy Bernstein · Post by **Jeremy Bernstein** » Wed Mar 30, 2011 7:58 am

If I now force the move, the resulting score is comparable to the analysis of the move at the previous depth. I'll send you a build via PM so you can test it yourself, though.

mcostalba · Post by **mcostalba** » Wed Mar 30, 2011 8:07 am

hyatt wrote:This is a particular form of bug fixing called "treat the symptom rather than treating the bug".

false fail-highs should not occur...

I have written a patch to possibly fix this... will test in the next few days.

zamar · Post by **zamar** » Wed Mar 30, 2011 10:56 am

hyatt wrote:How can it be correct to get a fail high and then not play that move??? If you fail high and it is not a better move, that's a bug.

Haven't you ever heard of search instability???

This is exactly what happens when you combine a) a minimal aspiration window, b) aggressive late move pruning close to the leaves based on the history heuristic, and c) very aggressive LMR based on history-heuristic move ordering.

The result may not look pretty, and scores go all over the place, but despite all this, it's the recipe that works best for us. Fixing "the bug" would make SF dozens of elo points weaker.

Of course we should inform the user of false fail-highs; we have just been too lazy to fix this.

hyatt · Post by **hyatt** » Wed Mar 30, 2011 4:23 pm

"Search instability" does not mean that you should ignore fail highs. If you fail high incorrectly, you have a bug. Plain and simple. Fail high followed by fail low is not uncommon and I've explained the deep-draft trans/ref entry issue as one well-known cause. But if your pruning is causing you to fail high when you should not, or to not fail high when you should, then it would seem more reasonable to fix that problem rather than just ignoring the condition, which makes a _really_ bad assumption. Namely that on a fail-high followed by a fail-low, you believe the fail low is correct every time. Bogus concept.

Yes, one might miss a fail high because of a reduction or pruning. That's the risk, and you choose to accept it when you do the pruning/reduction stuff. But when you first fail high, and then fail low, and you somehow assume that the fail low is correct, that makes no sense at all. And, in fact, most of the time the fail high is probably correct unless the program is broken... One does not want to ignore information a search gleans, just because it fails low on a re-search, where changing the alpha/beta window changes a _lot_ of things.

As far as "dozens of elo points weaker" goes, pure guesswork. I said "fix it". Not "break it further". There is a difference.

Or you can throw out the baby with the bath water and miss the elo that you are currently giving away. Not that I want to see SF get any stronger.

But it obviously will once this is done right.

Uly · Post by **Uly** » Wed Mar 30, 2011 4:35 pm

hyatt wrote:And, in fact, most of the time the fail high is probably correct unless the program is broken...

Yes, this is what I've been experiencing. Though I'm using it for analysis, the developers use it for games (where apparently, the "bug" brings elo).

Will try Jeremy's version ASAP.
