Strange Stockfish behavior?

Code, algorithms, languages, construction...
zamar
Posts: 16
Joined: Thu Jun 10, 2010 1:28 am

Re: Strange Stockfish behavior?

Post by zamar » Wed Mar 30, 2011 6:02 pm

hyatt wrote:
Namely that on a fail-high followed by a fail-low, you believe the fail low is correct every time. Bogus concept.
And you are suggesting that we should believe fail-high instead... I wonder what makes that less bogus.

And what should we do when facing a fail-low + fail-high combo?

Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Wed Mar 30, 2011 8:03 pm

zamar wrote:And what should we do when facing a fail-low + fail-high combo?
What about resolving the real score of the move? That's what the user does when he forces the move that failed high and it turns out to be better.

zamar
Posts: 16
Joined: Thu Jun 10, 2010 1:28 am

Re: Strange Stockfish behavior?

Post by zamar » Wed Mar 30, 2011 10:30 pm

Uly wrote:
zamar wrote:And what should we do when facing a fail-low + fail-high combo?
What about resolving the real score of the move?
There is no such thing as the "real score of the move". Just take any very strong engine and a complex position, turn on SMP mode, and do a fixed-depth search many times. You get a different score each time (sometimes even a different bestmove!).

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Strange Stockfish behavior?

Post by hyatt » Thu Mar 31, 2011 4:12 am

There are two possible cases here.

If you get a fail high first, and then a fail low, I would trust the fail high. It is almost always caused by a TT entry that is a bound, left over from a deeper search. You get the fail high, come back with the re-search with a higher beta value, and now that TT entry can't cause the cutoff; you can't search deep enough to see the "good thing", so you end up returning alpha. The fail-high move is a good move.

If you first fail low, I always buy that and move on. If you want to resolve the score for that move and you re-search, you lower alpha, and you might not get a good score, and instead get a beta score and a fail high, for the inverse of the reason given above. Either don't do the re-search and see if you can find a better move in the rest of the list, or, if you do the re-search, accept the fail high, relax beta, and try yet again...

Throwing information away is not the answer. It is the question. No is the answer.

Why is trusting the first fail high over the following fail low not just as bogus? See the explanation above about the TT LOWER entry that causes the fail high but can't help us resolve the real score.
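
For concreteness, here is a minimal sketch of that policy, assuming a plain fail-soft negamax search() and illustrative helper names (do_move, undo_move, MARGIN). None of this is any particular engine's actual code:

Code: Select all

struct Position {};
using Move = int;
int  search(Position&, int alpha, int beta, int depth); // engine's negamax, assumed
void do_move(Position&, Move);                          // assumed engine primitives
void undo_move(Position&, Move);
constexpr int MARGIN = 100;    // illustrative widening, in centipawns

// The fail high is trusted even when the re-search then fails low: the
// original cutoff usually comes from a TT bound left by a deeper search,
// so the move is good even though the re-search cannot prove the score.
int score_root_move(Position& pos, Move m, int alpha, int beta, int depth) {
    do_move(pos, m);
    int score = -search(pos, -beta, -alpha, depth - 1);
    undo_move(pos, m);
    if (score < beta)
        return score;              // exact score or fail low: accept it

    do_move(pos, m);               // failed high: re-search with relaxed beta
    int rescore = -search(pos, -(beta + MARGIN), -alpha, depth - 1);
    undo_move(pos, m);
    return rescore > alpha ? rescore   // re-search resolved a real score
                           : score;    // failed low again: keep the fail high
}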
Last edited by hyatt on Thu Mar 31, 2011 4:18 am, edited 1 time in total.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Strange Stockfish behavior?

Post by hyatt » Thu Mar 31, 2011 4:15 am

zamar wrote:
Uly wrote:
zamar wrote:And what should we do when facing a fail-low + fail-high combo?
What about resolving the real score of the move?
There is no such thing as the "real score of the move". Just take any very strong engine and a complex position, turn on SMP mode, and do a fixed-depth search many times. You get a different score each time (sometimes even a different bestmove!).

If you take "real score" to be what it really is in an SMP search, it is a real value but it has a normal distribution scattered around the single-cpu score. Which is OK.

There are lots of ways to implement the PVS-at-the-root idea. Belle used to remember the first fail high, but do nothing unless another move failed high. Then it would re-search the first move, get a real score, and then re-search the second with a null window with alpha = the new score, to see whether the second move failed high or failed low. If low, it was discarded as inferior to the first. If high, it was remembered, but not resolved, unless _another_ move failed high at that depth. The only problem is that, much of the time, one move fails high and the others all fail low, so the score is never resolved and there is no move to ponder.
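
A sketch of how that Belle-style bookkeeping could look, reconstructed from the description above (not Belle's actual code), again assuming a fail-soft negamax search() and illustrative names:

Code: Select all

#include <cstddef>
#include <vector>

struct Position {};
using Move = int;
int  search(Position&, int alpha, int beta, int depth); // engine's negamax, assumed
void do_move(Position&, Move);                          // assumed engine primitives
void undo_move(Position&, Move);
constexpr int INF = 32000;

int null_window(Position& pos, Move m, int a, int depth) {
    do_move(pos, m);
    int s = -search(pos, -(a + 1), -a, depth - 1);      // window (a, a+1)
    undo_move(pos, m);
    return s;
}

// A fail-high move is only resolved to an exact score when a *second*
// move also fails high at the same depth.
Move belle_root(Position& pos, const std::vector<Move>& moves, int depth) {
    Move best = moves[0];
    do_move(pos, best);
    int best_score = -search(pos, -INF, INF, depth - 1); // real score for move 1
    undo_move(pos, best);
    bool resolved = true;

    for (std::size_t i = 1; i < moves.size(); ++i) {
        int s = null_window(pos, moves[i], best_score, depth);
        if (s <= best_score) continue;        // fail low: inferior, discard
        if (!resolved) {
            // A second fail high: resolve the remembered move first...
            do_move(pos, best);
            best_score = -search(pos, -INF, -best_score, depth - 1);
            undo_move(pos, best);
            resolved = true;
            // ...then re-test this move against the resolved score.
            if (null_window(pos, moves[i], best_score, depth) <= best_score)
                continue;                     // now fails low: discard
        }
        best = moves[i];                      // remember the fail high,
        resolved = false;                     // but do not resolve it yet
    }
    return best;                              // may still be unresolved
}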

Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Thu Mar 31, 2011 8:59 am

For what it's worth, I've seen how Rybka and Naum behave in these situations. Here is an example:

Depth 19 - 19.d4 is best move with score 0.60
Depth 20 starts - 19.d4 fails low as 0.45--
Engine starts analyzing the alternative moves
19.Nf3 is discarded
19.c4 is discarded
19.e4 fails high as 0.75++

Now what Rybka and Naum do is order this move first. That is, the GUI was showing "4/16" (e4 is the 4th best move out of 16) before the fail high happens, and afterwards it shows "1/16" (e4 becomes the new best move out of 16). Then the score of e4 is resolved, just as if depth 19 had ended with e4 as the best move.

This behavior seems satisfactory.
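
A sketch of that promotion step, with illustrative names (promote_and_resolve, root_moves) and the same assumed fail-soft negamax search(); this is not Rybka's or Naum's actual code:

Code: Select all

#include <algorithm>
#include <cstddef>
#include <vector>

struct Position {};
using Move = int;
int  search(Position&, int alpha, int beta, int depth); // engine's negamax, assumed
void do_move(Position&, Move);                          // assumed engine primitives
void undo_move(Position&, Move);
constexpr int INF = 32000;

// When an alternative root move fails high ("4/16"), promote it to the
// front of the root list ("1/16") and resolve its real score, just as if
// the previous iteration had ended with it as the best move.
int promote_and_resolve(Position& pos, std::vector<Move>& root_moves,
                        std::size_t idx, int fail_high_bound, int depth) {
    std::rotate(root_moves.begin(), root_moves.begin() + idx,
                root_moves.begin() + idx + 1);          // move idx to the front
    do_move(pos, root_moves[0]);
    int score = -search(pos, -INF, -fail_high_bound, depth - 1); // (bound, +INF)
    undo_move(pos, root_moves[0]);
    return score;
}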

Rein Halbersma
Posts: 5
Joined: Mon Jun 14, 2010 9:40 am

Re: Strange Stockfish behavior?

Post by Rein Halbersma » Thu Mar 31, 2011 10:21 am

hyatt wrote:There are two possible cases here.

If you get a fail high first, and then a fail low, I would trust the fail high. It is almost always caused by a TT entry that is a bound, left over from a deeper search. You get the fail high, come back with the re-search with a higher beta value, and now that TT entry can't cause the cutoff; you can't search deep enough to see the "good thing", so you end up returning alpha. The fail-high move is a good move.

If you first fail low, I always buy that and move on. If you want to resolve the score for that move and you re-search, you lower alpha, and you might not get a good score, and instead get a beta score and a fail high, for the inverse of the reason given above. Either don't do the re-search and see if you can find a better move in the rest of the list, or, if you do the re-search, accept the fail high, relax beta, and try yet again...

Throwing information away is not the answer. It is the question. No is the answer.

Why is trusting the first fail high over the following fail low not just as bogus? See the explanation above about the TT LOWER entry that causes the fail high but can't help us resolve the real score.
There could be another explanation for search instability besides TT overwrites: aggressive pruning, be it LMR, razoring, or what have you. Suppose you have found move X to be the best for a few iterations now and are trying to resolve its value one more time. All opponent replies to X that appear "desperate" at shallow depths, in the sense that the static eval is below -beta, will be pruned or heavily reduced. Such moves could be sacrifices, or otherwise positionally inferior-looking moves, but they could have a very deep tactical justification. This will only appear when beta is high enough (i.e. -beta low enough, so that the situation for the opponent is itself "desperate" enough). Then move X will actually fail low. That might not make move X a losing move, but at least move X won't give you the advantage you thought it would. In any case, there is ample reason to continue searching and not blindly play move X.
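
To illustrate the kind of rule being described (the margin and names are made up, not taken from any engine):

Code: Select all

struct Position {};
int static_eval(const Position&);   // engine's static evaluation, assumed

// A reply whose static eval sits far below the node's alpha (which is
// -beta from the root's point of view) looks "desperate" and is pruned
// or heavily reduced, so a sacrifice with a deep tactical justification
// stays invisible until beta rises enough for this test to stop firing,
// at which point the root move suddenly fails low.
bool looks_desperate(const Position& pos, int alpha, int depth) {
    const int margin = 300 * depth;   // made-up margin, in centipawns
    return static_eval(pos) + margin <= alpha;
}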

zamar
Posts: 16
Joined: Thu Jun 10, 2010 1:28 am

Re: Strange Stockfish behavior?

Post by zamar » Thu Mar 31, 2011 10:29 am

Uly wrote:For what it's worth, I've seen how Rybka and Naum behave in these situations. Here is an example:

Depth 19 - 19.d4 is best move with score 0.60
Depth 20 starts - 19.d4 fails low as 0.45--
Engine starts analyzing the alternative moves
19.Nf3 is discarded
19.c4 is discarded
19.e4 fails high as 0.75++

Now what Rybka and Naum do is order this move first. That is, the GUI was showing "4/16" (e4 is the 4th best move out of 16) before the fail high happens, and afterwards it shows "1/16" (e4 becomes the new best move out of 16). Then the score of e4 is resolved, just as if depth 19 had ended with e4 as the best move.
SF does practically the same. Only the output to the GUI is different. The question we are now discussing is: what should one do when the "resolved score" fails low?

Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Thu Mar 31, 2011 1:59 pm

zamar wrote:SF does practically the same. Only the output to the GUI is different. The question we are now discussing is: what should one do when the "resolved score" fails low?
In this example, the resolved score won't fail low unless it's below 0.45 (I've actually never seen a 0.75++ fail high with Rybka or Naum fail that badly).

But the main difference in behavior here is that if 19.e4 then fails low, the engine goes on to search all the other alternative moves; it doesn't magically make 19.d4 the main move again the way Stockfish does.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Strange Stockfish behavior?

Post by hyatt » Thu Mar 31, 2011 6:14 pm

Rein Halbersma wrote:
hyatt wrote:There are two possible cases here.

If you get a fail high first, and then a fail low, I would trust the fail high. It is almost always caused by a TT entry that is a bound, left over from a deeper search. You get the fail high, come back with the re-search with a higher beta value, and now that TT entry can't cause the cutoff; you can't search deep enough to see the "good thing", so you end up returning alpha. The fail-high move is a good move.

If you first fail low, I always buy that and move on. If you want to resolve the score for that move and you re-search, you lower alpha, and you might not get a good score, and instead get a beta score and a fail high, for the inverse of the reason given above. Either don't do the re-search and see if you can find a better move in the rest of the list, or, if you do the re-search, accept the fail high, relax beta, and try yet again...

Throwing information away is not the answer. It is the question. No is the answer.

Why is trusting the first fail high over the following fail low not just as bogus? See the explanation above about the TT LOWER entry that causes the fail high but can't help us resolve the real score.
There could be another explanation for search instability besides TT overwrites: aggressive pruning, be it LMR, razoring, or what have you. Suppose you have found move X to be the best for a few iterations now and are trying to resolve its value one more time. All opponent replies to X that appear "desperate" at shallow depths, in the sense that the static eval is below -beta, will be pruned or heavily reduced. Such moves could be sacrifices, or otherwise positionally inferior-looking moves, but they could have a very deep tactical justification. This will only appear when beta is high enough (i.e. -beta low enough, so that the situation for the opponent is itself "desperate" enough). Then move X will actually fail low. That might not make move X a losing move, but at least move X won't give you the advantage you thought it would. In any case, there is ample reason to continue searching and not blindly play move X.
Then perhaps "static eval" is not a good way to prune/reduce moves in the first place?

And nobody is suggesting that you "blindly play move X". If, at the start of depth N, move A is best, and then B fails high but fails low on the re-search, B is the move to play, unless C, D, ... fail high, in which case _they_ become the best move. But I would _never_ reject a fail-high move. To do so rejects the entire idea behind alpha/beta. If some search idea breaks that, then the idea is not sound.
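
A minimal sketch of that bookkeeping, with illustrative names and the same assumed fail-soft negamax search() as in the earlier sketches:

Code: Select all

#include <vector>

struct Position {};
using Move = int;
int  search(Position&, int alpha, int beta, int depth); // engine's negamax, assumed
void do_move(Position&, Move);                          // assumed engine primitives
void undo_move(Position&, Move);

// B fails high: B becomes the move to play, and a later fail low on its
// re-search never demotes it.  Only another fail high (C, D, ...) or an
// exact improvement replaces it.
Move pick_root_move(Position& pos, const std::vector<Move>& root_moves,
                    int alpha, int beta, int depth) {
    Move to_play = root_moves[0];          // move A, best from the last iteration
    for (Move m : root_moves) {
        do_move(pos, m);
        int score = -search(pos, -beta, -alpha, depth - 1);
        undo_move(pos, m);
        if (score >= beta)
            to_play = m;                   // fail high: new move to play
        else if (score > alpha) {
            to_play = m;                   // exact score: new best
            alpha = score;
        }
    }
    return to_play;
}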
