Strange Stockfish behavior?

Code, algorithms, languages, construction...
User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 17, 2011 1:27 am

That change wouldn't do anything, what I would like is a patch that always resolves moves that fail high until their real score is known, in this case, I'd expect the change to find a5 always when it fails high, and thus, the version would find those moves without requiring user interaction (because, a5 is always better than Qf3, so what Stockfish is doing wrong is abandoning it).

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Sun Apr 17, 2011 2:33 am

Uly wrote:That change wouldn't do anything, what I would like is a patch that always resolves moves that fail high until their real score is known, in this case, I'd expect the change to find a5 always when it fails high, and thus, the version would find those moves without requiring user interaction (because, a5 is always better than Qf3, so what Stockfish is doing wrong is abandoning it).
You can't do what you are asking for. There are valid reasons for a fail high followed by a fail-low. That doesn't mean the move is bad, but it does mean you are not going to resolve the score this iteration. All you can do is take the fail-high move as best, and advance to the next iteration where you have a better chance of resolving the true score.

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 17, 2011 8:36 am

hyatt wrote:it does mean you are not going to resolve the score this iteration.
Why not? Other engines just restart the iteration, and resolve the fail low, and get a true score for the move that failed high-then low (though it's something that happens extremely rare anyway, I would have problems constructing a position that shows it, with Stockfish it was trivial), is this something Stockfish specific?

UncombedCoconut
Posts: 44
Joined: Thu Jun 10, 2010 1:43 am
Real Name: Justin Blanchard
Location: United States

Re: Strange Stockfish behavior?

Post by UncombedCoconut » Sun Apr 17, 2011 10:30 am

Uly wrote:That change wouldn't do anything, what I would like is a patch that always resolves moves that fail high until their real score is known, in this case, I'd expect the change to find a5 always when it fails high, and thus, the version would find those moves without requiring user interaction (because, a5 is always better than Qf3, so what Stockfish is doing wrong is abandoning it).
It won't do what you want, but it will help a user or a GUI track what the engine is thinking.
The patch does show where (in the code) one can put experimental treatments of moves that fail high and then fail low. But I basically agree with Bob that no patch can get quite the behavior you want, and (reluctantly!) agree with Marco and Joona that on average you'll get a stronger analysis by letting SF second-guess its fail-high moves.
I also feel like some terminology is being confused. In at least your position, there is no fail-low to resolve: the search that fails low is a widened-window search to decide the value of the move that failed high. The predicted value from the root position isn't changing. If you believe the subtree's fail-low, it just means that a move you'd expected to be bad actually is bad.

Ancalagon
Posts: 15
Joined: Sat Sep 11, 2010 1:24 pm

Re: Strange Stockfish behavior?

Post by Ancalagon » Sun Apr 17, 2011 12:31 pm

Uly wrote:
hyatt wrote:it does mean you are not going to resolve the score this iteration.
Why not? Other engines just restart the iteration, and resolve the fail low, and get a true score for the move that failed high-then low (though it's something that happens extremely rare anyway, I would have problems constructing a position that shows it, with Stockfish it was trivial), is this something Stockfish specific?
It is mainly a speed issue I think. Because a Fail Low or High is produced by a nullwindow search, the information is fast but limited, you only get an upper or lower limit on the score. It is a consequence of searching with very narrow windows (in Stockfish I mean) and also unstable lines where the moves tried can vary intentionally, to create a very little bit of Monte Carlo because you are looking at a very limited number of moves with Futility Pruning on, this compensates a bit for the general selectivity (of a nullwindow search). But Stockfish is not really tuned for great depths so, in my opinion, the nullwindow searches are relatively too unstable whenever they have to say something about a position to reach that is many plies away and the road to that position is selective in all those nodes in between. Hashtable saturation compounds matters. And there is a subjective effect because you expect lines to be more accurate when you have to wait longer for them :)

But okay, I agree with you Uly there are many ways that you could try to compensate for this general search instability effect, you just have to find something that is a bit elegant I think and fits with the minimal effort philosophy of Stockfish.
Resolving a score will always take time though, as Robert says..

Eelco

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Mon Apr 18, 2011 1:05 am

UncombedCoconut wrote:It won't do what you want, but it will help a user or a GUI track what the engine is thinking.
The user has seen the move that false failed high, and has to force it so Stockfish finds it's actually better. As long as Stockfish doesn't find the move on its own, the user isn't being helped.

This isn't a matter of speed, the current behavior is actually slower at finding such moves, and for the cases where the fail high is actually false, the user is wasting more time forcing the move unnecessarily to see if it's such a case. It's a loss-loss situation:

1.- Either the move that failed high and was abandoned is better, but Stockfish won't find it on its own.

2.-Or the move that failed high and was abandoned is worse, and the user will waste time investigating it.

That's why I think the current behavior is poor for analysis and interaction, though I guess my complain would go better on the "Designing an analysis friendly Stockfish?" thread (the current behavior is clearly unfriendly).

Richard Vida
Posts: 50
Joined: Thu Jun 10, 2010 12:48 am

Re: Strange Stockfish behavior?

Post by Richard Vida » Mon Apr 18, 2011 3:27 am

Uly wrote:
The user has seen the move that false failed high, and has to force it so Stockfish finds it's actually better. As long as Stockfish doesn't find the move on its own, the user isn't being helped.

This isn't a matter of speed, the current behavior is actually slower at finding such moves, and for the cases where the fail high is actually false, the user is wasting more time forcing the move unnecessarily to see if it's such a case. It's a loss-loss situation:

1.- Either the move that failed high and was abandoned is better, but Stockfish won't find it on its own.

2.-Or the move that failed high and was abandoned is worse, and the user will waste time investigating it.

That's why I think the current behavior is poor for analysis and interaction, though I guess my complain would go better on the "Designing an analysis friendly Stockfish?" thread (the current behavior is clearly unfriendly).
Hi Uly,

Sorry for interrupting this thread. But I think you can't have your cake and eat it too. The search instability you are trying to fight is a kind of built-in "feature" of SF. Get rid of instabilities - you will most probably end up with a _much_ weaker engine...

I understand that your main goal is SF's useability in interactive analysis, but be prepared that this "increased useability" is most probably (with only a "hackwork") achievable only at the cost of rather serious ELO decrease.

just my 2c

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Mon Apr 18, 2011 5:06 am

Uly wrote:
hyatt wrote:it does mean you are not going to resolve the score this iteration.
Why not? Other engines just restart the iteration, and resolve the fail low, and get a true score for the move that failed high-then low (though it's something that happens extremely rare anyway, I would have problems constructing a position that shows it, with Stockfish it was trivial), is this something Stockfish specific?

That is not always possible. The easiest case to understand is this,

Suppose your opponent takes forever to make a move, and you complete a 30 ply search while waiting.

He makes a different move and you start searching. When you get to depth 25 (or whatever) you find a hash entry left from that 30 ply search, that eventually causes you to fail high at the root. You open the window, which invalidates that hash entry, and you fail low. And you will _never_ get a true score for that move that reflects the fail high, until you get to depth=30 where you stored that hash entry in the first place.

Old hash entries with deep draft can cause this to happen easily. It can happen in other ways as well, but this is probably the easiest one to understand. You can't get a true score for every fail high (nor fail lows) because of this very reason...

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Mon Apr 18, 2011 10:46 am

hyatt wrote:When you get to depth 25 (or whatever) you find a hash entry left from that 30 ply search, that eventually causes you to fail high at the root. You open the window, which invalidates that hash entry, and you fail low. And you will _never_ get a true score for that move that reflects the fail high, until you get to depth=30 where you stored that hash entry in the first place.
I wouldn't expect a true score that reflects the fail high, but a true score that reflects the fail low after the fail high. For what I've seen in these cases is, that if Stockfish would not abandon the move, it would fail high again, or at least have a better score than the move Stockfish is currently going back to.

I am confused by this part:

>In at least your position, there is no fail-low to resolve: the search that fails low is a widened-window search to decide the value of the move that failed high.

~UncombedCoconut

So a5 can't get a true score at the same depth it fails high? Why can it do it at the same relative depth when the user forces the move?

A move at, say, depth 19 should have the same score as when you force it and reach depth 18. Conversely, a move that has a true score at depth 18 should have the same score at depth 19 if you backtrack the move.

What we have here:
18/21	 0:12 	-0.22 	2.Nf5 a4 3.h4 Bb4 4.Qg4 axb3 5.cxb3 Nc5 6.h5 Rh8 7.Rf2 Qd7 8.g6 hxg6 9.Qxg6 Kc8 10.h6 Nxd3 11.Rxd3 Kb8 12.Rg3 (9.501.819) 750
Is Stockfish giving a true score of -0.22 to the relative depth 19 of 1...a5, so, I don't really understand why can't Stockfish give that true score to a5 at depth 19, one move before, without needing the user to force the move.

UncombedCoconut
Posts: 44
Joined: Thu Jun 10, 2010 1:43 am
Real Name: Justin Blanchard
Location: United States

Re: Strange Stockfish behavior?

Post by UncombedCoconut » Tue Apr 19, 2011 12:49 am

Uly wrote:I wouldn't expect a true score that reflects the fail high, but a true score that reflects the fail low after the fail high. For what I've seen in these cases is, that if Stockfish would not abandon the move, it would fail high again, or at least have a better score than the move Stockfish is currently going back to.

I am confused by this part:

>In at least your position, there is no fail-low to resolve: the search that fails low is a widened-window search to decide the value of the move that failed high.

~UncombedCoconut

So a5 can't get a true score at the same depth it fails high? Why can it do it at the same relative depth when the user forces the move?
OK, let's make sure we're on the same page w.r.t. what SF is doing. I modified it to show the process in more detail in the search log. (Scores are from Black's PoV.)
It will also show the score and line that "reflects the fail low after the fail high", although it won't try to resolve the upper-bound to an exact score.

Code: Select all

Searching: r2k1r2/pp1n1q1p/1npP4/4p1P1/4P3/1PbB2NP/P1P1Q1RK/3R4 b - -
infinite: 0 ponder: 0 time: 0 increment: 0 moves to go: 0
 1     +0.35   00:01      152 Ke8 
 2     -0.27   00:01      196 Ke8 Nf5 
 2     +0.34   00:01      392 Qf3 Rf1 Qxe2 Rxf8+ Nxf8 Nxe2 
 3     -0.14   00:01      705 Qf3 Nf5 Qxe2 Bxe2 
 4     -0.12   00:01     1339 Qf3 Nf5 Qxe2 Bxe2 Nd5 
 5     +0.10   00:01     2080 Qf3 Nf5 Qxe2 Bxe2 Nc5 Bf3 
 6     +0.20   00:01     3717 Qf3 Qxf3 Rxf3 Nf5 Rf4 Kg3 Ke8 
 7  >  +0.44   00:01     6469 Qf3 Qxf3 Rxf3 Nf5 Nc5 Be2 Rf4 
Move has failed high: re-searching with window [  -0.04,   +0.69]
 7     +0.34   00:02     9714 Qf3 a3 Qxe2 Bxe2 Bb2 a4 Nc5 a5 
 7  >  +0.69   00:02    13266 Bb4 Rf1 Qe6 Rxf8+ Nxf8 
Move has failed high: re-searching with window [  +0.34,   +1.17]
Move's score resolved to a bad one:
 7  <  +0.33   00:02    15736 Bb4 Nf5 Qg6 Bc4 
Reverting to previous best move, with a higher score:
 7     +0.34   00:02    15736 Qf3 a3 Qxe2 Bxe2 Bb2 a4 Nc5 a5 
 8  >  +0.57   00:02    22583 Qf3 a3 Qxe2 Bxe2 a5 Nf5 Nc5 
Move has failed high: re-searching with window [  +0.12,   +0.79]
 8     +0.49   00:02    23804 Qf3 a3 Qxe2 Bxe2 a5 Bg4 a4 Ne2 Bd4 
 9     +0.39   00:02    37111 Qf3 a4 Qxe2 Bxe2 Nc5 Bg4 Ke8 Bf5 Rg8 Ne2 
10     +0.29   00:02    52604 Qf3 Qxf3 Rxf3 Be2 Rf4 Bg4 Bb4 Bf5 Nc8 Nh5 Rf3 Bg4 
11  <  +0.11   00:03    69919 Qf3 Qxf3 Rxf3 Be2 Rf4 Bg4 Bb4 Bf5 Nc8 Nh5 Rf3 Nf6 
                              Nxf6 gxf6 Nxd6 Rg8+ 
11     +0.11   00:03    98993 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Nh5 a6 Ng7 Nc5 Bf5 
                              Rf7 
12     +0.07   00:04   123997 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Nc8 c3 Bc5 b4 
                              Bxd6 Bxh7 
13     +0.00   00:05   224380 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
14     +0.00   00:06   287088 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
15     +0.00   00:07   426962 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
16     +0.00   00:11   833395 Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
17     +0.00   00:16    1493K Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
18     +0.00   00:23    2550K Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
19     +0.00   00:33    3886K Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 
19  >  +0.11   01:00    7460K a5 Nf5 a4 Qe3 axb3 cxb3 Bb4 Rc2 Qe6 Be2 Nd5 
Move has failed high: re-searching with window [  +0.00,   +0.22]
19  >  +0.22   01:14    9357K a5 Nf5 a4 Qe3 axb3 cxb3 Bb4 Rc2 Qe6 Be2 Nd5 
Move has failed high: re-searching with window [  +0.00,   +0.44]
Move's score resolved to a bad one:
19  <  +0.00   02:37   18990K a5 Nf5 a4 h4 axb3 axb3 Bb4 h5 Nc5 Qg4 Nxd3 Rxd3 
                              Kc8 g6 hxg6 hxg6 
Reverting to previous best move, with a higher score:
19     +0.00   02:37   18990K Qf3 Qxf3 Rxf3 Be2 Rf8 Bg4 Bb4 Bf5 Rf7 Be6 Rf8 Bf5 

Nodes: 19001735
Nodes/second: 120156
Best move: Qf3
Ponder move: Qxf3
Comments:
We have one search that says Bb4 is >=+0.69 and a later search that says Bb4 is <=+0.33, and we need to decide what to do with this contradictory information.

Well, we could ask SF to try to resolve it with another search, which sounds like what you want. What window should we use? Lowering alpha would be wasteful. We could try [+0.34, +0.69] as a tie-breaker. But I'd expect that to be biased toward the most recent search, since we did nothing but shrink the window. So most likely we fail low, and still abandon the move (for this iteration at least).

Maybe I'm misunderstanding, and what you mean is that the position's score should be <=+0.33 (TBD) and the best move at depth 7 is Bb4. That sounds fishy. To what score do we compare the remaining moves? What if another move fails high?

I think all programmers here would rather choose a priori which bound to believe, and not only for reasons of speed. Bob persuasively argues for trusting the earlier lower bound. Joona considers it a waste of information to neglect the most recent search. (To put it in your terms, the recent search really was a resolution of the true score of the move: it answered "worse than Qf3".) If Marco's tests showed an ELO change within error bars, we'd know which way is better most of the time.

(Another comment: SF at 120Kn/s is not fun; never leave a laptop unlocked anywhere outside your house. :()
Uly wrote: A move at, say, depth 19 should have the same score as when you force it and reach depth 18.
This is only true in simple programs. SF will have different hash entries, and it can search a different tree depending on the aspiration window.
Uly wrote: Conversely, a move that has a true score at depth 18 should have the same score at depth 19 if you backtrack the move.
I think you're right, for as long as the "exact" TT entry survives.

Post Reply