Strange Stockfish behavior?

Code, algorithms, languages, construction...
Post Reply
hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Fri Apr 01, 2011 10:16 pm

zamar wrote:
Then as I said, one should simply take each fail high position in the tree, and re-search with a relaxed beta bound, to see how many fail low the second time around. Then you _know_ how incorrect this assumption is.
Usually there is just no reason to do this. Root is an exception. We need to know the exact score of the move, so we need to resolve the fail-high and sometimes it just happens that while resolving we get fail-low. And you suggest that we should just ignore that completely free extra information that move could be very bad and just trust on previous fail-high. Sound very bogus for me...
Sorry. If it sounds bogus, you simply don't understand what is going on... Handle it however you like. It is not a huge elo gain, but it will lose you an extra game here and there when you miss a fail high that you should have found... Far be it from me to tell you how to write your code, I am just telling you that what you are doing is theoretically (and practically) unsound...


If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Of course fail-high is very good in majority of cases. Like having two aces is very good thing in Poker _in_majority_of_cases. But fail-low (after fail-high) is a signal that your opponent might have three kings
Or, more likely, an indication that a fail-high TT entry (that helped you find the fail high) is no longer valid after you change the beta bound, and your search can't see deeply enough to find the better move. Or that during the re-search, you overwrote something that would have helped you get a real score, had you kept it in the TT.

Etc...


This is yet another of those ideas that says you should treat the root differently from the rest of the tree.
We must resolve fail-highs at root, even Crafty does that. Resolving gives us extra information. For optimum decision, you want to use all available information.
My point exactly. The fail high is information you are throwing away.
To which I would always answer "why?" And I always wait for a sound theoretical explanation, or a sound experimental explanation. And all I usually get is "sound" (as in noise). :)
beep beep beep

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Fri Apr 01, 2011 10:22 pm

mcostalba wrote:
If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Ok, here is the error in the argumentation.

You are talking of the general case, but you should focus on the small subset of cases where a fail high is strictly followed by a fail low. In the majority of cases fails high are good...simply because there is no fail low following ;-)
That is so badly flawed it is almost not worth commenting on, but here goes, anyway. Why don't you get fail lows at other fail highs? Because you stop the search immediately. At the root there is absolutely no requirement you resolve the score on a fail high. Belle worked like that, as did early versions of Cray Blitz when PVS was first mentioned, starting around 1978.

So you don't test fail highs anywhere else, except at the root, and you conclude that _only_ for the root, where you do test the fail-high with a re-search and it fails low, you should ignore it? :)

First, try the experiment. I did, already. "test" those search fail highs that are not at the root, just for fun, and count how many fail low vs how many produce a good score. Yes it will be way slower. But then you get _real_ data and you will see that some of those fail highs will _also_ fail low. You just don't normally notice because you don't do the re-search except at the root.

Again, why is the root different? It is a flawed idea...


Please let me add that approaching a case with the gross hammer of general principles makes you blind to details that are instead what really matters, what makes the difference between an idea that works and an useless one.
I agree 100%. But I am not blind to the details. I have actually _tested_ this, as I previously mentioned, tracking down an interesting and subtle bug years ago...

mcostalba
Posts: 91
Joined: Thu Jun 10, 2010 11:45 pm
Real Name: Marco Costalba

Re: Strange Stockfish behavior?

Post by mcostalba » Sat Apr 02, 2011 9:32 am

hyatt wrote:
mcostalba wrote:
If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Ok, here is the error in the argumentation.

You are talking of the general case, but you should focus on the small subset of cases where a fail high is strictly followed by a fail low. In the majority of cases fails high are good...simply because there is no fail low following ;-)
That is so badly flawed it is almost not worth commenting on, but here goes, anyway. Why don't you get fail lows at other fail highs? Because you stop the search immediately.
I never mentioned other nodes apart from root. Did I ? You are introducing a new stuff that just blurs the discussion. Let's keep focused: we are talking of root nodes.

I have said that most of fail high in root nodes do not experience a strictly following fail low. So your statement:"fails high are good more then not" should be tested not against all the fail high that occurs in root nodes but only in the subset of fail high (at root node) strictly followed by a fail low.

I think this time you really cannot escape ;-) but actually you had already surprised me in the past many times with an incredible ability to steer the argumentations where it is more convenient for you :-)

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Sat Apr 02, 2011 4:30 pm

Did you not say "we only do this at the root?" That certainly, by direct implication, mentions "non-root moves".

If you read what I wrote, I was implicitly addressinng fail-highs followed by fail-lows since that is the topic of the thread.

What I suggested doing was to expand the test into the tree. At every fail high, rather than terminating the search, test to see how often you get a fail-low if you increase beta and re-search. It will happen. No more frequently than when you get a fail-high/fail-low at the root, but no less either. yet you take that fail-high inside the search every time it happens, no questions asked, no verification to determine if it will fail low. Why inside the tree but not at the root?

If you have a good way to actually measure the accuracy of the fail-high/fail-low cases, by doing something _really_ deep on the original and the fail-high move to see which is better, then run it. I don't see any theoretically sound way of doing that.

Again, if you fail high and are wrong, you have a bug, plain and simple. And that bug is affecting you throughout the tree, not just at the root. It just so happens that you _see_ it happening at the root.

I don't know of any other program that ignores a fail-high at the root, regardless of what happens on the re-search. There's a reason...

There is nothing to "escape" from here. If you want to use an unsound idea, that's your choice. Others have told you this is broken by observation, I have told you it is broken because I actually tested it a few years back chasing just such a bug that was called by a subtle bug in null-move searches. Run whatever tests you want. But at least _do_ run some tests. And not on a few positions. I suppose I could run the test myself, again, but doing just what you do. But I already know the result, although I don't remember the exact number in terms of Elo. It is small. It is non-zero. Take it or leave it...

mcostalba
Posts: 91
Joined: Thu Jun 10, 2010 11:45 pm
Real Name: Marco Costalba

Re: Strange Stockfish behavior?

Post by mcostalba » Sat Apr 02, 2011 7:51 pm

hyatt wrote: If you have a good way to actually measure the accuracy of the fail-high/fail-low cases, by doing something _really_ deep on the original and the fail-high move to see which is better, then run it. I don't see any theoretically sound way of doing that.

Again, if you fail high and are wrong, you have a bug, plain and simple. And that bug is affecting you throughout the tree, not just at the root. It just so happens that you _see_ it happening at the root.
If we accept that a fail high is good, or at least, it has to be considered good, then why do we research in aspiration window after a fail high ? We should just research only after a fail low, instead after a fail high we could just move to the next iteration....but it is not like this that aspiration window works.

BTW I _have_ tested, please see some post before, always in this thread.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Sun Apr 03, 2011 5:38 am

Again, and I do mean "again". There is _nothing_ that says you must re-search after the first fail high at the root. Original PVS as implemented in cray-blitz/blitz/belle did not re-search unless a _second_ move failed high. Now you have two moves and you have to research them to resolve which is best. Simple approach was to re-search first fail-high move to get a real score, then re-search second fail-high move on a null-window. If it fails low, the first fail-high move was better. If the second fails high, it is better although you don't know by how much. You then wait until you get _another_ fail-high move before you try to resolve the second, and you repeat this. the last fail-high move was never resolved...

The only problem is there is no PV. And unless you have some tricks up your sleeve (I try the PV, if there is none, which can happen on the fail-high/fail-low case, I try the hash move, if there is none, I do a short search for the opponent's side to find a move to ponder. Else you sit and wait.

There are other possible benefits to resolving the fail-high when you get it. At the next iteration, the move might legitimately fail low due to a deeper tactic. It is nice to know, when a move fails low, how much the score drops, so that you have some idea of how much extra time you are willing to commit to finding a better move...

Your last comment makes no sense (after the fail-high we could just go to the next iteration) unless you mean what I explained above. On a single fail-high, you are absolutely correct that you don't have to re-search if you don't want to. In Cray Blitz, if the first move failed high, and then failed low, I just kept the old PV from the previous search. If a later move failed high and then failed low, I just could not use the PV to order the next iteration since I didn't have one. I eventually changed to the "resolve at the fail-high" for several reasons..

(1) you just failed high. The TT is loaded with move ordering information from that search. Doing it again _now_ gives us the best chance to maximize move ordering accuracy before those entries get overwritten by searching other moves. (2) I always want a score, so that when I fail low on the first move at the next iteration, I need to know how far the score dropped to adjust how much time I was willing to commit to finding a better move; (3) I want a good move to ponder. (4) I want a good PV to order the next iteration...

Whether those are all worth the overhead is not something I have accurate data on. If you don't re-search, you save a lot of search space. But then you have less accurate ordering info for the next iteration (no PV), nor do you have an accurate idea of the actual score so that an aspiration window will be as high as possible...

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 1:03 am

mcostalba wrote:After 11068 games 1884 - 1934 - 7250 ELO -1 (+- 3.9)
When I created this thread, I didn't have elo in mind, but improving analysis friendliness like in the other thread.

Currently:

IF - A move fails high, and then fails low, but it is best, Stockfish will not find it unless the user forces the move.

This causes the user to either miss the move or have to interact, if the solution costs 1 elo then it's very reasonable! It would be a shame if a future version of Stockfish didn't show the fail high until it stood up, as then the user has nothing to force, and miss the move.

Anyway, I'm back, I'm going to try to find a position where the issue is reproducible at 1CPU and compare it with the version Jeremy sent me.

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 1:07 am

BB+ wrote:Is this a fair description?
1. At depth 20, move X is best with score 0.55.
2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35.
3. Move Y then fails high with score above 0.75. The aspiration window is then changed to [0.45,1.05] or something.
4. Move Y now "fails low" with a score less than 0.45, due to search instability, hash interaction, or other assorted general nonsense.
5. And all the other moves X,W,V,U,... also score less than 0.45, so there is a general "fail low" at the root node.
The part "2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35" is not necessary, I've seen instances when a move fails high and then is abandoned, even if the main move didn't fail low, and the abandoned move is best at same relative depth when forced (though this is really rare, I think I've seen 2 cases...)

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 2:31 am

Turns out constructing a position that shows the problem in all its infamy isn't that hard:

r2k1r2/pp1n1q1p/1npP4/4p1P1/4P3/1PbB2NP/P1P1Q1RK/3R4 b - -

(Black to move)

Gran2k at 1CPU (so the result is reproducible):
.14/12	 0:00 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (287.084) 680
 15/12	 0:00 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (426.958) 718
 16/12	 0:01 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (833.391) 750
 17/12	 0:01 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (1.493.995) 758
 18/12	 0:03 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (2.550.949) 769
 19/12	 0:04 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (3.886.852) 779
 19/11	 0:09 	-0.11++	1...a5 2.Nf5 a4 3.Qe3 axb3 4.cxb3 Bb4 5.Rc2 Qe6 6.Be2 Nd5 (7.460.300) 763
 19/11	 0:12 	-0.22++	1...a5 2.Nf5 a4 3.Qe3 axb3 4.cxb3 Bb4 5.Rc2 Qe6 6.Be2 Nd5 (9.357.355) 761
 20/12	 0:26 	 0.00 	1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Bb4 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 (20.495.558) 763
a5 pop ups with a fail high at depth 19, a double fail high at that, even though Qf3 didn't fail low, a5 is abandoned.

Now, the user forces a5:
.14/17	 0:01 	-0.20 	2.Nf5 a4 3.Qf2 axb3 4.axb3 Kc8 5.Be2 Ra2 6.Bg4 Kb8 7.Qg3 Bb2 8.Qe3 Ba3 9.Qc3 Qg6 10.Be2 (1.095.226) 730
 15/19	 0:02 	-0.26 	2.Nf5 a4 3.Qf2 Kc8 4.Be2 Kb8 5.Qg3 Bb2 6.Rb1 axb3 7.axb3 Ra2 8.Rf1 Qe6 9.Qe3 Ba3 10.Qd3 Bb4 11.c3 (1.625.542) 732
 16/19	 0:03 	-0.26 	2.Nf5 a4 3.Qf2 Kc8 4.Be2 Kb8 5.Qg3 Bb2 6.Rb1 axb3 7.axb3 Ra2 8.Rf1 Qe6 9.Qe3 Ba3 10.Qd3 Bb4 11.c3 (2.419.554) 740
 16/23	 0:04 	-0.08++	2.a4 Qf3 3.Qxf3 Rxf3 4.Be2 Re3 5.Rf2 Nc5 6.Rf8+ Kd7 7.Rf7+ Ke6 8.Rxh7 Nxe4 9.Rh6+ Kf7 10.Bh5+ Kg8 11.Nxe4 Rxe4 12.Rg6+ Kf7 13.Rf6+ (3.153.167) 741
 17/19	 0:09 	-0.35 	2.Nf5 a4 3.h4 Bb4 4.Qg4 axb3 5.cxb3 Nc5 6.h5 Rh8 7.Rf2 Qd7 8.g6 hxg6 9.Qxg6 Kc8 10.h6 Nxd3 11.Rxd3 (6.957.996) 743
 18/15	 0:11 	-0.23++	2.Nf5 a4 3.h4 Bb4 4.Qg4 axb3 5.cxb3 Nc5 6.h5 Rh8 7.Rf2 Nxd3 8.Rxd3 Bc5 9.Rc2 (8.699.313) 747
 18/21	 0:12 	-0.22 	2.Nf5 a4 3.h4 Bb4 4.Qg4 axb3 5.cxb3 Nc5 6.h5 Rh8 7.Rf2 Qd7 8.g6 hxg6 9.Qxg6 Kc8 10.h6 Nxd3 11.Rxd3 Kb8 12.Rg3 (9.501.819) 750
 19/24	 0:14 	-0.26 	2.Nf5 a4 3.h4 Bb4 4.Qg4 axb3 5.cxb3 Nc5 6.h5 Rh8 7.Rf2 Qd7 8.g6 hxg6 9.Qxg6 Kc8 10.h6 Qh7 11.Qf6 Nbd7 12.Qe7 Qg6 13.Qh4 Kb8 (10.874.309) 750
Clearly better at same relative depth.

The problem is that the move that is failing high is never given a real score (because of the presumed fail low that follows), the idea is that if it was resolved until its true score was known, it would be know that is really -0.22 at depth 19.

(Ironically, with Stockfish 2.0.1 a5 does not pop up at depth 19, but it does at depth 20, is resolved and Stockfish sticks with it, which is irrelevant as 2.0.1 is unusable for analysis due to bad granularity and hash management...)

ernest
Posts: 247
Joined: Thu Sep 02, 2010 10:33 am

Re: Strange Stockfish behavior?

Post by ernest » Sun Apr 10, 2011 12:18 pm

Uly wrote:Gran2k at 1CPU (so the result is reproducible)
What you get also depends on the Hash
With 512KB:

Code: Select all

r2k1r2/pp1n1q1p/1npP4/4p1P1/4P3/1PbB2NP/P1P1Q1RK/3R4 b - - 0 1
Analysis by Stockfish 2.0.1 PA GTB Gran2k:
1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Nc8 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 
  =  (0.00)   Depth: 19/12   00:00:07  5615kN
1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Nc8 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 
  =  (0.00)   Depth: 20/12   00:00:08  6894kN
1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Nc8 5.Bf5 Rf7 6.Be6 Rf8 7.Bf5 
  =  (0.00)   Depth: 21/12   00:00:12  9782kN
1...Qf3 2.Qxf3 Rxf3 3.Be2 Rf8 4.Bg4 Nc8 5.Nh5 Bd4 6.c3 Bxc3 7.Ng7 Nc5 8.Rc2 Nxd6 9.Rxd6+ Ke7 10.Nf5+ 
  =  (0.11 --)   Depth: 22/18   00:00:25  21329kN
1...Bb4 2.Rf1 Qg6 3.Rxf8+ Nxf8 
  =  (-0.11 !)   Depth: 22/18   00:00:42  33877kN
1...Bb4 2.Nf5 a5 3.Qg4 a4 4.h4 axb3 5.axb3 Kc8 6.h5 Rh8 7.Kg3 Qe8 8.g6 hxg6 9.Qxg6 Qf8 10.h6 Nf6 11.Rf1 Nbd7 12.Rgf2 Bxd6 13.Nxd6+ Qxd6 14.Rxf6 Nxf6 15.Rxf6 
  =  (-0.19)   Depth: 22/28   00:00:53  42908kN
1...Bb4 2.Nf5 a5 3.a4 Qg6 4.h4 Bxd6 5.Bc4 Bc5 6.h5 Qe8 7.Ng7 Qe7 8.Ne6+ Ke8 9.Nc7+ Kd8 10.Ne6+ 
  =  (0.00)   Depth: 23/18   00:02:10  102mN
1...a5 2.a4 Qf3 3.Qxf3 Rxf3 4.Be2 Rf8 5.Bg4 Nc8 6.Bf5 Rh8 7.Nh5 Bb4 8.Nf6 Nxf6 9.gxf6 Nxd6 10.c3 Bc5 11.Rg7 Ke8 12.Re7+ Kf8 13.Rxe5 Be3 14.Kg3 Rg8+ 15.Bg4 
  =  (-0.16)   Depth: 23/28   00:02:46  130mN
1...a5 2.a4 Qf3 3.Qxf3 Rxf3 4.Be2 Rf8 5.Bg4 Nc8 6.Bf5 Rh8 7.Nh5 Bb4 8.Nf6 Nxf6 9.gxf6 Nxd6 10.Rg7 Ke8 11.Bxh7 Kf8 12.Kg2 b5 13.Kf3 bxa4 14.bxa4 Re8 15.Kg4 Nc4 
  =  (-0.21)   Depth: 24/29   00:03:03  144mN
1...a5 2.a4 Qf3 3.Qxf3 Rxf3 4.Be2 Rf8 5.Bg4 Nc8 6.Nh5 Bd4 7.c3 
  =  (-0.10 --)   Depth: 25/12   00:03:34  169mN
1...a5 2.a4 Qf3 3.Qxf3 Rxf3 4.Be2 Rf8 5.Bg4 Rg8 6.h4 Bb4 7.Nf5 Rf8 8.Kh3 Nc5 9.Ng3 Ncd7 10.Bf5 
  =  (0.00)   Depth: 25/18   00:03:59  188mN
1...a5 2.a4 Kc8 3.Qg4 Kb8 4.Be2 Ka7 5.Rf1 Qg6 
  =/+  (-0.27 !)   Depth: 26/9   00:06:00  284mN
[/size]

Post Reply