value of LMR and null-move

hyatt · Post by **hyatt** » Tue Jul 13, 2010 5:36 pm

mcostalba wrote:
Rebel wrote: BTW, I made the changes as you suggested. Reading Bob I get the impression he made different ones?
I have to thank Bob for what he has done so far for testing 1.8 on his cluster, really nice thing.

But to have serious tests I would ask Bob to follow these guidelines:

1) Post the exact patch applied to remove LMR, so that I can verify it is correct

2) Post used TC. I cannot speak for Crafty, but for SF almost surely long TC helps LMR. So I suggest 30" per game and even 1' per game to see how LMR scales with TC.

3) Limit the test to 1.8 vs 1.8-no-LMR; a gauntlet could be added as a second step if needed (and I think it is not needed for this particular LMR scalability test).

Anyhow, thanks for testing!
Marco

Here you go:

at line 886
                    if (  0 &&  depth >= 3 * OnePly
                        && !dangerous
                        && !captureOrPromotion
                        && !move_is_castle(move))
                    {

at line 1354:
          if ( 0 &&   depth >= 3 * OnePly
              && !captureOrPromotion
              && !dangerous
              && !move_is_castle(move)
              && !move_is_killer(move, ss))
          {

and at line 1731:
      if ( 0 &&  !captureOrPromotion
          && !dangerous
          && !move_is_castle(move)
          && !move_is_killer(move, ss))
      {

Since "0 && ..." can never be true, all three ifs are always false, and the optimizer simply discards each block. That even offers a tiny speed gain, because the dead code is never fetched: it no longer exists in the executable.
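The effect of the leading 0 can be seen in isolation with a small sketch. This is not Stockfish code: the function name reduction_for and its simplified parameters are hypothetical, and depth is counted in whole plies here rather than in OnePly units.

```cpp
#include <cassert>

// Hypothetical stand-in for a reduction test; not taken from search.cpp.
// Returns the number of plies by which a move would be reduced.
inline int reduction_for(int depth, bool dangerous, bool captureOrPromotion)
{
    assert(depth >= 0);
    // The literal 0 short-circuits the whole condition: none of the other
    // operands is ever evaluated, and the branch body is unreachable, so a
    // compiler is free to drop it.
    if (0 && depth >= 3 && !dangerous && !captureOrPromotion)
        return 1;  // the "LMR" branch, now dead code
    return 0;      // always taken: the move is searched at full depth
}
```

Whatever arguments are passed, the function returns 0, which is the behaviour the three patched conditions above rely on.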

The time control was 1min+1sec. I didn't look carefully, but a typical game took about 3 minutes, with significant variance: some games go way beyond 100 moves, others are done by move 40.

It is easy to eliminate the extra games, but that provides less accuracy, since the gauntlet programs then have nothing to connect their ratings.

Rebel · Post by **Rebel** » Tue Jul 13, 2010 5:57 pm

hyatt wrote:My time control was 1+1 on 3.2ghz 64-bit intel hardware. If you are getting more than 50, with just LMR removed, I'd look at the methodology you are using first. I _know_ that my changes to search.cpp did nothing but disable LMR, and that it did disable it at all 3 points where it is done. Why you would use 4 cores is beyond me.

Because using 4 cores will in general give a speed-up of a factor of 3, so they are actually playing 15min blitz games, a decent TC.

BTW 17:3 doesn't tell "anything".

It's currently 24/35/4 (66%) for the LMR version.

Impossible to give an exact elo improvement, except that it is above 50

Ed
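For readers who want to turn a result like 24/35/4 into an Elo estimate, here is a minimal sketch. It assumes the three numbers are wins/draws/losses (that reading reproduces the quoted 66%) and uses the usual logistic Elo formula; the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from wins/draws/losses; a draw counts as half a point.
inline double score_fraction(int wins, int draws, int losses)
{
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Elo difference implied by a score fraction under the logistic model.
// Only defined for 0 < score < 1.
inline double elo_from_score(double score)
{
    assert(score > 0.0 && score < 1.0);
    return 400.0 * std::log10(score / (1.0 - score));
}
```

With 24 wins, 35 draws and 4 losses the score is 41.5/63, about 66%, and the point estimate comes out at roughly 114 Elo. With only 63 games the uncertainty around that number is large, which fits the remark that no exact figure can be given.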

Sentinel · Post by **Sentinel** » Tue Jul 13, 2010 6:10 pm

hyatt wrote:
Sentinel wrote:Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo

Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
The first question is "how many games"? As far as t/c goes, mine was 1min+1sec.

2000 games, which gives an error bar of exactly 10 elo, except for 4'+4'', where only 500 games have been played so far.
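The link between the number of games and the error bar can be sketched as follows. This is an approximation under stated assumptions, not the method used by any rating tool mentioned in the thread: it treats games as independent, applies a normal approximation to the score fraction, and converts to Elo through the slope of the logistic curve. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate 95% error margin, in Elo, around a measured Elo difference.
inline double elo_margin_95(int wins, int draws, int losses)
{
    const int n = wins + draws + losses;
    assert(n > 0);
    const double s = (wins + 0.5 * draws) / n;  // score fraction
    assert(s > 0.0 && s < 1.0);
    // Per-game variance of the result (1, 0.5 or 0 points).
    const double var = (wins * (1.0 - s) * (1.0 - s)
                        + draws * (0.5 - s) * (0.5 - s)
                        + losses * s * s) / n;
    const double se_score = std::sqrt(var / n);  // standard error of s
    // d(Elo)/d(score) for Elo = 400 * log10(s / (1 - s)).
    const double slope = 400.0 / (std::log(10.0) * s * (1.0 - s));
    return 1.96 * se_score * slope;
}
```

For a hypothetical even match of 2000 games with 40% draws (600/800/600) this gives a margin of about 12 Elo, the same order as the figure quoted above. Cutting the match to 500 games with the same proportions doubles the margin, since it scales with one over the square root of the number of games.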

hyatt · Post by **hyatt** » Tue Jul 13, 2010 6:26 pm

Sentinel wrote:
hyatt wrote:
Sentinel wrote:Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo

Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
The first question is "how many games"? As far as t/c goes, mine was 1min+1sec.
2000 games, which gives an error bar of exactly 10 elo, except for 4'+4'', where only 500 games have been played so far.

How many games can you play at once? 4+4 is about 5 games per hour when I test (per processor). About 400 CPU hours to play 2000 games...
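The estimate is plain division, sketched here so it can be repeated for other time controls. The function names are illustrative, and the processor count passed to wall_hours is whatever the tester has available; the thread gives no figure for this particular test.

```cpp
#include <cassert>

// CPU hours needed to play `games` games when one processor completes
// `games_per_hour` games per hour.
inline double cpu_hours(int games, double games_per_hour)
{
    assert(games >= 0 && games_per_hour > 0.0);
    return games / games_per_hour;
}

// Wall-clock hours when the games are spread evenly over `processors`
// processors, one game per processor at a time.
inline double wall_hours(int games, double games_per_hour, int processors)
{
    assert(processors > 0);
    return cpu_hours(games, games_per_hour) / processors;
}
```

cpu_hours(2000, 5.0) gives the 400 CPU hours mentioned above; dividing by the number of processors gives the elapsed time for the whole run.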

Oh, wait. I just noticed you are playing ip vs ip. As previously mentioned, I don't like that test, as it exaggerates the rating difference between the two programs, because they are identical except for that one change. The difference when using a gauntlet is generally lower. If we are going to compare results, we all need to run the same test. If everyone prefers A vs A' I can run that much faster, but I am convinced that the answers are exaggerated...

nepossiver · Post by **nepossiver** » Tue Jul 13, 2010 7:33 pm

Rebel wrote: Because using 4 cores will in general give a speed-up of a factor of 3, so they are actually playing 15min blitz games, a decent TC.

I guess if you run 4 15 min games simultaneously, you would be testing faster than with your current set-up...

hyatt · Post by **hyatt** » Tue Jul 13, 2010 7:57 pm

This is simply classic "pseudo-science". Everyone is trying to measure the same thing, using completely different testing methodologies. And then we have discussions about how different the results are, ignoring how different the tests are as well...

I've been doing "my style" of testing for a couple of years. Always new and old versions vs a common set of opponents, using the same starting positions, for enough games to get the error bar down to the +/-3 range. What we have at present is way more noise than signal...
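The shape of such a test can be sketched as a schedule generator. This is only an illustration of the description above, not the actual cluster scripts: the Game struct and gauntlet_schedule are hypothetical names, and playing each starting position once with each colour is an added assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Game {
    std::string white;
    std::string black;
    int position;  // index of the starting position
};

// Every version under test plays every opponent from every starting
// position twice, once with each colour. The versions never meet each
// other; they are compared only through the common opponents.
inline std::vector<Game> gauntlet_schedule(
    const std::vector<std::string>& versions,
    const std::vector<std::string>& opponents,
    int positions)
{
    assert(positions >= 0);
    std::vector<Game> games;
    for (const auto& v : versions)
        for (const auto& o : opponents)
            for (int p = 0; p < positions; ++p) {
                games.push_back({v, o, p});  // version has White
                games.push_back({o, v, p});  // version has Black
            }
    return games;
}
```

Because the old and new versions face exactly the same opponents from exactly the same positions, any rating difference between them comes from the change under test rather than from the pairing.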

hyatt · Post by **hyatt** » Tue Jul 13, 2010 7:59 pm

Sentinel wrote:
mcostalba wrote:You may want to try with longer TC, LMR really kicks in at deep searches.

It would be interesting to see how much it gains going from 10"+0.1 to something like 30"+0.1. You just need to test 1.8 vs 1.8a, because we have already seen that this is more or less comparable to the gauntlet result.
Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo

Now it's starting to really take a lot of time to test, but I wonder when does it saturate...

Another important question. Exactly how did you disable LMR? By hand, as I have done, or by using some UCI parameter that may well affect other things at the same time???

Sentinel · Post by **Sentinel** » Tue Jul 13, 2010 10:24 pm

hyatt wrote:
Sentinel wrote:
mcostalba wrote:You may want to try with longer TC, LMR really kicks in at deep searches.

It would be interesting to see how much it gains going from 10"+0.1 to something like 30"+0.1. You just need to test 1.8 vs 1.8a, because we have already seen that this is more or less comparable to the gauntlet result.
Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo

Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
Another important question. Exactly how did you disable LMR? By hand, as I have done, or by using some UCI parameter that may well affect other things at the same time???

By hand. For LMR I applied what I explained here:
http://www.open-chess.org/viewtopic.php?p=3737#p3737
For null move I added (0 && ...) in all null move conditions. For example in all_node.c:

	if (0 && DYNAMIC->value >= VALUE && MY_NULL_OK)

I agree with you that self-testing (A vs. A') exaggerates the difference. Moreover, it sometimes even makes it smaller than it really is. What I believe is that the direction of a change is right in 99.9% of the cases, and as long as the direction is right, I have nothing to worry about.
I would like to run tests against full gauntlets (and I have a small gauntlet of SF, Rybka and the current version which I use for critical testing), but I don't have enough processor time (I have 10 cores at my disposal, of which I can always effectively use only 6).

hyatt · Post by **hyatt** » Wed Jul 14, 2010 1:12 am

Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?

My tests have been only without LMR (for stockfish)...

Sentinel · Post by **Sentinel** » Wed Jul 14, 2010 1:20 am

hyatt wrote:Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?

My tests have been only without LMR (for stockfish)...

Both LMR and null move (but not futility pruning).
So it's a combined effect, and that's the reason the difference is even higher at larger depths.
The bottom line is that at 1'+1'' they should bring around 200 elo combined in most of today's top programs (SF, Ippo, Rybka, even Crafty 23.3).

I will test LMR alone once the 4'+4'' test is finished (in 3 days' time).
