General discussion about computer chess...
-
hyatt
- Posts: 1242
- Joined: Thu Jun 10, 2010 2:13 am
- Real Name: Bob Hyatt (Robert M. Hyatt)
- Location: University of Alabama at Birmingham
by hyatt » Tue Jul 13, 2010 5:36 pm
mcostalba wrote:Rebel wrote:
BTW, I made the changes as you suggested. Reading Bob I get the impression he made different ones?
I have to thank Bob for what he has done so far testing 1.8 on his cluster, a really nice thing.
But to have serious tests I would ask Bob to follow these guidelines:
1) Post the exact patch that removed LMR so that I can verify it is correct.
2) Post the TC used. I cannot speak for Crafty, but for SF a longer TC almost surely helps LMR. So I suggest 30" per game and even 1' per game to see how LMR scales with TC.
3) Limit the test to 1.8 vs 1.8-no-LMR; a gauntlet could be added as a second step if needed (and I think it is not needed for this particular LMR scalability test).
Anyhow, thanks for testing!
Marco
Here you go:
Code:
// at line 886
if (   0 && depth >= 3 * OnePly
    && !dangerous
    && !captureOrPromotion
    && !move_is_castle(move))
{
// at line 1354
if (   0 && depth >= 3 * OnePly
    && !captureOrPromotion
    && !dangerous
    && !move_is_castle(move)
    && !move_is_killer(move, ss))
{
// and at line 1731
if (   0 && !captureOrPromotion
    && !dangerous
    && !move_is_castle(move)
    && !move_is_killer(move, ss))
{
Since the 0 && can never be true, the three ifs are always false, and the optimizer just tosses each entire block out. That offers a tiny speed gain, because the dead code never gets fetched: it no longer exists in the executable.
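For anyone unfamiliar with the trick, here is a minimal standalone sketch of why a leading 0 disables the whole test. Everything in it (expensive_condition, search_step, condition_calls) is a hypothetical stand-in for illustration, not Stockfish code:

```cpp
// Counts how many times the stand-in condition is actually evaluated.
int condition_calls = 0;

// Hypothetical stand-in for tests such as !dangerous or !move_is_castle(move).
bool expensive_condition() {
    ++condition_calls;
    return true;
}

// Mirrors the patched pattern: && evaluates left to right and stops at the
// first false operand, so with a leading 0 nothing after it is evaluated
// and the body of the if can never run.
int search_step(int depth) {
    int reductions = 0;
    if (0 && depth >= 3 && expensive_condition()) {
        ++reductions;  // unreachable: a compiler is free to drop this block
    }
    return reductions;
}
```

Calling search_step with any depth returns 0 and leaves condition_calls at 0, which is the behaviour the patch relies on.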
The time control was 1min+1sec. I didn't look carefully, but typical game was about 3 minutes or so, with significant variance since some go way beyond 100 moves, some are done by move 40.
Easy to eliminate the extra games, but it provides less accuracy since the gauntlet programs have nothing to connect their ratings.
-
Rebel
- Posts: 515
- Joined: Wed Jun 09, 2010 7:45 pm
- Real Name: Ed Schroder
by Rebel » Tue Jul 13, 2010 5:57 pm
hyatt wrote:My time control was 1+1 on 3.2ghz 64-bit intel hardware. If you are getting more than 50, with just LMR removed, I'd look at the methodology you are using first. I _know_ that my changes to search.cpp did nothing but disable LMR, and that it did disable it at all 3 points where it is done. Why you would use 4 cores is beyond me.
Because using 4 cores will in general give a speed-up of about a factor of 3, so they are effectively playing 15-minute blitz games, a decent TC.
BTW 17:3 doesn't tell "anything".
It's currently 24/35/4 (66%) for the LMR version.
Impossible to give an exact elo improvement, except that it is above 50
Ed
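To put rough numbers on a result like 24/35/4, here is a small sketch that converts wins/draws/losses into a score fraction, the Elo difference implied by the usual logistic model, and a normal-approximation 95% interval. The names are made up for illustration, and the interval assumes independent games, which is only an approximation:

```cpp
#include <cmath>

// Score fraction from wins/draws/losses (a draw counts as half a point).
double score_fraction(int wins, int draws, int losses) {
    return (wins + 0.5 * draws) / (wins + draws + losses);
}

// Elo difference implied by a score fraction strictly between 0 and 1,
// using the standard logistic rating model.
double elo_from_score(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Standard error of the mean per-game score (outcomes are 1, 0.5 or 0).
double score_std_error(int wins, int draws, int losses) {
    const double n = wins + draws + losses;
    const double s = score_fraction(wins, draws, losses);
    const double variance = (wins * (1.0 - s) * (1.0 - s)
                           + draws * (0.5 - s) * (0.5 - s)
                           + losses * s * s) / n;
    return std::sqrt(variance / n);
}

struct EloInterval {
    double low;
    double mid;
    double high;
};

// Assumption: z = 1.96 (about 95%) and the score interval stays inside (0, 1).
EloInterval elo_interval(int wins, int draws, int losses, double z = 1.96) {
    const double s = score_fraction(wins, draws, losses);
    const double margin = z * score_std_error(wins, draws, losses);
    return {elo_from_score(s - margin), elo_from_score(s), elo_from_score(s + margin)};
}
```

With 24 wins, 35 draws and 4 losses this gives a score of about 65.9% and roughly +114 Elo, with an interval of very roughly +60 to +175. That fits "above 50, but impossible to pin down" after only 63 games.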
-
Sentinel
- Posts: 122
- Joined: Thu Jun 10, 2010 12:49 am
- Real Name: Milos Stanisavljevic
by Sentinel » Tue Jul 13, 2010 6:10 pm
hyatt wrote:Sentinel wrote:Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo
Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
The first question is "how many games"? As far as t/c goes, mine was 1min+1sec.
2000 or exactly 10 elo error bar, except for 4'+4'' where it was only 500 games so far.
-
by hyatt » Tue Jul 13, 2010 6:26 pm
Sentinel wrote:hyatt wrote:Sentinel wrote:Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo
Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
The first question is "how many games"? As far as t/c goes, mine was 1min+1sec.
2000 or exactly 10 elo error bar, except for 4'+4'' where it was only 500 games so far.
How many games can you play at once? 4+4 is about 5 games per hour when I test (per processor). About 400 CPU hours to play 2000 games...
Oh, wait. I just noticed you are playing ip vs ip. As previously mentioned, I don't like that test as it exaggerates the ratings between the two programs, because they are identical except for that one change. The difference when using a gauntlet is generally lower. If we are going to compare results, we all need to run the same test. If everyone prefers A vs A' I can run that much faster, but am convinced that the answers are exaggerated...
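The arithmetic behind the CPU-hours estimate, as a tiny sketch. The function names and the idea of dividing by a number of concurrent games are illustrative assumptions; the 5 games per hour and 2000 games figures are the ones quoted in this post:

```cpp
// CPU hours needed when each processor finishes a fixed number of games per hour.
double cpu_hours(int games, double games_per_hour_per_cpu) {
    return games / games_per_hour_per_cpu;
}

// Wall-clock hours when several games run at once, one per processor.
double wall_clock_hours(int games, double games_per_hour_per_cpu, int concurrent_games) {
    return cpu_hours(games, games_per_hour_per_cpu) / concurrent_games;
}
```

cpu_hours(2000, 5.0) gives the 400 CPU hours mentioned above; how long that takes on the wall clock depends entirely on how many games can run in parallel.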
-
nepossiver
- Posts: 3
- Joined: Tue Jul 13, 2010 3:31 am
by nepossiver » Tue Jul 13, 2010 7:33 pm
Rebel wrote:
Because using 4 cores in general will speed-up a factor of 3, thus actually they are playing 15min blitz games, a decent TC.
I guess if you run 4 15 min games simultaneously, you would be testing faster than with your current set-up...
-
by hyatt » Tue Jul 13, 2010 7:57 pm
This is simply classic "pseudo-science". Everyone is trying to measure the same thing, using completely different testing methodologies. And then we have discussions about how different the results are, ignoring how different the tests are as well...
I've been doing "my style" of testing for a couple of years. Always new and old versions vs a common set of opponents, using the same starting positions, for enough games to get the error bar down to the +/-3 range. What we have at present is way more noise than signal...
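As a back-of-the-envelope sketch of what a +/-3 error bar costs: the error bar shrinks with the square root of the number of games, so one known (games, error bar) pair can be scaled to another target. The function name is hypothetical, and plugging in the "2000 games for about a 10 elo error bar" figure quoted earlier in this thread is an assumption, since draw rates and opponents differ between setups:

```cpp
// Scale a known (games, error bar) pair to a new target error bar,
// assuming the error bar is proportional to 1 / sqrt(games).
double games_for_error_bar(double reference_games,
                           double reference_error,
                           double target_error) {
    const double ratio = reference_error / target_error;
    return reference_games * ratio * ratio;
}
```

Scaling 2000 games at +/-10 down to +/-3 gives roughly 22,000 games, which is why this style of testing needs a cluster.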
-
by hyatt » Tue Jul 13, 2010 7:59 pm
Sentinel wrote:mcostalba wrote:You may want to try with longer TC, LMR really kicks in at deep searches.
It would be interesting to see how much it gains going from 10"+0.1 to something like 30"+0.1. You just need to test 1.8 vs 1.8a, because we have already seen that it is more or less comparable to the gauntlet result.
Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo
Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
Another important question. Exactly how did you disable LMR? By hand, as I have done, or by using some UCI parameter that may well affect other things at the same time???
-
by Sentinel » Tue Jul 13, 2010 10:24 pm
hyatt wrote:Sentinel wrote:mcostalba wrote:You may want to try with longer TC, LMR really kicks in at deep searches.
It would be interesting to see how much it gains going from 10"+0.1 to something like 30"+0.1. You just need to test 1.8 vs 1.8a, because we have already seen that it is more or less comparable to the gauntlet result.
Bob doesn't see the difference with increasing TC in Crafty. I would say that's really strange.
For example for Ippo in my tests it's the following for default vs. no null move/LMR version (A vs A' testing):
6''+0.1'' 130elo
15''+0.25'' 146elo
1'+1'' 200elo
4'+4'' 242elo
Now it's starting to really take a lot of time to test, but I wonder when does it saturate...
Another important question. Exactly how did you disable LMR? By hand, as I have done, or by using some UCI parameter that may well affect other things at the same time???
By hand. For LMR I applied what I explained here:
http://www.open-chess.org/viewtopic.php?p=3737#p3737
For null move I added (0 && ...) in all null move conditions. For example in all_node.c:
Code:
if (0 && DYNAMIC->value >= VALUE && MY_NULL_OK)
I agree with you that self-testing (A vs. A') exaggerates the difference. Moreover, it sometimes even makes it smaller than it really is. What I believe is that the direction of a change is right in 99.9% of cases, and as long as the direction is right, I have nothing to worry about.
I would like to run tests against full gauntlets (and I have a small gauntlet of SF, Rybka and the current version which I use for critical testing), but I don't have enough processor time (I have 10 cores at my disposal, of which I can reliably use only 6).
-
by hyatt » Wed Jul 14, 2010 1:12 am
Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?
My tests have been only without LMR (for stockfish)...
-
by Sentinel » Wed Jul 14, 2010 1:20 am
hyatt wrote:Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?
My tests have been only without LMR (for stockfish)...
Both LMR and null move (but not futility pruning).
So it's a combined effect, and that's the reason the difference is even higher at larger depths.
The bottom line is that at 1'+1'' they should bring around 200 elo combined in most of today's top programs (SF, Ippo, Rybka, even Crafty 23.3).
I will test LMR alone once the 4'+4'' test is finished (in 3 days' time).