Making an opening book (TCEC and more)

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Making an opening book (TCEC and more)

Post by BB+ » Wed Jan 12, 2011 5:31 am

The first task when making an opening book is to determine its purpose. :idea:
For instance, is the goal to produce "interesting" chess, or "relevant" chess, or to measure Elo somehow, or to beat your friends' books? I guess my goal is some combination of the first three.

Martin Thoresen asked me what I would do regarding openings in TCEC. This is partially in answer to that, though some of the concepts are a bit more general. For instance, if you asked me to create a suite of positions for engine testers to use, many of the answers would be the same [in particular, if an author "tunes" toward my suggestions, I'd likely think this was simply a reasonable way to test/develop an engine].

Here is my methodology. I stress that the exact numbers don't matter much, though I realise that whatever numbers I throw out have a decent chance of actually being used (due to laziness).

* Start with a PGN of "top-level" human games, perhaps meaning both players are 2600+ (or maybe 2500+, or both are 2500+ and at least one is 2600+). Only consider "serious" games, so get rid of blitz/blindfold/Armageddon.
* Find all positions that meet the following constraints (I suspect some database program could do this, though it is not trivial):
** Popularity: Position has occurred (say) 50-100 times
** Topical: Position has occurred at least 5-10 times in the last 5 years (so the line is not "busted", as it were)
** Playability: No single move is greatly preferred; for instance, no move is played more than 2/3 of the time, and/or the ratio of the most common move to the second most common is no worse than 3:1.
** Volatility: Draw percentage is between 45% and 65% (or maybe 40-60%, or maybe 50-70% -- depends on the Elo range you chose)
** Equality: White scores between 50% and 60% (maybe after adjusting for Elo)
* I have no idea how many positions this would be, but I suspect it will be at least a few hundred. One "solution" at this point is simply to randomly choose the desired number of positions.
* However, I would rather add on an additional layer (which is likely to be complicated): there should be some weighting so that the "overall popularity" of moves roughly matches that in the human games. For instance, the e4 versus d4 ratio should be reasonably close to the expected one. It might not be that easy to do with various openings (for instance, the Petrov is too drawish (see Volatility), and others perhaps not enough so), but I think at least some attempt can be made here. After this additional "weighting", the desired quantity of positions is selected randomly.
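
The steps above can be sketched roughly in Python. This is only an illustration under stated assumptions: the `PositionStats` record and its field names are hypothetical (the post does not say how the statistics would be extracted from a database), "50-100 times" is read here as a minimum count, the and/or in the playability rule is applied as "and", and the thresholds are just the example numbers from the list.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PositionStats:
    """Per-position statistics; every field name here is hypothetical."""
    first_move: str      # e.g. "e4" or "d4", used only for the weighting step
    games: int           # total occurrences in the database
    recent_games: int    # occurrences in the last 5 years
    move_counts: list    # how often each continuation was played
    draws: int           # number of drawn games
    white_score: float   # White's score as a fraction between 0 and 1


def is_candidate(p, min_games=50, min_recent=5):
    """Apply the five constraints (popularity, topical, playability,
    volatility, equality) with the example thresholds from the post."""
    counts = sorted(p.move_counts, reverse=True)
    if p.games == 0 or not counts or sum(counts) == 0:
        return False
    second = counts[1] if len(counts) > 1 else 0
    popular = p.games >= min_games          # assumption: read as a minimum
    topical = p.recent_games >= min_recent
    playable = counts[0] / sum(counts) <= 2 / 3 and counts[0] <= 3 * second
    volatile = 0.45 <= p.draws / p.games <= 0.65
    equal = 0.50 <= p.white_score <= 0.60
    return popular and topical and playable and volatile and equal


def pick_positions(candidates, target_share, n, seed=0):
    """Randomly pick about n positions so that first moves follow target_share,
    e.g. {"e4": 0.5, "d4": 0.5}. A group with too few candidates simply
    contributes fewer positions."""
    rng = random.Random(seed)
    by_move = defaultdict(list)
    for p in candidates:
        by_move[p.first_move].append(p)
    chosen = []
    for move, share in target_share.items():
        pool = by_move.get(move, [])
        quota = min(len(pool), round(n * share))
        chosen.extend(rng.sample(pool, quota))
    return chosen
```

Weighting only by the first move is the crudest possible version of the "additional layer"; a fuller attempt would presumably repeat the same quota idea at deeper levels of the opening tree.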


Re: Making an opening book (TCEC and more)

Post by BB+ » Wed Jan 12, 2011 5:47 am

I'll just comment on the 3:1:0 system here (for lack of a better place):
#2: The scoring system will go back to the standard instead of 3-1-0
I personally consider this wise, though others will disagree. I might give some quotations from the "KC conferences" regarding 3:1:0.
korsar274: Hello! What do you think about the soccer system of counting points? Does it seriously increase the number of decisive games? Is there a future for it?
Shirov: Personally it bothers me. Draws have always been worth half a point, and now all of a sudden they are only worth a third. [If it’s so good,] why don’t they compute ratings according to the same system? :)
Eriksson: What do you think of the points system (win – 3 points, draw – 1 point)?
Grischuk: I don’t see anything wrong with it, but there’s nothing great about it either.
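
To put numbers on Shirov's "half a point" versus "a third" remark, here is a small sketch comparing the two scoring systems. The two players and their results are made up purely for illustration.

```python
from fractions import Fraction

# Each system is (points for a win, points for a draw); a loss scores 0.
CLASSICAL = (Fraction(1), Fraction(1, 2))
FOOTBALL = (Fraction(3), Fraction(1))


def total(wins, draws, losses, system):
    """Total score of a player under the given scoring system."""
    win_pts, draw_pts = system
    return wins * win_pts + draws * draw_pts


def draw_value(system):
    """How much a draw is worth relative to a win."""
    win_pts, draw_pts = system
    return draw_pts / win_pts


# Hypothetical example: A wins 2 and loses 2, B draws all 4 games.
a = (2, 0, 2)
b = (0, 4, 0)
print(total(*a, CLASSICAL), total(*b, CLASSICAL))  # level under 1-1/2-0
print(total(*a, FOOTBALL), total(*b, FOOTBALL))    # A ahead under 3-1-0
print(draw_value(CLASSICAL), draw_value(FOOTBALL))
```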


Re: Making an opening book (TCEC and more)

Post by BB+ » Wed Jan 12, 2011 7:27 am

Not to besmirch anyone (and this is a forum for discussion after all), but here's a list of played openings (24 of the 28 total so far) in the first Division, and what I think of them:
1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Bg5 Bd7 7. Qd2 Nxd4 8. Qxd4 Qa5 9. Qd2 Rc8 10. Bd3 Ng4 11. h3 Ne5 12. Be2 g6
No idea what the source game is for this Sicilian. Using ChessOK's Opening Tree mode, I only matched through move 9, with other choices rather than 9. Qd2 being reasonably popular. Looks fairly decent for a somewhat off-beat line.
1. e4 e6 2. d4 d5 3. Nd2 c5 4. Ngf3 cxd4 5. Nxd4 Nf6 6. e5 Nfd7 7. Bb5 Qc7 8. Qh5 g6 9. Qe2 Nc6 10. N2f3 Nxd4 11. Nxd4 Bb4+ 12. c3 Bf8
This Tarrasch French was out of my book (or ChessOK's) at move 10, and was down to single digits in games by move 7. White's 8th move Qh5 seems a bit rambunctious, while Black's Bb4+ has little point IMO.
1. Nf3 Nf6 2. c4 g6 3. d4 Bg7 4. Nc3 d6 5. e4 O-O 6. Be2 e5 7. O-O Nc6 8. d5 Ne7 9. Ne1 Nd7 10. Nd3 f5 11. Bd2 Nf6 12. f3 c5
This was a mainline KID until Black's last (f4 is played in over 1000 games). In general I don't like the "X moves" criterion, as theory is much deeper in some lines compared to others.
1. Nf3 Nf6 2. d4 d5 3. c4 e6 4. Nc3 Be7 5. Bg5 h6 6. Bh4 O-O 7. e3 b6 8. Be2 Bb7 9. Rc1 Nbd7 10. Bxf6 Nxf6 11. O-O dxc4 12. Bxc4 a6
White's ninth is not too popular (Bxf6 is the usual choice), and after Nbd7 I can't find any games with Bxf6. It looks to me that White gave away the bishop pair with little to show for it.
1. c4 Nf6 2. d4 e6 3. Nf3 d5 4. Nc3 Be7 5. Bg5 h6 6. Bh4 O-O 7. e3 b6 8. Rc1 Bb7 9. cxd5 Nxd5 10. Bxe7 Qxe7 11. Bd3 Nxc3 12. Rxc3 Na6
A lot like the previous game. :) Up until Nxc3 this is fairly common. About the only comment I have is that Black has the statistical advantage.
1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be3 e5 7. Nf3 Be7 8. Bc4 O-O 9. a3 Nbd7 10. O-O b5 11. Bd5 Rb8 12. Ba2 Bb7
Not too easy to be out of book by move 10 in a Najdorf, but White's ninth seems to do the trick...
1. d4 Nf6 2. c4 g6 3. Nc3 c5 4. d5 e6 5. d6 Qb6 6. Nb5 Na6 7. Bf4 Ne4 8. Nf3 f5 9. Be5 Rg8 10. e3 Qa5+ 11. Nd2 Ng5 12. b4 cxb4
Already at move 5 this is hairy. I'm not sure any master (say 2200+) has ever gone down this line.
1. d4 Nf6 2. c4 b6 3. Nf3 e6 4. Nc3 Bb7 5. a3 d5 6. cxd5 Nxd5 7. Qc2 Nxc3 8. bxc3 c5 9. e4 Nd7 10. Bb5 a6 11. Bd3 Rc8 12. Qe2 b5
This is a fairly topical QID line, with many 2700+ games for the first 9 moves. White's 10th is not the most popular, but could be in vogue for all I know.
1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 d6 5. Nc3 Nf6 6. g4 h6 7. h4 Be7 8. Qf3 Nc6 9. Bb5 Bd7 10. Bxc6 bxc6 11. g5 hxg5 12. hxg5 Rxh1+
This is a Keres Attack, and stays in common lines until about move 8 or 9, and then becomes unique by move 12. It seems to me that Black gets the edge from this.
1. c4 e5 2. Nc3 Nc6 3. g3 d6 4. Bg2 g6 5. e3 Bg7 6. Nge2 Nge7 7. O-O O-O 8. d3 Bg4 9. e4 Nd4 10. f3 Be6 11. Nxd4 exd4 12. Nd5 c6
An English Opening, and the position at move 8 is played a few hundred times, but White's ninth only has a computer game as a precursor. Seems playable, though (as usual) I think stretching the line to 12 moves is a bit much.
1. Nf3 d5 2. d4 Nf6 3. c4 c6 4. Nc3 e6 5. Bg5 Nbd7 6. cxd5 exd5 7. e3 Be7 8. Bd3 O-O 9. Qc2 h6 10. Bh4 Bd6 11. O-O Re8 12. Rfe1 Qa5
This is a quite common Semi-Slav line, though usually Black plays Re8 at move 10 (almost 1000 games). White scores quite well in this line (over 60%).
1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. Nc3 c6 5. e3 Nbd7 6. Bd3 dxc4 7. Bxc4 b5 8. Bd3 Bb7 9. e4 b4 10. Na4 Be7 11. O-O O-O 12. Qc2 h6
Another very common Semi-Slav. White's 9th is played 1000 times, yet is less popular than castling. Black's 10th is almost always 10... c5 (~95% of the time), though there are still maybe 20-30 games that follow the route above.
1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. d4 exd4 5. Nxd4 Bb4 6. Nxc6 bxc6 7. Bd3 d5 8. exd5 Qe7+ 9. Qe2 Qxe2+ 10. Kxe2 cxd5 11. Bd2 O-O 12. Nb5 Bxd2
It almost looks like White is playing for a draw against a higher-rated opponent. :-P Up to move 11 this is not all that rare (50 or so games), but White really needs to play 11. Nb5 -- I would go so far as to rate Bd2 as a blunder.
1. e4 d6 2. d4 Nf6 3. Nc3 e5 4. Nf3 Nbd7 5. Bc4 Be7 6. O-O O-O 7. Re1 c6 8. a4 a5 9. b3 exd4 10. Nxd4 Ne5 11. Bf4 Nxc4 12. bxc4 Bg4
Up to move 8, maybe even 10, there are some recent 2700+ games following this, but Black usually opts for Nb6. This looks reasonable, though again might follow "book" for a move too many.
1. Nf3 Nf6 2. d4 g6 3. c4 Bg7 4. g3 O-O 5. Bg2 d6 6. O-O Nbd7 7. Nc3 e5 8. e4 c6 9. dxe5 Nxe5 10. Nxe5 dxe5 11. Qe2 Be6 12. Rd1 Qa5
Fianchetto against a KID is quite common (~10000 games), but 9. dxe5 is not, and is not only more drawish than 9. h3 or 9. b3, but also switches the statistics to favour Black.
1. d4 Nf6 2. Nf3 e6 3. g3 b6 4. Bg2 Bb7 5. O-O Be7 6. c4 O-O 7. Nc3 Ne4 8. d5 Nxc3 9. bxc3 Na6 10. e4 Nc5 11. Nd4 Ba6 12. Qe2 Nb7
White's 8th move is quite obscure (no games from 2500+ players), while Qc2/Nxe4/Bd2 all have 1000+ games. Black has the statistical edge after this, though I think White should still have a slight edge. Black's 12... Nb7 might not be that astute, though in the game White did not play my preferred 13. e5 (or maybe 13. Qg4).
1. d4 Nf6 2. Bg5 e6 3. e4 h6 4. Bxf6 Qxf6 5. Nc3 d6 6. Qd2 g5 7. O-O-O Bg7 8. Bb5+ Nd7 9. Nh3 e5 10. dxe5 Qxe5 11. f4 gxf4 12. Nxf4 c6
Again we have the "book depth" problem --- I don't mind seeing an occasional Trompowsky, but following some game for 12 moves does not seem all that desirable.
1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. Nc3 Be7 5. Bf4 O-O 6. e3 c5 7. dxc5 Bxc5 8. a3 Nc6 9. Bd3 dxc4 10. Bxc4 Qxd1+ 11. Rxd1 b6 12. O-O Bb7
Topical Queen's Gambit Declined (the 5. Bf4 line) up until White's ninth, when Bd3 is almost a novelty (though probably not that bad -- humans don't like wasting the tempo, but computers are not always so picky).
1. e4 c6 2. d3 d5 3. Nd2 e5 4. Ngf3 Bd6 5. g3 Nf6 6. Bg2 O-O 7. O-O Re8 8. b3 Bg4 9. Bb2 d4 10. h3 Bh5 11. g4 Bg6 12. Nc4 Qc7
A Caro-Kann met by d3 -- not exactly what I'd expect, though this is played at the highest levels occasionally. The games become sparser and sparser until by move 10 or 11 they are all gone.
1. e4 d6 2. d4 Nf6 3. Nc3 g6 4. Be3 Bg7 5. f3 c6 6. Qd2 b5 7. Bh6 Bxh6 8. Qxh6 b4 9. Nd1 Qb6 10. Qd2 c5 11. dxc5 Qxc5 12. a3 bxa3
Another Pirc. This has never been too popular at the top levels, as Black can't leverage too many dynamic possibilities, and has a harder time drawing than with a Petroff. White's 7th move is essentially equi-popular with 4 other moves, but is next-to-last in statistics. Seems decent in the end.
1. d4 f5 2. Nf3 Nf6 3. g3 g6 4. Bg2 Bg7 5. c4 O-O 6. Nc3 d6 7. O-O Nc6 8. d5 Ne5 9. Nxe5 dxe5 10. Qb3 h6 11. e4 f4 12. c5 Kh7
A Leningrad Dutch, perhaps a good choice under the scoring system. :) I get the book-exit at more than +0.5 for White, but maybe computers don't understand this position too well. White's 10. Qb3 is not the most popular (e4 has 600 games, compared to 400), but White scores 60%+ from it.
1. Nf3 d5 2. g3 g6 3. Bg2 Bg7 4. d3 e5 5. O-O Ne7 6. e4 Nbc6 7. Nbd2 O-O 8. Re1 Re8 9. c3 a5 10. a4 h6 11. Qc2 d4 12. Nc4 Bg4
Some sort of King's Indian Attack from White. There is a notable statistical edge to Black already at move 6, though maybe if you adjust for Elo (not too many strong players play this as White), it is not so bad.
1. c4 g6 2. Nc3 Bg7 3. g3 d6 4. Bg2 e5 5. Nf3 f5 6. O-O Nf6 7. d3 O-O 8. c5 Nc6 9. Qa4 d5 10. Bg5 e4 11. Nd4 Nxd4 12. Qxd4 c6
A Catalan from White, I think. The mainline is 8. Rb1, though the game-line looks reasonable. White's 9. Qa4 seems to dissipate most of any advantage.
1. g3 e5 2. d3 d5 3. Nf3 Nc6 4. Bg2 Nf6 5. O-O Be7 6. Nbd2 O-O 7. e4 Re8 8. c3 dxe4 9. dxe4 a5 10. Qe2 b6 11. Nc4 Bc5 12. Bd2 Qe7
I guess this transposed into an almost recognisable King's Indian Attack. You won't see too many Whites play like this in high-level games, though Black didn't exactly push his edge at every turn.

Martin Thoresen
Posts: 386
Joined: Thu Jun 10, 2010 5:27 am

Re: Making an opening book (TCEC and more)

Post by Martin Thoresen » Wed Jan 12, 2011 12:37 pm

BB+,

I will, after the analysis on my laptop has finished (should be a couple more days), post a draft of my upcoming
TCEC book in my forum. I would very much like you to join and give your impressions when the draft is posted.

I have gathered games only from players rated over 2690 Elo, plus some high-profile correspondence games and a few high-profile computer games.

The total number of unique openings came to 10,762 before my analysis. They are also cut down to 10 moves.

Houdini 1.5 is running an analysis for the next 4 moves out of book before writing the eval in the pgn.
I will then use pgnscanner to filter out the openings that are outside of certain accepted eval criteria.
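
As a rough illustration of this pipeline (cut to 10 moves, keep unique lines, then filter on the engine eval), here is a minimal Python sketch. It is not pgnscanner's interface and does not read PGN files: games are assumed to be plain lists of SAN moves, the `evals` mapping stands in for whatever the engine wrote into the PGN, and the 0.5-pawn window is an assumed value, since the accepted eval criteria are not given in this thread.

```python
def unique_openings(games, full_moves=10):
    """Cut each game to `full_moves` moves per side and drop duplicates,
    keeping the first-seen order."""
    cut = (tuple(moves[: 2 * full_moves]) for moves in games)
    return list(dict.fromkeys(cut))


def filter_by_eval(lines, evals, max_abs=0.5):
    """Keep lines whose eval (in pawns, White's view) is within +/- max_abs.
    The default window of 0.5 is an assumption for illustration only."""
    return [line for line in lines if abs(evals[line]) <= max_abs]
```

With a fixed cut, two games that only diverge after the cut collapse into a single opening, which is why the count of unique openings can be much smaller than the number of source games.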

I sure understand that it is possible to make a new pgn for each Division and that was what I did in the beginning of TCEC (the "old" system),
but that became a burden and I would really like to have just one major opening lines source.

Best,
Martin
TCEC - Thoresen Chess Engines Competition
http://tcec.chessdom.com

DrChess
Posts: 3
Joined: Sun Aug 08, 2010 8:58 pm
Real Name: Angel

Re: Making an opening book (TCEC and more)

Post by DrChess » Thu Jan 13, 2011 7:11 am

BB+
Where can I find your book?
Thanks


Re: Making an opening book (TCEC and more)

Post by BB+ » Thu Jan 13, 2011 7:23 am

Martin Thoresen wrote: "They are also cut down to 10 moves."
I guess I still don't understand a fixed limit. For instance, 10 moves into a Marshall Gambit is almost nothing, but 10 moves into a Trompowsky Attack is likely to have exited theory.
DrChess wrote: "Where can I find your book?"
I'm no expert on culling information from game databases, otherwise I might be able to do what I described above. Right now my "book" only exists "in theory", as it were.

smurfie
Posts: 3
Joined: Wed Sep 08, 2010 11:03 am
Real Name: Joan

Re: Making an opening book (TCEC and more)

Post by smurfie » Thu Jan 13, 2011 12:39 pm

For my tournaments I use a book built from TWIC games (last 10 years) in which both players are rated above 2500 Elo. I let the engines follow the book until the number of games drops below 5 (this can be raised to 10 for more efficiency, but it is unlikely that 5 GMs make the same mistake). The pros and the cons:
pros:
- Interesting current openings (maybe the engines will find some interesting novelty).
- Avoiding the removal of lines where the computer gives -1 or -2 but the compensation is enough. Normally these are lines with sacrifices.
- Not finishing a Najdorf at move 12 or playing an Albin beyond move 8. Each game will be played until the theory is almost finished.
cons:
- Reaching a position where the best thing is a threefold repetition
- The Petroff and the Berlin Defence (Spanish) are quite popular these days and are drawish lines, but if they are played at GM level, why not at computer level? (maybe deleting Kramnik and Leko games will solve this :D but Shirov plays the Berlin Defence too :( )
So maybe, to avoid the cons, one can remove the games that finish in a draw before move 20 (or 30?) and the lines where the draw percentage is above 70%.
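
A small Python sketch of the "follow the book until fewer than 5 games" rule, plus the short-draw filter suggested above. Games are assumed to be plain lists of SAN moves, and all function names are made up for the example.

```python
from collections import Counter


def prefix_counts(games):
    """Count how many games pass through each move-sequence prefix."""
    counts = Counter()
    for moves in games:
        for i in range(1, len(moves) + 1):
            counts[tuple(moves[:i])] += 1
    return counts


def book_depth(line, counts, min_games=5):
    """Number of plies of `line` that stay in book, i.e. for which at least
    min_games games reached the position after that ply."""
    depth = 0
    for i in range(1, len(line) + 1):
        if counts[tuple(line[:i])] < min_games:
            break
        depth = i
    return depth


def keep_game(result, full_moves, min_draw_moves=20):
    """Drop games that ended in a draw before move min_draw_moves."""
    return not (result == "1/2-1/2" and full_moves < min_draw_moves)
```

Because the exit point depends on how many games reach each position, a heavily analysed line stays in book much longer than a rare one, which is the behaviour a fixed move limit cannot give.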


Re: Making an opening book (TCEC and more)

Post by Martin Thoresen » Sat Jan 15, 2011 7:09 pm

BB+,

I've attached a sample of the new book in my forum. The book ended up at 9,106 openings.

Best,
Martin
