Some tablebase formats
Posted: Thu Nov 25, 2010 11:54 pm
An old FAQ: http://www.horizonchess.com/FAQ/Winboard/egtb.html You can read about formats like Ferret, Nimzo, Yace, and maybe more.
I will discuss the better-known ones, and those that post-date that page.
Nalimov
By Eugene Nalimov, circa 1998?
Generation code has appeared in various public and private places; the most useful is perhaps the rewrite by Gian-Carlo Pascutto (see http://talkchess.com/forum/viewtopic.php?t=29745 ), though I'm not sure the compression failures were ever resolved.
It takes about 1 day to build a 6-piece ending (I'm not sure whether that includes endings with a pawn), the process being bound by streaming I/O. I'm not sure what the RAM requirements are; I think it uses the grandfather method. The 6-piece set does not include 5 vs 1 and takes about 1.2TB; the 5-piece set is around 7.4GB. It uses 8K blocks with compression code by Kadatch, and loading the indexing for the 6-piece set can take about 20 minutes (I'm not sure how much RAM that uses, but around 8GB).
One of the main problems is that it is not easy to obtain a license from Nalimov. There is a paper with Nalimov as a co-author that describes the indexing schemes, though in retrospect (given the nearly free compression of broken positions) it is not clear how useful these are, at least for disk storage. The paper lists the main objectives as: White and Black treated symmetrically, and practicality and efficiency for OTB play (the indexing should be compact, 8K blocks are called "time-optimal", and the side-to-move's men are clustered). By far the most used amongst aficionados, even with all the warts.
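To make the indexing talk a bit more concrete, here is a minimal sketch (my own, nothing from Nalimov's actual code) of the kind of compact index such a scheme computes, for a pawnless 3-piece ending like KRK: the white king is folded into the a1-d1-d4 triangle via the 8 board symmetries, and broken/illegal positions simply keep their slots, since as noted they compress nearly for free.
[code]
#include <stdio.h>

/* Hedged sketch (mine, not Nalimov's code) of a compact index for a pawnless
 * 3-piece ending such as KRK.  The white king is folded into the 10-square
 * a1-d1-d4 triangle using the 8 board symmetries; the other two men use raw
 * squares, so the table has 10*64*64 = 40960 slots, with broken/illegal
 * positions left to the compressor. */

typedef struct { int wk, bk, wr; } krk_pos;   /* squares 0..63 = a1..h8 */

static int flip_file(int sq) { return sq ^ 7; }                      /* a<->h  */
static int flip_rank(int sq) { return sq ^ 56; }                     /* 1<->8  */
static int flip_diag(int sq) { return ((sq & 7) << 3) | (sq >> 3); } /* a1-h8  */

/* index of a square within the a1-d1-d4 triangle, or -1 if outside it */
static int triangle_index(int sq)
{
    static const int tri[64] = {
        0, 1, 2, 3,-1,-1,-1,-1,
       -1, 4, 5, 6,-1,-1,-1,-1,
       -1,-1, 7, 8,-1,-1,-1,-1,
       -1,-1,-1, 9,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1 };
    return tri[sq];
}

static void apply(krk_pos *p, int (*f)(int))
{
    p->wk = f(p->wk); p->bk = f(p->bk); p->wr = f(p->wr);
}

/* Fold the position so the white king lands in the triangle, then index. */
static unsigned krk_index(krk_pos p)
{
    if ((p.wk & 7) > 3)          apply(&p, flip_file);  /* into files a-d     */
    if ((p.wk >> 3) > 3)         apply(&p, flip_rank);  /* into ranks 1-4     */
    if ((p.wk >> 3) > (p.wk & 7)) apply(&p, flip_diag); /* below a1-h8 diagonal */
    return (unsigned)triangle_index(p.wk) * 64 * 64 + (unsigned)p.bk * 64 + p.wr;
}

int main(void)
{
    krk_pos p = { /* wk */ 36 /* e5 */, /* bk */ 60 /* e8 */, /* wr */ 0 /* a1 */ };
    printf("index = %u of %u\n", krk_index(p), 10u * 64 * 64);
    return 0;
}
[/code]
The real scheme does rather more than this (clustering the side-to-move men, handling pawns and en passant, squeezing out some of the redundancy), but the flavour is the same.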
FEG ftp://ftp.ubisoft.com/games/chessmaster ... Readme.TXT
Made by Johan de Koning (1998-2003); a generator came with ChessMaster. It can build 6-piece endings in 512MB of RAM, and Bourzutschky built RPP vs R in about a month using 3 desktop computers many years ago. He built a subset of the 6-piece endings, already taking about 0.5 terabytes.
The Readme says all 5-piece endings can be built on a 1.5GHz machine (2002) in less than 5 days. It is unclear whether a parallel build is possible, but 6-piece builds would likely be I/O bound in any event. I'm not sure which method it uses; the Readme says "backmoves" are generated, but that doesn't imply too much. I think it stores DTM, with either length-to-mate or not-win (losses and draws get the same value), taking about 5.7GB for the 5-piece set. It has a few ambiguities with mates in more than 254 (only in RN vs NN, I think). Some versions of the generator had bugs with en passant. The format is non-public, in 32K chunks (reverse engineered by Bourzutschky). It can actually build 7-piece endings, and supposedly even 5 vs 5.
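Incidentally, that 254 limit is exactly what you would get from storing each result in a single byte; here is a hedged guess at such an encoding (my reconstruction of the idea, not the actual FEG format):
[code]
#include <stdio.h>

/* Hedged sketch of a one-byte DTM encoding of the kind the FEG figures
 * suggest (my reconstruction, not the real format): 0 means "not a win"
 * (losses and draws collapsed together), 1..254 means "mate in N", and
 * anything beyond 254 has nowhere to go, hence the ambiguity with very
 * long mates. */

enum { NOT_WIN = 0, MAX_MATE = 254 };

static unsigned char encode_dtm(int mate_in)   /* mate_in <= 0: not a win */
{
    if (mate_in <= 0)
        return NOT_WIN;
    if (mate_in > MAX_MATE)
        return MAX_MATE;   /* saturates: a mate-in-260 looks like mate-in-254 */
    return (unsigned char)mate_in;
}

int main(void)
{
    printf("draw        -> %u\n", encode_dtm(0));
    printf("mate-in-30  -> %u\n", encode_dtm(30));
    printf("mate-in-260 -> %u (ambiguous)\n", encode_dtm(260));
    return 0;
}
[/code]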
Scorpio bitbases http://cap.connx.com/chess-engines/scorpio/
From Daniel Shawul (2006?), and also used in Toga. These are distributed as 1-sided bases and use 8K blocks as with Nalimov (with some LZ/Huffman method, though not the same as Kadatch's). The generation seems to use the grandfather method. They take around 340MB (though the download is 204MB, I think). I don't think they can be loaded into memory very efficiently due to the compression method, though presumably a factor of 4 could be saved by decompressing each result into 2 bits rather than a byte. There was a thread about building some 6-piece bases already in 2007, but I don't see where they are available (if at all). It is not clear to me what the current status of the project is.
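On the factor-of-4 remark: a win/draw/loss result only needs 2 bits, so once a block is decoded you can repack it four results to the byte. A small sketch of just that packing arithmetic (nothing Scorpio-specific, only the space argument from the text):
[code]
#include <stdio.h>
#include <string.h>

/* Pack 2-bit results four to a byte: byte-per-result -> 2 bits per result.
 * This is the generic arithmetic, not Scorpio's actual in-memory layout. */

static void pack2(const unsigned char *results, size_t n, unsigned char *out)
{
    memset(out, 0, (n + 3) / 4);
    for (size_t i = 0; i < n; i++)
        out[i / 4] |= (unsigned char)((results[i] & 3u) << (2 * (i % 4)));
}

static unsigned unpack2(const unsigned char *out, size_t i)
{
    return (out[i / 4] >> (2 * (i % 4))) & 3u;
}

int main(void)
{
    unsigned char r[8] = { 0, 1, 2, 1, 0, 2, 1, 0 };  /* e.g. 0=draw 1=win 2=loss */
    unsigned char packed[2];
    pack2(r, 8, packed);
    for (size_t i = 0; i < 8; i++)
        printf("%u ", unpack2(packed, i));
    printf("\n");
    return 0;
}
[/code]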
Shredderbases
By Stefan Meyer-Kahlen (2006?), bitbases for use with the Shredder engine.
These are 441MB for the 5-piece set, and 157MB for a half-set. I think (though I'm not completely sure) they use a method involving capture/check resolution to increase data redundancy, followed by some type of run-length encoding. This is similar to the checkers work of Schaeffer et al., and the RobboTripleBases have followed the same path. They can be loaded into memory and then accessed directly, without any further decompression into a larger table. The 6-piece set has been said to be coming soon for a few years.
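The capture/check resolution trick is roughly: if a probe can recover a position's value on its own (say by trying the captures), the builder is free to store whatever value makes the runs longer, and then run-length encoding bites much harder. A toy sketch of that idea (the representation and greedy rule here are mine, in the Schaeffer style, not Shredder's actual code):
[code]
#include <stdio.h>

/* Toy "don't care" run-length encoder.  DONT_CARE marks positions whose
 * value the prober can reconstruct itself (e.g. by resolving captures or
 * checks), so the encoder may pick any value for them -- here it simply
 * extends the current run. */

enum { DRAW = 0, WIN = 1, LOSS = 2, DONT_CARE = 3 };

static void rle_encode(const unsigned char *v, int n)
{
    int i = 0;
    while (i < n) {
        /* run value: the first non-don't-care from here, else DRAW */
        int j = i;
        while (j < n && v[j] == DONT_CARE) j++;
        unsigned char val = (j < n) ? v[j] : DRAW;

        /* extend the run over equal values and don't-cares */
        int len = 0;
        while (i < n && (v[i] == val || v[i] == DONT_CARE)) { i++; len++; }
        printf("(%d x %u) ", len, val);
    }
    printf("\n");
}

int main(void)
{
    /* without the don't-cares this would be many short runs */
    unsigned char v[] = { WIN, DONT_CARE, WIN, WIN, DONT_CARE, DONT_CARE,
                          DRAW, DRAW, DONT_CARE, LOSS, LOSS };
    rle_encode(v, (int)(sizeof v / sizeof v[0]));
    return 0;
}
[/code]
A real encoder would look ahead before committing a don't-care to a run, but the greedy version already shows why this pairs so well with RLE.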
RobboBases
By "Roberto Pescatore" as it were (Oct 2009?). Open source generator code and bitbases.
First the tablebases. The data format uses 64K blocks (or 1MB for 6-piece, I think), and splits positions up as broken/win/draw/loss-in-X, where distance-to-conversion is used. There is a line in the code: "Suffix_fare (bufI, indici, pro); /* larsson plus sadakane */", so it seems they use the Larsson/Sadakane suffix-sorting method for the compression, in a bzip2-style (Burrows-Wheeler) setup overall. This seems to work well in practice, as the 5-piece set takes 3.4GB and the 6-piece set only 475GB (including 5 vs 1). The generation code uses the out-counting method and runs in parallel, building the 5-piece set in around 3 hours on a quad-core. For 6-piece there is no disk-build option, and it seems to me that you need about 12GB to build them in RAM, though they say 32GB (and then it takes 3-4 CPU-months). They split pawns up by file in many cases, which allows these to be built in less RAM.
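For reference, the out-counting method is the retrograde scheme in which each position carries a counter of its unresolved moves: a position becomes a proven loss once every one of its moves has been refuted, i.e. the counter reaches zero. A small abstract sketch on a made-up game graph (my own illustration, nothing taken from the Robbo code):
[code]
#include <stdio.h>

/* Out-counting retrograde analysis on a tiny made-up game graph.  Values
 * are always from the point of view of the side to move at that node:
 *   - a node is WIN  if some move reaches a node already proven LOSS;
 *   - a node is LOSS if every move reaches a node already proven WIN,
 *     detected by counting down its outstanding ("out") moves. */

#define N 6
enum { UNKNOWN, WIN, LOSS };

static const int succ[N][4] = {          /* -1 terminates each move list */
    {  1,  2, -1, -1 },                  /* node 0 */
    {  3, -1, -1, -1 },                  /* node 1 */
    {  3,  4, -1, -1 },                  /* node 2 */
    {  5, -1, -1, -1 },                  /* node 3 */
    { -1, -1, -1, -1 },                  /* node 4: mated (no moves) */
    { -1, -1, -1, -1 },                  /* node 5: mated (no moves) */
};

int main(void)
{
    int value[N] = {0}, outs[N] = {0};
    int queue[N], head = 0, tail = 0;

    /* count each node's moves; nodes with none are the terminal losses */
    for (int p = 0; p < N; p++) {
        for (int m = 0; succ[p][m] != -1; m++) outs[p]++;
        if (outs[p] == 0) { value[p] = LOSS; queue[tail++] = p; }
    }

    /* retrograde pass: propagate resolved nodes back to their predecessors */
    while (head < tail) {
        int s = queue[head++];
        for (int p = 0; p < N; p++) {                /* predecessors of s */
            for (int m = 0; succ[p][m] != -1; m++) {
                if (succ[p][m] != s || value[p] != UNKNOWN) continue;
                if (value[s] == LOSS) {              /* p has a winning move  */
                    value[p] = WIN;  queue[tail++] = p;
                } else if (--outs[p] == 0) {         /* all of p's moves refuted */
                    value[p] = LOSS; queue[tail++] = p;
                }
            }
        }
    }

    for (int p = 0; p < N; p++)
        printf("node %d: %s\n", p, value[p] == WIN ? "win" :
                                   value[p] == LOSS ? "loss" : "draw");
    return 0;
}
[/code]
A real generator works with indices into the full endgame rather than an explicit graph, and has to generate the predecessors by un-making moves, but the counting logic is the same.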
The bitbases are called "TripleBases" (meaning three results, I guess), and seem to be meant to be used in RAM, at least before the 6-piece set became available. There is now some sort of option for loading from disk. These use capture resolution and run-length encoding for compression. There is some Huffman encoding on top of that in the code, but it seems not to be used (perhaps for speed?). One upshot is that, for instance, the 5-piece "TripleBases" take around 570MB in a loadable state, but can be compressed down to about half that (useful when downloading). The 6-piece "TripleBases" are about 100GB on disk. To build a "TripleBase" you need the associated "TotalBase" described above, but you do not need it for usage.
They provide full download options for everything, including the 6-piece sets. Both types of RobboBases can include something called "BlockedBases", which cover white/black pawn pairs blocked against each other on, say, e3 and e4. These increase the size of the data by about 40%, it seems.
Gaviota http://sites.google.com/site/gaviotache ... blebases-1
By Miguel Ballicora (Feb 2010). This is the newest, and something of the Swiss army knife of the bunch: for instance, you can plug in your own compression method. The tablebases are distance-to-mate. The 5-piece set is currently available, and the plans for 6-piece are unclear. As usual, concerns about RAM and streaming I/O will come into play, though SMP is already (partially?) usable in the build process. There is a lot of pawn slicing done for endgames with pawns, though I have not been able to determine its exact impact.
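On the "plug in your own compression" point, the natural way to arrange that is a small table of function pointers per scheme, so the probing code never cares which compressor produced a block. A hedged sketch of such an interface (my own guess at the shape of it, not Gaviota's actual API):
[code]
#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* Hedged sketch of a pluggable block-compression interface; the struct
 * and names here are my own guess, not the real Gaviota code. */

typedef struct {
    const char *name;
    /* both return the number of bytes written to dst, or 0 on failure */
    size_t (*compress)  (const unsigned char *src, size_t n,
                         unsigned char *dst, size_t cap);
    size_t (*decompress)(const unsigned char *src, size_t n,
                         unsigned char *dst, size_t cap);
} scheme_t;

/* scheme 0: store blocks uncompressed (always available as a fallback) */
static size_t raw_copy(const unsigned char *src, size_t n,
                       unsigned char *dst, size_t cap)
{
    if (n > cap) return 0;
    memcpy(dst, src, n);
    return n;
}

static const scheme_t schemes[] = {
    { "raw", raw_copy, raw_copy },
    /* { "lzma", lzma_pack, lzma_unpack },  further schemes slot in here */
};

int main(void)
{
    const scheme_t *s = &schemes[0];
    unsigned char block[16] = "tablebase block", out[16], back[16];
    size_t c = s->compress(block, sizeof block, out, sizeof out);
    size_t d = s->decompress(out, c, back, sizeof back);
    printf("scheme=%s, %zu -> %zu -> %zu bytes\n", s->name, sizeof block, c, d);
    return 0;
}
[/code]
The advertised scheme list (see the link below) would then just be different rows of such a table, with the block headers recording which one was used.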
I had thought the generation code was available (given the great cry of "Finally, Open Source Tablebases!" when the announcement was made), but I didn't find it at http://sites.google.com/site/gaviotache ... e/download and all that is said in the original post is the ominous: "PS: The generator code release will come later." With this in mind, I can't really say much about the bitbases either, other than that they exist and are said to be "on-the-fly"?!
Josh Shriver has 145 files for the 3/4/5-piece endings that can be downloaded, totalling about 6.5GB, but as I say, the specific compression can be changed (see http://sites.google.com/site/gaviotachessengine/schemes for more). There seem to be version updates on a fairly regular basis, though it is unclear what the next major development goal is (for instance, is 6-piece being pursued?).