Ultimate Database

Posted: Sat Sep 09, 2023 7:07 am
by EmptikBest
Greetings to all fellow members,

I gathered a number of databases and merged them with "type" (so there are probably a LOT of duplicates) to create what I call the "Ultimate Database". It includes:
  • Caissabase
  • CCRL 40/40
  • Chess.com Elite
  • "Complete-10min+6sec" from some website I can't remember :(
  • "Complete-60min+15sec" from some website I can't remember :(
  • Elgeance DB
  • PGN Mentor
  • lichess-bot-strong-games
  • Lichess Elite Database, thanks to nikonoel! (Note: it is 38 GB uncompressed because duplicates were not removed; I don't know how to remove them)
  • "Top40-1min-23.12.2022" from some website I can't remember :(
  • "Turnier-NN-60+0.6_gesamt-03.06.2022" from some website I can't remember :(
Link: https://pixeldrain.com/u/s2rtpS94

Do not be fooled by the 6.86 GB compressed size: it is 61.8 GB uncompressed. (Compression took ~40 minutes at the maximum compression level using 7-Zip on 28 threads and 24 GB RAM.)

P.S.: If somebody could DM me about how to remove duplicates from a PGN file, and how to merge files with something faster than "type", that would be great. I would then upload a cleaned DB and probably add ICCF, FICS, etc.

Re: Ultimate Database

Posted: Sat Sep 09, 2023 9:39 am
by deeds
Try pgn-extract to delete duplicated games, split PGN files, merge PGN files, etc.
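
For readers curious what duplicate removal involves, here is a minimal Python sketch of one simple approach: hash each game's movetext after stripping comments, NAGs and move numbers, and keep only the first game per hash. This is not pgn-extract's own algorithm; the function names (iter_games, movetext_key, dedupe) are made up for illustration, and it assumes every game starts with a tag line and that no movetext line begins with "[". Two games with identical moves but different players are treated as duplicates here.

```python
import hashlib
import re


def iter_games(lines):
    """Yield each game as a list of lines; a tag line after movetext starts a new game."""
    game, seen_moves = [], False
    for line in lines:
        is_tag = line.startswith("[")
        if is_tag and seen_moves:
            yield game
            game, seen_moves = [], False
        if not is_tag and line.strip():
            seen_moves = True
        if game or line.strip():  # skip blank lines before the first tag
            game.append(line)
    if game:
        yield game


def movetext_key(game):
    """Hash the moves only, ignoring comments, NAGs, move numbers and spacing."""
    text = " ".join(l for l in game if not l.startswith("["))
    text = re.sub(r"\{[^}]*\}", " ", text)  # brace comments (may span lines)
    text = re.sub(r"\$\d+", " ", text)      # numeric annotation glyphs
    text = re.sub(r"\d+\.+", " ", text)     # move numbers such as "1." or "1..."
    return hashlib.sha1(" ".join(text.split()).encode("utf-8")).digest()


def dedupe(lines):
    """Yield the text of each game whose movetext has not been seen before."""
    seen = set()
    for game in iter_games(lines):
        key = movetext_key(game)
        if key not in seen:
            seen.add(key)
            yield "".join(game)

# Usage (hypothetical file names):
# with open("merged.pgn", encoding="utf-8", errors="replace") as src, \
#      open("unique.pgn", "w", encoding="utf-8") as dst:
#     dst.writelines(dedupe(src))
```

The set of hashes grows with the number of unique games, so for very large files a dedicated tool such as pgn-extract is likely the better choice.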

To merge PGN files manually under Windows, I have always used "copy /b *.pgn merged.pgn"; sometimes I load all the PGNs into SCID and export them as one big file.
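
The same binary concatenation can be sketched in Python for anyone not on Windows. This is only an illustration under a few assumptions: merge_pgn and the chunk size are made-up names and values, and a blank line is written after each file so that the last game of one file cannot run into the first tag of the next.

```python
import shutil
from pathlib import Path


def merge_pgn(sources, destination, chunk_size=16 * 1024 * 1024):
    """Concatenate PGN files into one, streaming in large binary chunks."""
    destination = Path(destination)
    with destination.open("wb") as out:
        for src in sources:
            src = Path(src)
            if src.resolve() == destination.resolve():
                continue  # never read the file we are writing
            with src.open("rb") as f:
                shutil.copyfileobj(f, out, chunk_size)
            # Blank line between files keeps games from fusing together.
            out.write(b"\n\n")
    return destination

# Usage (hypothetical folder):
# merge_pgn(sorted(Path("pgn").glob("*.pgn")), "merged.pgn")
```

Reading and writing in binary avoids any decoding work, so the speed is mostly limited by the disk.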

FICS games

Re: Ultimate Database

Posted: Sat Sep 09, 2023 10:50 am
by EmptikBest
Thanks, I will try pgn-extract first, and then I'll try something with SCID that someone on outskirts told me about :)

Also thanks for the FICS games, will add those too!

Re: Ultimate Database

Posted: Sat Sep 09, 2023 4:43 pm
by EmptikBest
I ran "./pgn-extract -D -o Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn Turnier-NN-60+0.6_gesamt-03.06.2022.pgn". The input file (Turnier-NN-60+0.6_gesamt-03.06.2022.pgn) was 1.14 GB, but the output file (Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn) was 1.40 GB???

Re: Ultimate Database

Posted: Sat Sep 09, 2023 7:13 pm
by deeds
This often happens because pgn-extract saves the games in a format that uses more characters:

pgn-extract.exe -ooutput.pgn input.pgn
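
One way to check that the extra size comes from formatting rather than from extra games is to compare the game count and the average bytes per game of the input and output files. A minimal Python sketch, assuming every game begins with an [Event tag line (pgn_stats is a made-up helper name):

```python
def pgn_stats(path):
    """Return (games, total_bytes, bytes_per_game) for a PGN file."""
    games = size = 0
    with open(path, "rb") as f:
        for line in f:
            size += len(line)
            # Each game is assumed to start with an [Event "..."] tag.
            if line.startswith(b"[Event "):
                games += 1
    return games, size, (size / games if games else 0.0)

# Usage (hypothetical file names):
# print(pgn_stats("input.pgn"))
# print(pgn_stats("output.pgn"))
```

If the output has the same or fewer games but more bytes per game, the growth is down to how the games are written out.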


Re: Ultimate Database

Posted: Sun Sep 10, 2023 5:23 am
by EmptikBest
ANNOUNCEMENT:

FICS 2000-2012 will be added in the next update, thanks to deeds! These are 116 GiB unfiltered with no duplicates removed; after filtering they will probably be smaller.
ICCF 2015-2022 will also be added in the next update, 323 MB unfiltered.

ALL comments will be removed to save space, sorry :(
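
For anyone wanting to do the same to their own files, here is a rough Python sketch of comment stripping for a single game. It is only an illustration: strip_comments is a made-up name, it assumes no movetext line starts with "[", and it re-wraps the moves at 79 columns. pgn-extract's documentation also lists a -C option for dropping comments, which is worth checking against your version.

```python
import re
import textwrap

_BRACE = re.compile(r"\{[^}]*\}")      # {comment}, possibly spanning lines
_REST_OF_LINE = re.compile(r";[^\n]*")  # ; comment up to the end of the line


def strip_comments(game_text):
    """Return one game's text with brace and semicolon comments removed."""
    lines = game_text.splitlines()
    tags = [l for l in lines if l.startswith("[")]
    moves = "\n".join(l for l in lines if l.strip() and not l.startswith("["))
    moves = _BRACE.sub(" ", moves)        # braces first, they may contain ";"
    moves = _REST_OF_LINE.sub("", moves)
    moves = " ".join(moves.split())       # collapse leftover whitespace
    # Re-wrap without splitting tokens such as "O-O" or "1-0".
    moves = textwrap.fill(moves, width=79, break_long_words=False,
                          break_on_hyphens=False)
    return "\n".join(tags) + "\n\n" + moves + "\n"
```

Engine evaluations and clock times stored in comments are lost this way, which is where most of the space saving comes from.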

If I have time, maybe I'll make a separate Chess960 archive.

P.S.: If anyone has Chess960 games/DBs to share, please send them; I will probably make a separate archive for 960.