Ultimate Database

Discussion about chess-playing software (engines, hosts, opening books, platforms, etc...)
EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Ultimate Database

Post by EmptikBest » Sat Sep 09, 2023 7:07 am

Greetings to all fellow members,

I gathered a bunch of databases (I merged some of them with the Windows "type" command, so there are probably a LOT of duplicates) to create what I call the "Ultimate Database". It includes:
  • Caissabase
  • CCRL 40/40
  • Chess.com Elite
  • "Complete-10min+6sec" from some website I can't remember :(
  • "Complete-60min+15sec" from some website I can't remember :(
  • Elgeance DB
  • PGN Mentor
  • lichess-bot-strong-games
  • Lichess Elite Database, thanks to nikonoel! (Note: it is 38 GB uncompressed because duplicates were not removed; I don't know how to remove them.)
  • "Top40-1min-23.12.2022" from some website I can't remember :(
  • "Turnier-NN-60+0.6_gesamt-03.06.2022" from some website I can't remember :(
Link: https://pixeldrain.com/u/s2rtpS94

Do not be fooled by the 6.86 GB compressed size (it took ~40 minutes to compress at the maximum compression level using 7-Zip with 28 threads and 24 GB of RAM): it is 61.8 GB uncompressed...

P.S.: If somebody could DM me on how to remove duplicates from a PGN file, and how to merge files with something faster than "type", that would be great. Then I would upload a cleaned DB and probably add ICCF, FICS, etc.

deeds
Posts: 709
Joined: Wed Oct 20, 2021 9:24 pm
Location: France

Re: Ultimate Database

Post by deeds » Sat Sep 09, 2023 9:39 am

Try pgn-extract to delete duplicated games, split PGN files, merge PGN files, etc.
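pgn-extract's -D option is the tool suggested here. For anyone curious what duplicate removal involves, here is a minimal stdlib-only Python sketch, not how pgn-extract works internally. It assumes every game begins with an [Event "..."] tag line, and it treats two games as duplicates when their White, Black, Date and Result tags and their movetext (with {comments} and extra whitespace removed) match. All function names are hypothetical.

```python
import hashlib
import re
from typing import Iterable, Iterator, List

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
COMMENT_RE = re.compile(r"\{[^}]*\}")
KEY_TAGS = ("White", "Black", "Date", "Result")  # assumption: what makes two games "the same"

def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time; a new game starts at each '[Event ' tag line."""
    current: List[str] = []
    for line in lines:
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)

def game_key(game: str) -> bytes:
    """Hash the key tags plus the movetext with comments and extra whitespace removed."""
    tags = {}
    moves: List[str] = []
    for line in game.splitlines():
        m = TAG_RE.match(line)
        if m:
            tags[m.group(1)] = m.group(2)
        else:
            moves.append(line)
    movetext = " ".join(COMMENT_RE.sub(" ", " ".join(moves)).split())
    ident = "|".join(tags.get(t, "?") for t in KEY_TAGS)
    # A 20-byte digest instead of the full game text keeps the 'seen' set smaller
    return hashlib.sha1(f"{ident}|{movetext}".encode("utf-8")).digest()

def unique_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the first occurrence of each game."""
    seen = set()
    for game in split_games(lines):
        key = game_key(game)
        if key not in seen:
            seen.add(key)
            yield game

def dedupe_file(in_path: str, out_path: str) -> int:
    """Stream in_path to out_path without duplicates; return the number of games kept."""
    kept = 0
    with open(in_path, encoding="utf-8", errors="replace") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for game in unique_games(src):
            dst.write(game)
            kept += 1
    return kept
```

The input is streamed, but the set still holds one digest per unique game, so on hundreds of millions of games it can need several GB of RAM. It also compares movetext as text, so the same game exported as "1.e4" in one file and "1. e4" in another would not be caught; a tool that actually parses the moves, such as pgn-extract, is better suited to that.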

To merge PGN files manually under Windows, I have always used "copy /b *.pgn merged.pgn", and sometimes I load all the PGNs into SCID and export them into one big file.
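"copy /b" is a plain byte-level concatenation. The same idea as a minimal Python sketch (concat_pgn and merge_folder are hypothetical helper names): it streams each file in 16 MB chunks instead of loading it into memory, skips the output file even though it also matches *.pgn, and writes a blank line after each input so a file without a trailing newline does not run into the next file's first tag.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, Iterable

def concat_pgn(sources: Iterable[BinaryIO], dst: BinaryIO, chunk: int = 16 * 1024 * 1024) -> None:
    """Append each binary source to dst, separated by a blank line."""
    for src in sources:
        shutil.copyfileobj(src, dst, chunk)
        # Without this, a file lacking a trailing newline would glue its last
        # result token onto the next file's first [Event ...] tag.
        dst.write(b"\n\n")

def merge_folder(folder: str, out_name: str = "merged.pgn") -> int:
    """Merge every *.pgn in folder (except the output itself); return how many were merged."""
    folder_path = Path(folder)
    inputs = sorted(p for p in folder_path.glob("*.pgn") if p.name != out_name)
    with open(folder_path / out_name, "wb") as dst:
        for p in inputs:
            with open(p, "rb") as src:
                concat_pgn([src], dst)
    return len(inputs)
```

Like "copy /b", this only merges; duplicates still have to be removed afterwards.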

FICS games

EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest » Sat Sep 09, 2023 10:50 am

Thanks, I will try pgn-extract first, and then something with SCID that someone on outskirts told me about :)

Also, thanks for the FICS games; I will add those too!

EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest » Sat Sep 09, 2023 4:43 pm

I ran "./pgn-extract -D -o Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn Turnier-NN-60+0.6_gesamt-03.06.2022.pgn". The input file (Turnier-NN-60+0.6_gesamt-03.06.2022.pgn) was 1.14 GB, but the output file (Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn) was 1.40 GB???

deeds
Posts: 709
Joined: Wed Oct 20, 2021 9:24 pm
Location: France

Re: Ultimate Database

Post by deeds » Sat Sep 09, 2023 7:13 pm

This happens often, because pgn-extract writes the games out in a format that uses more characters:

pgn-extract.exe -ooutput.pgn input.pgn
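Because the rewritten output can be larger even when games were dropped, file size says little about whether -D removed anything; comparing game counts is more telling. A minimal sketch, assuming every game begins with an [Event "..."] tag line (the PGN export format puts Event first, but not every file follows it strictly). The function names are hypothetical.

```python
from typing import Iterable

def count_games(lines: Iterable[str]) -> int:
    """Count games by counting '[Event ' tag lines (assumes exactly one per game)."""
    return sum(1 for line in lines if line.startswith("[Event "))

def count_games_in_file(path: str) -> int:
    # Stream line by line so a multi-GB file never sits in memory at once.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_games(f)
```

Running count_games_in_file on the input and on the -Filtered output shows how many games were actually dropped, whatever the byte sizes say.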


EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest » Sun Sep 10, 2023 5:23 am

ANNOUNCEMENT:

FICS 2000-2012 will be added in the next update, thanks to deeds! These are 116 GiB unfiltered, with no duplicates removed; after filtering they will probably be smaller.
ICCF 2015-2022 will be added in the next update (323 MB unfiltered).

ALL comments will be removed to save space, sorry :(
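pgn-extract's -C option (don't output comments) does this directly. The idea behind it, as a minimal stdlib-only Python sketch (strip_comments is a hypothetical name): remove {brace} comments and ; rest-of-line comments from the movetext only, leaving the tag lines untouched because tag values may legitimately contain ";" or "{". It does not handle edge cases such as a "}" inside a ; comment.

```python
import re

BRACE_COMMENT = re.compile(r"\{[^}]*\}")  # [^}] also matches newlines, so multi-line comments go too
LINE_COMMENT = re.compile(r";[^\n]*")

def strip_comments(game: str) -> str:
    """Return one game's text with {...} and ;... comments removed from the movetext."""
    lines = game.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith("["):
        i += 1  # skip the tag section at the top of the game
    header = "".join(lines[:i])
    movetext = "".join(lines[i:])
    movetext = BRACE_COMMENT.sub("", movetext)
    movetext = LINE_COMMENT.sub("", movetext)
    # Collapse the double spaces left behind where comments were removed
    movetext = re.sub(r"[ \t]{2,}", " ", movetext)
    return header + movetext
```

Note that Lichess stores clock and engine annotations such as [%clk ...] and [%eval ...] inside comments, so those disappear as well.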

If I have time, maybe I'll make a separate Chess960 archive.

P.S.: If anyone has Chess960 games/DBs to share, please send them; I will probably make a separate archive for 960.
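Keeping 960 games out of the main archive mostly means routing each game by its tags. A minimal sketch, assuming 960 games carry a [Variant "..."] tag whose value contains "960" or "fischer" (as in Lichess's [Variant "Chess960"]); sources that mark them only through [SetUp "1"] and a non-standard [FEN] would be missed. All names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

VARIANT_RE = re.compile(r'^\[Variant\s+"([^"]*)"\]', re.MULTILINE)

def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one game at a time; a new game starts at each '[Event ' tag line."""
    current: List[str] = []
    for line in lines:
        if line.startswith("[Event ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)

def is_chess960(game: str) -> bool:
    """True if the game's Variant tag looks like Chess960 / Fischer Random (assumption)."""
    m = VARIANT_RE.search(game)
    if not m:
        return False
    name = m.group(1).lower()
    return "960" in name or "fischer" in name

def split_by_variant(lines: Iterable[str]) -> Iterator[Tuple[bool, str]]:
    """Yield (is_960, game_text) pairs."""
    for game in split_games(lines):
        yield is_chess960(game), game

def split_file(in_path: str, standard_path: str, c960_path: str) -> None:
    """Stream in_path into a standard-chess file and a Chess960 file."""
    with open(in_path, encoding="utf-8", errors="replace") as src, \
         open(standard_path, "w", encoding="utf-8") as std, \
         open(c960_path, "w", encoding="utf-8") as c960:
        for flag, game in split_by_variant(src):
            (c960 if flag else std).write(game)
```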
