How does crafty's cluster work?

Posted: Mon Jul 26, 2010 9:43 pm
by benstoker
I did some lazy searching in talkchess and came up null for answers to the following. Out of curiosity, can someone describe what Dr. Hyatt's cluster, the one he runs his tests on, is made up of? What's the hardware? What OS? Linux? What about the cluster software: what is it, and how does it work? Can you assign a 3 GHz processor to 16 engines or 16 processors to one engine? Do you log in to a shell account with pre-assigned cores available?

Also, how does Dr. Hyatt run these engine-engine games? What software tool does he use to make the engines talk to each other? Surely not xboard, since he must run these tests via a CLI terminal only - or maybe not.

How much RAM is on this cluster?

If all the processors are NOT on one chip, how can the threads communicate fast enough?

p.s. I want one. Can I get one at the Apple store?

Re: How does crafty's cluster work?

Posted: Tue Jul 27, 2010 12:26 am
by hyatt
We have two different clusters. One has 128 nodes, two CPUs per node, for a total of 256 CPUs. 8GB RAM, nodes connected with gigabit and InfiniBand.

The other cluster has 70 nodes, 8 cores (dual quad-core) per node, 560 total cores. 12GB RAM/node, same kind of interconnections.

Both clusters have huge disk storage arrays. Both run 64-bit Linux, with both gcc and the Intel C++ compiler (I use the Intel compiler only, myself).

You run things by submitting shell scripts. You can submit one script for each CPU, one for each node, or one for a group of nodes (you specify this in the script).
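
To make the three submission granularities concrete, here is a small Python sketch that counts how many scripts it would take to cover each cluster. The node and CPU counts come from the figures above; the class, the labels "cluster_a" and "cluster_b", and the group size of 4 are made up for illustration and say nothing about the actual scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str           # hypothetical label, not the real cluster name
    nodes: int
    cpus_per_node: int

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    def job_count(self, granularity: str, group_size: int = 1) -> int:
        """Number of scripts needed to cover the whole cluster."""
        if granularity == "cpu":
            return self.total_cpus
        if granularity == "node":
            return self.nodes
        if granularity == "group":
            if group_size < 1:
                raise ValueError("group_size must be at least 1")
            # Ceiling division: a partly filled last group still needs a script.
            return -(-self.nodes // group_size)
        raise ValueError(f"unknown granularity: {granularity!r}")


# Figures quoted in the post: 128 nodes x 2 CPUs and 70 nodes x 8 cores.
CLUSTERS = [Cluster("cluster_a", 128, 2), Cluster("cluster_b", 70, 8)]

if __name__ == "__main__":
    for c in CLUSTERS:
        print(c.name, c.total_cpus,
              c.job_count("cpu"), c.job_count("node"), c.job_count("group", 4))
```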

I use a referee program I wrote that does essentially what xboard does in match mode, except that there is no GUI to show the game as it is played out. The referee can play two programs against each other, and can be told which positions out of an EPD file to use as the starting positions, and how many games per position to play (usually 2). It can also be told the time control to use. It plays the games, records the PGN, and then BayesElo eats all the PGN and gives a clear result.
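
The referee itself has not been released, so the following is only a rough Python sketch of the bookkeeping described above: expand a list of EPD positions into a schedule of games and write a minimal PGN record for each result. All function names are hypothetical. Swapping colors on every other game from a position is an assumption (suggested by "usually 2" games per position), and the code does not launch or talk to any engine.

```python
from typing import Iterable, Iterator, NamedTuple


class Game(NamedTuple):
    white: str
    black: str
    fen: str


def epd_to_fen(epd_line: str) -> str:
    """Keep the four EPD position fields and append default move counters."""
    fields = epd_line.split()[:4]
    if len(fields) < 4:
        raise ValueError(f"not an EPD position: {epd_line!r}")
    return " ".join(fields) + " 0 1"


def schedule(engine_a: str, engine_b: str, epd_lines: Iterable[str],
             games_per_position: int = 2) -> Iterator[Game]:
    """Yield one Game per (position, repetition), alternating colors."""
    for line in epd_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the EPD file
        fen = epd_to_fen(line)
        for i in range(games_per_position):
            if i % 2 == 0:
                yield Game(engine_a, engine_b, fen)
            else:
                yield Game(engine_b, engine_a, fen)  # assumed color swap


def pgn_record(game: Game, result: str) -> str:
    """Minimal PGN entry (tags only, no moves) for a finished game."""
    if result not in ("1-0", "0-1", "1/2-1/2"):
        raise ValueError(f"bad result: {result!r}")
    tags = [("White", game.white), ("Black", game.black),
            ("Result", result), ("SetUp", "1"), ("FEN", game.fen)]
    header = "\n".join(f'[{key} "{value}"]' for key, value in tags)
    return f"{header}\n\n{result}\n"


if __name__ == "__main__":
    epd = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - id \"demo\";"]
    for g in schedule("version-1", "version-2", epd):
        print(pgn_record(g, "1/2-1/2"))
```

A real referee would fill in the move text and the remaining standard PGN tags; the tags here are just the ones a rating tool needs to see who played whom and how the game ended.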

I have an automated script that will play many versions, and even play the same version several times varying one parameter for each different match, used for tuning. The tests typically do not involve parallel search at all, although I have run some parallel search test matches on the 8-core-per-node cluster. But no distributed search...
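
The "vary one parameter per match" idea can be sketched in a few lines of Python. This is not the actual tuning script: the parameter names are placeholders, and the Elo conversion is the plain logistic formula, a much simpler stand-in for what BayesElo computes from the PGN.

```python
import math
from typing import Dict, Iterator, List


def one_at_a_time(baseline: Dict[str, int], name: str,
                  values: List[int]) -> Iterator[Dict[str, int]]:
    """Yield one engine configuration per match, changing only `name`."""
    if name not in baseline:
        raise KeyError(f"unknown parameter: {name!r}")
    for value in values:
        config = dict(baseline)  # copy so the baseline is never modified
        config[name] = value
        yield config


def elo_diff(score: float) -> float:
    """Elo difference implied by a match score fraction (logistic model).

    A simplification for illustration only; BayesElo fits a Bayesian model
    that also accounts for draws and gives confidence intervals.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    baseline = {"param_a": 3, "param_b": 2}  # hypothetical parameter names
    for config in one_at_a_time(baseline, "param_b", [1, 2, 3, 4]):
        print(config)
    print(round(elo_diff(0.55), 1))
```

Each configuration would then be played as its own match against the same opponents, and the resulting ratings compared to pick the best value.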

Re: How does crafty's cluster work?

Posted: Tue Jul 27, 2010 1:17 am
by benstoker
Thanks. Is your referee program available to the public or open source?

Re: How does crafty's cluster work?

Posted: Tue Jul 27, 2010 6:20 pm
by hyatt
benstoker wrote:Thanks. Is your referee program available to the public or open source?
I have not released it as I am not looking for _another_ program to support. :) I may, at some point, since it has played many tens of millions of games so far...

Re: How does crafty's cluster work?

Posted: Thu Jul 29, 2010 4:42 pm
by benstoker
If you do release it, may I suggest a name for it --- "kudzu". I was in Birmingham the other day and noticed all the kudzu. It evokes the notion of multiple connections.

[Idle thought for the day]

Re: How does crafty's cluster work?

Posted: Thu Jul 29, 2010 7:47 pm
by hyatt
benstoker wrote:If you do release it, may I suggest a name for it --- "kudzu". I was in Birmingham the other day and noticed all the kudzu. It evokes the notion of multiple connections.

[Idle thought for the day]
We definitely have Kudzu all over the south. Stuff grows up to 6' (six feet!) per night with lots of rain and sunshine. If you pass thru, give me a call. I'm in the phone book...


Re: How does crafty's cluster work?

Posted: Fri Jul 30, 2010 5:06 pm
by BrianR
Any update on cluster search for Crafty?

Re: How does crafty's cluster work?

Posted: Fri Jul 30, 2010 5:36 pm
by benstoker
I noted the nice temp also, that is, compared to the oppressive swelter of Austin. I did think about ringing in to see if I could come gawk at Chess Engine Cluster Central Station, but it was all biz - in and out and no time. Maybe next time.
hyatt wrote:
benstoker wrote:If you do release it, may I suggest a name for it --- "kudzu". I was in Birmingham the other day and noticed all the kudzu. It evokes the notion of multiple connections.

[Idle thought for the day]
We definitely have Kudzu all over the south. Stuff grows up to 6' (six feet!) per night with lots of rain and sunshine. If you pass thru, give me a call. I'm in the phone book...

