Bits and Pieces

benstoker · Post by **benstoker** » Tue Mar 01, 2011 5:07 pm

Okay. Here I go.

This is the popcnt routine (for NON-popcnt instruction cpus) from ivanhoe. The actual question is at the bottom, below the code.

Code: Select all

static inline int POPCNT (uint64 w)
{
  w = w - ((w >> 1) & 0x5555555555555555ull);
  w = (w & 0x3333333333333333ull) + ((w >> 2) & 0x3333333333333333ull);
  w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0full;
  return (w * 0x0101010101010101ull) >> 56;
}

The above version is explained:

Code: Select all

typedef unsigned long long uint64;  //assume this gives 64-bits
const uint64 m1  = 0x5555555555555555ULL; //binary: 0101...
const uint64 m2  = 0x3333333333333333ULL; //binary: 00110011..
const uint64 m4  = 0x0f0f0f0f0f0f0f0fULL; //binary:  4 zeros,  4 ones ...
const uint64 m8  = 0x00ff00ff00ff00ffULL; //binary:  8 zeros,  8 ones ...
const uint64 m16 = 0x0000ffff0000ffffULL; //binary: 16 zeros, 16 ones ...
const uint64 m32 = 0x00000000ffffffffULL; //binary: 32 zeros, 32 ones
const uint64 hff = 0xffffffffffffffffULL; //binary: all ones
const uint64 h01 = 0x0101010101010101ULL; //the sum of 256 to the power of 0,1,2,3...

/* Implement 'popcount_3' from  Wikipedia */

/* """This uses fewer arithmetic operations than any other known  
   implementation on machines with fast multiplication.
   It uses 12 arithmetic operations, one of which is a multiply.""" */

static inline int popcount_wikipedia_3(uint64 *buf, int n) {
  int cnt=0;
  uint64 x;
  do {
    x = *buf;
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    cnt += (x * h01)>>56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24)+...
    buf++;
  } while (--n);
  return cnt;
}

This is crafty's:

Code: Select all

int static __inline__ PopCnt(long word)
{
  long dummy, dummy2, dummy3;

asm("          xorq    %0, %0    " "\n\t"
    "          testq   %1, %1    " "\n\t"
    "          jz      2f        " "\n\t"
    "1:        leaq    -1(%1),%2 " "\n\t"
    "          incq    %0        " "\n\t"
    "          andq    %2, %1    " "\n\t"
    "          jnz     1b        " "\n\t"
    "2:                          " "\n\t"
:   "=&r"(dummy), "=&r"(dummy2), "=&r"(dummy3)
:   "1"((long) (word))
:   "cc");
  return (dummy);
}

And this is stockfish's non SSE4.2 bitcount routine:

Code: Select all

template<>
inline int count_1s<CNT64>(Bitboard b) {
  b -= ((b>>1) & 0x5555555555555555ULL);
  b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
  b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
  b *= 0x0101010101010101ULL;
  return int(b >> 56);
}

Here's one from the AMD optimization guide:

Code: Select all

unsigned int popcount(unsigned int v) 
{ 
  unsigned int retVal; 
  __asm { 
  MOV EAX, [v]         ;v 
  MOV EDX, EAX         ;v 
  SHR EAX, 1           ;v >> 1 
  AND EAX, 055555555h  ;(v >> 1) & 0x55555555 
  SUB EDX, EAX         ;w = v - ((v >> 1) & 0x55555555) 
  MOV EAX, EDX         ;w 
  SHR EDX, 2           ;w >> 2 
  AND EAX, 033333333h  ;w & 0x33333333 
  AND EDX, 033333333h  ;(w >> 2) & 0x33333333 
  ADD EAX, EDX         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333) 
  MOV EDX, EAX         ;x 
  SHR EAX, 4           ;x >> 4 
  ADD EAX, EDX         ;x + (x >> 4) 
  AND EAX, 00F0F0F0Fh  ;y = (x + (x >> 4) & 0x0F0F0F0F) 
  IMUL EAX, 001010101h ;y * 0x01010101 
  SHR EAX, 24          ;population count = (y * 0x01010101) >> 24 
  MOV retVal, EAX      ;store result 
  } 
  return (retVal); 
}

Now, if I run a test to find out which implementation is the fastest on my box, the one in ivanhoe is the tops:

Code: Select all

FreeBSD version 1   :  7508216 us; cnt=32500610
FreeBSD version 2   :  5855897 us; cnt=32500610
16-bit LUT          :  3236626 us; cnt=32500610
8-bit LUT           :  5387574 us; cnt=32500610
8-bit LUT v2        :  4152103 us; cnt=32500610
Wikipedia #2        :  4149610 us; cnt=32500610
Wikipedia #3        :  2980791 us; cnt=32500610
gcc popcount        :  6520142 us; cnt=32500610
gcc popcountll      :  3185706 us; cnt=32500610
HAKMEM 169/X11      :  5812659 us; cnt=32500610
Bitslice(7)         :  4815994 us; cnt=32500610
Bitslice(24)        :  3112013 us; cnt=32500610
naive               : 56568623 us; cnt=32500610
Wegner/Kernigan     : 33442013 us; cnt=32500610
Anderson            : 10516925 us; cnt=32500610
8x shift and add    : 49656566 us; cnt=32500610
32x shift and add   : 47866979 us; cnt=32500610

There are many other routines doing the same thing. Here's test for various versions: http://dalkescientific.com/writings/diary/popcnt.c
And it happens that the version used by ivanhoe is the fastest on my core2.

And now, the Question (or questions): if I copy exactly Crafty's popcount implementation for my engine, 1) is that "impermissibly derived"?; and, 2) should credit be given by referencing Hyatt's Crafty as the origination of the implementation, even if Hyatt borrowed the implementation from some other source?

I'll give the following a name, by calling it the "Hyatt Rule", or "One-Input-One-Output Function Rule". Hyatt: "If a function that is consistent from input to output ... then perhaps copying that is not a deal-breaker on tournament participation." And, the "Hyatt Corollary": "If you choose to borrow something, ... one must at least provide a proper citation or give credit to the originator of the code that was used."

Okay. Let us apply the Hyatt Rule. At the outset, the various popcount implementations appear to me to be obvious examples of a one-input-one-output function. Is that correct? If true, the Hyatt Rule appears to allow rote code copying of these routines. However, there are numerous variations to popcount implementations. Stockfish's is identical to ivanhoe's, though one in c and the other c++. What I find is that the various implementations of this one-input-one-output function differ widely when it comes to speed. As you can see, the choice of a popcount implementation is significant. The "wikipedia #3/ivanhoe/stockfish" routine is the fastest. Whoever wrote these various routines a) knows a lot; b) used their knowledge; and, c) created something. I will certainly improve my engine, if I replace gcc's __builtin_popcount with wikipedia#3/stockfish/ivanhoe's routine. Now, Stockfish's 'bitcount.h' contains the standard GPL boilerplate above its bitcount routine. However, that routine is apparently public domain - it's on wikipedia without citation.

But, all of this is for illustration, so let us assume for the moment that Madame X authored the wikipedia#3/stockfish/ivanhoe version. Clearly, Madame X has created something of significance by inventing, or authoring, a function that nobody else did previously, and although this function takes the same input and produces the exact same output as all the other implementations, Madame X's is more than twice the speed of gcc's builtin. If Madame X uses her implementation in her chess engine, and it is not known or used by any other, am I free to disassemble her fine engine to discover her fast popcount implementation and use it in my engine without credit or citation? Given it's nature, after disassembly, my reconstruction in c would be identical to Madame X's.

How does the Hyatt Rule handle one-input-one-output functions whose various implementations differ widely in speed, speed being the primary objective? In other words, though probably impossible (and irrelevant with new processors), assume I managed to create another implementation that was three times faster than Madame X's? This is very low level stuff. Does the Hyatt Rule exonerate all who would disassemble my closed source engine and use my super fast method in their engines without credit. Again, the nature of this low level code would probably mean a c rewrite of disassemble code would be identical to my version without having ever seen my c code.

BB+ · Post by **BB+** » Wed Mar 02, 2011 3:04 am

At the outset, the various popcount implementations appear to me to be obvious examples of a one-input-one-output function. Is that correct?

I would say so.

How does the Hyatt Rule handle one-input-one-output functions whose various implementations differ widely in speed, speed being the primary objective?

This was essentially the old-time argument against the multiple use of tablebases -- others would implement it other ways, with different performance criteria, etc. If you want a philosophical thought [not necessarily one I make my own], once it is merely "speed" that is of issue, then perhaps the "creative" sense has been diminished to the extent of making it a mere engineering problem, rather than a computer chess problem per se (though, of course, engineering is also a place where creativity can occur). It's more difficult to judge this when, say, you have to consider "speed" due to a given choice of data structure [like in move generation].

I'm not sure whether your overall criterion of interest here is "copyright" or "computer chess originality" for the popcount implementation. There is clearly some "creative spark" in the fastest implementation, and implementations in general fall under copyright guidelines. If you want a more "computer chess originality" version of this, Rybka uses a "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next) -- this is largely an "engineering" advance in that the repetition will always get detected anyway (so I think it fits the Hyatt rule). The implementation in IPPOLIT is very close to that of Rybka, with only minor differences [though ones that do affect the search].

There's also "popcount when no more than X bits can be set" than can be faster in some cases [for instance, no more than 16 pieces of one colour or something].

zwegner · Post by **zwegner** » Wed Mar 02, 2011 3:12 am

BB+ wrote:I'm not sure whether your overall criterion of interest here is "copyright" or "computer chess originality" for the popcount implementation. There is clearly some "creative spark" in the fastest implementation, and implementations in general fall under copyright guidelines. If you want a more "computer chess originality" version of this, Rybka uses a "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next) -- this is largely an "engineering" advance in that the repetition will always get detected anyway (so I think it fits the Hyatt rule). The implementation in IPPOLIT is very close to that of Rybka, with only minor differences [though ones that do affect the search].

This repetition idea is I believe a Gerd Isenberg idea, or at least I have some vague memory of him describing it. I'll try and find a reference.

EDIT: that didn't take long: http://www.talkchess.com/forum/viewtopic.php?t=13791

benstoker · Post by **benstoker** » Wed Mar 02, 2011 4:23 pm

BB+ wrote:
At the outset, the various popcount implementations appear to me to be obvious examples of a one-input-one-output function. Is that correct?
I would say so.

How does the Hyatt Rule handle one-input-one-output functions whose various implementations differ widely in speed, speed being the primary objective?
This was essentially the old-time argument against the multiple use of tablebases -- others would implement it other ways, with different performance criteria, etc. If you want a philosophical thought [not necessarily one I make my own], once it is merely "speed" that is of issue, then perhaps the "creative" sense has been diminished to the extent of making it a mere engineering problem, rather than a computer chess problem per se (though, of course, engineering is also a place where creativity can occur). It's more difficult to judge this when, say, you have to consider "speed" due to a given choice of data structure [like in move generation].

I'm not sure whether your overall criterion of interest here is "copyright" or "computer chess originality" for the popcount implementation. There is clearly some "creative spark" in the fastest implementation, and implementations in general fall under copyright guidelines. If you want a more "computer chess originality" version of this, Rybka uses a "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next) -- this is largely an "engineering" advance in that the repetition will always get detected anyway (so I think it fits the Hyatt rule). The implementation in IPPOLIT is very close to that of Rybka, with only minor differences [though ones that do affect the search].

There's also "popcount when no more than X bits can be set" than can be faster in some cases [for instance, no more than 16 pieces of one colour or something].

Thank you for this response.

I am after the "computer chess originality" notion, first and foremost, as applied to the general "one-engine-one-author" rule that guides ICGA tournaments. But to the extent copyright law informs the issue, then that should certainly be referenced. Application of copyright law and software licensing will be another interminable set of problems - tomorrow is another day for that! Your distinction between the "computer chess originality" concept and copyright law is very helpful and clarifies the issues.

A thorough, specific, detailed discussion of these concepts apparently does not take place in any fora I have seen. BB+ has taken a stab at. I wanted to take it down to the concrete, and hope to eke out a set of definitions and criteria for separating "okay-to-copy-and-still-be-original" from "not-okay-to-copy-clone".

Initial definitions:

"Clone" = impermissible derivative via code copying such that the engine is not properly considered the product of a single author.
"Original" = the engine is the sole creation of one author (or team) and qualifies for tournament entry.

In between "Clone" and "Original" and between the multitude of "Originals" there is a great deal of "Transfer of Ideas" and rote "code copying".

What you say above introduces the idea of "engineering advance" to the Hyatt Rule. It seems that you are saying that copying the code of an "engineering advance" does not necessarily define a Clone. How about this:

Hyatt Rule Section 1:
1(a): Rule: The source code of a one-input-one-output function may be copied to an Original even if that source code includes another author's engineering advance that merely increases the speed of the function. [If this is too general, then further specification needed.]
Examples of source code that Original engine authors may copy by rote:

-- popcount implementations
-- all bitscan implementations (?)
-- Repetition Check(), including "Rybka's "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next)"
-- move generator code
-- EGTB probing code
-- SEE
-- UCI/winboard stdin-stdout implementations (?)
-- TBD: other concrete examples ...

1(b): Rule: Attribution ... [TBD]

...

hyatt · Post by **hyatt** » Wed Mar 02, 2011 5:46 pm

benstoker wrote:Okay. Here I go.

This is the popcnt routine (for NON-popcnt instruction cpus) from ivanhoe. The actual question is at the bottom, below the code.

Code: Select all

static inline int POPCNT (uint64 w)
{
  w = w - ((w >> 1) & 0x5555555555555555ull);
  w = (w & 0x3333333333333333ull) + ((w >> 2) & 0x3333333333333333ull);
  w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0full;
  return (w * 0x0101010101010101ull) >> 56;
}

The above version is explained:

Code: Select all

typedef unsigned long long uint64;  //assume this gives 64-bits
const uint64 m1  = 0x5555555555555555ULL; //binary: 0101...
const uint64 m2  = 0x3333333333333333ULL; //binary: 00110011..
const uint64 m4  = 0x0f0f0f0f0f0f0f0fULL; //binary:  4 zeros,  4 ones ...
const uint64 m8  = 0x00ff00ff00ff00ffULL; //binary:  8 zeros,  8 ones ...
const uint64 m16 = 0x0000ffff0000ffffULL; //binary: 16 zeros, 16 ones ...
const uint64 m32 = 0x00000000ffffffffULL; //binary: 32 zeros, 32 ones
const uint64 hff = 0xffffffffffffffffULL; //binary: all ones
const uint64 h01 = 0x0101010101010101ULL; //the sum of 256 to the power of 0,1,2,3...

/* Implement 'popcount_3' from  Wikipedia */

/* """This uses fewer arithmetic operations than any other known  
   implementation on machines with fast multiplication.
   It uses 12 arithmetic operations, one of which is a multiply.""" */

static inline int popcount_wikipedia_3(uint64 *buf, int n) {
  int cnt=0;
  uint64 x;
  do {
    x = *buf;
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    cnt += (x * h01)>>56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24)+...
    buf++;
  } while (--n);
  return cnt;
}

This is crafty's:

Code: Select all

int static __inline__ PopCnt(long word)
{
  long dummy, dummy2, dummy3;

asm("          xorq    %0, %0    " "\n\t"
    "          testq   %1, %1    " "\n\t"
    "          jz      2f        " "\n\t"
    "1:        leaq    -1(%1),%2 " "\n\t"
    "          incq    %0        " "\n\t"
    "          andq    %2, %1    " "\n\t"
    "          jnz     1b        " "\n\t"
    "2:                          " "\n\t"
:   "=&r"(dummy), "=&r"(dummy2), "=&r"(dummy3)
:   "1"((long) (word))
:   "cc");
  return (dummy);
}

And this is stockfish's non SSE4.2 bitcount routine:

Code: Select all

template<>
inline int count_1s<CNT64>(Bitboard b) {
  b -= ((b>>1) & 0x5555555555555555ULL);
  b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
  b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
  b *= 0x0101010101010101ULL;
  return int(b >> 56);
}

Here's one from the AMD optimization guide:

Code: Select all

unsigned int popcount(unsigned int v) 
{ 
  unsigned int retVal; 
  __asm { 
  MOV EAX, [v]         ;v 
  MOV EDX, EAX         ;v 
  SHR EAX, 1           ;v >> 1 
  AND EAX, 055555555h  ;(v >> 1) & 0x55555555 
  SUB EDX, EAX         ;w = v - ((v >> 1) & 0x55555555) 
  MOV EAX, EDX         ;w 
  SHR EDX, 2           ;w >> 2 
  AND EAX, 033333333h  ;w & 0x33333333 
  AND EDX, 033333333h  ;(w >> 2) & 0x33333333 
  ADD EAX, EDX         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333) 
  MOV EDX, EAX         ;x 
  SHR EAX, 4           ;x >> 4 
  ADD EAX, EDX         ;x + (x >> 4) 
  AND EAX, 00F0F0F0Fh  ;y = (x + (x >> 4) & 0x0F0F0F0F) 
  IMUL EAX, 001010101h ;y * 0x01010101 
  SHR EAX, 24          ;population count = (y * 0x01010101) >> 24 
  MOV retVal, EAX      ;store result 
  } 
  return (retVal); 
}

Now, if I run a test to find out which implementation is the fastest on my box, the one in ivanhoe is the tops:

Code: Select all

FreeBSD version 1   :  7508216 us; cnt=32500610
FreeBSD version 2   :  5855897 us; cnt=32500610
16-bit LUT          :  3236626 us; cnt=32500610
8-bit LUT           :  5387574 us; cnt=32500610
8-bit LUT v2        :  4152103 us; cnt=32500610
Wikipedia #2        :  4149610 us; cnt=32500610
Wikipedia #3        :  2980791 us; cnt=32500610
gcc popcount        :  6520142 us; cnt=32500610
gcc popcountll      :  3185706 us; cnt=32500610
HAKMEM 169/X11      :  5812659 us; cnt=32500610
Bitslice(7)         :  4815994 us; cnt=32500610
Bitslice(24)        :  3112013 us; cnt=32500610
naive               : 56568623 us; cnt=32500610
Wegner/Kernigan     : 33442013 us; cnt=32500610
Anderson            : 10516925 us; cnt=32500610
8x shift and add    : 49656566 us; cnt=32500610
32x shift and add   : 47866979 us; cnt=32500610

There are many other routines doing the same thing. Here's test for various versions: http://dalkescientific.com/writings/diary/popcnt.c
And it happens that the version used by ivanhoe is the fastest on my core2.

And now, the Question (or questions): if I copy exactly Crafty's popcount implementation for my engine, 1) is that "impermissibly derived"?; and, 2) should credit be given by referencing Hyatt's Crafty as the origination of the implementation, even if Hyatt borrowed the implementation from some other source?

That is not my code. It is an idea that I "think" comes from "The bit-hacker's guide". Based on this:

c=0;
while (A & A - 1) c++
return (c);

Simple. My PopCnt() usage in Crafty is always on words with few one bits set. Which makes my code about as fast as one can do outside of using the hardware popcnt instruction. And, indeed, using hardware popcnt does not speed up Crafty enough to matter/measure. The nice thing about the above approach is that if there are N bits set, you execute the loop N times. The other folding approaches require log2(64) operations. If N is < log2(64) then I win. If N > log2(64) I lose. But in my case, I win because I don't do mobility bit counting during the search, it is all pre-computed when the program initializes...

I see no problem with using that code and any other interpretation is silly. For the above simple piece of C code, how many ways can you do it in asm? Nalimov wrote the code that uses the LEA trick to do two things in one instruction (compute A - 1 and put the result somewhere else).

I'll give the following a name, by calling it the "Hyatt Rule", or "One-Input-One-Output Function Rule". Hyatt: "If a function that is consistent from input to output ... then perhaps copying that is not a deal-breaker on tournament participation." And, the "Hyatt Corollary": "If you choose to borrow something, ... one must at least provide a proper citation or give credit to the originator of the code that was used."

Within reason. I am not going to cite using printf() or malloc(). Nor sqrt(). The bit count function is well-known and used everywhere and could well be in a library somewhere.

Okay. Let us apply the Hyatt Rule. At the outset, the various popcount implementations appear to me to be obvious examples of a one-input-one-output function. Is that correct? If true, the Hyatt Rule appears to allow rote code copying of these routines. However, there are numerous variations to popcount implementations. Stockfish's is identical to ivanhoe's, though one in c and the other c++. What I find is that the various implementations of this one-input-one-output function differ widely when it comes to speed. As you can see, the choice of a popcount implementation is significant. The "wikipedia #3/ivanhoe/stockfish" routine is the fastest. Whoever wrote these various routines a) knows a lot; b) used their knowledge; and, c) created something. I will certainly improve my engine, if I replace gcc's __builtin_popcount with wikipedia#3/stockfish/ivanhoe's routine. Now, Stockfish's 'bitcount.h' contains the standard GPL boilerplate above its bitcount routine. However, that routine is apparently public domain - it's on wikipedia without citation.

This is another instance of PRNGs. There are many. Some are fast, some are slow. Some have long periods, some short. You choose the one that best suits what you are doing. If you have few 1 bits, mine beats the others easily. If you have many one bits, the folding method is faster since mine is O(# 1 bits) while the folding approach is O(6). You choose what fits best and nobody in their right mind would consider that copying. But copy EvaluatePassedPawns() and the conclusion would be different.

But, all of this is for illustration, so let us assume for the moment that Madame X authored the wikipedia#3/stockfish/ivanhoe version. Clearly, Madame X has created something of significance by inventing, or authoring, a function that nobody else did previously, and although this function takes the same input and produces the exact same output as all the other implementations, Madame X's is more than twice the speed of gcc's builtin. If Madame X uses her implementation in her chess engine, and it is not known or used by any other, am I free to disassemble her fine engine to discover her fast popcount implementation and use it in my engine without credit or citation? Given it's nature, after disassembly, my reconstruction in c would be identical to Madame X's.

How does the Hyatt Rule handle one-input-one-output functions whose various implementations differ widely in speed, speed being the primary objective? In other words, though probably impossible (and irrelevant with new processors), assume I managed to create another implementation that was three times faster than Madame X's? This is very low level stuff. Does the Hyatt Rule exonerate all who would disassemble my closed source engine and use my super fast method in their engines without credit. Again, the nature of this low level code would probably mean a c rewrite of disassemble code would be identical to my version without having ever seen my c code.

one-input -> one-output is pretty clear and well-defined. The minute you reach one-input -> multiple possible outputs, you cross the line. But not before, IMHO. Note that this is a suggestion. NO one has adopted it as a rule or policy...

Bo Persson · Post by **Bo Persson** » Wed Mar 02, 2011 9:12 pm

Ben, I will give you another version:

Code: Select all

inline int MemberCount(long long Bits)
{
    int Count = 0;

    while (Bits != 0)
    {
        ++Count;
        Bits &= (Bits - 1);
    }

    return Count;
}
// COPYRIGHT 2011 Bo Persson

With VS2010 64-bit edition, this generates exactly the same instructions as the Crafty hand optimized assembler version.

Now, without copying any code, try to figure out how this works and write your own version.

Copyright problem solved!

benstoker · Post by **benstoker** » Wed Mar 02, 2011 9:24 pm

Bo Persson wrote:Ben, I will give you another version:
Code: Select all
inline int MemberCount(long long Bits)
{
    int Count = 0;

    while (Bits != 0)
    {
        ++Count;
        Bits &= (Bits - 1);
    }

    return Count;
}
// COPYRIGHT 2011 Bo Persson
With VS2010 64-bit edition, this generates exactly the same instructions as the Crafty hand optimized assembler version.

Now, without copying any code, try to figure out how this works and write your own version.

Copyright problem solved!

Code: Select all

static inline int popcount_original(unsigned *buf, int n) {
  int cnt=0;
  unsigned v;
  while (n--) {
    v = *buf;
    while (v) {
      cnt++;
      v &= v-1;
    }
    buf++;
  }
  return cnt;
}

benstoker · Post by **benstoker** » Thu Mar 03, 2011 6:06 am

My little accumulation of guidelines ...

benstoker wrote:
BB+ wrote:
At the outset, the various popcount implementations appear to me to be obvious examples of a one-input-one-output function. Is that correct?
I would say so.

How does the Hyatt Rule handle one-input-one-output functions whose various implementations differ widely in speed, speed being the primary objective?
This was essentially the old-time argument against the multiple use of tablebases -- others would implement it other ways, with different performance criteria, etc. If you want a philosophical thought [not necessarily one I make my own], once it is merely "speed" that is of issue, then perhaps the "creative" sense has been diminished to the extent of making it a mere engineering problem, rather than a computer chess problem per se (though, of course, engineering is also a place where creativity can occur). It's more difficult to judge this when, say, you have to consider "speed" due to a given choice of data structure [like in move generation].

I'm not sure whether your overall criterion of interest here is "copyright" or "computer chess originality" for the popcount implementation. There is clearly some "creative spark" in the fastest implementation, and implementations in general fall under copyright guidelines. If you want a more "computer chess originality" version of this, Rybka uses a "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next) -- this is largely an "engineering" advance in that the repetition will always get detected anyway (so I think it fits the Hyatt rule). The implementation in IPPOLIT is very close to that of Rybka, with only minor differences [though ones that do affect the search].

There's also "popcount when no more than X bits can be set" than can be faster in some cases [for instance, no more than 16 pieces of one colour or something].
Thank you for this response.

I am after the "computer chess originality" notion, first and foremost, as applied to the general "one-engine-one-author" rule that guides ICGA tournaments. But to the extent copyright law informs the issue, then that should certainly be referenced. Application of copyright law and software licensing will be another interminable set of problems - tomorrow is another day for that! Your distinction between the "computer chess originality" concept and copyright law is very helpful and clarifies the issues.

A thorough, specific, detailed discussion of these concepts apparently does not take place in any fora I have seen. BB+ has taken a stab at. I wanted to take it down to the concrete, and hope to eke out a set of definitions and criteria for separating "okay-to-copy-and-still-be-original" from "not-okay-to-copy-clone".

Initial definitions:

"Clone" = impermissible derivative via code copying such that the engine is not properly considered the product of a single author.
"Original" = the engine is the sole creation of one author (or team) and qualifies for tournament entry.

In between "Clone" and "Original" and between the multitude of "Originals" there is a great deal of "Transfer of Ideas" and rote "code copying".

What you say above introduces the idea of "engineering advance" to the Hyatt Rule. It seems that you are saying that copying the code of an "engineering advance" does not necessarily define a Clone. How about this:

Hyatt Rule Section 1:
1(a): Rule: The source code of a one-input-one-output function may be copied to an Original even if that source code includes another author's engineering advance that merely increases the speed of the function. [If this is too general, then further specification needed.]
Examples of source code that Original engine authors may copy by rote:
-- popcount implementations
-- all bitscan implementations (?)
-- Repetition Check(), including "Rybka's "pre-detection" method for repetitions when a move is "un-done" from one move to the next (Ng1-f3 one move, then Nf3-g1 the next)"
-- move generator code
-- EGTB probing code
-- SEE
-- UCI/winboard stdin-stdout implementations (?)
-- TBD: other concrete examples ...

1(b): Rule: Attribution ... [TBD]

...

NO it isn't "a fact". There are parts of my eval that are not "bitboard-centric". Low-level stuff is, but not all. Ditto for move generation. One can hide much with macros. And then there is the issue of hashing, which would be the same in the eval whether you use bitboards or not (pawn hash, perhaps king safety as some use). And then we get to critical things like search, move ordering, pruning, reductions, extensions. He could have copied significant parts. And you are overlooking the issue that we now know that he copied parts of Crafty, which _is_ a bitboard program.

At the basic level, nothing can be copied. We have at least agreed previously that everyone can use egtb.cpp if they get Eugene's permission. Ditto for my rotated bitboard move generation (and now Pradu's magic move generation code). Hopefully as this progresses, we can finally formalize a statement about this specific issue. I still like my idea of it is ok to copy F(x) where we have a relationship Y = F(X) where for each unique value of X, there is one and only one unique value Y that the function can return. Move generators. SEE. SAN input/output. Not evaluation, or search, or move ordering, or hashing, or any of a number of other things. Whether everyone will agree to that or not is unknown, but it offers a pretty precise definition that any decent programmer can understand and follow.

Gerd Isenberg · Post by **Gerd Isenberg** » Thu Mar 03, 2011 2:36 pm

hyatt wrote: That is not my code. It is an idea that I "think" comes from "The bit-hacker's guide". ;

Often referred as Brian Kernighan's method:
http://stackoverflow.com/questions/1090 ... it-integer
post 30

Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to him that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"

hyatt · Post by **hyatt** » Thu Mar 03, 2011 9:02 pm

Once again my head is ready to explode after a post from Gerd.

Well-researched. I only knew for certain that it wasn't my idea and I was not interested in trying to claim credit...

OpenChess

OpenChess

Bits and Pieces

Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces

Re: Bits and Pieces