Page 1 of 1

Originality via a complete re-write!?

Posted: Mon Feb 28, 2011 6:38 am
by BB+
This is somewhat of a philosophical question.

Suppose I take a ~2700 Elo engine (Crafty if you like), and re-write it by replacing all the modules one-by-one, getting in the end something also about ~2700 Elo. In what sense would be this original?

You might ask: why would I do it that way, rather than writing the whole thing from scratch with Crafty as a model? One reason is to avoid bugs. Many engines that are 300-400 Elo have various bugs, and by boot-strapping from a "Crafty chassis", I could ensure that at every step on the module-replacement there was no Elo dive from some bug I introduced. So, should this be considered a legitimate development model, or I have relied too much on Crafty here?

Re: Originality via a complete re-write!?

Posted: Mon Feb 28, 2011 7:28 am
by orgfert
Calling it yours might be a pretty big cheat. Your "from scratch" effort would be able to utilize 16-32 CPUs (at least) out of the gate. But if you respected the license and/or called it a fork, it would probably be acceptable.

Re: Originality via a complete re-write!?

Posted: Mon Feb 28, 2011 3:55 pm
by benstoker
BB+ wrote:This is somewhat of a philosophical question.

Suppose I take a ~2700 Elo engine (Crafty if you like), and re-write it by replacing all the modules one-by-one, getting in the end something also about ~2700 Elo. In what sense would be this original?

You might ask: why would I do it that way, rather than writing the whole thing from scratch with Crafty as a model? One reason is to avoid bugs. Many engines that are 300-400 Elo have various bugs, and by boot-strapping from a "Crafty chassis", I could ensure that at every step on the module-replacement there was no Elo dive from some bug I introduced. So, should this be considered a legitimate development model, or I have relied too much on Crafty here?
Until you all can answer this question, the dialogue will not rise above a conversation about why Tom Cruise divorced Nicole Kidman. Start with this from Hyatt:
While I don't think my old idea of "how much is permissible" is the end-all idea, it is a start.

I have taken the approach to use the following conceptual idea:

If a function that is consistent from input to output, where a move generator is a good example since each position has a fixed number of moves that can be played according to the rules of chess, then perhaps copying that is not a deal-breaker on tournament participation. I can think of other similar pieces of code, such as EGTB probing, where a given position always gives the same answer of mate in N or conversion in N or draw.

Other pieces of code clearly do not follow that concept. An evaluation function is one example. For a given position, there are an infinite number of possible different evaluation outputs. Search also, since one can reduce, extend or prune a move. So those are clearly off-limits. What else?

SEE seems to be the one input one output type of code, with some room for flexibility (do you handle pins, either absolute or not, do you check for legality, etc.) But the options are limited, and one could define, precisely, a SEE() (no pins), or SEEP() (recognizes absolute pins only), SEEAP() which recognizes things like knight pinned on queen by a bishop...

RepetitionCheck()? Rules of chess are quite precise so that might be an ok thing to borrow, although there are quite a few ways to do it (separate hash table, or a repetition list, or something else entirely). But since it is a one input, one output, it seems to fit.

So we are left with the important parts, namely the search (normal and q-search), code that deals with extensions, reductions and pruning, and all the evaluation code. The code that handles time utilization.

What about hashing? Harder case, since this is not a single input -> single output precisely. You can get lots of information from a hash table. A search result. A suggested move. A bound. A hint to avoid doing a null move search. A positional evaluation. Flags such as this was a singular position previously, etc... So maybe hashing is not a one input one output idea, particularly when there are lots of replacement strategies and such.

I think if you apply that question to each function in a chess program, you will find some that are "constant" and some that are "unique". And you might find some one input one output functions inside a non one-input-one-output function. For example, in the evaluation function, you might have a piece of code called HasOpposition(wk, bk, stm). Opposition is well-defined and for any position of the two kings and stm, that function will return T or F. So one might borrow that safely, but not the code where it is used since how you use that could vary all over the place.

One other major point. If you choose to borrow something, which for me would only include the magic move generation stuff (which, by the way does not affect my move generator code at all, I just changed a macro in a very simple way), would be that one must at least provide a proper citation or give credit to the originator of the code that was used. And one would have to abide by the GPL if pieces of a GPL program are borrowed. There is a subtle issue of a GPL program that borrows code from a non-GPL program. That's not supposed to happen, but it does, and it certainly clouds the issue.

The "heart and soul" of a chess engine is the search and evaluation. And copying that code should be a clear no-no. For those one-input-one-output functions, I don't see a problem with them. We already have several common examples. egtb.cpp was originally released as a part of Crafty, written by Eugene Nalimov. Many have used this code. Now MB has released his bitbase stuff and several have used that. Ditto for Pradu's magic move generator idea. Ditto for my rotated bitboard stuff. And we have never seen a program excluded from competition for using those. Nor should we.

There are grey areas such as piece/square tables and others. While there are an infinite number of possible piece/square tables, there are probably fewer than that useful ones. Can someone make a case for any particular piece where there are some obvious numbers and no others make any sense at all? I can't think of one, but that doesn't mean it doesn't exist. Individual scoring numbers? Would two different programmers agree on every scoring term in their evaluation? Can one make a case that fixing one value then dictates the rest if the evaluation is going to be "sane"? Doesn't seem reasonable, but doesn't seem like "impossible" could be proven.

This issue is a very complex one, and it is not going to be a quick and dirty fix I am afraid. It will take some logical/rational discourse and thought to arrive at something that is anywhere near workable.

And now you have answered the question "what is permissible?" That is only the first step, and probably the easiest step although it represents a really significant effort itself. Next is how to validate programs to screen out the bad ones. That is the real can of worms...

Re: Originality via a complete re-write!?

Posted: Mon Feb 28, 2011 5:21 pm
by hyatt
That was actually the "model" I had envisioned when I decided to release the Crafty source back in 1995. WIth the proviso that somethings do not need to be rewritten at all. Move Generation, wrapped around rotated bitboards, is a straightforward process and two different move generators based on rotated bitboards will look quite similar. SAN input and output, ditto. My interest would be in the search and evaluation (and related functions). That's where the creativity comes in and where uniqueness shows up (or at least should show up).

Re: Originality via a complete re-write!?

Posted: Mon Feb 28, 2011 7:59 pm
by kingliveson
BB+ wrote:This is somewhat of a philosophical question.

Suppose I take a ~2700 Elo engine (Crafty if you like), and re-write it by replacing all the modules one-by-one, getting in the end something also about ~2700 Elo. In what sense would be this original?

You might ask: why would I do it that way, rather than writing the whole thing from scratch with Crafty as a model? One reason is to avoid bugs. Many engines that are 300-400 Elo have various bugs, and by boot-strapping from a "Crafty chassis", I could ensure that at every step on the module-replacement there was no Elo dive from some bug I introduced. So, should this be considered a legitimate development model, or I have relied too much on Crafty here?
Base on your description and quoted derivative definitions below, I would say it's a legitimate development model, and what matters most is legality.

http://www.gnu.org/licenses/gpl-3.0.html
To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a “modified version” of the earlier work or a work “based on” the earlier work.
http://www.linfo.org/bsdlicense.html
The most basic definition of a derivative work is a product that is based on, or incorporates, one or more already existing works. This can become a complex issue, particularly with regard to software, but the primary indicator that a software program is a derivative of another program is if it includes source code from the original program, even if the source code has been modified, including improving, extending, reordering or translating it into another programming language.

In many cases, the use of open source code can allow companies to develop products more quickly and with less expense than if they wrote them with entirely original code. The fact that derivative products of BSD-licensed software are not required to be open source can be very useful for developers who want to create commercial products from open source code but who want to keep their modifications and/or extensions secret.