What do you folks make of this ?

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: What do you folks make of this ?

Post by hyatt » Thu Jul 01, 2010 9:50 pm

Chris Whittington wrote:
hyatt wrote:
Chris Whittington wrote:
Rebel wrote:Allow me some remarks. I deliberately take the Vas=innocent position for the sake of the discussion.
BB+ wrote: Here is a list that I came up with (again I rely partially on ZW) of "suspicious" things in Rybka:
* Re-use of exact same File/Rank/Line arrays in PST values (as opposed to an "idea" where statics would be built up in this way, but with different numbers)
One can only add the word suspicious if one has checked more programs (say 10-15) and found no such similarities. Perhaps things like these are common in many chess programs. Have Bob, Zach, or you checked 10-15 programs for their absence?
* Time management and UCI parsing, particularly the "0.0" appearance
The 0.0 case is indeed suspicious. UCI parsing: perhaps its code is public domain. Fabien took it and so did Vas. Has this option been researched by Zach, Bob and you?
* Copying of the position at the top of the search (Fruit actually copies it back at every iteration), which is pointless as-is in Rybka (it is copied back after the search, again for no reason). Both search 4 ply when a move is forced. [You can also include setjmp under this "Search Start" heading if you like, and I haven't checked whether Rybka actually copies the position, as opposed to Strelka].
Mine does the same: it copies the board before the search starts. It doesn't matter if that is pointless; there are many things in mine that are not in use. They are either remains of previous ideas that I forgot to remove, or I leave them there on purpose for future ideas.

Mine searched forced moves 3 plies in the early days until I increased it to 5 in order to have a better move to ponder. I don't see the relevance, 4 is an excellent value also and I suspect most programs do it this way.
* Great similarity in evaluation. Here the "ideas" concept comes into play. Specific things could be the 1:2:4 weighting of minor:rook:queen (why not 3:5:9?) and the identical minutiae with the DrawBishopFlag -- to offset the former, Fruit has a linear interpolation across phases, while Rybka's is more complicated.
Using Vas's own words: I took many things. I don't see the relevance.

When I was going through the Fruit eval I wrote down 2 ideas: the code for a trapped (white) bishop (h6/a6) and the code for freeing one's rook in Kg1<>Rh1 situations. It's perfectly legal to add these ideas to mine and release it without any breach of the GPL. Again, what's the relevance?
* The use of 10, 30, 60, 100 weighting in passed pawns. If this were a one-off, I could believe the recurrence of this numerology was accidental.
Mine has similar values. It's not suspicious at all.

Ed
To add to this .... Bob has always made a meal out of the setjmp instruction in Rybka and Fruit, claiming nobody else does it.

Well, I used the technique in 1980-something in Z80 assembler to jump out of the search, all you need to do then is reset the stack pointer and you're fine, back at the tree root. I changed to a more 'acceptable' method of unwinding back up the tree after staff programmers made a huge fuss about how uncompliant the technique was. But, Oxford Softworks later licensed a small chess program source for porting to various small platforms and I was surprised to see the programmer using setjmp (ie jump out of the search and reset vital pointers), told him to get 'proper(!)' and he refused, claiming it was just fine. This was a highly educated and qualified university guy who wrote masses of complex stuff in a bunch of fields and was highly reliable for producing bug free code, fast.
Absolutely impossible, because setjmp() and longjmp() are _C_ things. I did things remarkably differently in Cray Blitz. We used an iterated search so that when time ran out, we just exited the search function instantly. That doesn't work for recursive implementations, which I'd be willing to bet you didn't do in assembly language due to the inherent messiness. This is about how you get out of a deep recursive stack of calls as quickly as possible. The setjmp()/longjmp() approach is a horrible solution. Global variables are in an unknown state and you have to clean them up and get them back to "sane". What about locks? The state of a file if you are writing a log? The board state, assuming, logically, that no one does copy/make due to performance issues? Just because one person says "it is OK" certainly doesn't mean it is. I've seen lots of university CS faculty who write horrible code, and some who can't write code at all.
Why do you think it impossible? The functional equivalent for Z80 of set/longjmp() is to reset the stack pointer to the same as the root, and rebuild the data tables (which are now going to be all messed up) from the board representation previously saved at the root. Z80 had an instruction to load and save the SP (I assume 6502 as well, since Ed confirms). Sounds like what you did with Cray Blitz. I forget the details from 1981/2 but you're almost certainly right there was no recursive stack usage; stuff was indexed by ply and saved/loaded from a ply-indexed array.
Because we are talking about comparing C code, not comparing ideas. There _is_ a difference. Jumping out of an iterated search is about as natural an idea as you can come up with. But doing the same for a recursive algorithm is a different animal. And as I have already stated, I have not found this in any program except for Fruit/Rybka. Whether others do the same or not is not that important. The thing is, setjmp()/longjmp() jump out at me as "a bad programming practice." In two separate programs. Then you look at what is going on in that area of the code, and other things jump out at you as being identical. That certainly defies the "there is a chance these two things were independently done by two different people yet they ended up with the same exact code" argument. It just doesn't happen, except perhaps in the twilight zone.
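
For readers who have not seen the technique being argued about, here is a minimal sketch of escaping a recursive search with setjmp()/longjmp(). All names are hypothetical, the node budget stands in for a time check, and none of this is taken from Fruit, Rybka or any other engine.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names throughout; this is not code from any engine. */
static jmp_buf search_root;   /* context saved at the root of the search */
static long nodes_visited;    /* global state mutated during the search */
static long node_budget;      /* stand-in for a time limit */

static void recurse(int depth) {
    if (++nodes_visited >= node_budget)
        longjmp(search_root, 1);   /* abandon every frame on the stack at once */
    if (depth == 0)
        return;
    for (int move = 0; move < 3; move++)   /* toy tree: 3 "moves" per node */
        recurse(depth - 1);
}

/* Returns 1 if the search ran to completion, 0 if longjmp() aborted it. */
int run_search(int depth, long budget) {
    assert(depth >= 0 && budget > 0);
    nodes_visited = 0;
    node_budget = budget;
    if (setjmp(search_root) != 0)
        return 0;                  /* control arrives here after longjmp() */
    recurse(depth);
    return 1;
}

/* Accessor so a caller can inspect the global counter afterwards. */
long search_nodes(void) { return nodes_visited; }
```

With a depth of 3 the toy tree has 40 nodes, so run_search(3, 1000) completes, while run_search(3, 10) is cut off at the tenth node. Nothing in the jump itself restores whatever globals the recursion had modified, which is the objection raised in the post.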

Nevertheless, this argument is not about how horrible jumping out of search is, or how politically incorrect it is to jump anywhere, but whether or not use of ....jmp() implies copying. We say we have used the jumpout ourselves, know others who have done it, and it is therefore not as rare as you claim. Hence it can't be used to imply plagiarism.
Again, I have not said that this idea, _by itself_, is enough to claim anything. I quite clearly stated that seeing the same code in two different programs was unusual enough to trigger further analysis. And the more analysis that was done, the more problematic the code became.

btw, I assume you consider Fabien to be a really ace programmer? He did produce Fruit after all. He used it, according to you and Zach ;-)
Yes, and it is _still_ a lousy idea. And it always will be. Chess is not "stateless" where you can just jump back to the root and start again; there is a lot of global state that gets modified as you walk through the tree search. It has to somehow get restored to its original state. Not to mention what the other (independent) threads are doing when that longjmp() is executed. For the record, those threads don't just drop dead instantly.
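
The alternative mentioned earlier in the thread, unwinding back up the tree, can be sketched with an abort flag. This is a toy under stated assumptions: board_depth is a hypothetical stand-in for the global state that make/unmake modifies, and the node budget again stands in for a clock.

```c
#include <assert.h>

/* Hypothetical stand-ins; a real engine has far more state than this. */
static int board_depth;   /* "global state": 0 means the root position */
static long nodes;
static long budget;
static int abort_flag;

static void make_move(void)   { board_depth++; }
static void unmake_move(void) { board_depth--; }

static void search(int depth) {
    if (++nodes >= budget) {       /* out of "time": ask every caller to stop */
        abort_flag = 1;
        return;
    }
    if (depth == 0)
        return;
    for (int move = 0; move < 3 && !abort_flag; move++) {
        make_move();
        search(depth - 1);
        unmake_move();             /* still runs on abort, so state unwinds */
    }
}

/* Returns 1 if the search completed, 0 if it was cut short. */
int run_search_unwind(int depth, long node_budget) {
    assert(depth >= 0 && node_budget > 0);
    nodes = 0;
    budget = node_budget;
    abort_flag = 0;
    board_depth = 0;
    search(depth);
    return !abort_flag;
}

int board_state(void) { return board_depth; }
long nodes_searched(void) { return nodes; }
```

After run_search_unwind(3, 10) the search has stopped early, yet board_state() is back at 0 because every make_move() was paired with its unmake_move() on the way out. The cost is a flag test in every loop, which is the trade-off the two sides are weighing.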

Re: What do you folks make of this ?

Post by hyatt » Thu Jul 01, 2010 9:55 pm

Chris Whittington wrote:
zwegner wrote:
BB+ wrote:
Well, in a sense you didn't need to say it because it came across through reading.
You are assuming that the average person read past the first page. ;)

I largely agree with what you say about the ZW analysis.

Here is a list that I came up with (again I rely partially on ZW) of "suspicious" things in Rybka:
* Re-use of exact same File/Rank/Line arrays in PST values (as opposed to an "idea" where statics would be built up in this way, but with different numbers)
* Time management and UCI parsing, particularly the "0.0" appearance
* Copying of the position at the top of the search (Fruit actually copies it back at every iteration), which is pointless as-is in Rybka (it is copied back after the search, again for no reason). Both search 4 ply when a move is forced. [You can also include setjmp under this "Search Start" heading if you like, and I haven't checked whether Rybka actually copies the position, as opposed to Strelka].
* Hash entries: more differences than similarities perhaps, but both start with the same "initial segment" (lock, move, depth, date). The same is true for pawn hash (the size of the entries is not even the same in that case). [Strelka is annoying here, as it does not preserve the Rybka data structures in all cases].
* Great similarity in evaluation. Here the "ideas" concept comes into play. Specific things could be the 1:2:4 weighting of minor:rook:queen (why not 3:5:9?) and the identical minutiae with the DrawBishopFlag -- to offset the former, Fruit has a linear interpolation across phases, while Rybka's is more complicated.
* The use of 10, 30, 60, 100 weighting in passed pawns. If this were a one-off, I could believe the recurrence of this numerology was accidental.
OK, I've needed to look over this again to go over your list, I will probably have some more to add. It will be very much on an as-I-find-the-time basis though. There are a bunch of weird things to add as well--the use of multiple move makers/move generators/move sorters, some of which are more similar to Fruit than others. It's hard to say exactly what happened here though.
Just out of interest, and I'm fairly open on this, although I'd prefer to keep my rosy view of human nature and that Vas's program is quite clean, what are some of the important DIFFERENCES between Fruit and Rybka? Has Vas added stuff, or does your research indicate a straight copy with no significant changes?
One obvious one is bitboards vs an array for the board, which changes the move generator code, evaluation code, make/unmake code, and such. However, a year or so back I re-numbered the bits in my bitboard stuff to get away from the Cray way of numbering starting at the left, and going to the Intel way of starting from the right. It was a _massive_ change. Every evaluation pattern changed. Every move generator changed. Make/Unmake changed. Etc. Was the new version drastically different? Depends on your definition. Comparing lines of code, the changes were enormous. Comparing each idea, everything was identical (indeed, before the change was final, we required exact node count matches on several hundred test positions.)

I would not think that if someone copies Crafty, converts it back to a normal array-based board, that anyone would consider that an original work, even though the code would look wildly different. The analysis would be eerily similar. Changing a data structure does not create a new and original program. There are certainly eval differences, things Vas added or removed. Can someone copy my eval, and then add code and delete code and call this new thing "theirs"?? That's the problem here.
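
The bit renumbering described above can be illustrated with a small sketch. The square numbering (rank * 8 + file) and the function names are assumptions made for the example, not Crafty's actual layout; the point is only that the literal masks change completely while the idea they encode does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed square numbering for this sketch: sq = rank * 8 + file, 0..63. */

/* Square 0 is the leftmost (most significant) bit of the word. */
static uint64_t bit_msb_first(int sq) {
    assert(sq >= 0 && sq < 64);
    return UINT64_C(1) << (63 - sq);
}

/* Square 0 is the rightmost (least significant) bit of the word. */
static uint64_t bit_lsb_first(int sq) {
    assert(sq >= 0 && sq < 64);
    return UINT64_C(1) << sq;
}

/* Build the mask for one file under whichever convention is passed in. */
uint64_t file_mask(int file, uint64_t (*bit)(int)) {
    uint64_t mask = 0;
    for (int rank = 0; rank < 8; rank++)
        mask |= bit(rank * 8 + file);
    return mask;
}

/* Count set bits by clearing the lowest one repeatedly. */
int count_bits(uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}
```

The same "file 0" pattern comes out as 0x8080808080808080 under one convention and 0x0101010101010101 under the other, yet both describe eight squares. A line-by-line diff would show every constant changed even though no chess idea did.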


Second, what is your view on both the ethics and legality of a parallel development process where the INTENTION is to end up with a piece of work that is entirely self-written, both code and data, but perhaps uses a target program (say Fruit) as a testbed to verify and bugcheck the component functions - having done that, then improve, add, optimise etc. Is that ok in your book, all code and all data different?
Have not given it much thought. If you mean that we take a program, say Crafty to change the subject, and then write a new move generator to replace the current one, and once debugged, we save it, and then we replace the eval with a new one, and repeat until everything is new, I suppose that would be perfectly acceptable. Using someone's code during development is not quite the same as using it in the "final product."

If you mean something else, please explain...

Re: What do you folks make of this ?

Post by hyatt » Thu Jul 01, 2010 9:58 pm

CPU wrote:
hyatt wrote: When someone says "There is zero fruit code in Rybka 1" I take the common definition of "zero". 0.000, not "a little" or "not very much".
Where did he say that?
There have been _several_ such quotes on the Rybka forum. They have occasionally been copied over to CCC.

CPU
Posts: 17
Joined: Thu Jun 10, 2010 2:43 pm

Re: What do you folks make of this ?

Post by CPU » Thu Jul 01, 2010 11:08 pm

hyatt wrote:
CPU wrote:
hyatt wrote: When someone says "There is zero fruit code in Rybka 1" I take the common definition of "zero". 0.000, not "a little" or "not very much".
Where did he say that?
There have been _several_ such quotes on the Rybka forum. They have occasionally been copied over to CCC.
I am not trying to be difficult here, but could you point me to one example? I've been searching both forums, and the closest quote I could find is:
Rybka is and always was completely original code, with the exception of various low-level snippets which are in the public domain.
Here he is actually saying that some code is NOT original. Sounds completely honest to me...

kingliveson
Posts: 1388
Joined: Thu Jun 10, 2010 1:22 am
Real Name: Franklin Titus
Location: 28°32'1"N 81°22'33"W

Re: What do you folks make of this ?

Post by kingliveson » Thu Jul 01, 2010 11:10 pm

I liken the counter-argument to defense attorneys whose client robbed a bank and who base the entirety of their case on disputing the amount stolen -- that is, the case ought to be dismissed because the amount allegedly stolen is incorrect -- or the amount is just too insignificant. In no way, shape or form am I suggesting Vas is a criminal; let's make that clear. Without equivocation and with 100% certainty, Rybka contains verbatim code unique to Fruit. I think he could have been a little more upfront when this debate first surfaced. If one presses on, it then becomes as if there is an agenda, and you are labeled anti-so-and-so.

Re: What do you folks make of this ?

Post by hyatt » Fri Jul 02, 2010 12:57 am

CPU wrote:
hyatt wrote:
CPU wrote:
hyatt wrote: When someone says "There is zero fruit code in Rybka 1" I take the common definition of "zero". 0.000, not "a little" or "not very much".
Where did he say that?
There have been _several_ such quotes on the Rybka forum. They have occasionally been copied over to CCC.
I am not trying to be difficult here, but could you point me to one example? I've been searching both forums, and the closest quote I could find is:
Rybka is and always was completely original code, with the exception of various low-level snippets which are in the public domain.
Here he is actually saying that some code is NOT original. Sounds completely honest to me...

I do not participate at the Rybka forum. All I can quote are examples that have been cut&pasted to CCC. On several different occasions.

BTW a "snippet" generally does not cover dozens or hundreds of lines of code. Many have copied the spinlock code from Crafty. That is a "snippet". Ditto for the asm code to do MSB()/LSB(). The eval code, pc/sq tables, search code, that stuff is not a "snippet".

JCoit
Posts: 7
Joined: Fri Jun 11, 2010 1:34 am

Re: What do you folks make of this ?

Post by JCoit » Fri Jul 02, 2010 1:50 am

Bob, I know you don't fulfill requests like this, but could you please have a couple of your grad students produce a 'new' engine using 'snippets of code' from the strongest engines? I'd like to see the ELO you guys could get...could be impressive I'd think.

-James

Re: What do you folks make of this ?

Post by hyatt » Fri Jul 02, 2010 2:05 am

JCoit wrote:Bob, I know you don't fulfill requests like this, but could you please have a couple of your grad students produce a 'new' engine using 'snippets of code' from the strongest engines? I'd like to see the ELO you guys could get...could be impressive I'd think.

-James
I'm having enough fun working on Crafty. Have lots of ideas to test, if we can solve a cluster A/C problem to make it reliable enough to pump out some results...

BB+
Posts: 1484
Joined: Thu Jun 10, 2010 4:26 am

Re: What do you folks make of this ?

Post by BB+ » Fri Jul 02, 2010 7:36 am

ZW has already addressed some of this, but I enlarge on some points.
One can only add the word suspicious if one has checked more programs (say 10-15) and found no such similarities. Perhaps things like these are common in many chess programs. Have Bob, Zach, or you checked 10-15 programs for their absence?
The exact numerology is much beyond a similarity (as I mentioned, merely having the same Rank/File/Line centralisation strategy for PST is a different bailiwick than having the same numbers from them). I can't find any engines other than Fruit and Rybka whose PST values are derived from (minor exceptions with Rybka in central pawns):

static const int PawnFile[8] = { -3, -1, +0, +1, +1, +0, -1, -3, };
static const int KnightLine[8] = { -4, -2, +0, +1, +1, +0, -2, -4, };
static const int KnightRank[8] = { -2, -1, +0, +1, +2, +3, +2, +1, };
static const int BishopLine[8] = { -3, -1, +0, +1, +1, +0, -1, -3, };
static const int RookFile[8] = { -2, -1, +0, +1, +1, +0, -1, -2, };
static const int QueenLine[8] = { -3, -1, +0, +1, +1, +0, -1, -3, };
static const int KingLine[8] = { -3, -1, +0, +1, +1, +0, -1, -3, };
static const int KingFile[8] = { +3, +4, +2, +0, +0, +2, +4, +3, };
static const int KingRank[8] = { +1, +0, -2, -3, -4, -5, -6, -7, };
I can imagine someone with the same idea producing 4 or 5 of these arrays that are the same, but not all 9.
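
To make the "idea" versus "numbers" distinction concrete, here is a sketch of how a piece-square table can be generated from per-file and per-rank arrays like the ones listed. The two knight arrays are copied from the listing; the weights, the square numbering and the function names are hypothetical and are not claimed to match Fruit or Rybka.

```c
#include <assert.h>

/* Arrays as listed in the post. */
static const int KnightLine[8] = { -4, -2, +0, +1, +1, +0, -2, -4 };
static const int KnightRank[8] = { -2, -1, +0, +1, +2, +3, +2, +1 };

/* Hypothetical weights, chosen only for illustration. */
enum { KNIGHT_CENTRE_WEIGHT = 5, KNIGHT_RANK_WEIGHT = 5 };

static int knight_pst[64];

/* Assumed numbering: sq = rank * 8 + file, with a1 = 0 and h8 = 63. */
void build_knight_pst(void) {
    for (int sq = 0; sq < 64; sq++) {
        int file = sq % 8;
        int rank = sq / 8;
        /* Centralisation is scored along both the file and the rank. */
        int centre = KnightLine[file] + KnightLine[rank];
        knight_pst[sq] = centre * KNIGHT_CENTRE_WEIGHT
                       + KnightRank[rank] * KNIGHT_RANK_WEIGHT;
    }
}

int knight_pst_at(int sq) {
    assert(sq >= 0 && sq < 64);
    return knight_pst[sq];
}
```

Two programmers sharing this generation scheme would still be expected to pick different array entries and weights. Identical entries across all nine arrays are what the post treats as going beyond the shared idea.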
Mine has similar values. It's not suspicious at all. [on the 10, 30, 60, 100 scaling]
I think the point again is that the relative scaling is exactly the same, even down to matching the output of Fruit's quad() function. For instance, if I thought of 10-30-60-100 scaling, took 4821 as my PassedOpeningMax and just rounded, I would get 482, 1446, 2893, 4821, not the 489, 1450, 2900, 4821 that the Fruit quad function produces and that appear in Rybka (the differences come from how the rounding in Fruit works, as 26 is not exactly 256/10). And again there is not just one example of an array formed by Fruit code, but 5 or 6 in this genre. With both this and PST you can certainly argue that there is no "copyright violation", but my impression is that the standard in (say) WCCC events for "derivatives" is higher. Having two guys independently use 10, 30, 60, 100 scaling is reasonable, while both using its "hex-approximation" of 26, 77, 154, 256 is less likely. The use of rounding (rather than the floor function) with the quad() output is again something that needn't have been exactly the same if one merely took the idea in the abstract. The abstract idea of 10-30-60-100 scaling for various passed pawn elements appears in Rybka as exactly the output of the quad() function of Fruit, with no particular reason for the minutiae to be identical (indeed, there is probably at best a 10% chance of a "random" implementation of this idea doing the same). The most "obvious" case in this sense is the PassedFree array, { 0, 0, 0, 101, 300, 601, 1000, 1000 }: from a human standpoint, Rybka clearly derived these numbers "mechanically", rather than a human writing them down as 100, 300, 600, 1000, and the Fruit quad() function is a mechanism that produces this somewhat peculiar output. By this point, it should be up to a defense to explain what alternative mechanism would yield the same numbers, not just similar ones.
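
The arithmetic above can be tried out with a small sketch. The function names are hypothetical and this is not Fruit's quad(), whose source is not shown in this thread; it only contrasts scaling a maximum by 10/30/60/100 percent with scaling it by 26/77/154/256 out of 256, with the division either truncated or rounded to nearest.

```c
#include <assert.h>

/* Percent weights as a human might write them down. */
static const int Percent[4] = { 10, 30, 60, 100 };
/* The /256 weights quoted in the post. */
static const int Weight256[4] = { 26, 77, 154, 256 };

/* max * pct / 100, rounded to nearest. */
int scale_percent(int max, int i) {
    assert(i >= 0 && i < 4 && max >= 0);
    return (max * Percent[i] + 50) / 100;
}

/* max * w / 256, truncated toward zero. */
int scale_256_truncate(int max, int i) {
    assert(i >= 0 && i < 4 && max >= 0);
    return (max * Weight256[i]) / 256;
}

/* max * w / 256, rounded to nearest. */
int scale_256_round(int max, int i) {
    assert(i >= 0 && i < 4 && max >= 0);
    return (max * Weight256[i] + 128) / 256;
}
```

In this sketch a maximum of 4821 gives 482, 1446, 2893, 4821 under percent scaling and 489, 1450, 2900, 4821 under truncated /256 scaling, and a maximum of 1000 gives 101, 300, 601, 1000 under truncated /256 scaling. Rounding to nearest changes some last digits (490 instead of 489, for example), which shows how sensitive the exact figures are to such implementation details.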
UCI parsing: perhaps its code is public domain. Fabien took it and so did Vas. Has this option been researched by Zach, Bob and you?
You have transferred the burden of proof to the impossibility of proving a negative -- surely it is any putative defense's responsibility to provide such evidence that the code actually is public domain, not vice-versa?
Mine does the same: it copies the board before the search starts. It doesn't matter if that is pointless; there are many things in mine that are not in use. They are either remains of previous ideas that I forgot to remove, or I leave them there on purpose for future ideas.
The point of this was to note that there were three independent features in both Fruit/Rybka at the start of the search (position copying, 4 ply limit, setjmp), along with the time management. Of course you can argue that any one of the three is an accident. Again I feel that the preponderance of evidence has reached the point where it is the responsibility of the accused to rebut this laundry list by noting some other (independent) engine that replicates each of these three (or four, if you include time management here) elements [preferably in the same order], rather than trying to chip away at each point individually [I think the derogatory phrase for this is "lawyering", usually done to impress the jury with useless addenda rather than the gravamen of the debate].
I deliberately take the Vas=innocent position for the sake of the discussion.
I don't quite know what the connotation of "innocent" is here, but my impression is that most who have taken a similar position seem to have been quickly reduced to a variety of bafflegab and misdirection (or ad hominem) in their explanations of the evidence. There is notable evidence of "copying" in Rybka 1.0 Beta beyond mere "ideas" (the nine specific arrays re-used in the generation of engine-specific PST, for instance). This copying appears to go beyond what is (or has been) acceptable in the field of computer chess/games [and I strongly feel that this is the proper standard to use for "originality", as opposed to a "legalistic" one, where perhaps only "code" is considered], and I personally find the attempts to dismiss this all as happenstance (or "unimportant") to be a bit outré. That being said, there is also a tendency to exaggerate the Rybka/Fruit connection in some other circles. [And by now, I think Rybka 4 has almost zero connection to the Fruit origins].

Chris Whittington
Posts: 437
Joined: Wed Jun 09, 2010 6:25 pm

Re: What do you folks make of this ?

Post by Chris Whittington » Fri Jul 02, 2010 11:56 am

hyatt wrote:
Chris Whittington wrote:
zwegner wrote:
BB+ wrote:
Well, in a sense you didn't need to say it because it came across through reading.
You are assuming that the average person read past the first page. ;)

I largely agree with what you say about the ZW analysis.

Here is a list that I came up with (again I rely partially on ZW) of "suspicious" things in Rybka:
* Re-use of exact same File/Rank/Line arrays in PST values (as opposed to an "idea" where statics would be built up in this way, but with different numbers)
* Time management and UCI parsing, particularly the "0.0" appearance
* Copying of the position at the top of the search (Fruit actually copies it back at every iteration), which is pointless as-is in Rybka (it is copied back after the search, again for no reason). Both search 4 ply when a move is forced. [You can also include setjmp under this "Search Start" heading if you like, and I haven't checked whether Rybka actually copies the position, as opposed to Strelka].
* Hash entries: more differences than similarities perhaps, but both start with the same "initial segment" (lock, move, depth, date). The same is true for pawn hash (the size of the entries is not even the same in that case). [Strelka is annoying here, as it does not preserve the Rybka data structures in all cases].
* Great similarity in evaluation. Here the "ideas" concept comes into play. Specific things could be the 1:2:4 weighting of minor:rook:queen (why not 3:5:9?) and the identical minutiae with the DrawBishopFlag -- to offset the former, Fruit has a linear interpolation across phases, while Rybka's is more complicated.
* The use of 10, 30, 60, 100 weighting in passed pawns. If this were a one-off, I could believe the recurrence of this numerology was accidental.
OK, I've needed to look over this again to go over your list, I will probably have some more to add. It will be very much on an as-I-find-the-time basis though. There are a bunch of weird things to add as well--the use of multiple move makers/move generators/move sorters, some of which are more similar to Fruit than others. It's hard to say exactly what happened here though.
Just out of interest, and I'm fairly open on this, although I'd prefer to keep my rosy view of human nature and that Vas's program is quite clean, what are some of the important DIFFERENCES between Fruit and Rybka? Has Vas added stuff, or does your research indicate a straight copy with no significant changes?
One obvious one is bitboards vs an array for the board, which changes the move generator code, evaluation code, make/unmake code, and such. However, a year or so back I re-numbered the bits in my bitboard stuff to get away from the Cray way of numbering starting at the left, and going to the Intel way of starting from the right. It was a _massive_ change. Every evaluation pattern changed. Every move generator changed. Make/Unmake changed. Etc. Was the new version drastically different? Depends on your definition. Comparing lines of code, the changes were enormous. Comparing each idea, everything was identical (indeed, before the change was final, we required exact node count matches on several hundred test positions.)

I would not think that if someone copies Crafty, converts it back to a normal array-based board, that anyone would consider that an original work, even though the code would look wildly different. The analysis would be eerily similar. Changing a data structure does not create a new and original program. There are certainly eval differences, things Vas added or removed. Can someone copy my eval, and then add code and delete code and call this new thing "theirs"?? That's the problem here.


Second, what is your view on both the ethics and legality of a parallel development process where the INTENTION is to end up with a piece of work that is entirely self-written, both code and data, but perhaps uses a target program (say Fruit) as a testbed to verify and bugcheck the component functions - having done that, then improve, add, optimise etc. Is that ok in your book, all code and all data different?
Have not given it much thought. If you mean that we take a program, say Crafty to change the subject, and then write a new move generator to replace the current one, and once debugged, we save it, and then we replace the eval with a new one, and repeat until everything is new, I suppose that would be perfectly acceptable. Using someone's code during development is not quite the same as using it in the "final product."

If you mean something else, please explain...
You've half described what I mean. I'd call it a parallel development piggybacking on the original work.

What I had more in mind though was that this could also be done to create a similar or identical series of black boxes, a move module, a movegen module and an eval module, where the output was the same as the original for a given input, but the black box used entirely different code and data to do the work. Maybe it would use the original data as a placeholder, and then change that later. With a bit of order shuffling of internal elements, and then addition/deletion of some idea chunks, the final product would bear no apparent relation to the original piggy-backed program and there could be no legal challenge. As you say: Using someone's code during development is not quite the same as using it in the "final product."

Seems to me that the data similarities and other similarities described by ZW are not incompatible with this approach having been used, and, in that case, the critical issue is the INTENT of the programmer. Does he want it to be utterly clean, all his own work, or is he not too bothered?
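
The black-box comparison described above, where a rewritten module must give the same output as the original for every input, could be sketched as a differential test. Everything here is hypothetical: two bit-counting routines stand in for an original module and its independently written replacement, and a simple xorshift generator supplies the inputs.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*count_fn)(uint64_t);

/* Stand-in for the "original" module: test each bit in turn. */
int count_reference(uint64_t b) {
    int n = 0;
    for (int i = 0; i < 64; i++)
        n += (int)((b >> i) & 1);
    return n;
}

/* Stand-in for the "rewrite": different code, same contract. */
int count_rewrite(uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Feed both functions the same pseudo-random inputs; return the mismatches. */
long count_mismatches(count_fn original, count_fn rewrite, long trials) {
    assert(trials >= 0);
    uint64_t x = UINT64_C(0x9E3779B97F4A7C15);   /* arbitrary nonzero seed */
    long mismatches = 0;
    for (long i = 0; i < trials; i++) {
        x ^= x << 13;   /* xorshift step */
        x ^= x >> 7;
        x ^= x << 17;
        if (original(x) != rewrite(x))
            mismatches++;
    }
    return mismatches;
}
```

A result of zero mismatches says the two boxes agree on the inputs tried, and nothing about how either was written. That is why the post treats the programmer's intent, rather than the test itself, as the critical issue.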
