On Dalke

Code, algorithms, languages, construction...
User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: On Dalke

Post by User923005 » Wed Feb 29, 2012 3:28 am

syzygy wrote:
User923005 wrote:The string examples show very plainly that different source {IOW, source that is coded by different people but using the same algorithm} which is obviously not a source level copy can produce identical assembly language because the algorithm is the same. In the case of Rybka, even that cannot possibly be a perfect match because the underlying representation is different. The "semantic equivalence" that has been demonstrated is nothing more than showing similar algorithms for small snippets of code, which is exactly the same thing that we would see if we examined, for instance, two LMR implementations or two alpha-beta implementations which came from studying the same original versions.
So now I secretly take a decompiler, decompile Word.exe, and recompile it. I publish the resulting executable, claiming that I wrote it myself based on the same algorithms that Microsoft used.

Will I walk away free?

Assuming I walk away free, is it because:
(1) it is impossible to prove that I did not write it myself from scratch, or is it because
(2) the object code is in fact nothing more and nothing less than the algorithm, and so free of copyright?

Maybe your answer is: (1) and not (2). In that case, my Word.exe executable in principle shares sufficient creative aspects with the original Word.exe, but I have the defense of independent creation. However, in such a case (i.e. common creative aspects in the expression of the algorithm) it is the alleged infringer who bears the burden of proof for showing that there was independent creation.

But of course copyright law is more or less irrelevant here. What is important is rule 2. What is also important is that Vas was given the opportunity to clarify things, but did not bother.
I am doing this because that is exactly what has been demonstrated. Code copying has not been demonstrated.
As you very well know, the accusation is that of the copying of code, not the copying of algorithms. What you keep repeating has no basis in the facts.

You are contesting the factual finding that there was copying of code. So you are contesting the factual findings, not the procedure.
Re: The Evidence against Rybka

by BB+ » Mon Jul 18, 2011 1:24 pm

It appears that the word "copying" is now being subjected to semantic gymnastics. So my claims are:

*) The Rybka 1.0 Beta executable contains no literally copied evaluation code from Fruit 2.1.
*) Rybka 1.0 Beta contains sufficiently much creative expression from the Fruit 2.1 evaluation code so as to transgress the ICGA Rules.
*) The question of whether and to what extent Rybka 1.0 Beta and later versions infringe the copyright of Fruit 2.1 will be the subject of future civil action. My own opinion is that this is closer in spirit to the second point.

Personally, if I heard the word "copy", I would not parse it as being limited to literal copying.

Regarding the first point, there is more than one instance of literally copied code from Fruit 2.1 in Rybka 1.0 Beta, any one of which should suffice for probative similarity in a copyright action.

I've been trying to find a list of court decisions that cite the Abstraction-Filtration-Comparison Test, but have been unable to do so as of yet. The best I found (with highs/lows of its applicability) was http://www.ladas.com/Patents/Computer/S ... twa06.html

Another useful reference place starts at http://digital-law-online.info/lpdi1.0/treatise21.html

[...] it is of course essential to any protection of literary property ... that the right cannot be limited literally to the text, else a plagiarist would escape by immaterial variations.


- - By zwegner (***) Date 2011-07-16 00:31
While I certainly wouldn't want to stop everyone from arguing needlessly, I'd just like to add my two cents on where Crafty/pre-beta Rybkas fit into this.

Whether pre-beta Rybkas violated Crafty's license is pretty irrelevant to me. The only people that were potentially harmed were the beta testers. The only significance is showing the pedigree of Rybka. Vas said before that Rybka started in 2003, and he had never done a rewrite from scratch. This seemed incredible to me--why was so much of the base structure so similar to Fruit (UCI parser etc.)? Once the pre-beta Rybkas were made available, it was basically proven that Vas' statement was a lie (I would seriously doubt anyone's mental ability if they disputed this). While Mark and I only looked at portions of the early Rybkas (I spent maybe a couple weeks on it), we were unable to find any code that they shared with R1.

Coming back to R1/Fruit, yes, if you look at each example in detail, you can't say there is much hard evidence of direct code copying. As I said to Vas, I'm only completely certain that three characters were copied ("0.0"). But given the entire picture (how similarities to Fruit are all over the place, and the previous Rybka versions shared no code) it's just so obvious to me that Vas took Fruit as a base and rewrote things on top of it, presumably until he felt it was "clean". For all the debate, I'm really puzzled why people (who are sufficiently knowledgeable) would dispute this. A lot of talk has been about the evaluation, and while that is interesting from an originality standpoint, it's perfectly reasonable to say that it's OK as long as no code was copied. But once the pre-beta Rybkas were released, it was all over. The pre-beta Rybkas were IMO the lynchpin of the entire case--no wonder Vas was pissed when they were leaked (once again, huge credit to Olivier). After that, it became very hard to claim that Rybka didn't start life as Fruit. In fact, I would tend to doubt their mental ability as above, but since so many seem to have differing opinions, I'll just say I disagree :)

From the above, Zach said, "Coming back to R1/Fruit, yes, if you look at each example in detail, you can't say there is much hard evidence of direct code copying."
From the above, Mark said, "*) The Rybka 1.0 Beta executable contains no literally copied evaluation code from Fruit 2.1."

Further:
I would like to add that Rybka 1.0 has never participated in an ICGA event.
I would like to add that Rybka 1.6.1 has never participated in an ICGA event.

Any information used to convict Vas of violations in an ICGA event should come from programs that actually participated in one.

I do realize that there is a huge clump of people who claim that Vas has engaged in "non-literal copying".
I call that using the same algorithms.

Whether Vas has done something more than that should be actually proven.
It certainly has not been proven yet.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: On Dalke

Post by User923005 » Wed Feb 29, 2012 5:10 am

BB+ wrote:I have three works:
A: The Fruit 2.1 source and object code (from a given compiler).
B: A suitable emulator E1 and the above Fruit 2.1 object code.
C: A suitable emulator E2 and the Fruit 2.1 object code suitably scrambled so as to avoid detection.
Mostly what C does is change the underlying byte-code representation (eg, 6502->68000 in olden days). Given that I don't need a universal emulator, and can thus write one suited to the purpose, I might be able to get within a reasonable factor of speed with these emulators.

I think most agree that B infringes upon A due to literal copying (of the object code). According to my understanding of the arguments of others, C does not infringe upon A. Somehow this seems undesirable to me. I would rather it be that the C combination is just a "translation" for copyright infringement. [I also think the emulation in the above example will likely fool BinDiff (into giving what to me is a "false" negative), whereas decompiling, translating to a different language, and then recompiling would probably not].
I would not claim that C does not infringe. However, I would claim that disassembly would not be suitable to use as proof of infringement.
I suspect that C would play identical moves to Fruit 2.1, given identical depths, which could be grounds for legal investigation of some sort.
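
To make the B-versus-C scenario concrete, here is a minimal sketch in C, assuming a toy stack machine rather than anything resembling a real Fruit build. The opcode names, the `run` function and the two decode tables are all hypothetical. The same logical program, (2 + 3) * 4, is stored under two different byte encodings, so a byte-level comparison of the two programs finds no match even though both emulators compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Logical opcodes of a toy stack machine (hypothetical, for illustration). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* decode[] maps an encoded byte to a logical opcode. "E1" uses the
 * identity mapping; "E2" uses a scrambled one. */
const unsigned char decode_e1[OP_COUNT] = { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };
const unsigned char decode_e2[OP_COUNT] = { OP_MUL, OP_HALT, OP_PUSH, OP_ADD };

/* The same program, (2 + 3) * 4, under each encoding. */
const unsigned char prog_e1[] = { 0, 2, 0, 3, 1, 0, 4, 2, 3 };
const unsigned char prog_e2[] = { 2, 2, 2, 3, 3, 2, 4, 0, 1 };

/* Interpret a program until OP_HALT; returns the top of the stack,
 * or -1 if an encoded byte is outside the opcode table. */
int run(const unsigned char *code, const unsigned char decode[OP_COUNT])
{
    int stack[16];
    size_t sp = 0, pc = 0;

    for (;;) {
        unsigned char encoded = code[pc++];
        if (encoded >= OP_COUNT)
            return -1;
        switch (decode[encoded]) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = code[pc++];   /* operand byte follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            return -1;
        }
    }
}
```

Calling `run(prog_e1, decode_e1)` and `run(prog_e2, decode_e2)` gives the same value, which is the toy analogue of "C would play identical moves to Fruit 2.1". The behaviour matches while the stored bytes do not.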

It is, of course, possible to fool the legal system such that it would be difficult or impossible to prove infringement even from source code that really does infringe because of the methodology of information transfer (e.g. editing the original code again and again until it is no longer recognizable as the original code). I don't know what to do about that.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: On Dalke

Post by hyatt » Wed Feb 29, 2012 6:56 pm

User923005 wrote:
syzygy wrote:
User923005 wrote:I hold no such position in either case. These are what are known as "straw man" arguments.
Mine weren't even arguments, let alone straw man arguments. I was only trying to make sense of your arguments.

You seem to find your string operations example very important:
User923005 wrote:As a suggested exercise, examine the C and Assembly language versions of the string operations I put here in a previous post in this thread.

You will see that the compiler has removed all the syntactic sugar from the code and left only the bare algorithm behind as assembly language. Examine the C code carefully. I happen to know that these versions were produced independently by various authors at the same time. You can see by looking at the C code, that the code is not identical. While it is similar, this is certainly due to the task at hand (there are only so many ways that you can accomplish that simple string task). However, upon running the compiler against the C code, the syntactic sugar is squeezed out, leaving the bare assembly language algorithm behind. I think you will see upon examination of the assembly language that the assembly language produced is for all intents and purposes identical.

It is important to understand that the algorithm is not protected by a copyright -- only the implementation. This is, of course, exactly as it should be.
So what exactly does this prove then?

I think what you are trying to say is that for this example of yours, there is no "copyrightable" distinction between the object code obtained from compiling the first version and the object code obtained from compiling the second version. I think you are even implying that essentially any source code implementing these string operations will compile to essentially the same object code. Your conclusion seems to be that based on the object code, you can't say anything regarding the originality of the source code. This is because all originality was weeded out by the compiler. The object code corresponds to the algorithm and is unprotectable by copyright.

So far this is all fine and dandy. I do not disagree. I don't think anyone disagrees.

However, what is the point of your examples? I must assume that you are not mentioning them for nothing. Indeed, you seem to be putting great weight on them.

I can make sense of why you are giving these examples if I assume that you consider these examples to be representative of all algorithm / source code / object code relationships, or at least for those related to chess programming. If these examples are indeed representative, that would mean that object code in general is free from the copyright on the source code from which the object code was compiled. However, in general, that is complete nonsense.

The other possibility is that you were NOT trying to imply that the string operation examples were representative. In that case, I can only conclude that you were bringing them forward as a straw man argument.
The string examples show very plainly that different source {IOW, source that is coded by different people but using the same algorithm} which is obviously not a source level copy can produce identical assembly language because the algorithm is the same. In the case of Rybka, even that cannot possibly be a perfect match because the underlying representation is different. The "semantic equivalence" that has been demonstrated is nothing more than showing similar algorithms for small snippets of code, which is exactly the same thing that we would see if we examined, for instance, two LMR implementations or two alpha-beta implementations which came from studying the same original versions.

Since everyone who writes a high end program uses LMR, can we therefore conclude that all of them have also violated the second rule?

Don't make lame attempts at being disingenuous. LMR is not a "function". It is a few lines of code spread over thousands of lines of code. And you know that. ICGA rule two is not about using the ideas from other programs. It is about being DERIVED from other programs by such things as copy/modify (non-literal copying) or actual copy/paste (literal copying). Your string functions are pointless. If you look at C that is only a few lines long, the chances of duplication are high. How many lines are in a chess program evaluation??? Or in the search? Or in the move ordering? or in move generation???

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: On Dalke

Post by hyatt » Wed Feb 29, 2012 6:58 pm

User923005 wrote:
syzygy wrote:
User923005 wrote:If (indeed) Rybka has copied algorithms {and I agree, it has} then, by the letter of the law we might say that Rybka is guilty of violating point 2 of the ICGA agreement.
Nobody involved in all of this has ever stated that this rule prohibited the copying of ideas. Yet you keep repeating this point. I'll refrain from trying to guess why you are doing this.
I keep repeating this because it is exactly what has been demonstrated.
We are all in agreement that we have not shown the source to be identical.

So? If you copy a book written in English, and translate it to Spanish, can you sell it as your own work? It certainly won't be identical to the original. This is a bogus (strawman) argument.

We certainly proved that the early versions of Rybka were IDENTICAL copies of Crafty... But with Fruit, the board representation has changed. That doesn't mean a thing with regard to copying or not.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: On Dalke

Post by hyatt » Wed Feb 29, 2012 7:01 pm

User923005 wrote:
hyatt wrote:
User923005 wrote:Algorithms, indeed, do fall under patent protection. If an algorithm has a patent, it cannot be used without permission.

I doubt if Fabian or Robert have patented any of their algorithms, but I am certainly ready to be corrected on that.
Computer algorithms can not be patented. Been the case for 30+ years now.
Strike that, reverse it.
As far as different ways of doing things, the C library changes all the time. Check out the change to strcpy() that broke the 64 bit adobe flash player we all use. Things change regularly.
As a suggested exercise, examine the C and Assembly language versions of the string operations I put here in a previous post in this thread.

You will see that the compiler has removed all the syntactic sugar from the code and left only the bare algorithm behind as assembly language. Examine the C code carefully. I happen to know that these versions were produced independently by various authors at the same time. You can see by looking at the C code, that the code is not identical. While it is similar, this is certainly due to the task at hand (there are only so many ways that you can accomplish that simple string task). However, upon running the compiler against the C code, the syntactic sugar is squeezed out, leaving the bare assembly language algorithm behind. I think you will see upon examination of the assembly language that the assembly language produced is for all intents and purposes identical.

It is important to understand that the algorithm is not protected by a copyright -- only the implementation. This is, of course, exactly as it should be.

I think you will also find that other people's alpha beta implementations, their LMR implementations, their null move implementations, etc. closely resemble one another when viewed from the level of the generated assembly language. A close inspection will reveal, for basically almost any pair of programs that implement these things, "semantic equivalence" which means that the programs are performing the same steps. Of course, that is *exactly* what an algorithm is.
That's a challenge one can address directly and simply. How about we use (say) Crafty and Fruit? The easiest comparison would be to compile just the search function for both, and then use hexrays to decompile it so that it would be easier for non-asm folks to read. If the asm is "close" the decompiled C should be "close" agreed? You might be in for a HUGE shock, however.

So, the logical step is to throw out every result from every ICGA contest that has ever been held. In addition, a press release should be sent to various publications, informing one and all that all persons who have ever entered an ICGA event are both liars and cheaters.

It is clear to me that many so-called computer scientists have no idea what an algorithm is. The *EXACT* steps of an algorithm are *NOT* protected by copyright. It is only the actual implementation that is protected. It is not like a story book where specific plot details cannot be identical.

Keep in mind that none of this is any sort of proof that Vas is innocent. It is only a demonstration that what the evidence *proves* is that Vas has done what everyone else has done. That is to say, he has used the algorithms of other programs in his program. This is, of course, a very good idea because it is stupid to reinvent the wheel every time you build a car.

I do not discount the possibility that Vas has done something bad. He may well have. However, the evidence presented is entirely unconvincing to me.
The decompiled C may not resemble the original source very closely. Especially things like switch statements and other control flow ideas will be mangled to whatever the compiler thought was fastest. Tail recursion may have been eliminated. We are sure to see constant folding and constant lifting, etc. So a decompiler may produce code that produces equivalent results but it will not produce the same source in the case of compiled high level languages. Things that use pcode interpreters like Java and dotnet languages will produce output that is far more similar to the original upon decompilation.
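
Two of the transformations named above can be illustrated with a minimal sketch in C. The function names are hypothetical, and whether a particular compiler actually performs these rewrites depends on the compiler and its optimization flags, which is an assumption here. The point is only that a decompiler working from the optimized binary would have little reason to reproduce the left-hand forms.

```c
#include <assert.h>

/* As the author might write it: recursive, with the call in tail position. */
unsigned gcd_recursive(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_recursive(b, a % b);
}

/* What tail-call elimination can effectively turn it into: a plain loop.
 * A decompiler would more plausibly show something shaped like this. */
unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Constant folding: the source spells out an expression, but the binary
 * will most likely contain only the single constant 86400. */
unsigned seconds_per_day(void)
{
    return 60 * 60 * 24;
}

/* Both gcd forms compute the same results on the same inputs. */
void check_equivalence(void)
{
    assert(gcd_recursive(48, 18) == gcd_loop(48, 18));
    assert(gcd_recursive(17, 5) == gcd_loop(17, 5));
}
```

The recursive and loop versions are equivalent in behaviour, yet they are visibly different as source text, which is the gap between "produces equivalent results" and "produces the same source".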

First, my statement about "You can not patent a computer algorithm" is not a guess. It is a simple statement of fact. Look up US patent law if you wish, or I can give you the exact quote if you prefer.

Second, do you know anything about any sort of compiler, java or not? One can optimize Java just as well as one can optimize any other programming language. The Java virtual machine is still, from the outside, "a computer that executes instructions." Don't know where you get some of your comments, but you might want to consider abandoning your source and finding one that is better...

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: On Dalke

Post by User923005 » Wed Feb 29, 2012 8:23 pm

hyatt wrote: First, my statement about "You can not patent a computer algorithm" is not a guess. It is a simple statement of fact. Look up US patent law if you wish, or I can give you the exact quote if you prefer.

Second, do you know anything about any sort of compiler, java or not? One can optimize Java just as well as one can optimize any other programming language. The Java virtual machine is still, from the outside, "a computer that executes instructions." Don't know where you get some of your comments, but you might want to consider abandoning your source and finding one that is better...
According to:
http://en.swpat.org/wiki/United_States_ ... ark_Office
We have this:
"According to a 2004 paper by Bessen and Hunt, the USPTO approves about 70 software patents per day.(see page 47) Other sources have said that in 2006 the USPTO granted just over 40,000 software patents,[1] which is 110 per day, seven days per week.

The USPTO is an agency of the US government's Department of Commerce. "

I know someone with a software patent (which, by the way, I do not approve of).

See also:
http://en.wikipedia.org/wiki/Software_patent
http://www.bitlaw.com/software-patent/history.html

As to decompilation of Java and dotnet, I have done it many times and the result is incredibly similar to the original code, but without the original comments. Often, even the original variable names are retained.
Give jad a try on java classes:
http://en.wikipedia.org/wiki/JAD_%28JAva_Decompiler%29

Give reflector a try on dotnet:
http://www.reflector.net/

The results are fabulous and I can often compile and run the results with very little tinkering. Reflector even writes out the project files for you.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: On Dalke

Post by User923005 » Wed Feb 29, 2012 8:45 pm

hyatt wrote: Don't make lame attempts at being disingenuous. LMR is not a "function". It is a few lines of code spread over thousands of lines of code. And you know that. ICGA rule two is not about using the ideas from other programs. It is about being DERIVED from other programs by such things as copy/modify (non-literal copying) or actual copy/paste (literal copying). Your string functions are pointless. If you look at C that is only a few lines long, the chances of duplication are high. How many lines are in a chess program evaluation??? Or in the search? Or in the move ordering? or in move generation???
LMR could be coded as a function. It is a small amount of code, as are other things that everyone borrows from Fruit and Stockfish. Of course, these tiny patches are very significant in making programs play stronger chess.
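
As a rough illustration of how small such a piece of code can be, here is a minimal sketch in C of a late-move-reduction rule written as a single function. It is hypothetical: the function name, parameters and thresholds are invented for this example and are not taken from Fruit, Stockfish, Rybka or any other engine.

```c
#include <assert.h>

/*
 * Hypothetical late-move-reduction rule (illustrative thresholds only).
 * Returns how many plies to take off the search depth for a move, given
 * its 1-based position in the ordered move list.
 */
int lmr_reduction(int depth, int move_number, int is_capture, int in_check)
{
    assert(depth >= 0 && move_number >= 1);

    if (depth < 3 || move_number < 4)
        return 0;   /* too shallow, or too early in the move list */
    if (is_capture || in_check)
        return 0;   /* tactical situations are searched at full depth */

    return (move_number >= 12) ? 2 : 1;   /* reduce later moves more */
}
```

A search routine would subtract the returned value from the depth before recursing and re-search at full depth if the reduced search unexpectedly raises the score. That surrounding logic is omitted here.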
The documents against Rybka used to show ICGA violations used programs that never competed in any ICGA event (Rybka 1.0, Rybka 1.6.1). While there may have been some wrongdoing involved with these particular programs (I remain unconvinced at least in the Rybka 1.0 case) it has no bearing on the ICGA events where these particular programs did not compete.

The patches of similarity that were demonstrated were no bigger than an LMR implementation.

The string algorithms demonstrate clearly that different implementations of the same algorithm that are obviously original can be made to look identical by the action of the compiler in stripping off all of the high level language's syntactic sugar.
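
A minimal sketch of the kind of example being described, assuming a string-length routine. These are not the string routines from the earlier post, which are not reproduced in this thread excerpt, and the function names are made up. The two versions differ as source text (indexing versus pointer walking) but carry out the same steps, so an optimizing compiler may well emit very similar assembly for both. That is something to check, for instance with `gcc -O2 -S`, rather than something this sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Version 1: index-based loop, as one author might write it. */
size_t length_indexed(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0') {
        i++;
    }
    return i;
}

/* Version 2: pointer-walking loop, as another author might write it. */
size_t length_pointer(const char *s)
{
    const char *p;
    for (p = s; *p; ++p)
        ;
    return (size_t)(p - s);
}

/* Sanity check: both versions agree on the same inputs. */
void check_lengths_agree(void)
{
    assert(length_indexed("") == length_pointer(""));
    assert(length_indexed("Fruit 2.1") == length_pointer("Fruit 2.1"));
}
```

With only a handful of lines there are few distinct ways to express the task, which is also why the opposing view in this thread holds that such short routines say little about functions that run to hundreds of lines.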

There are hundreds of thousands of lines of code in a chess program. Perhaps even millions in some of them. The binary for Rybka is about 8 megabytes, and the size of a binary is typically smaller than the size of the original code base, so we can conclude that Rybka has an enormous code base. The patches of similarity that were demonstrated from programs that never even competed in any ICGA event were tiny, trivial patches. Even those little patches are in dispute by intelligent chess programmers like Miguel Ballicora, Chris Whittington and Ed Schroder.

Rybka was convicted on flimsy evidence gathered from programs that did not even compete in the contests in question. It makes no sense whatsoever.

syzygy
Posts: 148
Joined: Sun Oct 16, 2011 4:21 pm

Re: On Dalke

Post by syzygy » Wed Feb 29, 2012 9:44 pm

User923005 wrote:
(1) it is impossible to prove that I did not write it myself from scratch, or is it because
Nothing is impossible.
Ok, I should have added "on the basis of the compiled object code". I will take this as your answer:
User923005 wrote:I would not claim that C does not infringe. However, I would claim that disassembly would not be suitable to use as proof of infringement.
It seems to be your position that object code is not suitable to use as proof of infringement.

This is of course wrong. The object code itself will normally be infringing, i.e. the Word.exe that I produced and Microsoft's Word.exe share a sufficient number of creative aspects (my only defense left being that of independent creation). That two works (e.g. two object code files) share creative aspects in the sense of copyright law BY DEFINITION can be determined based on the works themselves. If no creative aspect can be seen in a particular work, there is simply no copyright on the work (or rather, it is not a "work" in the sense of copyright law).
User923005 wrote:
However, in such a case (i.e. common creative aspects in the expression of the algorithm) it is the alleged infringer who bears the burden of proof for showing that there was independent creation.
The burden of proof lies upon the accusers and not the accused.
The burden of proof for a defense lies upon the one relying on the defense. The precise legal details will depend on the jurisdiction, but one way or another sufficient similarity between an allegedly copied work and its alleged original will effectively cause the burden of proof for showing independent creation to lie on the alleged copier.
I do realize that there is a huge clump of people who claim that Vas has engaged in "non-literal copying".
I call that using the same algorithms.
So your argument goes like this:
- People claim that Vas has engaged in "non-literal copying".
- I call that using the same algorithms.
- Therefore people claim that Vas has engaged in using the same algorithms.
- Using the same algorithms is fine.
- Therefore those people have no case against Vas.

I can do the same.
- People claim that person X has engaged in cutting the throat of person Y.
- I call that using the same algorithms.
- Therefore people claim that person X has engaged in using the same algorithms.
- Using the same algorithms is fine.
- Therefore those people have no case against person X.

You're making up new meanings for known terms.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: On Dalke

Post by User923005 » Wed Feb 29, 2012 10:54 pm

syzygy wrote:
User923005 wrote:
(1) it is impossible to prove that I did not write it myself from scratch, or is it because
Nothing is impossible.
Ok, I should have added "on the basis of the compiled object code". I will take this as your answer:
User923005 wrote:I would not claim that C does not infringe. However, I would claim that disassembly would not be suitable to use as proof of infringement.
It seems to be your position that object code is not suitable to use as proof of infringement.
I do not make this claim and I have never made this claim. However, to use object code as proof of infringement, there should be long stretches of identical code. Otherwise, we must examine the source code to know for sure.
This is of course wrong. The object code itself will normally be infringing, i.e. the Word.exe that I produced and Microsoft's Word.exe share a sufficient number of creative aspects (my only defense left being that of independent creation). That two works (e.g. two object code files) share creative aspects in the sense of copyright law BY DEFINITION can be determined based on the works themselves. If no creative aspect can be seen in a particular work, there is simply no copyright on the work (or rather, it is not a "work" in the sense of copyright law).
User923005 wrote:
However, in such a case (i.e. common creative aspects in the expression of the algorithm) it is the alleged infringer who bears the burden of proof for showing that there was independent creation.
The burden of proof lies upon the accusers and not the accused.
The burden of proof for a defense lies upon the one relying on the defense. The precise legal details will depend on the jurisdiction, but one way or another sufficient similarity between an allegedly copied work and its alleged original will effectively cause the burden of proof for showing independent creation to lie on the alleged copier.
The prosecution has to prove the defendant is guilty.
I do realize that there is a huge clump of people who claim that Vas has engaged in "non-literal copying".
I call that using the same algorithms.
So your argument goes like this:
- People claim that Vas has engaged in "non-literal copying".
- I call that using the same algorithms.
- Therefore people claim that Vas has engaged in using the same algorithms.
- Using the same algorithms is fine.
- Therefore those people have no case against Vas.
I claim that unless the code is identical, it can be difficult to tell if someone has written their own version or done edits on the original source code. Writing their own version that does the same thing is not wrong. Using the original code without permission is wrong. I do not see how a few tiny snippets of assembly that partially match and partially do not match can be used to prove what you claim. In fact, I find your position literally absurd.
I can do the same.
- People claim that person X has engaged in cutting the throat of person Y.
- I call that using the same algorithms.
- Therefore people claim that person X has engaged in using the same algorithms.
- Using the same algorithms is fine.
- Therefore those people have no case against person X.

You're making up new meanings for known terms.
You are convicting someone with the flimsiest imaginable evidence, from programs that did not even engage in the contest for which the violations are claimed.
And I don't approve of throat cutting.

syzygy
Posts: 148
Joined: Sun Oct 16, 2011 4:21 pm

Re: On Dalke

Post by syzygy » Wed Feb 29, 2012 11:37 pm

User923005 wrote:However, to use object code as proof of infringement, there should be long stretches of identical code. Otherwise, we must examine the source code to know for sure.
This is simply false if we are talking about copyright infringement by the object code. For there to be copyright infringement two requirements must be met:
(1) the alleged copy must have a sufficient number of creative aspects in common with the alleged original;
(2) not a case of independent creation.

Requirement (1) is a property of the alleged copy itself (in this case the object code). It is NOT related to how the copy was created, i.e. from what source code the object code was actually generated. Whether requirement (1) is complied with, can be determined BY DEFINITION on the basis of the object code and the alleged original work (where the alleged original work is in fact the Fruit source code).

For (2) obviously it is relevant from what source code the allegedly infringing object code was generated (and how that source code came into existence), but once we get there it is completely reasonable to shift the burden of proof to the alleged infringer.
I do not see how a few tiny snippets of assembly that partially match and partially do not match can be used to prove what you claim. In fact, I find your position literally absurd.
This is quite funny given that I have not claimed any such thing. I only explain that looking at the assembly is THE WAY to determine whether requirement (1) is complied with.

In your (trivial) string operation example, the act of compilation weeded out all creative aspects from the resulting assembly. Copying the object code is not infringing, since there is no copyright on the object code. Requirement (1) cannot be met in that case.
