Rebel wrote: In my eyes we had an agreement to have a debate about the AFC test ... in email and in principle meant for publication.
I don't recall any such "agreement" per se. See below.
Rebel wrote:And if memory served me well (Chris may correct me) you quit after the second round without giving a reason.
We had one preliminary email exchange [thus I quit after the first round, not the second], with each side (ostensibly) trying to write the opponent's views. I am pretty sure I gave a reason for quitting. Just give me a few minutes while I search your website to find my email....
OK, here we go, intertwined with some emails from/concerning Dalke/Riis, on Feb 23 2012 you "invited [me] into such a debate" with ChrisW regarding AFC. The email was addressed to me and Dalke, and the CC list was Riis and Whittington (by this point, I think Dalke and I had broken off contact in any case, while Riis was more interested in my work on Losing Chess). Approximately one hour later you also sent an email regarding Rybka/Fruit and a statement of Zach's saying:
I prefer to end our conversation after reading your answer. I replied:
I think I prefer to end our conversations also. Regarding the AFC debate, my words were:
I would prefer to have any conversation privately with ChrisW (I am willing to have other interested programmers on CC, but not too much beyond that), and then we could each write a summary (of a few pages) on how we understand the issue, these being made public. But I'm open to other schemes also. I guess you are interpreting this as an "agreement" to have such a debate, when to me it was that I was in principle open to such a discussion, but wanted to know the ground rules.
Indeed, two days later (Feb 25), you wrote to me with Zach in CC. The email was addressed to him, and you wrote that you and I had
more or less agreed to have an upcoming debate in email about the [AFC]. There was then some cross-email with EdS/ChrisW/MarkW regarding Dalke, a link that ChrisW liked, ... Anyway, on Feb 27 I sent an email regarding ChrisW's design:
Often a good way to start is for participants to each present a short paper of what they believe the position of the other side to be..., saying that I agreed, that I was busy [visa applications] but hoped to have maybe 2 pages on this within a week. On Feb 28, ZachW said he would like to listen in (if nothing else). On March 2nd, EdS announced the webpage was ready for our debate. On March 6th-7th, EdS and I exchanged some emails regarding the CSVN situation. On Mar 11, EdS sent his "idea" of putting Fruit eval in Rybka or vice-versa. On Mar 12, I apologised to ChrisW (no one else included on the email) for being slow with AFC, saying that my LosingChess research had recently undergone two quantum leaps (and the Thinker investigation had been going on the previous 2 weeks). ChrisW responded, saying that he would send his to EdS for simultaneous revelation. On March 19, I sent 4 paragraphs of material to EdS (below). I think I had just started on vacation (and bought an apartment).
Mark Watkins (channeling Chris Whittington) wrote:First, EVAL_COMP. It is subjective, which is not optimal, but it is what it is. However, listing what choices were made, and maybe why, is almost a necessity. Why was mobility split up piece-by-piece? This multiplies the R/F overlap. OTOH, why wasn't Rybka/Fruit mobility considered semantically the same (which would lower it)? Maybe these choices cancel out in the end, but who can tell in the current state of things? The same is true of how the programs were chosen. All it says is that some of them were suggested by Panel members, as having some Fruit elements. What does this mean? Almost any engine has *some* Fruit overlap. State explicitly why engines were chosen, and why others were rejected. The strength issue should also be addressed. If you need to add post-Fruit open-source engines that aren't too much like it, then do it.
Next, filtration. There are three factors here: efficiency, external factors, and public domain. But for the first of these, I almost turn it on its head. You largely filter out Rybka's use of bitboards, on the grounds that there is an efficiency issue. The same could be true for other points, like how to define backward pawns. This fails to give Rajlich enough credit here. And if some components appear in the same order in both programs, while others do not, the latter can be claimed to be some sort of "efficiency" issue. This is a weakness, in that the meta-comparison is statistical, and so what goes in and what stays out always needs accounting. Everything said here can also be transferred to a similar argument about Elo dictating how Rybka was made. Rajlich saw that Fruit's simplistic eval with bug-free search had great potential, and took the ball from there and ran.
External factors. Once you've decided to have a low-cost eval, this tends to affect all the parts therein. Every component will naturally tend to the simplistic, rather than to the complex. If it can't be computed fast, you just ignore it (maybe use a surrogate), or make an approximation. The list of comparators in EVAL_COMP also has interconnections. First, it would be best if the relative importance of each feature had a measure somehow. As it is, all items count the same, which is a rather crude approach. Next, once you choose to use linear mobility on one piece, it's no surprise that you would use it on another. The same is true for other features -- the handling of open and semi-open files can essentially get double-counted if the same thing is done twice. A concluding point is that features can also have interactions in a larger scope. For example, the amount of work that gets done in passed pawns can depend on a number of factors, such as whether/when the search has extensions for them. Looking at the eval-only component is a misnomer.
Finally, public domain filtration. It makes a big difference whether a common Fruit/Rybka element appears in many other places. To make the point, let's look at the data from Don Dailey's similarity detector by Adam Hair. Suppose there are three moves, played by 60-30-10% of the engines. If here A and B match with the 60% move, it is almost irrelevant, and should be filtered. OTOH, if they both play the 10% move, it's significant. The 30% move is borderline. This is kinda like in comparing multiple-choice tests. If two students both get 95 of 100 correct, you look at the 5 wrong answers to judge whether they might have copied, not at the 95 that they got correct. The language in law pertains to "[something] that is, if not standard, then commonplace", and most of the relevant EVAL features in Fruit fit this description. Once these are eliminated, *then* a comparison can be made between it and Rybka.
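To illustrate the 60-30-10 point above, here is a minimal sketch of weighting a shared move by how rarely other engines play it. This is not how Don Dailey's similarity detector works; the function names and the choice of a log-based (surprisal) weight are assumptions made purely for illustration.

```python
import math

def match_weight(popularity: float) -> float:
    """Weight for a shared move, given the fraction of engines that play it.

    Hypothetical choice: surprisal in bits, so a move played by nearly
    every engine contributes almost nothing, and a rare move contributes a lot.
    """
    if not 0.0 < popularity <= 1.0:
        raise ValueError("popularity must be in (0, 1]")
    return -math.log2(popularity)

def shared_move_evidence(moves_a, moves_b, popularity_per_position):
    """Sum the weights of the positions where engines A and B play the same move.

    popularity_per_position[i] maps each move in position i to the fraction
    of reference engines that choose it (illustrative data, not real results).
    """
    total = 0.0
    for a, b, popularity in zip(moves_a, moves_b, popularity_per_position):
        if a == b:
            total += match_weight(popularity[a])
    return total

# The three-move example from the text: 60%, 30%, 10% of engines.
popularity = {"m60": 0.6, "m30": 0.3, "m10": 0.1}
for move in popularity:
    print(move, round(match_weight(popularity[move]), 2))
```

Under this weighting, agreeing on the 10% move counts several times more than agreeing on the 60% move, which is the same logic as looking at the 5 wrong answers rather than the 95 correct ones.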
On March 23, EdS wrote to say that he had forwarded my writing to ChrisW, and that he was busy the next few days. On March 26, EdS sent out an email to me, ChrisW, Zach, and Riis. The email also contained some (irrelevant) derogation against Hyatt for some TalkChess post he had recently made.
Briefly, here is ChrisW's notion of my views. (1) That there was no real copy at code or source level, rather theft of recipe. (2) Weights are not copied, but they are however connected. (2b) Connections are indistinguishable from independent development beyond a certain complexity level (This is a hard pill to swallow and he would rather not). (3) Full correspondence of features does not exist, some are missing at each end. (4a) Overall, the correspondences are too great. (4b) Wishes that a more effective and intelligent criticism of EVAL_COMP had been delivered before finalising it. (4c) Knows EVAL_COMP is flawed, but has belief in Rajlich's guilt. The whole thing was about 10 sentences in total (written in terse CW style, rather than my logorrhea).
ChrisW can either post the whole thing, or ask me to do so if he desires this but no longer has the email. Let's just say that this is not what I expected from the "conversation", especially the various psychoanalyses. I did not find this to be a reasonable "belie[f] of the position of the other side", but rather something more like a wishful projection (particularly 4b/4c). On Apr 3rd, while in Hong Kong airport, I told EdS I would be busy for the next 2 weeks. On Apr 16th, I completed the swapping of Fruit/Rybka evals, and also sent the email copied below. EdS responded with:
I am fine with a switch in the discussion. Let's talk about similarity detector and ponder-hits... (which we did over the next month or so). As far as I can tell, he never opened the AFC debate subject again, until now.
Mark Watkins wrote:Sorry I have taken so long to write. After some thought, I really don't think my taking part in this debate is likely to lead to the desired goals, and so I think I am not going to participate in it. I had come to this conclusion a week and a half ago, but was too sick last week to type this email coherently.
My reason for this conclusion is largely that I don't see any real "resolution" coming about. At best, a few of us decide that we disagree on certain points.
When I was in Bristol two weeks ago, two of my colleagues had mentioned they had seen Rybka/Fruit on various "science geek" websites (one knew that I was involved, while for the other it was not clear). Upon looking at these sites, I realised that the discussion has been too distorted by the ChessBase report.
For instance, in a judicial case, the accuser always gets to present his case first. This is not a random decision -- if the defendant were to go first, he would often talk about extraneous things, and the plaintiff would spend a bunch of time defusing non-issues. And after he finally got to speak, the defendant would *still* get a defense to the claims. This is essentially what has happened here. By ChessBase firstly choosing not to cover the issue in June 2011, and then not even properly giving the accusations (or verdict) when rolling out the 4-part defense of Riis, they have (unwittingly or not) made it essentially impossible to address the major issues in any coherent way.
I don't think agreement on a few AFC issues between minor characters can change this. Rather than take part in the AFC debate, I have instead decided to take up a suggestion of Ed. Rajlich's only defense was to cite ponderhit numbers, and Riis mentioned these in his writings also. This would also help to overcome the "subjectivity" question, and indeed move-matching is already noted in ICGA Rule #2.
Ed suggested that one use the Rybka numerology in Fruit (or vice-versa). One reason [mine, not Ed's] for this is that Fruit 2.1 was a development snapshot [no tuning applied], while Rybka was a production output. My recollection is that Fruit21/Rybka10 was about 56% on the Dailey similarity detector. The CSVN limit is 60%, though correspondence from Ed seems to suggest that is essentially the "absolute" bound, and 55% was already "suspicious" [note that RobboLito/Rybka3 is just under 60%].
Including Rybka 2.3.2a and Rybka 1.6 (to show pre-Beta correspondence, if any) will also be done. Except for Rybka 1.6, the results should be reproducible.
I think this will be a superior method of procedure, rather than debating AFC.
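For readers unfamiliar with move-matching, the percentages mentioned in the email can be pictured with a rough sketch like the one below. It only counts how often two engines choose the same move over a common set of positions; the function names are hypothetical, the move lists are toy data, and the 55%/60% cut-offs are simply the figures recalled in the email, not an official definition.

```python
def match_percentage(moves_a, moves_b):
    """Percentage of positions in which two engines pick the same move."""
    if len(moves_a) != len(moves_b) or not moves_a:
        raise ValueError("need two non-empty move lists of equal length")
    matches = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * matches / len(moves_a)

def classify(pct, suspicious=55.0, limit=60.0):
    """Label a match percentage using the thresholds quoted in the email.

    Treating the limit as inclusive is an assumption made for this sketch.
    """
    if pct >= limit:
        return "over limit"
    if pct >= suspicious:
        return "suspicious"
    return "unremarkable"

# Toy data only: four positions, two agreements.
engine_a = ["e2e4", "g1f3", "f1c4", "e1g1"]
engine_b = ["e2e4", "g1f3", "d2d4", "c2c3"]
pct = match_percentage(engine_a, engine_b)
print(pct, classify(pct))
```

A real test would use thousands of positions and fixed search conditions, which is why the results depend on having reproducible engine binaries.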