Did Zuckerberg send those emails to Ceglia?
One way to find out is to study the writing styles.
With a quick internet search I found this email of Mark Zuckerberg's, which I will call "Auth", standing for "Authentic".
I then saved the text of all the emails alleged to have been sent by Zuckerberg to Ceglia, which I call "Acc", standing for "Accused". There are also emails from Ceglia himself to Zuckerberg; I compiled these together and called the collection "Ceglia".
A common method for studying the difference in style between, say, Shakespeare and Jack London is to compare the frequencies of "meaningless" words. By "meaningless" I mean words such as "such", "as", "you", "any", "and", etc. Researchers believe that the frequencies of these words are characteristic of individual authors.
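The frequency comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the results here: the word list `FUNCTION_WORDS`, the tokenizer, and the function name `function_word_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, very short list of "meaningless" (function) words.
# The actual word list used for the analysis is not given in this post.
FUNCTION_WORDS = ["such", "as", "you", "any", "and"]


def function_word_frequencies(text, words=FUNCTION_WORDS):
    """Return each function word's share of all word tokens in `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


# Made-up sample text, only to show the call.
sample = "You and I can do as you wish, and any such plan works."
freqs = function_word_frequencies(sample)
for word, freq in freqs.items():
    print(f"{word:>5}: {freq:.4f}")
```

Running the same function over the "Auth", "Acc" and "Ceglia" texts would give one frequency profile per source, and those profiles are what get compared.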
Contrary to my eyeballed conclusion, the "Accused" emails are closer in writing style to "Authentic" than to "Ceglia". How much closer? About 500 times closer to Zuckerberg's authentic email than to Ceglia's emails.
Let's take a closer look at the data. The differences are represented by the numbers in the "Mean" column: the larger the number, the bigger the deviation between the two writing styles. The top row, "acc_zuck", has the smallest number. In other words, the "Accused" and "Authentic" emails are the closest pair in writing style.
The "Mean" values in the 2nd and 3rd rows are very similar to each other and much larger than the value in the first row. In other words, "acc_ceglia", the difference between the "Accused" and "Ceglia" emails, is about as large as "zuck_ceglia", the difference between the "Authentic" and "Ceglia" emails.
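The post does not say how the three variables were computed. One plausible reading, and it is only an assumption, is that each variable holds one squared difference in relative frequency per word, so that "Mean" is the average squared difference over all words. The sketch below follows that reading; the frequencies are made up and the function names are hypothetical.

```python
from statistics import mean


def squared_differences(freq_a, freq_b):
    """Per-word squared difference between two frequency profiles."""
    words = sorted(set(freq_a) | set(freq_b))
    return [(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words]


def style_distance(freq_a, freq_b):
    """Mean squared difference: smaller means more similar styles."""
    return mean(squared_differences(freq_a, freq_b))


# Made-up toy profiles (NOT the real data), one per email source.
auth = {"and": 0.030, "you": 0.020, "as": 0.010}
acc = {"and": 0.031, "you": 0.019, "as": 0.010}
ceglia = {"and": 0.010, "you": 0.050, "as": 0.002}

distances = {
    "acc_zuck": style_distance(acc, auth),
    "acc_ceglia": style_distance(acc, ceglia),
    "zuck_ceglia": style_distance(auth, ceglia),
}

# The pair with the smallest mean squared difference is the closest in style.
closest = min(distances, key=distances.get)
for name, value in distances.items():
    print(f"{name:<12} {value:.10f}")
print("closest pair:", closest)
```

With these toy numbers the pattern is the same as in the table: "acc_zuck" is far smaller than the other two, which are close to each other.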
All these numbers point in one direction: as far as writing style is concerned, the alleged emails were from Zuckerberg.
How significant are the results? That depends on whether the sample size (meaning the number of emails) is considered large enough.
Summary Statistics 1
Results 12:40 Friday, April 8, 2011
The MEANS Procedure
Variable       Mean           Std Dev        Minimum         Maximum        N
------------------------------------------------------------------------------
acc_zuck       0.000066864    0.000139743    7.340421E-10    0.000823812    57
acc_ceglia     0.0011288      0.0077861      4.709518E-10    0.0588287      57
zuck_ceglia    0.0011347      0.0079904      2.0894707E-6    0.0603857      57
------------------------------------------------------------------------------