Tuesday, April 12, 2011

Did Facebook's Zuckerberg send those emails to Ceglia?

Did Zuckerberg send those emails to Ceglia? 

One way to find out is to study the writing styles. 

A quick internet search turned up an email by Mark Zuckerberg, which I will call "Auth", for "Authentic".

I then saved the text of all the emails Zuckerberg allegedly sent to Ceglia and called them "Acc", for "Accused". There are also emails from Ceglia himself to Zuckerberg; I compiled these together and called the collection "Ceglia".

A common method for studying the difference in style between, say, Shakespeare and Jack London is to compare the frequencies of "meaningless" words. By "meaningless" I mean words such as "such", "as", "you", "any", "and", etc. (often called function words). Researchers believe that the frequencies of these "meaningless" words are characteristic of individual authors.
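The post does not list the exact words it used or show how the frequencies were computed. As a minimal sketch, assuming a simple letters-and-apostrophes tokenizer and a short, hypothetical word list (FUNCTION_WORDS below is my own illustration, not the list behind the results), a text's function-word profile might be computed like this:

```python
import re
from collections import Counter

# Hypothetical, abbreviated word list for illustration only;
# the post does not publish the list it actually used.
FUNCTION_WORDS = ["such", "as", "you", "any", "and", "the", "i", "to", "of", "it"]


def function_word_profile(text, words=FUNCTION_WORDS):
    """Return each function word's share of all word tokens in `text`."""
    # Lower-case the text and split it into runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency, so long and short texts can be compared.
    return {w: (counts[w] / total if total else 0.0) for w in words}


if __name__ == "__main__":
    profile = function_word_profile("You and I can do it as soon as you want.")
    print(profile["as"])  # "as" is 2 of the 11 tokens
```

Profiles computed this way for each collection of emails can then be compared word by word.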

Contrary to what I expected from eyeballing the emails, the "Accused" emails are closer in writing style to "Authentic" than to "Ceglia". How much closer? Judging by the "Mean" column in the table below, the average difference between "Accused" and "Ceglia" is about 17 times the average difference between "Accused" and "Authentic".
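The "how much closer" figure can be checked directly against the "Mean" column of the summary table below. A minimal sketch that simply divides the copied means:

```python
# "Mean" column values copied from the MEANS Procedure output in this post.
mean_acc_zuck = 0.000066864
mean_acc_ceglia = 0.0011288
mean_zuck_ceglia = 0.0011347

# How many times larger each Ceglia comparison is than Accused vs. Authentic.
ratio_acc_ceglia = mean_acc_ceglia / mean_acc_zuck
ratio_zuck_ceglia = mean_zuck_ceglia / mean_acc_zuck

print(f"acc_ceglia / acc_zuck  = {ratio_acc_ceglia:.1f}")   # 16.9
print(f"zuck_ceglia / acc_zuck = {ratio_zuck_ceglia:.1f}")  # 17.0
```

By this measure both Ceglia comparisons come out roughly 17 times larger than the Accused-vs-Authentic comparison.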


Let's take a closer look at the data in the table below. The differences are given by the numbers in the "Mean" column: the larger the number, the bigger the deviation between the two writing styles. The top row, "acc_zuck", has the smallest number. In other words, the "Accused" and "Authentic" emails are the closest in writing style.


Now look at the "Mean" values in the 2nd and 3rd rows: they are very similar to each other and much larger than the first row. In other words, "acc_ceglia", the difference between the "Accused" and "Ceglia" emails, is about as large as the difference between "Authentic" and "Ceglia".
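The post does not say exactly how each per-word difference was measured. A minimal sketch, assuming (hypothetically) that the difference for each word is the squared gap between two texts' relative frequencies, and that the table's columns are the usual PROC MEANS summaries over those per-word values (Std Dev with the n - 1 divisor, which is the SAS default). The profiles below are toy numbers, not the real emails:

```python
import statistics


def per_word_differences(profile_a, profile_b):
    """Squared difference in relative frequency for each word in profile_a.

    Assumption: the post does not name its difference measure;
    squared difference is just one plausible choice.
    """
    return [(profile_a[w] - profile_b[w]) ** 2 for w in profile_a]


def summarize(values):
    """Mean, Std Dev (n - 1 divisor), Minimum, Maximum and N, like PROC MEANS."""
    return {
        "Mean": statistics.mean(values),
        "Std Dev": statistics.stdev(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "N": len(values),
    }


if __name__ == "__main__":
    # Toy relative frequencies over three hypothetical function words.
    auth = {"and": 0.030, "the": 0.050, "you": 0.020}
    acc = {"and": 0.032, "the": 0.049, "you": 0.021}
    ceglia = {"and": 0.010, "the": 0.080, "you": 0.005}
    print("acc_zuck  ", summarize(per_word_differences(acc, auth)))
    print("acc_ceglia", summarize(per_word_differences(acc, ceglia)))
```

A smaller "Mean" here means the two profiles agree more closely, which is how the table's rows are read above.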


All these numbers point in the same direction: as far as writing style is concerned, the accused emails were written by Zuckerberg.

How significant are these results? That depends on whether the sample size (the number of emails) is large enough.


                       Summary Statistics
                  Results  12:40 Friday, April 8, 2011

                       The MEANS Procedure

  Variable             Mean        Std Dev        Minimum        Maximum     N
  ----------------------------------------------------------------------------
  acc_zuck      0.000066864    0.000139743   7.340421E-10    0.000823812    57
  acc_ceglia      0.0011288      0.0077861   4.709518E-10      0.0588287    57
  zuck_ceglia     0.0011347      0.0079904   2.0894707E-6      0.0603857    57
  ----------------------------------------------------------------------------
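One of the reader comments reports a Pearson statistic of about 9.5 with 56 degrees of freedom. A standard test that produces such a number is Pearson's chi-square test of homogeneity on a 2 x k table of word counts, which has (2 - 1)(k - 1) degrees of freedom, so 57 words would give 56. That this is the test actually used is my assumption. A minimal, dependency-free sketch (the function names are hypothetical):

```python
import math


def pearson_chi_square(counts_a, counts_b):
    """Pearson chi-square test of homogeneity for a 2 x k table of word counts.

    counts_a[j] and counts_b[j] are how often word j occurs in text A and
    text B. Words absent from both texts are skipped. Returns (statistic, df).
    """
    row_a, row_b = sum(counts_a), sum(counts_b)
    if row_a == 0 or row_b == 0:
        raise ValueError("each text needs at least one counted word")
    total = row_a + row_b
    stat, used = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue
        used += 1
        for observed, row in ((a, row_a), (b, row_b)):
            # Count expected if both texts used this word at the same rate.
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, used - 1


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series. Fine for moderate x;
    loses precision when the true answer is very close to 0.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    series = term
    n = 0
    while term > series * 1e-16 and n < 100_000:
        n += 1
        term *= h / (a + n)
        series += term
    lower = math.exp(-h + a * math.log(h) - math.lgamma(a)) * series
    return min(1.0, max(0.0, 1.0 - lower))


if __name__ == "__main__":
    # Toy counts for three hypothetical function words in two texts.
    stat, df = pearson_chi_square([10, 20, 5], [12, 18, 6])
    print(f"chi-square = {stat:.3f}, df = {df}, p = {chi_square_sf(stat, df):.3f}")
```

A statistic far below its degrees of freedom, such as 9.5 on 56, gives an upper-tail probability very close to 1, meaning the counts show no detectable difference in word usage. The chi-square approximation is usually trusted only when expected counts are around 5 or more per cell, which is where the sample-size question comes in: short emails give small counts for many words.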

3 comments:

  1. This is very insightful, but the font you use sucks so bad nobody can read it.

  2. Sanjay, my monitor was set to use large fonts. Thanks for pointing it out. The font is now the right size.

    For those who want more statistics: the Pearson chi-square statistic for the difference between "Authentic" and "Accused" is about 9.5 with 56 degrees of freedom, which is about as good a fit as it gets.

  3. Very interesting indeed!

    However, if Ceglia forged these emails, wouldn't it make sense for him to copy Zuckerberg's writing style? Given the astronomical amounts at stake, he could have hired experts. I wouldn't be surprised if his lawyers were in on it too...
