Monday, December 5, 2011

Who is your 2012 GOP candidate?

There are many ways to analyze GOP candidates to see where they stand on various issues. I found a matrix ready for analysis at http://psudo.us/sho/GOP.html, and that is what I am going to use.

Notice that a large proportion of the data is missing.

Unlike some other multivariate methods, Nonmetric Multidimensional Scaling (MDS) is able to handle missing data. MDS "fits" the high dimensional data into a lower dimensional space by preserving the relative distances between data points as much as possible. When used properly, it is a great visualization tool. 
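To make the missing-data point concrete, here is a minimal Python sketch of one common way to build the dissimilarities that MDS needs when some cells are empty: compare two candidates only on the issues where both have a value, and leave the pair out entirely when they share none. The function name, the scoring (1 / -1 / None) and the three made-up candidates are assumptions for illustration, not the data or the method behind the plots.

```python
import math

def pairwise_dissimilarity(a, b):
    """Root-mean-square difference over the issues both rows have a value for.

    Returns None when the two rows share no issue, so that pair can simply be
    left out of the fit (treated as a missing dissimilarity).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

# Made-up positions (1 = supports, -1 = opposes, None = no stated position).
stances = {
    "Candidate A": [1, -1, None, 1],
    "Candidate B": [1, None, -1, 1],
    "Candidate C": [None, 1, None, None],
}

names = list(stances)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(p, "vs", q, "->", pairwise_dissimilarity(stances[p], stances[q]))
```

Pairs that come back as None are simply not part of the fit, which is why a sparse matrix can still be scaled.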

One way to check whether the dimension reduction is reasonable is to look at the badness-of-fit criterion. All of the fits have a badness-of-fit value below 0.1, which is excellent.
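For readers who want to see what such a number measures, the sketch below computes Kruskal's stress formula 1 in Python. This assumes the badness-of-fit reported here is of that type; the post does not say which formula was used, and the function name and toy numbers are made up for illustration.

```python
import math

def stress_1(distances, disparities):
    """Kruskal's stress-1 over the observed pairs.

    distances:   distances between points in the low-dimensional map
    disparities: the (monotonically transformed) target dissimilarities
    """
    numerator = sum((d - dhat) ** 2 for d, dhat in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

# Toy example: two pairs, one of which is reproduced exactly.
print(stress_1([3.0, 4.0], [3.0, 3.0]))
```

A value of 0 means the map reproduces the target ordering perfectly; larger values mean more distortion.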


Below is a matrix of scatter plots using MDS outputs. There are five categories of issues or policies and 5 scatter plots. The first plot is the overall stance. You can now identify who's who from the plots. Enjoy!






Thursday, December 1, 2011

Monday, November 28, 2011

Are these brokerages really any better or worse than each other?

The Globe and Mail ranked Canadian brokerages using the following table:


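Rank | Broker                         | Costs (/25) | Trading (/25) | Tools (/20) | Account Info (/20) | Innovation (/10) | Total (/100)
--------------------------------------------------------------------------------------------------------------------------------------
1    | Qtrade Investor                | 16          | 21            | 15          | 18                 | 7                | 77
2    | Virtual Brokers                | 23.5        | 18.5          | 11          | 10                 | 8                | 71
3    | BMO InvestorLine               | 10          | 16.5          | 16.5        | 18.5               | 6.5              | 68
4    | Scotia iTrade                  | 18          | 16.5          | 13          | 13                 | 7                | 67.5
5    | RBC Direct Investing           | 10          | 17.5          | 18          | 16                 | 5                | 66.5
6    | Credential Direct              | 12.5        | 16            | 14.5        | 18                 | 5                | 66
7    | TD Waterhouse                  | 10          | 17            | 18          | 11                 | 4.5              | 60.5
8    | CIBC Investor's Edge           | 14.5        | 13.5          | 15          | 8                  | 6                | 57
9    | Disnat (Classic)               | 10.5        | 11.5          | 15          | 15                 | 3.5              | 55.5
10   | Questrade                      | 16          | 13.5          | 10.5        | 9                  | 6.5              | 55.5
11   | National Bank Direct Brokerage | 8.5         | 10.5          | 14          | 16.5               | 3.5              | 53
12   | HSBC InvestDirect              | 10          | 13            | 6           | 10                 | 1                | 40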



If the Globe and Mail scores are representative of the collective opinions of the traders and investors, are any of the brokerages any better or worse than the others?


In other words, are the scores really different? Let's put all the scores in one column and the brokerage names in another, and use Tukey's multiple comparison test to find out. Actually, let's use all three: Bonferroni, Tukey and Scheffé.


Save the above table as z:\brokers.csv and run the following SAS code. All three tests put all the brokerages into a single group. By the standards of the Globe and Mail's own scores, they are all the same.



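/* The mydata library location is an assumption; point it at any folder. */
libname mydata 'z:\';

/* Read the saved table; the first row holds the column headers. */
filename in 'z:\brokers.csv';
data mydata.brokers;
length broker $32;
infile in dsd dlm="," missover firstobs=2;
input Rank Broker $ Costs Trading Tools AcctInfo Innovation total;
run;


/* Stack the five category scores into one score column per broker. */
data mydata.brokers2(drop= Costs Trading Tools AcctInfo Innovation i);
set mydata.brokers(drop=rank total);
array attrs (*) _numeric_;
do i=1 to dim(attrs);
score=attrs(i);
output;
end;
run;


/* One-way model with Bonferroni, Tukey and Scheffe comparisons. */
proc glm data=mydata.brokers2;
class broker;
model score = broker /solution e;
means broker /bon tukey scheffe alpha=0.05 ;
run;
quit;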
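
As a rough cross-check outside SAS, the overall one-way ANOVA F statistic can be computed from the same table in plain Python. This is only a sketch of the overall test on the stacked scores, not a substitute for the Bonferroni, Tukey and Scheffé groupings; the function name `one_way_anova` is made up for this example.

```python
# Category scores from the table: Costs, Trading, Tools, Account Info, Innovation.
scores = {
    "Qtrade Investor": [16, 21, 15, 18, 7],
    "Virtual Brokers": [23.5, 18.5, 11, 10, 8],
    "BMO InvestorLine": [10, 16.5, 16.5, 18.5, 6.5],
    "Scotia iTrade": [18, 16.5, 13, 13, 7],
    "RBC Direct Investing": [10, 17.5, 18, 16, 5],
    "Credential Direct": [12.5, 16, 14.5, 18, 5],
    "TD Waterhouse": [10, 17, 18, 11, 4.5],
    "CIBC Investor's Edge": [14.5, 13.5, 15, 8, 6],
    "Disnat (Classic)": [10.5, 11.5, 15, 15, 3.5],
    "Questrade": [16, 13.5, 10.5, 9, 6.5],
    "National Bank Direct Brokerage": [8.5, 10.5, 14, 16.5, 3.5],
    "HSBC InvestDirect": [10, 13, 6, 10, 1],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of name -> list of scores."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic comes out below 1 here, meaning the spread between brokerage means is no larger than the spread among each brokerage's own five scores, which is in line with all brokerages landing in a single group.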

Friday, September 16, 2011

Gold price has been following a pattern lately


Starting Aug 30, 2011, the gold futures price has been strictly following a three-day candlestick pattern: "Long Green", "Narrow Range", then "Long Red". If the pattern persists, Monday, Sep 19 will again be a narrow-range day, and Tuesday, Sep 20 will be a long red (down) day.

What is this pattern saying about the gold price? One way to put it: "unnatural".
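
The two forecast dates follow from cycling the three shapes over trading days. The Python sketch below illustrates that counting, assuming weekends and Labor Day (Sep 5, 2011) are skipped as non-trading days; the holiday handling and the function name are assumptions made for this example, not something stated in the post.

```python
from datetime import date, timedelta
from itertools import cycle

PATTERN = ["Long Green", "Narrow Range", "Long Red"]
# Assumption: Labor Day (Sep 5, 2011) is treated as a non-trading day.
HOLIDAYS = {date(2011, 9, 5)}

def project_pattern(start, end):
    """Map each trading day from start to end (inclusive) to its place in the cycle."""
    labels = {}
    shapes = cycle(PATTERN)
    day = start
    while day <= end:
        # Monday..Friday are weekday() 0..4; skip weekends and listed holidays.
        if day.weekday() < 5 and day not in HOLIDAYS:
            labels[day] = next(shapes)
        day += timedelta(days=1)
    return labels

projected = project_pattern(date(2011, 8, 30), date(2011, 9, 20))
for day in (date(2011, 9, 16), date(2011, 9, 19), date(2011, 9, 20)):
    print(day.isoformat(), "->", projected[day])
```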

Saturday, April 16, 2011

Mac vs Windows and Safari vs IE - Are Firefox and Chrome more likely to run on Windows than Mac?

I don't have lots of web stats, but here is what I do have. The following is a contingency table from the oversimplified visitor stats offered by blogger.com.


OS \ Browser | Firefox + Chrome | Safari or IE
-----------------------------------------------
Mac OS X     | 1789             | 922
-----------------------------------------------
Windows      | 688              | 591


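Reading the table as printed, the odds of using Firefox or Chrome rather than the built-in browser are 1789/922, or about 1.94, on Mac OS X and 688/591, or about 1.16, on Windows. The Odds Ratio (Mac OS X over Windows) is therefore about 1.7. To be sure, let's calculate the 95% confidence interval, which is built on the log scale.


C.I. = exp( ln(1.67) plus/minus 1.96 x sqrt( 1/1789 + 1/688 + 1/922 + 1/591 ) )


The 95% CI for the Odds Ratio is approximately between 1.5 and 1.9, which excludes 1.
Conclusion: with the rows as labeled above, Windows users are more likely to stay with IE than Mac users are to stay with Safari.
But some of you may have already known that.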
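
The odds ratio and a 95% interval can be recomputed from the counts in the table as printed. The sketch below is a minimal Python illustration, assuming the usual Wald interval on the log-odds scale; the function name `odds_ratio_ci` is made up for this example, and the Mac OS X row is placed in the numerator.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Wald interval built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts as printed in the table: Mac OS X row first, then Windows.
or_value, ci_low, ci_high = odds_ratio_ci(1789, 922, 688, 591)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Swapping the two rows inverts the ratio and the interval endpoints, but it does not change whether 1 falls inside the interval.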


For stats readers, below are the Wald chi-square statistics from a saturated loglinear model. The highest-order term (the interaction) has a small p-value.



                 Analysis Of Maximum Likelihood
                      Parameter Estimates

                                    Wald
          Parameter           Chi-Square    Pr > ChiSq


          Intercept               980.95        <.0001
          Windows                   6.72        0.0095
          Thirdparty                0.78        0.3772
          Windows*Thirdparty        5.63        0.0177

Tuesday, April 12, 2011

Did Facebook's Zuckerberg send those emails to Ceglia?

Did Zuckerberg send those emails to Ceglia? 

One way to find out is to study the writing styles. 

With a quick search on the internet I found this email of Mark Zuckerberg's, which I will call "Auth", standing for "Authentic".

I then saved the text of all the emails alleged to have been sent by Zuckerberg to Ceglia, which I call "Acc", standing for "Accused". There are also emails from Ceglia himself to Zuckerberg. I compiled these together and called the collection "Ceglia".

A common method to study the style difference between, say, Shakespeare and Jack London is to compare the frequencies of "meaningless" words, such as "such", "as", "you", "any", "and", etc. Researchers believe that the frequencies of these "meaningless" words are characteristic of individual authors.

Contrary to my eyeballed conclusion, the "Accused" emails are closer in writing style to "Authentic" than to "Ceglia". How much closer? About 500 times closer to Zuckerberg's authentic email than to Ceglia's emails.
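
As an illustration of the idea, the Python sketch below computes the relative frequency of a few function words in two texts and the squared difference for each word. Both the short word list and the two snippets are made up, and measuring deviation as a squared frequency difference is an assumption; the post does not state which words or which distance were actually used.

```python
import re
from collections import Counter

# A short illustrative list; not the word list used for the analysis.
FUNCTION_WORDS = ["such", "as", "you", "any", "and", "the", "of", "to"]

def word_frequencies(text):
    """Relative frequency of each function word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

def style_deviations(text_a, text_b):
    """Squared difference in relative frequency, one value per function word."""
    freq_a = word_frequencies(text_a)
    freq_b = word_frequencies(text_b)
    return [(freq_a[w] - freq_b[w]) ** 2 for w in FUNCTION_WORDS]

# Made-up snippets, not the actual emails.
sample_a = "you and the team can send any of the files to me"
sample_b = "the files and the code are ready to send to you"

deviations = style_deviations(sample_a, sample_b)
print("mean deviation:", sum(deviations) / len(deviations))
```

Averaging the per-word values gives one number per pair of texts, and a smaller number means more similar habits with these words.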


Let's take a closer look at the data. The differences are represented by the numbers in the "Mean" column: the larger the number, the bigger the deviation between the two writing styles. The top row, "acc_zuck", has the smallest number. In other words, the "Accused" and "Authentic" emails are the closest pair in writing style.


If you look at the "Mean" in the 2nd and 3rd rows, the two values are very similar and much larger than the one in the first row. In other words, "acc_ceglia", the difference between the "Accused" and "Ceglia" emails, is about as large as "zuck_ceglia", the difference between "Authentic" and "Ceglia".


All these numbers point in one direction: as far as writing style is concerned, the alleged emails were from Zuckerberg.

How significant are the results? It depends on whether the sample size (meaning the number of emails) is considered large enough.


                       Summary Statistics                      1
                            Results  12:40 Friday, April 8, 2011

                      The MEANS Procedure

  Variable               Mean         Std Dev         Minimum
  -----------------------------------------------------------
  acc_zuck        0.000066864     0.000139743    7.340421E-10
  acc_ceglia        0.0011288       0.0077861    4.709518E-10
  zuck_ceglia       0.0011347       0.0079904    2.0894707E-6
  -----------------------------------------------------------

          Variable            Maximum               N
          -------------------------------------------
          acc_zuck        0.000823812              57
          acc_ceglia        0.0588287              57
          zuck_ceglia       0.0603857              57
          -------------------------------------------