Charlie's Blog

Monday, December 5, 2011

Who is your 2012 GOP candidate?

There are many ways to analyze GOP candidates to see where they stand on various issues. I found a matrix ready for analysis at http://psudo.us/sho/GOP.html, and that is what I am going to use.

Notice a large proportion of data is missing.

Unlike some other multivariate methods, Nonmetric Multidimensional Scaling (MDS) is able to handle missing data. MDS "fits" the high dimensional data into a lower dimensional space by preserving the relative distances between data points as much as possible. When used properly, it is a great visualization tool.

One way to check whether the dimension reduction is reasonable is to look at the badness-of-fit. All fits have a badness-of-fit value below 0.1, which is excellent.
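
To make the badness-of-fit idea concrete, here is a minimal Python sketch. It assumes the criterion is Kruskal's stress-1 and that missing pairs are simply left out of the sums; the function name, the toy points and the toy dissimilarities are all made up for illustration and are not the GOP data. In a real nonmetric fit the targets would be monotone transforms of the data rather than the raw dissimilarities.

```python
import math

def stress1(points, dissim):
    """Kruskal stress-1 of a configuration, using only the observed pairs.

    points: dict name -> coordinates in the reduced space.
    dissim: dict (name_a, name_b) -> target dissimilarity, or None if missing.
    """
    num = 0.0
    den = 0.0
    for (a, b), target in dissim.items():
        if target is None:
            # Missing pair: it contributes nothing to the fit criterion.
            continue
        d = math.dist(points[a], points[b])
        num += (d - target) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# Hypothetical toy data: three items on a line, one dissimilarity missing.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
dissim = {("A", "B"): 1.0, ("B", "C"): 2.0, ("A", "C"): None}
print(stress1(points, dissim))  # 0.0: the observed pairs are reproduced exactly
```

A value of 0 means the low dimensional distances match the targets perfectly; the further the plotted distances drift from the targets, the larger the value.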

Below is a matrix of scatter plots using the MDS outputs. There are five categories of issues or policies and five scatter plots. The first plot is the overall stance. You can now identify who's who from the plots. Enjoy!

Thursday, December 1, 2011

Thanksgiving online shopping wait time (2011)

There seem to be two groups of merchants: those that have very heavy traffic with less capacity, and those that have light traffic with more capacity.

Monday, November 28, 2011

Are these brokerages really any better or worse than each other?

The Globe and Mail ranked Canadian brokerages in the following table:

Rank	Broker	Costs (/25)	Trading (/25)	Tools (/20)	Account Info (/20)	Innovation (/10)	Total (/100)
1	Qtrade Investor	16	21	15	18	7	77
2	Virtual Brokers	23.5	18.5	11	10	8	71
3	BMO InvestorLine	10	16.5	16.5	18.5	6.5	68
4	Scotia iTrade	18	16.5	13	13	7	67.5
5	RBC Direct Investing	10	17.5	18	16	5	66.5
6	Credential Direct	12.5	16	14.5	18	5	66
7	TD Waterhouse	10	17	18	11	4.5	60.5
8	CIBC Investor's Edge	14.5	13.5	15	8	6	57
9	Disnat (Classic)	10.5	11.5	15	15	3.5	55.5
10	Questrade	16	13.5	10.5	9	6.5	55.5
11	National Bank Direct Brokerage	8.5	10.5	14	16.5	3.5	53
12	HSBC InvestDirect	10	13	6	10	1	40

If the Globe and Mail scores are representative of the collective opinions of the traders and investors, are any of the brokerages any better or worse than the others?

In other words, are the scores really different? Let's put all scores in one column and the brokerage name in the other, and use Tukey's multiple comparison test to find out. Actually, let's use all three: Bonferroni, Tukey and Scheffé.

Save the above table as a comma-separated file, z:\brokers.csv, and run the following SAS code. All three tests put every brokerage into a single group. By the Globe and Mail's own standards, they are all the same.

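/* Read the Globe and Mail table, saved as comma-separated values. */
/* Assumes the mydata libref has already been assigned. */
filename in 'z:\brokers.csv';
data mydata.brokers;
    length Broker $32;
    infile in dsd dlm="," missover firstobs=2;   /* skip the header row */
    input Rank Broker $ Costs Trading Tools AcctInfo Innovation Total;
run;

/* Reshape to long format: one row per broker per category score. */
data mydata.brokers2(drop=Costs Trading Tools AcctInfo Innovation i);
    set mydata.brokers(drop=Rank Total);
    array attrs (*) _numeric_;   /* the five category scores */
    do i = 1 to dim(attrs);
        score = attrs(i);
        output;
    end;
run;

/* One-way ANOVA of score by broker, with three multiple comparison tests. */
proc glm data=mydata.brokers2;
    class broker;
    model score = broker / solution e;
    means broker / bon tukey scheffe alpha=0.05;
run;
quit;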
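
For readers without SAS, the reshaping step and the overall one-way ANOVA F statistic can be sketched in plain Python. This is only an illustration under my own naming (`scores`, `long_rows`, `one_way_anova_f`); it does not reproduce the Bonferroni, Tukey or Scheffé groupings, which need distribution functions outside the standard library.

```python
# Category scores from the table: Costs, Trading, Tools, Account Info, Innovation.
scores = {
    "Qtrade Investor": [16, 21, 15, 18, 7],
    "Virtual Brokers": [23.5, 18.5, 11, 10, 8],
    "BMO InvestorLine": [10, 16.5, 16.5, 18.5, 6.5],
    "Scotia iTrade": [18, 16.5, 13, 13, 7],
    "RBC Direct Investing": [10, 17.5, 18, 16, 5],
    "Credential Direct": [12.5, 16, 14.5, 18, 5],
    "TD Waterhouse": [10, 17, 18, 11, 4.5],
    "CIBC Investor's Edge": [14.5, 13.5, 15, 8, 6],
    "Disnat (Classic)": [10.5, 11.5, 15, 15, 3.5],
    "Questrade": [16, 13.5, 10.5, 9, 6.5],
    "National Bank Direct Brokerage": [8.5, 10.5, 14, 16.5, 3.5],
    "HSBC InvestDirect": [10, 13, 6, 10, 1],
}

# Wide to long: one (broker, score) row per category, like mydata.brokers2.
long_rows = [(broker, s) for broker, row in scores.items() for s in row]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    ss_within = sum((v - means[g]) ** 2
                    for g, vals in groups.items() for v in vals)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_b, df_w = one_way_anova_f(scores)
print(len(long_rows), df_b, df_w)  # 60 11 48
print(round(f_stat, 2))
```

Because the five categories are scored out of different maximums, much of the spread within each brokerage comes from the categories themselves, which makes it harder for differences between brokerages to stand out.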

Friday, September 16, 2011

Gold price has been following a pattern lately

Starting Aug 30, 2011, the gold futures price has been strictly following a "Long Green", "Narrow Range", and then "Long Red" 3-day candlestick pattern. If the pattern persists, Monday, Sep 19 will again be a narrow range day, and Tuesday, Sep 20 will be a long red (down) day.
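
The projection is just a 3-day cycle counted over trading days. Here is a small Python sketch of that counting; it assumes weekends and Labor Day (Sep 5, 2011) are not counted as sessions, which is my assumption and the one under which the dates above line up. The function names are made up for illustration.

```python
from datetime import date, timedelta

PATTERN = ["Long Green", "Narrow Range", "Long Red"]
START = date(2011, 8, 30)        # first "Long Green" day of the pattern
HOLIDAYS = {date(2011, 9, 5)}    # assumption: Labor Day is not counted as a session

def trading_days(start, end):
    """Yield weekdays from start to end inclusive, skipping HOLIDAYS."""
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in HOLIDAYS:
            yield day
        day += timedelta(days=1)

def expected_candle(target):
    """Label the 3-day cycle assigns to `target`, counting trading days from START."""
    count = len(list(trading_days(START, target)))
    return PATTERN[(count - 1) % 3]

print(expected_candle(date(2011, 9, 19)))  # Narrow Range
print(expected_candle(date(2011, 9, 20)))  # Long Red
```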

What is this pattern saying about the gold price? In one word: "unnatural".

Saturday, April 16, 2011

Mac vs Windows and Safari vs IE - Are Firefox and Chrome more likely to run on Windows than Mac?

I don't have a lot of web stats, but here is what I do have. The following is a contingency table from the oversimplified visitor stats offered by blogger.com.

OS \ Browser | Firefox + Chrome | Safari or IE
-----------------------------------------------
Windows | 1789 | 922
-----------------------------------------------
Mac OS X | 688 | 591

The odds ratio for Windows users to use a non-IE browser over Mac OS X users to use a non-Safari browser is about 1.7. To be sure, let's calculate the 95% confidence interval, which is built on the log scale and then exponentiated.

C.I. = exp( ln(1.7) plus/minus 1.96 x sqrt( 1/1789 + 1/688 + 1/922 + 1/591 ) )

The 95% CI for the odds ratio is approximately between 1.5 and 1.9, which excludes 1.
Conclusion: Mac users are more likely to use Safari than Windows users are to use IE.
But some of you already knew that.
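
For anyone who wants to redo the arithmetic, here is a short Python sketch of the odds ratio and the usual Wald interval computed on the log scale. The variable names are mine; the four counts are read straight from the table, row by row.

```python
import math

# Counts from the contingency table, row by row.
a, b = 1789, 922   # first row:  Firefox + Chrome, Safari or IE
c, d = 688, 591    # second row: Firefox + Chrome, Safari or IE

# Odds of a third-party browser in the first row relative to the second row.
odds_ratio = (a * d) / (b * c)

# Standard error of ln(OR), then a 95% interval exponentiated back.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# OR = 1.67, 95% CI = (1.46, 1.91)
```

The interval is not symmetric around the odds ratio because it is symmetric around ln(OR) instead.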

For stats readers, below are the Wald statistics from a saturated loglinear model. The highest-order term, Windows*Thirdparty, has a small p-value.

   Analysis Of Maximum Likelihood Parameter Estimates

   Parameter            Wald Chi-Square   Pr > ChiSq

   Intercept                     980.95       <.0001
   Windows                         6.72       0.0095
   Thirdparty                      0.78       0.3772
   Windows*Thirdparty              5.63       0.0177

Tuesday, April 12, 2011

Charlie's Blog: Did Zuckerberg send those emails to Ceglia?

Did Facebook's Zuckerberg send those emails to Ceglia?

Did Zuckerberg send those emails to Ceglia?

One way to find out is to study the writing styles.

With a quick search on the internet I found an email of Mark Zuckerberg's, which I will call "Auth", standing for "Authentic".

I then saved the text of all the emails alleged to have been sent by Zuckerberg to Ceglia, which I will call "Acc", standing for "Accused". There are also emails from Ceglia himself to Zuckerberg. I compiled these together and called the result "Ceglia".

A common method for studying the difference in style between, say, Shakespeare and Jack London is to compare the frequencies of "meaningless" words. By "meaningless", I mean words such as "such", "as", "you", "any", "and", etc. Researchers believe that the frequencies of these "meaningless" words are characteristic of each author.
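
As a rough illustration of the idea, here is a Python sketch that compares two texts by the relative frequencies of a few such words. The word list, the sample sentences and the choice of squared differences as the distance are all my assumptions for the example, not the exact procedure or data behind the numbers in this post.

```python
import re
from collections import Counter

# Hypothetical short list of function words, for illustration only.
FUNCTION_WORDS = ["such", "as", "you", "any", "and", "the", "of", "to"]

def relative_frequencies(text):
    """Share of all words in `text` taken by each function word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

def squared_differences(text_a, text_b):
    """Per-word squared difference in relative frequency between two texts."""
    fa = relative_frequencies(text_a)
    fb = relative_frequencies(text_b)
    return {w: (fa[w] - fb[w]) ** 2 for w in FUNCTION_WORDS}

def mean_difference(text_a, text_b):
    """Average of the per-word squared differences; smaller means more alike."""
    diffs = squared_differences(text_a, text_b)
    return sum(diffs.values()) / len(diffs)

# Made-up sample sentences, not taken from any of the emails.
sample_a = "You and I know the price of gold as well as any trader."
sample_b = "The cat sat on the mat and looked at the dog."
print(mean_difference(sample_a, sample_a))      # 0.0 for identical texts
print(mean_difference(sample_a, sample_b) > 0)  # True
```

With one difference per word, summary statistics such as the mean, minimum and maximum can then be taken across the words for each pair of texts.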

Contrary to my eyeballed conclusion, the "Accused" emails are closer to "Authentic" than to "Ceglia" in writing style. How much closer? Going by the "Mean" column below, they are about 17 times closer to Zuckerberg's authentic email than to Ceglia's emails.

Let's take a closer look at the data. The differences are represented by the numbers in the "Mean" column: the larger the number, the bigger the deviation between the two writing styles. The top row, "acc_zuck", has the smallest number. In other words, the "Accused" and "Authentic" emails are the closest pair in writing style.

If you look at the "Mean" in the 2nd and 3rd rows, the two values are very similar and much larger than the value in the first row. In other words, "acc_ceglia", the difference between the "Accused" and "Ceglia" emails, is about as large as the difference between "Authentic" and "Ceglia".

All these numbers point in one direction: the alleged emails were from Zuckerberg, as far as the writing style is concerned.

How significant are the results? That depends on whether the sample size (meaning the number of emails) is considered large enough.

Summary Statistics

Results 12:40 Friday, April 8, 2011

The MEANS Procedure

Variable      Mean          Std Dev       Minimum        Maximum       N
--------------------------------------------------------------------------
acc_zuck      0.000066864   0.000139743   7.340421E-10   0.000823812   57
acc_ceglia    0.0011288     0.0077861     4.709518E-10   0.0588287     57
zuck_ceglia   0.0011347     0.0079904     2.0894707E-6   0.0603857     57
--------------------------------------------------------------------------