Tuesday, September 13, 2011

What distribution does the stock market follow?

Fig. WIG20 and daily changes

Przemek Biecek's "Na przełaj przez Data Mining z pakietem R" (Across Data Mining with the R package) is a fascinating tutorial for people who would like to quickly implement selected data mining concepts in R.

Today I browsed through the chapter on analyzing the distributions of daily changes in equity prices and applied it to the WIG20 index of the Warsaw Stock Exchange.
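The console sessions below use a vector called `daily_changes` without showing how it was built. A minimal sketch, assuming it holds simple day-over-day returns of the closing price (log returns would be an equally plausible choice); the `close` vector is made-up illustrative data, not WIG20 quotes:

```r
# Hypothetical closing prices (illustrative values only, not real WIG20 data)
close <- c(2400, 2424, 2399.76, 2447.7552)

# Simple daily change: (P_t - P_{t-1}) / P_{t-1}
daily_changes_demo <- diff(close) / head(close, -1)

# Log-return alternative: log(P_t / P_{t-1})
log_changes_demo <- diff(log(close))

print(daily_changes_demo)
print(log_changes_demo)
```

For small moves the two definitions are nearly identical; they drift apart on the large days that matter most for tail analysis.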

No surprises here :)

> summary(daily_changes)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.1416000 -0.0100500  0.0002639  0.0006617  0.0113000  0.1479000 

> skewness(daily_changes)
[1] -0.04857761

> kurtosis(daily_changes)
[1] 4.414738
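The post does not say which package supplies `skewness()` and `kurtosis()`, and packages differ on whether kurtosis is reported raw (3 for a normal distribution) or as excess over 3, so it is worth checking which convention produced the 4.41 above. A base-R sketch of the moment-based definitions, with illustrative helper names that are not from the original code:

```r
# Moment-based sample skewness and kurtosis (helper names are illustrative)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sample_kurtosis <- function(x, excess = FALSE) {
  d <- x - mean(x)
  k <- mean(d^4) / mean(d^2)^2
  if (excess) k - 3 else k
}

x <- c(-2, -1, 0, 1, 2)   # symmetric toy sample
sample_skewness(x)        # 0 by symmetry
sample_kurtosis(x)        # raw kurtosis
sample_kurtosis(x, TRUE)  # excess kurtosis (raw minus 3)
```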

Daily changes do not follow a normal distribution:

> mec <- mean(daily_changes)
> sdc <- sd(daily_changes)
> ad.test(daily_changes,pnorm,mec,sdc)

        Anderson-Darling GoF Test

data:  daily_changes  and  pnorm 
AD = 61.0893, p-value = 1.288e-07
alternative hypothesis: NA 
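For intuition about what `ad.test` measures, the Anderson-Darling statistic can be computed by hand. A base-R sketch for the normal case, run on simulated data rather than the WIG20 series; `ad_stat_normal` is a hypothetical helper, and it returns only the statistic, not a p-value:

```r
# A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [log F(x_(i)) + log(1 - F(x_(n+1-i)))]
ad_stat_normal <- function(x, mu, sigma) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  # log.p / lower.tail keep the logarithms finite far out in the tails
  log_cdf  <- pnorm(x, mu, sigma, log.p = TRUE)
  log_surv <- pnorm(rev(x), mu, sigma, lower.tail = FALSE, log.p = TRUE)
  -n - mean((2 * i - 1) * (log_cdf + log_surv))
}

set.seed(1)
normal_sample <- rnorm(2000, mean = 0, sd = 0.015)
heavy_sample  <- 0.01 * rt(2000, df = 3)   # heavy-tailed stand-in for returns

ad_normal <- ad_stat_normal(normal_sample, mean(normal_sample), sd(normal_sample))
ad_heavy  <- ad_stat_normal(heavy_sample,  mean(heavy_sample),  sd(heavy_sample))
print(c(normal = ad_normal, heavy = ad_heavy))
```

The statistic weights discrepancies in the tails heavily, which is why a heavy-tailed sample scores much worse against a fitted normal than a genuinely normal one.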

Fig. observed distribution of the daily changes of WIG20 vs fitted normal, Cauchy, Laplace and Stable distributions
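One way to compare candidates like these is to fit each by maximum likelihood and compare log-likelihoods. This is a swapped-in illustration, not necessarily how the figure or the ranking in this post was produced. A base-R sketch for the normal, Laplace and Cauchy cases on simulated data; base R has no Laplace density, so `dlaplace_log` is a helper defined here:

```r
# Log-density of the Laplace distribution (location m, scale b)
dlaplace_log <- function(x, m, b) -log(2 * b) - abs(x - m) / b

set.seed(42)
x <- 0.01 * rt(3000, df = 3)   # simulated heavy-tailed "returns", not WIG20 data

# Normal MLE: sample mean and the 1/n standard deviation
mu <- mean(x)
sigma <- sqrt(mean((x - mu)^2))
ll_normal <- sum(dnorm(x, mu, sigma, log = TRUE))

# Laplace MLE: median and mean absolute deviation from the median
m <- median(x)
b <- mean(abs(x - m))
ll_laplace <- sum(dlaplace_log(x, m, b))

# Cauchy has no closed-form MLE, so maximise numerically
# (optimising log(scale) keeps the scale positive)
neg_ll_cauchy <- function(par) -sum(dcauchy(x, par[1], exp(par[2]), log = TRUE))
fit_cauchy <- optim(c(median(x), log(IQR(x) / 2)), neg_ll_cauchy)
ll_cauchy <- -fit_cauchy$value

print(c(normal = ll_normal, laplace = ll_laplace, cauchy = ll_cauchy))
```

All three families here have two parameters, so the raw log-likelihoods are directly comparable; a four-parameter stable fit would need a penalised criterion such as AIC for a fair comparison.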

Among the tested distributions, the stable distribution gives the best fit, with the following parameters:

 Stable Distribution

Estimated Parameter(s):
       alpha         beta        gamma        delta 
1.480000e+00 6.700000e-02 1.101051e-02 6.523883e-05 

Fig. Fitting of the stable distribution

> ad.test(daily_changes,pstable,sf_a,sf_b,sf_g,sf_d)

        Anderson-Darling GoF Test

data:  daily_changes  and  pstable 
AD = 1.361, p-value = 0.2134
alternative hypothesis: NA 

Hence, even though we can expect daily changes to stay within the (-5.5%, +5.5%) range more than 95% of the time, sometimes we can get a nasty surprise...

> stabledist::pstable(0.055,sf_a,sf_b,sf_g,sf_d)-stabledist::pstable(-0.055,sf_a,sf_b,sf_g,sf_d) # prob. of (-5.5%, +5.5%)

> stabledist::pstable(-0.1,sf_a,sf_b,sf_g,sf_d)  # probability of -10% or less
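The two commands above appear without their output, and `sf_a`, `sf_b`, `sf_g`, `sf_d` are not defined in this excerpt (presumably they hold the fitted alpha, beta, gamma and delta). For a rough feel of how much the tail index matters, base R covers two special cases of the stable family: alpha = 2 is a normal distribution with standard deviation gamma * sqrt(2), and alpha = 1 with beta = 0 is a Cauchy distribution with scale gamma. The sketch below reuses the fitted gamma and delta but sets beta to 0, so it illustrates the two limiting shapes and is not a substitute for `pstable` with the fitted alpha = 1.48:

```r
gamma_hat <- 1.101051e-02   # fitted scale from the post
delta_hat <- 6.523883e-05   # fitted location from the post

# Probability of landing inside (lo, hi) for any CDF `p`
interval_prob <- function(p, lo, hi, ...) p(hi, ...) - p(lo, ...)

# alpha = 2 (normal) versus alpha = 1, beta = 0 (Cauchy), same gamma and delta
in_range_normal <- interval_prob(pnorm,   -0.055, 0.055, delta_hat, gamma_hat * sqrt(2))
in_range_cauchy <- interval_prob(pcauchy, -0.055, 0.055, delta_hat, gamma_hat)

# P(daily change <= -10%) under each limiting case
crash_normal <- pnorm(-0.1,   delta_hat, gamma_hat * sqrt(2))
crash_cauchy <- pcauchy(-0.1, delta_hat, gamma_hat)

print(c(in_range_normal = in_range_normal, in_range_cauchy = in_range_cauchy))
print(c(crash_normal = crash_normal, crash_cauchy = crash_cauchy))
```

The gap between the two crash probabilities spans many orders of magnitude, which is the practical meaning of a tail index well below 2.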

It is still worth keeping in mind that the short-term distribution can be highly distorted (see here and here) and affected by recent volatility.

You can see the complete source code here.
