Sunday, November 25, 2012

Open source scientific computing: R, GNU Octave and... Julia

I must admit that a couple of years ago I fell in love at first sight with R :)

R

R is simply about creating algorithmic representations of numerical formulas.

The R language is beautifully pure. You do not have to worry about all the overhead present in general-purpose languages like C++ or Java. There is no user interface to design either.

Probably the most serious R competitor is Matlab.

Since Matlab is proprietary software, its development of cutting-edge tools seems to lag behind R.

However, there is an open source Matlab alternative available (for Linux and OS X, but unfortunately not for Windows): GNU Octave.



And the number of packages for GNU Octave has been (rather slowly...) rising: http://octave.sourceforge.net/packages.php
(although it seems that development of many packages stopped in 2009)

I wasn't able to find an actively maintained IDE for GNU Octave (similar to RStudio for R).



At least you can easily get Octave syntax highlighting in gedit.

The biggest shortcoming of both R and GNU Octave is probably their debugging capability. In the case of R, you can try Revolution R Enterprise from Revolution Analytics.



To familiarize yourself with GNU Octave, you can start with Introduction to Octave. Many more details can be found in the Octave online documentation.

R users should probably consult a list of similarities and differences between R and Octave, available in R for Octave users.

There are also a number of blogs about GNU Octave, conveniently aggregated at http://planet.octave.org/

And when you need a physical guide, you can read "GNU Octave Beginner's Guide" by Jesper Schmidt Hansen.

The most recent newcomer to the open source scientific computing area is Julia.

According to the information on Julia's site:

"Julia is a high-level, high-performance dynamic programming language for technical computing (...)
It provides a sophisticated compiler, distributed parallel execution... Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing. (...)
The syntax of Julia is similar to MATLAB®"

Julia is presumably faster than R, Matlab, and GNU Octave.

But its future is uncertain at this moment...

You can find some links to Julia resources at: http://www.statalgo.com/julia/

Friday, November 2, 2012

Downloading stooq market data in bulk

A couple of weeks ago I noticed a problem with reading market data from stooq.com using the method described in one of my previous posts.

It seems that after a number of consecutive successful downloads, stooq blocks access to single files from a given IP.

Downloading a large number of single files is probably recognized as an unwelcome activity that endangers the accessibility of the site - a kind of small DoS attack ;)

Nevertheless, there is another way to get the stooq market data without risking being blocked - you can download the complete database of stooq data or its section from the historical data archive: http://stooq.com/db/h/

One small hurdle - files in the stooq historical market data archive are zipped, so you need to deal with compression. Fortunately, handling zip files is easy in R :)

# download the complete archive (mode = "wb" keeps the zip intact on Windows)
download.file("http://s.stooq.com/db/h/d_all_txt.zip",
              "stooq data.zip", mode = "wb")

# unzip a selected file
unzip("stooq data.zip", files = "data/daily/pl/wse indices/wig20.txt")

# access data from the unzipped file
quotes <- read.csv("data/daily/pl/wse indices/wig20.txt", header = TRUE)

After that you have the needed data loaded:

> head(quotes)
      Date  Open  High   Low Close Volume OpenInt
1 19910416 100.0 100.0 100.0 100.0    325       0
2 19910423  95.7  95.7  95.7  95.7   5905       0
3 19910430  93.5  93.5  93.5  93.5   7162       0
4 19910514  92.9  92.9  92.9  92.9  18300       0
5 19910521  95.5  95.5  95.5  95.5  14750       0
6 19910528  94.6  94.6  94.6  94.6  31440       0
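
The Date column is read as an integer in yyyymmdd form (19910416 in the first row above). A minimal sketch of converting it to R's Date class, assuming the column layout shown above; the small data frame below just re-types the first rows of the printed output so the snippet runs without the archive:

```r
# First three rows of the output above, re-typed so the example is self-contained
quotes <- data.frame(
  Date  = c(19910416L, 19910423L, 19910430L),
  Close = c(100.0, 95.7, 93.5)
)

# Convert yyyymmdd integers to Date objects
quotes$Date <- as.Date(as.character(quotes$Date), format = "%Y%m%d")

str(quotes$Date)
```

With real dates in place, the quotes can be subset by period or merged with other series by date.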

When you know the name but not the exact path of the desired data file inside the zip, you can find it like this:

> zip.content <- unzip("stooq data.zip", list=TRUE)
> (idx <- grep("wig20.txt",zip.content[[1]]))
[1] 6908
> 
> as.character(zip.content[[1]])[idx]
[1] "data/daily/pl/wse indices/wig20.txt"
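
One caveat: grep() treats its pattern as a regular expression, so the dot in "wig20.txt" matches any character and the pattern can match anywhere in the path. A possible alternative, sketched below, compares the file name part of each path exactly. The helper find_in_archive and the example paths are hypothetical; in practice you would pass zip.content[[1]], assuming zip.content holds the listing returned by unzip(..., list = TRUE).

```r
# Hypothetical helper: return archive paths whose file name matches exactly
find_in_archive <- function(paths, file_name) {
  paths <- as.character(paths)
  paths[basename(paths) == file_name]
}

# Made-up listing for illustration only
paths <- c("data/daily/pl/wse indices/wig20.txt",
           "data/daily/pl/wse indices/wig.txt",
           "data/daily/example/mwig20atxt.txt")

grep("wig20.txt", paths)               # regex match also hits the third path
find_in_archive(paths, "wig20.txt")    # exact file-name match hits only the first
```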

That's it!


Thursday, November 1, 2012

How to play correlation?

On August 31st, I wrote about correlation characteristics of some currencies and stocks.

I haven't previously mentioned the connections between stock indexes.

It is worth noting that the correlation between indexes is distorted by the different hours during which the markets are open. For example, the Frankfurt Stock Exchange closes at 15:35 UTC, while the New York Stock Exchange trades until 20:00 UTC. You can partially reduce this discrepancy by using index futures, which trade for longer hours.

Nevertheless, the average n=25 (25-day window) correlation between daily changes of the S&P500 and the DAX index of the German stock exchange is around 0.25.

However, at the end of August it stood at 0.797, or very close to its historical maximum:


Fig. S&P500 - DAX n=25 correlation of daily changes, 2012-08-31

It has since retraced to a somewhat more modest, but still fairly strong, 0.59:

Fig. S&P500 - DAX n=25 correlation of daily changes, 2012-10-31

In the meantime the value of S&P500 moved from 1406.58 to 1412.16 (+0.39%), while DAX changed from 6970.79 to 7260.63 (+4.07%).
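
The DAX figure quoted above appears to be a log change; the simple percentage change comes out slightly higher. A quick check using only the index levels from the text (the function names are just illustrative):

```r
# Index levels quoted in the text
sp500 <- c(start = 1406.58, end = 1412.16)
dax   <- c(start = 6970.79, end = 7260.63)

# Simple percentage change: (end / start - 1) * 100
simple_change <- function(p) unname((p["end"] / p["start"] - 1) * 100)

# Log change: log(end / start) * 100
log_change <- function(p) unname(log(p["end"] / p["start"]) * 100)

round(c(simple = simple_change(sp500), log = log_change(sp500)), 2)
round(c(simple = simple_change(dax),   log = log_change(dax)),   2)  # 4.16 and 4.07
```

For small moves such as the S&P500's, the two measures are nearly identical; the gap only becomes visible for larger moves like the DAX's.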

Fig. S&P500 and DAX 1M, 2012-08-31, source: stooq.com

Fig. S&P500 and DAX 3M, 2012-10-31, source: stooq.com

Correlation is a mean-reverting process. But taking advantage of it is not an easy task...

Fig. S&P500 and DAX 25-day correlations
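
For reference, an n=25 correlation of daily changes like the one plotted in this post can be computed roughly as follows. This is only a sketch under stated assumptions: the two price series are already aligned on common trading dates, "daily changes" are taken as log changes, and rolling_cor is a hypothetical helper name. The prices below are synthetic, not market data.

```r
# Rolling correlation of daily log changes over a window of n observations.
# Assumes x and y are closing prices already aligned on the same dates.
rolling_cor <- function(x, y, n = 25) {
  dx <- diff(log(x))
  dy <- diff(log(y))
  stopifnot(length(dx) == length(dy), length(dx) >= n)
  sapply(n:length(dx), function(i) {
    idx <- (i - n + 1):i
    cor(dx[idx], dy[idx])
  })
}

# Synthetic prices for illustration only
set.seed(1)
a <- 100 * exp(cumsum(rnorm(60, sd = 0.01)))
b <- 100 * exp(cumsum(rnorm(60, sd = 0.01)))

rc <- rolling_cor(a, b)
length(rc)   # 60 prices give 59 changes, so 59 - 25 + 1 = 35 windows
```

Aligning the dates first matters in practice, because the US and German markets do not share all holidays.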