Friday, November 2, 2012

Downloading stooq market data in bulk

A couple of weeks ago I noticed a problem with reading market data from stooq.com using the method described in one of my previous posts.

It seems that after a number of consecutive successful downloads, stooq blocks access to single files from a given IP.

Downloading a large number of single files is probably recognized as unwelcome activity that endangers the accessibility of the site - a kind of small DoS attack ;)

Nevertheless, there is another way to get the stooq market data without risking being blocked - you can download the complete stooq database, or a section of it, from the historical data archive: http://stooq.com/db/h/

One small hurdle - files in the stooq historical market data archive are zipped, so you need to deal with compression. Fortunately, R handles zip files easily :)

# download the complete archive
# (mode="wb" writes the file in binary mode, so the zip stays intact on Windows)
download.file("http://s.stooq.com/db/h/d_all_txt.zip",
              "stooq data.zip", mode="wb")

# unzip a selected file
unzip("stooq data.zip",files="data/daily/pl/wse indices/wig20.txt")

# access data from the unzipped file
quotes <- read.csv("data/daily/pl/wse indices/wig20.txt",header=TRUE)

After that you have the needed data loaded:

> head(quotes)
      Date  Open  High   Low Close Volume OpenInt
1 19910416 100.0 100.0 100.0 100.0    325       0
2 19910423  95.7  95.7  95.7  95.7   5905       0
3 19910430  93.5  93.5  93.5  93.5   7162       0
4 19910514  92.9  92.9  92.9  92.9  18300       0
5 19910521  95.5  95.5  95.5  95.5  14750       0
6 19910528  94.6  94.6  94.6  94.6  31440       0
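
Note that the Date column comes in as a plain integer in yyyymmdd form. A minimal sketch of converting it to R's Date class, which is usually handier for subsetting and plotting, could look like the code below. The helper name parse_stooq_dates is hypothetical (it is not part of stooq or base R), and the sample data frame only mirrors the first rows of the output above.

```r
# Hypothetical helper: convert stooq's integer yyyymmdd dates
# into R Date objects, leaving the other columns untouched.
parse_stooq_dates <- function(quotes) {
  quotes$Date <- as.Date(as.character(quotes$Date), format = "%Y%m%d")
  quotes
}

# Small sample mirroring the first rows of the wig20 output above
sample_quotes <- data.frame(
  Date  = c(19910416L, 19910423L, 19910430L),
  Close = c(100.0, 95.7, 93.5)
)

parsed <- parse_stooq_dates(sample_quotes)
str(parsed$Date)
```

With the real data you would call quotes <- parse_stooq_dates(quotes) right after read.csv.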

When you know the name but not the exact path of the desired data file inside the zip, you can list the contents of the archive and search them like this:

> zip.content <- unzip("stooq data.zip",list=TRUE)
> (idx <- grep("wig20.txt",zip.content[[1]]))
[1] 6908
> 
> as.character(zip.content[[1]])[idx]
[1] "data/daily/pl/wse indices/wig20.txt"
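
One caveat: grep treats its pattern as a regular expression, so the dot matches any character, and any longer name that contains the pattern matches too. A rough sketch of a stricter lookup that compares the bare file name instead is shown below. The helper find_zip_entry is hypothetical, and the listing is made up for illustration; only the wig20 path comes from the real archive.

```r
# Hypothetical helper: return the archive entries whose file name
# (the part after the last "/") is exactly equal to file_name.
find_zip_entry <- function(entry_names, file_name) {
  entry_names <- as.character(entry_names)
  entry_names[basename(entry_names) == file_name]
}

# Made-up listing; the second and third entries are invented
# only to show what a regular expression would also pick up.
entries <- c(
  "data/daily/pl/wse indices/wig20.txt",
  "data/daily/pl/wse indices/example_wig20.txt",
  "data/daily/pl/wse indices/wig20xtxt"
)

found <- find_zip_entry(entries, "wig20.txt")
print(found)
```

With the real archive, the listing would come from unzip("stooq data.zip", list=TRUE)$Name.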

That's it!

