Friday, November 2, 2012

Downloading stooq market data in bulk

A couple of weeks ago I noticed a problem with reading market data from stooq.com using the method described in one of my previous posts.

It seems that after a number of consecutive successful downloads, stooq blocks access to single files from a given IP address.

Presumably, downloading a large number of individual files is recognized as unwelcome activity that endangers the availability of the site, a kind of small DoS attack ;)

There is, however, another way to get stooq market data without risking a block. You can download the complete stooq database, or just a section of it, from the historical data archive: http://stooq.com/db/h/

One small hurdle: the files in the stooq historical data archive are zipped, so you need to deal with compression. Fortunately, R handles zip files easily :)

# download the complete archive
# (mode = "wb" forces a binary transfer so the zip is not corrupted on Windows)
download.file("http://s.stooq.com/db/h/d_all_txt.zip",
              "stooq data.zip", mode = "wb")

# unzip a selected file
unzip("stooq data.zip", files = "data/daily/pl/wse indices/wig20.txt")

# read data from the unzipped file
quotes <- read.csv("data/daily/pl/wse indices/wig20.txt", header = TRUE)

After that, the quotes are loaded:

> head(quotes)
      Date  Open  High   Low Close Volume OpenInt
1 19910416 100.0 100.0 100.0 100.0    325       0
2 19910423  95.7  95.7  95.7  95.7   5905       0
3 19910430  93.5  93.5  93.5  93.5   7162       0
4 19910514  92.9  92.9  92.9  92.9  18300       0
5 19910521  95.5  95.5  95.5  95.5  14750       0
6 19910528  94.6  94.6  94.6  94.6  31440       0
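As the output shows, stooq stores dates as YYYYMMDD integers. For most time-series work you will probably want proper Date objects instead. Here is a minimal sketch of the conversion, assuming the column is called Date as in wig20.txt; parse_stooq_dates is a hypothetical helper name, and the sample rows are copied from the output above:

```r
# Hypothetical helper: convert stooq's YYYYMMDD integer dates (e.g. 19910416)
# into R Date objects. Coercing to integer first guarantees a plain 8-digit
# string even if the column happened to be read as double.
parse_stooq_dates <- function(d) {
  as.Date(as.character(as.integer(d)), format = "%Y%m%d")
}

# A few rows copied from the head(quotes) output above
quotes <- data.frame(Date  = c(19910416L, 19910423L, 19910430L),
                     Close = c(100.0, 95.7, 93.5))
quotes$Date <- parse_stooq_dates(quotes$Date)
print(quotes)
```

With the real file you would simply call quotes$Date <- parse_stooq_dates(quotes$Date) right after read.csv().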

When you know the name but not the exact path of the desired data file inside the zip, you can list the archive contents and search them like this:

> zip.content <- unzip("stooq data.zip", list=TRUE)
> (idx <- grep("wig20.txt",zip.content[[1]]))
[1] 6908
> 
> as.character(zip.content[[1]])[idx]
[1] "data/daily/pl/wse indices/wig20.txt"
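If you fetch several instruments this way, it may be handier to wrap these steps in small functions. The sketch below rests on a few assumptions of my own: the helper names download_once, find_zip_entry and read_stooq_from_zip are hypothetical; the file name is compared exactly against the base name of each archive entry (grep() treats the dot in "wig20.txt" as a regex wildcard and also matches longer names that merely contain it); and the data is read through an unz() connection instead of being extracted to disk:

```r
# Hypothetical helper: download the archive only if it is not already on disk,
# so repeated runs do not fetch the large file again.
download_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary transfer for zip files
  }
  invisible(destfile)
}

# Hypothetical helper: return the archive path(s) whose base name is exactly
# `name`. as.character() handles older R versions that return factors.
find_zip_entry <- function(entries, name) {
  entries <- as.character(entries)
  entries[basename(entries) == name]
}

# Hypothetical helper: read one quote file straight from the zip via unz(),
# without unpacking it to disk.
read_stooq_from_zip <- function(zipfile, name) {
  listing <- unzip(zipfile, list = TRUE)  # data frame with Name, Length, Date
  path <- find_zip_entry(listing$Name, name)
  if (length(path) != 1) {
    stop("expected exactly one match for ", name, ", found ", length(path))
  }
  read.csv(unz(zipfile, path), header = TRUE)
}

# Example (needs network access and the full archive, so left commented):
# zipfile <- download_once("http://s.stooq.com/db/h/d_all_txt.zip", "stooq data.zip")
# quotes  <- read_stooq_from_zip(zipfile, "wig20.txt")
```

Stopping on zero or several matches is a deliberate choice: silently picking the first hit could load the wrong instrument without any warning.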

That's it!
