Wednesday, August 1, 2012

Random in, random out or how NOT to use artificial neural networks for prediction

In May last year I experimented a little with Google's machine learning tool - the Google Prediction API.

One experiment - a stupid one, I must admit - was to try to make the Google Prediction API forecast stock prices.

Quite unsurprisingly, all I received from the Google Prediction API was a "flat line" at the level of the average stock price - meaning that Google's machine learning engine was unable to estimate how the price would change.

Stock price returns/changes follow a random path whose distribution is concentrated around zero, with fat and long tails (see: "Comparing probability distributions" and "Characterizing financial assets by Power Law alpha exponent").

Chart: Density of FW20(*) price changes
(*) FW20 - futures contract based on WIG20 index of the Warsaw Stock Exchange
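
To make "fat tails" a bit more concrete, here is a minimal sketch in Python (not the R used for this post). It does not use FW20 data: both samples are simulated, and the helper names excess_kurtosis and student_t are made up for this illustration. It compares the excess kurtosis of a normal sample with that of a Student-t sample with 3 degrees of freedom, a common stand-in for a fat-tailed distribution.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: roughly 0 for normal data, larger for fat tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0

def student_t(rng, df):
    """Draw one Student-t variate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / (chi2 / df) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20000

# Simulated "returns": these are NOT the FW20 price changes from the chart.
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
fat_tailed_sample = [student_t(rng, 3) for _ in range(n)]

normal_kurt = excess_kurtosis(normal_sample)
fat_kurt = excess_kurtosis(fat_tailed_sample)

print("excess kurtosis, normal sample:     ", round(normal_kurt, 3))
print("excess kurtosis, fat-tailed sample: ", round(fat_kurt, 3))
```

The normal sample should come out close to zero, while the fat-tailed one should be clearly positive, because a handful of extreme values dominates the fourth moment.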

There is a lot of noise that makes forecasting difficult.

I recently performed another experiment that confirms this observation. I tried to use artificial neural networks to find predictive patterns in a series of raw returns.

To simplify the task, I tried to predict only the direction of the changes.
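
As a rough illustration of what "direction of the changes" means, here is a small Python sketch (not the original R code). The function names change_directions and hit_ratio and the price values are invented for this example; the actual experiment may have encoded directions differently.

```python
def change_directions(prices):
    """Return +1 (up), -1 (down) or 0 (flat) for each consecutive price change."""
    directions = []
    for prev, curr in zip(prices, prices[1:]):
        change = curr - prev
        directions.append((change > 0) - (change < 0))
    return directions

def hit_ratio(actual, predicted):
    """Fraction of positions where the predicted direction equals the actual one."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

# Made-up prices, for illustration only.
prices = [100.0, 101.0, 100.5, 100.5, 102.0]
actual = change_directions(prices)

# A made-up forecast of the same four changes.
predicted = [1, 1, 0, -1]
ratio = hit_ratio(actual, predicted)

print(actual)
print(ratio)
```

The hit ratio computed this way is the kind of quantity summarized later in the post: the share of changes whose direction was forecast correctly.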

In some cases, the proposed prediction model was able to correctly forecast between 70% and 100% of the changes. Pretty good, isn't it?

Chart: Actual (black) vs predicted changes (red)
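
Before getting excited, it helps to ask how often pure coin flipping would look this good. The sketch below (Python, not the original R) assumes 7 predictions per test; that number is my inference from the summary output further down, where every ratio is a multiple of 1/7. With 7 predictions, "70% or better" means at least 5 correct. The helper prob_at_least is a made-up name.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption: 7 direction forecasts per test, each a fair coin flip.
p_lucky = prob_at_least(5, 7)

print(p_lucky)
```

Under these assumptions a random guesser scores 5/7 or better in 29 out of 128 tests, i.e. in more than a fifth of them, so a few impressive-looking runs are exactly what chance alone would produce.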

Unfortunately, when you increase the number of tests, the performance rapidly deteriorates.

Ultimately, the share of correctly predicted directions settles at about 0.5 - i.e. it is indistinguishable from random guesses:

> summary(random.fit)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.4285714 0.4285714 0.4957143 0.5714286 1.0000000 

> summary(fit.ratio)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.4285714 0.5714286 0.5085714 0.5714286 0.8571429

Chart: Change direction match - ANN vs random model
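
The shape of the random.fit summary can be reproduced with a tiny simulation. This is a Python sketch rather than the R code behind the numbers above, and it rests on assumptions: 7 predictions per test (inferred from the ratios being multiples of 1/7), a fair coin for each guess, and an arbitrary seed. The function name random_hit_ratios is made up.

```python
import random
from statistics import mean

def random_hit_ratios(n_tests, n_predictions=7, seed=1):
    """Hit ratio of a coin-flipping 'model' in each of n_tests independent tests."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_tests):
        # Each guess matches the actual direction with probability 0.5.
        hits = sum(rng.random() < 0.5 for _ in range(n_predictions))
        ratios.append(hits / n_predictions)
    return ratios

few = random_hit_ratios(5)       # a handful of tests: results can look good or bad
many = random_hit_ratios(10000)  # many tests: the average approaches 0.5

print("few tests: ", [round(r, 3) for r in few])
print("many tests:", "min", min(many), "mean", round(mean(many), 4), "max", max(many))
```

With only a few tests the individual ratios jump around, while over many tests the mean sits near 0.5 even though single tests still range from 0 to 1, which is the same pattern as in both summaries above.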

This does not necessarily mean that neural networks are worthless for modeling financial markets.

But the approach described here is incorrect.

One has to pre-process financial market data before feeding it into models such as ANNs.

[ R code ]

