Sunday, December 15, 2013

Somers' D - two dirty implementations: R vs F#

A couple of days ago I started playing with F#.

Although I'm VERY far from being a skillful F# programmer, I am slowly moving forward.

Not long ago I implemented one of the measures of association in R - Somers' D.

Somers' D is sometimes used for testing concordance of internal and external risk scores / rankings.

I had some problems with finding the right formula for Somers' D Asymptotic Standard Error, and when I finally found the solution I didn't have much energy to clean up my R code ;)

I thought that using this dirty code as a base for a Somers' D implementation in F# might bring interesting results. My intention was not to give R too large a head start over F#.

Still, the differences are visible in many places...

First of all, I was pretty surprised that basic matrix operations are not available in the core F# version.

It is necessary to add F# PowerPack to work with matrices.

Even then, working with matrices in F# does not seem as natural as in R (or Matlab). Or, probably, I still know too little about F#.

A couple of examples:

constructing matrix in R:

    PD    <- c(0.05,0.10,0.50,1,2,5,25)/100
    total <- c(5,10,20,25,20,15,5)/100

    defaulted    <- total*PD
    nondefaulted <- total*(1-PD)

    n <- sum(total)

    portfolio <- rbind(defaulted,nondefaulted)/n

constructing matrix in F#:

    #r "FSharp.PowerPack.dll"

    let PD             = [ 0.05; 0.10; 0.50; 1.00; 2.00; 5.00; 25.00 ]
    let counterparties = [ 5.; 10.; 20.; 25.; 20.; 15.; 5. ]

    let groups = PD.Length // number of risk groups

    let div100 x = x / 100.0

    let PDprct = [ for x in PD do yield div100 x ]
    let CPprct = [ for x in counterparties do yield div100 x ]

    let n = CPprct |> Seq.sum

    let defaulted    = [ for i in 1..groups do yield CPprct.[i-1]*PDprct.[i-1]/n ]
    let nondefaulted = [ for i in 1..groups do yield CPprct.[i-1]*(1.0-PDprct.[i-1])/n ]

    let x = matrix [ defaulted; nondefaulted ]

calculating WR/DR in R:

    wr <- n^2-sum(sapply(1:nrow(x), function(i) sum(x[i,])^2))

calculating WR/DR in F#:

    let xr = x.NumRows

    let rowSum (x : matrix) (i : int) = Array.sum (RowVector.toArray (x.Row(i-1)))

    // WR / DR
    let wr =
        let mutable mat_sum = 0.0
        for i in 1..xr do
            let row2  = rowSum x i ** 2.0
            mat_sum   <- mat_sum + row2
        n ** 2.0 - mat_sum

Later it gets a little better, but...

'A' function in R:

    A <- function(x,i,j) {
      xr <- nrow(x)
      xc <- ncol(x)
      sum(x[1:xr>i,1:xc>j])+sum(x[1:xr<i,1:xc<j])
    }

'A' function in F#:

    let A (x : matrix) i j =
        let xr = x.NumRows
        let xc = x.NumCols

        let rowIdx1 = List.filter (fun x -> x>i) [ 1..xr ]
        let colIdx1 = List.filter (fun x -> x>j) [ 1..xc ]
        let rowIdx2 = List.filter (fun x -> x<i) [ 1..xr ]
        let colIdx2 = List.filter (fun x -> x<j) [ 1..xc ]

        let mutable Asum = 0.0

        for r_i in rowIdx1 do
            for r_j in colIdx1 do
                Asum <- Asum + x.[r_i-1,r_j-1]

        for r_i in rowIdx2 do
            for r_j in colIdx2 do
                Asum <- Asum + x.[r_i-1,r_j-1]

        Asum
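For reference, the pieces above can be combined into a complete point estimate. What follows is a minimal R sketch, not the linked "dirty" code. It assumes the textbook definition D(C|R) = (P - Q) / wr, where P is the sum of x[i,j]*A(x,i,j) over all cells and Q is the same sum with discordant cells. The names Dis and somers_d are introduced here for illustration only, and the asymptotic standard error is left out.

```r
# Sum of cells concordant with cell (i, j):
# strictly below-right plus strictly above-left
A <- function(x, i, j) {
  r <- seq_len(nrow(x))
  k <- seq_len(ncol(x))
  sum(x[r > i, k > j]) + sum(x[r < i, k < j])
}

# Sum of cells discordant with cell (i, j):
# strictly below-left plus strictly above-right (hypothetical helper)
Dis <- function(x, i, j) {
  r <- seq_len(nrow(x))
  k <- seq_len(ncol(x))
  sum(x[r > i, k < j]) + sum(x[r < i, k > j])
}

# Somers' D with the column variable treated as dependent on the row variable
somers_d <- function(x) {
  cells <- expand.grid(i = seq_len(nrow(x)), j = seq_len(ncol(x)))
  P <- sum(mapply(function(i, j) x[i, j] * A(x, i, j),   cells$i, cells$j))
  Q <- sum(mapply(function(i, j) x[i, j] * Dis(x, i, j), cells$i, cells$j))
  wr <- sum(x)^2 - sum(rowSums(x)^2)
  (P - Q) / wr
}

# Small check on a 2x2 table, where D equals the difference in row proportions
tab <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE)
somers_d(tab)  # 0.5
```

Using logical index vectors keeps A and Dis safe at the table edges: an empty selection simply sums to zero.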

As I've mentioned at the beginning of the post - both pieces of code are "dirty". Also, I definitely know R better than F# (even if it may not be apparent from the R code above ;)

Still, F# seems to require more coding and many "simple" operations (matrices...) may not be so easy in F# in comparison to R.

I have yet to find where F# excels :)

[ dirty R code ]

[ dirty F# code ]

Wednesday, December 11, 2013

F# - even the longest journey begins with a single step. And there may be bumps along the way

Don't get me wrong. I have just started familiarizing myself with F# - a fairly new functional programming language developed with heavy involvement of Microsoft.

My intention has been to examine whether F# can be used for the various tasks I usually perform with R.

As for now, F# looks pretty strange.

It is different in many ways from standard programming languages like C/C++. It is also different from R.

Learning it seems like solving a series of logic puzzles, at this stage.

My (very early) F# code is definitely not optimal, but it may give a hint of what may come later.

Take for example a simple function for calculating return on investment in a bond, used in my previous post.

In R, the function looks like this:

    # expected (discounted) return
    pv <- function(fa,n,cr,rf) {
      -fa+sum(sapply(1:n, function(i) (fa*cr)/(1+rf)^i))+fa/(1+rf)^n
    }

You can see the code in context here:

Meanwhile, my F# equivalent is:

At least both functions return the same result :)

The nice thing about F# is that, although Microsoft did not include it in the free Visual Studio Express 2013, there is an online version of F# available. You can write and test your F# code there.

OK, why may F# look strange? Just a couple of observations:
  • calculating power for floats and integers is handled differently - pown for integers and ** for floats
  • once a function is used with one type of argument - say int - you cannot use it again with any other type - say float
  • separate operations for adding a single element at the beginning of a list (::) and for joining the lists (@)
  • some symbol combinations (example: !!!), while it is possible to define the operations they perform, cannot be used between arguments, i.e. !!! 2 3 is fine, while 2 !!! 3 is not
I would like to stress again that I am at the very beginning of my journey with F#.

The peculiarities of F# have not discouraged me so far. I'd say it is quite the opposite. They have increased my hunger for learning more about this bizarre creature ;)

Tuesday, December 10, 2013

When interest rates go to infinity

CAUTION: It is my first post about corporate debt so excessive simplifications and mistakes are highly probable. Comments welcome.


The standard formula for calculating the return on a bond with a 3-year maturity and an annual coupon of 5% tells us that we should expect a discounted return of around 13.6%, given the extremely low current "risk free" interest rate.

> pv(fa=100,n=3,cr=0.05,rf)
[1] 13.59618
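As a sanity check, here is a minimal sketch of the calculation behind this number. It restates the pv function with a vector of years instead of sapply, and assumes rf = 0.00429, i.e. the 0.429% risk free rate quoted later in this post.

```r
# Discounted return on a bond bought at face value:
# -fa (purchase) + n discounted coupons + discounted repayment of face value
pv <- function(fa, n, cr, rf) {
  years <- seq_len(n)
  -fa + sum(fa * cr / (1 + rf)^years) + fa / (1 + rf)^n
}

rf <- 0.00429                       # assumed: the 0.429% risk free rate
pv(fa = 100, n = 3, cr = 0.05, rf)  # about 13.596
```

With a face value of 100 the result reads directly as a percentage of the amount invested.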

Do you think it is adequate for the risk we are taking?

Actually it depends :)

Fig.: Probability of Default vs. Interest Rate curve

If we are unlucky and our bond defaults, we may actually lose approx. between 46% and 60% of our investment (assuming RR=37% and RT=1Y, see below).

> de(fa=100,di=0,cr=0.05,rv=0.4,rf,rl=1) # default in year one; no coupon payments
[1] -60.17087
> de(fa=100,di=1,cr=0.05,rv=0.4,rf,rl=1) # default after first coupon payment
[1] -55.36236
> de(fa=100,di=2,cr=0.05,rv=0.4,rf,rl=1) # default after second coupon payment
[1] -50.5744
> de(fa=100,di=3,cr=0.05,rv=0.4,rf,rl=1) # default after third coupon payment
[1] -45.80689
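The de function itself is not shown in this post. The sketch below is a guess at its logic that is consistent with the four outputs above, under these assumptions: di is the number of coupons received before default, rv is the recovered fraction of face value, rl is the number of years between default and recovery, and rf = 0.00429. The name de_sketch is hypothetical, and the linked R code may be written differently.

```r
# Discounted return when the bond defaults after `di` coupon payments:
# -fa (purchase) + discounted coupons actually received
#                + discounted recovery paid `rl` years after default
de_sketch <- function(fa, di, cr, rv, rf, rl) {
  coupons  <- sum(fa * cr / (1 + rf)^seq_len(di))  # empty sum (0) when di = 0
  recovery <- fa * rv / (1 + rf)^(di + rl)
  -fa + coupons + recovery
}

rf <- 0.00429  # assumed: the 0.429% risk free rate
sapply(0:3, function(di) de_sketch(fa = 100, di = di, cr = 0.05, rv = 0.4, rf, rl = 1))
# about -60.17 -55.36 -50.57 -45.81
```

Note that the calls above use rv = 0.4, while the text assumes a Recovery Rate of 37%.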

Three critical factors here are Probability of Default (PD), Recovery Rate (RR) and Resolution Time (RT).

The first tells us how likely we are to lose all or part of our initial investment.

The second - what part of the investment we could get back.

The third - when we can expect some of our money back after the default.

Averages may be misleading here. The default rate for speculative bonds surpassed 11% in the period. In addition, the intensity of defaults varies between geographies and industries.

According to Moody's, Resolution Time can take between 6 months and more than 3 years.

Let's focus on the Probability of Default - i.e. freeze all the other parameters: bond maturity = 3 years, Recovery Rate = 37%, Resolution Time = 1 year, and risk free (RF) interest rate = 0.429%.

The 5% annual coupon on our bond implies a Probability of Default of around 5.5%.

The estimation method used means that if we had a large portfolio of identical bonds with an equal and constant PD of 5.5% and an annual coupon of 5%, we would finish our investment with a (discounted) return of zero - i.e. we have treated our coupon as a zero profit interest rate.

A PD of 5.5% is clearly above the average historical default rate as recorded by Standard & Poor's. Hence if we believe the actual PD will be lower, say 2%, we will make a profit. The zero profit interest rate at a PD of 2% is 2.2%, so the difference (spread) between our coupon and the risk level is 2.8 pp.

The table below shows the relation between PD and zero profit interest rates required for PDs between 0% and 10%:

          PD    IR
  [1,]   0.00  0.005
  [2,]   0.01  0.013
  [3,]   0.02  0.022
  [4,]   0.03  0.031
  [5,]   0.04  0.039
  [6,]   0.05  0.048
  [7,]   0.06  0.057
  [8,]   0.07  0.066
  [9,]   0.08  0.075
 [10,]   0.09  0.085
 [11,]   0.10  0.094

Reminder: debt maturity, RR, RT and RF are still frozen

Clearly, when the default rate increases, we should ask for a higher interest rate. However, as the chart at the beginning shows, the situation starts to be pretty hilarious after reaching some PD level. Around a PD of 60%, we need to ask for 100% interest. And further on, the required zero profit interest rate goes to infinity...
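The relation between PD and the zero profit interest rate can be sketched in closed form, because the expected discounted return is linear in the coupon rate. The model below is my reading of the numbers in this post rather than the linked code, and it rests on these assumptions: the bond can default in one of four slots (before the first coupon, or after the first, second or third coupon), each with probability PD conditional on having survived so far; recovery of RR is paid RT years after default; face value is repaid only if the bond survives all four slots. The function name zero_profit_rate is hypothetical.

```r
# Coupon rate at which the expected discounted return is zero
zero_profit_rate <- function(pd, n = 3, rr = 0.37, rt = 1, rf = 0.00429) {
  v  <- 1 / (1 + rf)                      # one-year discount factor
  di <- 0:n                               # coupons received before default
  p_default <- pd * (1 - pd)^di           # probability of default in each slot
  p_survive <- (1 - pd)^(n + 1)           # probability of no default at all
  annuity   <- c(0, cumsum(v^(1:n)))      # PV of 1 paid at the end of years 1..di

  # Expected PV per unit of face value, ignoring coupons
  fixed  <- -1 + sum(p_default * rr * v^(di + rt)) + p_survive * v^n
  # Expected PV of coupons for a coupon rate of 1
  per_cr <- sum(p_default * annuity) + p_survive * annuity[n + 1]

  -fixed / per_cr
}

pd <- seq(0, 0.10, by = 0.01)
cbind(PD = pd, IR = ceiling(sapply(pd, zero_profit_rate) * 1000) / 1000)
```

Rounded up to the next 0.001, this sketch gives 0.005, 0.013, 0.022, 0.048 and 0.094 for PDs of 0%, 1%, 2%, 5% and 10%, in line with the table above. At a PD of 60% it returns roughly 0.99, and at a PD of 100% the coupon term vanishes and the result is Inf.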

[ R code used ]

Friday, November 29, 2013

Encrypting files with AES in R

The recent news about the widespread NSA electronic spying renewed people's interest in securing their data.

R seems an unlikely tool for data encryption, but actually can be used for this purpose.

The digest package provides implementations of two crucial cryptographic algorithms: SHA-256 and AES.

By joining these two algorithms, it is possible to write rudimentary code performing encryption/decryption of virtually any file.

The code first creates a 256-bit hash from an arbitrary key-phrase, using SHA-256. This hash key is used as a key to encrypt/decrypt a file using AES-256.

An additional feature of the code is in-memory compression of the data before encryption.

Also, since AES-256 requires 16-byte blocks of data, some random information may be glued to the file before encryption.

Definitely this is not the best-in-class implementation of the symmetric encryption, but for many purposes it may be good enough.

A small warning: in-memory decompression may return an error when one tries to decrypt a file with incorrect key-phrase.
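The steps described above can be sketched as follows. This is a minimal illustration working on raw vectors, not the original code. Several details are my own assumptions: CBC mode with a random IV stored in front of the ciphertext, a 4-byte length header so the random padding can be stripped again, and the function names encrypt_raw and decrypt_raw. It requires the digest package.

```r
library(digest)

encrypt_raw <- function(data, passphrase) {
  # 256-bit key derived from the key-phrase with SHA-256
  key  <- digest(passphrase, algo = "sha256", serialize = FALSE, raw = TRUE)
  z    <- memCompress(data, type = "gzip")           # in-memory compression
  len  <- writeBin(length(z), raw(), size = 4, endian = "big")
  body <- c(len, z)
  # Pad with random bytes up to a multiple of the 16-byte AES block
  pad  <- (16 - length(body) %% 16) %% 16
  body <- c(body, as.raw(sample(0:255, pad, replace = TRUE)))
  # NOTE: sample() is not a cryptographically secure random source
  iv   <- as.raw(sample(0:255, 16, replace = TRUE))
  c(iv, AES(key, mode = "CBC", IV = iv)$encrypt(body))
}

decrypt_raw <- function(blob, passphrase) {
  key  <- digest(passphrase, algo = "sha256", serialize = FALSE, raw = TRUE)
  iv   <- blob[1:16]
  body <- AES(key, mode = "CBC", IV = iv)$decrypt(blob[-(1:16)], raw = TRUE)
  len  <- readBin(body[1:4], what = "integer", size = 4, endian = "big")
  memDecompress(body[5:(4 + len)], type = "gzip")    # drops the padding
}

msg  <- charToRaw("attack at dawn, or maybe a little later")
blob <- encrypt_raw(msg, "my key-phrase")
rawToChar(decrypt_raw(blob, "my key-phrase"))
```

For a file, readBin(path, "raw", file.size(path)) and writeBin(blob, path) would supply and store the raw vectors. With a wrong key-phrase the length header and the compressed stream decrypt to garbage, which is why the decompression step is where an error typically shows up.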

Thursday, September 26, 2013

Elementary Expectation Maximization algorithm with R code

I have recently found a short article describing the Expectation Maximization algorithm.

The Expectation Maximization algorithm enables parameter estimation in probabilistic models with incomplete data.

It is closely related to Maximum Likelihood Estimation (MLE), but while MLE requires complete data, Expectation Maximization works with some latent (unobserved) variables.

The values of the parameters being estimated with the Expectation Maximization algorithm converge to the values calculated with MLE.

For example, for a model with 10 (unfair) coin selections and 100 tosses per selection we may get:

> heads
 [1] 11 14 72 66 63 73 65  9 65 68
> coin
 [1] 1 1 2 2 2 2 2 1 2 2

> # actual probabilities
> head.prob1.init
[1] 0.124
> head.prob2.init
[1] 0.704

> # MLE estimates (we know which coin was selected in every turn)
> head.prob1.est
[1] 0.113
> head.prob2.est
[1] 0.674

> # Expectation Maximization estimates (coin selection is unknown)
> prob1.maxexp
[1] 0.113
> prob2.maxexp
[1] 0.674

The above-mentioned article explains the basics of the Expectation Maximization algorithm well enough, so there is no need to repeat its content here.

What the article is missing is code showing an actual implementation of a simple version of this algorithm.

Hence I have written a short piece of R code demonstrating what exactly is going on here.
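A minimal sketch of such an implementation for the two-coin example is given below. It is not the original code. It assumes that both coins are equally likely to be selected, uses the heads counts printed above, and starts from arbitrary initial guesses of 0.3 and 0.6; the function name em_two_coins is hypothetical.

```r
# EM for two biased coins when we do not know which coin produced each series
em_two_coins <- function(heads, tosses, theta = c(0.3, 0.6), iters = 50) {
  for (k in seq_len(iters)) {
    # E-step: probability that each series came from coin 1, given current theta
    l1 <- dbinom(heads, tosses, theta[1])
    l2 <- dbinom(heads, tosses, theta[2])
    w1 <- l1 / (l1 + l2)
    # M-step: weighted share of heads attributed to each coin
    theta <- c(sum(w1 * heads)       / sum(w1 * tosses),
               sum((1 - w1) * heads) / sum((1 - w1) * tosses))
  }
  theta
}

heads <- c(11, 14, 72, 66, 63, 73, 65, 9, 65, 68)
round(em_two_coins(heads, tosses = 100), 3)  # 0.113 0.674
```

Because the two coins are so different, the E-step assigns each series almost entirely to one coin, so the result agrees with the MLE estimates computed from the known coin labels.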

Enjoy! :)


Two other earlier introductory posts:

For Dummies: Intro to Generalized Method of Moments

Cointegrated drunk and her dog visualized

Friday, July 5, 2013

Trust no one?

I'd written quite extensively about the dark side of the Polish discretionary investment funds (the term in vogue is absolute return, now), but I had some hope in the emerging class of Polish quantitative funds.

So far they have been rather disappointing, which frankly I much regret :(

The largest of them, UniSystem 1, managed by Union Investment, is down 8.5% YTD.


It has been falling systematically since the opening of the fund to the public in late 2012.

Unfortunately, it doesn't quite look like the simulated results presented in the fund's marketing materials:

One may say that the fund is just following the trend:

Source: stooq
Note: fund certificates traded on open market are highly discounted (by some 18%)

Indeed, UniSystem 1 seems highly correlated with WIG20, the main index of the Warsaw Stock Exchange.

It is a little strange for a fund marketed as a market-neutral vehicle, aimed at various markets around the world, and willing to benefit from geographical diversification.

There may be something wrong with the investment models the fund employs.

The poor performance of UniSystem 1 has probably been the reason for cancellation of the new issue of the fund's certificates.

The disappointing fund performance is quite sad, especially taking into account that SuperFund, heavily beaten in recent years, seems to be finally catching its breath this year. Even if it is still behind the S&P500:

Source: SuperFund

Nevertheless, I still hope UniSystem 1 can turn around, or at least stabilize. The case of Provide Able 2 Trend may be a faint hint of such a possibility:


Thursday, July 4, 2013

Creating some real value behind digital currencies

While algorithmic trading is moving into the cloud, the cloud is moving into the stock exchange :)

Deutsche Boerse is going to start Cloud Exchange, where it will be possible to trade processing power and storage.

It may be interesting to know what entities will be buying computing resources, and if the demand for the resources increases in volatile or in calm times on the markets.

As I wrote a little while ago, it may be more difficult to generate higher investment returns in times of low market volatility.

Also, the computation exchange may give some more tangible value to Bitcoin and other crypto (or rather digital) currencies, which depend on spending processing power on solving difficult mathematical problems. It may even create some possibilities for arbitrage between the price of processing power and the price of the crypto currencies.

Still, I dislike the idea of spending processor cycles and hence burning electricity just for solving some artificial riddles, instead of doing some beneficial computing-intensive calculations, like Folding@Home.

Maybe the introduction of Cloud Exchange will be the first step toward creating some more solid foundations for digital currencies.

Saturday, June 29, 2013

Creatively using statistics to your advantage

As I posted yesterday, the Polish government is trying to shake up the social security system to save the budget. Largely in a Hungarian way.

I would like to stress that I am not completely happy about the way the Polish private pension system operates. Much can and should be changed.

Nevertheless, the currently proposed "reform" is mostly a cash grab.

The government claims that the private pension funds are expensive because they collected close to 17 billion PLN (*) in fees.

First, these fees were sanctioned by the Polish government.

Second, it is worth comparing these 17 billion PLN (*) to the 43.7 billion PLN cost of the public social security system - ZUS - over the same period.

If you compare the costs of the public social security system to the assets of the private funds, you may be surprised that the former constitutes as much as 17% of the latter.

Hence, one can "infer" that it is possible to save some 45 billion PLN over 10 years by liquidating ZUS ;)

I agree that these comparisons and conclusions are a little bit "too creative", but similarly inventive are some of the juxtapositions and proposals presented by the Polish government.

Not that the government is the first or the only entity using statistics creatively...

(*) maybe my spreadsheet software has some bugs, but when it sums up the OFE fees presented by the government, I get a little more than 16 billion PLN, not 17 billion PLN; but what is a billion among friends...

Friday, June 28, 2013

Social security systems are dying

The Polish government recently presented plans to "encourage" people to move completely away from the private pension funds (so-called OFE), introduced in 1999, to the state social security system, ZUS.

I would definitely agree that OFE are not a perfect retirement saving solution. They are too expensive (their management fees are too high) and they are seriously limited in the ways they can invest their assets - mostly in government bonds. In connection with the government-sanctioned benchmarking system, they chase the same investment strategies, which is clearly visible in the very high correlation between the funds.

Nevertheless, their overall performance over the 14 years they have operated is not so bad. As of late June 2013, their average annualized return is around 9%.

What's more, they have clearly beaten the main Warsaw Stock Exchange index - WIG20:

Source: stooq

Yes, the private part of the Polish retirement system requires some changes. But the current government proposal is simply bad.

It was designed as a way to prevent the budget from exploding due to the rapidly rising deficit. It will most probably give the Finance Minister some relief.

But it will not solve the problems in the slowly dying social security system.

Demographics, employment and migration trends will surely destroy the current system.

Poland is not the only country facing such challenges. Take a look at Japan...

Saturday, May 18, 2013

HFT invited to a land of new opportunities

The markets are broken.

I thought nothing would beat the epic fall of Accenture in May 2010. From $40 to $0.01.

But we got Anadarko yesterday... $90 to $0.01 in a mere 45 milliseconds!

Source: Nanex

The fall in the Anadarko price probably affected the value of the S&P500 index by 1 point.

I wonder whether anyone either sold some ES futures just before manipulating the Anadarko price, or bought them just after the mini flash crash?

A single point is not much, but may be tempting enough.

This reminds me: the Warsaw Stock Exchange (WSE) has recently changed its equity futures settlement price rules. While the settlement price of the index futures is based on the average of the index over some time, the final settlement price of equity futures depends on the last price of the underlying stock.

Since WSE has been running on the NYSE's UTP for a month now, new HFT players shouldn't have any problems with adapting and making the prices right ;)


Friday, May 17, 2013

Mission accomplished: S&P500 = 1666.66

Source: Google Finance

It was 666 in March 2009:

Source: Stooq

That makes 1000 points, or 150% in 4 years.

But to open a short position, I need much more than faint signals.

I'm sure it will come - one way or another...

Thursday, May 16, 2013

I will come back to that later

Some time ago I wrote about generating a safe investment strategy by combining many weakly correlated investment models.

Recently I've been working on a family of models based on one common concept - short term return prediction.

It seems it may actually work :)

I have run a number of tests and so far the results are encouraging:

The above chart presents strategy returns over 250 sessions between 2011-11-23 and 2013-04-12.
(some sessions are missing for lack of common data for some days)

Lines of different colors represent sub-models used. The thick purple line represents the combined model.

The correlation between some sub-models is pretty high - that may be used for reduction of the number of sub-models used.

> cor(test.arr[,,"return"])
             M1        M2        M3        M4        M5      mean
M1    1.0000000 0.3565054 0.2471695 0.3712636 0.1614239 0.4081393
M2    0.3565054 1.0000000 0.1567697 0.1935882 0.1805843 0.2852754
M3    0.2471695 0.1567697 1.0000000 0.5065645 0.4060609 0.5138838
M4    0.3712636 0.1935882 0.5065645 1.0000000 0.3663719 0.6328662
M5    0.1614239 0.1805843 0.4060609 0.3663719 1.0000000 0.5846763
mean  0.4081393 0.2852754 0.5138838 0.6328662 0.5846763 1.0000000

The returns of all sub-models were positive over the test period:

> stats
          M1      M2      M3      M4      M5      mean             
total     0.7488  0.5927  1.1036  0.5844  1.0368  1.0415
mean      0.0029  0.0023  0.0044  0.0023  0.0041  0.0041
drawdown -0.1331 -0.1181 -0.1674 -0.1461 -0.1563 -0.1804

The combined model was positive for all test periods and assets used for tests so far.

The individual returns are low, but it seems it is possible to safely leverage the strategy 2-3x. I also haven't yet completed the analysis of the frequency of portfolio re-balancing.

As I've mentioned in the title of this post, I'm going to return to this strategy and probably its sibling a little bit later...

Friday, May 10, 2013

Illiquidity trap

Another difficult year for some investment funds

Some Polish absolute return investment funds are facing a tough year - again.

Investor FIZ is down 12.96% and Opera FIZ fell 12.43% YTD.

This may not sound too bad to recent investors in some of John Paulson's funds.

However, if you take a longer perspective, the numbers start to look much uglier.

Investor FIZ is down 68.71% from its peak in October 2007, while Opera FIZ has decreased by 80.45% since June 2007.

I understand that substantial drawdowns may occasionally happen to even the best managers.

They are often connected either with elevated market volatility or slow market convergence process.

To demonstrate the latter case, let's take a look at a potential pair trade involving BRENT and WTI crudes:

Source: Stooq

The fundamental price difference between them is mostly connected with transportation costs. Hence, the spread should be pretty stable and one should be able to quite easily arbitrage the temporary premium or discount in the price of one of these commodities.

However, as you can see above, some other factors can "temporarily" push the spread to the extremes.

If one used historical price data to model the possible spread distribution and opened opposite positions (long WTI, short BRENT) based on such a model in mid 2011, the widening spread could have either caused significant drawdown or forced the over-leveraged investor to close the trade.

But it was always possible to close such a trade! And it narrowed eventually, enough to produce a profit for the patient investor.

Unfortunately, there is very little hope for Opera FIZ and Investor FIZ investors...

The value of Opera FIZ certificates has been flat for a couple of years. And we should expect the same from Investor FIZ... :(

The reason is quite simple.

A significant part of these funds' assets is trapped in illiquid investments.

In the Investor FIZ portfolio one may find such "gems" as Tamex:

Source: Stooq

Tamex is listed on the NewConnect alternative market, and sometimes its shares are not traded at all for dozens of days.

Another "interesting" case is Indian Spice Mobile:

According to the latest Investor FIZ financial statement, the fund has Spice Mobile shares worth over PLN 6.3 million. Meanwhile the company's 30 day average volume is... PLN 6,205. The maximum daily turnover over the recent year was PLN 456k.

It is hardly possible to liquidate the position Investor FIZ has!
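A back-of-the-envelope calculation makes the point concrete. The sketch below uses the two figures quoted above (a position of PLN 6.3 million and an average daily turnover of PLN 6,205); the function name days_to_liquidate and the 25% participation cap are illustrative assumptions, and price impact is ignored.

```r
# Trading sessions needed to sell a position when our sales are capped
# at a given share of the average daily turnover
days_to_liquidate <- function(position, avg_daily_turnover, participation = 1) {
  position / (avg_daily_turnover * participation)
}

days_to_liquidate(6.3e6, 6205)        # about 1015 sessions, roughly four years
days_to_liquidate(6.3e6, 6205, 0.25)  # about 4061 sessions with a 25% cap
```

Even if the fund were the only seller and absorbed the entire average daily turnover, unwinding the position would take around four years of trading.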

Similarly, Opera FIZ is stuck with illiquid and unprofitable/indebted private equity holdings like Termisil and SkyCash. It is hard to predict whether these companies will become successful or be sold at a profit. What is clear, however, is that they lock up a significant part of the fund's capital, or even siphon it off, and make its return to growth extremely challenging.

Unfortunately, there seem to be more skeletons in the closets at both these funds.

Marking to - totally illiquid - markets

Valuation of funds' assets is usually based on their market price. For example if a fund owns 1,000 shares of some listed company and the recent price of one share is 100, the accounting value of this holding is 1,000 times 100 or 100,000.

The problem here is that this valuation depends on the marginal price of the asset.

It is OK, as long as your position is small relative to the market turnover of this particular company.

However, such an approach is very misleading when liquidity is low.

In such a situation, one should estimate the value of the asset based on the current, short-term historical or possibly short-term predicted liquidity.

In the case of the above mentioned Investor FIZ holdings - Tamex and Spice Mobile - the realistic value is close to zero. 

If Investor FIZ had decided to dump its shares on the market, either the prices would have tanked or the sale would not have been possible at all.

Meanwhile, Tamex and Spice Mobility constitute some 11% of Investor FIZ assets.

Marking to magic

The situation is more complex with unlisted private equity holdings, such as Termisil and SkyCash.

Most often such assets are valued using discounted cash flow method (DCF), comparison method or adjusted purchasing price.

Each of these models is very sensitive to its assumptions and parameters. One can practically set any desired value in quite a broad range.

According to the most recent Opera FIZ financial statement, Termisil and SkyCash accounted for more than 20% of the fund's assets.

The real power of diversification

I have a problem with so called Modern Portfolio Theory.

To make it work, one "just needs" to know the returns of the particular shares to be put into the portfolio.

Having returns, you can easily calculate variances and covariances needed to optimize the structure of the portfolio - i.e. select the weights of the assets.

However, you need to know (estimate) the future returns!

And as Niels Bohr said, prediction is very difficult, especially about the future.

So, is there any other potential value of the portfolio diversification?

Actually, yes - it can reduce the liquidity risk.

A good example here may be the Renaissance Institutional Equity Fund (RIEF).

It is a younger (but bigger) sibling of the Renaissance Medallion Fund mentioned in my previous post.

At the end of 2012, RIEF managed USD 34.36 billion.

These assets were dispersed over 2,815 holdings. Hence the average single position represented just 0.0355%, and the largest 1.25% of the fund's portfolio.

The important benefit of such wide diversification is that no single asset can significantly affect the whole portfolio.

The market and correlation risks are still present, though. However, when you combine asset diversification with strategy diversification, it may be possible to significantly reduce the overall risk and generate positive returns.