Some time ago I wrote about Somers' D measure of association.
Somers' D is one of the many tools that can be used for validating scorecard models.
The task of a scorecard model is to assign scores, such as ratings, to the clients (in the case of corporations usually called "counterparties") of a financial institution.
The ratings assigned by a scorecard model should reflect the probability of the client not meeting its obligations towards a bank or other financial institution.
As I showed in my previous post about the relation between probability of default and interest rates, a higher risk of default needs to be compensated by a higher interest rate charged.
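To make the compensation argument concrete, here is a minimal sketch of a break-even calculation. The numbers and the helper function are illustrative assumptions, not taken from the post: a lender charging rate i on a loan with probability of default p and fractional recovery R breaks even against a risk-free rate r when (1 - p)(1 + i) + p·R = 1 + r.

```python
def break_even_rate(pd_, recovery, risk_free):
    """Interest rate at which the expected repayment matches the risk-free return.

    Solves (1 - pd_) * (1 + i) + pd_ * recovery = 1 + risk_free for i.
    All inputs are fractions, e.g. 0.02 for 2%.
    """
    return (1 + risk_free - pd_ * recovery) / (1 - pd_) - 1

# Hypothetical borrowers: the riskier one (PD 5%) must pay a visibly
# higher rate than the safer one (PD 1%) to compensate the lender.
safe = break_even_rate(0.01, recovery=0.4, risk_free=0.02)   # ~2.6%
risky = break_even_rate(0.05, recovery=0.4, risk_free=0.02)  # ~5.3%
```

The one-period, expected-value view above ignores funding costs and risk premia; it only shows why the required rate grows with the probability of default.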
Two aspects of a scorecard model are usually examined to assess the correctness of the model:
- discriminatory power
- calibration

Discriminatory power refers to the model's ability to rank clients/counterparties according to their credit quality, i.e. to verify whether clients/counterparties of similar quality fall into similar rating classes.
Calibration, in turn, verifies whether the assigned ratings are correct indicators of future defaults. In other words, here we compare the probability of default forecast by the model with the actual ("true") defaults.
Tools commonly used to assess a model's discriminatory power are the Receiver Operating Characteristic (ROC) curve, the above-mentioned Somers' D, and the Gini coefficient.
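These three measures are closely related: for a binary default indicator, Somers' D of the scores equals the Gini coefficient, which in turn equals 2·AUC − 1. A minimal sketch, with made-up scores and defaults, computes Somers' D directly from concordant and discordant pairs:

```python
from itertools import combinations

def somers_d(scores, defaults):
    """Somers' D of scores with respect to the default indicator:
    (concordant - discordant) / (pairs with different outcomes).
    Assumes a higher score means worse credit quality (more likely to default)."""
    conc = disc = ties = 0
    for (s1, d1), (s2, d2) in combinations(zip(scores, defaults), 2):
        if d1 == d2:
            continue  # only defaulter / non-defaulter pairs count
        if d1 < d2:  # orient so s1 is the defaulter's score
            s1, s2 = s2, s1
        if s1 > s2:
            conc += 1   # defaulter scored as riskier: concordant
        elif s1 < s2:
            disc += 1   # defaulter scored as safer: discordant
        else:
            ties += 1
    return (conc - disc) / (conc + disc + ties)

scores = [600, 650, 650, 700, 720, 750]  # hypothetical, higher = riskier
defaults = [0, 0, 1, 0, 1, 1]
d = somers_d(scores, defaults)   # Gini coefficient of the scorecard
auc = (d + 1) / 2                # area under the ROC curve
```

The brute-force pair loop is O(n²) and meant only to expose the definition; production code would use a rank-based O(n log n) computation.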
There are a number of tests for verifying model calibration. Probably the most often cited in the literature are:
- binomial test
- Hosmer-Lemeshow test
- Spiegelhalter test
- normal test
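As an illustration of the first of these, here is a minimal sketch of the one-sided binomial test for a single rating class; the obligor counts and forecast PD are hypothetical:

```python
from math import comb

def binomial_test_pvalue(n, k, pd_forecast):
    """One-sided binomial test for calibration of one rating class.

    Returns the probability of observing k or more defaults among n obligors
    if the forecast PD were correct. A small p-value suggests the class
    underestimates the true default risk.
    """
    return sum(comb(n, i) * pd_forecast**i * (1 - pd_forecast)**(n - i)
               for i in range(k, n + 1))

# Illustrative numbers: 200 obligors rated at a 2% PD, 9 defaults observed.
# The expected count is 4, so 9 defaults is suspiciously many.
p_value = binomial_test_pvalue(200, 9, 0.02)
```

Note that the test assumes independent defaults within the class; correlated defaults make it too eager to reject a correctly calibrated model.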
In most cases, scorecard validation tests require knowledge of previous scores and of actual past defaults.
When previous scores are not available and cannot be retroactively calculated, external ratings - such as those issued by credit rating agencies - may be used. An additional condition must then be met: the scores generated by the internal model need to be aligned with the external ratings.
As I noted before, actual default rates change over time. Meanwhile, most scorecard validation tools do not take these fluctuations into account. They assume that the future will be similar to the (averaged or recent) past.
In addition, the validation tools mentioned above do not take into consideration differences in characteristics between the past and current assets being evaluated by the scorecard model. It is assumed that the model will take care of such differences. However, the assets being rated may have fairly complex internal structures hidden from the model.
Lastly, scorecard models do not usually say anything about the possible recovery rates and resolution times.
If you would like to play a little with scorecard validation tests, you may take a look at:
[ Scorecard validation tests in R ]
[ Scorecard validation tests examples in R ]
[ Measures of association in R ]