
%% Stress Testing of Consumer Credit Default Probabilities Using Panel Data
%
% This example shows how to work with consumer (retail) credit panel data
% to visualize observed default rates at different levels. It also shows
% how to fit a model to predict probabilities of default and perform a
% stress-testing analysis.
%
% The panel data set of consumer loans enables you to identify default rate
% patterns for loans of different ages, or years on books. You can use
% information about a score group to distinguish default rates for
% different score levels. In addition, you can use macroeconomic
% information to assess how the state of the economy affects consumer loan
% default rates.
%
% A standard logistic regression model, a type of generalized linear model,
% is fitted to the retail credit panel data with and without macroeconomic
% predictors. The example also describes how to fit a more advanced model,
% a generalized linear mixed effects model, to account for panel data
% effects. However, the panel effects are negligible for the data set in
% this example, so the standard logistic model is preferred for efficiency.
%
% The standard logistic regression model predicts probabilities of default
% for all score levels, years on books, and macroeconomic variable
% scenarios. When the standard logistic regression model is used for a
% stress-testing analysis, the model predicts probabilities of default
% under a given baseline scenario, as well as under adverse and severely
% adverse macroeconomic scenarios.

%% Panel Data Description
%
% The main data set (|data|) contains the following variables:
%
% * |ID|:         Loan identifier.
% * |ScoreGroup|: Credit score at the beginning of the loan, discretized
%                 into three groups: |High Risk|, |Medium Risk|, and |Low
%                 Risk|.
% * |YOB|:        Years on books.
% * |Default|:    Default indicator. This is the response variable.
% * |Year|:       Calendar year.
%
% There is also a small data set (|dataMacro|) with macroeconomic data for
% the corresponding calendar years:
%
% * |Year|:       Calendar year.
% * |GDP|:        Gross domestic product growth (year over year).
% * |Market|:     Market return (year over year).
%
% The variables |YOB|, |Year|, |GDP|, and |Market| are observed at the end
% of the corresponding calendar year. The score group is a discretization
% of the original credit score when the loan started. A value of |1| for
% |Default| means that the loan defaulted in the corresponding calendar
% year.
%
% There is also a third data set (|dataMacroStress|) with baseline,
% adverse, and severely adverse scenarios for the macroeconomic variables.
% This table is used for the stress-testing analysis.
%
% This example uses simulated data, but the same approach has been
% successfully applied to real data sets.
%
%% Load the Panel Data
%
% Load the data and view the first 10 and last 10 rows of the table.
% The panel data is stacked, in the sense that observations for the same ID
% are stored in contiguous rows, creating a tall, thin table. The panel is
% unbalanced, because not all IDs have the same number of observations.

load RetailCreditPanelData.mat

fprintf('\nFirst ten rows:\n')
disp(data(1:10,:))

fprintf('Last ten rows:\n')
disp(data(end-9:end,:))

nRows = height(data);
UniqueIDs = unique(data.ID);
nIDs = length(UniqueIDs);

fprintf('Total number of IDs: %d\n',nIDs)
fprintf('Total number of rows: %d\n',nRows)
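
%%
% To confirm that the panel is unbalanced, count the observations per ID.
% This is a quick check using |accumarray|; it is not part of the original
% workflow.

[~,~,idGroup] = unique(data.ID);
obsPerID = accumarray(idGroup,1);
fprintf('Observations per ID range from %d to %d\n',...
   min(obsPerID),max(obsPerID))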

%% Default Rates by Score Groups and Years on Books
%
% Use the credit score group as a grouping variable to compute the observed
% default rate for each score group. For this, use the |varfun| function to
% compute the mean of the |Default| variable, grouping by the |ScoreGroup|
% variable. Plot the results on a bar chart. As expected, the default rate
% goes down as the credit quality improves.

DefRateByScore = varfun(@mean,data,'InputVariables','Default',...
   'GroupingVariables','ScoreGroup');
NumScoreGroups = height(DefRateByScore);

disp(DefRateByScore)

figure;
bar(double(DefRateByScore.ScoreGroup),DefRateByScore.mean_Default*100)
set(gca,'XTickLabel',categories(data.ScoreGroup))
title('Default Rate vs. Score Group')
xlabel('Score Group')
ylabel('Observed Default Rate (%)')
grid on

%%
% Next, compute default rates grouping by years on books (represented by
% the |YOB| variable). The resulting rates are conditional one-year default
% rates. For example, the default rate for the third year on books is the
% proportion of loans defaulting in the third year, relative to the number
% of loans that are in the portfolio past the second year. In other words,
% the default rate for the third year is the number of rows with |YOB| =
% |3| and |Default| = |1|, divided by the number of rows with |YOB| = |3|.
%
% Plot the results. There is a clear downward trend, with default rates
% going down as the number of years on books increases. Years three and
% four have similar default rates. However, it is unclear from this plot
% whether this is a characteristic of the loan product or an effect of the
% macroeconomic environment.

DefRateByYOB = varfun(@mean,data,'InputVariables','Default',...
   'GroupingVariables','YOB');
NumYOB = height(DefRateByYOB);

disp(DefRateByYOB)

figure;
plot(double(DefRateByYOB.YOB),DefRateByYOB.mean_Default*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
grid on
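
%%
% As a cross-check, the conditional rate for a single year on books can be
% computed directly from the definition above: the proportion of rows with
% |YOB| = |3| that have |Default| = |1|.

DefRateYOB3 = mean(data.Default(data.YOB==3));
fprintf('Observed default rate, third year on books: %.4f\n',DefRateYOB3)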

%%
% Now, group both by score group and number of years on books. Plot the
% results. The plot shows that all score groups behave similarly as time
% progresses, with a general downward trend. Years three and four are an
% exception to the downward trend: the rates flatten for the |High Risk|
% group, and go up in year three for the |Low Risk| group.

DefRateByScoreYOB = varfun(@mean,data,'InputVariables','Default',...
   'GroupingVariables',{'ScoreGroup','YOB'});

% Display output table to show the way it is structured
% Display only the first 10 rows, for brevity
disp(DefRateByScoreYOB(1:10,:))
disp('     ...')

DefRateByScoreYOB2 = reshape(DefRateByScoreYOB.mean_Default,...
   NumYOB,NumScoreGroups);
figure;
plot(DefRateByScoreYOB2*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
grid on

%% Years on Books Versus Calendar Years
%
% The data contains three cohorts, or vintages: loans started in 1997,
% 1998, and 1999. No loan in the panel data started after 1999.
%
% This section shows how to visualize the default rate for each cohort
% separately. The default rates for all cohorts are plotted, both against
% the number of years on books and against the calendar year. Patterns in
% the years on books suggest the loan product characteristics. Patterns
% in the calendar years suggest the influence of the macroeconomic
% environment.
%
% From years two through four on books, the curves show different patterns
% for the three cohorts. When plotted against the calendar year, however,
% the three cohorts show similar behavior from 2000 through 2002. The
% curves flatten during that period.

% Get IDs of 1997, 1998, and 1999 cohorts
IDs1997 = data.ID(data.YOB==1&data.Year==1997);
IDs1998 = data.ID(data.YOB==1&data.Year==1998);
IDs1999 = data.ID(data.YOB==1&data.Year==1999);
% IDs2000AndUp is unused; it is computed only to show that it is empty,
% that is, that no loans started after 1999
IDs2000AndUp = data.ID(data.YOB==1&data.Year>1999);

% Get default rates for each cohort separately
ObsDefRate1997 = varfun(@mean,data(ismember(data.ID,IDs1997),:),...
    'InputVariables','Default','GroupingVariables','YOB');

ObsDefRate1998 = varfun(@mean,data(ismember(data.ID,IDs1998),:),...
    'InputVariables','Default','GroupingVariables','YOB');

ObsDefRate1999 = varfun(@mean,data(ismember(data.ID,IDs1999),:),...
    'InputVariables','Default','GroupingVariables','YOB');

% Plot against the years on books
figure;
plot(ObsDefRate1997.YOB,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(ObsDefRate1998.YOB,ObsDefRate1998.mean_Default*100,'-*')
plot(ObsDefRate1999.YOB,ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on

% Plot against the calendar year
Year = unique(data.Year);
figure;
plot(Year,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(Year(2:end),ObsDefRate1998.mean_Default*100,'-*')
plot(Year(3:end),ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Calendar Year')
xlabel('Calendar Year')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on

%% Model of Default Rates Using Score Group and Years on Books
%
% After you visualize the data, you can build predictive models for the
% default rates.
%
% Split the panel data into training and testing sets, defining these sets
% based on ID numbers.

NumTraining = floor(0.6*nIDs);

rng('default');
TrainIDInd = randsample(nIDs,NumTraining);
TrainDataInd = ismember(data.ID,UniqueIDs(TrainIDInd));
TestDataInd = ~TrainDataInd;

%%
% The first model uses only score group and number of years on books as
% predictors of the default rate _p_. The odds of defaulting are defined as
% _p/(1-p)_. The logistic model relates the logarithm of the odds, or _log
% odds_, to the predictors as follows:
%
% $$\log\left( \frac{p}{1-p} \right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}
% YOB + \epsilon$$
%
% _1M_ is an indicator with value |1| for |Medium Risk| loans and |0|
% otherwise, and _1L_ is the analogous indicator for |Low Risk| loans. This
% standard way of handling a categorical predictor such as |ScoreGroup|.
% There is effectively a different constant for each risk level: _aH_ for
% |High Risk|, _aH+aM_ for |Medium Risk|, and _aH+aL_ for |Low Risk|.
%
% To calibrate the model, call the |fitglm| function from Statistics and
% Machine Learning Toolbox(TM). The formula above is expressed as
%
% |Default ~ 1 + ScoreGroup + YOB|
%
% The |1 + ScoreGroup| terms account for the baseline constant and the
% adjustments for risk level. Set the optional argument |Distribution| to
% |binomial| to indicate that a logistic model is desired (that is, a model
% with log odds on the left side).

ModelNoMacro = fitglm(data(TrainDataInd,:),...
   'Default ~ 1 + ScoreGroup + YOB',...
   'Distribution','binomial');
disp(ModelNoMacro)
%%
% For any row in the data, the value of _p_ is not observed, only a |0|
% or |1| default indicator is observed. The calibration finds model
% coefficients, and the predicted values of _p_ for individual rows can be
% recovered with the |predict| function.

%%
% The |Intercept| coefficient is the constant for the |High Risk| level
% (the _aH_ term), and the |ScoreGroup_Medium Risk| and |ScoreGroup_Low
% Risk| coefficients are the adjustments for |Medium Risk| and |Low Risk|
% levels (the _aM_ and _aL_ terms).
%
% The default probability _p_ and the log odds (the left side of the model)
% move in the same direction when the predictors change. Therefore, because
% the adjustments for |Medium Risk| and |Low Risk| are negative, the
% default rates are lower for better risk levels, as expected. The
% coefficient for number of years on books is also negative, consistent
% with the overall downward trend for number of years on books observed in
% the data.
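
%%
% As a sanity check, the fitted coefficients can be mapped back to a
% probability by inverting the log odds by hand. The coefficient row names
% used here follow the standard |fitglm| display; |High Risk| is the
% reference level, so its constant is the intercept itself.

bIntercept = ModelNoMacro.Coefficients{'(Intercept)','Estimate'};
bYOB = ModelNoMacro.Coefficients{'YOB','Estimate'};
logOdds = bIntercept + bYOB*1;       % High Risk loan, first year on books
pHighRiskY1 = 1/(1 + exp(-logOdds))  % invert the log odds to recover p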

%%
% To account for panel data effects, a more advanced model using mixed
% effects can be fitted using the |fitglme| function from Statistics and
% Machine Learning Toolbox(TM). Although this model is not fitted in this
% example, the code is very similar:
%
%   ModelNoMacro = fitglme(data(TrainDataInd,:),...
%       'Default ~ 1 + ScoreGroup + YOB + (1|ID)',...
%       'Distribution','binomial');
%
% The |(1|ID)| term in the formula adds a _random effect_ to the model.
% This effect is a predictor whose values are not given in the data, but
% calibrated together with the model coefficients. A random value is
% calibrated for each ID. This additional calibration requirement
% substantially increases the computational time to fit the model in this
% case, because of the very large number of IDs. For the panel data set in
% this example, the random term has a negligible effect. The variance of
% the random effects is very small and the model coefficients barely change
% when the random effect is introduced. The simpler logistic regression
% model is preferred, because it is faster to calibrate and to predict, and
% the default rates predicted with both models are essentially the same.

%%
% Predict the probability of default for training and testing data.

data.PDNoMacro = zeros(height(data),1);

% Predict in-sample
data.PDNoMacro(TrainDataInd) = predict(ModelNoMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDNoMacro(TestDataInd) = predict(ModelNoMacro,data(TestDataInd,:));


%%
% Visualize the in-sample fit.

PredPDTrainYOB = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_Default*100,'*');
hold on
plot(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_PDNoMacro*100);
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Training Data)')
grid on

%%
% Visualize the out-of-sample fit.

PredPDTestYOB = varfun(@mean,data(TestDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTestYOB.YOB,PredPDTestYOB.mean_Default*100,'*');
hold on
plot(PredPDTestYOB.YOB,PredPDTestYOB.mean_PDNoMacro*100);
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Testing Data)')
grid on

%%
% Visualize the in-sample fit for all score groups. The out-of-sample fit
% can be computed and visualized in a similar way.

PredPDTrainScoreYOB = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},...
    'GroupingVariables',{'ScoreGroup','YOB'});

figure;
hs = gscatter(PredPDTrainScoreYOB.YOB,...
    PredPDTrainScoreYOB.mean_Default*100,...
    PredPDTrainScoreYOB.ScoreGroup,'rbmgk','*');
mean_PDNoMacroMat = reshape(PredPDTrainScoreYOB.mean_PDNoMacro,...
   NumYOB,NumScoreGroups);
hold on
hp = plot(mean_PDNoMacroMat*100);
for ii=1:NumScoreGroups
   hp(ii).Color = hs(ii).Color;
end
hold off
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
title('Model Fit by Score Group (Training Data)')
grid on


%% Model of Default Rates Including Macroeconomic Variables
%
% The trend predicted with the previous model, as a function of years on
% books, has a very regular decreasing pattern. The data, however, shows
% some deviations from that trend. To try to account for those deviations,
% add the gross domestic product annual growth (represented by the |GDP|
% variable) and stock market annual returns (represented by the |Market|
% variable) to the model. 
%
% $$\log\left( \frac{p}{1-p} \right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}
% YOB + b_{GDP} GDP + b_{Market} Market + \epsilon$$
%
% Expand the data set to add one column for |GDP| and one for |Market|,
% using the data from the |dataMacro| table.

% dataMacro rows are ordered by calendar year starting in 1997, so
% data.Year-1996 maps each calendar year to its row in dataMacro
data.GDP = dataMacro.GDP(data.Year-1996);
data.Market = dataMacro.Market(data.Year-1996);
disp(data(1:10,:))

%%
% Fit the model with the macroeconomic variables by expanding the model
% formula to include the |GDP| and the |Market| variables.

ModelMacro = fitglm(data(TrainDataInd,:),...
   'Default ~ 1 + ScoreGroup + YOB + GDP + Market',...
   'Distribution','binomial');
disp(ModelMacro)

%%
% Both macroeconomic variables show a negative coefficient, consistent with
% the intuition that higher economic growth reduces default rates.

%%
% Predict the probability of default for the training and testing data.

data.PDMacro = zeros(height(data),1);

% Predict in-sample
data.PDMacro(TrainDataInd) = predict(ModelMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDMacro(TestDataInd) = predict(ModelMacro,data(TestDataInd,:));

%%
% Visualize the in-sample fit. As desired, the model including
% macroeconomic variables, or macro model, deviates from the smooth trend
% predicted by the previous model. The rates predicted with the macro model
% match more closely with the observed default rates.

PredPDTrainYOBMacro = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTrainYOBMacro.YOB,PredPDTrainYOBMacro.mean_Default*100,'*');
hold on
plot(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_PDNoMacro*100); % No Macro
plot(PredPDTrainYOBMacro.YOB,PredPDTrainYOBMacro.mean_PDMacro*100); % Macro
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','No Macro', 'Macro')
title('Macro Model Fit (Training Data)')
grid on


%%
% Visualize the out-of-sample fit.

PredPDTestYOBMacro = varfun(@mean,data(TestDataInd,:),...
    'InputVariables',{'Default','PDMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTestYOBMacro.YOB,PredPDTestYOBMacro.mean_Default*100,'*');
hold on
plot(PredPDTestYOB.YOB,PredPDTestYOB.mean_PDNoMacro*100); % No Macro
plot(PredPDTestYOBMacro.YOB,PredPDTestYOBMacro.mean_PDMacro*100); % Macro
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','No Macro', 'Macro')
title('Macro Model Fit (Testing Data)')
grid on

%%
% Visualize the in-sample fit for all score groups.

PredPDTrainScoreYOBMacro = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDMacro'},...
    'GroupingVariables',{'ScoreGroup','YOB'});

figure;
hs = gscatter(PredPDTrainScoreYOBMacro.YOB,...
    PredPDTrainScoreYOBMacro.mean_Default*100,...
    PredPDTrainScoreYOBMacro.ScoreGroup,'rbmgk','*');
mean_PDMacroMat = reshape(PredPDTrainScoreYOBMacro.mean_PDMacro,...
   NumYOB,NumScoreGroups);
hold on
hp = plot(mean_PDMacroMat*100);
for ii=1:NumScoreGroups
   hp(ii).Color = hs(ii).Color;
end
hold off
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
title('Macro Model Fit by Score Group (Training Data)')
grid on


%% Stress Testing of Probability of Default
%
% Use the fitted macro model to stress-test the predicted probabilities of
% default.
%
% Assume that the following stress scenarios for the macroeconomic
% variables are provided, for example, by a regulator.

disp(dataMacroStress)

%%
% Set up a basic data table for predicting the probabilities of default.
% This is a dummy data table, with one row for each combination of score
% group and number of years on books.

dataBaseline = table;
[ScoreGroup,YOB]=meshgrid(1:NumScoreGroups,1:NumYOB);
dataBaseline.ScoreGroup = categorical(ScoreGroup(:),1:NumScoreGroups,...
   categories(data.ScoreGroup),'Ordinal',true);
dataBaseline.YOB = YOB(:);
dataBaseline.ID = ones(height(dataBaseline),1);
dataBaseline.GDP = zeros(height(dataBaseline),1);
dataBaseline.Market = zeros(height(dataBaseline),1);

%%
% To make the predictions, set the same macroeconomic conditions (baseline,
% adverse, or severely adverse) for all combinations of score groups and
% number of years on books.

% Predict the probabilities of default in the baseline scenario
dataBaseline.GDP(:) = dataMacroStress.GDP('Baseline');
dataBaseline.Market(:) = dataMacroStress.Market('Baseline');
dataBaseline.PD = predict(ModelMacro,dataBaseline);

% Predict the probabilities of default in the adverse scenario
dataAdverse = dataBaseline;
dataAdverse.GDP(:) = dataMacroStress.GDP('Adverse');
dataAdverse.Market(:) = dataMacroStress.Market('Adverse');
dataAdverse.PD = predict(ModelMacro,dataAdverse);

% Predict the probabilities of default in the severely adverse scenario
dataSevere = dataBaseline;
dataSevere.GDP(:) = dataMacroStress.GDP('Severe');
dataSevere.Market(:) = dataMacroStress.Market('Severe');
dataSevere.PD = predict(ModelMacro,dataSevere);

%%
% Visualize the average predicted probability of default across score
% groups under the three alternative regulatory scenarios. Here, all score
% groups are implicitly weighted equally. However, predictions can also be
% made at a loan level for any given portfolio to make the predicted
% default rates consistent with the actual distribution of loans in the
% portfolio. The same visualization can be produced for each score group
% separately.

PredPDYOB = zeros(NumYOB,3);
PredPDYOB(:,1) = mean(reshape(dataBaseline.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,2) = mean(reshape(dataAdverse.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,3) = mean(reshape(dataSevere.PD,NumYOB,NumScoreGroups),2);

figure;
bar(PredPDYOB*100);
xlabel('Years on Books')
ylabel('Predicted Default Rate (%)')
legend('Baseline','Adverse','Severe')
title('Stress Test, Probability of Default')
grid on
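
%%
% As a sketch of the loan-level weighting mentioned above, the equal
% weighting across score groups can be replaced with portfolio weights. The
% weights below are hypothetical shares of each score group in a portfolio;
% they are not taken from the data set.

portfolioWeights = [0.4;0.4;0.2]; % assumed High/Medium/Low Risk shares
PredPDBaselineWeighted = ...
   reshape(dataBaseline.PD,NumYOB,NumScoreGroups)*portfolioWeights;
disp(PredPDBaselineWeighted*100) % weighted baseline default rates (%)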

%% References
%
% # Generalized Linear Models documentation:
% http://www.mathworks.com/help/stats/generalized-linear-regression.html.
% # Generalized Linear Mixed Effects Models documentation:
% http://www.mathworks.com/help/stats/generalized-linear-mixed-effects-models.html.
% # Federal Reserve, Comprehensive Capital Analysis and Review (CCAR):
% http://www.federalreserve.gov/bankinforeg/ccar.htm.
% # Bank of England, Stress Testing:
% http://www.bankofengland.co.uk/financialstability/pages/fpc/stresstest.aspx.
% # European Banking Authority, EU-Wide Stress Testing:
% http://www.eba.europa.eu/risk-analysis-and-data/eu-wide-stress-testing.