%% Stress Testing of Consumer Credit Default Probabilities Using Panel Data
%
% This example shows how to work with consumer (retail) credit panel data
% to visualize observed default rates at different levels. It also shows
% how to fit a model to predict probabilities of default and perform a
% stress-testing analysis.
%
% The panel data set of consumer loans enables you to identify default rate
% patterns for loans of different ages, or years on books. You can use
% information about a score group to distinguish default rates for
% different score levels. In addition, you can use macroeconomic
% information to assess how the state of the economy affects consumer loan
% default rates.
%
% A standard logistic regression model, a type of generalized linear model,
% is fitted to the retail credit panel data with and without macroeconomic
% predictors. The example describes how to fit a more advanced model to
% account for panel data effects, a generalized linear mixed effects model.
% However, the panel effects are negligible for the data set in this
% example and the standard logistic model is preferred for efficiency.
%
% The standard logistic regression model predicts probabilities of default
% for all score levels, years on books, and macroeconomic variable
% scenarios. When the standard logistic regression model is used for a
% stress-testing analysis, the model predicts probabilities of default for
% a given baseline, as well as default probabilities for adverse and
% severely adverse macroeconomic scenarios.

%% Panel Data Description
%
% The main data set (|data|) contains the following variables:
%
% * |ID|: Loan identifier.
% * |ScoreGroup|: Credit score at the beginning of the loan, discretized
% into three groups: |High Risk|, |Medium Risk|, and |Low Risk|.
% * |YOB|: Years on books.
% * |Default|: Default indicator. This is the response variable.
% * |Year|: Calendar year.
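%
% As an optional sanity check (a sketch, not part of the original
% analysis), once |data| is loaded you can summarize the score groups and
% the overall default rate directly from these variables:
%
% |summary(data.ScoreGroup)|
%
% |fprintf('Overall default rate: %.2f%%\n',100*mean(data.Default))|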
%
% There is also a small data set (|dataMacro|) with macroeconomic data for
% the corresponding calendar years:
%
% * |Year|: Calendar year.
% * |GDP|: Gross domestic product growth (year over year).
% * |Market|: Market return (year over year).
%
% The variables |YOB|, |Year|, |GDP|, and |Market| are observed at the end
% of the corresponding calendar year. The score group is a discretization
% of the original credit score when the loan started. A value of |1| for
% |Default| means that the loan defaulted in the corresponding calendar
% year.
%
% There is also a third data set (|dataMacroStress|) with baseline,
% adverse, and severely adverse scenarios for the macroeconomic variables.
% This table is used for the stress-testing analysis.
%
% This example uses simulated data, but the same approach has been
% successfully applied to real data sets.

%% Load the Panel Data
%
% Load the data and view the first 10 and last 10 rows of the table.
% The panel data is stacked, in the sense that observations for the same ID
% are stored in contiguous rows, creating a tall, thin table. The panel is
% unbalanced, because not all IDs have the same number of observations.

load RetailCreditPanelData.mat

fprintf('\nFirst ten rows:\n')
disp(data(1:10,:))

fprintf('Last ten rows:\n')
disp(data(end-9:end,:))

nRows = height(data);
UniqueIDs = unique(data.ID);
nIDs = length(UniqueIDs);

fprintf('Total number of IDs: %d\n',nIDs)
fprintf('Total number of rows: %d\n',nRows)

%% Default Rates by Score Groups and Years on Books
%
% Use the credit score group as a grouping variable to compute the observed
% default rate for each score group. For this, use the |varfun| function to
% compute the mean of the |Default| variable, grouping by the |ScoreGroup|
% variable. Plot the results on a bar chart. As expected, the default rate
% goes down as the credit quality improves.

DefRateByScore = varfun(@mean,data,'InputVariables','Default',...
    'GroupingVariables','ScoreGroup');

NumScoreGroups = height(DefRateByScore);

disp(DefRateByScore)

figure;
bar(double(DefRateByScore.ScoreGroup),DefRateByScore.mean_Default*100)
set(gca,'XTickLabel',categories(data.ScoreGroup))
title('Default Rate vs. Score Group')
xlabel('Score Group')
ylabel('Observed Default Rate (%)')
grid on

%%
% Next, compute default rates grouping by years on books (represented by
% the |YOB| variable). The resulting rates are conditional one-year default
% rates. For example, the default rate for the third year on books is the
% proportion of loans defaulting in the third year, relative to the number
% of loans that are in the portfolio past the second year. In other words,
% the default rate for the third year is the number of rows with |YOB| =
% |3| and |Default| = 1, divided by the number of rows with |YOB| = |3|.
%
% Plot the results. There is a clear downward trend, with default rates
% going down as the number of years on books increases. Years three and
% four have similar default rates. However, it is unclear from this plot
% whether this is a characteristic of the loan product or an effect of the
% macroeconomic environment.

DefRateByYOB = varfun(@mean,data,'InputVariables','Default',...
    'GroupingVariables','YOB');

NumYOB = height(DefRateByYOB);

disp(DefRateByYOB)

figure;
plot(double(DefRateByYOB.YOB),DefRateByYOB.mean_Default*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
grid on

%%
% Now, group both by score group and number of years on books. Plot the
% results. The plot shows that all score groups behave similarly as time
% progresses, with a general downward trend. Years three and four are an
% exception to the downward trend: the rates flatten for the |High Risk|
% group, and go up in year three for the |Low Risk| group.

DefRateByScoreYOB = varfun(@mean,data,'InputVariables','Default',...
    'GroupingVariables',{'ScoreGroup','YOB'});

% Display output table to show the way it is structured
% Display only the first 10 rows, for brevity
disp(DefRateByScoreYOB(1:10,:))
disp(' ...')

DefRateByScoreYOB2 = reshape(DefRateByScoreYOB.mean_Default,...
    NumYOB,NumScoreGroups);

figure;
plot(DefRateByScoreYOB2*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
grid on

%% Years on Books Versus Calendar Years
%
% The data contains three cohorts, or vintages: loans started in 1997,
% 1998, and 1999. No loan in the panel data started after 1999.
%
% This section shows how to visualize the default rate for each cohort
% separately. The default rates for all cohorts are plotted, both against
% the number of years on books and against the calendar year. Patterns in
% the years on books suggest the loan product characteristics. Patterns
% in the calendar years suggest the influence of the macroeconomic
% environment.
%
% From years two through four on books, the curves show different patterns
% for the three cohorts. When plotted against the calendar year, however,
% the three cohorts show similar behavior from 2000 through 2002. The
% curves flatten during that period.

% Get IDs of 1997, 1998, and 1999 cohorts
IDs1997 = data.ID(data.YOB==1&data.Year==1997);
IDs1998 = data.ID(data.YOB==1&data.Year==1998);
IDs1999 = data.ID(data.YOB==1&data.Year==1999);

% IDs2000AndUp is unused, it is only computed to show that this is empty,
% no loans started after 1999
IDs2000AndUp = data.ID(data.YOB==1&data.Year>1999);

% Get default rates for each cohort separately
ObsDefRate1997 = varfun(@mean,data(ismember(data.ID,IDs1997),:),...
    'InputVariables','Default','GroupingVariables','YOB');

ObsDefRate1998 = varfun(@mean,data(ismember(data.ID,IDs1998),:),...
    'InputVariables','Default','GroupingVariables','YOB');

ObsDefRate1999 = varfun(@mean,data(ismember(data.ID,IDs1999),:),...
    'InputVariables','Default','GroupingVariables','YOB');

% Plot against the years on books
figure;
plot(ObsDefRate1997.YOB,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(ObsDefRate1998.YOB,ObsDefRate1998.mean_Default*100,'-*')
plot(ObsDefRate1999.YOB,ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on

% Plot against the calendar year
Year = unique(data.Year);

figure;
plot(Year,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(Year(2:end),ObsDefRate1998.mean_Default*100,'-*')
plot(Year(3:end),ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Calendar Year')
xlabel('Calendar Year')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on

%% Model of Default Rates Using Score Group and Years on Books
%
% After you visualize the data, you can build predictive models for the
% default rates.
%
% Split the panel data into training and testing sets, defining these sets
% based on ID numbers.

NumTraining = floor(0.6*nIDs); % Use 60% of the IDs for training

rng('default'); % For reproducibility
TrainIDInd = randsample(nIDs,NumTraining);

TrainDataInd = ismember(data.ID,UniqueIDs(TrainIDInd));
TestDataInd = ~TrainDataInd;

%%
% The first model uses only score group and number of years on books as
% predictors of the default rate _p_. The odds of defaulting are defined as
% _p/(1-p)_. The logistic model relates the logarithm of the odds, or _log
% odds_, to the predictors as follows:
%
% $$\log\left( \frac{p}{1-p} \right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}
% YOB + \epsilon$$
%
% _1M_ is an indicator with a value |1| for |Medium Risk| loans and |0|
% otherwise, and similarly for _1L_ for |Low Risk| loans. This is a
% standard way of handling a categorical predictor such as |ScoreGroup|.
% There is effectively a different constant for each risk level: _aH_ for
% |High Risk|, _aH+aM_ for |Medium Risk|, and _aH+aL_ for |Low Risk|.
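%
% The model is linear in the log odds, and the probability itself can be
% recovered by inverting that transformation: given the linear predictor
% (the right side of the model, without the error term), the implied
% default probability is
%
% $$p = \frac{1}{1 + e^{-(a_H + a_M 1_M + a_L 1_L + b_{YOB} YOB)}}$$
%
% This is the inverse logit transformation, which the |predict| function
% applies automatically when computing probabilities from the fitted
% model.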
%
% To calibrate the model, call the |fitglm| function from Statistics and
% Machine Learning Toolbox(TM). The formula above is expressed as
%
% |Default ~ 1 + ScoreGroup + YOB|
%
% The |1 + ScoreGroup| terms account for the baseline constant and the
% adjustments for risk level. Set the optional argument |Distribution| to
% |binomial| to indicate that a logistic model is desired (that is, a model
% with log odds on the left side).

ModelNoMacro = fitglm(data(TrainDataInd,:),...
    'Default ~ 1 + ScoreGroup + YOB',...
    'Distribution','binomial');
disp(ModelNoMacro)

%%
% For any row in the data, the value of _p_ is not observed, only a |0|
% or |1| default indicator is observed. The calibration finds model
% coefficients, and the predicted values of _p_ for individual rows can be
% recovered with the |predict| function.

%%
% The |Intercept| coefficient is the constant for the |High Risk| level
% (the _aH_ term), and the |ScoreGroup_Medium Risk| and |ScoreGroup_Low
% Risk| coefficients are the adjustments for |Medium Risk| and |Low Risk|
% levels (the _aM_ and _aL_ terms).
%
% The default probability _p_ and the log odds (the left side of the model)
% move in the same direction when the predictors change. Therefore, because
% the adjustments for |Medium Risk| and |Low Risk| are negative, the
% default rates are lower for better risk levels, as expected. The
% coefficient for number of years on books is also negative, consistent
% with the overall downward trend for number of years on books observed in
% the data.

%%
% To account for panel data effects, a more advanced model using mixed
% effects can be fitted using the |fitglme| function from Statistics and
% Machine Learning Toolbox(TM). Although this model is not fitted in this
% example, the code is very similar:
%
% |ModelNoMacro = fitglme(data(TrainDataInd,:),...|
%
% |'Default ~ 1 + ScoreGroup + YOB + (1|ID)',...|
%
% |'Distribution','binomial');|
%
% The |(1|ID)| term in the formula adds a _random effect_ to the model.
% This effect is a predictor whose values are not given in the data, but
% calibrated together with the model coefficients. A random value is
% calibrated for each ID. This additional calibration requirement
% substantially increases the computational time to fit the model in this
% case, because of the very large number of IDs. For the panel data set in
% this example, the random term has a negligible effect. The variance of
% the random effects is very small and the model coefficients barely change
% when the random effect is introduced. The simpler logistic regression
% model is preferred, because it is faster to calibrate and to predict, and
% the default rates predicted with both models are essentially the same.

%%
% Predict the probability of default for training and testing data.

data.PDNoMacro = zeros(height(data),1);

% Predict in-sample
data.PDNoMacro(TrainDataInd) = predict(ModelNoMacro,data(TrainDataInd,:));

% Predict out-of-sample
data.PDNoMacro(TestDataInd) = predict(ModelNoMacro,data(TestDataInd,:));

%%
% Visualize the in-sample fit.

PredPDTrainYOB = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_Default*100,'*');
hold on
plot(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_PDNoMacro*100);
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Training Data)')
grid on

%%
% Visualize the out-of-sample fit.

PredPDTestYOB = varfun(@mean,data(TestDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTestYOB.YOB,PredPDTestYOB.mean_Default*100,'*');
hold on
plot(PredPDTestYOB.YOB,PredPDTestYOB.mean_PDNoMacro*100);
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','Predicted')
title('Model Fit (Testing Data)')
grid on

%%
% Visualize the in-sample fit for all score groups. The out-of-sample fit
% can be computed and visualized in a similar way.

PredPDTrainScoreYOB = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDNoMacro'},...
    'GroupingVariables',{'ScoreGroup','YOB'});

figure;
hs = gscatter(PredPDTrainScoreYOB.YOB,...
    PredPDTrainScoreYOB.mean_Default*100,...
    PredPDTrainScoreYOB.ScoreGroup,'rbmgk','*');
mean_PDNoMacroMat = reshape(PredPDTrainScoreYOB.mean_PDNoMacro,...
    NumYOB,NumScoreGroups);
hold on
hp = plot(mean_PDNoMacroMat*100);
for ii=1:NumScoreGroups
    hp(ii).Color = hs(ii).Color;
end
hold off
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
title('Model Fit by Score Group (Training Data)')
grid on

%% Model of Default Rates Including Macroeconomic Variables
%
% The trend predicted with the previous model, as a function of years on
% books, has a very regular decreasing pattern. The data, however, shows
% some deviations from that trend. To try to account for those deviations,
% add the gross domestic product annual growth (represented by the |GDP|
% variable) and stock market annual returns (represented by the |Market|
% variable) to the model.
%
% $$\log\left( \frac{p}{1-p} \right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}
% YOB + b_{GDP} GDP + b_{Market} Market + \epsilon$$
%
% Expand the data set to add one column for |GDP| and one for |Market|,
% using the data from the |dataMacro| table.
% The |dataMacro| table has one row per calendar year, starting in 1997,
% so |Year-1996| maps each calendar year to the corresponding row
data.GDP = dataMacro.GDP(data.Year-1996);
data.Market = dataMacro.Market(data.Year-1996);

disp(data(1:10,:))

%%
% Fit the model with the macroeconomic variables by expanding the model
% formula to include the |GDP| and the |Market| variables.

ModelMacro = fitglm(data(TrainDataInd,:),...
    'Default ~ 1 + ScoreGroup + YOB + GDP + Market',...
    'Distribution','binomial');
disp(ModelMacro)

%%
% Both macroeconomic variables show a negative coefficient, consistent with
% the intuition that higher economic growth reduces default rates. Because
% the model is linear in the log odds, each additional point of |GDP|
% growth multiplies the odds of default by _exp(bGDP)_, which is less than
% one when the coefficient is negative, holding the other predictors fixed.

%%
% Predict the probability of default for the training and testing data.

data.PDMacro = zeros(height(data),1);

% Predict in-sample
data.PDMacro(TrainDataInd) = predict(ModelMacro,data(TrainDataInd,:));

% Predict out-of-sample
data.PDMacro(TestDataInd) = predict(ModelMacro,data(TestDataInd,:));

%%
% Visualize the in-sample fit. As desired, the model including
% macroeconomic variables, or macro model, deviates from the smooth trend
% predicted by the previous model. The rates predicted with the macro model
% match more closely with the observed default rates.

PredPDTrainYOBMacro = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTrainYOBMacro.YOB,PredPDTrainYOBMacro.mean_Default*100,'*');
hold on
plot(PredPDTrainYOB.YOB,PredPDTrainYOB.mean_PDNoMacro*100); % No Macro
plot(PredPDTrainYOBMacro.YOB,PredPDTrainYOBMacro.mean_PDMacro*100); % Macro
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','No Macro','Macro')
title('Macro Model Fit (Training Data)')
grid on

%%
% Visualize the out-of-sample fit.

PredPDTestYOBMacro = varfun(@mean,data(TestDataInd,:),...
    'InputVariables',{'Default','PDMacro'},'GroupingVariables','YOB');

figure;
scatter(PredPDTestYOBMacro.YOB,PredPDTestYOBMacro.mean_Default*100,'*');
hold on
plot(PredPDTestYOB.YOB,PredPDTestYOB.mean_PDNoMacro*100); % No Macro
plot(PredPDTestYOBMacro.YOB,PredPDTestYOBMacro.mean_PDMacro*100); % Macro
hold off
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Observed','No Macro','Macro')
title('Macro Model Fit (Testing Data)')
grid on

%%
% Visualize the in-sample fit for all score groups.

PredPDTrainScoreYOBMacro = varfun(@mean,data(TrainDataInd,:),...
    'InputVariables',{'Default','PDMacro'},...
    'GroupingVariables',{'ScoreGroup','YOB'});

figure;
hs = gscatter(PredPDTrainScoreYOBMacro.YOB,...
    PredPDTrainScoreYOBMacro.mean_Default*100,...
    PredPDTrainScoreYOBMacro.ScoreGroup,'rbmgk','*');
mean_PDMacroMat = reshape(PredPDTrainScoreYOBMacro.mean_PDMacro,...
    NumYOB,NumScoreGroups);
hold on
hp = plot(mean_PDMacroMat*100);
for ii=1:NumScoreGroups
    hp(ii).Color = hs(ii).Color;
end
hold off
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
title('Macro Model Fit by Score Group (Training Data)')
grid on

%% Stress Testing of Probability of Default
%
% Use the fitted macro model to stress-test the predicted probabilities of
% default.
%
% Assume the following are stress scenarios for the macroeconomic variables
% provided, for example, by a regulator.

disp(dataMacroStress)

%%
% Set up a basic data table for predicting the probabilities of default.
% This is a dummy data table, with one row for each combination of score
% group and number of years on books.

dataBaseline = table;
[ScoreGroup,YOB] = meshgrid(1:NumScoreGroups,1:NumYOB);
dataBaseline.ScoreGroup = categorical(ScoreGroup(:),1:NumScoreGroups,...
    categories(data.ScoreGroup),'Ordinal',true);
dataBaseline.YOB = YOB(:);
dataBaseline.ID = ones(height(dataBaseline),1);
dataBaseline.GDP = zeros(height(dataBaseline),1);
dataBaseline.Market = zeros(height(dataBaseline),1);

%%
% To make the predictions, set the same macroeconomic conditions (baseline,
% adverse, or severely adverse) for all combinations of score groups and
% number of years on books.

% Predict the baseline probabilities of default
dataBaseline.GDP(:) = dataMacroStress.GDP('Baseline');
dataBaseline.Market(:) = dataMacroStress.Market('Baseline');
dataBaseline.PD = predict(ModelMacro,dataBaseline);

% Predict the probabilities of default in the adverse scenario
dataAdverse = dataBaseline;
dataAdverse.GDP(:) = dataMacroStress.GDP('Adverse');
dataAdverse.Market(:) = dataMacroStress.Market('Adverse');
dataAdverse.PD = predict(ModelMacro,dataAdverse);

% Predict the probabilities of default in the severely adverse scenario
dataSevere = dataBaseline;
dataSevere.GDP(:) = dataMacroStress.GDP('Severe');
dataSevere.Market(:) = dataMacroStress.Market('Severe');
dataSevere.PD = predict(ModelMacro,dataSevere);

%%
% Visualize the average predicted probability of default across score
% groups under the three alternative regulatory scenarios. Here, all score
% groups are implicitly weighted equally. However, predictions can also be
% made at a loan level for any given portfolio to make the predicted
% default rates consistent with the actual distribution of loans in the
% portfolio. The same visualization can be produced for each score group
% separately.
PredPDYOB = zeros(NumYOB,3); PredPDYOB(:,1) = mean(reshape(dataBaseline.PD,NumYOB,NumScoreGroups),2); PredPDYOB(:,2) = mean(reshape(dataAdverse.PD,NumYOB,NumScoreGroups),2); PredPDYOB(:,3) = mean(reshape(dataSevere.PD,NumYOB,NumScoreGroups),2); figure; bar(PredPDYOB*100); xlabel('Years on Books') ylabel('Predicted Default Rate (%)') legend('Baseline','Adverse','Severe') title('Stress Test, Probability of Default') grid on %% References % % # Generalized Linear Models documentation: % http://www.mathworks.com/help/stats/generalized-linear-regression.html. % # Generalized Linear Mixed Effects Models documentation: % http://www.mathworks.com/help/stats/generalized-linear-mixed-effects-models.html. % # Federal Reserve, Comprehensive Capital Analysis and Review (CCAR): % http://www.federalreserve.gov/bankinforeg/ccar.htm. % # Bank of England, Stress Testing: % http://www.bankofengland.co.uk/financialstability/pages/fpc/stresstest.aspx. % # European Banking Authority, EU-Wide Stress Testing: % http://www.eba.europa.eu/risk-analysis-and-data/eu-wide-stress-testing.