%% Time Series Regression of Airline Passenger Data
% This example shows how to analyze time series data using
% Statistics and Machine Learning Toolbox(TM) features.

% Copyright 2005-2014 The MathWorks, Inc.

%% Air Passenger Data
% First we create an array of monthly counts of airline passengers,
% measured in thousands, for the period January 1949 through December 1960.

%     1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
y = [  112  115  145  171  196  204  242  284  315  340  360  417   % Jan
       118  126  150  180  196  188  233  277  301  318  342  391   % Feb
       132  141  178  193  236  235  267  317  356  362  406  419   % Mar
       129  135  163  181  235  227  269  313  348  348  396  461   % Apr
       121  125  172  183  229  234  270  318  355  363  420  472   % May
       135  149  178  218  243  264  315  374  422  435  472  535   % Jun
       148  170  199  230  264  302  364  413  465  491  548  622   % Jul
       148  170  199  242  272  293  347  405  467  505  559  606   % Aug
       136  158  184  209  237  259  312  355  404  404  463  508   % Sep
       119  133  162  191  211  229  274  306  347  359  407  461   % Oct
       104  114  146  172  180  203  237  271  305  310  362  390   % Nov
       118  140  166  194  201  229  278  306  336  337  405  432 ];% Dec

% Source:
% Hyndman, R.J., Time Series Data Library,
% http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/.
% Copied in October, 2005.

%% Create Time Series Object
% When we create a time series object, we can keep the time information
% along with the data values. We have monthly data, so we create an array
% of dates and use it along with the Y data to create the time series
% object.

yr = repmat((1949:1960),12,1);
mo = repmat((1:12)',1,12);
time = datestr(datenum(yr(:),mo(:),1));
ts = timeseries(y(:),time,'name','AirlinePassengers');
ts.TimeInfo.Format = 'dd-mmm-yyyy';
tscol = tscollection(ts);
plot(ts)

%% Examine Trend and Seasonality
% This series seems to have a strong seasonal component, with a trend that
% may be linear or quadratic. Furthermore, the magnitude of the seasonal
% variation increases as the general level increases.
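%%
% The reasoning behind the upcoming log transform: if the series behaves
% multiplicatively, y(t) = trend(t)*season(t), then on the log scale
% log y(t) = log trend(t) + log season(t), so the seasonal swings become
% additive with roughly constant amplitude. A tiny synthetic sketch of the
% effect (the |tDemo| and |yDemo| names are illustrative only, not part of
% this analysis):
%
%   tDemo = (1:48)';
%   yDemo = (100 + 2*tDemo) .* (1 + 0.3*sin(2*pi*tDemo/12));
%   subplot(2,1,1), plot(tDemo, yDemo)        % swings grow with the level
%   subplot(2,1,2), plot(tDemo, log(yDemo))   % swings roughly constant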
%%
% Perhaps a log transformation would make the seasonal variation more
% constant. First we'll change the axis scale.

h_gca = gca;
h_gca.YScale = 'log';

%%
% It appears that it would be easier to model the seasonal component on the
% log scale. We'll create a new time series with a log transformation.

tscol = addts(tscol, log(ts.data), 'logAirlinePassengers');
logts = tscol.logAirlinePassengers;

%%
% Now let's plot the yearly averages, with monthly deviations superimposed.
% We want to see if the month-to-month variation within years appears
% constant. For these manipulations, which treat the data as a matrix in a
% month-by-year format, it's more convenient to operate on the original
% data matrix.

t = reshape(datenum(time),12,12);
logy = log(y);
ymean = repmat(mean(logy),12,1);
ydiff = logy - ymean;
x = yr + (mo-1)/12;
plot(x,ymean,'b-',x,ymean+ydiff,'r-')
title('Monthly variation within year')
xlabel('Year')

%%
% Now let's reverse the years and months, and try to see if the
% year-to-year trend is constant for each month.

h_gca = gca;
h_gca.Position = [0.13 0.58 0.78 0.34];
subplot(2,1,2);
t = reshape(datenum(time),12,12);
mmean = repmat(mean(logy,2),1,12);
mdiff = logy - mmean;
x = mo + (yr-min(yr(:)))/12;
plot(x',mmean','b-',x',(mmean+mdiff)','r-')
title('Yearly trend within month')
xlabel('Month')

%% Model Trend and Seasonality
% Let's attempt to model this series as a linear trend plus a seasonal
% component. The design matrix pairs twelve monthly indicator columns from
% |dummyvar| (in effect, one intercept per month) with a linear time
% column.

subplot(1,1,1);
X = [dummyvar(mo(:)), logts.time];
[b,bint,resid] = regress(logts.data, X);
tscol = addts(tscol,X*b,'Fit1');
plot(logts)
hold on
plot(tscol.Fit1,'Color','r')
hold off
legend('Data','Fit','location','NW')

%%
% Based on this graph, the fit appears to be good. The differences between
% the actual data and the fitted values may well be small enough for our
% purposes.
%
% But let's try to investigate this some more. We would like the residuals
% to look independent.
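%%
% A quick numeric check of independence is the lag-1 sample correlation of
% the residuals: values near zero are consistent with independence. This
% is an informal check rather than a formal test:
%
%   corr(resid(1:end-1), resid(2:end))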
%%
% If there is autocorrelation (correlation between adjacent residuals),
% then there may be an opportunity to model that and make our fit better.
% Let's create a time series from the residuals and plot it.

tscol = addts(tscol,resid,'Resid1');
plot(tscol.Resid1)

%%
% The residuals do not look independent. In fact, the correlation between
% adjacent residuals looks quite strong. We can test this formally using a
% Durbin-Watson test.

[p,dw] = dwtest(tscol.Resid1.data, X)

%%
% A low p-value for the Durbin-Watson statistic is an indication that the
% residuals are correlated across time. A typical cutoff for hypothesis
% tests is to decide that p<0.05 is significant. Here the very small
% p-value gives strong evidence that the residuals are correlated.
%
% We can attempt to change the model to remove the autocorrelation.
% The general shape of the curve is high in the middle and low at the ends.
% This suggests that we should allow for a quadratic trend term. However,
% it also appears that autocorrelation will remain after we add this term.
% Let's try it.

X = [dummyvar(mo(:)), logts.time, logts.time.^2];
[b2,bint,resid2] = regress(logts.data, X);
tscol = addts(tscol,resid2,'Resid2');
plot(tscol.Resid2);
[p,dw] = dwtest(tscol.Resid2.data, X)

%%
% Adding the squared term did remove the pronounced curvature in the
% original residual plot, but both the plot and the new Durbin-Watson test
% show that there is still significant correlation in the residuals.
%
% Autocorrelation like this could be the result of other causes that are
% not captured in our X variable. Perhaps we could collect other data that
% would help us improve our model and reduce the correlation. In the
% absence of other data, we might simply add another parameter to the model
% to represent the autocorrelation. Let's do that, removing the squared
% term, and using an autoregressive model for the error.
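%%
% As a rule of thumb, the Durbin-Watson statistic is approximately
% 2*(1-rho), where rho is the lag-1 correlation of the residuals: values
% near 2 suggest little autocorrelation, and values near 0 suggest strong
% positive autocorrelation. We can compute the approximation directly and
% compare it with the |dw| output above:
%
%   2*(1 - corr(resid2(1:end-1), resid2(2:end)))   % rough approximation to dw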
%%
% In an autoregressive process, we have two stages:
%
%   Y(t) = X(t,:)*b + r(t)        % regression model for original data
%   r(t) = rho * r(t-1) + u(t)    % autoregressive model for residuals
%
% Unlike in the usual regression model, where we would like the residual
% series |r(t)| to be a set of independent values, this model allows the
% residuals to follow an autoregressive model with its own error term
% |u(t)| that consists of independent values.
%
% To create this model, we want to write an anonymous function |f| to
% compute fitted values |Yfit|, so that |Y-Yfit| gives the u values:
%
%   Yfit(t) = rho*Y(t-1) + (X(t,:) - rho*X(t-1,:))*b
%
% In this anonymous function we combine |[rho; b]| into a single parameter
% vector |c|. The resulting residuals look much closer to an uncorrelated
% series.

r = corr(resid(1:end-1),resid(2:end));   % initial guess for rho
X = [dummyvar(mo(:)), logts.time];
Y = logts.data;
f = @(c,x) [Y(1); c(1)*Y(1:end-1) + (x(2:end,:) - c(1)*x(1:end-1,:))*c(2:end)];
c = nlinfit(X,Y,f,[r;b]);
u = Y - f(c,X);
tscol = addts(tscol,u,'ResidU');
plot(tscol.ResidU);

%% Summary
% This example provides an illustration of how to use the MATLAB(R)
% timeseries object along with features from the Statistics and Machine
% Learning Toolbox. It is simple to use the |ts.data| notation to extract
% the data and supply it as input to any function. The |controlchart|
% function also accepts time series objects directly.
%
% More elaborate analyses are possible by using features specifically
% designed for time series, such as those in Econometrics Toolbox(TM) and
% System Identification Toolbox(TM).
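%%
% As a closing aside (assuming R2013b or later, when |fitlm| and |table|
% were introduced), the seasonal regression above could also be fit with
% |fitlm|, which returns a model object with diagnostics built in. This is
% a sketch, not part of the original analysis:
%
%   tbl = table(categorical(mo(:)), logts.time, logts.data, ...
%               'VariableNames', {'Month','Time','LogY'});
%   mdl = fitlm(tbl, 'LogY ~ Month + Time');
%   plotResiduals(mdl, 'lagged')   % residual(t) vs. lagged residual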