%% Bayesian Analysis for a Logistic Regression Model
% This example shows how to make Bayesian inferences for a logistic
% regression model using |slicesample|.
%
% Statistical inferences are usually based on maximum likelihood
% estimation (MLE). MLE chooses the parameters that maximize the likelihood
% of the data, and is intuitively appealing. In MLE, parameters are assumed
% to be unknown but fixed, and are estimated with some confidence. In
% Bayesian statistics, the uncertainty about the unknown parameters is
% quantified using probability so that the unknown parameters are regarded
% as random variables.
%
%   Copyright 2005-2014 The MathWorks, Inc.

%% Bayesian Inference
% Bayesian inference is the process of analyzing statistical models with
% the incorporation of prior knowledge about the model or model parameters.
% The root of such inference is Bayes' theorem:
%
% $$P(\mathrm{parameters|data}) = \frac
% {P(\mathrm{data|parameters}) \times P(\mathrm{parameters})}
% {P(\mathrm{data})}
% \propto \mathrm{likelihood} \times \mathrm{prior}$$
%
% For example, suppose we have normal observations
%
% $$X|\theta \sim N(\theta, \sigma^2)$$
%
% where sigma is known and the prior distribution for theta is
%
% $$\theta \sim N(\mu, \tau^2)$$
%
% In this formula mu and tau, sometimes known as hyperparameters, are also
% known. If we observe |n| samples of |X|, we can obtain the posterior
% distribution for theta as
%
% $$\theta|X \sim N\left(\frac{\tau^2}{\sigma^2/n + \tau^2} \bar X +
% \frac{\sigma^2/n}{\sigma^2/n + \tau^2} \mu,
% \frac{(\sigma^2/n)\times \tau^2}{\sigma^2/n + \tau^2}\right)$$
%
% The following graph shows the prior, likelihood, and posterior for theta.

rng(0,'twister');

n = 20;
sigma = 50;
x = normrnd(10,sigma,n,1);
mu = 30;
tau = 20;
theta = linspace(-40, 100, 500);
y1 = normpdf(mean(x),theta,sigma/sqrt(n));
y2 = normpdf(theta,mu,tau);
postMean = tau^2*mean(x)/(tau^2+sigma^2/n) + sigma^2*mu/n/(tau^2+sigma^2/n);
postSD = sqrt(tau^2*sigma^2/n/(tau^2+sigma^2/n));
y3 = normpdf(theta, postMean,postSD);
plot(theta,y1,'-', theta,y2,'--', theta,y3,'-.')
legend('Likelihood','Prior','Posterior')
xlabel('\theta')

%% Car Experiment Data
% In some simple problems such as the previous normal mean inference
% example, it is easy to figure out the posterior distribution in a closed
% form. But in general problems that involve non-conjugate priors, the
% posterior distributions are difficult or impossible to compute
% analytically. We will consider logistic regression as an example. This
% example involves an experiment to help model the proportion of cars of
% various weights that fail a mileage test. The data include observations
% of weight, number of cars tested, and number failed. We will work with
% a transformed version of the weights to reduce the correlation in our
% estimates of the regression parameters.

% A set of car weights
weight = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
weight = (weight-2800)/1000;     % recenter and rescale
% The number of cars tested at each weight
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
% The number of cars that have poor mpg performances at each weight
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';

%% Logistic Regression Model
% Logistic regression, a special case of a generalized linear model,
% is appropriate for these data since the response variable is binomial.
% The logistic regression model can be written as:
%
% $$P(\mathrm{failure}) = \frac{e^{Xb}}{1+e^{Xb}}$$
%
% where X is the design matrix and b is the vector containing the model
% parameters. In MATLAB(R), we can write this equation as:

logitp = @(b,x) exp(b(1)+b(2).*x)./(1+exp(b(1)+b(2).*x));
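%%
% As a point of reference before the Bayesian analysis, it can be useful
% to look at the maximum likelihood estimates of the intercept and slope.
% The following sketch is not part of the original analysis; it uses the
% Statistics and Machine Learning Toolbox function |glmfit|, which fits a
% binomial model with a logit link by default.

bMLE = glmfit(weight,[poor total],'binomial')   % MLE of [intercept; slope]

%%
% With the fairly diffuse priors used below, the Bayesian posterior
% summaries computed later in this example should roughly agree with
% these values.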
%%
% If you have some prior knowledge or some non-informative priors are
% available, you could specify the prior probability distributions for the
% model parameters. For instance, in this example, we use normal priors
% for the intercept |b1| and slope |b2|, i.e.

prior1 = @(b1) normpdf(b1,0,20);    % prior for intercept
prior2 = @(b2) normpdf(b2,0,20);    % prior for slope

%%
% By Bayes' theorem, the joint posterior distribution of the model
% parameters is proportional to the product of the likelihood and priors.

post = @(b) prod(binopdf(poor,total,logitp(b,weight))) ...  % likelihood
            * prior1(b(1)) * prior2(b(2));                  % priors

%%
% Note that the normalizing constant for the posterior in this model is
% analytically intractable. However, even without knowing the normalizing
% constant, you can visualize the posterior distribution, if you know the
% approximate range of the model parameters.

b1 = linspace(-2.5, -1, 50);
b2 = linspace(3, 5.5, 50);
simpost = zeros(50,50);
for i = 1:length(b1)
    for j = 1:length(b2)
        simpost(i,j) = post([b1(i), b2(j)]);
    end
end
mesh(b2,b1,simpost)
xlabel('Slope')
ylabel('Intercept')
zlabel('Posterior density')
view(-110,30)

%%
% This posterior is elongated along a diagonal in the parameter space,
% indicating that, after we look at the data, we believe that the
% parameters are correlated. This is interesting, since before we
% collected any data we assumed they were independent. The correlation
% comes from combining our prior distribution with the likelihood function.

%% Slice Sampling
% Monte Carlo methods are often used in Bayesian data analysis to summarize
% the posterior distribution. The idea is that, even if you cannot compute
% the posterior distribution analytically, you can generate a random sample
% from the distribution and use these random values to estimate the
% posterior distribution or derived statistics such as the posterior mean,
% median, standard deviation, etc. Slice sampling is an algorithm designed
% to sample from a distribution with an arbitrary density function, known
% only up to a constant of proportionality -- exactly what is needed for
% sampling from a complicated posterior distribution whose normalization
% constant is unknown. The algorithm does not generate independent samples,
% but rather a Markovian sequence whose stationary distribution is the
% target distribution. Thus, the slice sampler is a Markov Chain Monte
% Carlo (MCMC) algorithm. However, it differs from other well-known MCMC
% algorithms because only the scaled posterior need be specified -- no
% proposal or marginal distributions are needed.
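%%
% To build intuition for how the sampler works, here is a minimal sketch
% of one iteration of univariate slice sampling with "stepping out" and
% "shrinkage" (Neal, 2003). This is an illustration only, not the
% implementation used by |slicesample|; |f|, |x0|, and |w| are names
% introduced here for an unnormalized density, the current state, and an
% initial bracket width.

f  = @(x) exp(-x.^2/2);         % any unnormalized target density
x0 = 0.5;                       % current state of the chain
w  = 2;                         % initial bracket width
y  = rand * f(x0);              % auxiliary level drawn under the density
L  = x0 - rand*w;               % randomly position a bracket around x0
R  = L + w;
while f(L) > y, L = L - w; end  % step out left until outside the slice
while f(R) > y, R = R + w; end  % step out right until outside the slice
xnew = L + rand*(R-L);          % propose uniformly within the bracket
while f(xnew) <= y              % shrink the bracket until the proposal
    if xnew < x0, L = xnew; else, R = xnew; end   % lands inside the slice
    xnew = L + rand*(R-L);
end

%%
% Repeating this update (one coordinate at a time for multivariate
% targets) produces a Markov chain whose stationary distribution is the
% normalized version of |f|.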
%
% This example shows how to use the slice sampler as part of a Bayesian
% analysis of the mileage test logistic regression model, including
% generating a random sample from the posterior distribution for the model
% parameters, analyzing the output of the sampler, and making inferences
% about the model parameters. The first step is to generate a random
% sample.

initial = [1 1];
nsamples = 1000;
trace = slicesample(initial,nsamples,'pdf',post,'width',[20 2]);

%% Analysis of Sampler Output
% After obtaining a random sample from the slice sampler, it is important
% to investigate issues such as convergence and mixing, to determine
% whether the sample can reasonably be treated as a set of random
% realizations from the target posterior distribution. Looking at marginal
% trace plots is the simplest way to examine the output.

subplot(2,1,1)
plot(trace(:,1))
ylabel('Intercept');
subplot(2,1,2)
plot(trace(:,2))
ylabel('Slope');
xlabel('Sample Number');

%%
% It is apparent from these plots that the effects of the parameter
% starting values take a while to disappear (perhaps fifty or so samples)
% before the process begins to look stationary.
%
% It is also helpful in checking for convergence to use a moving window
% to compute statistics such as the sample mean, median, or standard
% deviation for the sample. This produces a smoother plot than the raw
% sample traces, and can make it easier to identify and understand any
% non-stationarity.

movavg = filter( (1/50)*ones(50,1), 1, trace);
subplot(2,1,1)
plot(movavg(:,1))
xlabel('Number of samples')
ylabel('Means of the intercept');
subplot(2,1,2)
plot(movavg(:,2))
xlabel('Number of samples')
ylabel('Means of the slope');

%%
% Because these are moving averages over a window of 50 iterations, the
% first 50 values are not comparable to the rest of the plot. However, the
% remainder of each plot seems to confirm that the parameter posterior
% means have converged to stationarity after 100 or so iterations. It is
% also apparent that the two parameters are correlated with each other, in
% agreement with the earlier plot of the posterior density.
%
% Since the settling-in period represents samples that cannot reasonably
% be treated as random realizations from the target distribution, it's
% probably advisable not to use the first 50 or so values at the beginning
% of the slice sampler's output. You could just delete those rows of the
% output; however, it's also possible to specify a "burn-in" period. This
% is convenient when a suitable burn-in length is already known, perhaps
% from previous runs.

trace = slicesample(initial,nsamples,'pdf',post, ...
                    'width',[20 2],'burnin',50);
subplot(2,1,1)
plot(trace(:,1))
ylabel('Intercept');
subplot(2,1,2)
plot(trace(:,2))
ylabel('Slope');

%%
% These trace plots do not seem to show any non-stationarity, indicating
% that the burn-in period has done its job.
%
% However, there is a second aspect of the trace plots that should also be
% explored. While the trace for the intercept looks like high frequency
% noise, the trace for the slope appears to have a lower frequency
% component, indicating there is autocorrelation between values at
% adjacent iterations. We could still compute the mean from this
% autocorrelated sample, but it is often convenient to reduce the storage
% requirements by removing redundancy in the sample. If this eliminated
% the autocorrelation, it would also allow us to treat this as a sample of
% independent values. For example, you can thin out the sample by keeping
% only every 10th value.

trace = slicesample(initial,nsamples,'pdf',post,'width',[20 2], ...
                    'burnin',50,'thin',10);
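%%
% As an additional, informal convergence check (a sketch that is not part
% of the original analysis), you could run a second chain from a different
% starting point and compare the two sets of posterior means; if both
% chains have converged, the estimates should roughly agree. The starting
% point [-2 5] is an arbitrary choice within the parameter range plotted
% earlier.

trace2 = slicesample([-2 5],nsamples,'pdf',post,'width',[20 2], ...
                     'burnin',50,'thin',10);
[mean(trace); mean(trace2)]   % rows: chain 1 and chain 2 posterior means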
%%
% To check the effect of this thinning, it is useful to estimate the
% sample autocorrelation functions from the traces and use them to check
% if the samples mix rapidly.

F = fft(detrend(trace,'constant'));
F = F .* conj(F);
ACF = ifft(F);
ACF = ACF(1:21,:);                      % Retain lags up to 20.
ACF = real([ACF(1:21,1) ./ ACF(1,1) ...
            ACF(1:21,2) ./ ACF(1,2)]);  % Normalize.
bounds = sqrt(1/nsamples) * [2 ; -2];   % 95% CI for iid normal

labs = {'Sample ACF for intercept','Sample ACF for slope'};
for i = 1:2
    subplot(2,1,i)
    lineHandles = stem(0:20, ACF(:,i), 'filled', 'r-o');
    lineHandles.MarkerSize = 4;
    grid('on')
    xlabel('Lag')
    ylabel(labs{i})
    hold on
    plot([0.5 0.5 ; 20 20], [bounds([1 1]) bounds([2 2])], '-b');
    plot([0 20], [0 0], '-k');
    hold off
    a = axis;
    axis([a(1:3) 1]);
end

%%
% The autocorrelation values at the first lag are significant for the
% intercept parameter, and even more so for the slope parameter. We could
% repeat the sampling using a larger thinning parameter in order to reduce
% the correlation further. For the purposes of this example, however,
% we'll continue to use the current sample.

%% Inference for the Model Parameters
% As expected, a histogram of the sample mimics the plot of the posterior
% density.

subplot(1,1,1)
hist3(trace,[25,25]);
xlabel('Intercept')
ylabel('Slope')
zlabel('Posterior density')
view(-110,30)

%%
% You can use a histogram or a kernel smoothing density estimate to
% summarize the marginal distribution properties of the posterior samples.

subplot(2,1,1)
hist(trace(:,1))
xlabel('Intercept');
subplot(2,1,2)
ksdensity(trace(:,2))
xlabel('Slope');

%%
% You could also compute descriptive statistics such as the posterior mean
% or percentiles from the random samples (see the sketch at the end of
% this example). To determine if the sample size is large enough to
% achieve a desired precision, it is helpful to monitor the desired
% statistic of the traces as a function of the number of samples.

csum = cumsum(trace);
subplot(2,1,1)
plot(csum(:,1)'./(1:nsamples))
xlabel('Number of samples')
ylabel('Means of the intercept');
subplot(2,1,2)
plot(csum(:,2)'./(1:nsamples))
xlabel('Number of samples')
ylabel('Means of the slope');

%%
% In this case, it appears that the sample size of 1000 is more than
% sufficient to give good precision for the posterior mean estimate.

bHat = mean(trace)

%% Summary
% The Statistics and Machine Learning Toolbox(TM) offers a variety of
% functions that allow you to specify likelihoods and priors easily. They
% can be combined to derive a posterior distribution. The |slicesample|
% function enables you to carry out Bayesian analysis in MATLAB using
% Markov Chain Monte Carlo simulation. It can be used even in problems
% with posterior distributions that are difficult to sample from using
% standard random number generators.
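%%
% As a final sketch (not part of the original example), the posterior
% sample makes interval estimates and predictions straightforward. For
% instance, you could compute 95% credible intervals for the parameters
% with |prctile|, and a posterior distribution for the failure probability
% of a car of a given weight by pushing each posterior draw through
% |logitp|. The names |w3500| and |pFail| are introduced here for
% illustration, using a hypothetical 3500-lb car.

prctile(trace,[2.5 97.5])          % 95% credible intervals, one per column

w3500 = (3500-2800)/1000;          % transformed weight, as defined above
pFail = arrayfun(@(k) logitp(trace(k,:),w3500), (1:size(trace,1))');
mean(pFail)                        % posterior mean failure probability
prctile(pFail,[2.5 97.5])          % 95% credible interval for that probability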