www.gusucode.com > bioinfo 案例源码程序 matlab代码 > bioinfo/CharacterizeGenesInvolvedInTheYeastDiauxicShiftExample.m

    %% Characterize Genes Involved in the Yeast Diauxic Shift
% This example shows how to explore gene expression profiles using the
% attractor metagene algorithm [1]. This example uses data from the
% microarray study of gene expression in yeast reported in [2].

% Copyright 2015 The MathWorks, Inc.


%% Load Data Set
% The authors used DNA microarrays to study temporal gene expression of
% almost all genes in Saccharomyces cerevisiae during the metabolic shift
% from fermentation to respiration. Expression levels were measured at
% seven time points during the diauxic shift. The full data set can be
% downloaded from the Gene Expression Omnibus website,
% http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28.

%%
% Start by loading the data into MATLAB(R). The MAT-file |yeastdata.mat|
% contains the |VALUE| data or |LOG_RAT2N_MEAN|, or log2 of ratio of
% |CH2DN_MEAN| and |CH1DN_MEAN| from the seven time steps in the
% experiment, the names of the genes, and an array of the times at which
% the expression levels were measured.
clear all
load yeastdata
whos yeastvalues genes
%%
% |genes| is a cell array of the gene names. You can access the entries using
% MATLAB cell array indexing:
genes{414}
%%
% This indicates that the 414th row of the variable |yeastvalues| contains
% expression levels for the ORF YEL011W. 

%%
% A simple plot can be used to show the expression profile for this ORF.
plot(times, yeastvalues(414,:))
xlabel('Time (Hours)');
ylabel('Expression Level');

%% Clean Data Set
% The data set contains several spots marked as |'EMPTY'|. These are empty
% spots on the microarray, and, for the purpose of this example, let's
% consider these points to be noises and remove them.
emptySpots = strcmp('EMPTY',genes);
yeastvalues = yeastvalues(~emptySpots,:);
genes = genes(~emptySpots);
fprintf('Number of genes after removing empty spots is %d.\n',numel(genes))
%%
% The data set also contains several missing values, marked as |NaN|.
sum(sum(isnan(yeastvalues)))
%%
% Impute these missing values using the |knnimpute| function which replaces
% missing data with the corresponding value from an average of the k
% columns that are nearest. Here, k is set to 3.
yeastvalues = knnimpute(yeastvalues, 3);
sum(sum(isnan(yeastvalues)))

%% Filter Genes
% The data set contains a lot of information corresponding to genes that do
% not show any interesting changes during the experiment. To make it easier
% to find the interesting genes, reduce the size of the data set by
% removing genes with expression profiles that are flat. This flat data
% indicates that genes associated with these profiles are not significantly
% affected by the diauxic shift. In this example, you are interested in
% genes with large changes in expression accompanying the shift.

%%
% You can use the |genevarfilter| function to filter out genes with small
% variance over time. This function removes rows of |yeastvalues| with
% variance less than the 10th percentile.
[mask, yeastvalueshighexp, geneshighexp] = genevarfilter(yeastvalues,genes);
numel(geneshighexp)
%%
% The function |genelowvalfilter| removes genes that have very low absolute
% expression values. In this example, a fairly large value is used to
% filter the genes.
[mask, yeastvalueshighexp, geneshighexp] = genelowvalfilter(yeastvalueshighexp,geneshighexp,'absval',log2(3));
numel(geneshighexp)
%%
% Use the |geneentropyfilter| function to remove genes whose profiles have
% low entropy.
[mask, yeastvalueshighexp, geneshighexp] = geneentropyfilter(yeastvalueshighexp,geneshighexp,'prctile',15);
numel(geneshighexp) 

%% Explore Gene Profiles using the Attractor Metagene Algorithm
% Use the |metafeatures| function to look for all possible metagenes by
% setting the |'Start'| name-value pair to the |'robust'| option.
rng('default'); %For reproducibility
[m,w,gs,gsI] = metafeatures(yeastvalueshighexp,geneshighexp,'Start','robust');

%%
% The algorithm returns two unique metagenes for the data set. You can tell
% this by checking the number of rows of the variable |m|.
size(m)
%%
% The sorted list of genes for each metagene is returned in the |gs|
% variable. Genes that contributed most to the discovered metagene are
% listed at the top, that is, genes with larger weights. The first column
% of |gs| contains genes for the first metagene and the second column for
% the second metagene.
size(gs)
%%
% Plot the expression profiles of top 20 genes of each metagene.
figure
for i = 1:2
    subplot(1,2,i);
    for j = 1:20
        predG = find(strcmp(geneshighexp,gs(j,i)));
        plot(times,yeastvalueshighexp(predG,:));
        xlabel('Time (Hours)');
        ylabel('Expression Level');
        hold on
    end
end
suptitle('Expression Profiles of Top 20 Genes from Each Metagene')
%%
% From this figure, we can observe the top-ranked genes from the first
% metagene were upregulated during the diauxic shift, and those from the
% second metagene were downregulated. This implies that the attractor
% metagene algorithm was able to differentiate and identify genes
% associated with two distinct biomolecular events based on their
% expression profiles.

%% 
% Display the first 20 ORFs of the first metagene.
gs(1:20,1)

%%
% Functions of many of these ORFs are unknown at this point, but among
% those 20 ORFs, genes associated with some of these ORFs were reported by
% [2] to be upregulated during the shift. For instance, the expression
% levels of GLC3, which encodes |YEL011W|, 1,4-alpha-glucan branching
% enzyme known to be involved in starch and sucrose metabolism, and SDH
% genes, which encode |YDR178W| and |YLL041C| involving in the citric acid
% cycle, increased during the diauxic shift [2].

%%
% Display the first 20 ORFs of the second metagene.
gs(1:20,2)
%%
% This list contains several genes related to protein synthesis, including
% ribosomal proteins (|YMR229C|, |YJL177W|, |YDR060W|, |YKL082C|),
% elongation and translation (|YPL226W|, |YLR009W|). This is in congruent
% with the findings reported in [2].

%% References
% [1] Cheng, W-Y., Ou Yang, T-H., and Anastassiou, D. (2013).
%     Biomolecular events in cancer revealed by attractor metagenes. PLoS
%     Computational Biology 9.2: e1002920.
%
% [2] DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997). Exploring the
%     metabolic and genetic control of gene expression on a genomic scale.
%     Science, 278(5338):680-6.