www.gusucode.com > bioinfo 案例源码程序 matlab代码 > bioinfo/CharacterizeGenesInvolvedInTheYeastDiauxicShiftExample.m
%% Characterize Genes Involved in the Yeast Diauxic Shift % This example shows how to explore gene expression profiles using the % attractor metagene algorithm [1]. This example uses data from the % microarray study of gene expression in yeast reported in [2]. % Copyright 2015 The MathWorks, Inc. %% Load Data Set % The authors used DNA microarrays to study temporal gene expression of % almost all genes in Saccharomyces cerevisiae during the metabolic shift % from fermentation to respiration. Expression levels were measured at % seven time points during the diauxic shift. The full data set can be % downloaded from the Gene Expression Omnibus website, % http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28. %% % Start by loading the data into MATLAB(R). The MAT-file |yeastdata.mat| % contains the |VALUE| data or |LOG_RAT2N_MEAN|, or log2 of ratio of % |CH2DN_MEAN| and |CH1DN_MEAN| from the seven time steps in the % experiment, the names of the genes, and an array of the times at which % the expression levels were measured. clear all load yeastdata whos yeastvalues genes %% % |genes| is a cell array of the gene names. You can access the entries using % MATLAB cell array indexing: genes{414} %% % This indicates that the 414th row of the variable |yeastvalues| contains % expression levels for the ORF YEL011W. %% % A simple plot can be used to show the expression profile for this ORF. plot(times, yeastvalues(414,:)) xlabel('Time (Hours)'); ylabel('Expression Level'); %% Clean Data Set % The data set contains several spots marked as |'EMPTY'|. These are empty % spots on the microarray, and, for the purpose of this example, let's % consider these points to be noises and remove them. emptySpots = strcmp('EMPTY',genes); yeastvalues = yeastvalues(~emptySpots,:); genes = genes(~emptySpots); fprintf('Number of genes after removing empty spots is %d.\n',numel(genes)) %% % The data set also contains several missing values, marked as |NaN|. sum(sum(isnan(yeastvalues))) %% % Impute these missing values using the |knnimpute| function which replaces % missing data with the corresponding value from an average of the k % columns that are nearest. Here, k is set to 3. yeastvalues = knnimpute(yeastvalues, 3); sum(sum(isnan(yeastvalues))) %% Filter Genes % The data set contains a lot of information corresponding to genes that do % not show any interesting changes during the experiment. To make it easier % to find the interesting genes, reduce the size of the data set by % removing genes with expression profiles that are flat. This flat data % indicates that genes associated with these profiles are not significantly % affected by the diauxic shift. In this example, you are interested in % genes with large changes in expression accompanying the shift. %% % You can use the |genevarfilter| function to filter out genes with small % variance over time. This function removes rows of |yeastvalues| with % variance less than the 10th percentile. [mask, yeastvalueshighexp, geneshighexp] = genevarfilter(yeastvalues,genes); numel(geneshighexp) %% % The function |genelowvalfilter| removes genes that have very low absolute % expression values. In this example, a fairly large value is used to % filter the genes. [mask, yeastvalueshighexp, geneshighexp] = genelowvalfilter(yeastvalueshighexp,geneshighexp,'absval',log2(3)); numel(geneshighexp) %% % Use the |geneentropyfilter| function to remove genes whose profiles have % low entropy. [mask, yeastvalueshighexp, geneshighexp] = geneentropyfilter(yeastvalueshighexp,geneshighexp,'prctile',15); numel(geneshighexp) %% Explore Gene Profiles using the Attractor Metagene Algorithm % Use the |metafeatures| function to look for all possible metagenes by % setting the |'Start'| name-value pair to the |'robust'| option. rng('default'); %For reproducibility [m,w,gs,gsI] = metafeatures(yeastvalueshighexp,geneshighexp,'Start','robust'); %% % The algorithm returns two unique metagenes for the data set. You can tell % this by checking the number of rows of the variable |m|. size(m) %% % The sorted list of genes for each metagene is returned in the |gs| % variable. Genes that contributed most to the discovered metagene are % listed at the top, that is, genes with larger weights. The first column % of |gs| contains genes for the first metagene and the second column for % the second metagene. size(gs) %% % Plot the expression profiles of top 20 genes of each metagene. figure for i = 1:2 subplot(1,2,i); for j = 1:20 predG = find(strcmp(geneshighexp,gs(j,i))); plot(times,yeastvalueshighexp(predG,:)); xlabel('Time (Hours)'); ylabel('Expression Level'); hold on end end suptitle('Expression Profiles of Top 20 Genes from Each Metagene') %% % From this figure, we can observe the top-ranked genes from the first % metagene were upregulated during the diauxic shift, and those from the % second metagene were downregulated. This implies that the attractor % metagene algorithm was able to differentiate and identify genes % associated with two distinct biomolecular events based on their % expression profiles. %% % Display the first 20 ORFs of the first metagene. gs(1:20,1) %% % Functions of many of these ORFs are unknown at this point, but among % those 20 ORFs, genes associated with some of these ORFs were reported by % [2] to be upregulated during the shift. For instance, the expression % levels of GLC3, which encodes |YEL011W|, 1,4-alpha-glucan branching % enzyme known to be involved in starch and sucrose metabolism, and SDH % genes, which encode |YDR178W| and |YLL041C| involving in the citric acid % cycle, increased during the diauxic shift [2]. %% % Display the first 20 ORFs of the second metagene. gs(1:20,2) %% % This list contains several genes related to protein synthesis, including % ribosomal proteins (|YMR229C|, |YJL177W|, |YDR060W|, |YKL082C|), % elongation and translation (|YPL226W|, |YLR009W|). This is in congruent % with the findings reported in [2]. %% References % [1] Cheng, W-Y., Ou Yang, T-H., and Anastassiou, D. (2013). % Biomolecular events in cancer revealed by attractor metagenes. PLoS % Computational Biology 9.2: e1002920. % % [2] DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997). Exploring the % metabolic and genetic control of gene expression on a genomic scale. % Science, 278(5338):680-6.