Robustified MANOVA with applications in detecting differentially expressed genes from oligonucleotide arrays

Bioinformatics. 2008 Apr 15;24(8):1056-62. doi: 10.1093/bioinformatics/btn053. Epub 2008 Mar 3.

Abstract

Motivation: Oligonucleotide arrays such as Affymetrix GeneChips use multiple probes, or a probe set, to measure the abundance of mRNA of every gene of interest. Some analysis methods summarize the multiple observations into a single score before conducting further analysis such as detecting differentially expressed genes (DEG), clustering and classification. However, this data reduction risks losing a significant amount of information and consequently reaching inaccurate or even incorrect conclusions.

Results: We developed a novel statistical method called robustified multivariate analysis of variance (MANOVA), based on the traditional MANOVA model and a permutation test, to detect DEG for both one-way and two-way cases. It can be extended to detect some special patterns of gene expression through profile analysis across k (≥2) populations. The method utilizes probe-level data and requires no assumptions about the distribution of the dataset. We also propose a method of estimating the null distribution using quantile normalization, in contrast to the 'pooling' method (Section 3.1). Monte Carlo simulation and real data analysis are conducted to demonstrate the performance of the proposed method compared with the 'pooling' method and the usual Analysis of Variance (ANOVA) test based on the summarized scores. It is found that the new method successfully detects DEG at the desired false discovery rate and is more powerful than the competing method, especially when the number of groups is small.
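
To illustrate the general idea of a permutation-based multivariate test on probe-level data, the following is a minimal sketch for one probe set in the one-way case. It is not the robustified MANOVA of the paper: the test statistic is a trace-based pseudo-F ratio, swapped in here because it stays defined when there are more probes than arrays, and the function names (pseudo_f, permutation_pvalue), the simulated data and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def pseudo_f(X, labels):
    """Trace-based pseudo-F: between-group over within-group sum of squares.

    X has one row per array and one column per probe of a single probe set.
    This statistic is an illustrative stand-in, not the paper's statistic.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = X.shape[0], groups.size
    # Total sum of squares around the overall probe means.
    total = ((X - X.mean(axis=0)) ** 2).sum()
    # Within-group sum of squares around each group's probe means.
    within = 0.0
    for g in groups:
        Xg = X[labels == g]
        within += ((Xg - Xg.mean(axis=0)) ** 2).sum()
    between = total - within
    return (between / (k - 1)) / (within / (n - k))


def permutation_pvalue(X, labels, n_perm=999, seed=0):
    """Estimate a p-value by shuffling group labels across arrays."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(X, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if pseudo_f(X, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (exceed + 1) / (n_perm + 1)


# Simulated example (hypothetical data): 12 arrays, 4 probes, two groups,
# with the second group shifted upward on every probe.
rng = np.random.default_rng(1)
labels = np.array([0] * 6 + [1] * 6)
X = rng.normal(size=(12, 4))
X[labels == 1] += 3.0
p_value = permutation_pvalue(X, labels)
print(p_value)
```

Because the reference distribution is built by reshuffling the group labels, the sketch makes no normality assumption, which mirrors the distribution-free property stated above. Controlling the false discovery rate across many genes would require a further multiple-testing step that is not shown here.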

Availability: The robustified MANOVA package can be downloaded from http://faculty.ucr.edu/~xpcui/software

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Analysis of Variance
  • Artifacts*
  • Data Interpretation, Statistical*
  • Gene Expression Profiling / methods*
  • Multivariate Analysis
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software*