Biomarker Identification by Feature Wrappers

Momiao Xiong,1 Xiangzhong Fang, and Jinying Zhao

Human Genetics Center, University of Texas–Houston, Houston, Texas 77225, USA

Abstract

Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype. These studies open new avenues for identifying complex disease genes and biomarkers for disease diagnosis and for assessing drug efficacy and toxicity. However, the majority of analytical methods applied to gene expression data are not efficient for biomarker identification and disease diagnosis. In this paper, we propose a general framework that incorporates feature (gene) selection into pattern recognition in order to identify biomarkers. Using this framework, we develop three feature wrappers, built around linear discriminant analysis, logistic regression, and support vector machines, respectively. Each wrapper searches through the space of feature subsets, using the classification error of the classifier being “wrapped around” as the measure of goodness for a particular feature subset. To carry out this computationally intensive search effectively, we employ sequential forward search and sequential forward floating search algorithms. To evaluate the performance of feature selection for biomarker identification, we applied the proposed methods to three data sets. The preliminary results demonstrate that very high classification accuracy can be attained by the identified composite classifiers with several biomarkers.
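The wrapper idea described in the abstract can be illustrated with a minimal sketch of sequential forward search. This is not the authors' implementation: a simple nearest-centroid classifier stands in for linear discriminant analysis, logistic regression, or support vector machines, leave-one-out error is assumed as the error estimate, and the function names and toy expression values are hypothetical. The floating variant, which also tries removing previously selected features, is not shown.

```python
# Sketch of a feature wrapper: sequential forward search scored by the
# leave-one-out error of a wrapped classifier. The nearest-centroid
# classifier is an assumed stand-in, chosen only to keep the example
# dependency-free.

def centroid_predict(train_X, train_y, x, features):
    """Predict the class whose centroid is closest to x on the given features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = 0.0
        for f in features:
            mean_f = sum(r[f] for r in rows) / len(rows)
            dist += (x[f] - mean_f) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, features):
    """Leave-one-out classification error for one feature subset."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if centroid_predict(train_X, train_y, X[i], features) != y[i]:
            errors += 1
    return errors / len(X)


def sequential_forward_search(X, y, max_features):
    """Greedily add the feature that most reduces the wrapped classifier's error."""
    selected, best_error = [], 1.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Score every candidate subset formed by adding one more feature.
        scored = [(loo_error(X, y, selected + [f]), f) for f in remaining]
        err, f = min(scored)
        # Stop once adding a feature no longer lowers the error.
        if selected and err >= best_error:
            break
        selected.append(f)
        remaining.remove(f)
        best_error = err
    return selected, best_error


# Hypothetical toy data: 6 samples x 3 genes; only gene index 1 separates the classes.
X = [
    [5.0, 1.0, 3.0],
    [1.0, 1.2, 7.0],
    [4.0, 0.8, 2.0],
    [5.1, 3.0, 3.1],
    [0.9, 3.2, 6.9],
    [4.2, 2.9, 2.2],
]
y = [0, 0, 0, 1, 1, 1]

selected, error = sequential_forward_search(X, y, max_features=2)
print(selected, error)
```

Because each candidate subset requires retraining and re-evaluating the wrapped classifier, the cost grows quickly with the number of genes, which is why a greedy search is used in place of exhaustive enumeration of subsets.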

Footnotes

  • 1 Corresponding author.

  • E-MAIL mxiong{at}utsph.sph.uth.tmc.edu; FAX (713) 500-0900.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.190001.

    • Received March 29, 2001.
    • Accepted August 23, 2001.