SSP'05 IEEE/SP 13th Workshop on Statistical Signal Processing
July 17-20, 2005, Bordeaux, France


Information regarding the paper

Title
Estimating the number of clusters in microarray data sets based on an information theoretic criterion
Author(s)
Daniel Nicorici Tampere University of Technology
Jaakko Astola Tampere University of Technology
Olli Yli-Harja Tampere University of Technology

Abstract

This study focuses on an information-theoretic approach for estimating the number of clusters, K, in microarray data sets. We present an automatic method for estimating K based on a particular version of the Normalized Maximum Likelihood (NML) model. The strength of Minimum Description Length (MDL) methods, such as the NML model, in statistical inference is their ability to find the model structure, which in this clustering problem amounts to finding the best number of clusters and the best cluster structure for the data. The models are compared using the NML code length. The study introduces a new method, based on the NML model, for computing the code length of the encoded clustering vector for the data samples. Experiments with publicly available microarray data sets demonstrate the ability of the new method to find biologically meaningful clusters.
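As a rough illustration of what an NML code length for a clustering vector can look like, the sketch below computes the stochastic complexity of a label vector under a plain multinomial NML model. It uses the exact sum for the two-category parametric complexity and the linear recurrence of Kontkanen and Myllymäki for larger K. This is an assumption: the abstract does not say which version of NML the paper uses, so the multinomial model and the function names `multinomial_complexity` and `nml_label_code_length` are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def multinomial_complexity(K: int, n: int) -> float:
    """Parametric complexity C(K, n) of the multinomial NML model (hypothetical helper).

    C(1, n) = 1; C(2, n) is an exact sum over binary splits; larger K uses
    the recurrence C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    """
    if K < 1 or n < 0:
        raise ValueError("need K >= 1 and n >= 0")
    if K == 1 or n == 0:
        return 1.0
    # C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h), computed in log space
    # so large binomial coefficients do not overflow; 0^0 is taken as 1.
    c2 = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    prev, cur = 1.0, c2  # C(1, n), C(2, n)
    for k in range(1, K - 1):
        prev, cur = cur, cur + (n / k) * prev  # cur becomes C(k + 2, n)
    return cur


def nml_label_code_length(labels: Sequence[Hashable], K: Optional[int] = None) -> float:
    """NML code length, in nats, of a cluster-label vector (assumed multinomial model).

    K defaults to the number of distinct labels actually used.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    if K is None:
        K = len(counts)
    if K < len(counts):
        raise ValueError("K is smaller than the number of distinct labels")
    # Negative maximized log-likelihood: the ML estimates are the frequencies h_k / n.
    neg_log_ml = -sum(h * math.log(h / n) for h in counts.values())
    # Add the log parametric complexity, i.e. the normalizer of the NML distribution.
    return neg_log_ml + math.log(multinomial_complexity(K, n))
```

For example, `nml_label_code_length([0, 0, 1, 1])` returns 4 ln 2 + ln C(2, 4), with C(2, 4) = 3.21875. In a complete MDL criterion this label cost would be added to the code length of the expression data given the clusters, and K would be chosen to minimize the total; that data term is not described in the abstract and is left out of the sketch.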

©2005 IEEE
Edition: Télécom Paris, 2005