Romanian Journal of Information Technology and Automatic Control / Vol. 16, No. 4, 2006
Clustering and Meta-clustering Gene Expression Data with Positive Matrix
Liviu BADEA, Doina ȚILIVEA
Although clustering is probably the most frequently used tool for data mining gene expression data, existing clustering approaches face at least one of the following problems in this domain: a huge number of variables (genes) compared with the number of samples, high noise levels, the inability to deal naturally with overlapping clusters, the instability of the resulting clusters with respect to the initialization of the algorithm, and/or the difficulty of clustering genes and samples simultaneously. In this paper we show that these problems (except perhaps the first) can be dealt with elegantly in two steps. First, nonnegative matrix factorizations are used to cluster genes and samples simultaneously while allowing for bicluster overlaps. Second, Positive Tensor Factorization is employed to perform a two-way meta-clustering of the biclusters produced in several different clustering runs, thereby addressing the above-mentioned instability. The application of our approach to a large lung cancer dataset proved computationally tractable and was able to perfectly recover the histological classification of the various cancer subtypes represented in the dataset.
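The abstract does not spell out the factorization algorithm, so the following is only a minimal sketch of how a nonnegative matrix factorization can cluster genes and samples at once. It assumes the standard multiplicative updates for the squared-error objective, which may differ from the authors' method. The function names `nmf`, `sample_clusters`, and `gene_memberships`, and the 0.5 membership threshold, are hypothetical choices made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (genes x samples) as A @ S, with A, S >= 0.

    Assumption: plain multiplicative updates minimizing the squared error.
    Different seeds can give different factors, which is the instability
    that meta-clustering over several runs is meant to address.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    A = rng.random((n_genes, k)) + eps   # gene loadings per cluster
    S = rng.random((k, n_samples)) + eps # cluster activity per sample
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative and does not increase the error.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S

def sample_clusters(S):
    """Assign each sample to the cluster with the largest coefficient."""
    return S.argmax(axis=0)

def gene_memberships(A, threshold=0.5):
    """Boolean genes x clusters matrix; a gene may belong to several clusters.

    Hypothetical rule: a gene belongs to every cluster whose loading is at
    least `threshold` times that gene's largest loading, so overlaps are allowed.
    """
    scaled = A / (A.max(axis=1, keepdims=True) + 1e-12)
    return scaled >= threshold
```

Pairing column `c` of `gene_memberships(A)` with the samples labelled `c` by `sample_clusters(S)` gives one bicluster per factor. Repeating the factorization with several seeds would produce the collection of biclusters that a meta-clustering step could then combine.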
Keywords: bioinformatics, data mining, gene expression data analysis, clustering, meta-clustering.
CITE THIS PAPER AS:
Liviu BADEA, Doina ȚILIVEA, "Clustering and Meta-clustering Gene Expression Data with Positive Matrix", Romanian Journal of Information Technology and Automatic Control, ISSN 1220-1758, vol. 16(4), pp. 19-28, 2006.