Institutul Național de Cercetare – Dezvoltare în Informatică, ICI București
Abstract: Although clustering is probably the most frequently used tool for mining gene expression data, existing clustering approaches face at least one of the following problems in this domain: a huge number of variables (genes) compared to the number of samples, high noise levels, the inability to deal naturally with overlapping clusters, the instability of the resulting clusters with respect to the initialization of the algorithm, or the difficulty of clustering genes and samples simultaneously. In this paper we show that these problems (except perhaps the first) can be dealt with elegantly in two ways. First, we use nonnegative matrix factorization to cluster genes and samples simultaneously while allowing biclusters to overlap. Second, we employ positive tensor factorization to perform a two-way meta-clustering of the biclusters produced in several different clustering runs, thereby addressing the above-mentioned instability. Applied to a large lung cancer dataset, our approach proved computationally tractable and perfectly recovered the histological classification of the cancer subtypes represented in the dataset.
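The abstract does not specify which NMF variant is used or how biclusters are read off the factors. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the genes × samples matrix X has already been made nonnegative and factors it as X ≈ WH using the standard Lee–Seung multiplicative updates for the squared Frobenius loss. It then reads one bicluster per factor by keeping the genes (rows of W) and samples (columns of H) whose loading is at least a fraction `frac` of that factor's largest loading. The function names `nmf`, `biclusters`, `matmul` and `transpose`, the thresholding rule, `frac=0.5`, and the toy matrix are all hypothetical choices made for illustration.

```python
import random


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python product of A (n x p) and B (p x m)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(X, k, iters=1000, seed=0, eps=1e-12):
    """Approximate nonnegative X (genes x samples) as W @ H, W and H >= 0.

    Lee-Seung multiplicative updates for the squared Frobenius loss.
    Starting from strictly positive factors keeps every entry nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def biclusters(W, H, frac=0.5):
    """Hypothetical read-out rule: one bicluster per factor.

    Keeps the genes and samples whose loading is at least `frac` times the
    factor's largest loading. A gene or sample may pass the threshold for
    several factors, so the resulting biclusters can overlap.
    """
    result = []
    for a in range(len(H)):
        w_col = [row[a] for row in W]
        genes = tuple(i for i, v in enumerate(w_col) if v >= frac * max(w_col))
        samples = tuple(j for j, v in enumerate(H[a]) if v >= frac * max(H[a]))
        result.append((genes, samples))
    return result


if __name__ == "__main__":
    # Invented toy data: genes 0-1 are active in samples 0-1, genes 2-3 in
    # samples 2-3, and gene 4 is active everywhere (it belongs to both blocks).
    X = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5],
         [5, 5, 5, 5]]
    W, H = nmf(X, k=2)
    print(sorted(biclusters(W, H)))
```

Gene 4 in the toy matrix is active in every sample, so it can be assigned to both biclusters. This is the kind of overlap the abstract says NMF handles naturally. Different `seed` values may return the factors in a different order or, on harder data, reach different local optima. That run-to-run variability is the instability the abstract proposes to address by meta-clustering the biclusters of several runs with positive tensor factorization, which is not sketched here.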
Keywords: bioinformatics, data mining, gene expression data analysis, clustering, meta-clustering.
CITATION DETAILS FOR THIS ARTICLE:
Alexandru Adriana, Ianculescu Marilena, Jitaru Elena, Pârvan Monica, Clustering and Meta-clustering Gene Expression Data with Positive Matrix, Revista Română de Informatică şi Automatică (Romanian Journal of Information Technology and Automatic Control), ISSN 1220-1758, vol. 16(4), pp. 19-28, 2006.