
Romanian Journal of Information Technology and Automatic Control / Vol. 23, No. 1, 2013


Exploring Multidimensional Data

Cornel LEPĂDATU

Abstract:

Data exploration comprises a set of methods for describing and analyzing multidimensional data, used in any field where the data are too numerous to be grasped by the human mind. Some of these methods help reveal relationships that may exist among the data and produce statistical summaries that describe the information they contain succinctly. Others regroup the data to reveal homogeneous subsets, making the data easier to understand and characterize. Multidimensional exploratory methods are descriptive and mostly geometric; they rest on a major mathematical tool, matrix algebra, and do not assume an a priori probabilistic model. These methods mainly support the processing and synthesis of large data tables by estimating the correlations between the variables studied, using the correlation matrix or the variance-covariance matrix as statistical tools. An exploratory approach allows the data analyst to address one of the main objectives of data mining, namely exploring multidimensional data and reducing its dimension: graphical representation, derivation of representative subsets of variables, or construction of a set of components that feeds into other methods. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s [10]; as of 2012, 2.5 quintillion (2.5×10¹⁸) bytes of data were created every day [11]. As of 2012, the size of data sets that could feasibly be processed in a reasonable amount of time was limited to the order of exabytes [7,16]. Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. These limitations also affect Internet search, finance and business informatics.
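
As an illustration of the correlation-matrix approach mentioned above, the following minimal sketch computes the Pearson correlation matrix of a small data table and extracts its leading principal component. It is not taken from the article: the data table `rows` is made up for illustration, the function names `correlation_matrix` and `leading_component` are hypothetical, and plain power iteration stands in for the full eigendecomposition that a numerical library would normally provide.

```python
import math


def correlation_matrix(rows):
    """Return the p x p Pearson correlation matrix of an n x p data table.

    Assumes no column is constant (otherwise its standard deviation is zero).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(p)]
    # Standardize every column to zero mean and unit variance.
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)]
            for a in range(p)]


def leading_component(matrix, iterations=500):
    """Approximate the dominant eigenvalue and unit eigenvector by power iteration.

    Assumes the uneven start vector is not orthogonal to the dominant
    eigenvector and that the largest eigenvalue is well separated.
    """
    p = len(matrix)
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v


# Hypothetical table: 6 observations of 3 variables (illustrative values only).
rows = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 3.0],
    [3.0, 6.2, 6.0],
    [4.0, 7.8, 2.0],
    [5.0, 10.1, 4.0],
    [6.0, 12.0, 5.5],
]

corr = correlation_matrix(rows)
eigenvalue, loadings = leading_component(corr)
# The trace of a correlation matrix equals p, so eigenvalue / p is the
# share of total variance carried by the first component.
explained = eigenvalue / len(corr)
print("first component loadings:", loadings)
print("share of variance explained:", explained)
```

Projecting the standardized observations onto `loadings` would give a one-dimensional summary of the table, which is the kind of dimension reduction the abstract describes. In practice a library routine for symmetric eigendecomposition would return all components at once.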

Keywords:
Canonical Correlation Analysis, Multiple Correspondence Analysis, Correspondence Analysis, Canonical Discriminant Analysis, Principal Component Analysis.


CITE THIS PAPER AS:
Cornel LEPĂDATU, "Exploring Multidimensional Data", Romanian Journal of Information Technology and Automatic Control, ISSN 1220-1758, vol. 23(1), pp. 13-30, 2013.