Exploring Multidimensional Data
The Romanian Academy, Romanian Academy Library
Abstract: Data exploration is a set of methods for describing and analyzing multidimensional data, used in any area where the data are too numerous to be comprehended by the human mind. Some of the methods help reveal relationships that may exist between different data and produce statistical summaries that describe the information contained succinctly. Others regroup the data to reveal their homogeneous subsets, so that they can be better understood and defined.
Multidimensional exploratory methods are descriptive and mostly geometric. They rely on one major mathematical tool, matrix algebra, and do not assume a probabilistic model a priori. These methods mainly allow the processing and synthesis of large data tables by estimating the correlations between the variables studied; the statistical tools used are the correlation matrix or the variance-covariance matrix.
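To make the two statistical tools named above concrete, the following is a minimal sketch of how a variance-covariance matrix and a correlation matrix can be computed from a data table. The table values and the function names `covariance_matrix` and `correlation_matrix` are hypothetical illustrations, not taken from the paper, and the code uses only the Python standard library; in practice a numerical library (for example NumPy's `cov` and `corrcoef`) would normally be used.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample variance-covariance matrix of a table (rows = individuals, columns = variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Entry (i, j) is the sum of products of centered columns i and j, divided by n - 1.
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def correlation_matrix(cov):
    """Rescale a covariance matrix by the standard deviations to obtain correlations."""
    p = len(cov)
    std = [sqrt(cov[j][j]) for j in range(p)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(p)] for i in range(p)]

# Hypothetical toy table: 5 individuals, 3 variables.
# The second variable is exactly twice the first, so their correlation is 1.
table = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
    [5.0, 10.0, 2.0],
]

cov = covariance_matrix(table)
corr = correlation_matrix(cov)
```

The correlation matrix is the covariance matrix of the standardized variables, which is why the choice between the two matters when variables are measured on very different scales.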
An exploratory approach allows the data analyst to address one of the main objectives of data mining, namely exploring multidimensional data and reducing their dimension: graphical representation, deduction of representative subsets of variables, or derivation of a set of components as a preliminary to other methods.
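As an illustration of dimension reduction through a set of components, the sketch below extracts the first principal axis of a small table and projects the individuals onto it. It is only one possible reading of the idea: the data are hypothetical, the helper names (`center`, `covariance`, `leading_axis`) are invented for this example, and the leading eigenvector is obtained with plain power iteration rather than a full eigendecomposition, to keep the code within the standard library.

```python
from math import sqrt

def center(rows):
    """Subtract the column means from every row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(centered):
    """Sample variance-covariance matrix of an already centered table."""
    n, p = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(matrix, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def leading_axis(cov, iterations=100):
    """Power iteration: unit eigenvector of cov with the largest eigenvalue."""
    v = [1.0] * len(cov)  # assumed start vector, not orthogonal to the leading axis
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(cov, v)))
    return v, eigenvalue

# Hypothetical toy table: 5 individuals, 3 variables.
table = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
    [5.0, 10.0, 2.0],
]

centered = center(table)
cov = covariance(centered)
axis, eigenvalue = leading_axis(cov)

# Coordinates of each individual on the first component (3 variables reduced to 1).
scores = [sum(c * a for c, a in zip(row, axis)) for row in centered]

# Share of the total variance carried by the first component.
explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))
```

For this toy table the first component carries roughly 95% of the total variance, so a single coordinate per individual summarizes the three original variables with little loss. The scores can then be plotted or passed on to other methods.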
The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 quintillion (2.5×10^18) bytes of data were created. As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes of data [7,16]. Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance and business informatics.
Keywords: Canonical Correlation Analysis, Multiple Correspondence Analysis, Correspondence Analysis, Canonical Discriminant Analysis, Principal Component Analysis.
- BACCINI; BESSE, P.: Data mining / Exploration Statistique. Toulouse: INSA, 2010.
- BENZÉCRI, J.-P.: Histoire et Préhistoire de l’Analyse des données: Partie 5. Les Cahiers de l’analyse des données, vol. 2, no. 1, 1977, pp. 9-40.
- ENĂCHESCU: Data Mining – metode şi aplicaţii. Bucureşti: Editura Academiei Române, 2009, 277 p.
- FALGUEROLLES: L’analyse des données: before and around. Electronic Journal for History of Probability and Statistics, vol. 4, no. 2, Dec. 2008.
- FILIP: Decizie asistată de calculator: decizii, decidenţi – metode de bază şi instrumente informatice asociate. 2nd ed., rev. Bucureşti: Editura Tehnică, 2005, 376 p.
- FILIP: Sisteme suport pentru decizii. 2nd ed., rev. Bucureşti: Editura Tehnică, 2007, 364 p.
- FRANCIS: Future telescope array drives development of exabyte processing. 2012 (http://arstechnica.com/science/2012/04/future-telescope-array-drives-development-of-exabyte-processing/, accessed 2012-12-18).
- GORUNESCU: Data Mining, Concepts, Models and Techniques. Springer-Heidelberg, series Intelligent Systems Reference Library, 2011, 372 p.
- HASTIE, T.; TIBSHIRANI, R.; FRIEDMAN, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer-Verlag, Springer Series in Statistics, 2008, 763 p.
- HILBERT; LOPEZ: The World’s Technological Capacity to Store, Communicate, and Compute Information. Science, Vol. 332, No. 6025, Apr. 2011, pp. 60-65.
- IBM: Bringing big data to the enterprise (http://www-01.ibm.com/software/data/bigdata/, accessed 2012-12-18).
- MĂRGINEAN: Sisteme inteligente pentru asistarea deciziilor. Editura Risoprint, Cluj-Napoca, 2006, 239 p.
- PENG; KOU; SHI, Y.; CHEN, Z.: A descriptive framework for the field of data mining and knowledge discovery. International Journal of Information Technology & Decision Making, Vol. 7, No. 4, 2008, pp. 639-682.
- TAN, P-N.; STEINBACH; KUMAR: Introduction to Data Mining. Addison-Wesley, 2006, 769 p.
- TUFFERY: Data mining et statistique décisionnelle, 3ème Edition. Editions TECHNIP, 2010, 705 p.
- WATTERS: The Age of Exabytes: Tools and Approaches for Managing Big Data. Hewlett-Packard Development Company, 2010 (http://readwrite.com/2012/03/05/big-data, accessed 2012-12-18).
- WU; KUMAR, V. (ed.): The Top Ten Algorithms in Data Mining. Chapman & Hall / CRC DMKD Series, 2009, 232 p.
This work is licensed under a Creative Commons Attribution 4.0 International License.