How to Detect a Small Cluster in Big Data?
Abstract
Detecting small clusters in a large amount of data is a difficult problem, especially when only a few samples are to be detected. General-purpose solutions for small-cluster detection exist, but they are often inadequate for specific data. Artificial Intelligence techniques have been proposed because they require few or no a priori assumptions about the data distributions. The volume and high dimensionality of big data make it too complex to be processed and analyzed by traditional methods. Hierarchical Self-Organizing Maps (HSOM) can improve decision making through an approach based on the specialization of Self-Organizing Maps (SOM), dimensionality reduction, and the visualization of clusters. The goal is to propose a methodology to detect and visualize small clusters in data, illustrated with a toy case in which traditional human-based approaches are impossible or too complex to apply. The results clearly demonstrate that the HSOM-based method outperforms the most widely adopted traditional methods, revealing a number of small clusters hidden in the data.
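The abstract does not spell out how the HSOM is built. The sketch below is a minimal, hypothetical illustration, assuming one common reading of "specialization of SOMs": train a small SOM, then train a child SOM only on the samples captured by each well-populated unit, and flag leaf units that capture just a handful of samples as candidate small clusters. The function names (`train_som`, `hierarchical_som`, `small_clusters`), map sizes, learning schedule and thresholds (`min_size`, `max_hits`) are illustrative assumptions, not the authors' method.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinate (row, col) of the codebook vector closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM with the online Kohonen update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit with a copy of a randomly drawn sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking Gaussian neighbourhood
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            t += 1
    return weights


def hierarchical_som(data, rows=2, cols=2, min_size=10, depth=0, max_depth=2, seed=0):
    """Train a SOM, then a child SOM on the samples of each well-populated unit."""
    weights = train_som(data, rows, cols, seed=seed)
    groups = {u: [] for u in weights}
    for x in data:
        groups[best_matching_unit(weights, x)].append(x)
    node = {"weights": weights,
            "hits": {u: len(g) for u, g in groups.items()},
            "children": {}}
    if depth < max_depth:
        for i, (u, g) in enumerate(groups.items()):
            if len(g) >= min_size:
                # Specialise: the child map models only the data this unit captured.
                node["children"][u] = hierarchical_som(
                    g, rows, cols, min_size, depth + 1, max_depth, seed + i + 1)
    return node


def small_clusters(node, max_hits, path=()):
    """Return (path, unit, hits) for leaf units that captured 1..max_hits samples."""
    found = []
    for unit, hits in node["hits"].items():
        child = node["children"].get(unit)
        if child is not None:
            found.extend(small_clusters(child, max_hits, path + (unit,)))
        elif 0 < hits <= max_hits:
            found.append((path, unit, hits))
    return found


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data: one large blob plus a few far-away samples.
    big = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    small = [[rng.gauss(10, 0.3), rng.gauss(10, 0.3)] for _ in range(5)]
    tree = hierarchical_som(big + small)
    for path, unit, hits in small_clusters(tree, max_hits=8):
        print(path, unit, hits)
```

Each level only refines the units that already hold many samples, so sparsely populated units stay as leaves and surface as candidates; `max_hits` trades sensitivity against false alarms and would need tuning on real data.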
DOI: http://dx.doi.org/10.18803/capsi.v14.162-173