Abstract
In this paper, we propose an efficient data structure, sparse matrices, for representing documents. The document database can be represented using sparse matrices rather than dense matrices, and the resulting matrix can be given as input to the k-means algorithm. Using sparse matrices not only reduces the size of the database but also makes the program run more efficiently. The experimental results show that sparse matrices give good results compared with dense matrices.
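The abstract does not say which sparse storage format or implementation the authors used, so the following is only a minimal sketch of the general idea under stated assumptions: each document is stored as a mapping from term index to count (nonzero entries only), centroids are kept dense, and k-means is seeded with the first k documents. The function names `build_sparse_rows`, `sq_dist`, and `kmeans_sparse`, and the toy documents, are hypothetical and not taken from the paper.

```python
from collections import Counter

def build_sparse_rows(docs):
    """Return (vocab, rows); each row maps term index -> count, nonzeros only."""
    vocab, rows = {}, []
    for text in docs:
        row = {}
        for term, count in Counter(text.lower().split()).items():
            idx = vocab.setdefault(term, len(vocab))  # assign next free column
            row[idx] = float(count)
        rows.append(row)
    return vocab, rows

def sq_dist(row, centroid, centroid_sq_norm):
    """Squared Euclidean distance from a sparse row to a dense centroid.

    Uses ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so only the row's
    nonzero entries are visited.
    """
    x_sq = sum(v * v for v in row.values())
    dot = sum(v * centroid[j] for j, v in row.items())
    return x_sq - 2.0 * dot + centroid_sq_norm

def kmeans_sparse(rows, n_terms, k, n_iter=10):
    """Plain k-means on sparse rows; assumption: first k rows seed the centroids."""
    centroids = []
    for row in rows[:k]:
        c = [0.0] * n_terms
        for j, v in row.items():
            c[j] = v
        centroids.append(c)
    labels = None
    for _ in range(n_iter):
        norms = [sum(v * v for v in c) for c in centroids]
        new_labels = [
            min(range(k), key=lambda i: sq_dist(row, centroids[i], norms[i]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its assigned rows.
        sums = [[0.0] * n_terms for _ in range(k)]
        sizes = [0] * k
        for row, lab in zip(rows, labels):
            sizes[lab] += 1
            for j, v in row.items():
                sums[lab][j] += v
        for i in range(k):
            if sizes[i]:  # keep the old centroid if a cluster is empty
                centroids[i] = [s / sizes[i] for s in sums[i]]
    return labels

# Toy corpus (illustrative only, not the paper's data).
docs = [
    "sparse matrix storage",
    "document clustering kmeans",
    "sparse matrix format",
    "document clustering algorithm",
]
vocab, rows = build_sparse_rows(docs)
labels = kmeans_sparse(rows, n_terms=len(vocab), k=2)
stored = sum(len(r) for r in rows)      # entries kept by the sparse form
dense_cells = len(docs) * len(vocab)    # entries a dense matrix would hold
print(labels, stored, dense_cells)
```

Even on this tiny corpus the sparse form stores fewer entries than the dense matrix would, and the distance computation skips every zero entry. For real document collections, where each document uses a small fraction of the vocabulary, that gap is what the size and running-time claims in the abstract rest on.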
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Killani, R., Satapathy, S.C., Sowjanya, A.M. (2012). An Efficient Data Structure for Document Clustering Using K-Means Algorithm. In: Satapathy, S.C., Avadhani, P.S., Abraham, A. (eds) Proceedings of the International Conference on Information Systems Design and Intelligent Applications 2012 (INDIA 2012) held in Visakhapatnam, India, January 2012. Advances in Intelligent and Soft Computing, vol 132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27443-5_38
DOI: https://doi.org/10.1007/978-3-642-27443-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27442-8
Online ISBN: 978-3-642-27443-5
eBook Packages: Engineering, Engineering (R0)