Abstract
Document clustering, also referred to as text clustering, is the process of grouping similar documents according to a chosen similarity function. Informative features such as phrases and their weights are considered important for efficient document clustering. This paper deals with two key parts of achieving efficient document clustering. The first part is a phrase-based document model named the Document Adjacency List, which describes the construction of a phrase-based model of the document set and produces efficient phrase matching that is useful for deciding the similarity among documents. The second part is a document clustering algorithm, proposed to extend the Document Adjacency List so that documents are clustered based on the similarity measure. Combining the two parts leads to a better calculation of similarity among documents, and that similarity in turn drives the document clustering.
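The two parts described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the structure of the Document Adjacency List or the similarity measure, so the sketch assumes an adjacency list that maps each word to its successor words, uses the fraction of shared adjacent word pairs as a stand-in for phrase matching, and groups documents with a simple single-pass threshold rule. The names `build_adjacency_list`, `pair_similarity`, `cluster`, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict


def build_adjacency_list(docs):
    """Map each word to {successor word: set of doc ids containing that pair}."""
    adj = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        words = text.lower().split()
        # Record every pair of consecutive words as an edge tagged with doc_id.
        for first, second in zip(words, words[1:]):
            adj[first][second].add(doc_id)
    return adj


def pair_similarity(adj, doc_a, doc_b):
    """Jaccard overlap of adjacent word pairs (an assumed similarity measure)."""
    shared = total = 0
    for successors in adj.values():
        for doc_ids in successors.values():
            in_a, in_b = doc_a in doc_ids, doc_b in doc_ids
            if in_a or in_b:
                total += 1
                if in_a and in_b:
                    shared += 1
    return shared / total if total else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the first cluster whose seed is similar enough."""
    adj = build_adjacency_list(docs)
    clusters = []
    for doc_id in docs:
        for members in clusters:
            # members[0] is the seed document of the cluster.
            if pair_similarity(adj, doc_id, members[0]) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


docs = {
    "d1": "river rafting is fun",
    "d2": "river rafting is risky",
    "d3": "stock markets fell sharply",
}
print(cluster(docs))  # [['d1', 'd2'], ['d3']]
```

In this toy input, d1 and d2 share two of their four distinct adjacent word pairs, giving a similarity of 0.5, so they fall into one cluster, while d3 shares none and starts its own. Matching on adjacent pairs alone does not guarantee that a longer phrase occurs contiguously in both documents; a fuller model would also track sentence positions.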
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Lukka, S., Shaik, R. (2018). A Well Organized Phrase-Based Document Clustering Using ASCII Values and Adjacency List. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_12
DOI: https://doi.org/10.1007/978-3-319-60618-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)