A Well Organized Phrase-Based Document Clustering Using ASCII Values and Adjacency List

Lukka, Srikanth; Shaik, Rizwana

doi:10.1007/978-3-319-60618-7_12

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition

Abstract

Document clustering, also referred to as text clustering, is the process of grouping similar documents together according to a chosen similarity function. Informative features such as phrases and their weights are considered important for efficient document clustering. This paper addresses two key parts of achieving efficient document clustering. The first part is a phrase-based document model called the Document Adjacency List, which describes how a phrase-based model of the document set is constructed. It enables efficient phrase matching, which is used to determine the similarity among documents. The second part is a document clustering algorithm that builds on the Document Adjacency List and clusters documents according to the similarity measure. Combining the two parts leads to a better calculation of similarity among documents, and that similarity in turn is used to cluster the documents.
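
The abstract does not give the data structure or the algorithm in detail, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that an adjacency list maps each word to the words that follow it, that a shared two-word phrase is an edge present in both documents, that similarity is the Jaccard overlap of edge sets, and that clustering is a single pass with a fixed threshold. All function names, the similarity measure, and the threshold value are hypothetical, and the ASCII-value component named in the title is not modelled here.

```python
from collections import defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in sentence.lower())
    return cleaned.split()


def build_adjacency_list(documents):
    """Build word -> next word -> set of doc ids containing that word pair.

    documents: dict mapping a doc id to a list of sentence strings.
    """
    adjacency = defaultdict(lambda: defaultdict(set))
    for doc_id, sentences in documents.items():
        for sentence in sentences:
            words = tokenize(sentence)
            for current, following in zip(words, words[1:]):
                adjacency[current][following].add(doc_id)
    return adjacency


def edges_of(adjacency, doc_id):
    """Return every (word, next word) pair that occurs in the given document."""
    return {
        (word, following)
        for word, successors in adjacency.items()
        for following, doc_ids in successors.items()
        if doc_id in doc_ids
    }


def phrase_similarity(adjacency, doc_a, doc_b):
    """Jaccard overlap of two documents' word-pair sets (an assumed measure)."""
    edges_a = edges_of(adjacency, doc_a)
    edges_b = edges_of(adjacency, doc_b)
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def cluster_documents(documents, threshold=0.3):
    """Single-pass clustering: join the first cluster with a similar member.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    adjacency = build_adjacency_list(documents)
    clusters = []
    for doc_id in documents:
        for members in clusters:
            if any(phrase_similarity(adjacency, doc_id, m) >= threshold
                   for m in members):
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])  # no similar cluster found, start a new one
    return clusters


if __name__ == "__main__":
    docs = {
        "d1": ["River rafting trips are fun."],
        "d2": ["Wild river rafting trips."],
        "d3": ["Stock markets fell sharply today."],
    }
    print(cluster_documents(docs))
```

In this toy input, `d1` and `d2` share the word pairs "river rafting" and "rafting trips", so they fall into one cluster, while `d3` shares no pair with either and forms its own. A fuller model would presumably also track longer phrases and phrase weights, which the abstract mentions but this sketch omits.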

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Vadlamudi, Guntur, India
Srikanth Lukka & Rizwana Shaik

Corresponding authors

Correspondence to Srikanth Lukka or Rizwana Shaik.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
VIT University, Vellore, Tamil Nadu, India
Aswani Kumar Cherukuri
School of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal
Ana Maria Madureira
Universiti Teknikal Malaysia Melaka, Durian Tunggal, Malaysia
Azah Kamilah Muda

About this paper

Cite this paper

Lukka, S., Shaik, R. (2018). A Well Organized Phrase-Based Document Clustering Using ASCII Values and Adjacency List. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_12

DOI: https://doi.org/10.1007/978-3-319-60618-7_12
Published: 19 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60617-0
Online ISBN: 978-3-319-60618-7
eBook Packages: Engineering, Engineering (R0)
