A Well Organized Phrase-Based Document Clustering Using ASCII Values and Adjacency List

  • Conference paper
Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Abstract

Document clustering is the process of grouping similar documents according to a chosen similarity function; it is also referred to as text clustering. Informative features such as phrases and their weights are considered especially important for efficient document clustering. This paper deals with two key parts of achieving efficient document clustering. The first part is a phrase-based document model, named the Document Adjacency List, which describes how a phrase-based model of the document set is constructed. It provides efficient phrase matching, which is used to determine the similarity among documents. The second part is a proposed document clustering algorithm that enhances the Document Adjacency List for clustering based on the similarity measure. Combining the two parts leads to a better calculation of similarity among documents, and that similarity in turn drives the clustering of the documents.
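
The abstract does not give the construction details, so the following is only a minimal sketch of the general idea of phrase matching over a word-level adjacency list, not the authors' Document Adjacency List. All names (`tokenize`, `build_adjacency`, `shared_phrases`, `edge_similarity`), the sample documents, and the Jaccard-style similarity are assumptions made for illustration. The ASCII-value component named in the title is not described in the abstract and is left out.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into sentences on '.', then into lowercase words."""
    sentences = []
    for raw in text.split("."):
        words = raw.lower().split()
        if words:
            sentences.append(words)
    return sentences


def build_adjacency(docs):
    """Return adj[word][next_word] -> set of ids of documents with that pair."""
    adj = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for words in tokenize(text):
            for a, b in zip(words, words[1:]):
                adj[a][b].add(doc_id)
    return adj


def shared_phrases(adj, docs, d1, d2):
    """Maximal word runs in d1 whose every adjacent word pair also occurs in d2.

    Simplification (assumption): pairs are checked independently, so a run is
    not guaranteed to appear contiguously in d2.
    """
    phrases = set()
    for words in tokenize(docs[d1]):
        run = []
        for a, b in zip(words, words[1:]):
            if d2 in adj.get(a, {}).get(b, ()):
                if not run:
                    run = [a]
                run.append(b)
            else:
                if len(run) >= 2:
                    phrases.add(" ".join(run))
                run = []
        if len(run) >= 2:
            phrases.add(" ".join(run))
    return phrases


def edge_similarity(adj, d1, d2):
    """Jaccard-style ratio: word pairs shared by both docs / pairs in either."""
    shared = total = 0
    for successors in adj.values():
        for doc_ids in successors.values():
            in1, in2 = d1 in doc_ids, d2 in doc_ids
            if in1 or in2:
                total += 1
                if in1 and in2:
                    shared += 1
    return shared / total if total else 0.0


# Hypothetical sample documents, used only to exercise the sketch.
docs = {
    "d1": "River rafting is fun. We love river rafting trips.",
    "d2": "Wild river rafting trips are booked early.",
}
adj = build_adjacency(docs)
print(shared_phrases(adj, docs, "d1", "d2"))
print(edge_similarity(adj, "d1", "d2"))
```

A pairwise similarity of this kind could then be passed to any similarity-based clustering routine; which routine the paper uses is not stated in the abstract.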

Author information

Corresponding authors

Correspondence to Srikanth Lukka or Rizwana Shaik.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Lukka, S., Shaik, R. (2018). A Well Organized Phrase-Based Document Clustering Using ASCII Values and Adjacency List. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_12

  • DOI: https://doi.org/10.1007/978-3-319-60618-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60617-0

  • Online ISBN: 978-3-319-60618-7

  • eBook Packages: Engineering, Engineering (R0)
