Discovering Context Using Contextual Positional Regions Based on Chains of Frequent Terms in Text Documents

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 385)

Abstract

When assigning importance to terms in the Vector Space Model (VSM), weights are most often assigned to terms directly, without considering where in the document the terms occur. This way of assigning importance fails to capture the positional influence of terms in the document. To capture this positional influence, this paper proposes an algorithm, Dynamic Partitioning of Text Documents with Chains of Frequent Terms (DynaPart-CFT), that creates Contextual Positional Regions (CPRs). Based on the CPRs, Contextual Positional Influence (CPI) is calculated, which helps improve the F-measure in text categorization. This novel way of assigning importance to terms is evaluated on three standard text datasets. The performance improvement comes at the expense of a small additional storage cost.
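The abstract's central claim, that directly assigned weights ignore where a term occurs, can be illustrated with a small Python sketch. It is not DynaPart-CFT or the paper's CPI formula, neither of which is specified here: the function names (`tf_vector`, `equal_regions`, `positional_vector`, `f_measure`), the fixed equal-length regions, and the 1/(r + 1) decay per region are hypothetical assumptions chosen only to show how region-based weighting can separate documents that a bag-of-words vector treats as identical. The `f_measure` helper assumes the balanced F1 score, the harmonic mean of precision and recall.

```python
from collections import Counter


def tf_vector(tokens):
    """Position-blind weighting: raw term frequency; token order is ignored."""
    return dict(Counter(tokens))


def equal_regions(tokens, k):
    """Split tokens into k contiguous regions of near-equal length.

    Hypothetical stand-in for Contextual Positional Regions: the paper's
    DynaPart-CFT derives regions from chains of frequent terms instead.
    """
    n = len(tokens)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(k)]


def positional_vector(tokens, k=3):
    """Hypothetical position-aware weighting (NOT the paper's CPI).

    Each occurrence in region r (0-based) contributes 1 / (r + 1),
    so a term counts more when it appears earlier in the document.
    """
    weights = Counter()
    for r, region in enumerate(equal_regions(tokens, k)):
        for term in region:
            weights[term] += 1.0 / (r + 1)
    return dict(weights)


def f_measure(tp, fp, fn):
    """Balanced F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Same terms, different order.
doc_a = ["stocks", "fell", "weather", "sunny", "rain", "cold"]
doc_b = ["weather", "sunny", "rain", "cold", "stocks", "fell"]

print(tf_vector(doc_a) == tf_vector(doc_b))  # True: order is invisible
print(positional_vector(doc_a)["stocks"])    # 1.0: first region
print(positional_vector(doc_b)["stocks"])    # about 0.333: third region
print(f_measure(tp=8, fp=2, fn=2))           # about 0.8
```

Both documents contain the same six terms, so their raw term-frequency vectors are equal, while the hypothetical region-based weights differ because "stocks" falls in the first region of one document and the last region of the other. DynaPart-CFT, as described in the abstract, derives regions dynamically from chains of frequent terms rather than splitting at fixed offsets.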

Author information

Correspondence to Anagha Kulkarni.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kulkarni, A., Tokekar, V., Kulkarni, P. (2016). Discovering Context Using Contextual Positional Regions Based on Chains of Frequent Terms in Text Documents. In: Berretti, S., Thampi, S., Dasgupta, S. (eds) Intelligent Systems Technologies and Applications. Advances in Intelligent Systems and Computing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-319-23258-4_28

  • DOI: https://doi.org/10.1007/978-3-319-23258-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23257-7

  • Online ISBN: 978-3-319-23258-4

  • eBook Packages: Engineering, Engineering (R0)
