Discovering Context Using Contextual Positional Regions Based on Chains of Frequent Terms in Text Documents

Kulkarni, Anagha; Tokekar, Vrinda; Kulkarni, Parag

doi:10.1007/978-3-319-23258-4_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 385)

Abstract

When importance is assigned to terms in the Vector Space Model (VSM), weights are most often assigned to the terms directly. This way of assigning importance fails to capture the positional influence of terms in the document. To capture this influence, this paper proposes an algorithm, Dynamic Partitioning of Text Documents with Chains of Frequent Terms (DynaPart-CFT), that creates Contextual Positional Regions (CPRs). Based on the CPRs, a Contextual Positional Influence (CPI) is calculated, which helps improve the F-measure in text categorization. This novel way of assigning importance to terms is evaluated on three standard text datasets. The performance improvement comes at the expense of a small additional storage cost.
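
To make the contrast between position-blind and position-aware weighting concrete, the following is a minimal sketch and not the DynaPart-CFT algorithm, whose construction of CPRs is not described in this abstract. It assumes the document is split into equal-sized regions and that each region carries a hand-picked weight; the function names `plain_tf` and `positional_tf` and the parameter `region_weights` are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def plain_tf(tokens):
    """Position-blind term frequency, as in a basic vector space model."""
    return Counter(tokens)


def positional_tf(tokens, region_weights):
    """Term frequency in which each occurrence is scaled by its region's weight.

    ASSUMPTION: regions are fixed, equal-sized slices of the document and the
    weights are chosen by hand. The paper instead derives regions dynamically
    from chains of frequent terms, which this sketch does not attempt.
    """
    n = len(tokens)
    k = len(region_weights)
    weights = defaultdict(float)
    for i, tok in enumerate(tokens):
        # Map token index i to one of k equal-sized regions.
        region = min(i * k // n, k - 1)
        weights[tok] += region_weights[region]
    return dict(weights)


if __name__ == "__main__":
    doc = "markets fell sharply today analysts said markets may recover"
    tokens = doc.lower().split()
    # Hypothetical weights: occurrences in the first third count double.
    print(plain_tf(tokens)["markets"])
    print(positional_tf(tokens, [2.0, 1.0, 1.0])["markets"])
```

In this sketch a term that appears once early and once late in the document receives a different weight from a term that appears twice late, a distinction that plain term frequency cannot express.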

Author information

Authors and Affiliations

Cummins COE for Women, Pune, India
Anagha Kulkarni
Institute of Engineering and Technology, DAVV, Indore, India
Vrinda Tokekar
EkLAT, Pune, India
Parag Kulkarni

Corresponding author

Correspondence to Anagha Kulkarni.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell'Informazione (DINFO), Università degli Studi di Firenze, Firenze, Italy
Stefano Berretti
School of CS/IT, Indian Institute of Information Tech. and Management – Kerala (IIITM-K), Trivandrum, India
Sabu M. Thampi
Electrical and Computer Engineering, The University of Iowa College of Engineering, Iowa, Iowa, USA
Soura Dasgupta

About this paper

Cite this paper

Kulkarni, A., Tokekar, V., Kulkarni, P. (2016). Discovering Context Using Contextual Positional Regions Based on Chains of Frequent Terms in Text Documents. In: Berretti, S., Thampi, S., Dasgupta, S. (eds) Intelligent Systems Technologies and Applications. Advances in Intelligent Systems and Computing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-319-23258-4_28

DOI: https://doi.org/10.1007/978-3-319-23258-4_28
Published: 22 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23257-7
Online ISBN: 978-3-319-23258-4
eBook Packages: Engineering, Engineering (R0)
