Abstract
When term importance is assigned in the Vector Space Model (VSM), weights are usually assigned to terms directly, without regard to where the terms occur. This approach fails to capture the positional influence of terms in a document. To capture it, this paper proposes an algorithm, Dynamic Partitioning of Text Documents with Chains of Frequent Terms (DynaPart-CFT), that creates Contextual Positional Regions (CPRs). Based on the CPRs, a Contextual Positional Influence (CPI) is calculated, which helps improve the F-measure in text categorization. This novel way of assigning importance to terms is evaluated on three standard text datasets. The performance improvement comes at the expense of a small additional storage cost.
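To make the contrast between position-blind and position-aware weighting concrete, the following is a minimal sketch only. It is not DynaPart-CFT and does not reproduce the paper's CPI formula, neither of which is given in the abstract. It assumes that region boundaries are supplied by hand (the paper derives them dynamically from chains of frequent terms) and that each region carries a hypothetical weight. The function names `plain_tf` and `region_weighted_tf` are illustrative.

```python
from collections import Counter


def plain_tf(tokens):
    """Position-blind term frequency: every occurrence counts as 1."""
    return Counter(tokens)


def region_weighted_tf(tokens, boundaries, region_weights):
    """Term frequency where each occurrence is scaled by its region's weight.

    boundaries     -- sorted start indices of the regions; the first must be 0.
                      (Assumption: fixed by hand here, not computed from text.)
    region_weights -- one hypothetical weight per region.
    """
    if not boundaries or boundaries[0] != 0:
        raise ValueError("boundaries must start at index 0")
    if len(boundaries) != len(region_weights):
        raise ValueError("need exactly one weight per region")

    scores = {}
    region = 0
    for i, term in enumerate(tokens):
        # Advance to the region that contains token position i.
        while region + 1 < len(boundaries) and i >= boundaries[region + 1]:
            region += 1
        scores[term] = scores.get(term, 0.0) + region_weights[region]
    return scores


# Toy document of 8 tokens split into three regions: [0..2], [3..5], [6..7].
tokens = "svm text svm kernel text corpus corpus news".split()
boundaries = [0, 3, 6]
weights = [2.0, 1.0, 0.5]  # hypothetical: earlier regions count more

flat = plain_tf(tokens)
weighted = region_weighted_tf(tokens, boundaries, weights)
print(dict(flat))
print(weighted)
```

In this toy document, "svm" and "corpus" both occur twice, so plain term frequency cannot tell them apart. With the assumed region weights, "svm" scores higher because both of its occurrences fall in the first region. That difference is the kind of positional signal the abstract says direct weighting misses.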
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kulkarni, A., Tokekar, V., Kulkarni, P. (2016). Discovering Context Using Contextual Positional Regions Based on Chains of Frequent Terms in Text Documents. In: Berretti, S., Thampi, S., Dasgupta, S. (eds) Intelligent Systems Technologies and Applications. Advances in Intelligent Systems and Computing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-319-23258-4_28
DOI: https://doi.org/10.1007/978-3-319-23258-4_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23257-7
Online ISBN: 978-3-319-23258-4
eBook Packages: Engineering, Engineering (R0)