Generating Term Weighting Schemes Through Genetic Programming

Mazyad, Ahmad; Teytaud, Fabien; Fonlupt, Cyril

doi:10.1007/978-3-030-13709-0_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11331)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

The Term-Weighting Scheme (TWS) is an important component of text classification. It determines how documents are represented in the Vector Space Model (VSM). Although state-of-the-art TWSs perform well, many recent works still propose new approaches and new TWSs that improve performance. Moreover, it remains difficult to tell which TWS is best suited to a specific problem. In this paper, we are interested in automatically generating new TWSs with the help of evolutionary algorithms, and especially genetic programming (GP). GP evolves and combines different statistical information and generates a new TWS based on the performance of the learning method. We evaluate the generated TWSs on three well-known benchmarks. Our study shows that even formulas generated early in the evolution are quite competitive with state-of-the-art TWSs, and in some cases outperform them.
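The abstract describes GP combining statistical information into a new TWS formula. As an illustration only, here is a minimal Python sketch of how such a formula can be represented as a GP expression tree. It assumes a hypothetical terminal set (`tf`, `df`, `N`) and function set (`add`, `mul`, protected `div`, protected `log`); the abstract does not list the primitives the authors actually use. Classic tf-idf is shown as one such tree, `random_tree` is a simple "grow" initialiser, and `weight_documents` turns tokenised documents into sparse VSM vectors using whatever tree it is given.

```python
import math
import random
from collections import Counter

# Hypothetical terminal set: per-(term, document) statistics a GP tree may combine.
TERMINALS = ("tf", "df", "N")


def pdiv(a, b):
    """Protected division (common GP convention): returns 1.0 when dividing by zero."""
    return a / b if b != 0 else 1.0


def plog(a):
    """Protected log: returns 0.0 for non-positive inputs instead of raising."""
    return math.log(a) if a > 0 else 0.0


# Hypothetical function set: name -> (arity, implementation).
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, pdiv),
    "log": (1, plog),
}


def evaluate(tree, stats):
    """Recursively evaluate an expression tree given a dict of terminal values."""
    if isinstance(tree, str):
        return stats[tree]
    name, *children = tree
    _, fn = FUNCTIONS[name]
    return fn(*(evaluate(child, stats) for child in children))


def random_tree(rng, depth):
    """'Grow' initialisation: always a terminal at depth 0, otherwise sometimes a terminal."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name, *(random_tree(rng, depth - 1) for _ in range(arity)))


def weight_documents(tree, docs):
    """Map each tokenised document to a sparse VSM vector {term: weight} using `tree`."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: evaluate(tree, {"tf": count, "df": df[term], "N": n_docs})
            for term, count in tf.items()
        })
    return vectors


# Classic tf-idf written as a tree: tf * log(N / df)
TF_IDF = ("mul", "tf", ("log", ("div", "N", "df")))

if __name__ == "__main__":
    docs = [["gp", "evolves", "formulas"], ["gp", "gp", "text"], ["text", "classification"]]
    for vec in weight_documents(TF_IDF, docs):
        print(vec)
    print(random_tree(random.Random(0), depth=3))
```

In a full GP run, each candidate tree would be scored by training a classifier on the resulting vectors and using a measure such as accuracy or F1 as its fitness, with selection, crossover and mutation producing the next generation; those steps are omitted from this sketch.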

Author information

Authors and Affiliations

LISIC, Université du Littoral Côte d’Opale, 50 Rue Ferdinand Buisson, 62100, Calais, France
Ahmad Mazyad, Fabien Teytaud & Cyril Fonlupt

Corresponding author

Correspondence to Fabien Teytaud.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy and University of Reading, Reading, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton
IBM, Tivoli Research Lab, Rome, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Mazyad, A., Teytaud, F., Fonlupt, C. (2019). Generating Term Weighting Schemes Through Genetic Programming. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_8

DOI: https://doi.org/10.1007/978-3-030-13709-0_8
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13708-3
Online ISBN: 978-3-030-13709-0
eBook Packages: Computer Science, Computer Science (R0)
