Proposed Methodology

Part of the book series: Studies in Computational Intelligence (SCI, volume 816)

Abstract

This chapter presents the methodology proposed to achieve the research objectives of this study. It covers (i) a new weighting scheme, the length feature weight (LFW), in Sect. 4.4.1; (ii) three models for the text feature selection problem (TFSP), used to find the best algorithm for the feature selection (FS) problem, in Sect. 4.5; (iii) a dynamic dimension reduction (DR) technique in Sect. 4.6; (iv) three models of the krill herd algorithm (KHA) for the text document clustering problem (TDCP) in Sect. 4.8; (v) a new multi-objective function that improves the clustering decisions of the local search algorithm in Sect. 4.8; (vi) experiments and results in Sect. 4.9; and (vii) the conclusion in Sect. 4.10.

Notes

  1. http://www.unine.ch/Info/clef/.

  2. Porter stemmer. Website at http://tartarus.org/martin/PorterStemmer/.

  3. http://text-processing.com/demo/.

References

  • Abualigah, L. M. Q., & Hanandeh, E. S. (2015). Applying genetic algorithms to information retrieval using vector space model. International Journal of Computer Science, Engineering and Applications, 5(1), 19.

  • Abualigah, L. M., & Khader, A. T. (2017). Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. The Journal of Supercomputing, 1–23.

  • Abualigah, L. M., Khader, A. T., & Al-Betar, M. A. (2016a, July). Multi-objectives based text clustering technique using k-mean algorithm. In 2016 7th International Conference on Computer Science and Information Technology (CSIT) (pp. 1–6). https://doi.org/10.1109/CSIT.2016.7549464.

  • Abualigah, L. M., Khader, A. T., & Al-Betar, M. A. (2016b, July). Unsupervised feature selection technique based on genetic algorithm for improving the text clustering. In 2016 7th International Conference on Computer Science and Information Technology (CSIT) (pp. 1–6). https://doi.org/10.1109/CSIT.2016.7549453.

  • Abualigah, L. M., Khader, A. T., & Al-Betar, M. A. (2016c, July). Unsupervised feature selection technique based on harmony search algorithm for improving the text clustering. In 2016 7th International Conference on Computer Science and Information Technology (CSIT) (pp. 1–6). https://doi.org/10.1109/CSIT.2016.7549456.

  • Abualigah, L. M., Khader, A. T., Al-Betar, M. A., & Awadallah, M. A. (2016). A krill herd algorithm for efficient text documents clustering. In 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE) (pp. 67–72).

  • Abualigah, L. M., Khader, A. T., Al-Betar, M. A., & Hanandeh, E. S. (2017). Unsupervised text feature selection technique based on particle swarm optimization algorithm for improving the text clustering. EAI. https://doi.org/10.4108/eai.27-2-2017.152282.

  • Al-Betar, M. A., Awadallah, M. A., Khader, A. T., & Abdalkareem, Z. A. (2015). Island-based harmony search for optimization problems. Expert Systems with Applications, 42(4), 2026–2035.

  • Armano, G., & Farmani, M. R. (2016). Multiobjective clustering analysis using particle swarm optimization. Expert Systems with Applications, 55, 184–193.

  • Bandyopadhyay, S., & Maulik, U. (2002). An evolutionary technique based on k-means algorithm for optimal clustering in R^N. Information Sciences, 146(1), 221–237.

  • Basu, T., & Murthy, C. (2015). A similarity assessment technique for effective grouping of documents. Information Sciences, 311, 149–162.

  • Bharti, K. K., & Singh, P. K. (2014). A three-stage unsupervised dimension reduction method for text clustering. Journal of Computational Science, 5(2), 156–169.

  • Bharti, K. K., & Singh, P. K. (2015a). Chaotic gradient artificial bee colony for text clustering. Soft Computing, 1–14.

  • Bharti, K. K., & Singh, P. K. (2015b). Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering. Expert Systems with Applications, 42(6), 3105–3114.

  • Bharti, K. K., & Singh, P. K. (2016). Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering. Applied Soft Computing, 43, 20–34.

  • Bolaji, A. L., Al-Betar, M. A., Awadallah, M. A., Khader, A. T., & Abualigah, L. M. (2016). A comprehensive review: Krill herd algorithm (KH) and its applications. Applied Soft Computing, 49, 437–446.

  • Chen, L., Liu, M., Wu, C., & Xu, A. (2016). A novel clustering algorithm and its incremental version for large-scale text collection. Information Technology and Control, 45(2), 136–147.

  • Cobos, C., León, E., & Mendoza, M. (2010). A harmony search algorithm for clustering with feature selection. Revista Facultad de Ingeniería Universidad de Antioquia (55), 153–164.

  • Cole, R. M. (1998). Clustering with genetic algorithms. Citeseer.

  • Cui, X., Potok, T. E., & Palathingal, P. (2005). Document clustering using particle swarm optimization. In Swarm Intelligence Symposium, 2005. SIS 2005. Proceedings 2005 IEEE (pp. 185–191).

  • De Vries, C. M. (2014). Document clustering algorithms, representations and evaluation for information retrieval.

  • Deb, K., Sindhya, K., & Hakanen, J. (2016). Multi-objective optimization. Decision sciences: Theory and practice (pp. 145–184). Boca Raton: CRC Press.

  • Del Buono, N., & Pio, G. (2015). Non-negative matrix tri-factorization for co-clustering: An analysis of the block matrix. Information Sciences, 301, 13–26.

  • Forsati, R., & Mahdavi, M. (2010). Web text mining using harmony search. Recent advances in harmony search algorithm (pp. 51–64). Berlin: Springer.

  • Forsati, R., Mahdavi, M., Shamsfard, M., & Meybodi, M. R. (2013). Efficient stochastic algorithms for document clustering. Information Sciences, 220, 269–291.

  • Forsati, R., Keikha, A., & Shamsfard, M. (2015). An improved bee colony optimization algorithm with an application to document clustering. Neurocomputing, 159, 9–26.

  • Gandomi, A. H., & Alavi, A. H. (2012). Krill herd: A new bio-inspired optimization algorithm. Communications in Nonlinear Science and Numerical Simulation, 17(12), 4831–4845.

  • George, G., & Parthiban, L. (2015). Multi objective hybridized firefly algorithm with group search optimization for data clustering. In 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 125–130).

  • Ghanem, O., & Alhanjouri, M. (2014). Evaluating the effect of preprocessing in Arabic documents clustering (Master’s thesis). Computer Engineering Department, Islamic University of Gaza, Palestine.

  • Hong, S.-S., Lee, W., & Han, M.-M. (2015). The feature selection method based on genetic algorithm for efficient of text clustering and text classification. International Journal of Advances in Soft Computing and Its Applications, 7(1), 22–40.

  • Inbarani, H. H., Bagyamathi, M., & Azar, A. T. (2015). A novel hybrid feature selection method based on rough set and improved harmony search. Neural Computing and Applications, 26(8), 1859–1880.

  • Karol, S., & Mangat, V. (2013). Evaluation of text document clustering approach based on particle swarm optimization. Open Computer Science, 3(2), 69–90.

  • Kaur, S. P., & Madan, N. (2016). Document clustering using firefly algorithm. Artificial Intelligent Systems and Machine Learning, 8(5), 182–185.

  • Liao, H., Xu, Z., & Zeng, X.-J. (2014). Distance and similarity measures for hesitant fuzzy linguistic term sets and their application in multi-criteria decision making. Information Sciences, 271, 125–142.

  • Mahdavi, M., & Abolhassani, H. (2009). Harmony k-means algorithm for document clustering. Data Mining and Knowledge Discovery, 18(3), 370–391.

  • Mahdavi, M., Chehreghani, M. H., Abolhassani, H., & Forsati, R. (2008). Novel meta-heuristic algorithms for clustering web documents. Applied Mathematics and Computation, 201(1), 441–451.

  • Maimon, O., & Rokach, L. (2005). Data mining and knowledge discovery handbook (Vol. 2). New York: Springer.

  • Moayedikia, A., Jensen, R., Wiil, U. K., & Forsati, R. (2015). Weighted bee colony algorithm for discrete optimization problems with application to feature selection. Engineering Applications of Artificial Intelligence, 44, 153–167.

  • Mohammed, A. J., Yusof, Y., & Husni, H. (2014). Weight-based firefly algorithm for document clustering. In Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013) (pp. 259–266).

  • Nanda, S. J., & Panda, G. (2014). A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm and Evolutionary Computation, 16, 1–18.

  • Nebu, C. M., & Joseph, S. (2016). A hybrid dimension reduction technique for document clustering. Innovations in bio-inspired computing and applications (pp. 403–416). Cham: Springer.

  • Prabha, K. A., & Visalakshi, N. K. (2014). Improved particle swarm optimization based k-means clustering. In 2014 International Conference on Intelligent Computing Applications (ICICA) (pp. 59–63).

  • Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.

  • Shafiei, M., Wang, S., Zhang, R., Milios, E., Tang, B., Tougas, J., et al. (2007). Document representation and dimension reduction for text clustering. In 2007 IEEE 23rd International Conference on Data Engineering Workshop (pp. 770–779).

  • Shah, N., & Mahajan, S. (2012). Document clustering: A detailed review. International Journal of Applied Information Systems, 4(5), 30–38.

  • Singh, P., & Sharma, M. (2013). Text document clustering and similarity measures. Department of Computer Science & Engineering.

  • Singh, V. K., Tiwari, N., & Garg, S. (2011). Document clustering using k-means, heuristic k-means and fuzzy c-means. In 2011 International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 297–301).

  • Uğuz, H. (2011). A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Systems, 24(7), 1024–1032.

  • Wang, X., Cao, J., Liu, Y., Gao, S., & Deng, X. (2012). Text clustering based on the improved TFIDF by the iterative algorithm. In 2012 IEEE Symposium on Electrical & Electronics Engineering (EEESYM) (pp. 140–143).

  • Wang, G.-G., Gandomi, A. H., & Alavi, A. H. (2014). Stud krill herd algorithm. Neurocomputing, 128, 363–370.

  • Zaw, M. M., & Mon, E. E. (2015). Web document clustering by using PSO-based cuckoo search clustering algorithm. Recent advances in swarm intelligence and evolutionary computation (pp. 263–281). Cham: Springer.

  • Zhang, Y., Wang, S., Phillips, P., & Ji, G. (2014). Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowledge-Based Systems, 64, 22–31.

  • Zhao, W., & Wang, Y. (2010a). Notice of Retraction: An improved genetic algorithm for text feature selection. In 2010 International Conference on Intelligent Computing and Cognitive Informatics (ICICCI) (pp. 7–10).

  • Zhao, W., & Wang, Y. (2010b). Notice of Retraction: An improved genetic algorithm for text feature selection. In 2010 International Conference on Intelligent Computing and Cognitive Informatics (ICICCI) (pp. 7–10).

  • Zhong, S., & Ghosh, J. (2005). Generative model-based document clustering: A comparative study. Knowledge and Information Systems, 8(3), 374–384.

  • Zhong, N., Li, Y., & Wu, S.-T. (2012). Effective pattern discovery for text mining. IEEE Transactions on Knowledge and Data Engineering, 24(1), 30–44.

Author information

Correspondence to Laith Mohammad Qasim Abualigah.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Abualigah, L.M.Q. (2019). Proposed Methodology. In: Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering. Studies in Computational Intelligence, vol 816. Springer, Cham. https://doi.org/10.1007/978-3-030-10674-4_4
