Abstract
This chapter presents the proposed method used to achieve the research objectives of this study. It covers (i) a new weighting scheme, the length feature weight (LFW), in Sect. 4.4.1; (ii) three models for the text feature selection problem (TFSP), compared to find the best algorithm for feature selection (FS), in Sect. 4.5; (iii) a dynamic dimension reduction (DR) technique in Sect. 4.6; (iv) three models of the krill herd algorithm (KHA) for the text document clustering problem (TDCP) in Sect. 4.8; (v) a new multi-objective function for enhancing the clustering decision of the local search algorithm in Sect. 4.8; (vi) experiments and results in Sect. 4.9; and (vii) the conclusion in Sect. 4.10.
Notes
- 2. Porter stemmer. Website at http://tartarus.org/martin/PorterStemmer/.
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Abualigah, L.M.Q. (2019). Proposed Methodology. In: Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering. Studies in Computational Intelligence, vol 816. Springer, Cham. https://doi.org/10.1007/978-3-030-10674-4_4
DOI: https://doi.org/10.1007/978-3-030-10674-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10673-7
Online ISBN: 978-3-030-10674-4
eBook Packages: Intelligent Technologies and Robotics (R0)