Proposed Methodology

Abualigah, Laith Mohammad Qasim

doi:10.1007/978-3-030-10674-4_4

Chapter
First Online: 19 December 2018

Part of the book series: Studies in Computational Intelligence (SCI, volume 816)

Abstract

This chapter presents and summarizes the method proposed to achieve the research objectives of the present study, including (i) a new feature weighting scheme, length feature weight (LFW), in Sect. 4.4.1; (ii) three models for the text feature selection problem (TFSP), used to find the best algorithm for the feature selection (FS) problem, in Sect. 4.5; (iii) a dynamic dimension reduction (DR) technique in Sect. 4.6; (iv) three models of the krill herd algorithm (KHA) for the text document clustering problem (TDCP) in Sect. 4.8; (v) a new multi-objective function for enhancing the clustering decision of the local search algorithm in Sect. 4.8; (vi) experiments and results in Sect. 4.9; and (vii) the conclusion in Sect. 4.10.
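The abstract refers to improving the clustering decision of a search algorithm. As a hedged illustration only, the Python sketch below shows how a candidate clustering of weighted document vectors might be scored. It assumes the documents are already weighted (for example with LFW, which is not reproduced here) and assumes an ADDC-style fitness (average distance of documents to their cluster centroid) with cosine distance as the measure. The function names `cosine`, `centroids`, and `addc` are hypothetical; this is not the chapter's multi-objective function or its KHA.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(docs: List[Vector], labels: List[int], k: int) -> List[List[float]]:
    """Mean vector of the documents assigned to each of the k clusters."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, c in zip(docs, labels):
        counts[c] += 1
        for j, w in enumerate(vec):
            sums[c][j] += w
    # An empty cluster keeps an all-zero centroid; it is skipped when scoring.
    return [[s / counts[c] for s in sums[c]] if counts[c] else sums[c] for c in range(k)]


def addc(docs: List[Vector], labels: List[int], k: int) -> float:
    """Hypothetical ADDC-style fitness (an assumption, not the chapter's function):
    for each non-empty cluster, the mean cosine distance (1 - similarity) of its
    documents to the centroid, averaged over those clusters. Lower is tighter."""
    cents = centroids(docs, labels, k)
    per_cluster = []
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        if members:
            dist = sum(1.0 - cosine(d, cents[c]) for d in members)
            per_cluster.append(dist / len(members))
    return sum(per_cluster) / len(per_cluster)


if __name__ == "__main__":
    # Toy 3-term document vectors: two "topics" whose vectors are parallel.
    docs = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 2.0, 2.0]]
    print(addc(docs, [0, 0, 1, 1], 2))  # topic-consistent labels, score near zero
    print(addc(docs, [0, 1, 0, 1], 2))  # mixed labels, larger score
```

In a population-based search such as the KHA, each candidate solution could be encoded as a label vector like the ones above, and the search would prefer candidates with lower scores; this encoding is an assumption made for the sketch.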

Notes

1. http://www.unine.ch/Info/clef/
2. Porter stemmer. Website at http://tartarus.org/martin/PorterStemmer/
3. http://text-processing.com/demo/


Author information

Authors and Affiliations

Universiti Sains Malaysia, Penang, Malaysia
Laith Mohammad Qasim Abualigah

Corresponding author

Correspondence to Laith Mohammad Qasim Abualigah.

About this chapter

Cite this chapter

Abualigah, L.M.Q. (2019). Proposed Methodology. In: Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering. Studies in Computational Intelligence, vol 816. Springer, Cham. https://doi.org/10.1007/978-3-030-10674-4_4

DOI: https://doi.org/10.1007/978-3-030-10674-4_4
Published: 19 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10673-7
Online ISBN: 978-3-030-10674-4
eBook Packages: Intelligent Technologies and Robotics (R0)