Abstract
The World Wide Web (WWW) is widely regarded as a mine of information. The explosive growth of the WWW, in both the amount of information and the content of Web pages, makes traditional search engines an inadequate means of quickly retrieving the documents or Web pages most relevant to user needs (degree of relevance). Parallel genetic algorithms could be used to improve the information retrieval process in terms of both time and degree of relevance to the user's need. In this paper, an island genetic algorithm (IGA) is used to achieve parallelism and speed up the Web information retrieval process. Four islands with different selection methods and fitness functions are proposed to improve the degree of relevance. To achieve parallel behavior, the four islands are executed independently on different servers. A query expansion technique is used to add useful words to the user query and increase the number of retrieved documents. Finally, the results obtained by the four islands are combined and passed to a decision-making phase that selects the documents most pertinent to user needs. The cosine similarity measure is used to evaluate the performance of the proposed technique.
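As an illustration of the cosine similarity measure named in the abstract, the following minimal Python sketch scores documents against a query using raw term-frequency vectors. The whitespace tokenization, the unweighted term counts, and the function names `cosine_similarity` and `rank` are assumptions made for this example; the abstract does not specify how the chapter builds or weights its vectors.

```python
import math
from collections import Counter


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between two raw term-frequency vectors.

    Assumption: lowercase whitespace tokens and unweighted counts,
    chosen only to keep the sketch short.
    """
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    norm = norm_q * norm_d
    # An empty text has a zero vector; define its similarity as 0.
    return dot / norm if norm else 0.0


def rank(query: str, documents: list[str]) -> list[str]:
    """Return documents ordered from most to least similar to the query."""
    return sorted(documents, key=lambda doc: cosine_similarity(query, doc), reverse=True)
```

For example, `rank("web retrieval", ["genetic algorithm", "web retrieval system", "web"])` places "web retrieval system" first, because it shares both query terms, and "genetic algorithm" last, because it shares none.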
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mezyan, N., Samawi, V.W. (2017). Web Retrieval with Query Expansion: A Parallel Retrieving Technique. In: Ao, SI., Kim, H., Amouzegar, M. (eds) Transactions on Engineering Technologies. WCECS 2015. Springer, Singapore. https://doi.org/10.1007/978-981-10-2717-8_28
DOI: https://doi.org/10.1007/978-981-10-2717-8_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2716-1
Online ISBN: 978-981-10-2717-8
eBook Packages: Engineering, Engineering (R0)