Collective Evolutionary Concept Distance Based Query Expansion for Effective Web Document Retrieval

  • Conference paper
Computational Science and Its Applications – ICCSA 2013 (ICCSA 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7974)

Abstract

In this work, several semantic approaches to concept-based query expansion and re-ranking are studied and compared with different ontology-based expansion methods for web document search and retrieval. In particular, we focus on concept-based query expansion schemes whose main goal is to quickly provide users with the most suitable query expansion, in order to increase the precision of web document retrieval and to decrease users' browsing time. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, i.e. the closest concepts in the web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase to obtain better retrieval precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure that can be computed from the statistical results returned by a web search engine. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
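
As a rough illustration of how a search-engine-based distance can drive the two tasks named in the abstract (finding the closest candidate concepts and ranking them), the sketch below uses the Normalized Google Distance of Cilibrasi and Vitanyi as a stand-in. It is not the PMING distance, whose definition is not given in this abstract. The function names `ngd` and `rank_candidates`, the hit counts, and the index size `N` are all hypothetical values chosen for the example.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from search-engine hit counts.

    fx, fy: number of pages containing each term alone
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def rank_candidates(query, candidates, hits, pair_hits, n):
    """Order expansion candidates from closest to farthest from the query."""
    scored = [
        (ngd(hits[query], hits[c], pair_hits.get((query, c), 0), n), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored)]

# Hypothetical hit counts; a real system would obtain them from a search engine.
N = 1_000_000
HITS = {"jaguar": 1000, "cat": 5000, "car": 8000, "piano": 4000}
PAIR_HITS = {
    ("jaguar", "cat"): 400,
    ("jaguar", "car"): 500,
    ("jaguar", "piano"): 2,
}

ranking = rank_candidates("jaguar", ["piano", "car", "cat"], HITS, PAIR_HITS, N)
print(ranking)
```

Only page counts are needed, so any engine that reports result totals could supply the inputs. Swapping in a different proximity measure would only require replacing `ngd`.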

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leung, C.H.C., Li, Y., Milani, A., Franzoni, V. (2013). Collective Evolutionary Concept Distance Based Query Expansion for Effective Web Document Retrieval. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39649-6_47

  • DOI: https://doi.org/10.1007/978-3-642-39649-6_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39648-9

  • Online ISBN: 978-3-642-39649-6

  • eBook Packages: Computer Science, Computer Science (R0)
