In Search of a Semantic Book Search Engine on the Web: Are We There Yet?

  • Irfan Ullah
  • Shah Khusro
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 464)

Abstract

Books, being a valuable source of knowledge and learning, have always been searched for on the Web. Traditional Web Information Retrieval (IR) techniques of searching and ranking are applied for this purpose. These techniques, however, were designed for hyperlinked collections of rich text in the form of web pages. Books are inherently different from web pages, and traditional Web IR techniques do not account for their well-organized structure and logically connected content. Book search solutions currently available on the Web and in other digital environments do not exploit these implicit semantics and therefore fail to satisfy the requirements of all stakeholders, including readers, authors, publishers, and librarians. These semantics, hidden in the well-thought-out structure and the logical connections in book contents, are visible only to human beings. The position put forward here is that most of the available search solutions treat books as plain-text collections, leading to inaccurate and imprecise book search results. Ways and means must, therefore, be found to treat books differently from other web documents and to use their structural semantics and the logical connections in their content for searching, ranking, and recommendations. Development of a comprehensive book structure ontology will help in harvesting these implicit semantics. Similarly, in order to fulfill the information needs of readers, different domain-level ontologies are required so that book contents can be conceptually connected and made machine ‘understandable’. Moreover, tables in a book consist of structured data and are a rich source of semantics. Likewise, the context of images and figures may be exploited for relating contents within and across books. Discovery and subsequent utilization of these semantics in the book IR process will result in more precise and accurate systems, to the satisfaction of all stakeholders.

Keywords

Semantic web, Information retrieval, Ontology, Ranking, Recommendations, Search engines

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science, University of Peshawar, Peshawar, Pakistan
  2. Department of Computer Science, Shaheed Benazir Bhutto University, Sheringal, Pakistan
