
Bridging the Gaps towards Advanced Data Discovery over Semi-structured Data

  • Conference paper
Conceptual Modeling (ER 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7532)


Abstract

In this work we argue that two main gaps currently hinder the development of new applications that require sophisticated data discovery capabilities over rich (semi-structured) entity-relationship data. The first gap exists at the conceptual level and the second at the logical level. To bridge these gaps, we propose a novel methodology for developing data discovery applications. We first describe a data discovery extension to the classic ER conceptual model, termed Entity Relationship Data Discovery (ERD2). We then present a novel logical model, termed the Document Category Sets (DCS) model, which represents entities and their relationships within an enhanced document model, and describe how data discovery requirements captured by the ERD2 conceptual model can be translated into the DCS logical model. Finally, we propose an efficient data discovery system implementation and share details of two data discovery applications that were developed at IBM using the proposed methodology.




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yogev, S., Roitman, H. (2012). Bridging the Gaps towards Advanced Data Discovery over Semi-structured Data. In: Atzeni, P., Cheung, D., Ram, S. (eds) Conceptual Modeling. ER 2012. Lecture Notes in Computer Science, vol 7532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34002-4_12


  • DOI: https://doi.org/10.1007/978-3-642-34002-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34001-7

  • Online ISBN: 978-3-642-34002-4

  • eBook Packages: Computer Science, Computer Science (R0)
