Abstract
In this work we argue that two main gaps currently hinder the development of new applications requiring sophisticated data discovery capabilities over rich (semi-structured) entity-relationship data: the first gap exists at the conceptual level, and the second at the logical level. To bridge these gaps, we propose a novel methodology for developing data discovery applications. We first describe a data discovery extension to the classic ER conceptual model, termed Entity Relationship Data Discovery (ERD2). We then present a novel logical model, termed the Document Category Sets (DCS) model, which represents entities and their relationships within an enhanced document model, and describe how data discovery requirements captured by the ERD2 conceptual model can be translated into the DCS logical model. Finally, we propose an efficient implementation of a data discovery system and share details of two data discovery applications developed at IBM using the proposed methodology.
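The abstract does not spell out how the DCS model encodes entities and relationships. As a rough, hypothetical illustration of the general idea only (not the authors' DCS definition), the Python sketch below flattens each entity into a document whose attributes and related entities become named category sets, and then computes facet counts over a keyword-filtered result set, which is the basic operation behind faceted data discovery. All names here (`Document`, `entity_to_document`, `search`, `facet_counts`) are assumptions introduced for this sketch.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical DCS-style document: one entity plus named category sets."""
    doc_id: str
    text: str
    # Category-set name -> set of category values, e.g. {"author": {"a1"}}.
    category_sets: dict[str, set[str]] = field(default_factory=dict)


def entity_to_document(entity_id, attributes, relationships):
    """Flatten an entity and its relationships into a single document.

    attributes: dict of attribute name -> value (each becomes a one-element
                category set and contributes to the searchable text).
    relationships: list of (relationship_name, target_entity_id) pairs; each
                   relationship name becomes a category set of target ids.
    """
    sets = {name: {str(value)} for name, value in attributes.items()}
    for rel_name, target in relationships:
        sets.setdefault(rel_name, set()).add(target)
    text = " ".join(str(v) for v in attributes.values())
    return Document(entity_id, text, sets)


def search(docs, keyword):
    """Naive case-insensitive substring filter standing in for a full-text engine."""
    kw = keyword.lower()
    return [d for d in docs if kw in d.text.lower()]


def facet_counts(docs, set_name):
    """Count how many of the given documents carry each category in a set."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.category_sets.get(set_name, set()))
    return counts


if __name__ == "__main__":
    # Illustrative toy data: paper entities linked to author entities.
    papers = [
        entity_to_document("p1", {"title": "faceted browsing basics"}, [("author", "a1")]),
        entity_to_document("p2", {"title": "advanced faceted search"},
                           [("author", "a1"), ("author", "a2")]),
        entity_to_document("p3", {"title": "entity ranking"}, [("author", "a3")]),
    ]
    hits = search(papers, "faceted")
    print(facet_counts(hits, "author"))
```

Here a keyword query narrows the papers, and the per-author counts over the matches are what a discovery interface might show as a refinement facet. A real system would replace the substring filter with an inverted index and would likely precompute category counts rather than scanning every hit.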
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yogev, S., Roitman, H. (2012). Bridging the Gaps towards Advanced Data Discovery over Semi-structured Data. In: Atzeni, P., Cheung, D., Ram, S. (eds) Conceptual Modeling. ER 2012. Lecture Notes in Computer Science, vol 7532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34002-4_12
DOI: https://doi.org/10.1007/978-3-642-34002-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34001-7
Online ISBN: 978-3-642-34002-4
eBook Packages: Computer Science; Computer Science (R0)