Abstract
In the Big Data era, companies are moving away from traditional data-warehouse solutions, which rely on expensive and time-consuming ETL (Extract-Transform-Load) processes, towards data lakes, which can be viewed as storage repositories holding vast amounts of raw data. In this paper, we consider the recurring situation in which a user has a local dataset that is not sufficient for processing the queries of interest to them. In this context, we show how the data lake, or more specifically the service lake, since we focus on data-providing services, can be leveraged to enrich the local dataset with concepts that cater for the processing of user queries. Furthermore, we present the algorithms we have developed for this purpose and showcase the working of our solution using a case study.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alili, H., Belhajjame, K., Grigori, D., Drira, R., Ghezala, H.H.B. (2017). On Enriching User-Centered Data Integration Schemas in Service Lakes. In: Abramowicz, W. (eds) Business Information Systems. BIS 2017. Lecture Notes in Business Information Processing, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-319-59336-4_1
DOI: https://doi.org/10.1007/978-3-319-59336-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59335-7
Online ISBN: 978-3-319-59336-4
eBook Packages: Computer Science, Computer Science (R0)