Abstract
This paper presents an approach to rule-based specification of data integration using the RIF-BLD logic dialect, a W3C recommendation. The approach allows entities defined in different sources and represented in different data models (relational, XML, graph-based, document-based) to be combined in a single rule. The logical semantics of RIF-BLD provides an unambiguous interpretation of data integration rules. The paper also proposes an approach to implementing RIF-BLD rules in the IBM High-level Integration Language (HIL). Data integration rules can thus be compiled into MapReduce programs and executed over Hadoop-based distributed infrastructures.
Acknowledgement
The research is partially supported by the Russian Foundation for Basic Research, projects 15-29-06045 and 18-07-01434.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Stupnikov, S. (2018). Rule-Based Specification and Implementation of Multimodel Data Integration. In: Kalinichenko, L., Manolopoulos, Y., Malkov, O., Skvortsov, N., Stupnikov, S., Sukhomlin, V. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2017. Communications in Computer and Information Science, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-319-96553-6_15
DOI: https://doi.org/10.1007/978-3-319-96553-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96552-9
Online ISBN: 978-3-319-96553-6
eBook Packages: Computer Science, Computer Science (R0)