Abstract
Transforming data is a fundamental operation in application scenarios involving data integration, legacy data migration, data cleaning, and extract-transform-load processes. Data transformations are often implemented as relational queries in order to leverage the optimization capabilities of most RDBMSs. However, relational query languages like SQL are not expressive enough to specify an important class of data transformations: those that produce several output tuples for a single input tuple. This class of data transformations is required to resolve the data heterogeneities that occur when source data represents an aggregation of target data.
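To make the notion of a one-to-many transformation concrete, the following is a minimal sketch, not taken from the paper. It assumes a hypothetical source relation of loans, where each tuple aggregates a total amount and a number of installments, and a hypothetical target relation holding one payment tuple per installment. The names `Loan`, `Payment`, `unfold`, and `apply_one_to_many` are illustrative only.

```python
from typing import Iterable, Iterator, NamedTuple


class Loan(NamedTuple):
    """Hypothetical source tuple: an aggregate of several payments."""
    account: str
    amount: int        # total amount, in cents
    installments: int  # number of payments the amount is split into


class Payment(NamedTuple):
    """Hypothetical target tuple: one row per installment."""
    account: str
    seq: int
    amount: int


def unfold(loan: Loan) -> Iterator[Payment]:
    """Produce several output tuples from a single input tuple."""
    if loan.installments <= 0:
        return  # nothing to emit for a degenerate input
    base, extra = divmod(loan.amount, loan.installments)
    for seq in range(1, loan.installments + 1):
        # Spread the remainder over the first `extra` payments so the
        # payments add up to the original amount exactly.
        yield Payment(loan.account, seq, base + (1 if seq <= extra else 0))


def apply_one_to_many(rows: Iterable[Loan]) -> list[Payment]:
    """Apply the one-to-many function to every source tuple."""
    return [payment for row in rows for payment in unfold(row)]


loans = [Loan("A1", 1000, 3), Loan("B2", 500, 1)]
payments = apply_one_to_many(loans)
for payment in payments:
    print(payment)
```

The number of output tuples depends on the data in each input tuple, which is why a fixed combination of projections and unions cannot express this kind of transformation in general.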
In this paper, we propose and formally define the data mapper operator as an extension of the relational algebra to address one-to-many data transformations. We supply an algebraic rewriting technique that enables the optimization of data transformation expressions that combine filters expressed as standard relational operators with mappers. Furthermore, we identify the two main factors that influence the expected optimization gains.
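The general idea behind rewriting expressions that combine filters with mappers can be illustrated with a small sketch. It is not the paper's formal rewriting rules. It assumes one simple case: the filter tests an output attribute that the mapper copies unchanged from the input, so the filter can be applied before the mapper without changing the result. The relation, the attribute names, and the functions `mapper`, `select`, and `make_split_quarters` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def mapper(rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply a one-to-many function to each input row."""
    for row in rows:
        yield from fn(row)


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Standard relational selection."""
    return (row for row in rows if pred(row))


def make_split_quarters(counter: list) -> Callable[[Row], Iterator[Row]]:
    """Build a mapper function that also counts how often it is invoked."""
    def split_quarters(row: Row) -> Iterator[Row]:
        counter[0] += 1
        for quarter in range(1, 5):
            yield {
                "region": row["region"],            # copied unchanged
                "quarter": quarter,
                "sales": row["yearly_sales"] // 4,  # derived value
            }
    return split_quarters


source = [
    {"region": "EU", "yearly_sales": 400},
    {"region": "US", "yearly_sales": 800},
    {"region": "ASIA", "yearly_sales": 1200},
]


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"


# Plan A: run the mapper on every row, then filter its output.
calls_a = [0]
plan_a = list(select(mapper(source, make_split_quarters(calls_a)), is_eu))

# Plan B: filter first, then run the mapper only on surviving rows.
calls_b = [0]
plan_b = list(mapper(select(source, is_eu), make_split_quarters(calls_b)))

print(plan_a == plan_b, calls_a[0], calls_b[0])
```

Both plans return the same rows, but the second one invokes the mapper function on fewer input tuples. The rewrite is only valid here because the predicate depends solely on an attribute that passes through the mapper unchanged; a predicate on `sales` or `quarter` could not be moved this way without further reasoning.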
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carreira, P., Galhardas, H., Pereira, J., Lopes, A. (2005). Data Mapper: An Operator for Expressing One-to-Many Data Transformations. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_14
DOI: https://doi.org/10.1007/11546849_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)