Abstract
Arrays are a common and important class of data in many applications. Arrays can model data such as digital images, digital video, scientific and experimental data, matrices, and finite element grids. Although array manipulations are diverse and domain-specific, they often exhibit structural regularities. This paper describes an algorithm called sub-pushdown that traces data lineage in such array computations. Lineage tracing is a type of data-flow analysis that relates each part of a result array to the parts of the argument (base) arrays that bear on it. Sub-pushdown applies to array-manipulating computations expressed in the Array Manipulation Language (AML), which was introduced in earlier work. Sub-pushdown has several useful features. First, the lineage computation is itself expressed as an AML query. Second, it is not necessary to evaluate the AML lineage query to compute the array data lineage. Third, sub-pushdown never gives false-negative answers. Sub-pushdown has been implemented as part of the ArrayDB prototype array database system that we have built.
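As a rough illustration of what lineage means for array data, the sketch below uses plain Python rather than AML, and it is not the sub-pushdown algorithm. It assumes a hypothetical block-averaging operation and shows that the base cells behind a result cell can be derived from the index structure alone, without reading any array values. The names `block_average` and `lineage_of` are invented for this example.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index into a 2-D array


def block_average(base: List[List[float]], k: int) -> List[List[float]]:
    """Downsample a 2-D array by averaging non-overlapping k x k blocks."""
    rows, cols = len(base) // k, len(base[0]) // k
    return [
        [
            sum(base[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
            / (k * k)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def lineage_of(result_cells: Set[Cell], k: int) -> Set[Cell]:
    """Return the base cells that contribute to the given result cells.

    Only index arithmetic is used; the array contents are never inspected.
    """
    return {
        (i * k + di, j * k + dj)
        for (i, j) in result_cells
        for di in range(k)
        for dj in range(k)
    }


# A 4 x 4 base array holding the values 0.0 .. 15.0 in row-major order.
base = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
result = block_average(base, 2)

# Which base cells does result cell (1, 0) depend on?
traced = lineage_of({(1, 0)}, 2)
print(sorted(traced))  # [(2, 0), (2, 1), (3, 0), (3, 1)]
```

For this simple operation the traced set is exact. A conservative tracer may return a superset of the contributing cells, which is what ruling out false negatives permits.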
Cite this article
Marathe, A.P. Tracing Lineage of Array Data. Journal of Intelligent Information Systems 17, 193–214 (2001). https://doi.org/10.1023/A:1012857830230