Tracing Lineage of Array Data

Journal of Intelligent Information Systems

Abstract

Arrays are a common and important class of data in many applications. They can model digital images, digital video, scientific and experimental data, matrices, and finite element grids. Although array manipulations are diverse and domain-specific, they often exhibit structural regularities. This paper describes an algorithm called sub-pushdown that traces data lineage in such array computations. Lineage tracing is a type of data-flow analysis that relates each part of a result array to the parts of the argument (base) arrays that have a bearing on it. Sub-pushdown traces data lineage in array-manipulating computations expressed in the previously introduced Array Manipulation Language (AML). Sub-pushdown has several useful features. First, the lineage computation is itself expressed as an AML query. Second, it is not necessary to evaluate the AML lineage query to compute the array data lineage. Third, sub-pushdown never gives false-negative answers. Sub-pushdown has been implemented as part of the ArrayDB prototype array database system that we have built.
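To make the notion of array lineage concrete, the following minimal Python sketch traces lineage for one structurally regular operation: downsampling a 2-D array by averaging non-overlapping k-by-k tiles. It is an illustration only. It is not AML and it is not the sub-pushdown algorithm, and the names `block_mean` and `block_mean_lineage` are hypothetical, chosen for this example. The lineage function here depends only on the tile size, not on the array's values.

```python
from typing import List, Tuple

# A region is ((row_lo, row_hi), (col_lo, col_hi)) with half-open bounds.
Region = Tuple[Tuple[int, int], Tuple[int, int]]


def block_mean(base: List[List[float]], k: int) -> List[List[float]]:
    """Downsample a 2-D array by averaging non-overlapping k-by-k tiles."""
    rows, cols = len(base) // k, len(base[0]) // k
    return [
        [
            sum(base[r * k + i][c * k + j] for i in range(k) for j in range(k))
            / (k * k)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def block_mean_lineage(result_region: Region, k: int) -> Region:
    """Map a region of the result back to the base region it was computed from.

    Uses only the structure of the operation (the tile size k), never the data.
    """
    (r_lo, r_hi), (c_lo, c_hi) = result_region
    return ((r_lo * k, r_hi * k), (c_lo * k, c_hi * k))


# An 8-by-8 base array whose cell (r, c) holds the value 8*r + c.
base = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
result = block_mean(base, 2)

# Which base cells have a bearing on result row 1, columns 0 and 1?
lineage = block_mean_lineage(((1, 2), (0, 2)), 2)
print(lineage)  # ((2, 4), (0, 4))
```

Changing any base cell outside the reported region leaves the queried result cells unchanged, while changing a cell inside it alters them. For this simple operation the reported region is exact; a tracer that only promises no false negatives may also report a superset of the contributing cells.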




Cite this article

Marathe, A.P. Tracing Lineage of Array Data. Journal of Intelligent Information Systems 17, 193–214 (2001). https://doi.org/10.1023/A:1012857830230


  • DOI: https://doi.org/10.1023/A:1012857830230
