Distributed and Parallel Databases, Volume 36, Issue 2, pp 265–300

Executable schema mappings for statistical data processing

  • Paolo Atzeni
  • Luigi Bellomarini
  • Francesca Bugiotti
  • Marco De Leonardis

Abstract

Data processing is the core of any statistical information system. Statisticians want to specify transformations and manipulations of data at a high level, in terms of the entities of statistical models. We illustrate a proposal in which a high-level language, EXL, is used for the declarative specification of statistical programs, and programs are translated into executable form for various target systems. The language is based on the theory of schema mappings, in particular those defined by a specific class of tgds (tuple-generating dependencies), which we use to optimize user programs and to facilitate the translation towards several target systems. The characteristics of this class guarantee good tractability properties and applicability in Big Data settings. A concrete implementation, EXLEngine, has been carried out and is currently used at the Bank of Italy.
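
As a rough illustration of what it can mean for a tgd-defined mapping to be executable, the sketch below evaluates one full tgd (one with no existential variables) as a join over in-memory relations. It is not EXL syntax and not the translation performed by EXLEngine, neither of which is shown here; the relation names (Population, Gdp, Indicator), the function name and all data values are hypothetical. The rule being evaluated is: for all r, y, p, g, Population(r, y, p) and Gdp(r, y, g) imply Indicator(r, y, p, g).

```python
# Hypothetical source relations, stored as sets of tuples (region, year, value).
# None of these names or values come from the paper.
population = {("IT", 2015, 60.8), ("FR", 2015, 66.5)}
gdp = {("IT", 2015, 1835.9), ("DE", 2015, 3030.1)}


def apply_full_tgd(population, gdp):
    """Evaluate Population(r, y, p) AND Gdp(r, y, g) -> Indicator(r, y, p, g).

    Because the rule has no existential variables, every target tuple is
    fully determined by a matching pair of source tuples, so evaluation
    reduces to a join on (region, year).
    """
    return {
        (r, y, p, g)
        for (r, y, p) in population
        for (r2, y2, g) in gdp
        if (r, y) == (r2, y2)
    }


# Materialize the target relation from the two sources.
indicator = apply_full_tgd(population, gdp)
```

The same declarative rule could equally be rendered as a SQL join or as a job on another platform, which is the general sense in which a single specification can be translated towards several target systems.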

Keywords

  • Schema mappings
  • Statistical data
  • Scalable data processing
  • ETL

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Università Roma Tre, Rome, Italy
  2. Università Roma Tre & Bank of Italy, Rome, Italy
  3. LRI, CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France
  4. Hewlett-Packard, Rome, Italy
