Data Transformation Knowledge Reuse in Spreadsheet-Based Mashup Development Platform

  • Vu Hung
  • Boualem Benatallah
  • Angel Lagares Lemos
Chapter

Abstract

Data transformation is a key task in mashup development (e.g., access to heterogeneous services, data flow). It is considered a labour-intensive and error-prone process. Reusing previously specified mappings promises a significant reduction in manual, time-consuming transformation tasks; nevertheless, this potential has not been fully realized in current approaches and systems. In this chapter, we study the problem of reusing data transformation logic in mashup development platforms. We formulate the problem and propose a solution that features novel reuse abstractions and techniques, including spreadsheet templates, mapping generalization, and similarity join. Given a spreadsheet instance that is being mapped to the target schema, we recommend a list of mapping formulas that can potentially be reused for the instance. We implemented a prototype of the proposed solution and evaluated its performance on synthetic datasets.
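The recommendation step described above can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes that each stored mapping is indexed by the set of header labels of the spreadsheet it was defined on, and that candidates are ranked by Jaccard similarity against the labels of the new instance. The names StoredMapping, jaccard, and recommend, the threshold of 0.5, and the sample formulas are all hypothetical. The comparison is a naive scan of the repository rather than an efficient similarity join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMapping:
    """A previously specified mapping (hypothetical structure, not from the chapter)."""
    source_labels: frozenset  # header labels of the spreadsheet it was defined on
    formula: str              # the mapping formula, kept as opaque text


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(instance_labels, repository, threshold=0.5):
    """Return (score, formula) pairs at or above the threshold, best first."""
    # Normalise labels so that case and surrounding whitespace do not matter.
    query = frozenset(label.strip().lower() for label in instance_labels)
    scored = [(jaccard(query, m.source_labels), m.formula) for m in repository]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Illustrative repository; the formulas are made up for the example.
repository = [
    StoredMapping(frozenset({"first name", "last name", "email"}),
                  '=CONCATENATE(A2, " ", B2)'),
    StoredMapping(frozenset({"price", "quantity"}), "=A2*B2"),
]

results = recommend(["First Name", "Last Name", "Phone"], repository)
for score, formula in results:
    print(score, formula)
```

In this sketch the new instance shares two of four distinct labels with the first stored mapping, so only that formula is recommended. A production system would replace the linear scan with an indexed set-similarity join.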

Keywords

Prefix Editing Suffix Glean 

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Vu Hung (1)
  • Boualem Benatallah (1)
  • Angel Lagares Lemos (1)
  1. University of New South Wales, Sydney, Australia
