Fast Detection of Functional Dependencies in XML Data

  • Hang Shi
  • Toshiyuki Amagasa
  • Hiroyuki Kitagawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6309)


In this paper we discuss a scheme for efficiently detecting functional dependencies in XML data (XFDs). The ability to detect XFDs is useful in many real-life applications, such as XML schema design, relational schema design based on XML data, and redundancy detection in XML data. However, XFD detection is an expensive task, and an efficient algorithm is essential for dealing with large XML data collections. We therefore propose an efficient way to detect XFDs. We assume that the XML data being processed are represented as hierarchically organized relational tables. Given such data, we detect the XFDs that hold within and among the tables. Our basic idea is to adopt the PipeSort algorithm, which has been used successfully in OLAP, to detect XFDs within a table. We modify the basic PipeSort algorithm by incorporating a pruning mechanism that takes the features of XFDs into account, thereby making the whole process even faster. Having obtained the set of XFDs that hold within tables, we then detect XFDs that hold among tables, again using the features of XFDs for pruning. We show the feasibility of our scheme through experiments.
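To make the notion of a dependency holding within a single table concrete, the following is a minimal sketch of a naive levelwise check on one flat relational table. It is not the PipeSort-based method of the paper and ignores the hierarchical organization of the tables; it only illustrates the standard relational definition (rows that agree on the left-hand side must agree on the right-hand side) and one simple form of pruning, namely skipping any left-hand side that contains an already discovered one. The function names `holds` and `minimal_fds` and the sample table are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds: rows equal on lhs are equal on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Remember the first rhs value seen for this lhs value; any later
        # row with the same lhs value but a different rhs value is a violation.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attrs):
    """Levelwise search for minimal dependencies with a non-empty left side."""
    found = []
    for rhs in attrs:
        others = [a for a in attrs if a != rhs]
        minimal = []  # minimal left-hand sides found so far for this rhs
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Pruning: a superset of a known left-hand side also
                # determines rhs, but is not minimal, so skip the check.
                if any(set(m) <= set(lhs) for m in minimal):
                    continue
                if holds(rows, lhs, rhs):
                    minimal.append(lhs)
                    found.append((lhs, rhs))
    return found


# Hypothetical sample table (illustrative values only).
rows = [
    {"isbn": "1", "title": "A", "publisher": "P"},
    {"isbn": "2", "title": "B", "publisher": "P"},
    {"isbn": "3", "title": "A", "publisher": "Q"},
]
attrs = ["isbn", "title", "publisher"]
fds = minimal_fds(rows, attrs)
```

On this sample, `isbn` alone determines both other columns, so the two-attribute left-hand sides containing `isbn` are never tested. The sketch still enumerates candidate left-hand sides exhaustively in the worst case, which is the cost that motivates sharing sort orders and pruning more aggressively.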


Keywords (added by machine, not by the authors): Functional Dependency, Schema Element, Fast Detection, Relational Table, Path Expression





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Hang Shi (1)
  • Toshiyuki Amagasa (2)
  • Hiroyuki Kitagawa (2)
  1. Department of Computer Science, Graduate School of Systems and Information Engineering, Japan
  2. Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
