Journal of Intelligent Information Systems

Volume 26, Issue 1, pp 25–40

Fuzzy semantic tagging and flexible querying of XML documents extracted from the Web

  • Patrice Buche
  • Juliette Dibie-Barthélemy
  • Ollivier Haemmerlé
  • Gaëlle Hignette


The relational database model is widely used in real applications. We propose a way of complementing such a database with an XML data warehouse. The approach is generic and driven by a domain ontology. The XML data warehouse is built from data extracted from the Web, which are semantically tagged using terms from the domain ontology. The semantic tagging is fuzzy: instead of tagging a value of a Web document with a single term of the domain ontology, we use tags expressed as a possibility distribution representing a set of possible terms, each weighted by a possibility degree. The querying of the XML data warehouse is also fuzzy: end-users can express their preferences by means of fuzzy selection criteria. We present our approach on a first application domain, predictive microbiology.
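As a rough illustration of the two fuzzy ingredients described above, the following Python sketch represents a fuzzy tag as a possibility distribution over ontology terms and compares it with a fuzzy selection criterion. It assumes the standard possibility and necessity degrees of possibility theory as comparison measures; the abstract does not state which measures the authors use, and the term names, degrees, and function names here are hypothetical.

```python
from typing import Dict

# A distribution maps an ontology term to a degree in [0, 1].
# Terms that are absent are treated as having degree 0.
Distribution = Dict[str, float]


def possibility(tag: Distribution, preference: Distribution) -> float:
    """Degree to which the tag possibly satisfies the preference:
    max over terms of min(tag degree, preference degree)."""
    terms = set(tag) | set(preference)
    return max(
        (min(tag.get(t, 0.0), preference.get(t, 0.0)) for t in terms),
        default=0.0,
    )


def necessity(tag: Distribution, preference: Distribution) -> float:
    """Degree to which the tag certainly satisfies the preference:
    min over terms of max(1 - tag degree, preference degree)."""
    terms = set(tag) | set(preference)
    return min(
        (max(1.0 - tag.get(t, 0.0), preference.get(t, 0.0)) for t in terms),
        default=1.0,
    )


# Hypothetical fuzzy tag attached to a value found in a Web document:
# the value most plausibly denotes "Listeria monocytogenes", but
# "Listeria innocua" remains possible to a lesser degree.
tag = {"Listeria monocytogenes": 1.0, "Listeria innocua": 0.5}

# Hypothetical fuzzy selection criterion expressing the end-user's
# preferences over the same ontology terms.
preference = {"Listeria monocytogenes": 1.0, "Listeria innocua": 0.25}

print("possibility:", possibility(tag, preference))
print("necessity:", necessity(tag, preference))
```

The two degrees bracket how well a tagged value answers the query: a high possibility with a low necessity signals a plausible but uncertain match, which a flexible query engine could use to rank answers.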


Flexible querying, Semantic tagging, Fuzzy data




Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Patrice Buche (1)
  • Juliette Dibie-Barthélemy (1)
  • Ollivier Haemmerlé (2)
  • Gaëlle Hignette (1)

  1. INRA, Département Mathématiques et Informatique Appliquées, Unité Mét@risk, Paris, Cedex 05
  2. GRIMM-ISYCOM, Département de Mathématiques-Informatique, Université de Toulouse le Mirail, Toulouse Cedex
