Aggregating Web Services with Active Invocation and Ensembles of String Distance Metrics

  • Conference paper
Engineering Knowledge in the Age of the Semantic Web (EKAW 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3257)

Abstract

The adoption of standards for exchanging information across the web could present a new world of opportunities for data integration and aggregation systems. Although Web Services simplify the discovery and access of information sources, the problem of semantic heterogeneity remains: how to find semantic correspondences within the data being integrated. In this paper, we propose OATS, a novel algorithm for schema matching that is specifically suited to Web Service data aggregation. We show how probing Web Services with a small set of related queries results in semantically correlated data instances, which greatly simplifies the matching process, and demonstrate that an ensemble of string distance metrics matches data instances better than any individual metric. We also propose a method for adaptively combining distance metrics, and evaluate OATS on a large number of real-world Web Service operations.
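
The idea of matching fields by comparing instances returned for the same probe queries can be illustrated with a minimal Python sketch. This is not the OATS algorithm itself: the two metrics (a normalised Levenshtein similarity and a token-level Jaccard similarity), the unweighted averaging, the greedy best-match rule, and all function names and sample values are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, List

Metric = Callable[[str, str], float]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical ensemble: an unweighted average of the metrics above.
METRICS: List[Metric] = [edit_similarity, token_jaccard]


def ensemble_similarity(a: str, b: str) -> float:
    return sum(m(a, b) for m in METRICS) / len(METRICS)


def field_similarity(xs: List[str], ys: List[str]) -> float:
    """Average ensemble similarity over instances aligned by probe query."""
    pairs = list(zip(xs, ys))
    return sum(ensemble_similarity(x, y) for x, y in pairs) / len(pairs)


def match_fields(op_a: Dict[str, List[str]],
                 op_b: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each field of op_a to its most similar field of op_b."""
    return {
        name_a: max(op_b, key=lambda name_b: field_similarity(vals_a, op_b[name_b]))
        for name_a, vals_a in op_a.items()
    }


# Made-up responses of two operations to the same two probe queries.
op_a = {"city": ["Dublin", "Paris"], "temp": ["12", "18"]}
op_b = {"location": ["Dublin, IE", "Paris, FR"], "temperature": ["12", "19"]}
print(match_fields(op_a, op_b))
```

Because both operations are probed with the same queries, the i-th value of each field refers to the same real-world entity, so instance values can be compared pairwise rather than as unordered collections. An adaptive variant would replace the unweighted average with learned per-metric weights.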

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Johnston, E., Kushmerick, N. (2004). Aggregating Web Services with Active Invocation and Ensembles of String Distance Metrics. In: Motta, E., Shadbolt, N.R., Stutt, A., Gibbins, N. (eds) Engineering Knowledge in the Age of the Semantic Web. EKAW 2004. Lecture Notes in Computer Science, vol 3257. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30202-5_26

  • DOI: https://doi.org/10.1007/978-3-540-30202-5_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23340-4

  • Online ISBN: 978-3-540-30202-5

  • eBook Packages: Springer Book Archive
