Validating an Access Cost Model for Wide Area Applications

  • Vladimir Zadorozhny
  • Louiqa Raschid
  • Tao Zhan
  • Laura Bright
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2172)

Abstract

In this paper, we describe a case study in developing an access cost model for WebSources in the context of a wrapper-mediator architecture. We document our experiences in validating this model, and note successes and lessons learned. Using experimental data of query feedback from several WebSources, we develop a Catalog and Access Cost model. We identify characteristics of the query feedback that reflect the behavior of a particular WebSource, and we group WebSources based on these characteristics. We also characterize the Access Cost model as having High or Low Prediction Accuracy with respect to its ability to predict access costs for the WebSources. We then correlate the WebSource characteristics and groupings with the High or Low prediction accuracy of the model.

Keywords

Prediction Accuracy; Online Learning; Cost Model; Query Response Time; Very Large Data Base
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Vladimir Zadorozhny
    • 1
  • Louiqa Raschid
    • 1
  • Tao Zhan
    • 1
  • Laura Bright
    • 1
  1. University of Maryland, College Park
