Validating an Access Cost Model for Wide Area Applications
In this paper, we describe a case study in developing an access cost model for WebSources in the context of a wrapper-mediator architecture. We document our experiences in validating this model and note successes and lessons learned. Using experimental query feedback data from several WebSources, we develop a Catalog and an Access Cost model. We identify characteristics of the query feedback that reflect the behavior of a particular WebSource, and we group WebSources based on these characteristics. We also characterize the Access Cost model as having High or Low Prediction Accuracy with respect to its ability to predict access costs for the WebSources. We then correlate the WebSource characteristics and groupings with the High or Low prediction accuracy of the model.
Keywords: Prediction Accuracy, Online Learning, Cost Model, Query Response Time, Very Large Data Base