Sequential Sampling Algorithms: Unified Analysis and Lower Bounds

  • Ricard Gavaldà
  • Osamu Watanabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2264)

Abstract

Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples) from which many other tasks appearing in practice can be derived. We present a generic algorithm that solves this task, together with an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency estimation and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
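To make the idea of a sequential sampling task concrete, the following is a minimal sketch of frequency estimation with a data-dependent stopping rule. It is not the generic algorithm of the paper, whose details are not given in this abstract. It assumes a standard construction: a two-sided Hoeffding bound at each step, with the failure probability split across steps by a union bound. The names `estimate_frequency`, `draw`, `epsilon`, `delta`, and `max_samples` are hypothetical and chosen only for illustration.

```python
import math
import random


def estimate_frequency(draw, epsilon, delta, max_samples=1_000_000):
    """Estimate p = P(draw() is True) to relative error epsilon.

    Illustrative sketch only. With probability at least 1 - delta, the
    returned estimate p_hat satisfies |p_hat - p| <= epsilon * p, provided
    the stopping rule fires before max_samples is reached.
    """
    successes = 0
    for t in range(1, max_samples + 1):
        successes += 1 if draw() else 0
        p_hat = successes / t

        # Per-step failure budget; sum over t >= 1 of delta / (t (t + 1)) = delta.
        delta_t = delta / (t * (t + 1))

        # Two-sided Hoeffding radius: P(|p_hat - p| >= alpha) <= delta_t.
        alpha = math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))

        # If |p_hat - p| <= alpha, then p >= p_hat - alpha, so this test
        # implies |p_hat - p| <= epsilon * p.
        if alpha <= epsilon * (p_hat - alpha):
            return p_hat, t

    # Sample budget exhausted: the relative-error guarantee does not apply.
    return successes / max_samples, max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    p_hat, n = estimate_frequency(lambda: rng.random() < 0.3, epsilon=0.2, delta=0.05)
    print(f"estimate {p_hat:.4f} after {n} samples")
```

The point of the sketch is that the number of samples is not fixed in advance: the loop stops as soon as the observed frequency is large enough relative to the confidence radius, so larger frequencies need fewer samples than a worst-case batch bound would require.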

Keywords

Random sampling, sequential sampling, adaptive sampling, Chernoff bounds, data mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Ricard Gavaldà (1)
  • Osamu Watanabe (2)
  1. Barcelona, Spain
  2. Tokyo Institute of Technology, Tokyo, Japan
