Sequential Sampling Algorithms: Unified Analysis and Lower Bounds
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
Keywords: random sampling, sequential sampling, adaptive sampling, Chernoff bounds, data mining
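To make the idea of sequential sampling concrete, the following is a minimal sketch of frequency estimation with a data-dependent stopping rule. It is not the generic algorithm of the paper, which this abstract does not spell out. It assumes a standard Hoeffding bound combined with a union bound over sampling steps, and the names `sequential_frequency_estimate`, `draw`, `eps`, `delta`, and `max_samples` are hypothetical, chosen for illustration only.

```python
import math
import random


def sequential_frequency_estimate(draw, eps, delta, max_samples=10_000_000):
    """Estimate p = Pr[draw() is true] to within relative error eps.

    Illustrative sketch, not the paper's algorithm. Samples are drawn one
    at a time, and sampling stops as soon as the Hoeffding confidence
    radius is small relative to the current estimate. Under the stated
    assumptions the returned estimate satisfies |p_hat - p| <= eps * p
    with probability at least 1 - delta.
    """
    successes = 0
    for t in range(1, max_samples + 1):
        if draw():
            successes += 1
        p_hat = successes / t
        # Two-sided Hoeffding radius with failure probability
        # delta / (t * (t + 1)) at step t; these sum to delta over all t.
        alpha = math.sqrt(math.log(2 * t * (t + 1) / delta) / (2 * t))
        # If |p_hat - p| <= alpha, then p >= p_hat - alpha, so this test
        # implies alpha <= eps * p and hence relative error at most eps.
        if alpha <= eps * (p_hat - alpha):
            return p_hat, t
    raise RuntimeError("sample budget exhausted before the stopping rule held")


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example source: a Bernoulli variable with p = 0.3.
    estimate, used = sequential_frequency_estimate(
        lambda: rng.random() < 0.3, eps=0.2, delta=0.05
    )
    print(f"estimate={estimate:.4f} after {used} samples")
```

The number of samples drawn is not fixed in advance: it grows as the unknown frequency shrinks, which is the adaptivity that distinguishes sequential sampling from choosing a worst-case batch size up front.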