Abstract
The empirical curve bounding problem is defined as follows. Suppose data vectors X, Y are presented such that $E(Y[i]) = f(X[i])$, where $f(x)$ is an unknown function. The problem is to analyze X, Y and obtain complexity bounds $O(g_u(x))$ and $\Omega(g_l(x))$ on the function $f(x)$. As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations of the correctness and tightness of bounds obtained by the rules for several constructed functions $f(x)$ and real datasets are described. A hybrid method is shown to have very good performance on some kinds of functions, suggesting a general, iterative refinement procedure in which diagnostic features of the results of applying particular methods can be used to select additional methods.
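To make the problem statement concrete, the following Python sketch checks whether a guessed function g(x) looks like an upper or a lower bound on noisy data. It is an illustration only and is not claimed to be one of the five heuristics of the paper. The function names `make_data` and `classify_guess`, the test function 3x^2 + 10x, the noise model, and the tolerance `tol` are all assumptions chosen for the example. The idea is that if the ratio Y[i]/g(X[i]) shrinks as X grows, g is plausibly an upper bound; if it grows, g is plausibly a lower bound.

```python
import random


def make_data(seed=0):
    """Hypothetical data set: E(Y[i]) = f(X[i]) with f(x) = 3x^2 + 10x (assumed)."""
    rng = random.Random(seed)
    xs = list(range(100, 10001, 100))
    # Multiplicative noise with mean 1, so the expectation of Y[i] equals f(X[i]).
    ys = [(3 * x * x + 10 * x) * rng.uniform(0.95, 1.05) for x in xs]
    return xs, ys


def classify_guess(xs, ys, g, tol=0.05):
    """Compare the mean of Y/g(X) over the small-x half and the large-x half.

    Returns "upper" if the ratio falls by more than tol (relative),
    "lower" if it rises by more than tol, and "tight" otherwise.
    This is a heuristic and can be wrong, for example on short ranges of x.
    """
    pairs = sorted(zip(xs, ys))
    ratios = [y / g(x) for x, y in pairs]
    half = len(ratios) // 2
    head = sum(ratios[:half]) / half
    tail = sum(ratios[half:]) / (len(ratios) - half)
    change = (tail - head) / abs(head)
    if change < -tol:
        return "upper"
    if change > tol:
        return "lower"
    return "tight"


if __name__ == "__main__":
    xs, ys = make_data(seed=1)
    for name, g in [("x", lambda x: x),
                    ("x^2", lambda x: x ** 2),
                    ("x^3", lambda x: x ** 3)]:
        print(name, classify_guess(xs, ys, g))
```

A verdict of "tight" only means that no trend larger than `tol` was detected on the sampled range. Low-order terms or a narrow range of x can mislead the check, which is consistent with the observation above that no such procedure can be guaranteed correct.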
Copyright information
© 1997 Springer-Verlag
About this paper
Cite this paper
McGeoch, C.C., Precup, D., Cohen, P.R. (1997). How to find big-oh in your data set (and how not to). In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052828
DOI: https://doi.org/10.1007/BFb0052828
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive