How to find big-oh in your data set (and how not to)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Abstract

The empirical curve bounding problem is defined as follows. Suppose data vectors X and Y are presented such that E(Y[i]) = f(X[i]), where f(x) is an unknown function. The problem is to analyze X and Y and obtain complexity bounds O(g_u(x)) and Ω(g_l(x)) on the function f(x). As no algorithm for empirical curve bounding can be guaranteed correct, we consider heuristics. Five heuristic algorithms are presented here, together with analytical results guaranteeing correctness for certain families of functions. Experimental evaluations are described of the correctness and tightness of the bounds the rules obtain for several constructed functions f(x) and for real datasets. A hybrid method is shown to perform very well on some kinds of functions, suggesting a general iterative refinement procedure in which diagnostic features of the results of applying particular methods are used to select additional methods.
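
To make the problem statement concrete, here is a minimal Python sketch of one possible curve-bounding heuristic, a simple ratio test. It is an illustration only and is not claimed to be any of the five heuristics presented in the paper. The function names (`ratio_trend`, `classify`, `bracket_exponent`), the tolerance `tol`, the restriction to power-law guesses x^p, and the noise-free test function are all assumptions made for this example. The idea: for a guess g(x), examine how Y[i]/g(X[i]) changes as X[i] grows. A rising ratio suggests g grows more slowly than f (a candidate for the Ω bound); a flat or falling ratio suggests g is a candidate for the O bound.

```python
import math


def ratio_trend(xs, ys, g):
    """Least-squares slope of y/g(x) against log(x), divided by the mean ratio.

    Assumes all x > 0 and all y/g(x) > 0, so the relative slope is well defined.
    """
    ratios = [y / g(x) for x, y in zip(xs, ys)]
    ts = [math.log(x) for x in xs]
    n = len(xs)
    t_mean = sum(ts) / n
    r_mean = sum(ratios) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in zip(ts, ratios))
    return (sxy / sxx) / r_mean


def classify(xs, ys, g, tol=0.02):
    """Return "lower" if the ratio y/g(x) clearly rises with x, else "upper".

    tol is an arbitrary threshold chosen for this sketch, not a value
    taken from the paper.
    """
    return "lower" if ratio_trend(xs, ys, g) > tol else "upper"


def bracket_exponent(xs, ys, exponents):
    """Bracket the growth rate of the data between two power-law guesses.

    Returns (largest p classified "lower", smallest p classified "upper");
    either entry is None if no guess fell in that class.
    """
    lower, upper = [], []
    for p in exponents:
        label = classify(xs, ys, lambda x, p=p: x ** p)
        (lower if label == "lower" else upper).append(p)
    return (max(lower, default=None), min(upper, default=None))


if __name__ == "__main__":
    # Constructed, noise-free example: f(x) = 3x^2 + 10x at doubling sizes.
    xs = [2 ** k for k in range(4, 13)]
    ys = [3 * x * x + 10 * x for x in xs]
    print(bracket_exponent(xs, ys, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]))
```

Like any such rule, this one can be fooled. Low-order terms and noise in Y can dominate the ratio over the sampled range, which is why the result is best read as a suggestion about the observed range rather than as a proof of an asymptotic bound.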

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

Copyright information

© 1997 Springer-Verlag

About this paper

Cite this paper

McGeoch, C.C., Precup, D., Cohen, P.R. (1997). How to find big-oh in your data set (and how not to). In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052828

  • DOI: https://doi.org/10.1007/BFb0052828

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63346-4

  • Online ISBN: 978-3-540-69520-2

  • eBook Packages: Springer Book Archive
