
Next Generation Data Mining Tools: Power Laws and Self-similarity for Graphs, Streams and Traditional Data

  • Christos Faloutsos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)

Abstract

What patterns can we find in bursty web traffic? On the web or internet graph itself? How about the distribution of galaxies in the sky, or the distribution of a company’s customers in geographical space? How long should we expect a nearest-neighbor search to take when there are 100 attributes per patient or customer record? The traditional assumptions (uniformity, independence, Poisson arrivals, Gaussian distributions) often fail miserably. Should we give up trying to find patterns in such settings?

Self-similarity, fractals and power laws are extremely successful in describing real datasets (coastlines, river basins, stock prices, brain surfaces, communication-line noise, to name a few). We show some old and new successes, involving the modeling of graph topologies (internet, web and social networks), the modeling of galaxy and video data, dimensionality reduction, and more.
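One common way to put a number on self-similarity is the correlation fractal dimension D2. Cover the data space with a grid of cells of side r and let S(r) be the sum, over cells, of the squared number of points in each cell. For a self-similar point set, S(r) grows roughly like r^D2, so D2 can be read off as the slope of log S(r) against log r. The sketch below is a minimal box-counting estimate under stated assumptions: coordinates lie in [0, 1), the radii are chosen by hand, and the function names (`box_occupancy_sum`, `correlation_fractal_dimension`, `sierpinski_points`) and the synthetic test data are hypothetical illustrations, not the author's implementation.

```python
import math
import random
from collections import Counter


def box_occupancy_sum(points, r):
    """S(r): sum over grid cells of side r of (number of points in the cell) squared.

    Assumes every coordinate lies in [0, 1).
    """
    cells = Counter(tuple(int(c / r) for c in p) for p in points)
    return sum(n * n for n in cells.values())


def correlation_fractal_dimension(points, radii):
    """Estimate D2 as the least-squares slope of log S(r) against log r."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(box_occupancy_sum(points, r)) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def sierpinski_points(n, seed=0):
    """Chaos-game sample of a Sierpinski triangle inside the unit square (illustrative data)."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x, y = rng.random(), rng.random()
    points = []
    for i in range(n + 20):  # the first 20 iterates are discarded as burn-in
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2
        if i >= 20:
            points.append((x, y))
    return points


if __name__ == "__main__":
    rng = random.Random(42)
    radii = [2.0 ** -k for k in range(1, 6)]  # cell sides 1/2 ... 1/32
    n = 20_000
    square = [(rng.random(), rng.random()) for _ in range(n)]
    diagonal = [(t, t) for t in (rng.random() for _ in range(n))]
    print("uniform square:", correlation_fractal_dimension(square, radii))  # close to 2
    print("diagonal line: ", correlation_fractal_dimension(diagonal, radii))  # close to 1
    print("Sierpinski:    ", correlation_fractal_dimension(sierpinski_points(n), radii))  # close to log 3 / log 2
```

The three synthetic sets show why a single number is useful: uniformly spread points in the plane give D2 near 2, points on a line give D2 near 1 even though they sit in 2-d space, and the Sierpinski triangle gives a non-integer value near 1.58, something the uniformity and independence assumptions cannot describe.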

Keywords

Fractal Dimension, Mining Association Rule, Iterated Function System, Modeling Galaxy, Internet Topology
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Christos Faloutsos
    School of Computer Science, Carnegie Mellon University, Pittsburgh