
Journal of Computer Science and Technology, Volume 23, Issue 4, pp 497–515

PGG: An Online Pattern Based Approach for Stream Variation Management

  • Lu-An Tang
  • Bin Cui
  • Hong-Yan Li (corresponding author)
  • Gao-Shan Miao
  • Dong-Qing Yang
  • Xin-Biao Zhou
Regular Paper

Abstract

Many database applications require efficient processing of data streams with value variations and fluctuating sampling frequencies. The variations typically reflect fundamental features of the stream and important domain knowledge about the underlying objects. In some data streams, successive events appear to recur at a certain time interval, yet the data actually evolves with small differences as time elapses. This feature, called pseudo periodicity, poses a new challenge to stream variation management. This study focuses on the online management of variations over such streams, an idea that applies to many scenarios, such as patient vital-sign monitoring in medical applications. This paper proposes a new method, named Pattern Growth Graph (PGG), to detect and manage variations over evolving streams. PGG 1) adopts the wave-pattern to capture the major information of data evolution and represent it compactly; 2) detects variations in a single pass over the stream with the help of a wave-pattern matching algorithm; 3) stores only the segments of the incoming stream that differ from the pattern, and hence substantially compresses the data without losing important information; and 4) distinguishes meaningful data changes from noise and reconstructs the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items, together with a prototype system, demonstrate the feasibility and effectiveness of the proposed scheme.
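Points 3 and 4 of the abstract (keep only the parts of a new cycle that differ from a stored pattern, and separate real changes from noise) can be illustrated with a deliberately simplified sketch. It assumes the incoming cycle is already aligned to the stored pattern and has the same length, uses a fixed per-point tolerance `tol`, and treats deviating runs shorter than `min_run` points as noise. The function names and both parameters are hypothetical; this is not the PGG wave-pattern matching or noise-handling algorithm from the paper.

```python
from typing import List, Tuple

# A stored variation: (offset within the cycle, replacement values).
Variation = Tuple[int, List[float]]


def diff_against_pattern(segment: List[float], pattern: List[float],
                         tol: float, min_run: int = 2) -> List[Variation]:
    """Return runs of points where `segment` deviates from `pattern` by more than `tol`.

    Hypothetical simplification: runs shorter than `min_run` are treated as
    noise and dropped; points within `tol` are considered equal to the pattern.
    """
    if len(segment) != len(pattern):
        raise ValueError("segment and pattern must have equal length")
    variations: List[Variation] = []
    run_start = None
    for i, (x, p) in enumerate(zip(segment, pattern)):
        deviates = abs(x - p) > tol
        if deviates and run_start is None:
            run_start = i  # a new deviating run begins
        elif not deviates and run_start is not None:
            if i - run_start >= min_run:  # long enough to count as a real change
                variations.append((run_start, list(segment[run_start:i])))
            run_start = None
    # Close a run that extends to the end of the segment.
    if run_start is not None and len(segment) - run_start >= min_run:
        variations.append((run_start, list(segment[run_start:])))
    return variations


def reconstruct(pattern: List[float], variations: List[Variation]) -> List[float]:
    """Rebuild an approximate cycle by overlaying stored variations on the pattern."""
    out = list(pattern)
    for start, values in variations:
        out[start:start + len(values)] = values
    return out


if __name__ == "__main__":
    pattern = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
    segment = [0.05, 1.0, 3.0, 2.5, 0.0, -1.4]
    stored = diff_against_pattern(segment, pattern, tol=0.2)
    print(stored)                        # only the two-point change at offset 2 is kept
    print(reconstruct(pattern, stored))
```

Only the variation list needs to be stored per cycle; the single-point deviation at the last offset is discarded as noise, so reconstruction is approximate rather than exact, which mirrors the accuracy/compression trade-off the abstract describes.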

Keywords

data stream; noise reorganization; pattern representation; variation management


Supplementary material

11390_2008_9149_MOESM1_ESM.pdf (PDF, 69.2 kb)

Copyright information

© Springer 2008

Authors and Affiliations

  • Lu-An Tang (1)
  • Bin Cui (2, 4)
  • Hong-Yan Li (2, 3), corresponding author
  • Gao-Shan Miao (2, 3)
  • Dong-Qing Yang (2, 4)
  • Xin-Biao Zhou (2, 3)

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, U.S.A.
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  3. Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China
  4. Key Laboratory of High Confidence Software Technologies (Ministry of Education), Peking University, Beijing, China

Personalised recommendations