A general framework for never-ending learning from time series streams

Abstract

Time series classification has been an active area of research in the data mining community for over a decade, and significant progress has been made in the tractability and accuracy of learning. However, virtually all work assumes a one-time training session in which labeled examples of all the concepts to be learned are provided. This assumption may be valid in a handful of situations, but it does not hold in most medical and scientific applications where we initially may have only the vaguest understanding of what concepts can be learned. Based on this observation, we propose a never-ending learning framework for time series in which an agent examines an unbounded stream of data and occasionally asks a teacher (which may be a human or an algorithm) for a label. We demonstrate the utility of our ideas with experiments that consider real-world problems in domains as diverse as medicine, entomology, wildlife monitoring, and human behavior analyses.

Notes

  1. For our purposes, a “never-ending” stream may only last for days or hours. The salient point is the contrast with the batch learning algorithms that the vast majority of time series papers consider (Ding et al. 2008).

  2. Recall from Sect. 3.3 that the Subsequence Processing Module may choose to discard a subsequence rather than pass it to Frequent Pattern Maintenance.

  3. And for some sexually dimorphic species such as mosquitoes, the sex.

  4. Dr. John Michael Criley, MD, FACC, MACP, is Professor Emeritus at the David Geffen School of Medicine at UCLA.

  5. Usually the top thirteen coefficients are used for audio analysis. The first coefficient is a normalized energy parameter, which is not used for speech recognition (Mermelstein 1976).
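As a brief illustration of the convention described in note 5 (this sketch is ours, not part of the original paper), the following Python snippet computes thirteen MFCCs with the librosa library and then drops the first, energy-like coefficient; the audio file path is hypothetical.

import librosa

# Load a recording; "recording.wav" is a hypothetical file path.
y, sr = librosa.load("recording.wav", sr=None)

# Compute the first thirteen MFCCs; each row is one coefficient over time.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Discard the first (energy-related) coefficient, as is common in speech work.
mfcc_without_energy = mfcc[1:, :]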

References

  • Achtert E, Böhm C, Kriegel H-P, Kröger P (2005) Online hierarchical clustering in a data warehouse environment. In: ICDM, pp 10–17

  • Lambert JD, Hodgman TP, Laurent EJ, Brewer GL, Iliff MJ, Dettmers R (2009) The northeast bird monitoring handbook. American Bird Conservancy, The Plains, VA

  • Bache K, Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

  • Bardeli R, Wolff D, Kurth F, Koch M, Frommolt KH (2010) Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring. Pattern Recognit Lett 31:1524–1534

  • Barrenetxea G et al (2008) Sensorscope: out-of-the-box environmental monitoring. In: IPSN, San Francisco

  • Batista G, Keogh E, Mafra-Neto A, Rowton E (2011) Sensors and software to allow computational entomology, an emerging application of data mining. In: KDD, pp 761–764

  • Berges M, Goldman E, Matthews HS, Soibelman L (2010) Enhancing electricity audits in residential buildings with non-intrusive load monitoring. J Ind Ecol 14(5):844–858

  • Berlin E, Laerhoven K (2012) Detecting leisure activities with dense motif discovery. In: Proceedings of the 2012 international conference on ubiquitous computing, pp 250–259

  • Borazio M, Laerhoven K (2012) Combining wearable and environmental sensing into an unobtrusive tool for long-term sleep studies. In: 2nd ACM SIGHIT

  • Campanharo ASLO, Sirer MI, Malmgren RD, Ramos FM, Nunes LAN (2011) Duality between time series and networks. PLoS ONE 6:e23378

  • Carlson A, Betteridge J, Kisiel B, Settles B, Hruschka Jr ER, Mitchell TM (2010) Toward an architecture for never-ending language learning. In: Proc’ AAAI

  • Charikar M, Chen K, Farach-Colton M (2002) Finding frequent items in data streams. In: Proceedings of the 29th ICALP international colloquium on automata, languages and programming, pp 693–703

  • Chen Y (2014) Project webpage. https://sites.google.com/site/nelframework/. Accessed 03 April 2014

  • Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E (2014) Flying insect classification with inexpensive sensors. J Insect Behav 27(5):657–677

  • Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 493–498

  • Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20

  • Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: ICML, vol 95, pp 150–157

  • Dawson DK, Efford MG (2009) Bird population density estimated from acoustic signals. J Appl Ecol 46(6):1201–1209

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh EJ (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2):1542–1552

  • Dodge Y (2003) Oxford dictionary of statistical terms. OUP, Oxford. ISBN 0-19-850994-4

  • Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: KDD, pp 213–220

  • Estan C, Varghese G (2003) New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Trans Comput Syst 21(3):270–313

  • Ferreira F, Bota D, Bross A, Mélot C, Vincent J (2001) Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA 286(14):1754–1758

  • Fujii A, Tokunaga T, Inui K, Tanaka H (1998) Selective sampling for example-based word sense disambiguation. Comput Linguist 24(4):573–597

  • Goldberger A et al (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):215–220

  • Goldberger A et al (2013) Physionet. http://physionet.ph.biu.ac.il/physiobank/database/chfdb/. Accessed 04 Feb 2013

  • Google Prediction API. https://developers.google.com/prediction/docs/pricing. Accessed 31 Jul 2013

  • Gupta S, Reynolds S, Patel SN (2010) ElectriSense: single-point sensing using EMI for electrical event detection and classification in the home. In: Proceedings of the conference on ubiquitous computing

  • Hinman J, Hickey E (2009) Modeling and forecasting short-term electricity load using regression analysis. Working paper, Illinois State University, Normal (US), Fall

  • Holyoak DT (2001) Nightjars and their allies: the caprimulgiformes. Oxford University Press, Oxford, New York. ISBN 0-19-854987-3

  • Hu B, Chen Y, Keogh EJ (2013) Time series classification under more realistic assumptions. In: SDM

  • Jin C, Qian W, Sha C, Yu J, Zhou A (2003) Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th ACM CIKM international conference on information and knowledge management, pp 287–294

  • Jin S, Chen Z, Backus E, Sun X, Xiao B (2012) Characterization of EPG waveforms for the tea green leafhopper on tea plants and their correlation with stylet activities. J Insect Physiol 58:1235–1244

  • Karp R, Papadimitriou C, Shenker S (2003) A simple algorithm for finding frequent elements in sets and bags. ACM Trans Database Syst 28:51–55

  • Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2011) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/

  • Kolter J, Jaakkola T (2012) Approximate inference in additive factorial HMMs with application to energy disaggregation. J Mach Learn Res 22:1472–1482

  • Krishnamurthy A, Balakrishnan S, Xu M, Singh A (2012) Efficient active algorithms for hierarchical clustering. arXiv:1206.4672

  • Lines J, Bagnall A, Caiger-Smith P, Anderson S (2011) Classification of household devices by electricity usage profiles. In: IDEAL, pp 403–412

  • MacLeod J, Greene T, MacKenzie DI, Allen RB (2012) Monitoring widespread and common bird species on New Zealand’s conservation lands: a pilot study. N Z J Ecol 36(3):300–311

  • Manku G, Motwani R (2002) Approximate frequency counts over data streams. In: International conference on very large databases, pp 346–357

  • Mermelstein P (1976) Distance measures for speech recognition, psychological and instrumental. In: Chen CH (ed) Pattern recognition and artificial intelligence. Academic Press, New York

  • Metwally A, Agrawal D, Abbadi AE (2005) Efficient computation of frequent and top-k elements in data streams. In: International conference on database theory

  • Mitchell L (1981) Time segregated mosquito collections with a CDC miniature light trap. Mosquito News 42:12

  • Morales L, Arbetman MP, Cameron SA, Aizen MA (2013) Rapid ecological replacement of a native bumble bee by invasive species. Frontiers Ecol Environ 11:529–534

  • Mueen A, Keogh EJ, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: KDD, pp 1154–1162

  • Mueen A, Keogh EJ (2010) Online discovery and maintenance of time series motifs. In: KDD, pp 1089–1098

  • Nassar S, Sander J, Cheng C (2004) Incremental and effective data summarization for dynamic hierarchical clustering. In: SIGMOD Conference, pp 467–478

  • Norris JR (1998) Markov chains. Cambridge University Press, Cambridge

  • PAMAP (2012) Physical activity monitoring for aging people. www.pamap.org/demo.html. Retrieved 2012

  • Robbins CS (1981) Effect of time of day on bird activity. Stud Avian Biol 6:275–286

  • Roggen D et al (2012) Collecting complex activity data sets in highly rich networked sensor environments. In: Proc’ 7th IEEE INSS, pp 233–240

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q et al (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 262–270

  • Rowling JK (1997) Harry Potter and the chamber of secrets. Arthur A. Levine Books (Scholastic), New York. Read by Stephen Fry

  • Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill, New York. ISBN 0-07-054484-0

  • Settles B (2012) Active learning. Morgan & Claypool, San Rafael

  • Settles B, Craven M, Friedland L (2008) Active learning with real annotation costs. In: Proceedings of the NIPS workshop on cost-sensitive learning

  • Shrivastava N, Buragohain C, Agrawal D, Suri S (2004) Medians and beyond: new aggregation techniques for sensor networks. In: ACM SenSys

  • Stikic M, Huynh T, Laerhoven KV, Schiele B (2008) ADL recognition based on the combination of RFID and accelerometer sensing. In: PervasiveHealth, pp 258–263

  • Tur G, Hakkani-Tür D, Schapire RE (2005) Combining active and semi-supervised learning for spoken language understanding. Speech Commun 45(2):171–186

  • van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworths, London

  • Veeraraghavan A, Chellappa R, Roy-Chowdhury AK (2006) The function space of an activity. In: Computer vision and pattern recognition

  • Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ (2010) Evidence for a bimodal distribution in human communication. Proc Natl Acad Sci USA 107:18803–18808

  • Yu H (2005) SVM selective sampling for ranking with application to data retrieval. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York

  • Zhai S, Kristensson PO, Appert C, Anderson TH, Cao X (2012) Foundational issues in touch-surface stroke gesture design-an integrative review. Found Trends Hum Comput Interact 5(2):97–205

Author information

Correspondence to Yanping Chen.

Additional information

Responsible editor: Thomas Seidl.

Appendix 1: the relationship between buffer size and pattern discovery time

In the main text, we provided an equation to calculate the probability of discovering a repeated pattern within \(n\) steps when the buffer size is \(w\), but did not discuss how we derived the equation. We relegated the derivation to this appendix to enhance the flow of the paper.

1.1 Theoretical analysis

To calculate the probability of seeing at least two examples of the pattern in the buffer within no more than \(n\) steps when the buffer size is \(w\), under the assumptions given in the main text (cf. Sect. 3.5), we model the problem with the classic urn-and-ball model (Dodge 2003) as follows.

An infinitely large urn contains colored balls in the ratio of one red ball to ninety-nine blue balls. At each time step, a ball is randomly sampled from the urn and placed into a box of size \(w\). When the box is full, we randomly discard a ball from the box and replace it with the newly sampled ball. The question at hand is: what are the chances of seeing at least two red balls in the box within \(n\) samplings when the box size is \(w\)?

This model simulates our problem: the red balls represent the pattern, the blue balls represent random data, the box simulates the buffer, and the data stream is generated by sampling a ball at each time step. The pattern is discovered when exactly two red balls are seen in the box (recall that we stop when we see two red balls, so we will never see three or more), so the goal is to determine the probability of discovering the pattern within \(n\) steps when the box size is \(w\).

According to the model, the probability of seeing at least two red balls in the next step depends only on the current state of the box and is independent of the previous states. For example, it is possible to see two red balls in the next step only when the box currently contains at least one red ball; and given the current contents of the box, the probability of seeing two red balls in the next step is independent of how many red balls there were previously. As such, this is a typical Markov process, which we model as a Markov chain (Norris 1998) as follows.

We take as states the number of red balls in the box: \(S_{0}\), \(S_{1}\) and \(S_{2}\), where \(S_{0}\)/\(S_{1}\) denotes that there is zero/one red ball in the box, and \(S_{2}\) denotes that there are at least two red balls. For simplicity, we explicitly assume that the box is full of blue balls at the beginning; that is, the initial state is \(S_{0}\). As such, the initial probability vector \(u\) is:

$$u=\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

Next, we need to specify the transition matrix. The conditions required to transition from one state to another are listed in Table 13, where ‘impossible’ means with probability zero and ‘certainty’ means with probability one.

Table 13 Conditions for transitioning states

At each step, the probability of sampling a red ball from the urn is 1/100, and of sampling a blue ball is 99/100; the probability of discarding a red ball from the box is \(\frac{n_{red}}{w}\), and of discarding a blue ball is \(\frac{w-n_{red}}{w}\), where \(n_{red}\) is the number of red balls currently in the box. Based on the conditions in Table 13, we can compute the transition probabilities and derive the transition matrix \(T\) as follows:

$$T=\begin{bmatrix} \frac{99}{100} & \frac{1}{100} & 0 \\ \frac{1}{w}\cdot \frac{99}{100} & \frac{1}{w}\cdot \frac{1}{100}+\frac{w-1}{w}\cdot \frac{99}{100} & \frac{w-1}{w}\cdot \frac{1}{100} \\ 0 & 0 & 1 \end{bmatrix}$$

According to the Markov chain, the probability of seeing at least two red balls in the box (i.e., being in state \(S_{2}\)) after \(n\) steps is the third entry of the vector \(u^{(n)}=uT^{n}\), and thus we have:

$$Pr(w,n)=\left( \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{99}{100} & \frac{1}{100} & 0 \\ \frac{1}{w}\cdot \frac{99}{100} & \frac{1}{w}\cdot \frac{1}{100}+\frac{w-1}{w}\cdot \frac{99}{100} & \frac{w-1}{w}\cdot \frac{1}{100} \\ 0 & 0 & 1 \end{bmatrix}^{n}\right) (1,3)$$
(1)

which is the equation we provided in the main text.
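As a minimal sketch (our own illustration, not the authors' code), Eq. 1 can be evaluated numerically with a few lines of Python; the function name pattern_discovery_prob and the default red-ball probability of 1/100 are assumptions matching the urn model above.

import numpy as np

def pattern_discovery_prob(w, n, p=1.0 / 100.0):
    """Probability of being in state S2 (two red balls seen) within n steps
    for box size w, where p is the probability of sampling a red ball."""
    q = 1.0 - p
    T = np.array([
        [q,             p,                                   0.0],
        [(1.0 / w) * q, (1.0 / w) * p + ((w - 1.0) / w) * q, ((w - 1.0) / w) * p],
        [0.0,           0.0,                                 1.0],
    ])
    u = np.array([1.0, 0.0, 0.0])  # initial state: no red balls in the box
    return (u @ np.linalg.matrix_power(T, n))[2]  # third entry = Pr(reaching S2)

# Example: probability of discovering the pattern within 500 steps for w = 10.
print(pattern_discovery_prob(10, 500))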

To verify our analysis, we also performed an empirical study with simulations. We ran simulations for different values of \(w\) and compared the simulation results with the theoretical results calculated using Eq. 1. Figure 32 shows the comparison for \(w = 10\); as can be seen, the two results are essentially identical.
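For concreteness, the following is a minimal sketch (ours, not the authors' code) of such a Monte Carlo simulation of the urn-and-box model; the function names and the number of trials are arbitrary choices.

import random

def simulate_once(w, n, p=1.0 / 100.0):
    """Return True if at least two red balls appear in the box within n steps."""
    box = []   # contents of the box; True marks a red ball
    reds = 0
    for _ in range(n):
        if len(box) == w:                      # box full: discard a random ball
            reds -= box.pop(random.randrange(w))
        ball = random.random() < p             # sample a ball from the urn
        box.append(ball)
        reds += ball
        if reds >= 2:                          # pattern discovered
            return True
    return False

def estimate_prob(w, n, trials=100000):
    return sum(simulate_once(w, n) for _ in range(trials)) / trials

# Compare against the theoretical value from Eq. 1 for w = 10.
print(estimate_prob(10, 500))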

Fig. 32 Comparison of simulation results (red/bold) with theoretical results (green/thin) for \(w = 10\)

Cite this article

Chen, Y., Hao, Y., Rakthanmanon, T. et al. A general framework for never-ending learning from time series streams. Data Min Knowl Disc 29, 1622–1664 (2015). https://doi.org/10.1007/s10618-014-0388-4
