A general framework for never-ending learning from time series streams

Abstract

Time series classification has been an active area of research in the data mining community for over a decade, and significant progress has been made in the tractability and accuracy of learning. However, virtually all work assumes a one-time training session in which labeled examples of all the concepts to be learned are provided. This assumption may be valid in a handful of situations, but it does not hold in most medical and scientific applications where we initially may have only the vaguest understanding of what concepts can be learned. Based on this observation, we propose a never-ending learning framework for time series in which an agent examines an unbounded stream of data and occasionally asks a teacher (which may be a human or an algorithm) for a label. We demonstrate the utility of our ideas with experiments that consider real-world problems in domains as diverse as medicine, entomology, wildlife monitoring, and human behavior analyses.

Notes

  1. For our purposes, a “never-ending” stream may only last for days or hours. The salient point is the contrast with the batch learning algorithms that the vast majority of time series papers consider (Ding et al. 2008).

  2. Recall from Sect. 3.3 that the Subsequence Processing Module may choose to discard a subsequence rather than pass it to Frequent Pattern Maintenance.

  3. And for some sexually dimorphic species such as mosquitoes, the sex.

  4. Dr. John Michael Criley, MD, FACC, MACP, is Professor Emeritus at the David Geffen School of Medicine at UCLA.

  5. Usually the top thirteen coefficients are used for audio analysis. The first coefficient is a normalized energy parameter, which is not used for speech recognition (Mermelstein 1976).
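As a brief illustration of the convention described in note 5 (this sketch is ours, not part of the original paper), the following Python snippet computes thirteen MFCCs with the librosa library and then drops the first, energy-like coefficient; the audio file path is hypothetical.

import librosa

# Load a recording; "recording.wav" is a hypothetical file path.
y, sr = librosa.load("recording.wav", sr=None)

# Compute the first thirteen MFCCs; each row is one coefficient over time.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Discard the first (energy-related) coefficient, as is common in speech work.
mfcc_without_energy = mfcc[1:, :]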

References

  • Achtert E, Böhm C, Kriegel H-P, Kröger P (2005) Online hierarchical clustering in a data warehouse environment. In: ICDM, pp 10–17

  • Lambert JD, Hodgman TP, Laurent EJ, Brewer GL, Iliff MJ, Dettmers R (2009) The northeast bird monitoring handbook. American Bird Conservancy, The Plains, VA

  • Bache K, Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

  • Bardeli R, Wolff D, Kurth F, Koch M, Frommolt KH (2010) Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring. Pattern Recognit Lett 31:1524–1534

  • Barrenetxea G et al (2008) Sensorscope: out-of-the-box environmental monitoring. In: IPSN, San Francisco

  • Batista G, Keogh E, Mafra-Neto A, Rowton E (2011) Sensors and software to allow computational entomology, an emerging application of data mining. In: KDD, pp 761–764

  • Berges M, Goldman E, Matthews HS, Soibelman L (2010) Enhancing electricity audits in residential buildings with non-intrusive load monitoring. J Ind Ecol 14(5):844–858

  • Berlin E, Laerhoven K (2012) Detecting leisure activities with dense motif discovery. In: Proceedings of the 2012 international conference on ubiquitous computing, pp 250–259

  • Borazio M, Laerhoven K (2012) Combining wearable and environmental sensing into an unobtrusive tool for long-term sleep studies. In: 2nd ACM SIGHIT

  • Campanharo ASLO, Sirer MI, Malmgren RD, Ramos FM, Nunes LAN (2011) Duality between time series and networks. PLoS ONE 6:e23378

  • Carlson A, Betteridge J, Kisiel B, Settles B, Hruschka Jr ER, Mitchell TM (2010) Toward an architecture for never-ending language learning. In: Proc’ AAAI

  • Charikar M, Chen K, Farach-Colton M (2002) Finding frequent items in data streams. In: Proceedings of the 29th ICALP international colloquium on automata, languages and programming, pp 693–703

  • Chen Y (2014) Project webpage. https://sites.google.com/site/nelframework/. Accessed 03 April 2014

  • Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E (2014) Flying insect classification with inexpensive sensors. J Insect Behav 27(5):657–677

  • Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 493–498

  • Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20

  • Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: ICML, vol 95, pp 150–157

  • Dawson DK, Efford MG (2009) Bird population density estimated from acoustic signals. J Appl Ecol 46(6):1201–1209

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh EJ (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2):1542–1552

  • Dodge Y (2003) Oxford dictionary of statistical terms. OUP, Oxford. ISBN 0-19-850994-4

  • Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: KDD, pp 213–220

  • Estan C, Varghese G (2003) New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Trans Comput Syst 21(3):270–313

  • Ferreira F, Bota D, Bross A, Mélot C, Vincent J (2001) Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA 286(14):1754–1758

  • Fujii A, Tokunaga T, Inui K, Tanaka H (1998) Selective sampling for example-based word sense disambiguation. Comput Linguist 24(4):573–597

  • Goldberger A et al (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):215–220

  • Goldberger A et al (2013) Physionet. http://physionet.ph.biu.ac.il/physiobank/database/chfdb/. Accessed 04 Feb 2013

  • Google Prediction API. https://developers.google.com/prediction/docs/pricing. Accessed 31 Jul 2013

  • Gupta S, Reynolds S, Patel SN (2010) ElectriSense: single-point sensing using EMI for electrical event detection and classification in the home. In: Proceedings of the conference on ubiquitous computing

  • Hinman J, Hickey E (2009) Modeling and forecasting short-term electricity load using regression analysis. Working paper, Illinois State University, Normal (US), Fall

  • Holyoak DT (2001) Nightjars and their allies: the caprimulgiformes. Oxford University Press, Oxford, New York. ISBN 0-19-854987-3

  • Hu B, Chen Y, Keogh EJ (2013) Time series classification under more realistic assumptions. In: SDM

  • Jin C, Qian W, Sha C, Yu J, Zhou A (2003) Dynamically maintaining frequent items over a data stream. In: Proceedings of the 12th ACM CIKM international conference on information and knowledge management, pp 287–294

  • Jin S, Chen Z, Backus E, Sun X, Xiao B (2012) Characterization of EPG waveforms for the tea green leafhopper on tea plants and their correlation with stylet activities. J Insect Physiol 58:1235–1244

  • Karp R, Papadimitriou C, Shenker S (2003) A simple algorithm for finding frequent elements in sets and bags. ACM Trans Database Syst 28:51–55

  • Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2011) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/

  • Kolter J, Jaakkola T (2012) Approximate inference in additive factorial HMMs with application to energy disaggregation. J Mach Learn Res 22:1472–1482

  • Krishnamurthy A, Balakrishnan S, Xu M, Singh A (2012) Efficient active algorithms for hierarchical clustering. arXiv:1206.4672

  • Lines J, Bagnall A, Caiger-Smith P, Anderson S (2011) Classification of household devices by electricity usage profiles. In: IDEAL, pp 403–412

  • MacLeod J, Greene T, MacKenzie DI, Allen RB (2012) Monitoring widespread and common bird species on New Zealand’s conservation lands: a pilot study. N Z J Ecol 36(3):300–311

  • Manku G, Motwani R (2002) Approximate frequency counts over data streams. In: International conference on very large databases, pp 346–357

  • Mermelstein P (1976) Distance measures for speech recognition, psychological and instrumental. In: Chen CH (ed) Pattern recognition and artificial intelligence. Academic Press, New York

  • Metwally A, Agrawal D, Abbadi AE (2005) Efficient computation of frequent and top-k elements in data streams. In: International conference on database theory

  • Mitchell L (1981) Time segregated mosquito collections with a CDC miniature light trap. Mosquito News 42:12

  • Morales L, Arbetman MP, Cameron SA, Aizen MA (2013) Rapid ecological replacement of a native bumble bee by invasive species. Frontiers Ecol Environ 11:529–534

  • Mueen A, Keogh EJ, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: KDD, pp 1154–1162

  • Mueen A, Keogh EJ (2010) Online discovery and maintenance of time series motifs. In: KDD, pp 1089–1098

  • Nassar S, Sander J, Cheng C (2004) Incremental and effective data summarization for dynamic hierarchical clustering. In: SIGMOD Conference, pp 467–478

  • Norris JR (1998) Markov chains. Cambridge University Press, Cambridge

  • PAMAP (2012) Physical activity monitoring for aging people. www.pamap.org/demo.html. Retrieved 2012

  • Robbins CS (1981) Effect of time of day on bird activity. Stud Avian Biol 6:275–286

  • Roggen D et al (2012) Collecting complex activity data sets in highly rich networked sensor environments. In: Proc’ 7th IEEE INSS, pp 233–240

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q et al (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 262–270

  • Rowling JK (1997) Harry Potter and the chamber of secrets. Arthur A. Levine Books (Scholastic), New York. Read by Stephen Fry

  • Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill, New York. ISBN 0-07-054484-0

  • Settles B (2012) Active learning. Morgan & Claypool, San Rafael

  • Settles B, Craven M, Friedland L (2008) Active learning with real annotation costs. In: Proceedings of the NIPS workshop on cost-sensitive learning

  • Shrivastava N, Buragohain C, Agrawal D, Suri S (2004) Medians and beyond: new aggregation techniques for sensor networks. In: ACM SenSys

  • Stikic M, Huynh T, Laerhoven KV, Schiele B (2008) ADL recognition based on the combination of RFID and accelerometer sensing. In: PervasiveHealth, pp 258–263

  • Tur G, Hakkani-Tür D, Schapire RE (2005) Combining active and semi-supervised learning for spoken language understanding. Speech Commun 45(2):171–186

  • van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworths, London

  • Veeraraghavan A, Chellappa R, Roy-Chowdhury AK (2006) The function space of an activity. In: Computer vision and pattern recognition

  • Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ (2010) Evidence for a bimodal distribution in human communication. Proc Natl Acad Sci USA 107:18803–18808

  • Yu H (2005) SVM selective sampling for ranking with application to data retrieval. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York

  • Zhai S, Kristensson PO, Appert C, Anderson TH, Cao X (2012) Foundational issues in touch-surface stroke gesture design-an integrative review. Found Trends Hum Comput Interact 5(2):97–205

Author information

Correspondence to Yanping Chen.

Additional information

Responsible editor: Thomas Seidl.

Appendix 1: the relationship between buffer size and pattern discovery time

In the main text, we provided an equation to calculate the probability of discovering a repeated pattern within \(n\) steps when the buffer size is \(w\), but did not discuss how we derived the equation. We relegated the derivation to this appendix to enhance the flow of the paper.

1.1 Theoretical analysis

To calculate the probability of seeing at least two examples of the pattern in the buffer within no more than \(n\) steps when the buffer size is \(w\), under the assumptions given in the main text (cf. Sect. 3.5), we model the problem with the classic urn-and-ball model (Dodge 2003) as follows.

An infinitely large urn contains colored balls in the ratio of one red ball to ninety-nine blue balls. At each time step, a ball is randomly sampled from the urn and placed into a box of size \(w\). When the box is full, we randomly discard a ball from the box and replace it with the newly sampled ball. The question at hand is: what are the chances of seeing at least two red balls in the box within \(n\) samplings when the box size is \(w\)?

This model simulates our problem: the red balls represent the pattern, the blue balls represent random data, the box simulates the buffer, and the data stream is generated by sampling a ball at each time step. The pattern is discovered when exactly two red balls are seen in the box (recall that we stop when we see two red balls, so we will never see three or more), so the goal is to determine the probability of discovering the pattern within \(n\) steps when the box size is \(w\).

According to the model, the probability of seeing at least two red balls in the next step depends only on the current state of the box and is independent of the previous states. For example, it is possible to see two red balls in the next step only when the box currently contains at least one red ball; and given the current contents of the box, the probability of seeing two red balls in the next step is independent of how many red balls there were previously. As such, this is a typical Markov process, which we model as a Markov chain (Norris 1998) as follows.

We take as states the number of red balls in the box: \(S_{0}\), \(S_{1}\) and \(S_{2}\), where \(S_{0}\)/\(S_{1}\) denotes that there is zero/one red ball in the box, and \(S_{2}\) denotes that there are at least two red balls. For simplicity, we explicitly assume that the box is full of blue balls at the beginning; that is, the initial state is \(S_{0}\). As such, the initial probability vector \(u\) is:

$$u=\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

Next, we need to specify the transition matrix. The conditions required to transition from one state to another are listed in Table 13, where ‘impossible’ means with probability zero and ‘certainty’ means with probability one.

Table 13 Conditions for transitioning states

At each step, the probability of sampling a red ball from the urn is 1/100, and of sampling a blue ball is 99/100; the probability of discarding a red ball from the box is \(\frac{n_{red}}{w}\), and of discarding a blue ball is \(\frac{w-n_{red}}{w}\), where \(n_{red}\) is the number of red balls currently in the box. Based on the conditions in Table 13, we can compute the transition probabilities and derive the transition matrix \(T\) as follows:

$$T=\begin{bmatrix} \frac{99}{100} & \frac{1}{100} & 0 \\ \frac{1}{w}\cdot \frac{99}{100} & \frac{1}{w}\cdot \frac{1}{100}+\frac{w-1}{w}\cdot \frac{99}{100} & \frac{w-1}{w}\cdot \frac{1}{100} \\ 0 & 0 & 1 \end{bmatrix}$$

According to the Markov chain, the probability of seeing at least two red balls in the box (i.e., being in state \(S_{2}\)) after \(n\) steps is the third entry of the vector \(u^{(n)}=uT^{n}\), and thus we have:

$$Pr(w,n)=\left( \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{99}{100} & \frac{1}{100} & 0 \\ \frac{1}{w}\cdot \frac{99}{100} & \frac{1}{w}\cdot \frac{1}{100}+\frac{w-1}{w}\cdot \frac{99}{100} & \frac{w-1}{w}\cdot \frac{1}{100} \\ 0 & 0 & 1 \end{bmatrix}^{n}\right) (1,3)$$
(1)

which is the equation we provided in the main text.
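As a minimal sketch (our own illustration, not the authors' code), Eq. 1 can be evaluated numerically with a few lines of Python; the function name pattern_discovery_prob and the default red-ball probability of 1/100 are assumptions matching the urn model above.

import numpy as np

def pattern_discovery_prob(w, n, p=1.0 / 100.0):
    """Probability of being in state S2 (two red balls seen) within n steps
    for box size w, where p is the probability of sampling a red ball."""
    q = 1.0 - p
    T = np.array([
        [q,             p,                                   0.0],
        [(1.0 / w) * q, (1.0 / w) * p + ((w - 1.0) / w) * q, ((w - 1.0) / w) * p],
        [0.0,           0.0,                                 1.0],
    ])
    u = np.array([1.0, 0.0, 0.0])  # initial state: no red balls in the box
    return (u @ np.linalg.matrix_power(T, n))[2]  # third entry = Pr(reaching S2)

# Example: probability of discovering the pattern within 500 steps for w = 10.
print(pattern_discovery_prob(10, 500))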

To verify our analysis, we also performed an empirical study with simulations. We ran simulations for different values of \(w\) and compared the simulation results with the theoretical results calculated using Eq. 1. Figure 32 shows the comparison for \(w = 10\); as can be seen, the two results are essentially identical.
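For concreteness, the following is a minimal sketch (ours, not the authors' code) of such a Monte Carlo simulation of the urn-and-box model; the function names and the number of trials are arbitrary choices.

import random

def simulate_once(w, n, p=1.0 / 100.0):
    """Return True if at least two red balls appear in the box within n steps."""
    box = []   # contents of the box; True marks a red ball
    reds = 0
    for _ in range(n):
        if len(box) == w:                      # box full: discard a random ball
            reds -= box.pop(random.randrange(w))
        ball = random.random() < p             # sample a ball from the urn
        box.append(ball)
        reds += ball
        if reds >= 2:                          # pattern discovered
            return True
    return False

def estimate_prob(w, n, trials=100000):
    return sum(simulate_once(w, n) for _ in range(trials)) / trials

# Compare against the theoretical value from Eq. 1 for w = 10.
print(estimate_prob(10, 500))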

Fig. 32 Comparison of simulation results (red/bold) with theoretical results (green/thin) for \(w = 10\)

Cite this article

Chen, Y., Hao, Y., Rakthanmanon, T. et al. A general framework for never-ending learning from time series streams. Data Min Knowl Disc 29, 1622–1664 (2015). https://doi.org/10.1007/s10618-014-0388-4
