Abstract
The case-based reasoning community has extensively studied competence-based methods for case base compression. This work has focused on compressing a case base at a single point in time, under the assumption that the current case base provides a representative sample of the cases to be seen in the future. Large-scale streaming case sources present new challenges for competence-based case deletion. First, in such contexts it may be infeasible or too expensive to maintain more than a very small fraction of the overall set of cases, and the current snapshot of the case base may not be representative of future cases, especially in domains with concept drift. Second, interrupting processing to compress the full case base may not be practical for large case bases in real-time streaming contexts. Consequently, such settings require maintenance methods that support continuous incremental updates and are robust to limited information. This paper presents research on addressing these problems through sieve streaming, a submodular data summarization method developed for streaming data. It demonstrates how the approach enables the maintenance process to trade off maintenance cost against competence retention, and it assesses the approach's performance against baseline competence-based deletion methods. Results support the benefit of the approach for large-scale streaming data.
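The abstract names sieve streaming but does not give its details. The following is a minimal sketch of the general technique as it is commonly described for streaming submodular maximization, not the authors' implementation. The 1-D cases, the fixed `probes` sample, the `radius`-based `coverage` function standing in for competence, and all function names are assumptions introduced for illustration.

```python
import math

def coverage(selected, probes, radius):
    """Assumed competence proxy: number of probe problems that lie
    within `radius` of at least one selected case (monotone submodular)."""
    return sum(1 for p in probes if any(abs(p - c) <= radius for c in selected))

def sieve_stream(stream, k, eps, f):
    """One pass over `stream`, keeping at most k cases per threshold sieve.
    Returns the sieve with the highest value of f."""
    sieves = {}        # exponent i -> cases kept for threshold (1 + eps) ** i
    max_single = 0.0   # largest single-case value f([e]) seen so far
    for e in stream:
        max_single = max(max_single, f([e]))
        if max_single <= 0:
            continue
        # Thresholds v = (1 + eps) ** i with max_single <= v <= 2 * k * max_single.
        lo = math.ceil(math.log(max_single, 1 + eps))
        hi = math.floor(math.log(2 * k * max_single, 1 + eps))
        # Drop sieves whose thresholds left the window; start new ones empty.
        sieves = {i: sieves.get(i, []) for i in range(lo, hi + 1)}
        for i, kept in sieves.items():
            if len(kept) >= k:
                continue
            v = (1 + eps) ** i
            current = f(kept)
            gain = f(kept + [e]) - current
            # Keep the case only if its marginal gain is large enough
            # for this sieve to still reach value v / 2 within budget k.
            if gain >= (v / 2 - current) / (k - len(kept)):
                kept.append(e)
    return max(sieves.values(), key=f, default=[])

# Hypothetical usage: keep at most 3 of the streamed cases.
probes = [float(x) for x in range(11)]
stream = [0.0, 0.1, 5.0, 5.1, 10.0, 0.2]
f = lambda cases: coverage(cases, probes, 1.0)
best = sieve_stream(stream, k=3, eps=0.1, f=f)
print(best, f(best))
```

In this sketch, the budget `k` and the granularity `eps` are the knobs that trade maintenance cost against retained coverage: a smaller `eps` creates more sieves, and therefore more work per arriving case, in exchange for a finer search over thresholds.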
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhang, Y., Zhang, S., Leake, D. (2017). Maintenance for Case Streams: A Streaming Approach to Competence-Based Deletion. In: Aha, D., Lieber, J. (eds) Case-Based Reasoning Research and Development. ICCBR 2017. Lecture Notes in Computer Science, vol 10339. Springer, Cham. https://doi.org/10.1007/978-3-319-61030-6_29
DOI: https://doi.org/10.1007/978-3-319-61030-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61029-0
Online ISBN: 978-3-319-61030-6
eBook Packages: Computer Science, Computer Science (R0)