Finding It Now: Construction and Configuration of Networked Classifiers in Real-Time Stream Mining Systems

Ducasse, Raphaël; van der Schaar, Mihaela

doi:10.1007/978-1-4614-6859-2_4

Chapter
First Online: 01 January 2013

Abstract

As data becomes ever more abundant and complex, the ability to process it and extract valuable information has become a critical requirement. However, performing such signal processing tasks requires solving multiple challenges. Indeed, information must frequently be extracted (a) from many distinct data streams, (b) using limited resources, and (c) in real time to be of value. The aim of this chapter is to describe and optimize the specifications of signal processing systems that extract valuable information in real time from large-scale decentralized datasets. The first section explains the motivations and stakes that have made stream mining an emerging field of research, and describes the key characteristics and challenges of stream mining applications. We then formalize an analytical framework that is used to describe and optimize distributed knowledge extraction from large-scale streams. In stream mining applications, classifiers are organized into a connected topology mapped onto a distributed infrastructure. We study linear chains of classifiers, determine how the ordering of the classifiers in the chain affects classification accuracy and delay, and show how to choose the most suitable order. Finally, we present a decentralized decision framework upon which distributed algorithms for joint topology construction and local classifier configuration can be built. Stream mining is an active field of research at the intersection of several disciplines, including multimedia signal processing, distributed systems, and machine learning. As such, we indicate several areas for future research and development.

Notes

1.
As we will discuss later, there are two types of configuration choices we must make: the topological ordering of the classifiers and the local operating point at each classifier.
2.
For example, since the operating point \(p^{F} = 0\) corresponds to a saddle point of the utility function, it would achieve the steepest utility slope. Furthermore, the slope of the DET curve is maximal at \(p^{F} = 0\) (due to the concavity of the DET curve), such that high detection probabilities can be obtained under low false alarm probabilities near the origin.
3.
Furthermore, when classifiers are independent, the transition matrices \(T_{i}\) are diagonal and therefore commute. As a consequence, the end throughput \(t_{N}(x)\) and goodput \(g_{N}(x)\) are independent of the order. However, intermediate throughputs do depend on the ordering, which leads to varying expected delays for the overall processing.
4.
Observe that for a perfect classifier (\(p_{\sigma (h)}^{D} = 1\) and \(p_{\sigma (h)}^{F} = 0\)), the a-priori conditional probability \(\phi _{h}^{\sigma }\) and the ex-post conditional probabilities \(\psi _{h}^{\sigma }\) are equal.
5.
\(t_{i-1}\) and \(g_{i-1}\) are not required since: \(\mathop{\mathrm{argmax}} U_{i} = \mathop{\mathrm{argmax}} \frac{U_{i}}{g_{i-1}} = \left(-\left[\begin{array}{cc} \rho_{i} & 0 \end{array}\right] + \left[\begin{array}{cc} v_{i+1} & w_{i+1} \end{array}\right] T_{i}\right)\left[\begin{array}{c} \theta_{i} \\ 1 \end{array}\right]\).
6.
This can take τ > 5 min for seven classifiers.
7.
The utility parameters \(\left[\begin{array}{cc} v_{j} & w_{j} \end{array}\right]\) fed back from classifier \(C_{j}\) to classifier \(C_{i}\) are independent of any classifiers’ operating points.
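Note 3 can be illustrated with a small numerical sketch. It is not taken from the chapter: the diagonal entries, the initial throughput and goodput, and the helper names `run_chain` and `total_load` are all hypothetical, and it assumes that each classifier updates the pair (throughput, goodput) by multiplying it by a diagonal 2×2 matrix. Under those assumptions, every ordering reaches the same end values, while the throughput seen at intermediate stages (used here as a crude stand-in for processing delay) changes with the order.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical diagonal transition matrices T_i = diag(a_i, b_i).
# Assumption (not from the chapter): t_i = a_i * t_{i-1}, g_i = b_i * g_{i-1}.
DIAGS = [
    (Fraction(1, 2), Fraction(9, 10)),
    (Fraction(3, 5), Fraction(4, 5)),
    (Fraction(4, 5), Fraction(19, 20)),
]

T0 = Fraction(1)       # hypothetical input throughput
G0 = Fraction(1, 10)   # hypothetical input goodput


def run_chain(order, t0=T0, g0=G0):
    """Return (throughput, goodput) after each classifier, in chain order."""
    t, g = t0, g0
    trace = []
    for i in order:
        a, b = DIAGS[i]
        t, g = a * t, b * g  # diagonal matrix: the two components do not mix
        trace.append((t, g))
    return trace


def total_load(trace, t0=T0):
    """Sum of the throughputs entering each classifier (a crude delay proxy)."""
    entering = [t0] + [t for t, _ in trace[:-1]]
    return sum(entering)


if __name__ == "__main__":
    for order in permutations(range(len(DIAGS))):
        trace = run_chain(order)
        t_end, g_end = trace[-1]
        print(order, "end:", t_end, g_end, "load:", total_load(trace))
```

Exact fractions are used instead of floats so that the order-independence of the end values shows up as exact equality rather than agreement up to rounding.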

Acknowledgements

This work is based upon work supported by the National Science Foundation under Grant No. 1016081. We would like to thank Dr. Deepak Turaga (IBM Research) for introducing us to the topic of stream mining, for many productive conversations about the material of this chapter, and for providing us with Figs. 1 and 3 of this chapter. We would also like to thank Dr. Fangwen Fu and Dr. Brian Foo, who were PhD students in Prof. van der Schaar's group and made contributions to the area of stream mining from which this chapter benefited. Finally, we thank Mr. Siming Song for kindly helping us with formatting and polishing the final version of the chapter.

Author information

Authors and Affiliations

The Boston Consulting Group, Paris, France
Raphaël Ducasse
University of California, Los Angeles, CA, 90095, USA
Mihaela van der Schaar

Corresponding author

Correspondence to Raphaël Ducasse.

Editor information

Editors and Affiliations

Dept. of Electrical and, University of Maryland, A. V. Williams Bldg. 2311, College Park, 20742, Maryland, USA
Shuvra S. Bhattacharyya
Leiden Inst. Advanced Computer Science, Leiden Embedded Research Center, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, Netherlands
Ed F. Deprettere
Software for Systems on Silicon, RWTH Aachen University, Templergraben 55, Aachen, 52056, Germany
Rainer Leupers
Department of Pervasive Computing, Tampere University of Technology, Korkeakoulunkatu 1, Tampere, 33720, Finland
Jarmo Takala

About this chapter

Cite this chapter

Ducasse, R., van der Schaar, M. (2013). Finding It Now: Construction and Configuration of Networked Classifiers in Real-Time Stream Mining Systems. In: Bhattacharyya, S., Deprettere, E., Leupers, R., Takala, J. (eds) Handbook of Signal Processing Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6859-2_4

DOI: https://doi.org/10.1007/978-1-4614-6859-2_4
Published: 10 May 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6858-5
Online ISBN: 978-1-4614-6859-2
eBook Packages: Engineering, Engineering (R0)
