Abstract
Classifying large datasets without any a priori information is a problem in many tasks. In bioinformatics in particular, huge unlabeled datasets often have to be explored largely by hand by a biology expert. In this work we consider an application motivated by the development of high-throughput microscope screening cameras, which can produce hundreds of thousands of images per day. We propose a new adaptive active classification scheme that links two opposing concepts: unsupervised clustering of the underlying data and supervised classification. Based on Fuzzy c-means clustering and Learning Vector Quantization, the scheme first clusters a large dataset and then adjusts the classification using a small number of carefully chosen examples. Following the concept of active learning, the learner queries the most informative examples during learning and thereby keeps the cost of supervision low. We compare our approach to Learning Vector Quantization with random selection and to Support Vector Machines with active learning on several datasets.
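The abstract names the building blocks but not the selection criterion or the update schedule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the standard fuzzy c-means membership formula with fuzzifier `m`, a hypothetical "ambiguity" score (one minus the gap between the two largest memberships) for choosing which examples to query, and the textbook LVQ1 rule for adjusting a prototype once the expert has supplied a label. All function names are illustrative.

```python
import math


def fcm_memberships(point, prototypes, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster."""
    dists = [math.dist(point, p) for p in prototypes]
    # A point that coincides with a prototype belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(prototypes))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def ambiguity(memberships):
    """Assumed informativeness score: close to 1 when the top two memberships tie."""
    top = sorted(memberships, reverse=True)
    return 1.0 - (top[0] - top[1])


def select_queries(points, prototypes, n_queries):
    """Return indices of the n_queries most ambiguous points (ties: lower index first)."""
    scored = [(ambiguity(fcm_memberships(x, prototypes)), i) for i, x in enumerate(points)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:n_queries]]


def lvq1_update(prototype, point, same_class, lr=0.1):
    """Textbook LVQ1 step: attract the prototype if labels agree, repel otherwise."""
    sign = 1.0 if same_class else -1.0
    return [p + sign * lr * (x - p) for p, x in zip(prototype, point)]


# Toy usage: two prototypes on a line, three unlabeled points.
prototypes = [(0.0, 0.0), (4.0, 0.0)]
points = [(0.5, 0.0), (2.0, 0.0), (3.9, 0.0)]
query = select_queries(points, prototypes, n_queries=1)
```

In this toy setup the point halfway between the two prototypes is the one selected for labeling, since its memberships are split evenly. A real system would need at least two prototypes, a policy for mapping clusters to classes, and a stopping rule, none of which the abstract specifies.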
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cebron, N., Berthold, M.R. (2006). Adaptive Active Classification of Cell Assay Images. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_12
DOI: https://doi.org/10.1007/11871637_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)