Abstract
This chapter describes work that builds on the clustering algorithms presented earlier in the book to address two distinct data mining tasks: labeling and summarizing large sets of complex data. Given a large collection of complex objects, very few of which have labels, how can we guess the labels of the remaining majority, and how can we spot those objects that may need brand new labels, different from the existing ones? The work presented here provides answers to these questions. Specifically, this chapter describes in detail QMAS [2], a third algorithm for data mining in large sets of complex data, which is a fast and scalable solution to the problem of automatically analyzing, labeling and understanding this kind of data.
Notes
- 1.
The data is publicly available at: “geoeye.com”.
- 2.
References
Bhattacharya, A., Ljosa, V., Pan, J.Y., Verardo, M.R., Yang, H.J., Faloutsos, C., Singh, A.K.: ViVo: visual vocabulary construction for mining biomedical images. In: ICDM, pp. 50–57. IEEE Computer Society (2005)
Cordeiro, R.L.F., Guo, F., Haverkamp, D.S., Horne, J.H., Hughes, E.K., Kim, G., Traina, A.J.M., Traina Jr., C., Faloutsos, C.: QMAS: querying, mining and summarization of multi-modal databases. In: Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X. (eds.) ICDM, pp. 785–790. IEEE Computer Society (2010)
Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Finding clusters in subspaces of very large, multi-dimensional datasets. In: Li, F., Moro, M.M., Ghandeharizadeh, S., Haritsa, J.R., Weikum, G., Carey, M.J., Casati, F., Chang, E.Y., Manolescu, I., Mehrotra, S., Dayal, U., Tsotras, V.J. (eds.) ICDE, pp. 625–636. IEEE (2010)
Cordeiro, R.L.F., Traina, A.J.M., Faloutsos, C., Traina Jr., C.: Halite: fast and scalable multiresolution local-correlation clustering. IEEE Trans. Knowl. Data Eng. 99(PrePrints) (2011). doi:10.1109/TKDE.2011.176
Gibson, L., Lucas, D.: Spatial data processing using generalized balanced ternary. In: IEEE conference on pattern recognition and image analysis (1982)
Golub, G.H., Van Loan, C.F.: Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore, USA (1996)
Huang, K., Murphy, R.F.: From quantitative microscopy to automated image understanding. J. Biomed. Optics 9, 893–912 (2004)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Info. Theory 28(2), 129–137 (1982). doi:10.1109/TIT.1982.1056489
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Cam, L.M.L., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, California (1967)
Pan, J.Y., Balan, A.G.R., Xing, E.P., Traina, A.J.M., Faloutsos, C.: Automatic mining of fruit fly embryo images. In: KDD, pp. 693–698 (2006)
Steinhaus, H.: Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 1, 801–804 (1956). (in French)
Zhang, B., Hsu, M., Dayal, U.: K-harmonic means - a spatial clustering algorithm with boosting. In: Roddick, J.F., Hornsby, K. (eds.) TSDM, Lecture Notes in Computer Science, vol. 2007, pp. 31–45. Springer, Heidelberg (2000)
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Cordeiro, R.L., Faloutsos, C., Traina Júnior, C. (2013). QMAS. In: Data Mining in Large Sets of Complex Data. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-4890-6_6
DOI: https://doi.org/10.1007/978-1-4471-4890-6_6
Published:
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4889-0
Online ISBN: 978-1-4471-4890-6
eBook Packages: Computer Science, Computer Science (R0)