Abstract
Automated tools for knowledge discovery are frequently applied to databases whose objects already fall into a known classification scheme. In unsupervised learning, or clustering, such tools search large databases for alternative classification schemes that are both meaningful and novel. The novelty of a cluster can be quantified as the degree of separation between that cluster and its most similar class. Our approach models each cluster and each class as a Gaussian distribution and estimates the degree of overlap between the two distributions by measuring their intersecting area. Unlike other metrics, our method quantifies the novelty of each cluster individually and enables us to rank classes according to their similarity to each new cluster. We test our algorithm on Martian landscapes using a set of known classes called geological units; experimental results yield a new interpretation for the characterization of Martian landscapes.
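The overlap idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional Gaussians, approximates the intersecting area by integrating the pointwise minimum of the two densities with the trapezoidal rule, and defines novelty as one minus the largest overlap, which is an assumption made here for illustration. The names `gaussian_pdf`, `overlap_area`, `rank_classes`, and `novelty` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def overlap_area(mean_a, std_a, mean_b, std_b, steps=20000):
    """Approximate the area under min(p_a, p_b) with the trapezoidal rule.

    Assumption: integrating over +/- 8 standard deviations around both
    means captures essentially all of the probability mass.
    """
    lo = min(mean_a - 8.0 * std_a, mean_b - 8.0 * std_b)
    hi = max(mean_a + 8.0 * std_a, mean_b + 8.0 * std_b)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        y = min(gaussian_pdf(x, mean_a, std_a), gaussian_pdf(x, mean_b, std_b))
        # End points carry half weight in the trapezoidal rule.
        total += (0.5 if i in (0, steps) else 1.0) * y
    return total * h


def rank_classes(cluster, classes):
    """Rank classes by overlap with a cluster, most similar first.

    cluster is a (mean, std) pair; classes maps a class name to (mean, std).
    """
    scores = {name: overlap_area(*cluster, *params)
              for name, params in classes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def novelty(cluster, classes):
    """Assumed novelty score: 1 minus the overlap with the most similar class."""
    return 1.0 - rank_classes(cluster, classes)[0][1]


if __name__ == "__main__":
    # Hypothetical class parameters, for illustration only.
    units = {"unit_A": (0.0, 1.0), "unit_B": (5.0, 1.0)}
    new_cluster = (0.5, 1.0)
    print(rank_classes(new_cluster, units))
    print(novelty(new_cluster, units))
```

An overlap near 1 means the cluster essentially reproduces a known class, while an overlap near 0 means it is well separated from every class. For multivariate data the intersecting area has no equally simple form, so a one-dimensional sketch like this would need a projection step or a different estimator.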
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vilalta, R., Stepinski, T., Achari, M., Ocegueda-Hernandez, F. (2004). A Quantification of Cluster Novelty with an Application to Martian Topography. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_40
DOI: https://doi.org/10.1007/978-3-540-30116-5_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive