
Computational Statistics, Volume 16, Issue 3, pp 465–479

Analyzing XploRe Download Profiles with Intelligent Miner

  • Hizir Sofyan
  • Axel Werwatz

Summary

This paper is an example of data mining in action. The database we are mining contains 1085 profiles of individuals who have downloaded the statistical software XploRe. Each profile contains the responses to an online questionnaire consisting of questions about such things as an individual’s computing preferences (operating system, favorite statistical software) or professional affiliation. After formatting and cleaning the raw data using MS Excel, we use IBM’s Intelligent Miner to perform a cluster analysis of the download profiles. We try to identify a small number of “types” of users by employing a clustering algorithm based on the New Condorcet Criterion, which is particularly well-suited for our all-categorical data. We identify three clusters in the mining run, which we refer to as Academia, Unix/Linux users and Researchers, respectively. Based on the characteristics of the cluster members, we briefly outline how the results of the data analysis may be used for targeted marketing of XploRe.
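To make the idea of a Condorcet-style criterion on all-categorical data more concrete, here is a minimal sketch in Python. It is not Intelligent Miner's implementation. It assumes a simplified within-cluster form of the criterion, in which each pair of records placed in the same cluster contributes its number of agreeing attributes minus its number of disagreeing attributes, and it uses a single greedy pass to assign records. The function names and the example profiles are hypothetical and are not taken from the XploRe download data.

```python
from typing import List, Sequence

Record = Sequence[str]  # one categorical profile, e.g. (OS, software, affiliation)


def pair_score(a: Record, b: Record) -> int:
    """Agreements minus disagreements between two categorical records."""
    agree = sum(x == y for x, y in zip(a, b))
    return agree - (len(a) - agree)


def condorcet_value(records: Sequence[Record], labels: Sequence[int]) -> int:
    """Simplified criterion: sum of pair scores over all same-cluster pairs."""
    total = 0
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if labels[i] == labels[j]:
                total += pair_score(records[i], records[j])
    return total


def greedy_cluster(records: Sequence[Record]) -> List[int]:
    """Single greedy pass (an assumption, not the product's algorithm).

    Each record joins the existing cluster that raises the criterion the
    most; if no cluster gives a positive gain, it starts a new cluster.
    """
    clusters: List[List[int]] = []
    labels: List[int] = []
    for idx, rec in enumerate(records):
        best, best_gain = None, 0
        for c, members in enumerate(clusters):
            gain = sum(pair_score(rec, records[m]) for m in members)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            clusters.append([idx])
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(idx)
            labels.append(best)
    return labels


# Hypothetical profiles: (operating system, favorite software, affiliation)
profiles = [
    ("Windows", "SPSS", "University"),
    ("Windows", "SPSS", "University"),
    ("Linux", "R", "Company"),
    ("Linux", "R", "University"),
]
labels = greedy_cluster(profiles)
print(labels, condorcet_value(profiles, labels))
```

Because the number of clusters emerges from the sign of the gains, no cluster count has to be fixed in advance in this sketch. A production algorithm would typically make several passes over the data and use a tunable similarity threshold.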

Keywords

Data Mining, Cluster Analysis


Copyright information

© Physica-Verlag 2001

Authors and Affiliations

  • Hizir Sofyan (1)
  • Axel Werwatz (1)
  1. Institut für Statistik und Ökonometrie, Humboldt Universität zu Berlin, Berlin, Germany
