An Automated Report Generation Tool for the Data Understanding Phase

  • Juha Vesanto
  • Jaakko Hollmén
Part of the Advances in Soft Computing book series (AINSC, volume 14)


To prepare and model data successfully, the data miner needs to be aware of the properties of the data manifold. This paper outlines a tool that automatically generates data survey reports for this purpose. The report combines linguistic descriptions (rules) and statistical measures with visualizations. Together these provide both quantitative and qualitative information and help the user form a mental model of the data. The main focus is on describing the cluster structure and the contents of the clusters. The data is clustered using a novel algorithm based on the Self-Organizing Map. The rules describing the clusters are selected using a significance measure based on confidence in their characterizing and discriminating properties.
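The abstract does not give the significance measure itself, so the following is only a minimal sketch of the general idea. It assumes that "characterizing" confidence can be read as the fraction of a cluster's samples that satisfy a rule, and "discriminating" confidence as the fraction of samples satisfying the rule that belong to the cluster. Combining the two by a product is an assumption made here for illustration, and the names `RuleScore` and `score_rule` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence


@dataclass
class RuleScore:
    characterizing: float  # estimated P(rule holds | sample in cluster)
    discriminating: float  # estimated P(sample in cluster | rule holds)
    significance: float    # product of the two (an illustrative assumption)


def score_rule(
    rule: Callable[[float], bool],
    values: Sequence[float],
    labels: Sequence[Hashable],
    cluster: Hashable,
) -> RuleScore:
    """Score a one-variable rule against one cluster of a labelled data set."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    in_cluster = [label == cluster for label in labels]
    fires = [bool(rule(v)) for v in values]

    n_cluster = sum(in_cluster)
    n_rule = sum(fires)
    n_both = sum(1 for c, f in zip(in_cluster, fires) if c and f)

    # Guard against empty clusters and rules that never fire.
    characterizing = n_both / n_cluster if n_cluster else 0.0
    discriminating = n_both / n_rule if n_rule else 0.0
    return RuleScore(characterizing, discriminating,
                     characterizing * discriminating)


if __name__ == "__main__":
    # Toy data: one variable, two clusters (labels 0 and 1).
    toy_values = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(score_rule(lambda x: x < 2.0, toy_values, toy_labels, cluster=0))
    print(score_rule(lambda x: x < 3.05, toy_values, toy_labels, cluster=0))
```

Under these assumptions, a rule that covers a cluster completely but also fires on many samples outside it scores high on the characterizing side and low on the discriminating side, so ranking candidate rules by the combined value favours rules that do both.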


Keywords: Component Plane; Linguistic Description; Cluster Hierarchy; Intelligent Data Analysis; Association Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Juha Vesanto (1)
  • Jaakko Hollmén (1)
  1. Laboratory of Computer and Information Science, Helsinki University of Technology, Finland
