SEWEBAR-CMS: semantic analytical report authoring for data mining results

Journal of Intelligent Information Systems

Abstract

SEWEBAR-CMS is a set of extensions for the Joomla! Content Management System (CMS) that adds the functionality required for it to serve as a communication platform between the data analyst, the domain expert and the report user. SEWEBAR-CMS integrates with existing data mining software through PMML. Background knowledge is entered via a web-based elicitation interface and is preserved in documents conforming to the proposed Background Knowledge Exchange Format (BKEF) specification. SEWEBAR-CMS offers web service integration with semantic knowledge bases, in which PMML and BKEF data are stored. By combining domain knowledge and mining model visualizations with the results of queries against the knowledge base, the data analyst conveys the mining results to the end user through a semi-automatically generated textual analytical report. The paper demonstrates the use of SEWEBAR-CMS on a real-world task from the cardiological domain and presents a user study showing that the proposed report authoring support leads to a statistically significant decrease in the time needed to author the analytical report.
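
SEWEBAR-CMS imports mining results through PMML. As a minimal sketch of what consuming such a document involves, assuming only the standard PMML 4.0 `AssociationModel` elements (`Item`, `Itemset` with `ItemRef`, and `AssociationRule` with its `support`, `confidence`, `antecedent` and `consequent` attributes), the Python code below resolves each rule's item references into readable values. The sample document, its item values and the `read_rules` helper are hypothetical; this is not the SEWEBAR-CMS import code, and tool-specific PMML extensions are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.0 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}

# Hypothetical, minimal PMML 4.0 association model containing a single rule.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <Header/>
  <DataDictionary numberOfFields="0"/>
  <AssociationModel functionName="associationRules" numberOfTransactions="100"
      minimumSupport="0.05" minimumConfidence="0.6" numberOfItems="2"
      numberOfItemsets="2" numberOfRules="1">
    <MiningSchema/>
    <Item id="1" value="BMI=overweight"/>
    <Item id="2" value="Diastolic=high"/>
    <Itemset id="a" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="c" numberOfItems="1"><ItemRef itemRef="2"/></Itemset>
    <AssociationRule support="0.12" confidence="0.7" antecedent="a" consequent="c"/>
  </AssociationModel>
</PMML>"""


def read_rules(pmml_text):
    """Return a list of (antecedent values, consequent values, support, confidence)."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:AssociationModel", NS)
    # Map each item id to its value.
    items = {it.get("id"): it.get("value") for it in model.findall("pmml:Item", NS)}
    # Map each itemset id to the list of item values it references.
    itemsets = {
        s.get("id"): [items[ref.get("itemRef")] for ref in s.findall("pmml:ItemRef", NS)]
        for s in model.findall("pmml:Itemset", NS)
    }
    rules = []
    for r in model.findall("pmml:AssociationRule", NS):
        rules.append((
            itemsets[r.get("antecedent")],
            itemsets[r.get("consequent")],
            float(r.get("support")),
            float(r.get("confidence")),
        ))
    return rules


for ante, cons, sup, conf in read_rules(SAMPLE):
    print(f"{' & '.join(ante)} => {' & '.join(cons)} (support={sup}, confidence={conf})")
```

Resolving the `ItemRef` indirection first keeps the rule loop simple; a real importer would also need to read the mining schema and any vendor extensions.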

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. http://www.dmg.org/pmml-v4-0.html

  2. http://ontopia.net

  3. http://www.oracle.com/technetwork/database/berkeleydb/

  4. Current sphygmomanometers (blood pressure measuring devices) mostly do not use mercury. Some newer devices already give readings in kilopascals (kPa), the SI unit of pressure.

  5. The PMML 4.0 specification http://www.dmg.org/pmml-v4-0.html states: “This information is not directly needed by a PMML consumer, but in many cases it is helpful for maintenance and visualization of the model. The particular content structure of MiningBuildTask is not defined by PMML”.

  6. For completeness, the additional settings (explained in Rauch and Šimůnek (2005)) were: (a) coefficient settings: Family status: Subset, length 1–2; BMI: Interval, length 1–3; all other attributes: Subset, length 1–1; (b) cedent settings: conjunction with a minimum length of 0; for the condition, the minimum length was 1.

  7. This information could already have been used in the task setting to prevent any rules involving normal diastolic blood pressure from being generated.

  8. We chose OKS (Ontopia Knowledge Suite) because it is commercial-grade software with many deployments and was open-sourced in 2009.

  9. Depending on practical needs, different tolog queries can adopt different (e.g., looser or stricter) definitions of confirmation.

  10. This is a simplifying heuristic replacing focused constraints on negation affecting $a(\omega_a)$ and $b(\omega_b)$, which, in our experience, could severely affect the complexity and the execution time of the tolog query.

  11. The remaining students either did not attend the course at all or left it early in the semester, and thus lacked the competence to answer the questions.

  12. All tasks were assumed to be completed as team work, but this was not strictly enforced.

  13. We considered as outliers all points located more than 1.5 interquartile ranges below the first quartile or above the third quartile.

  14. In this survey we omit approaches that consider background knowledge in numerical form, such as prior probability estimates or expertise-driven parameter setting for mining tools.

  15. We also omit knowledge-intensive, computationally costly approaches to learning over first-order logic representations, such as Inductive Logic Programming, where prior background knowledge is an indispensable part of the learning process. These approaches have never penetrated industrial data mining except for very specific, inherently structural task settings such as those in molecular biology.

  16. Frequent subgroup mining can roughly be seen as GUHA-style association mining with fixed consequent.

  17. XSLT transformations need to be customized to fit the required PMML Mining Model and the possible DM tool’s extensions to PMML.
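
Footnote 4 notes that some newer blood pressure devices report kilopascals rather than millimetres of mercury. A quick conversion sketch, assuming the conventional value of 1 mmHg = 133.322387415 Pa; the function name and the example readings are illustrative only and do not come from the study data.

```python
# Conventional millimetre of mercury expressed in pascals.
PA_PER_MMHG = 133.322387415


def mmhg_to_kpa(mmhg):
    """Convert a blood pressure reading from mmHg to kPa."""
    return mmhg * PA_PER_MMHG / 1000


# Illustrative readings, not values from the study.
for reading in (120, 80):
    print(reading, "mmHg =", round(mmhg_to_kpa(reading), 1), "kPa")
```

A reading of 120/80 mmHg therefore corresponds to roughly 16.0/10.7 kPa.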
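
Footnote 6 uses two coefficient types. As a rough illustration (our reading of the terminology, not the mining tool's actual implementation), a Subset coefficient of length 1–2 may take any one or two categories of an attribute, whereas an Interval coefficient of length 1–3 may take one to three consecutive categories of an ordered attribute such as BMI. The category lists and function names below are hypothetical.

```python
from itertools import combinations

# Hypothetical category lists; the real attribute categories are not given here.
FAMILY_STATUS = ["single", "married", "divorced", "widowed"]
BMI = ["underweight", "normal", "overweight", "obese"]  # ordered categories


def subset_coefficients(categories, min_len, max_len):
    """All subsets of min_len..max_len categories (category order is ignored)."""
    return [combo
            for k in range(min_len, max_len + 1)
            for combo in combinations(categories, k)]


def interval_coefficients(categories, min_len, max_len):
    """All runs of min_len..max_len consecutive categories of an ordered attribute."""
    return [tuple(categories[i:i + k])
            for k in range(min_len, max_len + 1)
            for i in range(len(categories) - k + 1)]


family = subset_coefficients(FAMILY_STATUS, 1, 2)  # 4 singletons + 6 pairs
bmi = interval_coefficients(BMI, 1, 3)             # 4 + 3 + 2 intervals
print(len(family), len(bmi))  # 10 9
print(bmi[4])  # ('underweight', 'normal')
```

Intervals keep the search space small for ordinal attributes: four BMI categories yield 9 intervals of length 1–3, compared with 14 subsets of length 1–3.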
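
Footnote 13 states the outlier rule used in the user study: a point is an outlier if it lies below $Q_1 - 1.5\,\mathrm{IQR}$ or above $Q_3 + 1.5\,\mathrm{IQR}$, where $\mathrm{IQR} = Q_3 - Q_1$. A minimal Python sketch of that rule follows; the quartile method (the default of `statistics.quantiles`, Python 3.8+) is an assumption, since several quartile conventions exist, and the sample times are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k interquartile ranges below Q1 or above Q3."""
    # With n=4, statistics.quantiles returns the three quartile cut points;
    # its default 'exclusive' method is one of several common conventions.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical authoring times in minutes (not data from the study).
times = [12, 14, 15, 15, 16, 17, 18, 19, 45]
print(iqr_outliers(times))  # [45]
```

Here $Q_1 = 14.5$ and $Q_3 = 18.5$, so the fences are 8.5 and 24.5 and only the 45-minute value is flagged.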

References

  • Agrawal, R., Imielinski, T., & Swami, A. N. (1993). Mining association rules between sets of items in large databases. In SIGMOD (Vol. 22, No. 2, pp. 207–216). Washington, D.C.

  • Almuallim, H., Akiba, Y. A., & Kaneda, S. (2005). On handling tree-structured attributes in decision tree learning. In Proceedings of ICML 2005 (pp. 12–20). Morgan Kaufmann.

  • Amato, G., Gennaro, C., Savino, P., & Rabitti, F. (2005). Functionalities of a content management system specialised for digital library applications. In Proceedings of AVIVDiLib’05—7th international workshop of the EU NoE DELOS on audio-visual content and information visualisation in digital libraries. Cortona, Italy, 4–6 May 2005.

  • Antunes, C. (2009). Mining patterns in the presence of domain knowledge. In Proceedings of ICEIS (2) 2009 (pp. 188–193). Milan, Italy.

  • Aronis, J. M., Provost, F. J., & Buchanan, B. G. (1996). Exploiting background knowledge in automated discovery. In Proceedings of SIGKDD-96 (pp. 355–358). Portland, Oregon.

  • Atzmueller, M., & Puppe, F. (2009). A knowledge-intensive approach for semi-automatic causal subgroup discovery. In Knowledge discovery enhanced with semantic and social information. Studies in computational intelligence (Vol. 220, pp. 19–36). Springer.

  • Atzmueller, M., Lemmerich, F., Reutelshoefer, J., & Puppe, F. (2009). Wiki-enabled semantic data mining—task design, evaluation and refinement. In Proceedings of DERIS2009—design, evaluation and refinement of intelligent systems. Krakow, Poland, 28 November 2009, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-545/.

  • Balhar, J., Kliegr, T., Štastný, D., & Vojíř, S. (2010). Elicitation of background knowledge for data mining. In Proceedings of Znalosti 2010, Jindrichuv Hradec (pp. 283–286). Prague: Oeconomica.

  • Bernstein, A., Provost, F., & Hill, S. (2005). Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 17(4), 503–518.

  • Clark, P., & Matwin, S. (1993). Using qualitative models to guide inductive learning. In Proceedings of the 1993 international conference on machine learning (pp. 49–56). Amherst, MA.

  • Coulet, A., Smaïl-Tabbone M., Benlian, P., Napoli, A., & Devignes, M.-D. (2008). Ontology-guided data preparation for discovering genotype-phenotype relationships. BMC Bioinformatics, 9 (Suppl 4), S3.

  • Domingues, M. A., & Rezende, S. O. (2005). Using taxonomies to facilitate the analysis of the association rules. In Proceedings of KDO’05—2nd int’l workshop on knowledge discovery and ontologies, at ECML/PKDD (pp. 59–66). Porto.

  • Engels, R., Lindner, G., & Studer, R. (1998). Providing user support for developing knowledge discovery applications; a midterm report. In S. Wrobel (Ed.), Themenheft der Künstliche Intelligenz (No. 1, pp. 38–39).

  • Euzenat, J., & Shvaiko, P. (2007). Ontology matching. Heidelberg: Springer-Verlag.

  • Garshol, L. M. (2006). Tolog—A topic maps query language. In Proceedings of first international workshop on topic maps research and applications—TMRA 2006. LNCS (Vol. 3873). Leipzig: Springer.

  • Garshol, L. M. (2007). TMRAP—Topic maps remote access protocol. In Proceedings of topic maps research and applications—TMRA 2006. LNAI (Vol. 4438). Leipzig: Springer.

  • Garshol, L. M., & Moore, G. (2006). Topic Maps—XML Syntax. ISO/IEC JTC1/SC34. http://www.isotopicmaps.org/sam/sam-xtm/.

  • Guazzelli, A., Lin, W. L., & Jena, T. (2010). Unleashing the power of open standards for data mining and predictive analytics. CreateSpace. Lexington, KY.

  • Hájek, P., & Havránek, T. (1978). Mechanizing hypothesis formation (Mathematical Foundations for a General Theory). Springer-Verlag.

  • Hazucha, A., Balhar, J., & Kliegr, T. (2010). A PHP library for Ontopia-CMS integration. In TMRA 2010. University of Leipzig, Leipzig, 29 September–1 October 2010.

  • Kliegr, T., Ovečka, M., & Zemánek, J. (2009a). Topic maps for association rule mining. In Proceedings of topic maps research and applications—TMRA 2009. Leipziger Beiträge zur Informatik, Band XIX, 11–13 November 2009.

  • Kliegr, T., Ralbovský, M., Svátek, V., Šimůnek, M., Jirkovský, V., Nemrava, J., et al. (2009b). Semantic analytical reports: A framework for post-processing data mining results. In Foundations of intelligent systems (ISMIS’09). LNCS (pp. 88–98). Prague: Springer, 14–17 September 2009.

  • Kliegr, T., & Rauch, J. (2010). An XML format for association rule models based on GUHA method. In Proc. RuleML-2010, 4th international web rule symposium. LNCS. Washington: Springer.

  • Kliegr, T., Svátek, V., Šimůnek, M., Stastný, D., & Hazucha, A. (2010). An XML schema and a topic map ontology for formalization of background knowledge in data mining. In IRMLeS-2010, 2nd ESWC workshop on inductive reasoning and machine learning for the semantic web. Heraklion, Crete, Greece. Online: http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-611/.

  • Kopanas, I., Avouris, N. M., & Daskalaki, S. (2002). The role of domain knowledge in a large scale data mining project. In Methods and applications of artificial intelligence. LNCS (Vol. 2308, pp. 288–299). Springer.

  • Kuo, Y.-T., Lonie, A., Sonenberg, L., & Paizis, K. (2007). Domain ontology driven data mining: A medical case study. In Proceedings of the 2007 international workshop on domain driven data mining at KDD’07. San Jose, California, 12–15 August 2007.

  • Nazeri, Z., & Bloedorn, E. (2004). Exploiting available domain knowledge to improve mining aviation safety and network security data. In Proceedings of KDO-2004—workshop knowledge discovery and ontologies at ECML/PKDD 2004. Pisa, Italy.

  • Nunez, M. (1991). The use of background knowledge in decision tree induction. Machine Learning, 6, 231–250.

  • Olaru, A., Marinica, C., & Guillet, F. (2009). Local mining of Association Rules with Rule Schemas. In CIDM 2009—symposium on computational intelligence and data mining (pp. 118–124). Nashville, TN, 30 March–2 April 2009. http://www.claudiamarinica.com/pdf/CIDM2009.pdf.

  • OWL Web Ontology Language Overview. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/owl-features/.

  • Phillips, J., & Buchanan, B. G. (2001). Ontology-guided knowledge discovery in databases. In Proceedings of the 1st international conference on knowledge capture (pp. 123–130). Victoria, Canada.

  • Podpečan, V., Lavrač, N., Kok, J. N., & de Bruin, J. (Eds.) (2009). SoKD’09—third generation data mining: Towards service-oriented knowledge discovery. Slovenia, 7 September 2009.

  • Rauch, J. (2005). Logic of association rules. Applied Intelligence, 22, 9–28.

  • Rauch, J. (2009). Considerations on logical calculi for dealing with knowledge in data mining. In Advances in data management. Studies in computational intelligence (Vol. 223). Springer.

  • Rauch, J., & Šimůnek, M. (2005). Alternative approach to mining association rules. In T. Y. Lin, S. Ohsuga, C. J. Liau & S. Tsumoto (Eds.), Data mining: Foundations, methods, and applications. Springer-Verlag.

  • Rauch, J., & Šimůnek, M. (2007). Semantic web presentation of analytical reports from data mining—preliminary considerations. In Proceedings of web intelligence’07 (pp. 3–7). Silicon Valley: IEEE.

  • Rauch, J., & Šimůnek, M. (2009). Dealing with background knowledge in the SEWEBAR project. In Knowledge discovery enhanced with semantic and social information. Studies in computational intelligence (Vol. 220). Springer.

  • Suyama, A., & Yamaguchi, T. (1998). Specifying and learning inductive learning systems using ontologies. In Proceedings of the AAAI’98 workshop on the methodology of applying machine learning (pp. 29–36). Madison, Wisconsin, 26–30 July 1998.

  • Svátek, V. (1997). Exploiting value hierarchies in rule learning. In Proceedings of ECML’97—9th European conference on machine learning (pp. 108–117). Prague: Poster Papers.

  • Thomas, J., Laublet, P., & Ganascia, J. G. (1993). A machine learning tool designed for a model-based knowledge acquisition approach. In EKAW-93—European knowledge acquisition workshop. LNCS (No. 723, pp. 123–138). Toulouse and Caylus: Springer.

  • Tomečková, M. (2004). Minimal data model of the cardiological patient—the selection of data. Cor et Vasa, 44(4), 123.

  • Tseng, M.-C., Lin, W.-Y., & Jeng, R. (2007). Mining association rules with ontological information. In ICIC 2007—second international conference on innovative computing, information and control. Kumamoto, Japan.

  • van Dompseler, H. J. H., & van Someren, M. W. (1994). Using models of problem solving bias in automated knowledge acquisition. In Proceedings of ECAI’94—European conference on artificial intelligence (pp. 503–507). Amsterdam.

  • Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained K-means clustering with background knowledge. In Proceedings of ICML 2001 (pp. 577–584). Williamstown: Morgan Kaufmann.

  • Zeman, M., Ralbovský, M., Svátek, V., & Rauch, J. (2009). Ontology-driven data preparation for association mining. In Proceedings of Znalosti 2009 (pp. 270–283). Brno, Czech Republic.

Acknowledgements

The work described here has been supported by Grant No. ME913 of the Ministry of Education, Youth and Sports of the Czech Republic, by Grant No. 201/08/0802 of the Czech Science Foundation, and by Grant No. IGA 21/08 of the University of Economics, Prague. We would like to thank Marie Tomečková, who gave us valuable feedback on the expert elicitation interface, and the following colleagues, who contributed significantly to SEWEBAR-CMS: Jakub Balhar, Daniel Štastný, Vojtěch Jirkovský, Jan Nemrava, Stanislav Vojíř and Jan Zemánek. Last but not least, we would like to thank the teachers at the University of Economics, Prague, who devoted their time to evaluating the framework in an educational context.

Author information

Corresponding author

Correspondence to Tomáš Kliegr.

About this article

Cite this article

Kliegr, T., Svátek, V., Ralbovský, M. et al. SEWEBAR-CMS: semantic analytical report authoring for data mining results. J Intell Inf Syst 37, 371–395 (2011). https://doi.org/10.1007/s10844-010-0137-0

  • DOI: https://doi.org/10.1007/s10844-010-0137-0
