
In-Database Rule Learning Under Uncertainty: A Variable Precision Rough Set Approach

  • Frank Beer
  • Ulrich Bühler
Chapter
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 377)

Abstract

Relational database systems are the predominant repositories for storing mission-critical information collected from industrial sensor devices, business transactions and sourcing activities, among others. As such, they provide an exceptional gateway for data science. However, conventional knowledge discovery processes require data to be transported to external mining tools, which is very challenging in practice. To overcome this dilemma, equipping databases with predictive capabilities is a promising direction. Rough Set Theory is particularly interesting for this purpose because it can discover hidden patterns while being founded on well-defined set operations. Unfortunately, existing implementations consider data to be static, which is a prohibitive assumption in situations where data evolve over time and concepts tend to drift. Therefore, in this chapter we propose an in-database rule learner for nonstationary environments. An assessment under different scenarios against other state-of-the-art rule inducers demonstrates that the algorithm is comparable with existing methods, but superior when applied to critical applications that demand greater confidence in the decision-making process.
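
To make the phrase "well-defined set operations" concrete, the following is a minimal sketch of a variable precision lower approximation written with plain Python sets. It is not the chapter's in-database algorithm. The function names, the toy sensor table, and the use of a precision threshold `beta` (an equivalence class is accepted when at least a share `beta` of its rows belong to the target concept) are illustrative assumptions.

```python
from collections import defaultdict


def equivalence_classes(rows, condition_attrs):
    """Group row indices that agree on every condition attribute."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(idx)
    return list(classes.values())


def vprs_lower_approximation(rows, condition_attrs, decision_attr, target, beta=0.8):
    """Union of equivalence classes whose share of target rows is at least beta.

    With beta = 1.0 this reduces to the classical rough set lower approximation.
    """
    concept = {i for i, r in enumerate(rows) if r[decision_attr] == target}
    lower = set()
    for eq_class in equivalence_classes(rows, condition_attrs):
        precision = len(eq_class & concept) / len(eq_class)
        if precision >= beta:
            lower |= eq_class
    return lower


# Hypothetical toy table: four noisy readings share the same conditions.
rows = [
    {"temp": "high", "vibration": "yes", "fault": "yes"},
    {"temp": "high", "vibration": "yes", "fault": "yes"},
    {"temp": "high", "vibration": "yes", "fault": "yes"},
    {"temp": "high", "vibration": "yes", "fault": "no"},
    {"temp": "low", "vibration": "no", "fault": "no"},
    {"temp": "low", "vibration": "yes", "fault": "yes"},
]

strict = vprs_lower_approximation(rows, ["temp", "vibration"], "fault", "yes", beta=1.0)
relaxed = vprs_lower_approximation(rows, ["temp", "vibration"], "fault", "yes", beta=0.75)
print(sorted(strict))   # rows certainly classified as faulty
print(sorted(relaxed))  # rows classified as faulty when 25% noise is tolerated
```

In a relational setting the grouping step corresponds to a `GROUP BY` over the condition attributes, which is one reason rough set operations map naturally onto database queries.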

Notes

Acknowledgements

The authors would like to thank the German Federal Ministry of Education and Research (BMBF) for support within the project IntErA under grant number 03FH023PX3.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Applied Sciences Fulda, Fulda, Germany
