Towards User-Aware Rule Discovery

  • Conference paper
Information Search, Integration, and Personlization (ISIP 2016)

Abstract

Rule discovery is a challenging but unavoidable process in several data-centric applications. The main challenges arise from the huge search space that needs to be explored and from noise in the data, which makes the mining results hardly useful. While existing state-of-the-art systems involve users only at the beginning and the end of the mining process, we argue that this paradigm must be revised and that new rule mining algorithms should be developed to let domain experts interact during the discovery process. We discuss how new systems that embrace this approach overcome current limitations and ultimately reduce both the time and the user effort needed for rule discovery.

Corresponding author

Correspondence to Paolo Papotti.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Meduri, V.V., Papotti, P. (2017). Towards User-Aware Rule Discovery. In: Kotzinos, D., Laurent, D., Petit, JM., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personlization. ISIP 2016. Communications in Computer and Information Science, vol 760. Springer, Cham. https://doi.org/10.1007/978-3-319-68282-2_1

  • DOI: https://doi.org/10.1007/978-3-319-68282-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68281-5

  • Online ISBN: 978-3-319-68282-2

  • eBook Packages: Computer Science (R0)
