Abstract
Rule discovery is a challenging but unavoidable step in many data-centric applications. The main challenges arise from the huge search space that must be explored and from noise in the data, which makes the mining results hardly useful. Existing state-of-the-art systems involve users only at the beginning and the end of the mining process. We argue that this paradigm must be revised and that new rule mining algorithms should be developed to let domain experts interact during the discovery process. We discuss how new systems that embrace this approach overcome current limitations and ultimately reduce both the time and the user effort required for rule discovery.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Meduri, V.V., Papotti, P. (2017). Towards User-Aware Rule Discovery. In: Kotzinos, D., Laurent, D., Petit, JM., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2016. Communications in Computer and Information Science, vol 760. Springer, Cham. https://doi.org/10.1007/978-3-319-68282-2_1
DOI: https://doi.org/10.1007/978-3-319-68282-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68281-5
Online ISBN: 978-3-319-68282-2
eBook Packages: Computer Science, Computer Science (R0)