Speed up gradual rule mining from stream data! A B-Tree and OWA-based approach

Nin, Jordi; Laurent, Anne; Poncelet, Pascal

doi:10.1007/s10844-009-0112-9

Published: 19 November 2009

Volume 35, pages 447–463 (2010)

Journal of Intelligent Information Systems

Jordi Nin¹,², Anne Laurent² & Pascal Poncelet²

Abstract

Gradual rules describe ordering correlations among attributes. An example of such a rule is "the higher the salary and the lower the number of cars, the higher the number of tourist travels". Such rules were previously used intensively in fuzzy command systems, where they were provided to the system manually. More recently, they have received attention from the data mining community, and methods have been defined to automatically extract and maintain gradual rules from numerical databases. However, no method has been shown to be able to handle data streams, as none is scalable enough to cope with the high rate at which stream data arrive. In this paper, we therefore propose an original approach to mining data streams for gradual rules. Our method relies on B-Trees and the OWA (Ordered Weighted Aggregation) operator to speed up the process. B-Trees store already-known gradual rules so that knowledge is maintained over time, while OWA operators provide a fast way to discard non-relevant data.
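For readers unfamiliar with the OWA operator named in the abstract, the following minimal sketch shows the standard definition (Yager's ordered weighted averaging): the inputs are sorted in decreasing order and the weights are applied to ranks rather than to particular sources. It illustrates the operator only, not the authors' filtering procedure, which the abstract does not detail; the function name `owa` and the sample scores are assumptions made for illustration.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: weights apply to ranks, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # largest value first
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical scores, chosen only to show how the weights shape the result.
scores = [0.2, 0.9, 0.5]
print(owa(scores, [1.0, 0.0, 0.0]))  # all weight on rank 1: behaves like max
print(owa(scores, [0.0, 0.0, 1.0]))  # all weight on the last rank: behaves like min
print(owa(scores, [1 / 3, 1 / 3, 1 / 3]))  # uniform weights: arithmetic mean
```

Depending on the weight vector, the same operator ranges from the minimum to the maximum of its inputs, which is what makes it a flexible single-pass aggregate.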

Notes

If an attribute of a tuple is lower than the predefined minimum or higher than the predefined maximum, its value can be normalized to 0 (for the minimum) or 1 (for the maximum).
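The clamping described in this note can be sketched as follows. The note only states that out-of-range values map to 0 or 1; the linear min-max scaling used here for in-range values is an assumption, and the name `normalize` and its parameters `lo` and `hi` are hypothetical.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1], clamping anything outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)  # assumed linear min-max scaling
    return min(1.0, max(0.0, scaled))  # below lo -> 0.0, above hi -> 1.0


# Hypothetical attribute with a predefined range of [0, 10].
print(normalize(5, 0, 10))   # in range
print(normalize(-3, 0, 10))  # below the predefined minimum
print(normalize(42, 0, 10))  # above the predefined maximum
```

Clamping keeps every normalized attribute inside a fixed interval even when the stream delivers values outside the range that was fixed in advance.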

Acknowledgements

Partial support by the European Community through the 7th Framework Programme Marie Curie Intra-European fellowship, contract No 235226, is acknowledged. Partial support by the Spanish MEC and Generalitat de Catalunya (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004, eAEGIS – TSI2007-65406-C03-02 and 2009-SGR-7) is also acknowledged. This work was also done in the context of the French ANR Project MIDAS (ANR-07-MDCO-008).

Author information

Authors and Affiliations

LAAS, Lab. d’Analyse et d’Architecture des Systèmes, CNRS, Centre National de la Recherche Scientifique, 7, Avenue du Colonel Roche, Toulouse, 31077, France
Jordi Nin
LIRMM - CNRS UMR 5506, Univ. Montpellier 2, 161 rue Ada, 34395, Montpellier Cedex 5, France
Jordi Nin, Anne Laurent & Pascal Poncelet

Corresponding author

Correspondence to Jordi Nin.

About this article

Cite this article

Nin, J., Laurent, A. & Poncelet, P. Speed up gradual rule mining from stream data! A B-Tree and OWA-based approach. J Intell Inf Syst 35, 447–463 (2010). https://doi.org/10.1007/s10844-009-0112-9

Received: 27 April 2009
Revised: 05 November 2009
Accepted: 05 November 2009
Published: 19 November 2009
Issue Date: December 2010
DOI: https://doi.org/10.1007/s10844-009-0112-9
