Comprehensive Log Compression with Frequent Patterns

  • Kimmo Hätönen
  • Jean-François Boulicaut
  • Mika Klemettinen
  • Markus Miettinen
  • Cyrille Masson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2737)


In this paper we present a comprehensive log compression (CLC) method that uses frequent patterns and their condensed representations to identify repetitive information in large log files generated by communications networks. We also show how the identified information can be used to separate and filter out frequently occurring events that hide other events that are unique or occur only a few times. The identification requires no prior knowledge about the domain or the events; for example, no predefined patterns or value combinations are needed. This separation makes it easier for a human observer to perceive and analyse large amounts of log data. The applicability of the CLC method is demonstrated with real-world examples from data communication networks.
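The separation idea described above can be illustrated with a small sketch. It is not the paper's CLC implementation: it assumes that each log entry is a set of field=value pairs, that closed itemsets serve as the condensed representation, and that an entry counts as "frequent" when it contains at least one closed frequent pattern. All names (`frequent_itemsets`, `closed_itemsets`, `compress`, `sample_log`) and the sample data are hypothetical. The exhaustive enumeration is exponential in the number of fields, so it is suitable for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Count every field=value combination; keep those found in >= min_support rows."""
    counts = Counter()
    for row in rows:
        items = sorted(row.items())
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == d for t, d in frequent.items())
    }


def compress(rows, min_support):
    """Return (closed frequent patterns, rows not covered by any pattern)."""
    closed = closed_itemsets(frequent_itemsets(rows, min_support))
    rare = [
        row for row in rows
        if not any(pattern <= set(row.items()) for pattern in closed)
    ]
    return closed, rare


# Hypothetical firewall-style log entries (assumed data, not from the paper).
sample_log = [
    {"service": "http", "action": "accept", "src": "10.0.0.1"},
    {"service": "http", "action": "accept", "src": "10.0.0.2"},
    {"service": "http", "action": "accept", "src": "10.0.0.3"},
    {"service": "ssh", "action": "drop", "src": "10.0.0.9"},
]

patterns, rare = compress(sample_log, min_support=3)
for pattern, support in patterns.items():
    print(sorted(pattern), "covers", support, "entries")
print("entries left for inspection:", rare)
```

In this sketch the three http entries are summarised by a single pattern with its support count, while the one ssh entry is not covered by any pattern and stays visible to the analyst.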





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Kimmo Hätönen (1)
  • Jean-François Boulicaut (2)
  • Mika Klemettinen (1)
  • Markus Miettinen (1)
  • Cyrille Masson (2)

  1. Nokia Group, Nokia Research Center, Finland
  2. INSA de Lyon, LIRIS CNRS FRE 2672, Villeurbanne, France
