Data Mining pp 229–243

A Multi-level Framework for the Analysis of Sequential Data

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3755)

Abstract

Traditionally, text mining has had a strong link with information retrieval and classification and has largely aimed to classify documents according to embedded knowledge. Association rule mining and sequence mining, on the other hand, have had a different goal: eliciting relationships within or about the data being mined. Recently, research has applied sequence mining techniques to digital document collections by treating the text as sequential data.
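As a rough illustration of what treating text as sequential data can mean, the sketch below counts contiguous word sequences that recur across documents. It is not the framework proposed in this chapter; the function name `frequent_word_sequences`, the parameters `length` and `min_support`, and the sample documents are all assumptions made for the example.

```python
from collections import Counter


def frequent_word_sequences(documents, length=2, min_support=2):
    """Return contiguous word sequences found in at least min_support documents.

    Hypothetical helper for illustration only: each document is treated as an
    ordered sequence of lower-cased, whitespace-separated words.
    """
    support = Counter()
    for text in documents:
        words = text.lower().split()
        # Collect each distinct sequence once per document, so the count
        # is document-level support rather than raw frequency.
        seen = {
            tuple(words[i:i + length])
            for i in range(len(words) - length + 1)
        }
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Made-up sample documents (an assumption, not data from the chapter).
docs = [
    "sequence mining finds frequent patterns",
    "text mining and sequence mining share techniques",
    "frequent patterns appear in text",
]
result = frequent_word_sequences(docs)
```

With these sample documents, only the sequences that appear in two documents survive the `min_support` threshold. Word order matters here, which is the difference from a bag-of-words view of the same text.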

In this paper we propose a multi-level framework that is applicable to text analysis and that improves the knowledge discovery process by finding additional or hitherto unknown relationships within the data being mined. We believe that this can lead to the detection or fine-tuning of the context of the documents under consideration and may lead to a more informed classification of those documents. Moreover, since we use a semantic map at varying stages in the framework, we are able to impose a greater degree of focus, and therefore a greater transitivity of semantic relatedness, which facilitates the improvement in the knowledge discovery process.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mooney, C.H., de Vries, D., Roddick, J.F. (2006). A Multi-level Framework for the Analysis of Sequential Data. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science, vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_18

  • DOI: https://doi.org/10.1007/11677437_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32547-5

  • Online ISBN: 978-3-540-32548-2

  • eBook Packages: Computer Science, Computer Science (R0)
