Joint Sequence Complexity: Introduction and Theory

Chapter in: Topic Detection and Classification in Social Networks

Abstract

In this chapter we study joint sequence complexity and introduce its applications to topic detection and text classification, in particular source discrimination. The complexity of a sequence is defined as the number of its distinct factors (substrings), and the Joint Complexity of two sequences is the number of distinct factors they have in common. Sequences that share many common parts therefore have a higher Joint Complexity. Factors are extracted with suffix trees, a simple and fast (low-complexity) data structure for storing factors in memory and retrieving them. Joint Complexity is used to evaluate the similarity between sequences generated by different sources, and we predict its performance on Markov sources. Markov models describe the generation of natural text well, and the performance of Joint Complexity on them can be predicted via linear algebra, combinatorics, and asymptotic analysis, as developed in this chapter. We use datasets from different natural languages, covering both short and long sequences, with promising results in terms of complexity and accuracy. We also performed automated online sequence analysis on information streams from Twitter.
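
To make the two definitions above concrete, here is a minimal Python sketch; it is not the chapter's implementation. It assumes an uncompressed suffix trie (every suffix inserted character by character), so it costs O(n^2) time and space, whereas a compressed suffix tree would be preferred for long sequences. The names TrieNode, build_suffix_trie, complexity and joint_complexity are hypothetical. It counts nonempty factors only; including the empty factor is a convention that would shift each count by one.

```python
"""Sketch: sequence complexity and joint complexity via a naive suffix trie."""

from typing import Dict, List, Tuple


class TrieNode:
    __slots__ = ("children",)

    def __init__(self) -> None:
        # Maps a symbol to the child node reached by appending that symbol.
        self.children: Dict[str, "TrieNode"] = {}


def build_suffix_trie(text: str) -> TrieNode:
    """Insert every suffix of `text` into an uncompressed trie.

    Each non-root node corresponds to exactly one distinct nonempty
    factor (substring) of `text`. Assumption: O(n^2) construction is
    acceptable, i.e. the sequences are short.
    """
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, TrieNode())
    return root


def complexity(text: str) -> int:
    """Number of distinct nonempty factors of `text` (non-root trie nodes)."""
    count = 0
    stack: List[TrieNode] = [build_suffix_trie(text)]
    while stack:
        node = stack.pop()
        count += len(node.children)
        stack.extend(node.children.values())
    return count


def joint_complexity(x: str, y: str) -> int:
    """Number of distinct nonempty factors common to `x` and `y`.

    Walks both suffix tries in parallel: a path exists in both tries
    exactly when the corresponding factor occurs in both sequences.
    """
    count = 0
    stack: List[Tuple[TrieNode, TrieNode]] = [
        (build_suffix_trie(x), build_suffix_trie(y))
    ]
    while stack:
        node_x, node_y = stack.pop()
        for ch, child_x in node_x.children.items():
            child_y = node_y.children.get(ch)
            if child_y is not None:
                count += 1
                stack.append((child_x, child_y))
    return count


if __name__ == "__main__":
    print(complexity("abab"))               # prints 7
    print(joint_complexity("abab", "bab"))  # prints 5
```

The string "abab" has seven distinct nonempty factors (a, b, ab, ba, aba, bab, abab), and all five factors of "bab" also occur in "abab", hence the two printed values. The parallel walk counts shared factors without materialising either factor set, which mirrors why a tree-based representation is attractive here; pairs of texts sharing more substrings yield a larger joint count.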

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Milioris, D. (2018). Joint Sequence Complexity: Introduction and Theory. In: Topic Detection and Classification in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-66414-9_3

  • DOI: https://doi.org/10.1007/978-3-319-66414-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66413-2

  • Online ISBN: 978-3-319-66414-9

  • eBook Packages: Engineering, Engineering (R0)
