Joint Sequence Complexity: Introduction and Theory

Milioris, Dimitrios

doi:10.1007/978-3-319-66414-9_3

Abstract

In this chapter we study joint sequence complexity and introduce its applications to topic detection and text classification, in particular source discrimination. The complexity of a sequence is defined as the number of its distinct factors (contiguous substrings). The Joint Complexity of two sequences is then the number of distinct factors they have in common, so sequences that share many common parts have a higher Joint Complexity. The factors of a sequence are extracted with suffix trees, a simple and fast (low-complexity) structure for storing them in memory and retrieving them. Joint Complexity is used to evaluate the similarity between sequences generated by different sources, and we predict its performance over Markov sources. Markov models describe the generation of natural text well, and their performance can be predicted via linear algebra, combinatorics and asymptotic analysis; this analysis is carried out in this chapter. We use datasets from different natural languages, for both short and long sequences, with promising results on complexity and accuracy. We also perform automated online sequence analysis on information streams in Twitter.
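
The two definitions in the abstract can be illustrated with a minimal sketch. It is a brute-force illustration only, not the chapter's suffix-tree method: it enumerates every substring explicitly, which takes quadratic time and memory in the sequence length. The function names are hypothetical, and counting only non-empty factors is an assumption (some conventions also count the empty factor).

```python
def factors(sequence: str) -> set[str]:
    """Return the set of distinct non-empty factors (contiguous substrings).

    Brute-force enumeration for illustration; a suffix tree would store
    the same information far more compactly.
    """
    n = len(sequence)
    return {sequence[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def complexity(sequence: str) -> int:
    """Complexity of a sequence: the number of its distinct factors."""
    return len(factors(sequence))


def joint_complexity(x: str, y: str) -> int:
    """Joint Complexity: the number of distinct factors common to x and y."""
    return len(factors(x) & factors(y))


if __name__ == "__main__":
    # "apple" and "maple" share the factors a, p, l, e, ap, pl, le, ple.
    print(complexity("apple"))                 # 14
    print(joint_complexity("apple", "maple"))  # 8
    # Sequences with no letters in common share no factors.
    print(joint_complexity("apple", "sky"))    # 0
```

In this toy example the pair with more shared parts gets the higher score, which is the property the abstract relies on when using Joint Complexity as a similarity measure.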

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, USA
Dimitrios Milioris

About this chapter

Cite this chapter

Milioris, D. (2018). Joint Sequence Complexity: Introduction and Theory. In: Topic Detection and Classification in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-66414-9_3

DOI: https://doi.org/10.1007/978-3-319-66414-9_3
Published: 06 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66413-2
Online ISBN: 978-3-319-66414-9
eBook Packages: Engineering, Engineering (R0)
