Finding a Duplicate and a Missing Item in a Stream

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4484)

Abstract

We consider the following problem in a stream model: Given a sequence \(a = \langle a_1, a_2, \ldots, a_m \rangle\) with each \(a_i \in [n] = \{1, \ldots, n\}\) and \(m > n\), find a duplicate in the sequence, i.e., find some \(d = a_i = a_l\) with \(i \neq l\), using \(s\) bits of memory and \(r\) passes over the input sequence. In one pass an algorithm reads the input sequence \(a\) in the order \(a_1, a_2, \ldots, a_m\). Since \(m > n\), a duplicate exists by the pigeonhole principle. Muthukrishnan [Mu05a], [Mu05b] posed the following question for the case \(m = n + 1\): For \(s = O(\log n)\), is there a solution with a constant number of passes? We generalize Muthukrishnan's question by taking the sequence length \(m\) as a parameter. We give a negative answer to the original question by showing the following: Assume that \(m = n + 1\). A streaming algorithm with \(O(\log n)\) space requires \(\Omega(\log n / \log\log n)\) passes; a \(k\)-pass streaming algorithm requires \(\Omega(n^{1/(2k-1)})\) space. We also consider the problem of finding a missing item: Assuming that \(n < m\), find \(x \in [m]\) such that \(x \neq a_j\) for \(1 \le j \le n\). The same lower bound applies to the missing-item finding problem. The proof is a simple reduction to the communication complexity of a relation. We also consider one-pass algorithms and exactly determine the minimum space required. Interesting open questions such as the following remain. For the number of passes of algorithms using \(O(\log n)\) space, show an \(\omega(1)\) lower bound (or an \(O(1)\) upper bound) for: (1) duplicate finding for \(m = 2n\), (2) missing-item finding for \(m = 2n\), and (3) the case where we allow Las Vegas type randomization for \(m = n + 1\).
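To make the pass/space trade-off concrete, the following is a minimal sketch of a folklore multi-pass approach: binary search over the value range [1, n], counting in each pass how many stream items fall in the lower half. It keeps only a few counters and uses about log2(n) passes, which is consistent with (but does not match) the Ω(log n / log log n) pass lower bound stated above. This is an illustration only, not the construction from the paper; the names `find_duplicate` and `stream` are hypothetical, and the stream is modelled as a callable that can be re-read once per pass.

```python
from typing import Callable, Iterable


def find_duplicate(stream: Callable[[], Iterable[int]], n: int) -> int:
    """Return a value that occurs at least twice.

    Assumes every item lies in 1..n and the stream has more than n items.
    Each call to stream() models one pass over the input.
    """
    lo, hi = 1, n
    # Invariant: more than (hi - lo + 1) stream items lie in [lo, hi],
    # so by the pigeonhole principle [lo, hi] contains a duplicate.
    while lo < hi:
        mid = (lo + hi) // 2
        # One pass: count the items that fall in the lower half [lo, mid].
        count = sum(1 for x in stream() if lo <= x <= mid)
        if count > mid - lo + 1:
            hi = mid        # lower half is overfull
        else:
            lo = mid + 1    # then the upper half must be overfull
    return lo


if __name__ == "__main__":
    a = [3, 1, 4, 2, 3]  # m = 5 items over [4], so m = n + 1
    print(find_duplicate(lambda: iter(a), 4))
```

The only state carried between passes is `lo`, `hi`, and one counter, each of which fits in O(log m) bits.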

References

  1. Ajtai, M., Ben-Or, M.: A Theorem on Probabilistic Constant Depth Circuits. In: Proc. of STOC84, pp. 471–474 (1984)

  2. Alon, N., Matias, Y., Szegedy, M.: The Space Complexity of Approximating the Frequency Moments. In: Proc. of STOC96, pp. 20–29 (1996)

  3. Boppana, R.: Threshold Functions and Bounded Depth Monotone Circuits. Journal of Computer and System Sciences 32, 222–229 (1986)

  4. Boppana, R., Sipser, M.: The Complexity of Finite Functions. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. A, MIT Press, Cambridge (1990)

  5. Borodin, A., Cook, S.: A Time-Space Trade-Off for Sorting on a General Sequential Model of Computation. SIAM Journal on Computing 11, 287–297 (1982)

  6. Håstad, J.: Computational Limits for Small-Depth Circuits. MIT Press, Cambridge (1987)

  7. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, Cambridge (1997)

  8. Karchmer, M., Wigderson, A.: Monotone Circuits for Connectivity Require Super-Logarithmic Depth. SIAM Journal on Discrete Mathematics 5(4), 545–557 (1992)

  9. Muthukrishnan, S.: talk given at the Kyoto Workshop on New Horizons in Computing (March 2005)

  10. Muthukrishnan, S.: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science 1(2) (2005), a preliminary version available at http://www.cs.rutgers.edu/~muthu

  11. Razborov, A., Wigderson, A., Yao, A.: Read-Once Branching Programs, Rectangular Proofs of the Pigeon-Hole Principle and the Transversal Calculus. Combinatorica 22(4), 555–574 (2002), also in Proc. of STOC97, pp. 739–748 (1997)

Editor information

Jin-Yi Cai, S. Barry Cooper, Hong Zhu

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tarui, J. (2007). Finding a Duplicate and a Missing Item in a Stream. In: Cai, JY., Cooper, S.B., Zhu, H. (eds) Theory and Applications of Models of Computation. TAMC 2007. Lecture Notes in Computer Science, vol 4484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72504-6_11

  • DOI: https://doi.org/10.1007/978-3-540-72504-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72503-9

  • Online ISBN: 978-3-540-72504-6

  • eBook Packages: Computer Science, Computer Science (R0)
