Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Semijoin

  • Kai-Uwe Sattler
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_706

Synonyms

Bit vector join; Bloom filter join; Bloom join; Hash filter join; Semijoin filter

Definition

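Semijoin is a technique for processing a join between two tables that are stored at different sites. The basic idea is to reduce the transfer cost. Instead of shipping a whole table, the first site sends only the projection of its table on the join column(s) to the other site, where this projection is joined with the second relation. Only the matching tuples of the second relation are then sent back to the first site, which computes the final join result. The strategy relies on the identity R ⋈ S = R ⋈ (S ⋉ R), where S ⋉ R = S ⋈ π_A(R) for join column(s) A. It pays off when the projection plus the reduced relation are smaller than the relation they replace.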
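
The three steps of the definition can be sketched in Python. This is a minimal, single-process sketch under stated assumptions: the two "sites" are simulated by two in-memory lists of rows, "shipping" is just passing a value between functions, and all function names (project, semijoin, hash_join, semijoin_program) are hypothetical, not taken from SDD-1 or any real system.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]  # a tuple, modeled as a column-name -> value mapping

def key_of(row: Row, cols: Tuple[str, ...]) -> tuple:
    """Value(s) of the join column(s) of one row."""
    return tuple(row[c] for c in cols)

def project(rel: List[Row], cols: Tuple[str, ...]) -> Set[tuple]:
    """Duplicate-free projection of rel on the join columns."""
    return {key_of(r, cols) for r in rel}

def semijoin(rel: List[Row], keys: Set[tuple], cols: Tuple[str, ...]) -> List[Row]:
    """Rows of rel whose join values appear in keys (rel semijoin the shipped projection)."""
    return [r for r in rel if key_of(r, cols) in keys]

def hash_join(left: List[Row], right: List[Row], cols: Tuple[str, ...]) -> List[Row]:
    """Plain in-memory hash join, used for the final local step at site 1."""
    index: Dict[tuple, List[Row]] = {}
    for s in right:
        index.setdefault(key_of(s, cols), []).append(s)
    return [{**r, **s} for r in left for s in index.get(key_of(r, cols), [])]

def semijoin_program(r_site1: List[Row], s_site2: List[Row], cols: Tuple[str, ...]):
    """Join R (site 1) with S (site 2) via a semijoin reduction of S."""
    keys = project(r_site1, cols)                 # step 1: ship pi_A(R) to site 2
    s_reduced = semijoin(s_site2, keys, cols)     # step 2: at site 2, S' = S semijoin pi_A(R)
    result = hash_join(r_site1, s_reduced, cols)  # step 3: ship S' back, join with R at site 1
    return result, keys, s_reduced

if __name__ == "__main__":
    R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
    S = [{"id": 1, "b": 10}, {"id": 1, "b": 11}, {"id": 4, "b": 12}, {"id": 5, "b": 13}]
    result, keys, s_reduced = semijoin_program(R, S, ("id",))
    print(len(keys), len(s_reduced))  # 3 2
    print(result)  # [{'id': 1, 'a': 'x', 'b': 10}, {'id': 1, 'a': 'x', 'b': 11}]
```

In this toy run only the three join values of R and two of the four tuples of S would cross the network; whether this beats shipping S whole depends on the relation sizes and on how selective the semijoin is.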

Historical Background

The semijoin technique was originally developed by Bernstein et al. [3] in the SDD-1 project as a reduction operator for distributed query processing. The idea of applying hash filtering was proposed by Babb [1] and by Valduriez [9], in both cases for specialized hardware (content-addressed file stores and distributed database machines, respectively). The theory of semijoin-based distributed query processing was presented in [2]. In [10] semijoins are also exploited for query...
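
Hash filtering replaces the shipped set of join values by a compact bit vector. The sketch below assumes a Bloom filter [4] with m bits and k hash positions derived from BLAKE2b (setting k = 1 gives a plain single-hash bit vector); the parameter values and all function names are illustrative assumptions, not the designs of [1] or [9]. Because hash collisions can let non-matching tuples of S through (false positives, but never false negatives), the final join at site 1 must still compare the actual join values.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]

def bit_positions(value: object, m: int, k: int) -> List[int]:
    """k bit positions in [0, m) for a value (assumption: BLAKE2b over repr; k <= 8)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=32).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def build_filter(values: Iterable[object], m: int = 64, k: int = 3) -> int:
    """Bit vector (held in a Python int) with the positions of every join value set."""
    bits = 0
    for v in values:
        for p in bit_positions(v, m, k):
            bits |= 1 << p
    return bits

def may_contain(bits: int, value: object, m: int = 64, k: int = 3) -> bool:
    """True if all k positions are set: value possibly present (never a false negative)."""
    return all((bits >> p) & 1 for p in bit_positions(value, m, k))

def bloom_join(r_site1: List[Row], s_site2: List[Row], col: str,
               m: int = 64, k: int = 3) -> Tuple[List[Row], List[Row]]:
    # Site 1: build the m-bit filter over R's join column and ship it to site 2.
    bits = build_filter((r[col] for r in r_site1), m, k)
    # Site 2: ship back only the tuples that pass the filter (may include false positives).
    candidates = [s for s in s_site2 if may_contain(bits, s[col], m, k)]
    # Site 1: exact hash join; comparing real values discards any false positives.
    index: Dict[object, List[Row]] = {}
    for r in r_site1:
        index.setdefault(r[col], []).append(r)
    result = [{**r, **s} for s in candidates for r in index.get(s[col], [])]
    return result, candidates
```

The trade-off against the plain semijoin is that the filter costs a fixed m bits regardless of how many join values R has, at the price of shipping some useless tuples of S back; a larger m or a better-chosen k lowers that false-positive rate.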


Recommended Reading

  1. Babb E. Implementing a relational database by means of specialized hardware. ACM Trans Database Syst. 1979;4(1):1–29.
  2. Bernstein PA, Chiu D-MW. Using semi-joins to solve relational queries. J ACM. 1981;28(1):25–50.
  3. Bernstein PA, Goodman N, Wong E, Reeve CL, Rothnie Jr. Query processing in a system for distributed databases (SDD-1). ACM Trans Database Syst. 1981;6(4):602–25.
  4. Bloom BH. Space/time trade-offs in hash coding with allowable errors. Commun ACM. 1970;13(7):422–6.
  5. Hevner AR, Yao SB. Query processing in distributed database systems. IEEE Trans Softw Eng. 1979;5(3):177–82.
  6. Lu H, Carey M. Some experimental results on distributed join algorithms in a local network. In: Proceedings of the 11th International Conference on Very Large Data Bases; 1985. p. 292–304.
  7. Mackert LF, Lohman G. R* optimizer validation and performance evaluation for local queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1986. p. 84–95.
  8. Özsu MT, Valduriez P. Principles of distributed database systems. 2nd ed. Prentice-Hall; 1999.
  9. Valduriez P. Semi-join algorithms for distributed database machines. In: Schneider J-J, editor. Distributed data bases. Amsterdam: North-Holland; 1982. p. 23–37.
  10. Valduriez P, Gardarin G. Join and semi join algorithms for a multiprocessor database machine. ACM Trans Database Syst. 1984;9(1):133–61.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Technische Universität Ilmenau, Ilmenau, Germany

Section editors and affiliations

  • Kian-Lee Tan
  1. Department of Computer Science, National University of Singapore, Singapore