Join算法

Nest loop join

nest loop join, 遍历取外表R中一条记录r, 然后遍历inner表S每条记录和r做join。 对于外表中的每一条记录,都需要对Inner表做一次全表扫描。IO比较高

algorithm nested_loop_join is
    for each tuple r in R do
        for each tuple s in S do
            if r and s satisfy the join condition then
                yield tuple <r,s>

Block nest loop join

Block Nest Loop Join是对NestLoop Join的一个优化

for each block Br of r do begin
  for each block Bs of s do begin
    for each tuple tr in Br do begin
      for each tuple ts in Bs do begin
        test pair (tr, ts) to see if they satisfy the join condition
          if they do, add tr ⋅ ts to the result;
      end
    end
  end
end

Indexed Nested loop join

index join inner表中对于要join的attribute由了索引, 可以使用索引 来避免对inner表的全表扫描, 复杂度为O(M * log N)

for each tuple r in R do
    for each tuple s in S in the index lookup do
        yield tuple <r,s>

Hash join

/* Partition s */
for each tuple ts in s do begin
  i := h(ts[JoinAttrs]);
  Hsi := Hsi ∪ {ts};
end

/* Partition r */
for each tuple tr in r do begin
  i := h(tr[JoinAttrs]);
  Hri := Hri ∪ {tr};
end

/* Perform join on each partition */
for i := 0 to nh do begin
  read Hsi and build an in-memory hash index on it;
  for each tuple tr in Hri do begin
    probe the hash index on Hsi to locate all tuples ts
    such that ts[JoinAttrs] = tr[JoinAttrs];
    for each matching tuple ts in Hsi do begin
      add tr ⋈ ts to the result;
    end
  end
end

Sort MergeJoin

function sortMerge(relation left, relation right, attribute a)
    var relation output
    var list left_sorted := sort(left, a) // Relation left sorted on attribute a
    var list right_sorted := sort(right, a)
    var attribute left_key, right_key
    var set left_subset, right_subset // These sets discarded except where join predicate is satisfied
    advance(left_subset, left_sorted, left_key, a)
    advance(right_subset, right_sorted, right_key, a)
    while not empty(left_subset) and not empty(right_subset)
        if left_key = right_key // Join predicate satisfied
            add cartesian product of left_subset and right_subset to output
            advance(left_subset, left_sorted, left_key, a)
            advance(right_subset, right_sorted,right_key, a)
        else if left_key < right_key
            advance(left_subset, left_sorted, left_key, a)
        else // left_key > right_key
            advance(right_subset, right_sorted, right_key, a)
    return output

// Remove tuples from sorted to subset until the sorted[1].a value changes
function advance(subset out, sorted inout, key out, a in)
    key := sorted[1].a
    subset := emptySet
    while not empty(sorted) and sorted[1].a = key
        insert sorted[1] into subset
        remove sorted[1]