The hash join and tuple stream code did not correctly handle joins
whose right side has very high cardinality but whose tuples have zero
footprint. Any such join with more than 16M tuples on the right side
would crash. When the tuple footprint is zero, an infinite number of
rows fits in one block, but the old way of iterating over the rows of
the stream incremented the row index by 1 to get the next "row",
eventually overflowing the index and hitting a DCHECK.
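
For illustration, a minimal C++ sketch of the overflow follows. This is
not the actual Impala code; the packed row index and its 24-bit
within-block offset are assumptions chosen to match the ~16M threshold.

  #include <cassert>
  #include <cstdint>

  struct RowIdx {
    // Hypothetical packed layout: the within-block offset gets 24 bits,
    // so at most (1 << 24) ~= 16M rows are addressable inside one block.
    static constexpr uint64_t OFFSET_BITS = 24;
    static constexpr uint64_t MAX_OFFSET = (1ULL << OFFSET_BITS) - 1;

    uint64_t block = 0;   // block the row lives in
    uint64_t offset = 0;  // row offset within that block

    void Advance(uint64_t rows_per_block) {
      // Old behaviour: step to the next offset in the current block.
      // With zero-footprint tuples rows_per_block is effectively
      // unbounded, so the offset never wraps to a new block and
      // eventually exceeds what the packed index can hold, tripping
      // the DCHECK-style assert.
      ++offset;
      assert(offset <= MAX_OFFSET && "row offset overflows packed index");
      if (offset >= rows_per_block) {
        ++block;
        offset = 0;
      }
    }
  };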
A second problem was the calculation of the hash table size when the
tuple footprint is zero. In that case a hash table of minimum size
would suffice, but we would instead try to create a very large hash
table to fit the large number of tuples, resulting in OOM errors.
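
A minimal sketch of the intended sizing behaviour is below. The helper
name, the minimum bucket count, and the power-of-two doubling are
illustrative assumptions, not Impala's actual policy.

  #include <cstdint>

  // Hypothetical sizing helper for the hash table build side.
  int64_t HashTableBuckets(int64_t num_tuples, int64_t tuple_footprint) {
    const int64_t MIN_NUM_BUCKETS = 1024;  // assumed minimum size
    if (tuple_footprint == 0) {
      // Per the fix described above: with zero-footprint tuples a
      // minimum-sized table suffices; sizing for num_tuples would
      // allocate a huge table and risk OOM.
      return MIN_NUM_BUCKETS;
    }
    int64_t buckets = MIN_NUM_BUCKETS;
    while (buckets < num_tuples) buckets *= 2;  // next power of two
    return buckets;
  }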
This patch fixes both problems with dedicated calculations of the next
row index in the stream and of the hash table size when the stream
contains tuples with zero footprint.
Change-Id: I12469b9c63581fcbc78c87200de7797eac3428c9
Reviewed-on: http://gerrit.cloudera.org:8080/811
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins