These tests functionally test whether the following type of files
are able to be scanned properly:
1) Add a parquet file with multiple blocks such that each node has to
scan multiple blocks.
2) Add a parquet file with multiple blocks but only one row group
that spans the entire file. Only one scan range should do any work
in this case.
Change-Id: I4faccd9ce3fad42402652c8f17d4e7aa3d593368
Reviewed-on: http://gerrit.cloudera.org:8080/1500
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Impala supports reading Parquet files with multiple row groups but
with possible performance degradation due to remote reads. This patch
maximizes scan locality by allowing multiple impalads to scan the
rowgroups in their local splits. Each impalad starts a new scan range
for each split local to it if that split contains row group(s) that
need to be scanned.
Change-Id: Iaecc5fb8e89364780bc59dbfa9ae51d0d124d16e
Reviewed-on: http://gerrit.cloudera.org:8080/908
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins