mirror of
https://github.com/apache/impala.git
synced 2026-01-04 18:00:57 -05:00
parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins
This directory contains Impala test data sets. The directory layout is structured as follows: datasets/ <data set>/<data set>_schema_template.sql <data set>/<data files SF1>/data files <data set>/<data files SF2>/data files Where SF is the scale factor controlling data size. This allows for scaling the same schema to different sizes based on the target test environment.