The current location resolves to /user/hive/warehouse/chars_formats_*.
Impala's test data actually lives at /test-warehouse/chars_formats_*.
Tested this by reloading data from scratch and running the core tests.
Change-Id: I781b484e7a15ccaa5de590563d68b3dca6a658e5
Reviewed-on: http://gerrit.cloudera.org:8080/11789
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change ensures that the planner computes Parquet conjuncts
only for scans containing Parquet files. It also handles the
PARQUET_DICTIONARY_FILTERING and PARQUET_READ_STATISTICS query
options in the planner.
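A minimal sketch of that gating logic, written as standalone Java with
hypothetical class and field names (this is not Impala's actual planner
code): Parquet conjuncts are built only when at least one scanned
partition is Parquet and the corresponding query option is enabled.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-ins for the planner's partition and query-option state.
    class ParquetConjunctGating {
      enum FileFormat { TEXT, SEQUENCE_FILE, AVRO, PARQUET }

      static class Partition {
        final FileFormat format;
        Partition(FileFormat format) { this.format = format; }
      }

      static class QueryOptions {
        boolean parquetDictionaryFiltering = true;
        boolean parquetReadStatistics = true;
      }

      // Parquet-specific conjuncts are computed only when the scan actually
      // touches Parquet data and the corresponding query options are enabled.
      static List<String> computeParquetConjuncts(
          List<Partition> partitions, QueryOptions options) {
        List<String> conjuncts = new ArrayList<>();
        boolean hasParquet =
            partitions.stream().anyMatch(p -> p.format == FileFormat.PARQUET);
        if (!hasParquet) return conjuncts;  // Skip non-Parquet scans entirely.
        if (options.parquetReadStatistics) {
          conjuncts.add("min/max statistics predicate");    // Placeholder only.
        }
        if (options.parquetDictionaryFiltering) {
          conjuncts.add("dictionary filtering predicate");  // Placeholder only.
        }
        return conjuncts;
      }

      public static void main(String[] args) {
        List<Partition> parts = List.of(
            new Partition(FileFormat.TEXT), new Partition(FileFormat.PARQUET));
        System.out.println(computeParquetConjuncts(parts, new QueryOptions()));
      }
    }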
Testing was carried out independently on parquet and non-parquet
scans:
1. Parquet scans were tested via the existing parquet-filtering
planner test. Additionally, a new test
[parquet-filtering-disabled] was added to ensure that the
generated explain plan skips Parquet predicates based on the
query options.
2. Non-parquet scans were tested manually to ensure that the
functions to compute parquet conjuncts were not invoked.
Additional test cases were added to the parquet-filtering
planner test to scan non-Parquet tables and ensure that the
plans do not contain conjuncts based on parquet statistics.
3. A parquet partition was added to the alltypesmixedformat
table in the functional database. Planner tests were added
to ensure that Parquet conjuncts are constructed only when
the Parquet partition is included in the query.
Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit builds on the previous work of
Pooja Nilangekar: https://gerrit.cloudera.org/#/c/7464/
The commit implements the write path of PARQUET-922:
"Add column indexes to parquet.thrift". As specified in the
parquet-format, Impala writes the page indexes just before
the footer. This allows much more efficient page filtering
than using the same information from the 'statistics' field
of DataPageHeader.
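The resulting file layout can be sketched roughly as follows. This is
only an illustration of the ordering; the index and footer bytes are
treated as opaque, already-serialized blobs, which is not how the real
writer works:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.charset.StandardCharsets;

    // Illustrates only the ordering: row groups, then page indexes, then footer.
    class ParquetLayoutSketch {
      static byte[] writeFile(byte[][] rowGroups, byte[][] columnIndexes,
          byte[][] offsetIndexes, byte[] footer) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] magic = "PAR1".getBytes(StandardCharsets.US_ASCII);
        out.write(magic);                               // Leading magic.
        for (byte[] rg : rowGroups) out.write(rg);      // Data pages per row group.
        for (byte[] ci : columnIndexes) out.write(ci);  // ColumnIndex structs and
        for (byte[] oi : offsetIndexes) out.write(oi);  // OffsetIndex structs go
                                                        // just before the footer.
        out.write(footer);                              // Thrift FileMetaData.
        ByteBuffer len = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        len.putInt(footer.length);
        out.write(len.array());                         // 4-byte footer length.
        out.write(magic);                               // Trailing magic.
        return out.toByteArray();
      }
    }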
I updated Pooja's Python tests as well.
Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Reviewed-on: http://gerrit.cloudera.org:8080/9693
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch contains the following changes:
- Add a metastore_snapshot_file parameter to build.sh
- Enable skipping the metadata load.
- create-load-data.sh is refactored into functions.
- Many scripts source impala-config, which creates a lot of log spew. This has now
been muted.
- Unnecessary log spew from compute-table-stats has been muted.
- build_thirdparty.sh determines its parallelism from the system; it was previously
hard-coded to 4 (see the sketch after this list).
- Only force-load data for a particular dataset if a schema change is detected.
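For the parallelism change above, the idea is simply to derive the
default from the machine instead of a constant. build_thirdparty.sh is a
shell script, so the following Java one-liner only illustrates the
equivalent logic:

    // Derive build parallelism from the machine instead of hard-coding 4.
    class BuildParallelism {
      public static void main(String[] args) {
        int jobs = Runtime.getRuntime().availableProcessors();
        System.out.println("make -j" + jobs);
      }
    }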
Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.
This includes a fix from Nong for a bug that caused wrong results for LIMIT
on non-IO-manager formats.
Moved this out of the data loading framework because it is a special case. I
will consider how we can update the framework to address mixed-format
tables.
This change moves (almost) all the functional data loading to the new data
loading framework. This removes the need for the create.sql, load.sql, and
load-raw-data.sql files. Instead we just have a single schema template file:
testdata/datasets/functional/functional_schema_template.sql
This template can be used to generate the schema for all file formats and
compression variations. It should also make loading data easier. Now you
can run:
bin/load-impala-data.sh "query-test" "exhaustive"
And get all data needed for running the query tests.
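As a loose illustration of what "generate the schema for all file
formats and compression variations" means, the template string and the
format/codec pairs below are hypothetical and do not reflect the real
template syntax:

    import java.util.List;

    // Expands one schema template into per-format CREATE TABLE statements.
    class SchemaTemplateSketch {
      public static void main(String[] args) {
        // Hypothetical template; the real one lives in
        // testdata/datasets/functional/functional_schema_template.sql.
        String template = "CREATE TABLE functional_%s.alltypes (...) STORED AS %s";
        List<String[]> variations = List.of(
            new String[] {"text", "TEXTFILE"},
            new String[] {"seq_snap", "SEQUENCEFILE"},
            new String[] {"parquet", "PARQUET"});
        for (String[] v : variations) {
          System.out.println(String.format(template, v[0], v[1]));
        }
      }
    }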
This change also includes the initial changes for the new dataset/workload directory
structure. The new structure looks like:
testdata/workload <- Will contain query files and test vectors/dimensions
testdata/datasets <- Will contain the data files and schema templates
Note: This is the first part of the change to this directory structure - it's
not yet complete.
At the same time, this patch removes the partitionKeyRegex in favour
of explicitly sending a list of literal expressions for each file path
from the front end.
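A hedged sketch of the difference (names are hypothetical, not the
actual frontend code): rather than the backend extracting partition
values from each path with a regex, the frontend now attaches one
literal per partition key to every file path it sends.

    import java.util.List;

    // Instead of applying a regex such as ".*/year=(\\d+)/month=(\\d+)/.*" to
    // each path, the already-parsed partition-key literals travel with the path.
    class PartitionKeyLiterals {
      // Hypothetical stand-in for a per-path descriptor sent to the backend.
      record FilePath(String path, List<String> partitionKeyLiterals) {}

      public static void main(String[] args) {
        FilePath p = new FilePath(
            "/test-warehouse/alltypes/year=2009/month=1/090101.txt",
            List.of("2009", "1"));
        System.out.println(p.path() + " -> " + p.partitionKeyLiterals());
      }
    }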