Commit Graph

14 Commits

Author SHA1 Message Date
Henry Robinson
7d2c47ad72 IMPALA-258: Make partition key string encoding Hive-compatible 2014-01-08 10:49:54 -08:00
Alex Behm
5db3f2cdf5 IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive. 2014-01-08 10:49:48 -08:00
Lenni Kuff
c74b7e41dd Enable insert tests to run against parquet 2014-01-08 10:49:47 -08:00
Lenni Kuff
5f81becd84 Create tables used by insert tests in a supported insert format 2014-01-08 10:49:00 -08:00
ishaan
5138a720bb IMP-768: Enable the python test framework to check for insert results. 2014-01-08 10:48:22 -08:00
Henry Robinson
222d15c6ca IMPALA-72: String partition keys should be URL encoded 2014-01-08 10:48:20 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
d2e4776731 Support passing snapshot file to buildall, add script to run all tests, remove old tests 2014-01-08 10:47:59 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset, you
know you will be able to run the "core" query tests (specified by
--exploration_strategy when running the tests).

You will see that each combination of table format + query exec options is now
treated as an individual test case (a rough sketch of the idea follows this
entry). This will make it much easier to debug exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
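
To make the test-vector idea concrete, here is a minimal, hypothetical py.test
sketch. The names TABLE_FORMATS, BATCH_SIZES, and run_query are illustrative
stand-ins, not the framework's actual code:

import pytest

# Hypothetical vector dimensions; the real vectors cover table formats
# and query exec options.
TABLE_FORMATS = ["text", "seq", "parquet"]
BATCH_SIZES = [0, 1, 16]

def run_query(sql, table_format, batch_size):
    # Stand-in for the real executor, which would submit the query to an
    # impalad; here we just echo the combination being exercised.
    return "%s [table_format=%s, batch_size=%d]" % (sql, table_format, batch_size)

@pytest.mark.parametrize("batch_size", BATCH_SIZES)
@pytest.mark.parametrize("table_format", TABLE_FORMATS)
def test_count_query(table_format, batch_size):
    # py.test expands this into one test case per (format, option) pair,
    # so a failure report names the exact combination that broke.
    assert run_query("select count(*) from alltypes",
                     table_format, batch_size) is not None

Running py.test on this file yields nine separate test cases, one per
combination, which is what makes per-combination failures easy to pin down.
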
Henry Robinson
e7348a209b IMP-232: Parallel INSERT OVERWRITE 2014-01-08 10:45:04 -08:00
Henry Robinson
c472213eeb Parallel INSERT, sink-per-scan-node plan 2014-01-08 10:44:35 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and beeswax api to submit query and fetch, including
insert query.

review issue: 260
2014-01-08 10:44:30 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up each workload in the workloads directory, then read the associated
query .test files and start executing them.
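
Roughly, the lookup works like the following sketch; the directory layout and
names here are assumptions for illustration, not the script's actual internals:

import glob
import os

# Assumed location of the workloads directory; the real path may differ.
WORKLOAD_DIR = "testdata/workloads"

def find_test_files(workload):
    # Each workload directory is expected to hold the query .test files
    # that will be executed.
    pattern = os.path.join(WORKLOAD_DIR, workload, "queries", "*.test")
    return sorted(glob.glob(pattern))

for workload in ["hive-benchmark", "tpch"]:
    for test_file in find_test_files(workload):
        print("would execute queries from", test_file)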

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the Hive
benchmark queries.

Also added support for generating schemas for different scale factors, as well
as executing against these scale factors. For example, let's say we have a
dataset with a scale factor called "SF3". We would first generate the schema
using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"

This will create tables whose names are distinct from those of the other scale
factors.
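
As a purely hypothetical illustration of how a scale factor might be folded
into table names so that several scale factors can coexist (the suffix scheme
below is assumed, not the script's actual one):

def scaled_table_name(base_name, scale_factor):
    # e.g. ("lineitem", "SF3") -> "lineitem_sf3"; no suffix when no
    # scale factor is given.
    if not scale_factor:
        return base_name
    return "%s_%s" % (base_name, scale_factor.lower())

print(scaled_table_name("lineitem", "SF3"))   # lineitem_sf3
print(scaled_table_name("lineitem", ""))      # lineitem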

Run the generated .sql file to load the data. Alternatively, the data can be
loaded by running a new Python script:

./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]

For example: ./bin/load-data.py -w tpch -e core -s SF3
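
The flag handling would look something like this sketch; the argparse option
names and defaults are assumptions, not taken from the real load-data.py:

import argparse

parser = argparse.ArgumentParser(
    description="Load test data for one or more workloads.")
parser.add_argument("-w", "--workloads", required=True,
                    help="comma-separated list, e.g. tpch,hive-benchmark")
parser.add_argument("-e", "--exploration_strategy", default="core")
parser.add_argument("-s", "--scale_factor", default="")
# Parse a canned command line so the sketch is self-contained.
args = parser.parse_args(["-w", "tpch", "-e", "core", "-s", "SF3"])

for workload in args.workloads.split(","):
    print("loading", workload, "with strategy", args.exploration_strategy,
          "at scale", args.scale_factor or "<default>")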

Then run the benchmark against this scale factor:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00