6 Commits

Author SHA1 Message Date
Alex Behm
4ad15bb2be IMPALA-1524: Materialize all tuples produced by an EmptySetNode.
Change-Id: I3b151ace464c67634104f84f7223c948fed8909e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5406
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c2959485a066b5c0b40e8b0790d526726236d0c9)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5409
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-11-25 23:21:02 -08:00
Matthew Jacobs
652d4b4699 IMPALA-1234: Fix bugs when producing EmptySetNode
Fixes two issues that can occur when generating the plan for a
stmt with an empty result set (e.g. due to limit 0 or constant
predicates that evaluate to false):
 1) Unions with an inline view that produces an empty result set
    do not create the EmptySetNode for the correct stmt.
 2) An EmptySetNode may contain non-materialized tuples which
    will fail a precondition check when generating the thrift
    plan.

Change-Id: I1511c755be3a59fdb8934624fd08250323266d27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4744
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-10-06 19:49:50 -07:00
Alex Behm
3111827ae2 IMPALA-1101: Plan sub-trees with no results are implemented by an EmptySetNode.
Before: Constant conjuncts used to be registered in the analyzer together with
non-constant conjuncts. Since constant conjuncts are not bound by any slot or
tuple they were incorrectly placed into whatever plan node called init() first
and then were incorrectly marked as assigned. For handling queries with a
limit 0 we had special code in the BE.

After: Since constant conjuncts do not fit well into the existing slot/tuple
based assignment logic, this patch treats them specially as follows. Constant
conjuncts that do not originate from the ON clause of an outer join are
evaluated directly. Depending on which clause the conjunct came from, either the entire
query block is marked as returning an empty set (HAVING clause) or the block
is marked as having an empty select-project-join portion (ON and WHERE clause).
In the latter case, aggregations (if any) must still be performed.
The plan sub-trees that are guaranteed to return an empty result set are
implemented by an EmptySetNode. Constant conjuncts from the ON clause of an
outer join are assigned to the node implementing the join.

Similarly, query blocks with a limit 0 are marked as returning an empty result,
and planned as an EmptySetNode.
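
The clause-dependent handling described above can be sketched roughly as
follows. This is only an illustrative Python sketch, not the planner's actual
code: the names Clause, Effect, and plan_constant_conjunct are hypothetical,
and it assumes that only a conjunct evaluating to false empties a plan
sub-tree.

```python
from enum import Enum


class Clause(Enum):
    """Where a constant conjunct came from (hypothetical names)."""
    WHERE = "where"
    INNER_JOIN_ON = "inner_join_on"
    OUTER_JOIN_ON = "outer_join_on"
    HAVING = "having"


class Effect(Enum):
    """What the planner does with the conjunct (hypothetical names)."""
    NO_EFFECT = "no_effect"            # conjunct is true, nothing to do
    EMPTY_BLOCK = "empty_block"        # entire query block returns no rows
    EMPTY_SPJ = "empty_spj"            # select-project-join portion is empty
    ASSIGN_TO_JOIN = "assign_to_join"  # left for the outer join node


def plan_constant_conjunct(clause: Clause, value: bool) -> Effect:
    """Decide how a constant conjunct affects the plan.

    Assumption: `value` is the result of evaluating the conjunct directly.
    """
    # Outer join ON-clause conjuncts are not evaluated up front; they are
    # assigned to the node implementing the join.
    if clause is Clause.OUTER_JOIN_ON:
        return Effect.ASSIGN_TO_JOIN
    if value:
        return Effect.NO_EFFECT
    # A false HAVING conjunct empties the whole query block.
    if clause is Clause.HAVING:
        return Effect.EMPTY_BLOCK
    # A false ON/WHERE conjunct empties only the select-project-join
    # portion; aggregations (if any) must still be performed.
    return Effect.EMPTY_SPJ
```

In this sketch, both EMPTY_BLOCK and EMPTY_SPJ correspond to sub-trees that
would be planned as an EmptySetNode.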

As a side effect, this patch also fixes:
IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0
consistent with Hive's. The target table is left empty after
such an operation.

Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800
2014-08-08 04:35:31 -07:00
ishaan
09d6d931f4 Change the way data is loaded
2014-01-08 10:48:09 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.
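
As a rough illustration of the idea, the sketch below expands two dimensions
into one test case per combination. The dimension values and the function
name generate_test_cases are hypothetical; the real values come from the
workload's test vectors, not from this list.

```python
from itertools import product

# Hypothetical dimensions, for illustration only.
TABLE_FORMATS = ["text", "seq", "rc"]
EXEC_OPTIONS = [{"batch_size": 0}, {"batch_size": 16}]


def generate_test_cases(table_formats, exec_options):
    """Return one (table_format, exec_option) pair per combination."""
    return list(product(table_formats, exec_options))


cases = generate_test_cases(TABLE_FORMATS, EXEC_OPTIONS)
for table_format, exec_option in cases:
    # Each pair is reported as its own test case, so a failure points at
    # one specific table format and one set of exec options.
    print(f"test case: format={table_format} options={exec_option}")
```

With py.test, a list like this could be passed to pytest.mark.parametrize so
that each combination is collected and reported separately.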

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We lookup the workload in the workloads directory, then read the associated
query .test files and start executing them.
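
A minimal sketch of that lookup, assuming each workload is a sub-directory of
the workloads directory with its .test files somewhere beneath it. The
function name find_query_files and the directory layout used in the demo are
hypothetical, not taken from run-benchmark itself.

```python
import tempfile
from pathlib import Path


def find_query_files(workloads_dir, workload):
    """Return the sorted .test files found under one workload directory."""
    workload_dir = Path(workloads_dir) / workload
    if not workload_dir.is_dir():
        raise ValueError(f"Unknown workload: {workload}")
    # Search recursively, since the exact sub-directory layout is assumed.
    return sorted(workload_dir.rglob("*.test"))


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    queries_dir = Path(tmp) / "tpch" / "queries"
    queries_dir.mkdir(parents=True)
    (queries_dir / "tpch-q1.test").write_text("select 1\n")
    (Path(tmp) / "tpch" / "notes.txt").write_text("not a query file\n")
    found = [p.name for p in find_query_files(tmp, "tpch")]
    print(found)
```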

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are distinct from those of the other
scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be
loaded by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00