Commit Graph

12 Commits

Author SHA1 Message Date
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.
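
A hedged sketch of the kind of pruning cases these tests cover; the partition
key name "day" is an assumption for illustration, not taken from hdfs.test:

-- Only partitions that can satisfy the predicate should survive pruning; the
-- NULL partition should be kept only when the predicate can be true for NULL.
SELECT count(*) FROM alltypesagg WHERE day IS NULL;
SELECT count(*) FROM alltypesagg WHERE day = 1 OR day IS NULL;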

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
bf85225911 IMPALA-881: Tests for joins with union inputs.
Change-Id: I4be6821ac3938345ca95c542d868c87512ff66da
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3229
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-06-23 15:38:06 -07:00
Dimitris Tsirogiannis
2d7a8b7c70 IMPALA-964: Full outer join on values() followed by group by hits a
preconditions check

This commit fixes IMPALA-964 where full outer join between two inline
views followed by a group by (e.g. select 1 FROM (VALUES(1 x, 1 y)) a
FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y) GROUP BY a.x;)
hits a preconditions check. The check verifies that numNodes (the number of
nodes used for resource estimation) of a plan fragment is greater than or
equal to zero, and it is triggered when we try to compute the resource
estimates (number of distinct values) of a plan fragment.
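
For reference, the failing query from the description above, reformatted as a
standalone statement (this is the repro from the commit message, nothing new):

SELECT 1
FROM (VALUES(1 x, 1 y)) a
FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y)
GROUP BY a.x;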

The following changes are included in this commit:
1. Modified the getNumDistinctValues function in PlanFragment class to
consider the special case where the numNodes of a plan fragment is -1.
2. Added a test case in QueryTest/joins.test.

Change-Id: I2962ed5079e174d0e76ad990ab84e1fb1a4607ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2466
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2514
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-05-11 19:30:38 -07:00
Alex Behm
dd0409e9d6 IMPALA-509: Minimal type promotion for arithmetic exprs.
Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:30 -08:00
Matthew Jacobs
93368e20b1 Fix CROSS JOIN handling in join order optimization and add tests
Cross joins should be handled like outer joins in the join order
optimization in that the right table referenced by a cross join may not
be reordered anywhere before tables referenced to the left of the cross
join. If there are inner joins to the right of the cross join, those
tables may be reordered before the cross join.

E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A
and B, but D may be reordered to come before C.
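
A hedged SQL sketch of the shape described above, with hypothetical tables
a, b, c, and d and hypothetical join columns:

SELECT count(*)
FROM a
  JOIN b ON (a.id = b.id)
  CROSS JOIN c
  JOIN d ON (c.id = d.id);
-- c must be planned after a and b, but d may be reordered to come before c.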

Also adds test cases for join order optimization and predicate propagation.

Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:29 -08:00
Matthew Jacobs
f327431a8e IMPALA-171: Add CROSS JOIN
Adds a CROSS JOIN (cartesian product). Common join code is moved into
a new abstract base class, BlockingJoinNode. We must keep all build RowBatches in
memory in order to iterate over them for every row from the left child. The
TupleRowList provides a convenient way to iterate over all of the rows.
A future change will address codegen for the CrossJoinNode.
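
A hedged usage sketch of the new operator; t1 and t2 are placeholder tables:

-- Produces the cartesian product: every row of t1 paired with every row of
-- t2, which is why the build (right) side must stay resident in memory.
SELECT t1.id, t2.id
FROM t1 CROSS JOIN t2;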

Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/970
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect
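
A hedged usage sketch of the new statements; the table name is a placeholder:

SHOW TABLE STATS alltypesagg;   -- per-partition row counts and file stats
SHOW COLUMN STATS alltypesagg;  -- per-column stats such as distinct values and nulls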

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Alan Choi
595edaa9d1 Disable all string to numeric and boolean implicit casts 2014-01-08 10:46:24 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00