14 Commits

Author SHA1 Message Date
Alex Behm
0687c54792 IMPALA-2894: Move regression test into a different .test file.
We cannot run certain nested types queries with the legacy joins/aggs,
so to fix a build I just moved a recently added test into a different
.test file that already does not run with legacy joins/aggs.

Change-Id: I0ec0e61535ad01333129bd49beca4aa481f04d74
Reviewed-on: http://gerrit.cloudera.org:8080/1918
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2016-01-27 20:41:45 +00:00
Alex Behm
95951a36e8 IMPALA-2539: Unmark collections slots of empty union operands.
Change-Id: I401f9b9a5e5457120600a7cb5b54f84adb8477f7
Reviewed-on: http://gerrit.cloudera.org:8080/1895
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-01-26 11:32:40 +00:00
Dan Hecht
86d0780b07 Move IMPALA-2259 test case to nested-types-runtime.test
These tests are failing in the old agg/join nightly.  This test
references a nested types table, so we need to skip it when the old
agg/join is enabled.  So, move it to a place where this already happens.

Change-Id: I09400760fd0b7506e4c127bbef92ab413d5d8615
Reviewed-on: http://gerrit.cloudera.org:8080/1143
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-10-05 11:30:46 -07:00
Alex Behm
6866c3045b IMPALA-2430: Mark unused collection-typed slots as non-materialized.
The bug: During planning, when generating an EmptySetNode for a query
block (or a portion thereof) that contained relative table refs we still
populated the corresponding collection-typed slots in the parent scan,
ultimately hitting a sanity DCHECK in the BE.

The fix: Since those collection-typed slots are never used, the
corresponding parent scan should not materialize them. When creating
an EmptySetNode we mark the appropriate collection-typed slots
as non-materialized.

Change-Id: If0b9c37c46c0e27be7f1b47c395c8c90b499e323
Reviewed-on: http://gerrit.cloudera.org:8080/1092
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:43 -07:00
Tim Armstrong
75887730cb IMPALA-2233: avoid loss of precision in function arguments
This patch changes the resolution of overloaded functions so that we
prefer functions where there is no loss of precision in argument types.
Previously, the logic would happily convert DECIMAL to FLOAT even if
there was a more suitable overload available.  E.g. greatest(TINYINT,
DECIMAL) was resolved to greatest(FLOAT...) instead of greatest(DECIMAL).
This only changes behaviour when no overload exactly matches the argument
types, but the arguments can be converted with no loss of precision,
e.g. TINYINT to DECIMAL.

This patch introduces a conceptual distinction between strict and
non-strict compatibility. All contexts aside from function matching
use non-strict to support the current behavior of implicitly casting
decimals to floats/doubles.

This patch also makes resolution of overloaded functions consistent
regardless of what order functions were added to the Db - overloads are
checked in a canonical order.

Switching to this canonical order revealed further problems with
overload resolution where the correct overload was selected only because
of the order in which it was added to the database. For example, the
logic equally preferred resolving fn(STRING, TINYINT) to
fn(TIMESTAMP, INT) or fn(STRING, INT). This required changes to the
compatibility matrix.
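
The resolution order described in this message can be sketched roughly as follows. This is a minimal illustration in Python, not the frontend code touched by the patch: the type names come from the message, but the `LOSSLESS` and `LOSSY` tables, the `compatible` and `resolve` helpers, and the sample overloads are hypothetical and cover only a handful of types.

```python
# Hypothetical, heavily simplified compatibility tables (NOT Impala's real matrix).
# LOSSLESS[a]: types that `a` can be implicitly cast to without losing precision.
LOSSLESS = {
    "TINYINT": {"TINYINT", "INT", "DECIMAL", "FLOAT", "DOUBLE"},
    "INT": {"INT", "DECIMAL", "DOUBLE"},
    "DECIMAL": {"DECIMAL"},
    "FLOAT": {"FLOAT", "DOUBLE"},
    "DOUBLE": {"DOUBLE"},
}
# LOSSY[a]: extra casts allowed only under non-strict compatibility.
LOSSY = {"INT": {"FLOAT"}, "DECIMAL": {"FLOAT", "DOUBLE"}}


def compatible(arg, param, strict):
    """True if an argument of type `arg` may be passed for parameter type `param`."""
    if param in LOSSLESS.get(arg, {arg}):
        return True
    return not strict and param in LOSSY.get(arg, set())


def resolve(arg_types, overloads):
    """Pick an overload: exact match, then strict match, then non-strict match."""
    arg_types = tuple(arg_types)
    if arg_types in overloads:
        return arg_types
    # Check overloads in a canonical (sorted) order so the result does not
    # depend on the order in which they were registered.
    candidates = sorted(overloads)
    for strict in (True, False):
        for params in candidates:
            if len(params) == len(arg_types) and all(
                compatible(a, p, strict) for a, p in zip(arg_types, params)
            ):
                return params
    return None


overloads = [("FLOAT", "FLOAT"), ("DECIMAL", "DECIMAL")]
print(resolve(("TINYINT", "DECIMAL"), overloads))  # ('DECIMAL', 'DECIMAL')
```

In this sketch the strict pass runs over every candidate before any lossy cast is considered, so a lossless overload wins even when a lossy one sorts earlier in the canonical order.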

Various cleanup and simplification of the type compatibility logic is
also included.

Change-Id: I50e657c78cdcb925b616b5b088b801510020e255
Reviewed-on: http://gerrit.cloudera.org:8080/845
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:40 -07:00
Tim Armstrong
d37bf390a8 IMPALA-2406: avoid rows with no tuples
In some cases the planner generated plans with rows with no
materialized tuples. Recent changes to the backend caused these to
hit a DCHECK. This patch addresses one case in the planner where it
was possible to create such plans: when the planner generated an
empty node from a select subquery with no from clause. The fix is to
create a materialized tuple based on the select list expressions, in
the same way as we handle these selects when the planner cannot
statically determine they have no result rows.

An example query is included as a test.

It also adds additional checks to the frontend and backend to catch
these invalid rows earlier.

Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4
Reviewed-on: http://gerrit.cloudera.org:8080/911
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:25 -07:00
Alex Behm
3ac341287c IMPALA-2088: Fix planning of empty union operands with analytics.
The check for ignoring empty union operands was simply misplaced.
This misplacement resulted in empty union operands not being
dropped if the containing UnionStmt had analytic functions.

Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3
Reviewed-on: http://gerrit.cloudera.org:8080/554
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-27 15:46:41 -07:00
Dimitris Tsirogiannis
dd5ecb9deb IMPALA-1960: Illegal reference to non-materialized tuple when query has
an empty select-project-join block

This commit fixes an issue where an aggregation expr may reference a
non-materialized slot if the query contains an empty select-project-join
block. This fix ensures that all the exprs in an aggregation reference
materialized slots/tuples.

Change-Id: Ic2cc9818061b3f06ab1d1cebf4e604352c2df6d1
Reviewed-on: http://gerrit.cloudera.org:8080/348
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-21 23:29:14 +00:00
Alex Behm
4ad15bb2be IMPALA-1524: Materialize all tuples produced by an EmptySetNode.
Change-Id: I3b151ace464c67634104f84f7223c948fed8909e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5406
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c2959485a066b5c0b40e8b0790d526726236d0c9)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5409
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-11-25 23:21:02 -08:00
Matthew Jacobs
652d4b4699 IMPALA-1234: Fix bugs when producing EmptySetNode
Fixes two issues that can occur when generating the plan for a
stmt with an empty result set (e.g. due to limit 0 or constant
predicates that evaluate to false):
 1) A union with an inline view that produces an empty result set
    does not create the EmptySetNode for the correct stmt.
 2) An EmptySetNode may contain non-materialized tuples which
    will fail a precondition check when generating the thrift
    plan.

Change-Id: I1511c755be3a59fdb8934624fd08250323266d27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4744
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-10-06 19:49:50 -07:00
Alex Behm
3111827ae2 IMPALA-1101: Plan sub-trees with no results are implemented by an EmptySetNode.
Before: Constant conjuncts used to be registered in the analyzer together with
non-constant conjuncts. Since constant conjuncts are not bound by any slot or
tuple they were incorrectly placed into whatever plan node called init() first
and then were incorrectly marked as assigned. For handling queries with a
limit 0 we had special code in the BE.

After: Since constant conjuncts do not fit well into the existing slot/tuple
based assignment logic this patch treats them specially as follows. Constant
conjuncts that do not originate from the ON clause of an outer join are evaluated
directly. Depending on which clause the conjunct came from either the entire
query block is marked as returning an empty set (HAVING clause) or the block
is marked as having an empty select-project-join portion (ON and WHERE clause).
In the latter case, aggregations (if any) must still be performed.
The plan sub-trees that are guaranteed to return an empty result set are
implemented by an EmptySetNode. Constant conjuncts from the ON clause of an
outer join are assigned to the node implementing the join.

Similarly, query blocks with a limit 0 are marked as returning an empty result,
and planned as an EmptySetNode.
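
The clause-based handling described in this message can be summarized as a small decision function. This is only a sketch in Python under stated assumptions, not the planner code: the `Clause` enum, `plan_effect`, and the returned labels are hypothetical names, and treating a constant conjunct that evaluates to TRUE as having no effect is an assumption the message does not spell out.

```python
from enum import Enum


class Clause(Enum):
    WHERE = "WHERE"
    ON_INNER = "ON (inner join)"
    ON_OUTER = "ON (outer join)"
    HAVING = "HAVING"


def plan_effect(clause, value, limit=None):
    """Classify how a constant conjunct (already evaluated to `value`) affects a query block."""
    if limit == 0:
        # A limit 0 block returns nothing: planned as an EmptySetNode.
        return "EMPTY_BLOCK"
    if clause is Clause.ON_OUTER:
        # Not evaluated up front; assigned to the node implementing the join.
        return "ASSIGN_TO_JOIN"
    if value:
        # Assumption: a conjunct that is constant TRUE filters nothing.
        return "NO_EFFECT"
    if clause is Clause.HAVING:
        # The entire query block returns an empty set.
        return "EMPTY_BLOCK"
    # ON (inner) or WHERE: only the select-project-join portion is empty;
    # aggregations, if any, must still be performed on top of it.
    return "EMPTY_SPJ"


print(plan_effect(Clause.WHERE, False))   # EMPTY_SPJ
print(plan_effect(Clause.HAVING, False))  # EMPTY_BLOCK
```

The distinction between the two empty cases matters because an aggregation without grouping over an empty select-project-join portion still produces a row, whereas an empty block produces none.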

As a side effect, this patch also fixes:
IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0
consistent with Hive's. The target table is left empty after
such an operation.

Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800
2014-08-08 04:35:31 -07:00
ishaan
09d6d931f4 Change the way data is loaded
2014-01-08 10:48:09 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.
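
To illustrate how each combination of table format and query exec options becomes its own test case, here is a minimal sketch in Python. It is not the framework's actual code: `generate_test_vectors`, the format names, and the option names and values are hypothetical placeholders chosen for the example.

```python
from itertools import product


def generate_test_vectors(table_formats, exec_options):
    """Return one test case per (table format, exec-option combination)."""
    names = sorted(exec_options)
    vectors = []
    for fmt in table_formats:
        # Cartesian product over the value lists of all exec options.
        for values in product(*(exec_options[n] for n in names)):
            opts = dict(zip(names, values))
            case_id = fmt + "|" + ",".join(f"{k}={v}" for k, v in opts.items())
            vectors.append({"id": case_id, "table_format": fmt, "exec_options": opts})
    return vectors


# Hypothetical dimensions; a smaller exploration strategy would pass fewer values.
vectors = generate_test_vectors(
    ["text", "seq"],
    {"batch_size": [0, 16], "num_nodes": [1]},
)
for v in vectors:
    print(v["id"])
```

Because every vector carries a readable id, a failure report can name the exact format and option combination that broke. With py.test, a list like this could be fed to `pytest.mark.parametrize`, though whether the framework does so is not stated here.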

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables whose names are distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00