impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 12:02:10 -05:00

Author	SHA1	Message	Date
Alex Behm	0687c54792	IMPALA-2894: Move regression test into a different .test file. We cannot run certain nested types queries with the legacy joins/aggs, so to fix a build I just moved a recently added test into a different .test file that already does not run with legacy joins/aggs. Change-Id: I0ec0e61535ad01333129bd49beca4aa481f04d74 Reviewed-on: http://gerrit.cloudera.org:8080/1918 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2016-01-27 20:41:45 +00:00
Alex Behm	95951a36e8	IMPALA-2539: Unmark collections slots of empty union operands. Change-Id: I401f9b9a5e5457120600a7cb5b54f84adb8477f7 Reviewed-on: http://gerrit.cloudera.org:8080/1895 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-01-26 11:32:40 +00:00
Dan Hecht	86d0780b07	Move IMPALA-2259 test case to nested-types-runtime.test These tests are failing in the old agg/join nightly. This test references a nested types table, so we need to skip it when the old agg/join is enabled. So, move it to a place where this already happens. Change-Id: I09400760fd0b7506e4c127bbef92ab413d5d8615 Reviewed-on: http://gerrit.cloudera.org:8080/1143 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-10-05 11:30:46 -07:00
Alex Behm	6866c3045b	IMPALA-2430: Mark unused collection-typed slots as non-materialized. The bug: During planning, when generating an EmptySetNode for a query block (or a portion thereof) that contained relative table refs we still populated the corresponding collection-typed slots in the parent scan, ultimately hitting a sanity DCHECK in the BE. The fix: Since those collection-typed slots are never used, the corresponding parent scan should not materialize them. When creating an EmptySetNode we mark the appropriate collection-typed slots as non-materialized. Change-Id: If0b9c37c46c0e27be7f1b47c395c8c90b499e323 Reviewed-on: http://gerrit.cloudera.org:8080/1092 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:43 -07:00
Tim Armstrong	75887730cb	IMPALA-2233: avoid loss of precision in function arguments This patch changes the resolution of overloaded functions so that we prefer functions where there is no loss of precision in argument types. Previously, the logic would happily convert DECIMAL to FLOAT even if there was a more suitable overload available. E.g. greatest(TINYINT, DECIMAL) was resolved to greatest(FLOAT...) instead of greatest(DECIMAL). This only changes behaviour when no overload exactly matches the argument types, but the arguments can be converted with no loss of precision, e.g. TINYINT to DECIMAL. This patch introduces a conceptual distinction between strict and non-strict compatibility. All contexts aside from function matching use non-strict to support the current behavior of implicitly casting decimals to floats/doubles. This patch also makes resolution of overloaded functions consistent regardless of what order functions were added to the Db - overloads are checked in a canonical order. Switching to this canonical order revealed further problems with overload resolution where the correct overload was selected only because of the order in which it was added to the database. For example, the logic equally preferred resolving fn(STRING, TINYINT) to fn(TIMESTAMP, INT) or fn(STRING, INT). This required changes to the compatibility matrix. Various cleanup and simplification of the type compatibility logic is also included. Change-Id: I50e657c78cdcb925b616b5b088b801510020e255 Reviewed-on: http://gerrit.cloudera.org:8080/845 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-10-01 13:58:40 -07:00
Tim Armstrong	d37bf390a8	IMPALA-2406: avoid rows with no tuples In some cases the planner generated plans with rows with no materialized tuples. Recent changes to the backend caused these to hit a DCHECK. This patch addresses one case in the planner where it was possible to create such plans: when the planner generated an empty node from a select subquery with no from clause. The fix is to create a materialized tuple based on the select list expressions, in the same way as we handle these selects when the planner cannot statically determine they have no result rows. An example query is included as a test. It also adds additional checks to the frontend and backend to catch these invalid rows earlier. Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4 Reviewed-on: http://gerrit.cloudera.org:8080/911 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-09-27 15:13:25 -07:00
Alex Behm	3ac341287c	IMPALA-2088: Fix planning of empty union operands with analytics. The check for ignoring empty union operands was simply misplaced. This misplacement resulted in empty union operands not being dropped if the containing UnionStmt had analytic functions. Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3 Reviewed-on: http://gerrit.cloudera.org:8080/554 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 15:46:41 -07:00
Dimitris Tsirogiannis	dd5ecb9deb	IMPALA-1960: Illegal reference to non-materialized tuple when query has an empty select-project-join block This commit fixes an issue where an aggregation expr may reference a non-materialized slot if the query contains an empty select-project-join block. This fix ensures that all the exprs in an aggregation reference materialized slots/tuples. Change-Id: Ic2cc9818061b3f06ab1d1cebf4e604352c2df6d1 Reviewed-on: http://gerrit.cloudera.org:8080/348 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-21 23:29:14 +00:00
Alex Behm	4ad15bb2be	IMPALA-1524: Materialize all tuples produced by an EmptySetNode. Change-Id: I3b151ace464c67634104f84f7223c948fed8909e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5406 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins (cherry picked from commit c2959485a066b5c0b40e8b0790d526726236d0c9) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5409 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-11-25 23:21:02 -08:00
Matthew Jacobs	652d4b4699	IMPALA-1234: Fix bugs when producing EmptySetNode Fixes two issues that can occur when generating the plan for a stmt with an empty result set (e.g. due to limit 0 or constant predicates that evaluate to false): 1) Unions with an inline view that produces an empty result set do not create the EmptySetNode for the correct stmt. 2) An EmptySetNode may contain non-materialized tuples which will fail a precondition check when generating the thrift plan. Change-Id: I1511c755be3a59fdb8934624fd08250323266d27 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4744 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-10-06 19:49:50 -07:00
Alex Behm	3111827ae2	IMPALA-1101: Plan sub-trees with no results are implemented by an EmptySetNode. Before: Constant conjuncts used to be registered in the analyzer together with non-constant conjuncts. Since constant conjuncts are not bound by any slot or tuple they were incorrectly placed into whatever plan node called init() first and then were incorrectly marked as assigned. For handling queries with a limit 0 we had special code in the BE. After: Since constant conjuncts do not fit well into the existing slot/tuple based assignment logic this patch treats them specially as follows. Constant conjuncts that do not originate from the ON clause of an outer join are evaluated directly. Depending on which clause the conjunct came from either the entire query block is marked as returning an empty set (HAVING clause) or the block is marked as having an empty select-project-join portion (ON and WHERE clause). In the latter case, aggregations (if any) must still be performed. The plan sub-trees that are guaranteed to return an empty result set are implemented by an EmptySetNode. Constant conjuncts from the ON clause of an outer join are assigned to the node implementing the join. Similarly, query blocks with a limit 0 are marked as returning an empty result, and planned as an EmptySetNode. As a side effect, this patch also fixes: IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0 consistent with Hive's. The target table is left empty after such an operation. Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800	2014-08-08 04:35:31 -07:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with names that are unique from those of the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

14 Commits