impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 15:00:52 -05:00

Author	SHA1	Message	Date
Alex Behm	4ad15bb2be	IMPALA-1524: Materialize all tuples produced by an EmptySetNode. Change-Id: I3b151ace464c67634104f84f7223c948fed8909e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5406 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins (cherry picked from commit c2959485a066b5c0b40e8b0790d526726236d0c9) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5409 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-11-25 23:21:02 -08:00
Matthew Jacobs	652d4b4699	IMPALA-1234: Fix bugs when producing EmptySetNode Fixes two issues that can occur when generating the plan for a stmt with an empty result set (e.g. due to limit 0 or constant predicates that evaluate to false): 1) Unions with an inline view that produces an empty result set do not create the EmptySetNode for the correct stmt. 2) An EmptySetNode may contain non-materialized tuples which will fail a precondition check when generating the thrift plan. Change-Id: I1511c755be3a59fdb8934624fd08250323266d27 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4744 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-10-06 19:49:50 -07:00
Alex Behm	3111827ae2	IMPALA-1101: Plan sub-trees with no results are implemented by an EmptySetNode. Before: Constant conjuncts used to be registered in the analyzer together with non-constant conjuncts. Since constant conjuncts are not bound by any slot or tuple they were incorrectly placed into whatever plan node called init() first and then were incorrectly marked as assigned. For handling queries with a limit 0 we had special code in the BE. After: Since constant conjuncts do not fit well into the existing slot/tuple based assignment logic this patch treats them specially as follows. Constants that do not originate from the ON clause of an outer join are evaluated directly. Depending on which clause the conjunct came from either the entire query block is marked as returning an empty set (HAVING clause) or the block is marked as having an empty select-project-join portion (ON and WHERE clause). In the latter case, aggregations (if any) must still be performed. The plan sub-trees that are guaranteed to return an empty result set are implemented by an EmptySetNode. Constant conjuncts from the ON clause of an outer join are assigned to the node implementing the join. Similarly, query blocks with a limit 0 are marked as returning an empty result, and planned as an EmptySetNode. As a side effect, this patch also fixes: IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0 consistent with Hive's. The target table is left empty after such an operation. Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800	2014-08-08 04:35:31 -07:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We lookup the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF3". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00
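
The constant-conjunct rules described in the IMPALA-1101 commit message above can be illustrated with a small sketch. This is not Impala's planner code: the names `Clause`, `QueryBlock`, `register_constant_conjunct`, and `apply_limit` are hypothetical, and dropping a constant that evaluates to true is an assumption the commit message does not state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Clause(Enum):
    """Where a constant conjunct came from (hypothetical names)."""
    WHERE = "where"
    ON = "on"                    # ON clause of a non-outer join
    OUTER_JOIN_ON = "outer_on"   # ON clause of an outer join
    HAVING = "having"


@dataclass
class QueryBlock:
    """Hypothetical stand-in for the planner's per-query-block state."""
    empty_result: bool = False   # whole block would be planned as an EmptySetNode
    empty_spj: bool = False      # only the select-project-join portion is empty
    join_conjuncts: list = field(default_factory=list)


def register_constant_conjunct(block: QueryBlock, clause: Clause, value: bool) -> None:
    """Apply the rules from the commit message to one constant conjunct."""
    if clause is Clause.OUTER_JOIN_ON:
        # Not evaluated up front: assigned to the node implementing the join.
        block.join_conjuncts.append(value)
        return
    if value:
        # Assumption (not in the commit message): a true constant filters nothing.
        return
    if clause is Clause.HAVING:
        # A false HAVING constant empties the entire query block.
        block.empty_result = True
    else:
        # A false ON/WHERE constant empties only the select-project-join
        # portion; aggregations (if any) must still be performed.
        block.empty_spj = True


def apply_limit(block: QueryBlock, limit: int) -> None:
    """A block with limit 0 is marked as returning an empty result."""
    if limit == 0:
        block.empty_result = True
```

Keeping `empty_result` and `empty_spj` as separate flags mirrors the distinction the commit message draws: a false WHERE constant still has to feed an aggregation, whereas a false HAVING constant or a limit 0 removes the block's output entirely.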

6 Commits