impala

mirror of https://github.com/apache/impala.git synced 2025-12-31 15:00:10 -05:00

Author	SHA1	Message	Date
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Srinath Shankar	74a975c45b	IMPALA-862: count(x) may return null when a similar count(distinct x) is also used count(x) with no distinct and no group-by expressions returns NULL on empty input if other distinct aggs (e.g. COUNT(distinct x)) are present. This happens because the COUNT is transformed to SUM(COUNT()), with the inner COUNT being evaluated WITH a group-by expression (e.g. x). SUM over empty input returns NULL, but COUNT should return 0. This patch fixes this by replacing COUNT with zeroifnull(COUNT) before AggregateInfo is generated if there are distinct aggs and no group-bys. The logic in AggregateInfo itself has not been modified. Change-Id: I902e3fdd95767135b2f3fe423e8802ef57366af1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1921 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins	2014-03-14 23:35:55 -07:00
Alex Behm	f7c2781afe	IMPALA-845: Transfer predicates to 2nd phase merge agg in some cases. Having predicates need to be transferred to the 2nd phase merge agg for distinct + non-distinct aggregates without group by. For distinct + non-distinct aggregates with group by, it is correct to evaluate the predicates at the 2nd phase (non-merge) agg. Change-Id: I71d73c4ef92becbb81e142bc0cb5f54e790b1fb5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1743 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1817	2014-03-07 21:45:16 -08:00
Alex Behm	3bba336bbf	IMPALA-359: Return proper tuple id of inline view with distinct aggregation.	2014-01-08 10:51:26 -08:00
Alex Behm	c9040aee22	IMPALA-111: COUNT(DISTINCT col) returns wrong results -- does not ignore NULLs.	2014-01-08 10:50:09 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
Lenni Kuff	5f81becd84	Create tables used by insert tests in a supported insert format	2014-01-08 10:49:00 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Nong Li	b4dc3eeb35	Fix IMP-575	2014-01-08 10:46:45 -08:00
Marcel Kornacker	ea050a43ad	Switching over backend runtime structures to new planner. Added container-util.h	2014-01-08 10:46:20 -08:00
Marcel Kornacker	5984c0be52	First cut of partitioned plan generation: - created new class PlanFragment, which encapsulates everything having to do with a single plan fragment, including its partition, output exprs, destination node, etc. - created new class DataPartition - explicit classes for fragment and plan node ids, to avoid getting them mixed up, which is easy to do with ints - Adding IdGenerator class. - moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for PlanFragment.getExplainString() - Changed planner interface to return scan ranges with a complete list of server locations, instead of making a server assignment. Also included: cleaned up AggregateInfo: - the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation. - moved analysis functionality into AggregateInfo Removing broken test cases from workload functional-planner (they're being handled correctly in functional-newplanner).	2014-01-08 10:45:56 -08:00
Alan Choi	dd1537d116	IMP-132: collect unique agg expr	2014-01-08 10:44:39 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with names distinct from those of the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

14 Commits
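
The aggregate semantics described in the IMPALA-862 and IMPALA-111 commit messages can be reproduced outside Impala. The following is a minimal sketch, not Impala code: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in engine, uses `COALESCE(..., 0)` in place of Impala's `zeroifnull`, and the table `t`, column `x`, and both function names are hypothetical. It only illustrates why rewriting COUNT as SUM(COUNT()) yields NULL on empty input, and that COUNT(DISTINCT col) skips NULLs.

```python
import sqlite3


def count_on_empty_input():
    """Return (plain COUNT, two-phase SUM(COUNT), patched form) on an empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table, left empty

    # A plain COUNT over empty input is 0.
    plain = conn.execute("SELECT COUNT(x) FROM t").fetchone()[0]

    # Two-phase shape from the commit message: the inner COUNT is grouped by x,
    # so it produces no rows on empty input, and the outer SUM returns NULL.
    two_phase_sql = "SELECT SUM(c) FROM (SELECT COUNT(x) AS c FROM t GROUP BY x)"
    two_phase = conn.execute(two_phase_sql).fetchone()[0]

    # Stand-in for zeroifnull: map the NULL back to 0.
    patched_sql = (
        "SELECT COALESCE(SUM(c), 0) "
        "FROM (SELECT COUNT(x) AS c FROM t GROUP BY x)"
    )
    patched = conn.execute(patched_sql).fetchone()[0]

    conn.close()
    return plain, two_phase, patched


def count_distinct_ignores_nulls():
    """Return COUNT(DISTINCT x) for the values 1, 1, NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (1,), (None,)])
    result = conn.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()[0]
    conn.close()
    return result


if __name__ == "__main__":
    print(count_on_empty_input())
    print(count_distinct_ignores_nulls())
```

In this sketch the middle value of the first result is Python's `None`, which is how `sqlite3` surfaces SQL NULL. That is the symptom the IMPALA-862 message describes, and the third value shows the effect of wrapping the aggregate as the patch does.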