impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 12:02:54 -05:00

Author	SHA1	Message	Date
Alex Behm	2325c8c923	Added [shuffle]/[noshuffle] plan hints for forcing/preventing repartitioning before an insert. Change-Id: I0647366815f4488cabbcb1fc7bc3cf851960c44e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1007 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:16 -08:00
Matthew Jacobs	8a55982105	Add OFFSET to skip rows returned with a LIMIT Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive does not support OFFSET so creating a view with an OFFSET will not work in Hive. For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip 5 rows, then return the next 20. OFFSET requires an ORDER BY clause. Note this is not very efficient as we must actually keep (limit+offset) rows in memory in the topn-node, and all child sort nodes must as well. Users should be careful when using this feature. Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d Reviewed-on: http://gerrit.ent.cloudera.com:8080/900 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:02 -08:00
Alex Behm	3f54240fed	PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode. Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c Reviewed-on: http://gerrit.ent.cloudera.com:8080/678 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:23 -08:00
Alex Behm	4bb8b38cde	Added stats and cost estimates to explain output. Change-Id: I1273745a439fd25cefa4e08ecc075c98cc8bfc45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/602 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:22 -08:00
Nong Li	15db34e356	AggregationNode refactoring This patch redoes how the aggregation node is implemented. The functionality is now split between aggregation-node, agg-expr and aggregate-functions. This is a work in progress (there's still a lot of debug stuff I added that needs to be cleaned up) but it does pass the tests. Aggregation-node is now very simple and now only deals with the grouping part. Aggregate-expr serves as the glue between the agg node and the aggregate functions. The aggregation functions are implemented with the UDA interface. I've reimplemented our existing aggregate functions with this setup. For true UDAs, the binaries would be loaded in aggregate-expr. This also includes some preliminary changes in the FE. We now need to annotate each AggNode as executing the update vs. merge phase (root aggs execute update, others execute merge) and if it needs a finalize step (only the root does). This is more general than our builtins which are too simple to need this structure. There is a big TODO here to allow the intermediate types between agg nodes to change. For example, in distinct estimate, the input type is the column type and the output type is a bigint. We'd like the intermediate type to be CHAR(256). This is different since currently, the intermediate type and output type have always been the same. We've hacked around this by having both the intermediate and output type be TYPE_STRING. I've left this for another patch (changing the BE to support this is trivial). For aggregates that result in strings, we used to store some additional stuff past the end of the tuple. The layout was: <tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc The rationale for this is that we want to reuse the buffer for min/max and grow the buffer more quickly for group_concat. This breaks down the abstraction between agg-expr and agg-node and is not something UDAs can use in general.
Rather than try to hack around this, I think the proper solution is for the intermediate type not to be StringValue and to contain the buffer length itself. This patch also resurrects the distinct estimate code. The distinct estimate functions exercise all of the code paths. Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346 Reviewed-on: http://gerrit.ent.cloudera.com:8080/564 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:13 -08:00
Alex Behm	9065648d77	Improvements to cost estimation and explain output. Fixed cost estimation of union queries and exchange nodes. Fixed propagation of stats through cloning of exprs and plan nodes. Fixed propagation of expr stats to slots they are materialized into (e.g., grouping columns in multi-level aggs). Improved explain output for constant selects. Change-Id: I96d1652c00d48e4093b85ae7fc8bad28d74b8b81 Reviewed-on: http://gerrit.ent.cloudera.com:8080/547 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:08 -08:00
Alex Behm	39f9a067fa	IMPALA-444: Fixed accuracy of string to double conversion. Falling back to strtod for scientific notation. Change-Id: I9a5d948620907d34601ef041e58b1c9bb2172f71 Reviewed-on: http://gerrit.ent.cloudera.com:8080/507 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:56 -08:00
Nong Li	2b9105cd11	IMPALA-487: don't compact data from rhs of join if it is going through an exchange node. Change-Id: I442445e7370218352cd6d3137f2a454c9afb73ba Reviewed-on: http://gerrit.ent.cloudera.com:8080/476 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:50 -08:00
Alex Behm	77c0e54bb9	Set HDFS block size to 128MB because HDFS versions since 2.1.0-beta use 128MB as a default (HDFS-4053). Change-Id: If112d2eab242b44f05f64ee071ebea5b253c7927 Reviewed-on: http://gerrit.ent.cloudera.com:8080/470 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:48 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Alex Behm	e52ed0800b	IMPALA-524: Fix computation of stats for ExchangeNode and merge AggregationNodes. The issue caused unnecessary repartitioning for static partition insert queries having grouped aggregation in the feeding query stmt. Change-Id: I5f4017e2c4d5a1bf88f51c4e0ff7ab28911e14f1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/202 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:11 -08:00
Alex Behm	48ee7ce891	IMPALA-508: Fix join-cardinality estimation and choice of join strategy when a join involves a table lacking table stats. Change-Id: I871273e1d9f048377ce638c201118fc21086db9a Reviewed-on: http://gerrit.ent.cloudera.com:8080/152 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:05 -08:00
Alex Behm	f0e2d539fc	IMPALA-495: Views Sometimes Not Utilizing Partition Pruning. Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Marcel Kornacker	d85b90cb22	SlotDescriptor.label plus repartitioning for inserts when column stats are missing.	2014-01-08 10:51:56 -08:00
Marcel Kornacker	c8afd16bbb	IMPALA-85: planner fails with "Join requires at least one equality predicate between the two tables" when "from" table order does not match "where" join order This fix contains two parts: - functionality inside the analyzer to compute a value transfer graph (from equality predicates between slotrefs) and from that equivalence classes for all slots; this functionality is required for this fix but will be generally useful when adding propagation of binding predicates in the future - a "shortest path" implementation inside the planner of a fix for the problem at hand; this leaves a lot to be desired: * correct handling of assigned predicates: the added test case shows that the planner will try to assign all predicates to some node in the tree, even if that predicate is superfluous because it was subsumed by an equality derived from equivalence class membership * complete lack of propagation of binding predicates (e.g., propagate "col1 = 5" to all slotrefs that are in the same equivalence class as col1) This is beyond what can be accomplished for 1.1 and therefore will have to wait for 1.2.	2014-01-08 10:51:51 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
Alex Behm	3bba336bbf	IMPALA-359: Return proper tuple id of inline view with distinct aggregation.	2014-01-08 10:51:26 -08:00
Alex Behm	ece9f76a0b	IMP-967: Recognize predicates referring to more than two tuples as eq conjuncts.	2014-01-08 10:51:13 -08:00
Alex Behm	045038e479	IMPALA-374: Added WITH clause without recursion.	2014-01-08 10:51:00 -08:00
Alan Choi	2bdba77f61	Perform HBase deterministic region assignment and enable HBase scan range location test in the planner test	2014-01-08 10:50:54 -08:00
Lenni Kuff	2e19107496	Fixed TPCH planner test due to column stat changes in CDH4.3.0 hive	2014-01-08 10:50:46 -08:00
Alex Behm	937a44f9f8	IMPALA-68: Support Values() statement.	2014-01-08 10:50:31 -08:00
Alex Behm	c7819f4db7	IMPALA-87: Support INSERT from SELECT without FROM.	2014-01-08 10:50:30 -08:00
Nong Li	4235bf5009	Fix planner test result.	2014-01-08 10:50:11 -08:00
Alan Choi	2d25f11ec3	IMPALA-91 new explain plan output	2014-01-08 10:50:10 -08:00
Alex Behm	c9040aee22	IMPALA-111: COUNT(DISTINCT col) returns wrong results -- does not ignore NULLs.	2014-01-08 10:50:09 -08:00
Marcel Kornacker	21ec49e810	IMPALA-150: Performing dynamic partition insert via Impala on "large" table fails and takes down HDFS This is solved by repartitioning the input to the hdfs table sinks on the partition key columns of the hdfs table, so that each partition is only written by a single node.	2014-01-08 10:50:07 -08:00
Skye Wanderman-Milne	0c343913fa	IMPALA-266: Round() does not output the right precision	2014-01-08 10:50:02 -08:00
Marcel Kornacker	5bfc477ccc	IMPALA-291: Plans should explicitly mention the join strategy	2014-01-08 10:49:59 -08:00
Alex Behm	132513f98c	IMPALA-75: 'at least one equality predicate' error message needs improvement	2014-01-08 10:49:58 -08:00
Alex Behm	21685d4f8f	Fixed a failed Preconditions check if a join predicate has constants.	2014-01-08 10:49:52 -08:00
Marcel Kornacker	7bf87a4b54	fix for IMPALA-90/IMPALA-221	2014-01-08 10:49:50 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
Alex Behm	805fa50d6f	IMPALA-67: Constant SELECT clauses do not work in subqueries.	2014-01-08 10:49:48 -08:00
Alex Behm	2277386d4d	IMPALA-225: Compound predicate ranges on partition keys crash impalad.	2014-01-08 10:49:45 -08:00
Marcel Kornacker	398e725a23	make broadcast joins the default join strategy	2014-01-08 10:49:34 -08:00
Marcel Kornacker	d7e22f44bb	Partitioned hash joins - added PlanNode.numNodes, PlanNode.avgRowSize and PlanNode.computeStats() - fixing up some cardinality estimates - Planner now tries to do a cost-based decision between broadcast join and join with full repartitioning (both inputs) - ExchangeNode now distinguishes between its input and output row descriptor: the output potentially contains more tuples - fixed problem related to cancellation and concurrent hash table builds. Not included: - partitioned joins that take advantage of existing partitions of the inputs; those will have to wait for a follow-on change	2014-01-08 10:49:29 -08:00
Alan Choi	4a503a4e35	IMP-808 construct runtime state in fe-support to eval now()	2014-01-08 10:49:20 -08:00
Nong Li	20fc700002	Fix precision issue in text table writer.	2014-01-08 10:49:19 -08:00
Alex Behm	0821e2f826	IMPALA-66: Support for UNION with constant SELECT clauses.	2014-01-08 10:49:18 -08:00
Marcel Kornacker	0c36c7f327	Partitioned merge aggregation.	2014-01-08 10:48:59 -08:00
Marcel Kornacker	d7bfe6c68d	IMPALA-144: partition pruning for arbitrary predicates that are fully bound by partition columns This makes partition pruning more effective by extending it to predicates that are fully bound by the partition column, e.g., '<col> IN (1, 2, 3)' will also be used to prune partitions, in addition to equality and binary comparisons.	2014-01-08 10:48:41 -08:00
Marcel Kornacker	c02d25baa8	IMPALA-20: Limit clause in inline view not handled correctly by planner - this adds a SelectNode that evaluates conjuncts and enforces the limit - all limits are now distributed: enforced both by the child plan fragment and by the merging ExchangeNode - all limits w/ Order By are now distributed: enforced both by the child plan fragment and by the merging TopN node	2014-01-08 10:48:29 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Nong Li	02c329b97a	Update RC files to use io mgr and remove scanner support for non-io mgr.	2014-01-08 10:47:11 -08:00
Nong Li	15dfd968fb	Disable tpch-q21 and fix plan output for tpch-q22. We can now generate the temp table for q22 which changes the plan output.	2014-01-08 10:47:03 -08:00
Henry Robinson	15228f945f	IMP-503: INSERTS into unpartitioned tables should be checked for union compatibility	2014-01-08 10:46:57 -08:00
Alan Choi	476a665763	IMP-620: print number of scanned partitions and total scanned bytes	2014-01-08 10:46:57 -08:00
Marcel Kornacker	f6af9316d9	Fix for IMP-137: incorrect predicate placement for outer joins Fixing predicate assignment for outer joins: - On clause predicates for outer joins are now assigned to the join node - the exception are On clause predicates that can be directly evaluated by the outer-joined tables themselves; those are "pushed down" - Where clause predicates for outer-joined tables are assigned to the join node that materializes the outer join	2014-01-08 10:46:50 -08:00
Lenni Kuff	febdb112f4	Fixed bug in test file section parsing	2014-01-08 10:46:50 -08:00
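
The OFFSET commit (8a55982105) notes that the top-N step must keep (limit + offset) rows in memory. The following is a minimal Python sketch of that behaviour, not Impala's implementation: the function name `top_n_with_offset` and the sample rows are hypothetical, and the standard-library `heapq.nsmallest` stands in for the topn-node.

```python
import heapq


def top_n_with_offset(rows, key, limit, offset):
    """Return the rows ranked offset+1 .. offset+limit in ascending key order."""
    # Only the smallest (limit + offset) rows can appear in the result, so
    # that many rows must be retained; this is why memory use grows with
    # limit + offset rather than with limit alone.
    kept = heapq.nsmallest(limit + offset, rows, key=key)
    # Skip the first `offset` rows and return whatever remains (at most `limit`).
    return kept[offset:]


# Hypothetical data standing in for "SELECT * FROM T1 ORDER BY ID LIMIT 3 OFFSET 5".
rows = [{"id": i} for i in (9, 3, 7, 1, 8, 2, 6, 4, 5, 0)]
page = top_n_with_offset(rows, key=lambda r: r["id"], limit=3, offset=5)
print([r["id"] for r in page])
```

With the message's own example (LIMIT 20 OFFSET 5), the same function would retain 25 rows before discarding the first 5.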

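The IMPALA-85 commit (c8afd16bbb) describes deriving equivalence classes for slots from equality predicates between slotrefs. As a rough illustration only, the sketch below groups slots with a union-find structure; this is a swapped-in technique rather than the analyzer's value transfer graph, and the slot names and the function `equivalence_classes` are hypothetical.

```python
def equivalence_classes(slots, equalities):
    """Group slots that are connected by equality predicates."""
    parent = {s: s for s in slots}

    def find(s):
        # Walk up to the class representative, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in equalities:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two classes

    classes = {}
    for s in slots:
        classes.setdefault(find(s), set()).add(s)
    # Sort for a deterministic result.
    return sorted(sorted(c) for c in classes.values())


# Hypothetical predicates: WHERE a.id = b.id AND b.id = c.id
slots = ["a.id", "b.id", "c.id", "c.x"]
equalities = [("a.id", "b.id"), ("b.id", "c.id")]
classes = equivalence_classes(slots, equalities)
print(classes)
```

Because `a.id` and `c.id` land in the same class, a planner could join `a` with `c` directly even though no predicate names that pair, which is the situation the commit message describes when the "from" order differs from the "where" join order.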

65 Commits