Commit Graph

247 Commits

Author SHA1 Message Date
Alex Behm
c3b5edd2af IMPALA-2414: Fix correlated WITH-clause views.
The bug was that the analysis of a WITH-clause view containing a
relative table ref would register a collection-typed slot in the parent
tuple descriptor of that relative table ref. The problem is that we use
a dummy analyzer with a fresh global state for the WITH-clause analysis.
Since the descriptor table is part of the global state, it was possible
that we'd register a collection-typed slot with an item tuple id that was
the same as the parent tuple id (or another arbitrary tuple id that has
no meaning in the parent tuple). This parent/item tuple with the same id
led to an infinite recursion in the backend.

The fix is to not register collection-typed slots in parent tuples when
analyzing a relative table ref inside a WITH-clause view. I added a new
flag to the analyzer to indicate whether it is analyzing a WITH-clause
view.

Change-Id: Ifc1bdebe4577a959aec1f6bb89a73eaa37562458
Reviewed-on: http://gerrit.cloudera.org:8080/1021
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:40 -07:00
Alex Behm
0e5f501782 IMPALA-2401: Properly set the number of nodes in UnnestNode and SingularRowSrcNode.
The bug was that we were hitting a Preconditions check inside
AnalyticPlanner.computeInputPartitionExprs() because the passed numNodes was -1,
but this function should expect a valid numNodes value. We passed a value of -1
as a result of not properly setting it in the computeStats() of UnnestNode
and SingularRowSrcNode.

We were setting the numNodes based on the numNodes of the containing SubplanNode
which is only initialized after the subplan tree (second child) has been set.
The fix is to set the numNodes based on the input of the containing SubplanNode.

Change-Id: Ib64ef75e7aaa08c4ea4c7e3c868189b35e5f1647
Reviewed-on: http://gerrit.cloudera.org:8080/906
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:59 -07:00
Alex Behm
9795364a77 IMPALA-2383: Prevent incorrect re-ordering of subplans across outer/semi joins.
In order to consider a subplan at a given point in the plan we require a specific
list of tuple ids and table ref ids to be materialized.
The bug was that our requirement on the table ref ids was not strict enough
to properly prevent re-ordering of subplans across outer/semi joins.

The fix: Just like in the regular join ordering, when considering the placement
of a relative or correlated table ref in the plan, we require that all table refs
to the left of and including any outer/semi join preceding that table ref in the
FROM clause are materialized before adding the plan for that table ref.

Change-Id: If2915c16864fda3e9d5536d5c7d346475c9b9e53
Reviewed-on: http://gerrit.cloudera.org:8080/898
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:57 -07:00
aacalfa
57dd4d1502 IMPALA-1309: Add support for distinct in group_concat function.
Change-Id: I2790f1d2a7bfd0ecc7ef66cc5d91dafe3414e111
Reviewed-on: http://gerrit.cloudera.org:8080/892
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 09:42:17 +00:00
Alex Behm
f1e2720099 Nested Types: Add support for column lineage with nested types.
Change-Id: I70741eaf4294e6d230ec72b01a9321a5e731a952
Reviewed-on: http://gerrit.cloudera.org:8080/887
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:34 -07:00
Alex Behm
ceb0781c5b IMPALA-2358: Fix join ordering of relative collection table refs inside a subplan.
The bug was that we were dropping a relative collection table ref during plan
generation. As a result, we were not computing the memory layout for the item
tuple descriptor of a collection-typed slot, which led to a crash in the BE
when trying to populate that slot with an invalid offset during a scan.

Change-Id: I229b04ac581305ce78bc67b094fe7220881bbe46
Reviewed-on: http://gerrit.cloudera.org:8080/883
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Ippokratis Pandis
0b614cf915 IMPALA-2165: Avoid cardinality 0 in scan nodes of small tables and low selectivity
To estimate the cardinality of the scan nodes we multiply the number
of rows of the table with the computed selectivity (which is a
function of the number of predicates). If the table is small (e.g. a
small dimension table) and there are a few predicates, then because of
rounding we end up using cardinality of 0 for this scan node. But this
is causing errors in the cardinality estimation of upstream nodes such
as cross join that typically multiply the cardinalities of their
inputs.

This patch fixes the problem by using a cardinality of 1 whenever the
estimate would otherwise round to 0. Some plans change because of this
patch.
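
The clamping described above can be sketched as follows. This is a minimal
illustration in Python, not the planner's own code; the function name and the
use of round() are assumptions made for the example.

```python
def estimate_scan_cardinality(num_rows: int, selectivity: float) -> int:
    """Hypothetical helper: estimated number of rows produced by a scan."""
    if num_rows <= 0:
        # Empty table or unknown row count: nothing to clamp.
        return num_rows
    cardinality = round(num_rows * selectivity)
    # A small table with a few predicates can round down to 0, which would
    # zero out any parent estimate that multiplies its input cardinalities.
    return max(1, cardinality)
```

With 25 rows and a selectivity of 0.01 the raw estimate rounds to 0, so the
helper returns 1 and a cross join above the scan keeps a non-zero estimate.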

Change-Id: Ic52e4f40964a5620f674803b48c4af814e6d4164
Reviewed-on: http://gerrit.cloudera.org:8080/819
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Alex Behm
0c90bf7ef5 IMPALA-2340: Fix NOT IN subquery planning and execution with nested types.
Fixes:
1. Change the planner to not invert null-aware anti join because there is
   only a left version. Also, always use a hash join because the
   nested-loop join does not support that join mode.
2. Fix PartitionedJoinNode::Reset() and related calls to make the join
   usable in subplans with the left null-aware anti join mode.

Change-Id: I8da50747f6a0412c5858fd32b9498f58ed779712
Reviewed-on: http://gerrit.cloudera.org:8080/847
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Alex Behm
84e2c043a4 IMPALA-2341: Fix preconditions check for inverting joins without an On-clause.
Joins that have a correlated or relative table ref on the right-hand side
are not required to have an On-clause due to the implicit parent/child
condition. This patch relaxes a preconditions check in join inversion
to reflect this new case introduced by nested types.

Change-Id: Iaa0cf155883e3cdcb200241263e56caee9ee2ca2
Reviewed-on: http://gerrit.cloudera.org:8080/843
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
ishaan
3ed9d8c3fe Fix the Planner Test testComplexFileFormats.
Currently, the expected results have a wrong file size. This fixes the expected file size
to reflect what's expected with the 5.7.0 thirdparty bits.

Testing:

Ran a full data load on the previous trunk and this was the only error. Also confirmed
that the values match what's expected.

Change-Id: I29e5e5c54d459b6d314b729c49a6b9207a48d4ff
Reviewed-on: http://gerrit.cloudera.org:8080/878
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2015-09-17 23:37:58 +00:00
ishaan
368c7e70b1 Revert expected output for file size in nested planner tests.
Previously, a patch changed the expected file sizes for nested planner test because of a
change in the underlying thirdparty bits in cdh5.5.0. However, this is not reflected in
trunk, so this patch reverts those changes.

Change-Id: I650f6a8415e20508a926e029d8df78523d830eb2
2015-09-16 20:49:40 -07:00
ishaan
aa7eb329a5 IMPALA-2315: Re-enable planner tests for nested types.
Change-Id: If8dc560b5e5d7d369b4365aea465db114667baa4
Reviewed-on: http://gerrit.cloudera.org:8080/842
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-15 08:38:13 -07:00
Alex Behm
52b0e04c78 IMPALA-1917: Do not register aux equivalence predicates with NULL on either side.
Registering an auxiliary equivalence predicate with NULL on one side would
be incorrect, because <expr> = NULL is never true (not even NULL = NULL).
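
For illustration, here is a small Python model of SQL equality under
three-valued logic, with None standing in for SQL NULL. The helper names are
hypothetical. A comparison with NULL yields NULL, and a filter keeps a row
only when its predicate is TRUE, so such a predicate can never hold.

```python
from typing import Any, Optional


def sql_equals(left: Any, right: Any) -> Optional[bool]:
    """SQL '=' under three-valued logic; None models SQL NULL."""
    if left is None or right is None:
        return None  # comparing with NULL yields NULL, never TRUE
    return left == right


def passes_filter(result: Optional[bool]) -> bool:
    """A predicate keeps a row only when it evaluates to TRUE."""
    return result is True
```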

Change-Id: Ie10d0e2fa61e5c2c7d4040c8b78d5bfb1a1e2761
Reviewed-on: http://gerrit.cloudera.org:8080/777
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-09 23:31:55 +00:00
Alex Behm
9d46853fbc Nested Types: Check un/supported file formats for complex types.
Before this patch, we used to accept any query referencing complex
types, regardless of the table/partition's file format being scanned.
We would ultimately hit a DCHECK in the BE when attempting to scan
complex types of a table/partition with an unsupported format.

This patch makes queries fail gracefully during planning if a scan
would access a table/partition in a format for which we do not
support complex types.

For mixed-format partitioned Hdfs tables we perform this check
at the partition granularity, so such a table can be scanned as
long as only partitions with supported formats are accessed.
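
A rough sketch of a check at partition granularity, in Python rather than the
planner's own code. The function name, the error type, and the contents of
SUPPORTED_FORMATS are assumptions chosen only for the example.

```python
# Assumption for illustration only: the set of formats that support complex types.
SUPPORTED_FORMATS = {"PARQUET"}


def check_complex_type_scan(accessed_partitions, reads_complex_types):
    """Fail during planning if an accessed partition has an unsupported format.

    accessed_partitions: iterable of (partition_name, file_format) pairs that
    survive partition pruning. Partitions that are not accessed are not checked.
    """
    if not reads_complex_types:
        return
    for name, file_format in accessed_partitions:
        if file_format not in SUPPORTED_FORMATS:
            raise ValueError(
                f"partition {name}: complex types are not supported "
                f"for file format {file_format}"
            )
```

Because only the accessed partitions are passed in, a mixed-format table stays
scannable as long as the unsupported partitions are pruned away first.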

HBase tables with complex-typed columns can be scanned as long as
no complex-typed columns are accessed in the query.

Change-Id: I2fd2e386c9755faf2cfe326541698a7094fa0ffc
Reviewed-on: http://gerrit.cloudera.org:8080/705
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-01 03:26:53 +00:00
Alex Behm
d94a930ef0 IMPALA-2266: Pass correct child node in 2nd phase merge aggregation.
The bug was that we were not passing the correct child node in the
constructor of the AggregationNode for the 2nd phase merge. As a result,
the 2nd phase merge aggregation node did not have the correct output
smap set, and all exprs on top were not substituted properly.

The bug was not apparent because the child node of the 2nd phase distinct
aggregation was fixed up after construction when setting the new plan
root of the corresponding fragment, so the bug did not manifest itself in
cases where the output smap was empty anyway (e.g., distinct aggregations
in the top-level select block).

Change-Id: I0617b2389e55ebcaf424600803babb250323ad3c
Reviewed-on: http://gerrit.cloudera.org:8080/715
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-29 23:45:11 +00:00
Alex Behm
ae9fd52c51 IMPALA-2089: Retain eq predicates bound by grouping slots with complex grouping exprs.
The bug: When enforcing slot equivalences at an aggregation node, we used to
incorrectly assume that equivalences among grouping slots must have already been
enforced below the aggregation (e.g., in a scan). This assumption is correct if the
grouping slots are produced by simple SlotRef grouping exprs, because then there is
certainly a value transfer between the grouping slot and another slot below the
aggregation. However, for grouping slots with complex grouping exprs this assumption
is not correct, and as a result, we would incorrectly remove eq predicates bound by
grouping slots with complex grouping exprs because we assumed they were redundant.

The fix is to enforce slot equivalences among grouping slots with complex grouping
exprs as usual, and not assume that they have already been enforced below the agg.

Change-Id: Idcd44acccb9326a35c9121025dc88c2c70c7c7c7
Reviewed-on: http://gerrit.cloudera.org:8080/656
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-23 04:43:37 +00:00
Alex Behm
af4ac802e7 Nested Types: Add planner tests for nested TPCH.
Change-Id: Ia8cb9f567c827f0b3800b63d53c17cddc5b3da4a
Reviewed-on: http://gerrit.cloudera.org:8080/669
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-22 23:36:15 +00:00
Alex Behm
de4410af15 Nested Types: Assign conjuncts bound by collection-item tuples in Hdfs scans.
This patch changes HdfsScanNode.init() to collect conjuncts that can be evaluated
while materializing the items (tuples) of collection-typed slots, and assign these
conjuncts to the scan node.

Limitation: Conjuncts that must first be migrated into inline views and that cannot
be captured by slot binding will not be assigned here, but in an UnnestNode.
This limitation applies to conjuncts bound by inline-view slots that are backed by
non-SlotRef exprs in the inline-view's select list. We only capture value transfers
between slots, and not between arbitrary exprs.

Change-Id: I20f2522070b257411c5e5d4ba9430e74b215308f
Reviewed-on: http://gerrit.cloudera.org:8080/665
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-21 09:08:54 +00:00
Alex Behm
d84182eeb0 Nested Types: Enforce slot equivalences at UnnestNode, and other fixes.
This patch includes the following fixes/improvements:
1. Bug fix: Enforce slot equivalences at UnnestNodes.
2. Bug fix: Correctly distinguish the join and non-join conjuncts
   for nested-loop joins.
3. Improvement: Always use a nested-loop join if one join input is
   a singular row src node. Since a singular row src has a cardinality
   of 1, a nested-loop join is certainly cheaper than a hash join.
4. Improvement: Always place singular row src nodes on the build side
   of a join.

Change-Id: Ia1d7ace5fa7d00cc7e702c7ca74a0b479ca0b455
Reviewed-on: http://gerrit.cloudera.org:8080/667
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-08-21 04:59:27 +00:00
Alex Behm
a39ffcf910 Nested Types: Relax restrictions on subquery rewrites to run nested TPCH.
This patch relaxes the restrictions on subquery rewrites for subqueries
that only reference relative table references. The existing restrictions
for subqueries with absolute table references remain unchanged.

In principle, there should be no restrictions on executing subqueries with
only relative table references because such subqueries are executed with
a subplan. However, since the more general solution is more involved this
patch only covers a few important cases illustrated by the examples below.

Examples of subqueries that pass rewrite/analysis after this patch:

// Uncorrelated not exists subquery.
select .. from customers c
where not exists (select ... from c.orders)

// Correlated [not] exists subquery with no equi-join predicates.
select .. from customers c
where [not] exists (select ... from c.orders o where c.cid < o.oid)

These subquery rewrites are needed to run our nested versions of
TPCH-Q21 and TPCH-Q22.

Change-Id: I326cd1327defed323a47cdb2555457b33d9b6638
Reviewed-on: http://gerrit.cloudera.org:8080/485
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-20 02:36:19 +00:00
Alex Behm
2db16efda8 Nested Types: Plan generation for correlated and child table refs with Subplans.
The plan generation is heuristic. A SubplanNode is placed as low as possible in the
plan tree - as soon as its required parent tuple ids are materialized.
This approach is simple to understand and implement, but not always optimal. For
example, it may be better to place a Subplan after a selective join, but today we
will place it below the join if it is correct to do so.
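
The "as low as possible" rule can be sketched like this. It is a simplified
Python model with hypothetical names: the plan is flattened into a bottom-up
sequence of steps, each contributing a set of materialized tuple ids.

```python
def subplan_position(materialized_per_step, required_tuple_ids):
    """Return the index of the first plan step after which every required
    parent tuple id is materialized, or None if that never happens.

    materialized_per_step: list of sets of tuple ids, in bottom-up plan order.
    """
    required = set(required_tuple_ids)
    seen = set()
    for index, tuple_ids in enumerate(materialized_per_step):
        seen.update(tuple_ids)
        if required <= seen:
            # Earliest legal point: the heuristic stops here even if a later
            # (more selective) position would be cheaper.
            return index
    return None
```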

For such scenarios, the straight_join hint can be used to manually tune the join
and Subplan order. If straight_join is used, correlated and child table refs are placed
into the same SubplanNode if they are adjacent in the FROM clause.

Change-Id: I53e4623eb58f8b7ad3d02be15ad8726769f6f8c9
Reviewed-on: http://gerrit.cloudera.org:8080/401
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-19 18:37:02 +00:00
Skye Wanderman-Milne
7906ed44ac IMPALA-2015: Add support for nested loop join
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.

Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.

Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-08-19 08:40:14 +00:00
Alex Behm
408fc6ec1e IMPALA-2216: Set the output smap of an EmptySetNode produced from an empty inline view.
The bug: In one specific code path in inline-view planning where the inline-view plan
is simply an EmptySetNode, the final output exprs of the query still referred to
the non-materialized inline-view slots because the inline-view's smap was not set
on the EmptySetNode, so the final output exprs were not substituted.

The following conditions / sequence of actions triggered this bug:
- A query statement that has an inline view on a select stmt without a FROM clause
- During planning, a conjunct is migrated into that inline view (e.g. from the
  WHERE-clause of an enclosing query block)
- After migration the conjunct becomes constant and evaluates to false
- The inline view plan becomes an EmptySetNode
- The bug: The output smap of that EmptySetNode was not set

Change-Id: Ieaad6e4e9675fb52fbbfc843f9fa25b48b34406d
Reviewed-on: http://gerrit.cloudera.org:8080/657
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-19 01:44:47 +00:00
Dimitris Tsirogiannis
47c5ae405a Revert "IMPALA-2015: Add support for nested loop join"
This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa.

Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23
Reviewed-on: http://gerrit.cloudera.org:8080/636
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-13 02:20:07 +00:00
Skye Wanderman-Milne
f000758ca8 IMPALA-2015: Add support for nested loop join
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.

Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.

Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e
Reviewed-on: http://gerrit.cloudera.org:8080/457
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-08-07 02:47:32 +00:00
Alex Behm
2b537b200f Nested Types: Generalize equi-join conjunct assignment for bushy plans.
In our equi-join predicate assignment logic, we used to assume that
the right-hand side of a join (or left-hand side if inverted) only contains
a single table ref id.

With the addition of subplans for nested types, we now sometimes generate
bushy plans where both sides of a join could cover multiple table ref ids.
This patch generalizes our existing logic to handle that case for equi-join
conjunct assignment, including inference of new join conjuncts as well
as removal of redundant conjuncts based on equivalence classes.

Change-Id: I252769ffd187988b1d617a17cd3c4ae638d5fcfa
Reviewed-on: http://gerrit.cloudera.org:8080/503
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 08:46:10 +00:00
Alex Behm
3ac341287c IMPALA-2088: Fix planning of empty union operands with analytics.
The check for ignoring empty union operands was simply misplaced.
This misplacement resulted in empty union operands not being
dropped if the containing UnionStmt had analytic functions.

Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3
Reviewed-on: http://gerrit.cloudera.org:8080/554
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-27 15:46:41 -07:00
Shant Hovsepian
6d87fe090c Improve Hll estimate for small cardinalities.
Based on Google's HyperLogLog++ paper. Uses a bias correcting
interpolation as a sub algorithm for Hll estimates within a specific
range.

Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9
Reviewed-on: http://gerrit.cloudera.org:8080/415
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2015-07-16 19:38:17 +00:00
Dimitris Tsirogiannis
fcba301b18 IMPALA-2018: Where clause does not propagate to joins inside nested
views

This commit fixes an issue where during predicate propagation a
predicate from the where clause is not properly assigned at the join
node that outer joins the generated predicate.

Change-Id: Ifccc1b0e0a0579c3baa48f0fb3dedcbd44941b53
Reviewed-on: http://gerrit.cloudera.org:8080/476
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-06-26 23:35:25 +00:00
Dan Hecht
4823889e14 IMPALA-1968: Part 2: Improve planner numNodes estimate
See the previous commit for IMPALA-1968 for details.  This commit
addresses cases 2 & 3 by enabling the new estimate logic even when there
are no remote scan ranges.

Change-Id: I54bb26ee7d89ae9d74dcfcc3753ea73dae8315bc
Reviewed-on: http://gerrit.cloudera.org:8080/426
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-06-10 21:10:31 +00:00
Dan Hecht
d46de9bba1 IMPALA-1968: Part 1: Improve planner numNodes estimate for remote scans
This commit will be backported to 5.4.x to improve plans when using
Isilon and S3.

The planner currently estimates the number of backends that an hdfs scan
node will execute on as the number of datanodes holding block replicas
for the corresponding table.  This can be a bad estimate for various reasons:

1) It's completely wrong when the scan is remote (e.g. S3 or Isilon).
2) It doesn't account for partition pruning.
3) The size of the set of hosts holding block replicas may be larger than
   the number of scan ranges.

Improve the estimate by examining the scan ranges and taking locality into
account.  While this new estimate will eventually be used in all cases,
this change uses the new estimate only when there is a remote scan range
so as not to change plans produced for local ranges (since this commit will
be backported to 5.4.x).  So, this commit purposely addresses only case
1.  A follow on commit will enable the new logic for all cases.
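
One way to estimate the node count from scan ranges and locality is sketched
below. This is a hedged Python illustration and not the planner's actual
logic; the function name, the input shapes, and the exact formula are
assumptions.

```python
def estimate_num_nodes(scan_ranges, cluster_hosts):
    """Estimate how many backends a scan runs on.

    scan_ranges: list with one entry per scan range (after partition pruning),
                 each entry being the list of hosts holding a replica.
    cluster_hosts: set of hosts that run a backend.
    """
    local_hosts = set()
    num_local_ranges = 0
    num_remote_ranges = 0
    for replica_hosts in scan_ranges:
        local = [h for h in replica_hosts if h in cluster_hosts]
        if local:
            num_local_ranges += 1
            local_hosts.update(local)
        else:
            # No replica on any backend host (e.g. S3 or Isilon).
            num_remote_ranges += 1
    # Never count more nodes than there are ranges to hand out.
    nodes = min(num_local_ranges, len(local_hosts))
    nodes += min(num_remote_ranges, len(cluster_hosts))
    return max(1, min(nodes, len(cluster_hosts)))
```

In this model a fully remote scan with 3 ranges on a 10-host cluster yields 3,
and two local ranges replicated on three hosts yield 2 rather than 3.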

Also set up the S3PlannerTest so that we can enable it in the nightly
jenkins S3 run.  It was inadvertently never enabled there.

Change-Id: I3fd3f7c5431a535fb044c98c326338c21b8a1898
Reviewed-on: http://gerrit.cloudera.org:8080/425
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-06-03 20:04:03 +00:00
Alex Behm
b558ea3f92 Nested Types: Refactoring of join nodes in preparation for more 'cross join' modes.
This patch introduces a new superclass, JoinNode, as the parent of HashJoinNode
and CrossJoinNode. It is a first step in supporting the semi/outer modes for
non-equi joins via a nested-loops implementation (like our existing cross join).
I have left a few TODOs that should be addressed when adding such support.

This patch also includes a cosmetic improvement to explain plans:
The distribution mode of CROSS JOINs is now only displayed for distributed plans,
and not for single-node plans (which is important for Subplans).

Change-Id: I93546871c459f4bc564f6dcb6bf4c35addbad4ec
Reviewed-on: http://gerrit.cloudera.org:8080/388
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-05-19 05:31:40 +00:00
Juan Yu
78446e5f34 Fix FE test failure in PlannerTest#testAnalyticFns
Change-Id: Ica3aa33686c3be6372b8c36ec66b367ce1d21a3b
Reviewed-on: http://gerrit.cloudera.org:8080/379
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-05-14 19:18:37 +00:00
Alex Behm
2825420d21 Nested Types: Basic plan generation for scans of nested collections.
Change-Id: I5bb44d302d785afb93aa8363fca716b32d58599c
Reviewed-on: http://gerrit.cloudera.org:8080/359
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-04-30 20:52:39 +00:00
Dimitris Tsirogiannis
dd5ecb9deb IMPALA-1960: Illegal reference to non-materialized tuple when query has
an empty select-project-join block

This commit fixes an issue where an aggregation expr may reference a
non-materialized slot if the query contains an empty select-project-join
block. This fix ensures that all the exprs in an aggregation reference
materialized slots/tuples.

Change-Id: Ic2cc9818061b3f06ab1d1cebf4e604352c2df6d1
Reviewed-on: http://gerrit.cloudera.org:8080/348
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-21 23:29:14 +00:00
Alex Behm
7067a5d94d IMPALA-1519: Fix wrapping of exprs via a TupleIsNullPredicate with analytics.
The bug:
Analytic functions introduced a few challenges in properly wrapping
exprs with TupleIsNullPredicates when substituting exprs from outer-joined
inline views.

1. The logical to physical tuple mapping during the plan generation of analytics
invalidated the tuple ids originally set in upstream TupleIsNullPredicates
introduced during analysis (e.g., in the result exprs).

2. TupleIsNullPredicates require specific tuple ids for evaluation.
Since a sort node materializes a new tuple, it's impossible to evaluate
TupleIsNullPredicates referring to a sort's input after the sort.
Non-analytic sorts handle this case during analysis by materializing
the result of that select block. However, analytic sorts used to only materialize
the slots of materialized tuple ids of the input plan node.

The fixes:

1. Move the TupleIsNullPredicate wrapping from the inline-view analysis into
the inline-view planning. This avoids the original problem because all physical
output tuples are known during plan generation. This simple change has a few
subtle consequences: First, we must rely on the plan root's output smap for
substituting the final result exprs, and *not* use the top-level base table smap
generated during analysis. Second, during plan generation we must use an inline
view's smap (and *not* its base table smap) for generating the output smap of its
plan such that we can properly wrap the rhs exprs in TupleIsNullPredicates
at every level.
This change also fixes IMPALA-1946 by deferring the TupleIsNullWrapping to
planning time.

2. To preserve the information whether an input tuple was null or not at an
analytic sort, we materialize TupleIsNullPredicates, which are then substituted
by a SlotRef into the sort's tuple in ancestor nodes.

This patch also cleans up and consolidates the code used for wrapping exprs into
TupleIsNullPredicate itself.

Change-Id: I5c6d142bdf9c99ece2a564e557d4ffe22ac90865
Reviewed-on: http://gerrit.cloudera.org:8080/317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-04-14 23:33:20 +00:00
Dimitris Tsirogiannis
30069f2cb5 IMPALA-1900: Assign predicates below analytic functions with a
compatible partition by clause

This commit enables pushing predicates through inline views with
analytic functions if we can guarantee that the predicates are compatible
with the partition by clauses of all analytic functions in the view
definition stmt.

Change-Id: Ic3debd11a7294dfaf7df8e88d7dc3a1d48b7f927
Reviewed-on: http://gerrit.cloudera.org:8080/278
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-07 01:41:55 +00:00
Dimitris Tsirogiannis
4eceeacf16 IMPALA-1550: Invalid rewrite when EXISTS subqueries contain aggregate
functions

This commit fixes an issue where a [NOT] EXISTS subquery that contains
an aggregate function will sometimes be incorrectly rewritten into a
join, thereby returning incorrect results.

Change-Id: I18b211d76ee3de77d8061603ff5bb1fbceae2e60
Reviewed-on: http://gerrit.cloudera.org:8080/266
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-02 19:11:00 +00:00
Dimitris Tsirogiannis
3a7ed7c59e CDH-26149: No navigator lineage for CREATE/ALTER VIEW statements
This commit fixes the issue where no lineage events are generated for
create and alter view statements.

Change-Id: Ib8c4513219569f62eb26a0eb09a8c2a762054b70
Reviewed-on: http://gerrit.cloudera.org:8080/265
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-25 00:05:39 +00:00
Alex Behm
32f644820d IMPALA-1860: INSERT/CTAS evaluates and applies constant predicates.
This patch fixes a regression introduced in:
c6907e4c2eabf5d73f83cc8e16b7f35a13c3b59f
IMPALA-1376: Split up Planner into multiple classes.

The problem was that in single-node planning the root analyzer
was passed to generate the plan for an INSERT/CTAS' query stmt.
The fix is to instead pass the stmt's analyzer that contains
information about evaluated constant predicates.

Change-Id: I551f471c978bc1f6bdff0d98e4826856d1e4860f
Reviewed-on: http://gerrit.cloudera.org:8080/191
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:39 -07:00
Alex Behm
adb19deece Re-enable tests that had been temporarily removed to unblock the full data load.
The following commits disabled tests to unblock the full data load:
a00a9a5e53f7a8e7a1e3c931ea0e4b7db21c6f00
bf29d06f2e53bb924d250275d51f5ccd1213531d

This patch re-enables those tests and adds new tests to guard against
regressions to HIVE-6308.

Unfortunately, we cannot completely remove the analysis check for HIVE-6308
in our code, because there is still one case where COMPUTE STATS will fail on
a Hive-created Avro table: If there is a mismatch in column names between
the Avro schema and the column defs given to a CREATE TABLE in Hive.

Change-Id: I81ae6b526db02fdfc634e09eeb9d12036e2adfdd
Reviewed-on: http://gerrit.cloudera.org:8080/180
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:38 -07:00
Dimitris Tsirogiannis
d04f190973 IMPALA-1802: Impala produces incorrect count(distinct) result with limit
clause

This commit fixes the issue where if a query contains a count(distinct)
expression in conjunction with a limit, the limit is incorrectly applied
to the wrong place in the generated plan, thereby producing incorrect
results.

Change-Id: I776e1b78461323e7ab72d491dcec7a9acd9e75f9
Reviewed-on: http://gerrit.cloudera.org:8080/196
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-10 19:41:32 +00:00
Alex Behm
7615ec9b98 Temporarily disable a single insert planner test to unblock the full data load.
The issue here is that the file sizes for alltypes in seq/snap in the current
snapshot are different from the ones generated by the new Hive.
After we have generated a new snapshot, I will restore the test
as part of: http://gerrit.cloudera.org:8080/180

Change-Id: I96187587e490098a3c600e0e20f0c39ffb74a7fd
Reviewed-on: http://gerrit.cloudera.org:8080/184
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/187
2015-03-07 19:09:58 +00:00
Dimitris Tsirogiannis
c88d179413 IMPALA-1636: Generalize index-based partition pruning to allow constant
expressions

This commit enables fast partition pruning for cases where constant
expressions appear in binary or IN predicates. During partition pruning,
the constant expressions are evaluated in the BE and are replaced by the
computed results as LiteralExprs.
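
The idea of folding constants before the index lookup can be sketched as
follows. This is a Python illustration with hypothetical names; zero-argument
callables stand in for constant expressions, and a dict stands in for the
partition value index.

```python
def prune_in(value_index, constant_exprs):
    """Return the ids of partitions matching `key IN (constant_exprs...)`.

    value_index: dict mapping a partition-key value to a set of partition ids.
    constant_exprs: zero-argument callables modelling constant expressions.
    """
    # Evaluate every constant expression once, up front, to a literal value.
    literals = [expr() for expr in constant_exprs]
    matched = set()
    for literal in literals:
        # Each literal is then a plain index lookup, not a per-partition test.
        matched |= value_index.get(literal, set())
    return matched
```

For example, with an index over a year column, a predicate such as
year IN (2009 + 1, 2011) reduces to two lookups for 2010 and 2011.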

Change-Id: Ie8a2accf260391117559dc6c0a565f907c516478
Reviewed-on: http://gerrit.cloudera.org:8080/144
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-07 09:51:27 +00:00
Dimitris Tsirogiannis
3852155feb CDH-24093: Impala should produce column lineage info needed by Navigator
This change adds support for column lineage logging in Impala to be
consumed by Navigator. This feature is disabled by default and is enabled
by setting the -lineage_event_log_dir flag. When lineage logging is
enabled, the serialized column lineage graph is computed for each query
and stored in a specialized log file in JSON format.

Change-Id: Ib8d69cdbcc435be1e9c9694998c1d33ec1245b10
Reviewed-on: http://gerrit.cloudera.org:8080/70
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-02-27 11:30:13 +00:00
ishaan
ad0d723170 Fix hbase/joins Planner Tests to account for the new default regionserver ports.
Change-Id: Id7988afaaaf1073551ee90c366da78fafa4f7858
2015-02-25 23:13:11 -08:00
Dan Hecht
cc8c9cf089 S3: Synthesize file block metadata for "other" filesystems
Some Hadoop filesystems, like the S3-based ones, are not block based.
Since Impala derives scan ranges from file blocks, synthesize file
blocks for these filesystems.  Otherwise, files are always assigned a
single scan range, limiting parallelism.
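
Synthesizing blocks for a file with no native block layout can be sketched as
below. This is a minimal Python illustration; the function name and the idea
of a caller-supplied block size are assumptions, not the actual code.

```python
def synthesize_blocks(file_length, block_size):
    """Return (offset, length) pairs that cover a file of file_length bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, file_length - offset))
        for offset in range(0, file_length, block_size)
    ]
```

A 250-byte file with a 100-byte block size produces three synthetic blocks, so
the planner can derive three scan ranges instead of one.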

An alternate approach would be to modify the planner's compute scan range
code.  However, there would be some downsides to that approach: (a) we'd
need to plumb through more information from the catalog to the frontend,
increasing the catalog size, and (b) we'd be doing more work on each
query rather than once at metadata load time.

Change-Id: If53cba6e25506545eae78190601fbee0147547b3
Reviewed-on: http://gerrit.cloudera.org:8080/54
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-02-20 03:00:27 +00:00
Alex Behm
76118bd000 Re-enable TPCDS planner tests.
These tests had been 'temporarily' disabled when moving the TPCDS schema
on CDH5 to a partitioned store_sales with DECIMAL. The intention was to
re-enable these tests shortly after the schema change, but it was never
actually done.

Process to restore this test:
I started off with the tpcds-all.test file from our CDH4 branch and ran
it on CDH5. I investigated the following plan differences:
- Table sizes are slightly different on CDH5
- Several plan changes, e.g., join order, analytic order. All plan differences
  are due to the size difference between DECIMAL and FLOAT. The CDH4 tables
  use FLOAT, and the CDH5 tables use DECIMAL. Some plans had aggregates on those
  columns, and the size difference between, e.g., a DECIMAL(38,2) and a DOUBLE
  was significant enough to change the plan choice in several instances.

I concluded that all the plan differences are legitimate and should be accepted.

Change-Id: I11f36a543e9a5041d569c6f633fdfd296b72d31e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5672
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-29 14:44:33 -08:00
Dimitris Tsirogiannis
57132bf021 IMPALA-1535: Partition pruning with NULL
This commit fixes the issue where partition pruning returns wrong
results when a binary predicate contains a NULL literal.

Change-Id: I24c647184dcef49d12d6ff422e28667777df7784
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5443
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5549
2014-12-10 17:33:11 -08:00
Alex Behm
02db135ead IMPALA-1553: Fix sequence of UnionNode.init() calls for single node execution.
Change-Id: I6fe909f6cad30d9f594f52034e13a83156750a17
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5438
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5445
2014-11-27 00:34:17 -08:00