Files
impala/testdata/workloads/functional-planner/queries/PlannerTest/empty.test
Henry Robinson 2212240106 IMPALA-2552: Runtime filter forwarding between joins and scans
This patch adds the ability for operators to compute and forward bitmap
filters from one operator to another, across fragment and machine
boundaries.

Filters are provided as part of the plan from the frontend. In this
patch hash join nodes produce filters from their build side input, and
propagate them, via the query's coordinator, to the scan nodes which
provide the probe side for that join. The scan nodes may then filter
their rows before they are sent to the join, reducing the amount of work
the join has to do.

Filters are attached to the local RuntimeState's RuntimeFilterBank by
the join node. When complete, they are asynchronously sent to the
coordinator via a new UpdateFilter() RPC. The coordinator maintains a
routing table that maps incoming filters to their recipient
backends. For partitioned joins, the filters must be aggregated from all
providers. The coordinator performs this aggregation and transmits the
completed filter only when all inputs have been received.

In this patch, filtering can occur in up to four places in a scan:

1. Before initial scan ranges are issued (all file formats, partition
   columns only)
2. Before each scan range is processed (all file formats, partition
   columns only)
3. Before a row group is processed (Parquet, partition columns only)
4. During assembly of every row (Parquet, any column)

This patch also replaces the existing bitmap-based filters with Bloom
Filter based ones.

The Bloom Filters are statically sized to have an expected false positive rate of
0.1 on 2^20 distinct items. This yields Bloom Filters of 1MB in
size. This is configurable by setting --bloom_filter_size, and we
will perform tests to determine a good default. The query option
RUNTIME_BLOOM_FILTER_SIZE can override the command-line flag on a
per-query basis.

This patch also simplifies and improves the memory handling for
allocated filters by the RuntimeFilterBank. New filters are tracked
through the query memory tracker, and owned by the fragment instance's
RuntimeState::obj_pool().

This patch also adds a simple heuristic to disable filter creation based
on estimated false-positive rate for the Bloom Filter. By default the
maximum FP rate is set to 75%. It can be controlled by setting
--max_filter_error_rate.

Finally, this patch adds short-circuit publication for filters that are
broadcast, and does so always even when distributed runtime filter
propagation is disabled.

To avoid cross-compilation problems, bloom-filter.h was rewritten in C++98.

Change-Id: Icea03a87cf1705c1b4aa46f86f13141c4b58da10
Reviewed-on: http://gerrit.cloudera.org:8080/1861
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2016-02-13 16:19:41 +00:00

484 lines
12 KiB
Plaintext

# Constant conjunct in WHERE clause turns query block into an empty-set node.
select t1.id, t2.id
from functional.alltypestiny t1
left outer join functional.alltypes t2
on t1.id = t2.id
where false
---- PLAN
00:EMPTYSET
====
# HBase scan turns into empty-set node due to a constant conjunct.
select * from functional_hbase.alltypessmall where false
---- PLAN
00:EMPTYSET
====
# Data source scan turns into empty-set node due to a constant conjunct.
select *
from functional.alltypes_datasource a
inner join functional.alltypestiny b
on a.id = b.id
where length("a") > 7
---- PLAN
00:EMPTYSET
====
# Constant conjunct in ON clause turns query block into an empty-set node.
select *
from functional.alltypestiny t1
inner join functional.alltypes t2
on (t1.id = t2.id and (false or false))
---- PLAN
00:EMPTYSET
====
# Constant conjunct in WHERE clause turns query block into an aggregation
# fed by an empty-set node.
select count(int_col), avg(double_col), count(*)
from functional.alltypes
where null
---- PLAN
01:AGGREGATE [FINALIZE]
| output: count(int_col), avg(double_col), count(*)
|
00:EMPTYSET
====
# Constant conjunct in ON clause turns query block into an aggregation
# fed by an empty-sed node.
select count(*)
from functional.alltypestiny t1
inner join functional.alltypes t2
on (t1.id = t2.id and (false or false))
---- PLAN
01:AGGREGATE [FINALIZE]
| output: count(*)
|
00:EMPTYSET
====
# Constant conjunct in HAVING clause turns query block into an empty-set node,
# regardless of aggregation.
select t1.int_col, count(*)
from functional.alltypestiny t1
left outer join functional.alltypes t2
on t1.id = t2.id
group by t1.int_col
having ifnull(null, false)
---- PLAN
00:EMPTYSET
====
# Constant conjunct causes empty-set inline view.
select e.id, f.id
from functional.alltypessmall f
inner join
(select t1.id
from functional.alltypestiny t1
left outer join functional.alltypes t2
on t1.id = t2.id
where 1 + 3 > 10) e
on e.id = f.id
---- PLAN
02:HASH JOIN [INNER JOIN]
| hash predicates: f.id = t1.id
| runtime filters: RF000 <- t1.id
|
|--01:EMPTYSET
|
00:SCAN HDFS [functional.alltypessmall f]
partitions=4/4 files=4 size=6.32KB
runtime filters: RF000 -> f.id
====
# Constant conjunct causes union operand to be dropped.
select * from functional.alltypessmall
union all
select * from functional.alltypes where "abc" = "cde"
union all
select * from functional.alltypestiny
---- PLAN
00:UNION
|
|--02:SCAN HDFS [functional.alltypestiny]
| partitions=4/4 files=4 size=460B
|
01:SCAN HDFS [functional.alltypessmall]
partitions=4/4 files=4 size=6.32KB
====
# Constant conjunct turns union into an empty-set node.
select *
from functional.alltypes a
full outer join
(select * from
(select * from functional.alltypestiny
union all
select * from functional.alltypessmall) t1
where null) t2
on a.id = t2.id
---- PLAN
02:HASH JOIN [FULL OUTER JOIN]
| hash predicates: a.id = id
|
|--01:EMPTYSET
|
00:SCAN HDFS [functional.alltypes a]
partitions=24/24 files=24 size=478.45KB
====
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
left outer join functional.alltypestiny b
on (a.id = b.id and 1 + 1 > 10)
---- PLAN
02:HASH JOIN [LEFT OUTER JOIN]
| hash predicates: a.id = b.id
| other join predicates: 1 + 1 > 10
|
|--01:SCAN HDFS [functional.alltypestiny b]
| partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
partitions=4/4 files=4 size=6.32KB
====
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
right outer join functional.alltypestiny b
on (a.id = b.id and !true)
---- PLAN
02:HASH JOIN [RIGHT OUTER JOIN]
| hash predicates: a.id = b.id
| other join predicates: NOT TRUE
| runtime filters: RF000 <- b.id
|
|--01:SCAN HDFS [functional.alltypestiny b]
| partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
partitions=4/4 files=4 size=6.32KB
runtime filters: RF000 -> a.id
====
# Constant conjunct in the ON-clause of an outer join is
# assigned to the join.
select *
from functional.alltypessmall a
full outer join functional.alltypestiny b
on (a.id = b.id and null = "abc")
---- PLAN
02:HASH JOIN [FULL OUTER JOIN]
| hash predicates: a.id = b.id
| other join predicates: NULL = 'abc'
|
|--01:SCAN HDFS [functional.alltypestiny b]
| partitions=4/4 files=4 size=460B
|
00:SCAN HDFS [functional.alltypessmall a]
partitions=4/4 files=4 size=6.32KB
====
# Limit 0 turns query block into an empty-set node.
select t1.id, t2.id
from functional.alltypestiny t1
left outer join functional.alltypes t2
on t1.id = t2.id
limit 0
---- PLAN
00:EMPTYSET
====
# Limit 0 turns query block into an empty-set node.
select count(int_col), avg(double_col), count(*)
from functional.alltypes
limit 0
---- PLAN
00:EMPTYSET
====
# Limit 0 causes empty-set inline view.
select e.id, f.id
from functional.alltypessmall f
inner join
(select t1.id
from functional.alltypestiny t1
left outer join functional.alltypes t2
on t1.id = t2.id
limit 0) e
on e.id = f.id
---- PLAN
02:HASH JOIN [INNER JOIN]
| hash predicates: f.id = t1.id
| runtime filters: RF000 <- t1.id
|
|--01:EMPTYSET
|
00:SCAN HDFS [functional.alltypessmall f]
partitions=4/4 files=4 size=6.32KB
runtime filters: RF000 -> f.id
====
# Limit 0 causes union operand to be dropped.
select * from functional.alltypessmall
union all
select * from functional.alltypes limit 0
union all
select * from functional.alltypestiny
---- PLAN
00:UNION
|
|--02:SCAN HDFS [functional.alltypestiny]
| partitions=4/4 files=4 size=460B
|
01:SCAN HDFS [functional.alltypessmall]
partitions=4/4 files=4 size=6.32KB
====
# Limit 0 causes empty-set union.
select * from functional.alltypessmall
union all
select * from functional.alltypes where "abc" = "cde"
union all
(select * from functional.alltypestiny)
limit 0
---- PLAN
00:EMPTYSET
====
# Inline view with a constant select stmt that is guaranteed to be empty.
select count(w1.c1)
from
(select 1 as c1 from functional.alltypessmall)
w1 where w1.c1 is null
union all
select int_col from functional.alltypesagg
---- PLAN
00:UNION
|
|--03:SCAN HDFS [functional.alltypesagg]
| partitions=11/11 files=11 size=814.73KB
|
02:AGGREGATE [FINALIZE]
| output: count(1)
|
01:EMPTYSET
====
# IMPALA-1234: Analytic with constant empty result set failed precondition check in FE
select MIN(int_col) OVER () FROM functional.alltypes limit 0
---- PLAN
00:EMPTYSET
====
# IMPALA-1860: INSERT/CTAS should evaluate and apply constant predicates.
insert into functional.alltypes partition(year, month)
select * from functional.alltypes where 1 = 0
---- PLAN
WRITE TO HDFS [functional.alltypes, OVERWRITE=false, PARTITION-KEYS=(functional.alltypes.year,functional.alltypes.month)]
| partitions=24
|
00:EMPTYSET
====
# IMPALA-1860: INSERT/CTAS should evaluate and apply constant predicates.
with t as (select * from functional.alltypes where coalesce(NULL) > 10)
insert into functional.alltypes partition(year, month)
select * from t
---- PLAN
WRITE TO HDFS [functional.alltypes, OVERWRITE=false, PARTITION-KEYS=(functional.alltypes.year,functional.alltypes.month)]
| partitions=24
|
00:EMPTYSET
====
# IMPALA-1860: INSERT/CTAS should evaluate and apply constant predicates.
create table test_1860 as
select * from (select 1 from functional.alltypes limit 0) v
---- PLAN
WRITE TO HDFS [default.test_1860, OVERWRITE=false]
| partitions=1
|
00:EMPTYSET
====
# IMPALA-1960: Exprs in the aggregation that reference slots from an inline view when
# the select stmt has an empty select-project-join portion.
select sum(T.id), count(T.int_col)
from
(select id, int_col, bigint_col from functional.alltypestiny) T
where false
---- PLAN
01:AGGREGATE [FINALIZE]
| output: sum(id), count(int_col)
|
00:EMPTYSET
====
# IMPALA-1960: Exprs in the aggregation that reference slots from an inline view when
# the select stmt has an empty select-project-join portion.
select sum(T1.id + T2.int_col)
from
(select id, bigint_col from functional.alltypestiny) T1 inner join
(select id, int_col from functional.alltypestiny) T2 on (T1.id = T2.id)
where T1.bigint_col < 10 and 1 > 1
---- PLAN
01:AGGREGATE [FINALIZE]
| output: sum(id + int_col)
|
00:EMPTYSET
====
# IMPALA-1960: Exprs in the aggregation that reference slots from an inline view when
# the select stmt has an empty select-project-join portion.
select count(distinct T1.int_col)
from
(select id, int_col from functional.alltypestiny) T1 inner join
functional.alltypessmall T2 on T1.id = T2.id
where T2.bigint_col < 10 and false
---- PLAN
02:AGGREGATE [FINALIZE]
| output: count(T1.int_col)
|
01:AGGREGATE
| group by: int_col
|
00:EMPTYSET
====
# IMPALA-2088: Test empty union operands with analytic functions.
# TODO: Simplify the plan of unions with empty operands using an empty set node.
# TODO: Simplify the plan of unions with only a single non-empty operand to not
# use a union node (this is tricky because a union materializes a new tuple).
select lead(-496, 81) over (order by t1.double_col desc, t1.id asc)
from functional.alltypestiny t1 where 5 = 6
union
select 794.67
from functional.alltypes t1 where 5 = 6
union all
select coalesce(10.4, int_col)
from functional.alltypes where false
---- PLAN
02:UNION
|
01:AGGREGATE [FINALIZE]
| group by: lead(-496, 81, NULL) OVER(...)
|
00:UNION
====
# IMPALA-2088: Test empty union operands with analytic functions.
select lead(-496, 81) over (order by t1.double_col desc, t1.id asc)
from functional.alltypestiny t1 where 5 = 6
union
select 794.67
from functional.alltypes t1 where 5 = 6
union all
select coalesce(10.4, int_col)
from functional.alltypes where false
union all
select 1
union all select bigint_col
from functional.alltypestiny
---- PLAN
02:UNION
| constant-operands=1
|
|--01:AGGREGATE [FINALIZE]
| | group by: lead(-496, 81, NULL) OVER(...)
| |
| 00:UNION
|
03:SCAN HDFS [functional.alltypestiny]
partitions=4/4 files=4 size=460B
====
# IMPALA-2216: Make sure the final output exprs are substituted, even
# if the resulting plan is an EmptySetNode.
select * from (select 10 as i, 2 as j, '2013' as s) as t
where t.i < 10
---- PLAN
00:EMPTYSET
====
# IMPALA-2216: Make sure the final output exprs are substituted, even
# if the table sink is fed by an EmptySetNode.
insert into functional.alltypes (id) partition(year,month)
select * from (select 10 as i, 2 as j, 2013 as s) as t
where t.i < 10
---- PLAN
WRITE TO HDFS [functional.alltypes, OVERWRITE=false, PARTITION-KEYS=(2,2013)]
| partitions=1
|
00:EMPTYSET
====
# IMPALA-2430: Test empty blocks containing relative table refs.
select c_custkey, v1.cnt, v2.o_orderkey, v3.l_linenumber, v4.cnt
from tpch_nested_parquet.customer c
left outer join
(select count(*) cnt from c.c_orders
where false) v1
left outer join
(select o_orderkey from c.c_orders
where 20 < 10) v2
left outer join
(select l_linenumber from c.c_orders.o_lineitems
where "a" in ("b", "c")) v3
left outer join
(select count(*) cnt from c.c_orders o left outer join
(select l_linenumber from o.o_lineitems
where null) nv) v4
where c_custkey < 10
---- PLAN
01:SUBPLAN
|
|--16:NESTED LOOP JOIN [LEFT OUTER JOIN]
| |
| |--12:AGGREGATE [FINALIZE]
| | | output: count(*)
| | |
| | 08:SUBPLAN
| | |
| | |--11:NESTED LOOP JOIN [RIGHT OUTER JOIN]
| | | |
| | | |--09:SINGULAR ROW SRC
| | | |
| | | 10:EMPTYSET
| | |
| | 07:UNNEST [c.c_orders o]
| |
| 15:NESTED LOOP JOIN [LEFT OUTER JOIN]
| |
| |--06:EMPTYSET
| |
| 14:NESTED LOOP JOIN [LEFT OUTER JOIN]
| |
| |--05:EMPTYSET
| |
| 13:NESTED LOOP JOIN [RIGHT OUTER JOIN]
| |
| |--02:SINGULAR ROW SRC
| |
| 04:AGGREGATE [FINALIZE]
| | output: count(*)
| |
| 03:EMPTYSET
|
00:SCAN HDFS [tpch_nested_parquet.customer c]
partitions=1/1 files=4 size=577.87MB
predicates: c_custkey < 10
====
# IMPALA-2539: Test empty union operands containing relative table refs.
select c_custkey, o_orderkey
from tpch_nested_parquet.customer c,
(select o_orderkey from c.c_orders o1
union distinct
select o_orderkey from c.c_orders o2
where false
union all
select o_orderkey from c.c_orders o3
where false
) v1
where c_custkey = 1
---- PLAN
01:SUBPLAN
|
|--07:NESTED LOOP JOIN [CROSS JOIN]
| |
| |--02:SINGULAR ROW SRC
| |
| 06:UNION
| |
| 05:AGGREGATE [FINALIZE]
| | group by: o_orderkey
| |
| 03:UNION
| |
| 04:UNNEST [c.c_orders o1]
|
00:SCAN HDFS [tpch_nested_parquet.customer c]
partitions=1/1 files=4 size=577.87MB
predicates: c_custkey = 1
====
# IMPALA-2215: Having clause without aggregation.
select 1 from (select 1) v having 1 > 1
---- PLAN
00:EMPTYSET
====