As a follow-up to IMPALA-10314, it is sometimes useful to treat a
simple limit as a request to sample from a table when a relevant
hint has been provided. Sampling instead of applying a pure limit
serves two purposes: (a) it still reduces planning time, since scan
ranges need to be computed only for the sampled files, and (b) it
allows a sufficient number of files/rows to be read from the table
so that, after applying filter conditions or joins with another
table, the query can still produce the N rows needed for the limit.

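Point (b) can be illustrated with a toy model. This is only a sketch under stated assumptions: the table layout, the predicate, and the helper rows_after_filter are all made up for illustration and do not reflect Impala's planner or scanner code.

```python
# Toy model only: the "table", the predicate, and the pruning choices below
# are hypothetical and are not taken from Impala.
from typing import Callable, List

File = List[int]  # a "file" is modeled as a list of integer row values


def rows_after_filter(files: List[File],
                      pred: Callable[[int], bool],
                      limit: int) -> List[int]:
    """Scan files in order, keep rows that pass pred, stop once limit is hit."""
    out: List[int] = []
    for f in files:
        for row in f:
            if pred(row):
                out.append(row)
                if len(out) == limit:
                    return out
    return out


def is_multiple_of_5(value: int) -> bool:
    """Hypothetical filter condition; passes 2 of the 10 rows in each file."""
    return value % 5 == 0


# 20 files with 10 rows each: file i holds the values 10*i .. 10*i + 9.
table = [list(range(10 * i, 10 * i + 10)) for i in range(20)]
limit = 4

# Pure-limit style pruning: one file already holds `limit` unfiltered rows.
pure_limit_files = table[:1]
# Sample style pruning: keep a fixed fraction of the files (every 4th, 25%).
sampled_files = table[::4]

few = rows_after_filter(pure_limit_files, is_multiple_of_5, limit)
enough = rows_after_filter(sampled_files, is_multiple_of_5, limit)
print(len(few), len(enough))
```

In this model the single file kept by pure-limit pruning yields fewer rows than the limit once the filter is applied, while the sampled set of files still satisfies it.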
This functionality is especially useful if the query is against a
view. Note that a TABLESAMPLE clause cannot be applied to a view, and
embedding a TABLESAMPLE explicitly on a table within a view will
not work because we don't want to sample if there's no limit.

In this patch, a new table-level hint, 'convert_limit_to_sample(n)',
is added. If this hint is attached to a table, either in the main
query block or within a view/subquery, and the simple limit
optimization conditions are satisfied (according to IMPALA-10314),
the limit is converted to a table sample. The parameter 'n' in
parentheses is required and specifies the sample percentage. It must
be an integer between 1 and 100. For example:

  SET optimize_simple_limit = true;
  CREATE VIEW v1 AS SELECT * FROM T [convert_limit_to_sample(5)]
    WHERE [always_true] <predicate>;
  SELECT * FROM v1 LIMIT 10;

In this case, the limit of 10 is applied on top of a 5 percent
sample of T, and the sample is taken after partition pruning.

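The parameter rule stated above (a required integer percentage between 1 and 100) can be sketched as a small validator. This is an illustrative sketch only: the function parse_sample_percent and the regular expression are hypothetical and are not Impala's actual hint parser, which may accept or reject inputs differently.

```python
import re

# Hypothetical pattern for the hint text; Impala's real parser may differ
# (for example in whitespace or case handling).
HINT_RE = re.compile(r"^convert_limit_to_sample\((\d+)\)$")


def parse_sample_percent(hint: str) -> int:
    """Return the sample percentage from a hint string, or raise ValueError.

    Enforces the rule from the patch description: the parameter is required
    and must be an integer between 1 and 100.
    """
    match = HINT_RE.match(hint.strip())
    if match is None:
        raise ValueError(f"expected convert_limit_to_sample(n), got: {hint!r}")
    percent = int(match.group(1))
    if not 1 <= percent <= 100:
        raise ValueError(f"sample percentage must be in [1, 100], got {percent}")
    return percent


print(parse_sample_percent("convert_limit_to_sample(5)"))
```

Under these assumptions, a hint with a missing parameter or with a value such as 0 or 101 raises ValueError instead of silently falling back to a default.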
Testing:
 - Added an alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints and added new tests against the views. Modified a few
   existing tests to reference the new table variant.
 - Added an end-to-end test.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Reviewed-on: http://gerrit.cloudera.org:8080/16792
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>