impala/testdata/bin
Aman Sinha b5ba793227 IMPALA-10360: Allow simple limit to be treated as sampling hint
As a follow-up to IMPALA-10314, it is sometimes useful to consider
a simple limit as a way to sample from a table if a relevant hint
has been provided. Doing a sample instead of pure limit serves
dual purposes: (a) it still helps with reducing the planning time
since the scan ranges need to be computed only for the sample files,
(b) it allows a sufficient number of files/rows to be read from
the table such that after applying filter conditions or joins with
another table, the query may still produce the N rows needed for
limit.

This functionality is especially useful if the query is against a
view. Note that a TABLESAMPLE clause cannot be applied to a view and
embedding a TABLESAMPLE explicitly on a table within a view will
not work because we don't want to sample if there's no limit.

This patch adds a new table-level hint, 'convert_limit_to_sample(n)'.
If the hint is attached to a table, either in the main query block or
within a view/subquery, and the simple limit optimization conditions
are satisfied (according to IMPALA-10314), the limit is converted to
a table sample. The parameter 'n' in parentheses is required and
specifies the sample percentage. It must be an integer between 1 and
100. For example:

 set optimize_simple_limit = true;
 CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample(5)]
    WHERE [always_true] <predicate>;
 SELECT * FROM v1 LIMIT 10;

In this case, the limit of 10 is applied on top of a 5 percent sample
of T, and the sample is taken after partition pruning.
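
The hint's parameter rule and the "sample after pruning" idea can be
sketched roughly as below. This is an illustrative Python model only,
not Impala's planner code: the helper names parse_sample_percent and
files_to_scan are hypothetical, and picking files by count is a
simplifying assumption (the real sampling logic may weigh files
differently).

```python
import random
import re

# Hypothetical helper, not Impala's parser: accepts only the form
# convert_limit_to_sample(n), where n is an integer between 1 and 100.
_HINT_RE = re.compile(r"^\s*convert_limit_to_sample\(\s*(\d+)\s*\)\s*$")


def parse_sample_percent(hint: str) -> int:
    """Return the sample percentage from the hint, or raise ValueError."""
    match = _HINT_RE.match(hint)
    if match is None:
        # Covers a missing parameter as well as any other hint text.
        raise ValueError(f"not a convert_limit_to_sample(n) hint: {hint!r}")
    percent = int(match.group(1))
    if not 1 <= percent <= 100:
        raise ValueError(f"sample percentage must be in 1..100, got {percent}")
    return percent


def files_to_scan(pruned_files, percent, seed=0):
    """Pick about `percent` percent of the files left after partition pruning.

    Assumption for illustration: sampling is by file count, rounded up,
    so at least one file is chosen whenever any file survives pruning.
    """
    if not pruned_files:
        return []
    count = -(-len(pruned_files) * percent // 100)  # integer ceiling
    return random.Random(seed).sample(list(pruned_files), count)


if __name__ == "__main__":
    percent = parse_sample_percent("convert_limit_to_sample(5)")
    files = [f"part-{i:03d}" for i in range(200)]
    print(percent, len(files_to_scan(files, percent)))
```

In this toy model, scan ranges would be computed only for the chosen
files, and the LIMIT would then be applied to the rows that survive
the filter on that sample.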

Testing:
 - Added an alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints and added new tests against the views. Modified a few
   existing tests to reference the new table variant.
 - Added an end-to-end test.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Reviewed-on: http://gerrit.cloudera.org:8080/16792
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-10 07:15:36 +00:00