impala/testdata/datasets/functional/schema_constraints.csv
Janaki Lahorani aacd5c35d3 IMPALA-6533: Add min-max filter for decimal types on kudu tables.
The code mirrors the code written for the other min-max filters.  Decimal data
can be stored using 4, 8 or 16 bytes, and the code handles each of these three
storage sizes.  The column definition states the precision, and the precision
determines the storage size.

The minimum and maximum values are stored in a union.  The precision of the
column is passed in as an input.  The storage size is derived from the
precision, and the union member matching that size is used.
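
The precision-to-size mapping described above can be sketched roughly as
follows.  This is an illustration only, not the code from this change: the
function name decimal_storage_bytes is hypothetical, and the thresholds of 9
and 18 digits are an assumption that is consistent with the column list in
this message (precision 5 and 9 in 4 bytes, 14 and 18 in 8 bytes, 28 and 38 in
16 bytes).

```python
def decimal_storage_bytes(precision: int) -> int:
    """Return the assumed storage width in bytes for DECIMAL(precision, scale).

    Hypothetical helper: thresholds are an assumption, chosen to match the
    4/8/16-byte grouping of the test columns.
    """
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 4   # value fits in a 32-bit integer
    if precision <= 18:
        return 8   # value fits in a 64-bit integer
    return 16      # value needs a 128-bit integer


# The scale does not affect the storage size, only the precision does.
for p in (5, 9, 14, 18, 28, 38):
    print(p, decimal_storage_bytes(p))
```

A filter that keeps its minimum and maximum in a union would use this size to
decide which member to read and write.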

The code in min-max-filter* follows the general convention of the file, hence
uses macros.

The test includes 24 decimal columns (as listed below) with the following joins:
1.  Inner Join with broadcast (2 tables)
  1a. 1 predicate
  1b. 4 predicates - all result in a decimal min-max filter
  1c. 4 predicates - 3 result in a decimal min-max filter; 1 doesn't
2.  Inner Join with Shuffle (3 tables)
3.  Right outer join (2 tables)
4.  Left Semi join (2 tables)
5.  Right Semi join (2 tables)

Decimal Columns:
4 bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)

The test aggregates the count of probe rows.  A probe row count lower than the
total number of rows in the probe-side table shows that the min-max filter is
exercised.  The count of probe rows is considered to be deterministic.
However, a change in the way Kudu partitions data could alter the probe row
count, in which case the test will have to be updated.
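
Why a reduced probe row count indicates that the filter is at work can be
shown with a small sketch.  This is not the Impala implementation: the class
MinMaxFilter and its methods are hypothetical, and the values are made up for
illustration.  The build side records its minimum and maximum, and only probe
rows inside that range survive.

```python
from decimal import Decimal


class MinMaxFilter:
    """Hypothetical min-max filter over Decimal values (illustration only)."""

    def __init__(self):
        self.min = None
        self.max = None

    def insert(self, value):
        """Widen the [min, max] range to include a build-side value."""
        if value is None:
            return  # NULLs never match an equi-join predicate
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value

    def may_contain(self, value):
        """Return False only when the value cannot match any build row."""
        if value is None or self.min is None:
            return False
        return self.min <= value <= self.max


# Made-up build-side and probe-side join keys.
build = [Decimal("10.5"), Decimal("12.0"), Decimal("11.25")]
probe = [Decimal("1.0"), Decimal("10.5"), Decimal("11.0"),
         Decimal("99.9"), Decimal("12.0")]

flt = MinMaxFilter()
for key in build:
    flt.insert(key)

surviving = [key for key in probe if flt.may_contain(key)]
print(len(surviving), "of", len(probe), "probe rows pass the filter")
```

Note that 11.0 passes even though it has no match on the build side.  The
filter only rules out values outside the range, so the join itself still has
to evaluate the predicate.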

impala_test_suite.py and test_result_verifier.py are enhanced to support saving
of aggregation using update_results.

Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-10 03:32:25 +00:00

# Table level constraints:
# Allows for defining constraints on which file formats to generate for an individual
# table. The table name should match the base table name defined in the schema template
# file.
table_name:stringids, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasecolumnfamilies, constraint:restrict_to, table_format:hbase/none/none
table_name:insertalltypesagg, constraint:restrict_to, table_format:hbase/none/none
table_name:alltypessmallbinary, constraint:restrict_to, table_format:hbase/none/none
table_name:insertalltypesaggbinary, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasealltypeserror, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasealltypeserrornonulls, constraint:restrict_to, table_format:hbase/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none
table_name:stringpartitionkey, constraint:restrict_to, table_format:text/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:parquet/none/none
table_name:old_rcfile_table, constraint:restrict_to, table_format:rc/none/none
table_name:bad_text_lzo, constraint:restrict_to, table_format:text/lzo/block
table_name:bad_text_gzip, constraint:restrict_to, table_format:text/gzip/block
table_name:bad_seq_snap, constraint:restrict_to, table_format:seq/snap/block
table_name:bad_avro_snap_strings, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_avro_snap_floats, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_avro_decimal_schema, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_parquet, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_parquet_strings_negative_len, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_parquet_strings_out_of_bounds, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_magic_number, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_metadata_len, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_dict_page_offset, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_compressed_size, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesagg_hive_13_1, constraint:restrict_to, table_format:parquet/none/none
table_name:kite_required_fields, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_column_metadata, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_multiblock, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_sixblocks, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_multiblock_one_row_group, constraint:restrict_to, table_format:parquet/none/none
table_name:customer_multiblock, constraint:restrict_to, table_format:parquet/none/none
# TODO: Support Avro. Data loading currently fails for Avro because complex types
# cannot be converted to the corresponding Avro types yet.
table_name:allcomplextypes, constraint:restrict_to, table_format:text/none/none
table_name:allcomplextypes, constraint:restrict_to, table_format:parquet/none/none
table_name:allcomplextypes, constraint:restrict_to, table_format:hbase/none/none
table_name:functional, constraint:restrict_to, table_format:text/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:text/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:avro/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:rc/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:seq/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:orc/def/block
table_name:complextypes_multifileformat, constraint:restrict_to, table_format:text/none/none
# TODO: Avro
table_name:complextypestbl, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypestbl_medium, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypeserror, constraint:exclude, table_format:parquet/none/none
table_name:alltypeserrornonulls, constraint:exclude, table_format:parquet/none/none
table_name:unsupported_types, constraint:exclude, table_format:parquet/none/none
table_name:escapechartesttable, constraint:exclude, table_format:parquet/none/none
table_name:TblWithRaggedColumns, constraint:exclude, table_format:parquet/none/none
# the text_ tables are for testing test delimiters and escape chars in text files
table_name:text_comma_backslash_newline, constraint:restrict_to, table_format:text/none/none
table_name:text_dollar_hash_pipe, constraint:restrict_to, table_format:text/none/none
table_name:text_thorn_ecirc_newline, constraint:restrict_to, table_format:text/none/none
table_name:bad_serde, constraint:restrict_to, table_format:text/none/none
table_name:rcfile_lazy_binary_serde, constraint:restrict_to, table_format:rc/none/none
table_name:unsupported_partition_types, constraint:restrict_to, table_format:text/none/none
table_name:nullformat_custom, constraint:exclude, table_format:parquet/none/none
table_name:alltypes_view, constraint:restrict_to, table_format:text/none/none
table_name:allcomplextypes_view, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view, constraint:restrict_to, table_format:seq/snap/block
table_name:alltypes_hive_view, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view_sub, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view_sub, constraint:restrict_to, table_format:seq/snap/block
table_name:alltypes_parens, constraint:restrict_to, table_format:text/none/none
table_name:complex_view, constraint:restrict_to, table_format:text/none/none
table_name:complex_view, constraint:restrict_to, table_format:seq/snap/block
table_name:view_view, constraint:restrict_to, table_format:text/none/none
table_name:view_view, constraint:restrict_to, table_format:seq/snap/block
table_name:subquery_view, constraint:restrict_to, table_format:seq/snap/block
table_name:subquery_view, constraint:restrict_to, table_format:rc/none/none
# liketbl, tblwithraggedcolumns and manynulls all have
# NULLs in primary key columns. hbase does not support
# writing NULLs to primary key columns.
table_name:liketbl, constraint:exclude, table_format:hbase/none/none
table_name:manynulls, constraint:exclude, table_format:hbase/none/none
table_name:tblwithraggedcolumns, constraint:exclude, table_format:hbase/none/none
# Tables with only one column are not supported in hbase.
table_name:greptiny, constraint:exclude, table_format:hbase/none/none
table_name:tinyinttable, constraint:exclude, table_format:hbase/none/none
# overflow uses a manually constructed text file which doesn't make sense to write to
# other table formats since the values that would be written are different (e.g. already
# truncated.)
table_name:overflow, constraint:restrict_to, table_format:text/none/none
# widerow has a single column with a single row containing a 10MB string. hbase doesn't
# seem to like this.
table_name:widerow, constraint:exclude, table_format:hbase/none/none
# nullformat_custom is used in null-insert tests, which use insert overwrite,
# which is not supported in hbase. The schema is also specified in HIVE_CREATE
# with no corresponding LOAD statement.
table_name:nullformat_custom, constraint:exclude, table_format:hbase/none/none
table_name:unsupported_types, constraint:exclude, table_format:hbase/none/none
# Decimal can only be tested on formats Impala can write to (text and parquet).
# TODO: add Avro once Hive or Impala can write Avro decimals
table_name:decimal_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:text/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_tiny, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block
# CHAR is not supported by HBase.
table_name:chars_tiny, constraint:exclude, table_format:hbase/none/none
table_name:chars_medium, constraint:exclude, table_format:hbase/none/none
# invalid_decimal_part_tbl[1,2,3] tables are used for testing invalid decimal
# partition key values (see IMPALA-1040)
table_name:invalid_decimal_part_tbl1, constraint:restrict_to, table_format:text/none/none
table_name:invalid_decimal_part_tbl2, constraint:restrict_to, table_format:text/none/none
table_name:invalid_decimal_part_tbl3, constraint:restrict_to, table_format:text/none/none
table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block
# testescape tables are used for testing text scanner delimiter handling
table_name:table_no_newline, constraint:restrict_to, table_format:text/none/none
table_name:table_no_newline_part, constraint:restrict_to, table_format:text/none/none
table_name:testescape_16_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_16_crlf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_17_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_17_crlf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_32_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_32_crlf, constraint:restrict_to, table_format:text/none/none
# alltimezones is used to verify that impala properly deals with timezones
table_name:alltimezones, constraint:restrict_to, table_format:text/none/none
# Avro schema is inferred from the column definitions (IMPALA-1136)
table_name:no_avro_schema, constraint:restrict_to, table_format:avro/snap/block
table_name:avro_unicode_nulls, constraint:restrict_to, table_format:avro/snap/block
# test single and multi stream bz2 files
table_name:bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:large_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:large_multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
# Kudu can't handle certain types such as timestamp so we pick and choose the tables
# we actually use for Kudu related tests.
table_name:alltypes, constraint:only, table_format:kudu/none/none
table_name:alltypessmall, constraint:only, table_format:kudu/none/none
table_name:alltypestiny, constraint:only, table_format:kudu/none/none
table_name:alltypesagg, constraint:only, table_format:kudu/none/none
table_name:alltypesaggnonulls, constraint:only, table_format:kudu/none/none
table_name:testtbl, constraint:only, table_format:kudu/none/none
table_name:jointbl, constraint:only, table_format:kudu/none/none
table_name:emptytable, constraint:only, table_format:kudu/none/none
table_name:dimtbl, constraint:only, table_format:kudu/none/none
table_name:tinytable, constraint:only, table_format:kudu/none/none
table_name:tinyinttable, constraint:only, table_format:kudu/none/none
table_name:zipcode_incomes, constraint:only, table_format:kudu/none/none
table_name:nulltable, constraint:only, table_format:kudu/none/none
table_name:nullescapedtable, constraint:only, table_format:kudu/none/none
table_name:decimal_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_rtf_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_rtf_tiny_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_tiny, constraint:only, table_format:kudu/none/none
table_name:strings_with_quotes, constraint:only, table_format:kudu/none/none
table_name:manynulls, constraint:only, table_format:kudu/none/none
# Skipping header lines is only effective with text tables
table_name:table_with_header, constraint:restrict_to, table_format:text/none/none
table_name:table_with_header_2, constraint:restrict_to, table_format:text/none/none
table_name:table_with_header_insert, constraint:restrict_to, table_format:text/none/none
# We also test that skipping header lines works on compressed tables (IMPALA-5287)
table_name:table_with_header, constraint:restrict_to, table_format:text/gzip/block
table_name:table_with_header_2, constraint:restrict_to, table_format:text/gzip/block
table_name:table_with_header_insert, constraint:restrict_to, table_format:text/gzip/block
# Inserting into parquet tables should not be affected by the 'skip.header.line.count'
# property, so we test parquet format as well.
table_name:table_with_header_insert, constraint:restrict_to, table_format:parquet/none/none
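
The line format used in this file (comma-separated key:value pairs, with #
comments) could be read with a few lines of Python.  The following is a
minimal sketch, not the loader used by the Impala data-loading scripts:
parse_constraints is a hypothetical name, and the sketch only groups the
table formats by table name and constraint type without interpreting what
restrict_to, exclude or only mean.

```python
from collections import defaultdict


def parse_constraints(lines):
    """Group table formats by (table_name, constraint).

    Hypothetical helper for illustration. Blank lines and lines starting
    with '#' are skipped; every other line is expected to look like
    'table_name:X, constraint:Y, table_format:A/B/C'.
    """
    constraints = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split 'key:value' pairs; split on the first ':' only.
        fields = dict(part.strip().split(":", 1) for part in line.split(","))
        key = (fields["table_name"], fields["constraint"])
        constraints[key].append(fields["table_format"])
    return dict(constraints)


sample = [
    "# Table level constraints:",
    "table_name:decimal_tbl, constraint:restrict_to, table_format:text/none/none",
    "table_name:decimal_tbl, constraint:restrict_to, table_format:kudu/none/none",
    "table_name:widerow, constraint:exclude, table_format:hbase/none/none",
]
parsed = parse_constraints(sample)
for (table, constraint), formats in parsed.items():
    print(table, constraint, formats)
```

Keeping the formats as a list per key preserves duplicate entries, which do
occur in this file.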