mirror of
https://github.com/apache/impala.git
synced 2026-01-28 18:00:14 -05:00
The code mimics the code written for other min-max filters. Decimal data can be stored using 4, 8, or 16 bytes, and the code handles each of these three storage configurations. The column definition states the precision, and the precision determines the storage size. The minimum and maximum values are stored in a union. The precision from the column comes in as an input; the size is derived from the precision, and the appropriate variable is used depending on the size. The code in min-max-filter* follows the general convention of the file and hence uses macros.

The test includes 24 decimal columns (listed below) with the following joins:
1. Inner join with broadcast (2 tables)
   1a. 1 predicate
   1b. 4 predicates - all result in a decimal min-max filter
   1c. 4 predicates - 3 result in a decimal min-max filter; 1 doesn't
2. Inner join with shuffle (3 tables)
3. Right outer join (2 tables)
4. Left semi join (2 tables)
5. Right semi join (2 tables)

Decimal columns:
4 bytes: (5,0), (5,1), (5,3), (5,5), (9,0), (9,1), (9,5), (9,9)
8 bytes: (14,0), (14,1), (14,7), (14,14), (18,0), (18,1), (18,9), (18,18)
16 bytes: (28,0), (28,1), (28,14), (28,28), (38,0), (38,1), (38,19), (38,38)

The test aggregates the count of probe rows. This shows that the min-max filter is exercised, because the number of probe rows is less than the total number of rows in the probe-side table. The count of probe rows is considered to be deterministic. However, a change in Kudu that alters the way data is partitioned could change the probe row count, in which case the test will have to be updated.

impala_test_suite.py and test_result_verifier.py are enhanced to support saving of aggregation using update_results.

Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
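
The precision-to-size rule described in the commit message can be sketched as follows. This is an illustrative Python sketch, not the C++ code from the change: the function name `decimal_storage_bytes` and the class `DecimalMinMax` are hypothetical, and the thresholds of 9 and 18 digits are an assumption that is consistent with the column list above (precisions 5 and 9 in 4 bytes, 14 and 18 in 8 bytes, 28 and 38 in 16 bytes).

```python
def decimal_storage_bytes(precision: int) -> int:
    """Return the assumed storage width in bytes for a DECIMAL precision."""
    if not 1 <= precision <= 38:
        raise ValueError(f"unsupported decimal precision: {precision}")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16


class DecimalMinMax:
    """Hypothetical min-max tracker over unscaled decimal values (ints)."""

    def __init__(self, precision: int):
        # The width is chosen once from the column precision, mirroring the
        # idea of picking one union member per storage size.
        self.width = decimal_storage_bytes(precision)
        self.min = None
        self.max = None

    def insert(self, unscaled: int) -> None:
        # Widen the [min, max] range to cover the new build-side value.
        self.min = unscaled if self.min is None else min(self.min, unscaled)
        self.max = unscaled if self.max is None else max(self.max, unscaled)

    def may_contain(self, unscaled: int) -> bool:
        # An empty filter rejects every probe value.
        if self.min is None:
            return False
        return self.min <= unscaled <= self.max
```

A probe row whose value falls outside `[min, max]` can be skipped, which is why the probe row count in the test is lower than the total row count of the probe-side table.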
15 KiB
| 1 | # Table level constraints: |
|---|---|
| 2 | # Allows for defining constraints on which file formats to generate for an individual |
| 3 | # table. The table name should match the base table name defined in the schema template |
| 4 | # file. |
| 5 | table_name:stringids, constraint:restrict_to, table_format:hbase/none/none |
| 6 | table_name:hbasecolumnfamilies, constraint:restrict_to, table_format:hbase/none/none |
| 7 | table_name:insertalltypesagg, constraint:restrict_to, table_format:hbase/none/none |
| 8 | table_name:alltypessmallbinary, constraint:restrict_to, table_format:hbase/none/none |
| 9 | table_name:insertalltypesaggbinary, constraint:restrict_to, table_format:hbase/none/none |
| 10 | table_name:hbasealltypeserror, constraint:restrict_to, table_format:hbase/none/none |
| 11 | table_name:hbasealltypeserrornonulls, constraint:restrict_to, table_format:hbase/none/none |
| 12 | table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none |
| 13 | table_name:stringpartitionkey, constraint:restrict_to, table_format:text/none/none |
| 14 | table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none |
| 15 | table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none |
| 16 | table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none |
| 17 | table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none |
| 18 | table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none |
| 19 | table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none |
| 20 | table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none |
| 21 | table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none |
| 22 | table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none |
| 23 | table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none |
| 24 | table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none |
| 25 | table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none |
| 26 | table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none |
| 27 | table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:parquet/none/none |
| 28 | table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:parquet/none/none |
| 29 | table_name:insert_string_partitioned, constraint:restrict_to, table_format:parquet/none/none |
| 30 | table_name:old_rcfile_table, constraint:restrict_to, table_format:rc/none/none |
| 31 | table_name:bad_text_lzo, constraint:restrict_to, table_format:text/lzo/block |
| 32 | table_name:bad_text_gzip, constraint:restrict_to, table_format:text/gzip/block |
| 33 | table_name:bad_seq_snap, constraint:restrict_to, table_format:seq/snap/block |
| 34 | table_name:bad_avro_snap_strings, constraint:restrict_to, table_format:avro/snap/block |
| 35 | table_name:bad_avro_snap_floats, constraint:restrict_to, table_format:avro/snap/block |
| 36 | table_name:bad_avro_decimal_schema, constraint:restrict_to, table_format:avro/snap/block |
| 37 | table_name:bad_parquet, constraint:restrict_to, table_format:parquet/none/none |
| 38 | table_name:bad_parquet_strings_negative_len, constraint:restrict_to, table_format:parquet/none/none |
| 39 | table_name:bad_parquet_strings_out_of_bounds, constraint:restrict_to, table_format:parquet/none/none |
| 40 | table_name:bad_magic_number, constraint:restrict_to, table_format:parquet/none/none |
| 41 | table_name:bad_metadata_len, constraint:restrict_to, table_format:parquet/none/none |
| 42 | table_name:bad_dict_page_offset, constraint:restrict_to, table_format:parquet/none/none |
| 43 | table_name:bad_compressed_size, constraint:restrict_to, table_format:parquet/none/none |
| 44 | table_name:alltypesagg_hive_13_1, constraint:restrict_to, table_format:parquet/none/none |
| 45 | table_name:kite_required_fields, constraint:restrict_to, table_format:parquet/none/none |
| 46 | table_name:bad_column_metadata, constraint:restrict_to, table_format:parquet/none/none |
| 47 | table_name:lineitem_multiblock, constraint:restrict_to, table_format:parquet/none/none |
| 48 | table_name:lineitem_sixblocks, constraint:restrict_to, table_format:parquet/none/none |
| 49 | table_name:lineitem_multiblock_one_row_group, constraint:restrict_to, table_format:parquet/none/none |
| 50 | table_name:customer_multiblock, constraint:restrict_to, table_format:parquet/none/none |
| 51 | # TODO: Support Avro. Data loading currently fails for Avro because complex types |
| 52 | # cannot be converted to the corresponding Avro types yet. |
| 53 | table_name:allcomplextypes, constraint:restrict_to, table_format:text/none/none |
| 54 | table_name:allcomplextypes, constraint:restrict_to, table_format:parquet/none/none |
| 55 | table_name:allcomplextypes, constraint:restrict_to, table_format:hbase/none/none |
| 56 | table_name:functional, constraint:restrict_to, table_format:text/none/none |
| 57 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:text/none/none |
| 58 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:parquet/none/none |
| 59 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:avro/snap/block |
| 60 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:rc/snap/block |
| 61 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:seq/snap/block |
| 62 | table_name:complextypes_fileformat, constraint:restrict_to, table_format:orc/def/block |
| 63 | table_name:complextypes_multifileformat, constraint:restrict_to, table_format:text/none/none |
| 64 | # TODO: Avro |
| 65 | table_name:complextypestbl, constraint:restrict_to, table_format:parquet/none/none |
| 66 | table_name:complextypestbl_medium, constraint:restrict_to, table_format:parquet/none/none |
| 67 | table_name:alltypeserror, constraint:exclude, table_format:parquet/none/none |
| 68 | table_name:alltypeserrornonulls, constraint:exclude, table_format:parquet/none/none |
| 69 | table_name:unsupported_types, constraint:exclude, table_format:parquet/none/none |
| 70 | table_name:escapechartesttable, constraint:exclude, table_format:parquet/none/none |
| 71 | table_name:TblWithRaggedColumns, constraint:exclude, table_format:parquet/none/none |
| 72 | # the text_ tables are for testing text delimiters and escape chars in text files |
| 73 | table_name:text_comma_backslash_newline, constraint:restrict_to, table_format:text/none/none |
| 74 | table_name:text_dollar_hash_pipe, constraint:restrict_to, table_format:text/none/none |
| 75 | table_name:text_thorn_ecirc_newline, constraint:restrict_to, table_format:text/none/none |
| 76 | table_name:bad_serde, constraint:restrict_to, table_format:text/none/none |
| 77 | table_name:rcfile_lazy_binary_serde, constraint:restrict_to, table_format:rc/none/none |
| 78 | table_name:unsupported_partition_types, constraint:restrict_to, table_format:text/none/none |
| 79 | table_name:nullformat_custom, constraint:exclude, table_format:parquet/none/none |
| 80 | table_name:alltypes_view, constraint:restrict_to, table_format:text/none/none |
| 81 | table_name:allcomplextypes_view, constraint:restrict_to, table_format:text/none/none |
| 82 | table_name:alltypes_view, constraint:restrict_to, table_format:seq/snap/block |
| 83 | table_name:alltypes_hive_view, constraint:restrict_to, table_format:text/none/none |
| 84 | table_name:alltypes_view_sub, constraint:restrict_to, table_format:text/none/none |
| 85 | table_name:alltypes_view_sub, constraint:restrict_to, table_format:seq/snap/block |
| 86 | table_name:alltypes_parens, constraint:restrict_to, table_format:text/none/none |
| 87 | table_name:complex_view, constraint:restrict_to, table_format:text/none/none |
| 88 | table_name:complex_view, constraint:restrict_to, table_format:seq/snap/block |
| 89 | table_name:view_view, constraint:restrict_to, table_format:text/none/none |
| 90 | table_name:view_view, constraint:restrict_to, table_format:seq/snap/block |
| 91 | table_name:subquery_view, constraint:restrict_to, table_format:seq/snap/block |
| 92 | table_name:subquery_view, constraint:restrict_to, table_format:rc/none/none |
| 93 | # liketbl, tblwithraggedcolumns and manynulls all have |
| 94 | # NULLs in primary key columns. hbase does not support |
| 95 | # writing NULLs to primary key columns. |
| 96 | table_name:liketbl, constraint:exclude, table_format:hbase/none/none |
| 97 | table_name:manynulls, constraint:exclude, table_format:hbase/none/none |
| 98 | table_name:tblwithraggedcolumns, constraint:exclude, table_format:hbase/none/none |
| 99 | # Tables with only one column are not supported in hbase. |
| 100 | table_name:greptiny, constraint:exclude, table_format:hbase/none/none |
| 101 | table_name:tinyinttable, constraint:exclude, table_format:hbase/none/none |
| 102 | # overflow uses a manually constructed text file which doesn't make sense to write to |
| 103 | # other table formats since the values that would be written are different (e.g. already |
| 104 | # truncated.) |
| 105 | table_name:overflow, constraint:restrict_to, table_format:text/none/none |
| 106 | # widerow has a single column with a single row containing a 10MB string. hbase doesn't |
| 107 | # seem to like this. |
| 108 | table_name:widerow, constraint:exclude, table_format:hbase/none/none |
| 109 | # nullformat_custom is used in null-insert tests, which use insert overwrite, |
| 110 | # which is not supported in hbase. The schema is also specified in HIVE_CREATE |
| 111 | # with no corresponding LOAD statement. |
| 112 | table_name:nullformat_custom, constraint:exclude, table_format:hbase/none/none |
| 113 | table_name:unsupported_types, constraint:exclude, table_format:hbase/none/none |
| 114 | # Decimal can only be tested on formats Impala can write to (text and parquet). |
| 115 | # TODO: add Avro once Hive or Impala can write Avro decimals |
| 116 | table_name:decimal_tbl, constraint:restrict_to, table_format:text/none/none |
| 117 | table_name:decimal_tiny, constraint:restrict_to, table_format:text/none/none |
| 118 | table_name:decimal_tbl, constraint:restrict_to, table_format:parquet/none/none |
| 119 | table_name:decimal_tiny, constraint:restrict_to, table_format:parquet/none/none |
| 120 | table_name:decimal_tbl, constraint:restrict_to, table_format:kudu/none/none |
| 121 | table_name:decimal_tiny, constraint:restrict_to, table_format:kudu/none/none |
| 122 | table_name:decimal_tbl, constraint:restrict_to, table_format:orc/def/block |
| 123 | table_name:decimal_tiny, constraint:restrict_to, table_format:orc/def/block |
| 124 | table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:text/none/none |
| 125 | table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:parquet/none/none |
| 126 | table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:kudu/none/none |
| 127 | table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:orc/def/block |
| 128 | table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:text/none/none |
| 129 | table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:parquet/none/none |
| 130 | table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:kudu/none/none |
| 131 | table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:orc/def/block |
| 132 | table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block |
| 133 | # CHAR is not supported by HBase. |
| 134 | table_name:chars_tiny, constraint:exclude, table_format:hbase/none/none |
| 135 | table_name:chars_medium, constraint:exclude, table_format:hbase/none/none |
| 136 | # invalid_decimal_part_tbl[1,2,3] tables are used for testing invalid decimal |
| 137 | # partition key values (see IMPALA-1040) |
| 138 | table_name:invalid_decimal_part_tbl1, constraint:restrict_to, table_format:text/none/none |
| 139 | table_name:invalid_decimal_part_tbl2, constraint:restrict_to, table_format:text/none/none |
| 140 | table_name:invalid_decimal_part_tbl3, constraint:restrict_to, table_format:text/none/none |
| 141 | table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block |
| 142 | # testescape tables are used for testing text scanner delimiter handling |
| 143 | table_name:table_no_newline, constraint:restrict_to, table_format:text/none/none |
| 144 | table_name:table_no_newline_part, constraint:restrict_to, table_format:text/none/none |
| 145 | table_name:testescape_16_lf, constraint:restrict_to, table_format:text/none/none |
| 146 | table_name:testescape_16_crlf, constraint:restrict_to, table_format:text/none/none |
| 147 | table_name:testescape_17_lf, constraint:restrict_to, table_format:text/none/none |
| 148 | table_name:testescape_17_crlf, constraint:restrict_to, table_format:text/none/none |
| 149 | table_name:testescape_32_lf, constraint:restrict_to, table_format:text/none/none |
| 150 | table_name:testescape_32_crlf, constraint:restrict_to, table_format:text/none/none |
| 151 | # alltimezones is used to verify that impala properly deals with timezones |
| 152 | table_name:alltimezones, constraint:restrict_to, table_format:text/none/none |
| 153 | # Avro schema is inferred from the column definitions (IMPALA-1136) |
| 154 | table_name:no_avro_schema, constraint:restrict_to, table_format:avro/snap/block |
| 155 | table_name:avro_unicode_nulls, constraint:restrict_to, table_format:avro/snap/block |
| 156 | # test single and multi stream bz2 files |
| 157 | table_name:bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block |
| 158 | table_name:large_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block |
| 159 | table_name:multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block |
| 160 | table_name:large_multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block |
| 161 | # Kudu can't handle certain types such as timestamp so we pick and choose the tables |
| 162 | # we actually use for Kudu related tests. |
| 163 | table_name:alltypes, constraint:only, table_format:kudu/none/none |
| 164 | table_name:alltypessmall, constraint:only, table_format:kudu/none/none |
| 165 | table_name:alltypestiny, constraint:only, table_format:kudu/none/none |
| 166 | table_name:alltypesagg, constraint:only, table_format:kudu/none/none |
| 167 | table_name:alltypesaggnonulls, constraint:only, table_format:kudu/none/none |
| 168 | table_name:testtbl, constraint:only, table_format:kudu/none/none |
| 169 | table_name:jointbl, constraint:only, table_format:kudu/none/none |
| 170 | table_name:emptytable, constraint:only, table_format:kudu/none/none |
| 171 | table_name:dimtbl, constraint:only, table_format:kudu/none/none |
| 172 | table_name:tinytable, constraint:only, table_format:kudu/none/none |
| 173 | table_name:tinyinttable, constraint:only, table_format:kudu/none/none |
| 174 | table_name:zipcode_incomes, constraint:only, table_format:kudu/none/none |
| 175 | table_name:nulltable, constraint:only, table_format:kudu/none/none |
| 176 | table_name:nullescapedtable, constraint:only, table_format:kudu/none/none |
| 177 | table_name:decimal_tbl, constraint:only, table_format:kudu/none/none |
| 178 | table_name:decimal_rtf_tbl, constraint:only, table_format:kudu/none/none |
| 179 | table_name:decimal_rtf_tiny_tbl, constraint:only, table_format:kudu/none/none |
| 180 | table_name:decimal_tiny, constraint:only, table_format:kudu/none/none |
| 181 | table_name:strings_with_quotes, constraint:only, table_format:kudu/none/none |
| 182 | table_name:manynulls, constraint:only, table_format:kudu/none/none |
| 183 | # Skipping header lines is only effective with text tables |
| 184 | table_name:table_with_header, constraint:restrict_to, table_format:text/none/none |
| 185 | table_name:table_with_header_2, constraint:restrict_to, table_format:text/none/none |
| 186 | table_name:table_with_header_insert, constraint:restrict_to, table_format:text/none/none |
| 187 | # We also test that skipping header lines works on compressed tables (IMPALA-5287) |
| 188 | table_name:table_with_header, constraint:restrict_to, table_format:text/gzip/block |
| 189 | table_name:table_with_header_2, constraint:restrict_to, table_format:text/gzip/block |
| 190 | table_name:table_with_header_insert, constraint:restrict_to, table_format:text/gzip/block |
| 191 | # Inserting into parquet tables should not be affected by the 'skip.header.line.count' |
| 192 | # property, so we test parquet format as well. |
| 193 | table_name:table_with_header_insert, constraint:restrict_to, table_format:parquet/none/none |
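
Each non-comment line in the file above has the shape `table_name:<name>, constraint:<kind>, table_format:<format>/<codec>/<compression type>`. A minimal sketch of reading such lines is shown below. The helper names `parse_constraint_line` and `group_constraints` are hypothetical and are not taken from Impala's data-loading scripts; lowercasing the table name is an assumption made here because the file mixes `TblWithRaggedColumns` and `tblwithraggedcolumns`.

```python
from collections import defaultdict


def parse_constraint_line(line):
    """Parse one constraint line into a dict, or return None for comments/blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = {}
    for part in line.split(","):
        # Each part looks like "key:value"; split on the first colon only.
        key, _, value = part.strip().partition(":")
        fields[key] = value
    return fields


def group_constraints(lines):
    """Group table formats by constraint kind, then by lowercased table name."""
    grouped = defaultdict(lambda: defaultdict(set))
    for line in lines:
        entry = parse_constraint_line(line)
        if entry is None:
            continue
        table = entry["table_name"].lower()  # assumption: names are case-insensitive
        grouped[entry["constraint"]][table].add(entry["table_format"])
    return grouped


sample = [
    "# Decimal can only be tested on formats Impala can write to (text and parquet).",
    "table_name:decimal_tbl, constraint:restrict_to, table_format:text/none/none",
    "table_name:decimal_tbl, constraint:restrict_to, table_format:kudu/none/none",
    "table_name:decimal_tbl, constraint:only, table_format:kudu/none/none",
    "table_name:widerow, constraint:exclude, table_format:hbase/none/none",
]
constraints = group_constraints(sample)
print(sorted(constraints["restrict_to"]["decimal_tbl"]))
```

Storing formats in a set means the repeated entries in the file (for example the second block of `alltypesinsert` lines) collapse into one entry per table and format.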