impala/testdata/datasets/functional/schema_constraints.csv
stiga-huang 9686545bfd IMPALA-6503: Support reading complex types from ORC
We've supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).

In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in
io buffer read from DiskIoMgr). The ORC lib can materialize ORC column
binaries into its representation (orc::ColumnVectorBatch). Then we
transform values in orc::ColumnVectorBatch into impala::Tuples in
hdfs-orc-scanner. We don't need to handle decoding or decompression
ourselves, since the ORC lib takes care of both. Fortunately, the ORC lib
already supports complex types, so we can leverage it for them as well.

IMPALA-6503 needs to add two things:
1. Specify which nested columns we need in the form required by the ORC
  lib (get the list of ORC type ids from tuple descriptors)
2. Transform the outputs of the ORC lib (nested orc::ColumnVectorBatch)
  into Impala's representation (Slots/Tuples/RowBatches)
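
The first step can be illustrated with a small sketch. ORC numbers the
nodes of a file's schema tree in pre-order, with the root struct as id 0,
so selecting nested columns amounts to walking field paths and collecting
the ids found there. The Python below is a toy model under stated
assumptions, not Impala's C++ code: OrcType, assign_ids, select_ids and the
dotted-path format are hypothetical names, and the real scanner derives
its selection from tuple descriptors rather than from strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OrcType:
    """Toy stand-in for one node of an ORC schema tree (hypothetical)."""
    kind: str  # "struct", "array", "map", or a primitive type name
    children: List[Tuple[str, "OrcType"]] = field(default_factory=list)
    type_id: int = -1


def assign_ids(root: OrcType) -> None:
    """Number every node in pre-order, starting at 0 for the root."""
    next_id = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.type_id = next_id
        next_id += 1
        # Push children reversed so the first child is visited first.
        for _, child in reversed(node.children):
            stack.append(child)


def select_ids(root: OrcType, paths: List[str]) -> List[int]:
    """Return the sorted type ids of the nodes named by dotted paths."""
    ids = set()
    for path in paths:
        node = root
        for name in path.split("."):
            node = dict(node.children)[name]
        ids.add(node.type_id)
    return sorted(ids)


# struct<id:bigint, int_array:array<int>, nested:struct<a:int, b:string>>
# The child name "item" for the array element is an assumption of this sketch.
schema = OrcType("struct", [
    ("id", OrcType("bigint")),
    ("int_array", OrcType("array", [("item", OrcType("int"))])),
    ("nested", OrcType("struct", [("a", OrcType("int")),
                                  ("b", OrcType("string"))])),
])
assign_ids(schema)
selected = select_ids(schema, ["id", "int_array.item", "nested.b"])
```

With this schema the pre-order numbering is root 0, id 1, int_array 2,
its element 3, nested 4, a 5 and b 6, so the three paths resolve to ids
1, 3 and 6.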

To structure the materialization, we implement several ORC column readers
in hdfs-orc-scanner. Each kind of reader handles one column type and
transforms the ORC lib's output into tuple/slot values.
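
The reader-per-type idea can be sketched as a small tree of readers that
mirrors the column tree. This is a simplified Python illustration, not the
scanner's actual classes: PrimitiveReader, ListReader and StructReader are
hypothetical names, plain lists stand in for orc::ColumnVectorBatch, and
rows are returned as dicts and lists rather than written into Impala
tuples and slots. It assumes an array batch exposes offsets into its
element batch and that every batch carries a per-row not-null flag.

```python
from typing import Any, Dict, List, Optional


class PrimitiveReader:
    """Reads one primitive column from a flat list of values."""

    def __init__(self, values: List[Any], not_null: List[bool]):
        self.values = values
        self.not_null = not_null

    def read(self, row: int) -> Optional[Any]:
        return self.values[row] if self.not_null[row] else None


class ListReader:
    """Reads an array column; offsets[i]..offsets[i+1] index the child."""

    def __init__(self, offsets: List[int], child: Any, not_null: List[bool]):
        self.offsets = offsets
        self.child = child
        self.not_null = not_null

    def read(self, row: int) -> Optional[List[Any]]:
        if not self.not_null[row]:
            return None
        start, end = self.offsets[row], self.offsets[row + 1]
        return [self.child.read(i) for i in range(start, end)]


class StructReader:
    """Reads a struct column by delegating each field to its own reader."""

    def __init__(self, fields: Dict[str, Any], not_null: List[bool]):
        self.fields = fields
        self.not_null = not_null

    def read(self, row: int) -> Optional[Dict[str, Any]]:
        if not self.not_null[row]:
            return None
        return {name: reader.read(row) for name, reader in self.fields.items()}


# Three rows of struct<id:bigint, int_array:array<int>>.
ids = PrimitiveReader([1, 2, 3], [True, True, True])
items = PrimitiveReader([10, 20, 30], [True, False, True])
arrays = ListReader([0, 2, 2, 3], items, [True, True, True])
top = StructReader({"id": ids, "int_array": arrays}, [True, True, True])
rows = [top.read(i) for i in range(3)]
```

Here the second array element is NULL and the second row's array is empty,
so rows holds {id: 1, int_array: [10, None]}, {id: 2, int_array: []} and
{id: 3, int_array: [30]}.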

Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.
* Run exhaustive tests in DEBUG and RELEASE builds.

Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Reviewed-on: http://gerrit.cloudera.org:8080/12168
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-08 04:39:08 +00:00

15 KiB

# Table level constraints:
# Allows for defining constraints on which file formats to generate for an individual
# table. The table name should match the base table name defined in the schema template
# file.
table_name:stringids, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasecolumnfamilies, constraint:restrict_to, table_format:hbase/none/none
table_name:insertalltypesagg, constraint:restrict_to, table_format:hbase/none/none
table_name:alltypessmallbinary, constraint:restrict_to, table_format:hbase/none/none
table_name:insertalltypesaggbinary, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasealltypeserror, constraint:restrict_to, table_format:hbase/none/none
table_name:hbasealltypeserrornonulls, constraint:restrict_to, table_format:hbase/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none
table_name:stringpartitionkey, constraint:restrict_to, table_format:text/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:text/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:text/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:text/none/none
table_name:alltypesinsert, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesnopart_insert, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_overwrite_nopart, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_overwrite_partitioned, constraint:restrict_to, table_format:parquet/none/none
table_name:insert_string_partitioned, constraint:restrict_to, table_format:parquet/none/none
table_name:old_rcfile_table, constraint:restrict_to, table_format:rc/none/none
table_name:bad_text_lzo, constraint:restrict_to, table_format:text/lzo/block
table_name:bad_text_gzip, constraint:restrict_to, table_format:text/gzip/block
table_name:bad_seq_snap, constraint:restrict_to, table_format:seq/snap/block
table_name:bad_avro_snap_strings, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_avro_snap_floats, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_avro_decimal_schema, constraint:restrict_to, table_format:avro/snap/block
table_name:bad_parquet, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_parquet_strings_negative_len, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_parquet_strings_out_of_bounds, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_magic_number, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_metadata_len, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_dict_page_offset, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_compressed_size, constraint:restrict_to, table_format:parquet/none/none
table_name:alltypesagg_hive_13_1, constraint:restrict_to, table_format:parquet/none/none
table_name:kite_required_fields, constraint:restrict_to, table_format:parquet/none/none
table_name:bad_column_metadata, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_multiblock, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_sixblocks, constraint:restrict_to, table_format:parquet/none/none
table_name:lineitem_multiblock_one_row_group, constraint:restrict_to, table_format:parquet/none/none
table_name:customer_multiblock, constraint:restrict_to, table_format:parquet/none/none
# TODO: Support Avro. Data loading currently fails for Avro because complex types
# cannot be converted to the corresponding Avro types yet.
table_name:allcomplextypes, constraint:restrict_to, table_format:text/none/none
table_name:allcomplextypes, constraint:restrict_to, table_format:parquet/none/none
table_name:allcomplextypes, constraint:restrict_to, table_format:hbase/none/none
table_name:functional, constraint:restrict_to, table_format:text/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:text/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypes_fileformat, constraint:restrict_to, table_format:avro/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:rc/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:seq/snap/block
table_name:complextypes_fileformat, constraint:restrict_to, table_format:orc/def/block
table_name:complextypes_multifileformat, constraint:restrict_to, table_format:text/none/none
# TODO: Avro
table_name:complextypestbl, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypestbl, constraint:restrict_to, table_format:orc/def/block
table_name:complextypestbl_medium, constraint:restrict_to, table_format:parquet/none/none
table_name:complextypestbl_medium, constraint:restrict_to, table_format:orc/def/block
table_name:alltypeserror, constraint:exclude, table_format:parquet/none/none
table_name:alltypeserrornonulls, constraint:exclude, table_format:parquet/none/none
table_name:unsupported_types, constraint:exclude, table_format:parquet/none/none
table_name:escapechartesttable, constraint:exclude, table_format:parquet/none/none
table_name:TblWithRaggedColumns, constraint:exclude, table_format:parquet/none/none
# the text_ tables are for testing text delimiters and escape chars in text files
table_name:text_comma_backslash_newline, constraint:restrict_to, table_format:text/none/none
table_name:text_dollar_hash_pipe, constraint:restrict_to, table_format:text/none/none
table_name:text_thorn_ecirc_newline, constraint:restrict_to, table_format:text/none/none
table_name:bad_serde, constraint:restrict_to, table_format:text/none/none
table_name:rcfile_lazy_binary_serde, constraint:restrict_to, table_format:rc/none/none
table_name:unsupported_partition_types, constraint:restrict_to, table_format:text/none/none
table_name:nullformat_custom, constraint:exclude, table_format:parquet/none/none
table_name:alltypes_view, constraint:restrict_to, table_format:text/none/none
table_name:allcomplextypes_view, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view, constraint:restrict_to, table_format:seq/snap/block
table_name:alltypes_hive_view, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view_sub, constraint:restrict_to, table_format:text/none/none
table_name:alltypes_view_sub, constraint:restrict_to, table_format:seq/snap/block
table_name:alltypes_parens, constraint:restrict_to, table_format:text/none/none
table_name:complex_view, constraint:restrict_to, table_format:text/none/none
table_name:complex_view, constraint:restrict_to, table_format:seq/snap/block
table_name:view_view, constraint:restrict_to, table_format:text/none/none
table_name:view_view, constraint:restrict_to, table_format:seq/snap/block
table_name:subquery_view, constraint:restrict_to, table_format:seq/snap/block
table_name:subquery_view, constraint:restrict_to, table_format:rc/none/none
# liketbl, tblwithraggedcolumns and manynulls all have
# NULLs in primary key columns. hbase does not support
# writing NULLs to primary key columns.
table_name:liketbl, constraint:exclude, table_format:hbase/none/none
table_name:manynulls, constraint:exclude, table_format:hbase/none/none
table_name:tblwithraggedcolumns, constraint:exclude, table_format:hbase/none/none
# Tables with only one column are not supported in hbase.
table_name:greptiny, constraint:exclude, table_format:hbase/none/none
table_name:tinyinttable, constraint:exclude, table_format:hbase/none/none
# overflow uses a manually constructed text file which doesn't make sense to write to
# other table formats since the values that would be written are different (e.g. already
# truncated.)
table_name:overflow, constraint:restrict_to, table_format:text/none/none
# widerow has a single column with a single row containing a 10MB string. hbase doesn't
# seem to like this.
table_name:widerow, constraint:exclude, table_format:hbase/none/none
# nullformat_custom is used in null-insert tests, which use insert overwrite,
# which is not supported in hbase. The schema is also specified in HIVE_CREATE
# with no corresponding LOAD statement.
table_name:nullformat_custom, constraint:exclude, table_format:hbase/none/none
table_name:unsupported_types, constraint:exclude, table_format:hbase/none/none
# Decimal can only be tested on formats Impala can write to (text and parquet).
# TODO: add Avro once Hive or Impala can write Avro decimals
table_name:decimal_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:text/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_tiny, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_tiny, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_rtf_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:text/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:parquet/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:kudu/none/none
table_name:decimal_rtf_tiny_tbl, constraint:restrict_to, table_format:orc/def/block
table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block
# CHAR is not supported by HBase.
table_name:chars_tiny, constraint:exclude, table_format:hbase/none/none
table_name:chars_medium, constraint:exclude, table_format:hbase/none/none
# invalid_decimal_part_tbl[1,2,3] tables are used for testing invalid decimal
# partition key values (see IMPALA-1040)
table_name:invalid_decimal_part_tbl1, constraint:restrict_to, table_format:text/none/none
table_name:invalid_decimal_part_tbl2, constraint:restrict_to, table_format:text/none/none
table_name:invalid_decimal_part_tbl3, constraint:restrict_to, table_format:text/none/none
table_name:avro_decimal_tbl, constraint:restrict_to, table_format:avro/snap/block
# testescape tables are used for testing text scanner delimiter handling
table_name:table_no_newline, constraint:restrict_to, table_format:text/none/none
table_name:table_no_newline_part, constraint:restrict_to, table_format:text/none/none
table_name:testescape_16_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_16_crlf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_17_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_17_crlf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_32_lf, constraint:restrict_to, table_format:text/none/none
table_name:testescape_32_crlf, constraint:restrict_to, table_format:text/none/none
# alltimezones is used to verify that impala properly deals with timezones
table_name:alltimezones, constraint:restrict_to, table_format:text/none/none
# Avro schema is inferred from the column definitions (IMPALA-1136)
table_name:no_avro_schema, constraint:restrict_to, table_format:avro/snap/block
table_name:avro_unicode_nulls, constraint:restrict_to, table_format:avro/snap/block
# test single and multi stream bz2 files
table_name:bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:large_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
table_name:large_multistream_bzip2_tbl, constraint:restrict_to, table_format:text/bzip/block
# Kudu can't handle certain types such as timestamp so we pick and choose the tables
# we actually use for Kudu related tests.
table_name:alltypes, constraint:only, table_format:kudu/none/none
table_name:alltypessmall, constraint:only, table_format:kudu/none/none
table_name:alltypestiny, constraint:only, table_format:kudu/none/none
table_name:alltypesagg, constraint:only, table_format:kudu/none/none
table_name:alltypesaggnonulls, constraint:only, table_format:kudu/none/none
table_name:testtbl, constraint:only, table_format:kudu/none/none
table_name:jointbl, constraint:only, table_format:kudu/none/none
table_name:emptytable, constraint:only, table_format:kudu/none/none
table_name:dimtbl, constraint:only, table_format:kudu/none/none
table_name:tinytable, constraint:only, table_format:kudu/none/none
table_name:tinyinttable, constraint:only, table_format:kudu/none/none
table_name:zipcode_incomes, constraint:only, table_format:kudu/none/none
table_name:nulltable, constraint:only, table_format:kudu/none/none
table_name:nullrows, constraint:only, table_format:kudu/none/none
table_name:nullescapedtable, constraint:only, table_format:kudu/none/none
table_name:decimal_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_rtf_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_rtf_tiny_tbl, constraint:only, table_format:kudu/none/none
table_name:decimal_tiny, constraint:only, table_format:kudu/none/none
table_name:strings_with_quotes, constraint:only, table_format:kudu/none/none
table_name:manynulls, constraint:only, table_format:kudu/none/none
# Skipping header lines is only effective with text tables
table_name:table_with_header, constraint:restrict_to, table_format:text/none/none
table_name:table_with_header_2, constraint:restrict_to, table_format:text/none/none
table_name:table_with_header_insert, constraint:restrict_to, table_format:text/none/none
# We also test that skipping header lines works on compressed tables (IMPALA-5287)
table_name:table_with_header, constraint:restrict_to, table_format:text/gzip/block
table_name:table_with_header_2, constraint:restrict_to, table_format:text/gzip/block
table_name:table_with_header_insert, constraint:restrict_to, table_format:text/gzip/block
# Inserting into parquet tables should not be affected by the 'skip.header.line.count'
# property, so we test parquet format as well.
table_name:table_with_header_insert, constraint:restrict_to, table_format:parquet/none/none
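
For readers who want to inspect the constraints programmatically, the
following is a minimal Python sketch of reading lines in the
`table_name:<name>, constraint:<type>, table_format:<format>` form. It is
not the loader Impala's data-generation scripts use: parse_constraints is
a hypothetical helper, and it only groups entries by constraint type and
table without interpreting what restrict_to, exclude or only mean.

```python
from collections import defaultdict


def parse_constraints(lines):
    """Group table formats by constraint type, then by table name."""
    constraints = defaultdict(lambda: defaultdict(list))
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        # Each comma-separated part is "key:value"; split on the first colon.
        fields = dict(part.strip().split(":", 1) for part in line.split(","))
        by_table = constraints[fields["constraint"]]
        by_table[fields["table_name"]].append(fields["table_format"])
    return constraints


sample = [
    "# Table level constraints:",
    "table_name:complextypestbl, constraint:restrict_to, table_format:parquet/none/none",
    "table_name:complextypestbl, constraint:restrict_to, table_format:orc/def/block",
    "table_name:widerow, constraint:exclude, table_format:hbase/none/none",
    "table_name:alltypes, constraint:only, table_format:kudu/none/none",
]
parsed = parse_constraints(sample)
```

In this sample, parsed["restrict_to"]["complextypestbl"] holds both
parquet/none/none and orc/def/block, in file order.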