IMPALA-12019 implemented support for collections of fixed length types
in the sorting tuple. This change implements it for collections of
variable length types.
Note that an existing limitation is still in place: structs that contain
any type of collection are not allowed in the sorting tuple (see
IMPALA-12160).
Note that sorting by complex types was not allowed before and is still
not allowed; this change only allows them to be present in the select
list when sorting by some other expression.
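As a rough illustration of the rule above (complex types may be carried in the select list but may not be sort keys), here is a toy Python model. It is not Impala's implementation: the function `sort_rows`, the dict-based row representation, and the use of Python lists and dicts to stand in for collections and structs are all hypothetical. The IMPALA-12160 limitation on structs that contain collections is not modeled.

```python
from typing import Any, Sequence

# Hypothetical toy model, NOT Impala code: a row is a dict of column values,
# and Python lists/dicts stand in for collection/struct (complex) values.
Row = dict[str, Any]


def sort_rows(rows: Sequence[Row], sort_keys: Sequence[str]) -> list[Row]:
    """Sort rows by scalar columns; complex values may only ride along."""
    for key in sort_keys:
        if any(isinstance(row[key], (list, dict)) for row in rows):
            # Mirrors the restriction that complex types cannot be sort keys.
            raise ValueError(f"cannot sort by complex-typed column '{key}'")
    # Each row is moved as a whole, so variable-length payload such as a
    # list of strings is carried along unchanged with its sort key.
    return sorted(rows, key=lambda row: tuple(row[k] for k in sort_keys))
```

For example, sorting `[{"id": 2, "tags": ["b", "bb"]}, {"id": 1, "tags": ["a"]}]` by `["id"]` returns the rows in `id` order with their string arrays intact, while sorting the same rows by `["tags"]` raises `ValueError`.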
This change also allows collections of variable length types to be
non-passthrough children of UNION ALL nodes.
Testing:
- Renamed the 'simple_arrays_big' table to 'arrays_big' and extended it
with collections containing variable length types. This table is
mainly used to test that spilling works during sorting.
- Renamed test_sort.py::TestArraySort::{test_simple_arrays,
test_simple_arrays_with_limit} to
{test_array_sort,test_array_sort_with_limit}
- Extended the tests run in test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n} with collections containing
var-len types.
- Added tests in sort-complex.test that assert that it is not allowed
to sort by collections. For structs we already have such tests in
struct-in-select-list.test.
Change-Id: Ic15b29393f260b572e11a8dbb9deeb8c02981852
Reviewed-on: http://gerrit.cloudera.org:8080/20108
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The two Parquet files (nullable.parq and nonnullable_orc.parq) were
generated as described in testdata/data/schemas/nested/README. The two ORC
files (nullable.orc and nonnullable.orc) were generated by orc-tools, which
can convert JSON files into ORC format. However, nullable.json and
nonnullable.json must be modified to meet the input format orc-tools
requires: the whole file must not be an array; instead, it must consist of
one JSON object per row, joined by '\n'. Assume the modified JSON files are
nullable_orc.json and nonnullable_orc.json. The ORC files can then be
regenerated by running the following commands in the current directory:

    wget "https://search.maven.org/remotecontent?filepath=org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar" \
      -O orc-tools-1.5.4-uber.jar

    java -jar orc-tools-1.5.4-uber.jar convert \
      -s "struct<id:bigint,int_array:array<int>,int_array_Array:array<array<int>>,int_map:map<string,int>,int_Map_Array:array<map<string,int>>,nested_struct:struct<A:int,b:array<int>,C:struct<d:array<array<struct<E:int,F:string>>>>,g:map<string,struct<H:struct<i:array<double>>>>>>" \
      -o nullable.orc \
      nullable_orc.json

    java -jar orc-tools-1.5.4-uber.jar convert \
      -s "struct<ID:bigint,Int_Array:array<int>,int_array_array:array<array<int>>,Int_Map:map<string,int>,int_map_array:array<map<string,int>>,nested_Struct:struct<a:int,B:array<int>,c:struct<D:array<array<struct<e:int,f:string>>>>,G:map<string,struct<h:struct<i:array<double>>>>>>" \
      -o nonnullable.orc \
      nonnullable_orc.json
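A minimal Python sketch of the JSON rewrite described above, assuming nullable.json and nonnullable.json each hold a single top-level JSON array of row objects. The helper name `json_array_to_ndjson` is hypothetical, and the sketch only performs the array-to-one-object-per-line conversion; any other edits the files might need for orc-tools are not covered.

```python
import json


def json_array_to_ndjson(src_path: str, dst_path: str) -> int:
    """Rewrite a file holding one top-level JSON array as one object per line.

    Hypothetical helper; returns the number of rows written.
    """
    with open(src_path, encoding="utf-8") as src:
        rows = json.load(src)
    if not isinstance(rows, list):
        raise ValueError(f"{src_path}: expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for row in rows:
            # json.dumps without indent emits the whole row on a single line,
            # so rows end up separated by '\n' as orc-tools expects.
            dst.write(json.dumps(row))
            dst.write("\n")
    return len(rows)
```

For example, `json_array_to_ndjson("nullable.json", "nullable_orc.json")` would produce the input file assumed by the first convert command. Values are re-serialized by Python's json module, so number formatting may change (e.g. `1e2` becomes `100.0`) even though the values themselves are preserved.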