As a first stage of IMPALA-10939, this change adds support for including
top-level collections in the sorting tuple, provided they contain only
fixed-length types (including fixed-length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
  queried in a new test added in test_sort.py, to ensure that we handle
  spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
  test_queries.py::TestQueries::{test_sort,test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The two Parquet files (nullable.parq and nonnullable_orc.parq) were generated as
testdata/data/schemas/nested/README stated.

The two ORC files (nullable.orc and nonnullable.orc) were generated by
orc-tools, which can convert JSON files into ORC format. However, nullable.json
and nonnullable.json must first be modified to meet the format it requires: the
whole file should not be an array. It should consist of the JSON objects of each
row joined by '\n'. Assume the modified JSON files are nullable_orc.json and
nonnullable_orc.json. The ORC files can be regenerated by running the following
commands in the current directory:

  wget "https://search.maven.org/remotecontent?filepath=org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar" \
    -O orc-tools-1.5.4-uber.jar
  java -jar orc-tools-1.5.4-uber.jar convert \
    -s "struct<id:bigint,int_array:array<int>,int_array_Array:array<array<int>>,int_map:map<string,int>,int_Map_Array:array<map<string,int>>,nested_struct:struct<A:int,b:array<int>,C:struct<d:array<array<struct<E:int,F:string>>>>,g:map<string,struct<H:struct<i:array<double>>>>>>" \
    -o nullable.orc \
    nullable_orc.json
  java -jar orc-tools-1.5.4-uber.jar convert \
    -s "struct<ID:bigint,Int_Array:array<int>,int_array_array:array<array<int>>,Int_Map:map<string,int>,int_map_array:array<map<string,int>>,nested_Struct:struct<a:int,B:array<int>,c:struct<D:array<array<struct<e:int,f:string>>>>,G:map<string,struct<h:struct<i:array<double>>>>>>" \
    -o nonnullable.orc \
    nonnullable_orc.json
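
The JSON rewrite described above (a top-level array turned into one object per
row, joined by '\n') is not spelled out in this README. The Python sketch below
shows one possible way to do it. It assumes the input file is a single JSON
array of row objects; the helper names array_to_json_lines and convert_file are
hypothetical and are not part of the Impala tree or of orc-tools.

```python
import json


def array_to_json_lines(text):
    """Turn a JSON document whose top level is an array into
    one compact JSON value per line, joined by '\n'."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without an indent argument never emits a raw newline,
    # so every row is guaranteed to stay on a single line.
    return "\n".join(json.dumps(row) for row in rows)


def convert_file(src_path, dst_path):
    """Read an array-style JSON file and write the line-per-row form."""
    with open(src_path) as src:
        converted = array_to_json_lines(src.read())
    with open(dst_path, "w") as dst:
        dst.write(converted)
```

Under these assumptions, calling convert_file("nullable.json", "nullable_orc.json")
and convert_file("nonnullable.json", "nonnullable_orc.json") would produce the
two input files that the convert commands expect.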