IMPALA-12373 introduces small string optimisation, after which not all
strings will have a var-len part.
IMPALA-12159 adds support for ORDER BY for collections of variable
length types in the select list, but the test tables it uses only/mostly
contain short strings.
This patch has two modifications:
1. It introduces longer strings in 'collection_tbl' and
'collection_struct_mix'. It also adds two more rows to the existing one
in 'collection_tbl' so that it can be used in sorting tests. These
tables are only used by complex types tests, so the impact is limited.
2. It modifies RandomNestedDataGenerator.java, so that now it takes a
parameter for string length. Some variable names are changed to clearer
names. The references to and uses of RandomNestedDataGenerator are
updated.
Change-Id: Ief770d6bc9258fce159a733d5afa34fe594b96f8
Reviewed-on: http://gerrit.cloudera.org:8080/20718
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-12019 implemented support for collections of fixed length types
in the sorting tuple. This was made possible by implementing the
materialisation of these collections.
Building on this, this change allows such collections as non-passthrough
children of UNION ALL operations. Note that plain UNIONs are not
supported for any collections for other reasons and this patch does not
affect them or any other set operation.
Testing:
Tests in nested-array-in-select-list.test and
nested-map-in-select-list.test check that
- the newly allowed cases work correctly and
- the correct error message is given for collections of variable length
types.
Change-Id: I14c13323d587e5eb8a2617ecaab831c059a0fae3
Reviewed-on: http://gerrit.cloudera.org:8080/19903
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As a first stage of IMPALA-10939, this change implements support for
including in the sorting tuple top-level collections that only contain
fixed length types (including fixed length structs). For these types the
implementation is almost the same as the existing handling of strings.
Another limitation is that structs that contain any type of collection
are not yet allowed in the sorting tuple.
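The restrictions above can be sketched as a small type check. This is only an illustration: the `Type` class, the `FIXED_LEN_SCALARS` set and the function names are hypothetical and do not correspond to Impala's actual classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Type:
    """Hypothetical type model, for illustration only."""
    kind: str  # e.g. "int", "string", "struct", "array", "map"
    children: List["Type"] = field(default_factory=list)


# Assumed set of fixed length scalar kinds for this sketch.
FIXED_LEN_SCALARS = {"int", "bigint", "double", "boolean", "timestamp"}


def is_fixed_len(t: Type) -> bool:
    """Scalars in the set are fixed length; a struct is if all fields are."""
    if t.kind in FIXED_LEN_SCALARS:
        return True
    if t.kind == "struct":
        return all(is_fixed_len(c) for c in t.children)
    return False  # strings, arrays and maps are variable length


def contains_collection(t: Type) -> bool:
    """True if t is, or transitively contains, an array or a map."""
    return t.kind in ("array", "map") or any(
        contains_collection(c) for c in t.children)


def allowed_in_sorting_tuple(t: Type) -> bool:
    """Apply the first-stage restrictions described in this commit."""
    if t.kind in ("array", "map"):
        # Collections may only contain fixed length types.
        return all(is_fixed_len(c) for c in t.children)
    if t.kind == "struct":
        # Structs containing any collection are not yet allowed.
        return not contains_collection(t)
    return True  # other types are outside the scope of this sketch
```

Under these assumptions `array<int>` and `map<int, struct<int, double>>` pass the check, while `array<string>`, `array<array<int>>` and `struct<array<int>>` do not.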
Also refactored the RawValue::Write*() functions to have a clearer
interface.
Testing:
- Added a new test table that contains many rows with arrays. This is
queried in a new test added in test_sort.py, to ensure that we handle
spilling correctly.
- Added tests that have arrays and/or maps in the sorting tuple in
test_queries.py::TestQueries::{test_sort,
test_top_n,test_partitioned_top_n}.
Change-Id: Ic7974ef392c1412e8c60231e3420367bd189677a
Reviewed-on: http://gerrit.cloudera.org:8080/19660
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
NULL values are printed as "NULL" if they are top level or in
collections, but as "null" in structs. We should print collections and
structs in JSON form, so it should be "null" in collections, too. Hive
also follows the latter (correct) approach.
This commit changes the printing of NULL values to "null" in
collections.
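The new behaviour can be illustrated with a minimal sketch. The function `print_value` is hypothetical and only models the NULL spelling rule for integers and arrays; it is not the actual printing code.

```python
def print_value(value, top_level=True):
    """Render a value as text; the NULL spelling depends on nesting."""
    if value is None:
        # Top-level NULLs keep the SQL spelling, nested ones use JSON's.
        return "NULL" if top_level else "null"
    if isinstance(value, list):
        items = ",".join(print_value(v, top_level=False) for v in value)
        return "[" + items + "]"
    return str(value)
```

With this sketch a top-level `None` is rendered as `NULL`, while `[1, None, 3]` is rendered as `[1,null,3]`.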
Testing:
- Modified the tests to expect "null" instead of "NULL" in collections.
Change-Id: Ie5e7f98df4014ea417ddf73ac0fb8ec01ef655ba
Reviewed-on: http://gerrit.cloudera.org:8080/19236
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Non-matching rows from the left side will null out all slots from the
right side in left outer joins. If the right side is a subquery, it
is possible that some returned expressions will be non-NULL even if all
slots are NULL (e.g. constants) - these expressions are wrapped as
IF(TupleIsNull(tids), NULL, expr) to null them in the non-matching
case.
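The effect of the wrapping can be simulated in a few lines. This is a toy model under stated assumptions: rows are dictionaries, a non-matching right side is represented by `None`, and all function names are hypothetical.

```python
def left_outer_join(left, right, key):
    """Yield (left_row, right_row); right_row is None when nothing matches."""
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            for r_row in matches:
                yield l_row, r_row
        else:
            yield l_row, None


def constant_expr(right_row):
    """Models a constant in the subquery's select list, e.g. SELECT 42."""
    return 42


def wrap_with_tuple_is_null(expr):
    """Models IF(TupleIsNull(tids), NULL, expr)."""
    return lambda right_row: None if right_row is None else expr(right_row)


left = [{"id": 1}, {"id": 2}]
right = [{"id": 1}]
wrapped = wrap_with_tuple_is_null(constant_expr)

unwrapped_result = [constant_expr(r) for _, r in left_outer_join(left, right, "id")]
wrapped_result = [wrapped(r) for _, r in left_outer_join(left, right, "id")]
```

Without the wrapper the constant is returned even for the non-matching row (`[42, 42]`); with the wrapper the non-matching row yields `None` (`[42, None]`).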
The logic above used to hit a precondition for complex types. We can
safely ignore complex types for now, as currently the only possible
expression that returns a complex type is SlotRef, which doesn't
need to be wrapped. We will have to revisit this once functions are
added that return complex types.
Testing:
- added a regression test and ran it
Change-Id: Iaa8991cd4448d5c7ef7f44f73ee07e2a2b6f37ce
Reviewed-on: http://gerrit.cloudera.org:8080/18954
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adding support for MAP types in the select list.
An example of how maps are printed:
{"k1":2,"k2":null}
Nested collection types (maps and arrays) are supported in any
combination. However, structs in collections and collections in structs
are not supported.
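The printed form above matches compact JSON, which can be reproduced with Python's standard `json` module. This is only an illustration of the output format with string keys; `print_collection` is a hypothetical helper, and the handling of non-string map keys is not modelled here.

```python
import json


def print_collection(value):
    """Serialise a nested dict/list value as compact JSON (no spaces)."""
    return json.dumps(value, separators=(",", ":"))


simple_map = print_collection({"k1": 2, "k2": None})
map_of_arrays = print_collection({"a": [1, 2], "b": [None]})
array_of_maps = print_collection([{"k1": 1}, {"k2": None}])
```

Here `simple_map` is `{"k1":2,"k2":null}`, and the other two values show maps and arrays nested in each other.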
Limitations (other than map support) as described in the commit for
IMPALA-9498 still apply. The following are to be implemented later:
- Unify HS2 / Beeswax logic with the way STRUCTs are handled.
This could be done in a "final" logic that can handle
STRUCTs/ARRAYs nested in each other
- Implement "deep copy" and "deep serialize" for collections in BE.
This would enable all operators, e.g. ORDER BY and UNION.
Testing:
- modified the FE tests that checked that maps were not allowed in the
select list - the tests now expect maps to be allowed there
- added FE and EE tests involving maps based on the array tests
Change-Id: I921c647f1779add36e7f5df4ce6ca237dcfaf001
Reviewed-on: http://gerrit.cloudera.org:8080/18736
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Arrays with more than one dimension in the select list tried to
register a CollectionTableRef with the name "item" for the inner arrays,
leading to a name collision if there was more than one such array.
The logic is changed to always use the full path as implicit alias
in CollectionTableRefs backing arrays in select list.
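The collision and the fix can be sketched with a toy alias registry. The class, the function names and the example paths are hypothetical; they only model the idea of replacing the last path element with the full path as the implicit alias.

```python
class AliasRegistry:
    """Toy registry that rejects duplicate table aliases."""

    def __init__(self):
        self._aliases = {}

    def register(self, alias, path):
        if alias in self._aliases:
            raise ValueError(f"Duplicate table alias: '{alias}'")
        self._aliases[alias] = path


def implicit_alias_old(path):
    """Old behaviour in this sketch: the last path element, e.g. 'item'."""
    return path[-1]


def implicit_alias_new(path):
    """New behaviour in this sketch: the full path."""
    return ".".join(path)


# Inner arrays of two 2D array columns of an assumed table 't'.
inner_paths = [("t", "arr1", "item"), ("t", "arr2", "item")]


def register_all(alias_fn):
    registry = AliasRegistry()
    for path in inner_paths:
        registry.register(alias_fn(path), path)
    return registry
```

Calling `register_all(implicit_alias_old)` raises an error because both inner arrays map to `item`, while `register_all(implicit_alias_new)` succeeds with the aliases `t.arr1.item` and `t.arr2.item`.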
As a side effect this leads to using the fully qualified names
in expressions in the explain plans of queries that use arrays
from views. This is not an intended change, but I don't consider
it to be critical. Created IMPALA-11452 to deal with more
sophisticated alias handling in collections.
Testing:
- added a new table to testdata and a regression test
Change-Id: I6f2b6cad51fa25a6f6932420eccf1b0a964d5e4e
Reviewed-on: http://gerrit.cloudera.org:8080/18734
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The expectation for predicates on unnested arrays is that they are
either picked up by the SCAN node or the UNNEST node for evaluation. If
only one array is being unnested, the SCAN node is responsible for the
evaluation; otherwise the UNNEST node is. However, if
there is a JOIN node involved where the JOIN construction happens
before creating the UNNEST node then the JOIN node incorrectly picks
up the predicates for the unnested arrays as well. This patch fixes
this behaviour.
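The intended assignment rule can be summarised in a short sketch. The `Predicate` class and both functions are hypothetical and greatly simplified; they only model which node is expected to evaluate a predicate, not the planner's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    sql: str
    on_unnested_array: bool  # True if it references an unnested array


def expected_evaluator(num_unnested_arrays):
    """Node expected to evaluate predicates on unnested arrays."""
    return "SCAN" if num_unnested_arrays == 1 else "UNNEST"


def join_conjuncts(unassigned):
    """Predicates a JOIN may pick up: all except those on unnested arrays."""
    return [p for p in unassigned if not p.on_unnested_array]


unassigned = [
    Predicate("t1.id = t2.id", on_unnested_array=False),
    Predicate("arr1.item > 5", on_unnested_array=True),
]
picked_by_join = join_conjuncts(unassigned)
```

In this model the JOIN only picks up `t1.id = t2.id`, leaving `arr1.item > 5` for the SCAN or UNNEST node.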
Tests:
- Added E2E tests to cover result correctness.
- Added planner tests to verify that the desired node picks up the
predicates for unnested arrays.
Change-Id: I89fed4eef220ca513b259f0e2649cdfbe43c797a
Reviewed-on: http://gerrit.cloudera.org:8080/18614
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Until now ARRAYs had to be unnested in queries. This patch adds
support to return ARRAYs as STRINGs (JSON arrays) in select list,
for example:
select id, int_array from functional_parquet.complextypestbl where id = 1;
returns: 1, [1,2,3]
Returning ARRAYs from inline or HMS views is also supported -
these arrays can be used both in the select list and as relative
table references. Using them as non-relative table references is
not supported (IMPALA-11052).
Though STRUCTs are already supported, ARRAYs and STRUCTs nested in
each other are not supported yet.
Things intentionally postponed for later commits:
- Add MAP support too - this shouldn't be too tricky after
ARRAY support, but I don't want to make this patch even more
complex.
- Unify HS2 / Beeswax logic with the way STRUCTs are handled.
This could be done in a "final" logic that can handle
STRUCTs/ARRAYs nested in each other
- Implement "deep copy" and "deep serialize" for ARRAYs in BE.
This would enable all operators, e.g. ORDER BY and UNION.
Testing:
- FE tests were added for analysis and authorization
- EE tests were added
- core tests were run
Change-Id: Ibb1e42ffb21c7ddc033aba0f754b0108e46f34d0
Reviewed-on: http://gerrit.cloudera.org:8080/17811
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>