impala/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
Csaba Ringhofer 7ca11dfc7f IMPALA-9482: Support for BINARY columns
This patch adds support for BINARY columns for all table formats with
the exception of Kudu.

In Hive the main difference between STRING and BINARY is that STRING is
assumed to be UTF8 encoded, while BINARY can be any byte array.
Some other differences in Hive:
- BINARY can only be cast from/to STRING.
- Only a small subset of built-in STRING functions support BINARY.
- In several file formats (e.g. text) BINARY is base64 encoded.
- No NDV is calculated during COMPUTE STATISTICS.
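
To illustrate the base64 point above: text-based formats cannot hold
arbitrary bytes, because a value may contain delimiters or byte
sequences that are not valid UTF-8. The Python sketch below shows the
round trip in isolation. The helper names are hypothetical and are not
part of Hive or Impala; it only demonstrates the encoding idea, not how
either system implements it.

```python
import base64


def encode_binary_for_text(value: bytes) -> str:
    """Return a printable base64 form of an arbitrary byte array."""
    return base64.b64encode(value).decode("ascii")


def decode_binary_from_text(field: str) -> bytes:
    """Recover the original bytes from a base64 text field."""
    return base64.b64decode(field)


# A value holding a NUL byte, a byte that is invalid as UTF-8, and
# characters that a text format would treat as delimiters.
raw = b"\x00\xff\n\t"
encoded = encode_binary_for_text(raw)

# The encoded form is plain ASCII and decodes back to the same bytes.
assert encoded.isascii()
assert decode_binary_from_text(encoded) == raw
```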

As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly
identical, especially from the backend's perspective. For this reason,
BINARY is implemented a bit differently compared to other types:
while the frontend treats STRING and BINARY as two separate types, most
of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g.
in SlotDesc. Only the following parts of backend need to differentiate
between STRING and BINARY:
- table scanners
- table writers
- HS2/Beeswax service
These parts have access to column metadata, which allows adding special
handling for BINARY.

Only a few builtins are allowed for BINARY at the moment:
- length
- min/max/count
- coalesce and similar "selector" functions
Other STRING functions can only be used by casting to STRING first.
Adding support for more of these functions is easy: the BINARY type
only has to be "connected" to the existing STRING function's
signature. Functions whose result depends on utf8_mode need to ensure
that with BINARY they always work as if utf8_mode=0 (for example,
length() is mapped to bytes(), as length() counts UTF-8 characters if
utf8_mode=1).
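
The difference between counting characters and counting bytes can be
shown outside Impala. The following is a minimal Python sketch, assuming
a UTF-8 encoded input; the function names are hypothetical and only
mirror the behaviour described above, they are not Impala APIs.

```python
def char_length(value: bytes) -> int:
    """Count UTF-8 characters, analogous to length() with utf8_mode=1.

    Python raises UnicodeDecodeError on malformed input; this sketch
    does not model how Impala handles such values.
    """
    return len(value.decode("utf-8"))


def byte_length(value: bytes) -> int:
    """Count raw bytes, analogous to bytes() and to length() on BINARY."""
    return len(value)


# 'é' occupies two bytes in UTF-8, so the two counts diverge.
value = "héllo".encode("utf-8")
print(char_length(value), byte_length(value))
```

Since a BINARY value is not guaranteed to be valid UTF-8, only the byte
count is well defined for it, which is consistent with mapping length()
to bytes().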

All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY,
though for legacy Hive UDFs it is only supported if the argument and
return types are set explicitly, to ensure backward compatibility.
See IMPALA-11340 for details.

The original plan was to behave as close to Hive as possible, but I
realized that Hive has more relaxed casting rules than Impala, which
led to STRING<->BINARY casts being necessary in more cases in Impala.
This was needed to disallow passing a BINARY to functions that expect
a STRING argument. An example of the difference is that in
INSERT ... VALUES () string literals need to be explicitly cast to
BINARY, while this is not needed in Hive.

Testing:
- Added functional.binary_tbl for all file formats (except Kudu)
  to test scanning.
- Removed functional.unsupported_types and related tests, as now
  Impala supports all (non-complex) types that Hive does.
- Added FE/EE tests mainly based on the ones added to the DATE type

Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Reviewed-on: http://gerrit.cloudera.org:8080/16066
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-19 13:55:42 +00:00

====
---- QUERY
select hive_bround(cast(3.14 as double))
---- RESULTS
3.0
---- TYPES
DOUBLE
====
---- QUERY
select hive_bround(cast(3.14 as int))
---- RESULTS
3
---- TYPES
INT
====
---- QUERY
select hive_upper('hello')
---- RESULTS
'HELLO'
---- TYPES
STRING
====
---- QUERY
# Test GenericUDF functions
select generic_identity(true), generic_identity(cast(NULL as boolean));
---- TYPES
boolean, boolean
---- RESULTS
true,NULL
====
---- QUERY
select generic_identity(cast(10 as tinyint)), generic_identity(cast(NULL as tinyint));
---- TYPES
tinyint, tinyint
---- RESULTS
10,NULL
====
---- QUERY
select generic_identity(cast(10 as smallint)), generic_identity(cast(NULL as smallint));
---- TYPES
smallint, smallint
---- RESULTS
10,NULL
====
---- QUERY
select generic_identity(cast(10 as int)), generic_identity(cast(NULL as int));
---- TYPES
int, int
---- RESULTS
10,NULL
====
---- QUERY
select generic_identity(cast(10 as bigint)), generic_identity(cast(NULL as bigint));
---- TYPES
bigint, bigint
---- RESULTS
10,NULL
====
---- QUERY
select generic_identity(cast(10.0 as float)), generic_identity(cast(NULL as float));
---- TYPES
float, float
---- RESULTS
10,NULL
====
---- QUERY
select generic_identity(cast(10.0 as double)), generic_identity(cast(NULL as double));
---- TYPES
double, double
---- RESULTS
10,NULL
====
---- QUERY
# IMPALA-1134. Tests that strings are copied correctly
select length(generic_identity("0123456789")),
length(generic_add("0123456789", "0123456789")),
length(generic_add("0123456789", "0123456789", "0123456789"));
---- TYPES
int, int, int
---- RESULTS
10,20,30
====
---- QUERY
select generic_identity(cast("a" as binary)), generic_identity(cast(NULL as binary));
---- TYPES
binary, binary
---- RESULTS
'a','NULL'
====
---- QUERY
# IMPALA-1392: Hive UDFs that throw exceptions should return NULL
select generic_throws_exception();
---- TYPES
boolean
---- RESULTS
NULL
====
---- QUERY
select generic_throws_exception() from functional.alltypestiny;
---- TYPES
boolean
---- RESULTS
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
====
---- QUERY
select generic_add(cast(1 as int), cast(2 as int));
---- TYPES
int
---- RESULTS
3
====
---- QUERY
select generic_add(generic_add(cast(1 as int), cast(2 as int)), cast(2 as int));
---- TYPES
int
---- RESULTS
5
====
---- QUERY
select generic_add(cast(generic_add(cast(1 as int), cast(2 as int)) - generic_add(cast(2 as int), cast(1 as int)) as int), cast(2 as int));
---- TYPES
int
---- RESULTS
2
====
---- QUERY
select generic_add(cast(1 as smallint), cast(2 as smallint));
---- TYPES
smallint
---- RESULTS
3
====
---- QUERY
select generic_add(cast(3.0 as float), cast(4.0 as float));
---- TYPES
float
---- RESULTS
7.0
====
---- QUERY
select generic_add(cast(1.0 as double), cast(2.0 as double));
---- TYPES
double
---- RESULTS
3.0
====
---- QUERY
select generic_add(cast(1 as boolean), cast(0 as boolean));
---- TYPES
boolean
---- RESULTS
true
====
---- QUERY
select generic_add(cast(1 as boolean), cast(1 as boolean));
---- TYPES
boolean
---- RESULTS
true
====
---- QUERY
# IMPALA-3378: test many Java UDFs being opened and run concurrently
select * from
(select max(int_col) from functional.alltypesagg
where generic_identity(bool_col) union all
(select max(int_col) from functional.alltypesagg
where generic_identity(tinyint_col) > 1 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(smallint_col) > 1 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(int_col) > 1 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(bigint_col) > 1 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(float_col) > 1.0 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(double_col) > 1.0 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(string_col) > '1' union all
(select max(int_col) from functional.alltypesagg
where not generic_identity(bool_col) union all
(select max(int_col) from functional.alltypesagg
where generic_identity(tinyint_col) > 2 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(smallint_col) > 2 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(int_col) > 2 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(bigint_col) > 2 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(float_col) > 2.0 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(double_col) > 2.0 union all
(select max(int_col) from functional.alltypesagg
where generic_identity(string_col) > '2'
)))))))))))))))) v
---- TYPES
INT
---- RESULTS
998
999
999
999
999
999
999
999
999
999
999
999
999
999
999
999
====
---- QUERY
drop table if exists replace_string_input
====
---- QUERY
create table replace_string_input as
values('toast'), ('scone'), ('stuff'), ('sssss'), ('yes'), ('scone'), ('stuff');
====
---- QUERY
# Regression test for IMPALA-4266: memory management bugs with output strings from
# Java UDFs, exposed by using the UDF as a grouping key in an aggregation.
# The UDF replaces "s" with "ss" in the strings.
select distinct generic_replace_string(_c0) as es
from replace_string_input
order by 1;
---- TYPES
string
---- RESULTS
'sscone'
'ssssssssss'
'sstuff'
'toasst'
'yess'
====
---- QUERY
# Regression test for IMPALA-8016; this UDF loads another class in the same jar.
select generic_import_nearby_classes("placeholder");
---- TYPES
string
---- RESULTS
'Hello'
====
---- QUERY
# Java Generic UDFs for DATE are not allowed yet
create function identity(Date) returns Date
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type DATE is not supported for Java UDFs.
====
---- QUERY
# Java Generic UDFs for DECIMAL are not allowed yet
create function identity(decimal(5,0)) returns decimal(5,0)
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type DECIMAL(5,0) is not supported for Java UDFs.
====
---- QUERY
# Java Generic UDFs for TIMESTAMP are not allowed yet
create function identity(Timestamp) returns Timestamp
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type TIMESTAMP is not supported for Java UDFs.
====
---- QUERY
create function identity(ARRAY<STRING>) returns INT
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type 'ARRAY<STRING>' is not supported in UDFs/UDAs.
====
---- QUERY
create function identity(MAP<STRING, STRING>) returns INT
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type 'MAP<STRING,STRING>' is not supported in UDFs/UDAs.
====
---- QUERY
create function identity(STRUCT<employer: STRING>) returns INT
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
AnalysisException: Type 'STRUCT<employer:STRING>' is not supported in UDFs/UDAs.
====
---- QUERY
create function generic_add_fail(smallint, smallint) returns int
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';
---- CATCH
CatalogException: Function expected return type smallint but was created with INT
====
---- QUERY
create function var_args_func(int...) returns int
location '$FILESYSTEM_PREFIX/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestUdf';
---- CATCH
CatalogException: Variable arguments not supported in Hive UDFs.
====