impala/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
Attila Jeges b5805de3e6 IMPALA-7368: Add initial support for DATE type
DATE values describe a particular year/month/day in the form
yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is 0000-01-01 to 9999-12-31.
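A minimal sketch of the yyyy-MM-dd form and the 0000-01-01 to 9999-12-31 range, extended with the STRING/TIMESTAMP casts listed further down in this message. It makes several assumptions:

- A DATE is modelled as a plain (year, month, day) tuple rather than Impala's internal DateValue.
- The function names are hypothetical.
- The accepted grammar (an optional single-space-separated HH:mm:ss with 1 to 9 fractional digits) is a guess at the documented format.
- Invalid input raises ValueError to mirror the "fail with an error" rule.

```python
import re
from datetime import datetime
from typing import Tuple

Date = Tuple[int, int, int]  # (year, month, day); hypothetical stand-in for DateValue

# Assumed grammar: yyyy-MM-dd, optionally " HH:mm:ss" plus 1-9 fractional digits.
_DATE_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"(?: ([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.[0-9]{1,9})?)?"
)


def _is_leap(year: int) -> bool:
    # Proleptic Gregorian rule; also correct for year 0.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def _days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if _is_leap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def cast_string_to_date(s: str) -> Date:
    """STRING -> DATE. Four-digit years keep values within 0000..9999."""
    m = _DATE_RE.fullmatch(s)
    if m is None:
        raise ValueError(f"invalid DATE string: {s!r}")
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    if not (1 <= month <= 12 and 1 <= day <= _days_in_month(year, month)):
        raise ValueError(f"invalid DATE string: {s!r}")
    if m.group(4) is not None:
        hour, minute, second = (int(g) for g in m.group(4, 5, 6))
        if hour > 23 or minute > 59 or second > 59:
            raise ValueError(f"invalid time component: {s!r}")
    # A well-formed time-of-day component is truncated silently.
    return (year, month, day)


def cast_date_to_string(d: Date) -> str:
    """DATE -> STRING, formatted as yyyy-MM-dd."""
    year, month, day = d
    return f"{year:04d}-{month:02d}-{day:02d}"


def cast_timestamp_to_date(ts: datetime) -> Date:
    """TIMESTAMP -> DATE: the time of day is ignored."""
    return (ts.year, ts.month, ts.day)


def cast_date_to_timestamp(d: Date) -> datetime:
    """DATE -> TIMESTAMP at 00:00:00. Python's datetime only covers
    years 1-9999, so this sketch raises ValueError for year 0."""
    return datetime(*d)


print(cast_string_to_date("2019-02-15 13:45:00.5"))  # prints (2019, 2, 15)
print(cast_date_to_string((2019, 2, 15)))  # prints 2019-02-15
```

Month and day validation uses a hand-written leap-year rule instead of Python's `datetime.date`, because `datetime.date` cannot represent year 0.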

This initial DATE type support covers TEXT and HBASE fileformats only.
'DateValue' is used as the internal type to represent DATE values.

The changes are as follows:
- Support for DATE literal syntax.

- Explicit casting between DATE and other types (note that invalid
  casts will fail with an error just like invalid DECIMAL_V2 casts,
  while failed casts to other types do not lead to a warning or error):
    - from STRING to DATE. The string value must be formatted as
      yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory;
      the time component is optional. If the time component is
      present, it will be truncated silently.
    - from DATE to STRING. The resulting string value is formatted as
      yyyy-MM-dd.
    - from TIMESTAMP to DATE. The source timestamp's time of day
      component is ignored.
    - from DATE to TIMESTAMP. The target timestamp's time of day
      component is set to 00:00:00.

- Implicit casting between DATE and other types:
    - from STRING to DATE if the source string value is used in a
      context where a DATE value is expected.
    - from DATE to TIMESTAMP if the source date value is used in a
      context where a TIMESTAMP value is expected.

- Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP
  implicit conversions are now all possible, the existing function
  overload resolution logic is no longer adequate.
  For example, it resolves the
  if(false, '2011-01-01', DATE '1499-02-02') function call to the
  if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded
  function, instead of the if(BOOLEAN, DATE, DATE) version.

  This is clearly wrong, so the function overload resolution logic had
  to be changed to resolve function calls to the best-fit overloaded
  function definition if there are multiple applicable candidates.

  An overloaded function definition is an applicable candidate for a
  function call if each actual parameter in the function call either
  matches the corresponding formal parameter's type (without casting)
  or is implicitly castable to that type.

  When looking for the best-fit applicable candidate, a parameter
  match score (i.e. the number of actual parameters in the function
  call that match their corresponding formal parameter's type without
  casting) is calculated and the applicable candidate with the highest
  parameter match score is chosen.

  There's one more issue that the new resolution logic has to address:
  if two applicable candidates have the same parameter match score and
  the only difference between the two is that the first one requires a
  STRING -> TIMESTAMP implicit cast for some of its parameters while
  the second one requires a STRING -> DATE implicit cast for the same
  parameters, then the first candidate has to be chosen so as not to
  break backward compatibility.
  E.g.: the year('2019-02-15') function call must resolve to
  year(TIMESTAMP) instead of year(DATE). Note that year(DATE) is not
  implemented yet, so this is not an issue at the moment, but it will
  be in the future.
  When the resolution algorithm considers overloaded function
  definitions, it first orders them lexicographically by the types in
  their parameter lists. To ensure backward-compatible behavior, the
  PrimitiveType.DATE enum value has to come after
  PrimitiveType.TIMESTAMP, so that a TIMESTAMP overload is considered
  before the equivalent DATE overload.

- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
  math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.

These items are tightly coupled and it makes sense to implement them
in one change-set.
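The overload resolution rules described above can be sketched in a few lines:

- applicability (every argument matches exactly or is implicitly castable);
- the parameter match score (the count of exact matches);
- the lexicographic ordering that places TIMESTAMP before DATE.

This is an illustrative model under stated assumptions, not Impala's frontend code. The `PrimitiveType` ordinals are hypothetical; only "DATE comes after TIMESTAMP" is taken from the text. The implicit-cast table deliberately contains just the three conversions named above. The overload lists are illustrative.

```python
from enum import IntEnum
from typing import Optional, Sequence, Tuple


class PrimitiveType(IntEnum):
    # Hypothetical ordinals. The only property taken from the change
    # description is that DATE is declared after TIMESTAMP.
    BOOLEAN = 0
    STRING = 1
    TIMESTAMP = 2
    DATE = 3


T = PrimitiveType
Signature = Tuple[PrimitiveType, ...]

# Deliberately incomplete implicit-cast table: only the conversions
# named in the change description are modelled.
IMPLICIT_CASTS = {
    (T.STRING, T.DATE),
    (T.STRING, T.TIMESTAMP),
    (T.DATE, T.TIMESTAMP),
}


def match_score(actuals: Signature, formals: Signature) -> Optional[int]:
    """Number of parameters matching without a cast, or None if the
    candidate is not applicable (arity mismatch or a non-castable type)."""
    if len(actuals) != len(formals):
        return None
    for actual, formal in zip(actuals, formals):
        if actual != formal and (actual, formal) not in IMPLICIT_CASTS:
            return None
    return sum(1 for actual, formal in zip(actuals, formals) if actual == formal)


def resolve(actuals: Signature, candidates: Sequence[Signature]) -> Optional[Signature]:
    """Best-fit resolution: order candidates lexicographically by parameter
    types, then keep the first applicable one with the highest score."""
    ordered = sorted(candidates, key=lambda formals: [int(t) for t in formals])
    best: Optional[Signature] = None
    best_score = -1
    for formals in ordered:
        score = match_score(actuals, formals)
        # Strict '>' lets the earlier candidate win a tie.
        if score is not None and score > best_score:
            best, best_score = formals, score
    return best


def names(sig: Optional[Signature]) -> Optional[Tuple[str, ...]]:
    return None if sig is None else tuple(t.name for t in sig)


# if(false, '2011-01-01', DATE '1499-02-02'): the DATE overload scores 2,
# the TIMESTAMP overload only 1, and no implicit DATE -> STRING cast exists.
IF_OVERLOADS = [
    (T.BOOLEAN, T.TIMESTAMP, T.TIMESTAMP),
    (T.BOOLEAN, T.DATE, T.DATE),
    (T.BOOLEAN, T.STRING, T.STRING),
]
print(names(resolve((T.BOOLEAN, T.STRING, T.DATE), IF_OVERLOADS)))
# prints ('BOOLEAN', 'DATE', 'DATE')

# year('2019-02-15'): both overloads score 0; TIMESTAMP sorts first and wins.
YEAR_OVERLOADS = [(T.DATE,), (T.TIMESTAMP,)]
print(names(resolve((T.STRING,), YEAR_OVERLOADS)))
# prints ('TIMESTAMP',)
```

Sorting before scoring is what makes the enum order matter. In this model, declaring DATE before TIMESTAMP would hand the year('2019-02-15') tie to year(DATE).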

Testing:
- A new partitioned TEXT table 'functional.date_tbl' (and the
  corresponding HBASE table 'functional_hbase.date_tbl') was
  introduced for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
    - since DATE type is supported for TEXT and HBASE fileformats
      only, most DATE tests were implemented separately in
      tests/query_test/test_date_queries.py.

Note that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.

Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Reviewed-on: http://gerrit.cloudera.org:8080/12481
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-23 13:33:57 +00:00
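Several expected counts in the test file below depend on which conjunct Impala actually evaluates. The comment on the first query says the predicate pushed to the data source is not actually used, while the second predicate is evaluated by Impala. The sketch below models that behavior under two assumptions:

1. The first WHERE conjunct is the one pushed to (and ignored by) the data source.
2. The rows follow the layout inferred from the listed results: 5000 rows with tinyint_col = id % 10 and smallint_col = id % 100.

The function names are hypothetical, and NULLs are not modelled. With non-NULL values, `<=>` behaves like `=` and IS DISTINCT FROM like `!=`.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]

# Assumed row layout, inferred from the expected results in the file:
# 5000 rows with tinyint_col = id % 10 and smallint_col = id % 100.
ROWS: List[Row] = [
    {"id": i, "tinyint_col": i % 10, "smallint_col": i % 100} for i in range(5000)
]


def count_star(conjuncts: List[Predicate]) -> int:
    """Hypothetical model of count(*): the first conjunct is pushed to the
    data source, which accepts it but never applies it; Impala evaluates
    only the remaining conjuncts."""
    evaluated_by_impala = conjuncts[1:]
    return sum(1 for row in ROWS if all(p(row) for p in evaluated_by_impala))


def tinyint_is_1(row: Row) -> bool:
    return row["tinyint_col"] == 1


def smallint_is_11(row: Row) -> bool:
    return row["smallint_col"] == 11


print(count_star([tinyint_is_1, smallint_is_11]))  # prints 50
print(count_star([smallint_is_11, tinyint_is_1]))  # prints 500
```

Under these assumptions, swapping the operands of AND changes which filter actually runs. That explains why the paired queries over the same two predicates expect different counts.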


====
---- QUERY
# Gets all types including a row with a NULL value. The predicate pushed to
# the data source is not actually used, but the second predicate is
# evaluated by Impala.
select *
from alltypes_datasource
where float_col != 0 and
int_col >= 1990 limit 5
---- RESULTS
1990,true,0,90,1990,19900,2189,1990,1970-01-01 00:00:01.990000000,'NULL',-999998009,-9999998009,-9999999999.9999998009,-9.9999999999999999999999999999999998009,-99999.98009,1975-06-14
1991,false,1,91,1991,19910,2190.10009765625,1991,1970-01-01 00:00:01.991000000,'1991',999998008,9999998008,9999999999.9999998008,9.9999999999999999999999999999999998008,99999.98008,1975-06-15
1992,true,2,92,1992,19920,2191.199951171875,1992,1970-01-01 00:00:01.992000000,'1992',-999998007,-9999998007,-9999999999.9999998007,-9.9999999999999999999999999999999998007,-99999.98007,1975-06-16
1993,false,3,93,1993,19930,2192.300048828125,1993,1970-01-01 00:00:01.993000000,'1993',999998006,9999998006,9999999999.9999998006,9.9999999999999999999999999999999998006,99999.98006,1975-06-17
1994,true,4,94,1994,19940,2193.39990234375,1994,1970-01-01 00:00:01.994000000,'1994',-999998005,-9999998005,-9999999999.9999998005,-9.9999999999999999999999999999999998005,-99999.98005,1975-06-18
---- DBAPI_RESULTS
1990,true,0,90,1990,19900,2189,1990,1970-01-01 00:00:01.990000,'NULL',-999998009,-9999998009,-9999999999.9999998009,-9.9999999999999999999999999999999998009,-99999.98009,1975-06-14
1991,false,1,91,1991,19910,2190.10009765625,1991,1970-01-01 00:00:01.991000,'1991',999998008,9999998008,9999999999.9999998008,9.9999999999999999999999999999999998008,99999.98008,1975-06-15
1992,true,2,92,1992,19920,2191.199951171875,1992,1970-01-01 00:00:01.992000,'1992',-999998007,-9999998007,-9999999999.9999998007,-9.9999999999999999999999999999999998007,-99999.98007,1975-06-16
1993,false,3,93,1993,19930,2192.300048828125,1993,1970-01-01 00:00:01.993000,'1993',999998006,9999998006,9999999999.9999998006,9.9999999999999999999999999999999998006,99999.98006,1975-06-17
1994,true,4,94,1994,19940,2193.39990234375,1994,1970-01-01 00:00:01.994000,'1994',-999998005,-9999998005,-9999999999.9999998005,-9.9999999999999999999999999999999998005,-99999.98005,1975-06-18
---- TYPES
INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, TIMESTAMP, STRING, DECIMAL, DECIMAL, DECIMAL, DECIMAL, DECIMAL, DATE
====
---- QUERY
# Project a subset of the columns
select bigint_col, timestamp_col, double_col
from alltypes_datasource
where double_col != 0 and int_col >= 1990 limit 3
---- RESULTS
19900,1970-01-01 00:00:01.990000000,1990
19910,1970-01-01 00:00:01.991000000,1991
19920,1970-01-01 00:00:01.992000000,1992
---- DBAPI_RESULTS
19900,1970-01-01 00:00:01.990000,1990
19910,1970-01-01 00:00:01.991000,1991
19920,1970-01-01 00:00:01.992000,1992
---- TYPES
BIGINT, TIMESTAMP, DOUBLE
====
---- QUERY
# count(*) with a predicate evaluated by Impala
select count(*) from alltypes_datasource
where float_col = 0 and
string_col is not NULL
---- RESULTS
4000
---- TYPES
BIGINT
====
---- QUERY
# count(*) with no predicates has no materialized slots
select count(*) from alltypes_datasource
---- RESULTS
5000
---- TYPES
BIGINT
====
---- QUERY
select string_col from alltypes_datasource
where string_col = 'VALIDATE_PREDICATES##id LT 1 && id GT 1 && id LE 1 && id GE 1 && int_col EQ 1 && id NE 1'
and id < 1 and id > 1 and id <= 1 and id >= 1 and int_col = 1 and id != 1
---- RESULTS
'SUCCESS'
---- TYPES
STRING
====
---- QUERY
select string_col from alltypes_datasource
where string_col = 'VALIDATE_PREDICATES##id LT 1 && id GT 1 && id LE 1 && id GE 1 && int_col EQ 1 && id NE 1'
and 1 > id and 1 < id and 1 >= id and 1 <= id and 1 = int_col and 1 != id
---- RESULTS
'SUCCESS'
---- TYPES
STRING
====
---- QUERY
# Test that <=>, IS DISTINCT FROM, and IS NOT DISTINCT FROM all can be validated
# Note the duplicate predicate 1 IS NOT DISTINCT FROM id is removed.
select string_col from alltypes_datasource
where string_col = 'VALIDATE_PREDICATES##id NOT_DISTINCT 1 && id DISTINCT_FROM 1'
and 1 <=> id and 1 IS DISTINCT FROM id and 1 IS NOT DISTINCT FROM id
---- RESULTS
'SUCCESS'
---- TYPES
STRING
====
---- QUERY
# Test that <=>, IS DISTINCT FROM, and IS NOT DISTINCT FROM are evaluated just like their
# equality counterparts
select * from
(select count(*) from alltypes_datasource
where tinyint_col = 1 and smallint_col = 11) a
union all
(select count(*) from alltypes_datasource
where tinyint_col <=> 1 and smallint_col <=> 11)
---- RESULTS
50
50
---- TYPES
BIGINT
====
---- QUERY
select * from
(select count(*) from alltypes_datasource
where smallint_col = 11 and tinyint_col = 1) a
union all
(select count(*) from alltypes_datasource
where smallint_col <=> 11 and tinyint_col <=> 1)
---- RESULTS
500
500
---- TYPES
BIGINT
====
---- QUERY
select * from
(select count(*) from alltypes_datasource
where tinyint_col != 1 and smallint_col != 11) a
union all
(select count(*) from alltypes_datasource
where tinyint_col IS DISTINCT FROM 1 and smallint_col IS DISTINCT FROM 11)
---- RESULTS
4950
4950
---- TYPES
BIGINT
====
---- QUERY
select * from
(select count(*) from alltypes_datasource
where smallint_col != 11 and tinyint_col != 1) a
union all
(select count(*) from alltypes_datasource
where smallint_col IS DISTINCT FROM 11 and tinyint_col IS DISTINCT FROM 1)
---- RESULTS
4500
4500
---- TYPES
BIGINT
====