DATE values describe a particular year/month/day in the form
yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is 0000-01-01 to 9999-12-31.
This initial DATE type support covers TEXT and HBASE fileformats only.
'DateValue' is used as the internal type to represent DATE values.
The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types (note that invalid
casts will fail with an error just like invalid DECIMAL_V2 casts,
while failed casts to other types do no lead to warning or error):
- from STRING to DATE. The string value must be formatted as
yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory,
the time component is optional. If the time component is
present, it will be truncated silently.
- from DATE to STRING. The resulting string value is formatted as
yyyy-MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
context where a TIMESTAMP value is expected.
- Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP
implicit conversions are now all possible, the existing function
overload resolution logic is not adequate anymore.
For example, it resolves the
if(false, '2011-01-01', DATE '1499-02-02') function call to the
if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded
function, instead of the if(BOOLEAN, DATE, DATE) version.
This is clearly wrong, so the function overload resolution logic had
to be changed to resolve function calls to the best-fit overloaded
function definition if there are multiple applicable candidates.
An overloaded function definition is an applicable candidate for a
function call if each actual parameter in the function call either
matches the corresponding formal parameter's type (without casting)
or is implicitly castable to that type.
When looking for the best-fit applicable candidate, a parameter
match score (i.e. the number of actual parameters in the function
call that match their corresponding formal parameter's type without
casting) is calculated and the applicable candidate with the highest
parameter match score is chosen.
There's one more issue that the new resolution logic has to address:
if two applicable candidates have the same parameter match score and
the only difference between the two is that the first one requires a
STRING -> TIMESTAMP implicit cast for some of its parameters while
the second one requires a STRING -> DATE implicit cast for the same
parameters then the first candidate has to be chosen not to break
backward compatibility.
E.g: year('2019-02-15') function call must resolve to
year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not
implemented yet, so this is not an issue at the moment but it will
be in the future.
When the resolution algorithm considers overloaded function
definitions, first it orders them lexicographically by the types in
their parameter lists. To ensure the backward compatible behavior
Primitivetype.DATE enum value has to come after
PrimitiveType.TIMESTAMP.
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.
These items are tightly coupled and it makes sense to implement them
in one change-set.
Testing:
- A new partitioned TEXT table 'functional.date_tbl' (and the
corresponding HBASE table 'functional_hbase.date_tbl') was
introduced for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT and HBASE fileformats
only, most DATE tests were implemented separately in
tests/query_test/test_date_queries.py.
Note, that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.
Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Reviewed-on: http://gerrit.cloudera.org:8080/12481
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us implementing yet another thrift client.
This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.
I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
results sets with NULLs was limited so I added a couple of queries.
Places where results differ from Beeswax:
* Impyla is a Python dbapi client so must convert timestamps into python datetime
objects, which only have microsecond precision. Therefore result
timestamps within nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
VARCHAR.
I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.
Limitations:
* Not all of the same methods are implemented as for beeswax, so some
tests that have more complicated interactions with the client will not
work with HS2 yet.
* We don't have a way to get the affected row count for inserts.
I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.
Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.
Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements constant propagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes. The optimization is applied during planning. At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.
Testing: Expanded the test cases for the planner to achieve constant
propagation. Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.
Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Reviewed-on: http://gerrit.cloudera.org:8080/6389
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
There was a missing break in a switch statement leading to bad fallthrough.
An existing test already expected incorrect results. The bug is covered by
expecting correct results.
Change-Id: I5340c2eda813afc032ba72203bd59eb3f2c4f482
Reviewed-on: http://gerrit.cloudera.org:8080/4585
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Enforces that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; this is left as future
work.
Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins