IMPALA-5315: Cast to timestamp fails for YYYY-M-D format

This change allows a string in a 'lazy' (non-zero-padded) date/time
format to be cast to timestamp. The supported lazy formats are:
  yyyy-[M]M-[d]d
  yyyy-[M]M-[d]d [H]H:[m]m:[s]s[.SSSSSSSSS]
  [H]H:[m]m:[s]s[.SSSSSSSSS]
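
A minimal Python sketch of how the three lazy layouts above could be
recognised and rendered in the zero-padded form that the exprs.test cases
in this change expect. It is not Impala's C++ parser. The names DATE, TIME,
LAZY_LAYOUTS and normalize_lazy are hypothetical. The rendering rules
(fraction right-padded to nine digits, a bare date printed with 00:00:00,
a bare time printed without a date) are read off the expected RESULTS.
Rejecting out-of-range fields is an assumption.

```python
import datetime
import re
from typing import Optional

# Hypothetical sketch: one anchored regex per lazy layout. [M]M, [d]d,
# [H]H, [m]m and [s]s accept one or two digits; the optional fraction
# accepts up to nine digits (nanosecond precision).
DATE = r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
TIME = (r"(?P<hour>\d{1,2}):(?P<minute>\d{1,2}):(?P<second>\d{1,2})"
        r"(?:\.(?P<frac>\d{1,9}))?")
LAZY_LAYOUTS = [
    re.compile(DATE + " " + TIME),  # yyyy-[M]M-[d]d [H]H:[m]m:[s]s[.SSSSSSSSS]
    re.compile(DATE),               # yyyy-[M]M-[d]d
    re.compile(TIME),               # [H]H:[m]m:[s]s[.SSSSSSSSS]
]


def normalize_lazy(s: str) -> Optional[str]:
    """Return the zero-padded rendering of a lazy date/time string, or None
    if it matches no lazy layout or holds an out-of-range field."""
    for layout in LAZY_LAYOUTS:
        m = layout.fullmatch(s)
        if m is None:
            continue
        g = m.groupdict()  # only contains the groups this layout defines
        out = []
        if "year" in g:
            try:
                d = datetime.date(int(g["year"]), int(g["month"]), int(g["day"]))
            except ValueError:  # assumption: e.g. month 13 is rejected
                return None
            out.append(d.isoformat())
        if "hour" in g:
            h, mi, se = int(g["hour"]), int(g["minute"]), int(g["second"])
            if h > 23 or mi > 59 or se > 59:
                return None
            t = f"{h:02d}:{mi:02d}:{se:02d}"
            if g["frac"] is not None:
                t += "." + g["frac"].ljust(9, "0")  # pad to nanoseconds
            out.append(t)
        else:
            out.append("00:00:00")  # a bare date means midnight
        return " ".join(out)
    return None
```

For example, normalize_lazy("2001-1-2 1:5:3.123") yields
"2001-01-02 01:05:03.123000000", matching the expected result for the
same CAST in exprs.test.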

We incur a scan performance penalty (TotalReadThroughput drops to
approximately half) when the string is in one of these lazy
date/time formats.
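
One plausible reading of why only lazy strings pay the penalty is a
two-tier parser: a cheap fixed-offset check for the zero-padded default
layout, with a slower general match reached only on failure. The Python
sketch below illustrates that idea under this assumption. It does not
describe Impala's actual parser. parse_fast, parse_lazy and parse are
hypothetical names. Fractions, time-only strings and range checks are left
out for brevity.

```python
import re
import string
from typing import Optional, Tuple

Fields = Tuple[int, ...]  # (year, month, day, hour, minute, second)

# Hypothetical fallback regex: yyyy-[M]M-[d]d with an optional
# " [H]H:[m]m:[s]s" part; fractions and time-only strings are omitted.
LAZY = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})(?: (\d{1,2}):(\d{1,2}):(\d{1,2}))?")


def parse_fast(s: str) -> Optional[Fields]:
    """Fixed-offset check for zero-padded 'yyyy-MM-dd[ HH:mm:ss]' only."""
    if len(s) not in (10, 19) or s[4] != "-" or s[7] != "-":
        return None
    if len(s) == 19 and (s[10] != " " or s[13] != ":" or s[16] != ":"):
        return None
    chunks = [s[0:4], s[5:7], s[8:10]]
    chunks += [s[11:13], s[14:16], s[17:19]] if len(s) == 19 else ["00"] * 3
    if not all(c in string.digits for chunk in chunks for c in chunk):
        return None
    return tuple(int(chunk) for chunk in chunks)


def parse_lazy(s: str) -> Optional[Fields]:
    """Slower regex path; a missing time part defaults to 00:00:00."""
    m = LAZY.fullmatch(s)
    if m is None:
        return None
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def parse(s: str) -> Optional[Fields]:
    # Zero-padded strings never reach the regex, so only lazy strings
    # pay for the extra matching work.
    fields = parse_fast(s)
    return fields if fields is not None else parse_lazy(s)
```

With this split, the default format keeps its original cost, and the
extra work shows up only when a column holds lazy strings.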

Testing:
Benchmarked the performance impact by executing this SQL on
a private build over 3.8 billion rows:
select min(cast (time_string as timestamp)) from private.impala_5315

Added tests for valid and invalid date/time format strings
in expr-test.cc, in line with the existing tests for the CAST() function.

Added end-to-end tests into exprs.test and
select-lazy-timestamp.test to exercise the new function within
the context of a query.

Added tests to exercise the leading and trailing whitespace trimming
behaviour for both the default and lazy date/time string formats (IMPALA-6630).
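
A minimal Python sketch of the trimming behaviour these tests target:
whitespace is stripped only from the ends of the string before layout
matching, so the same rule covers zero-padded and lazy input, while
whitespace inside a field still fails the match. The exact set of
characters Impala treats as whitespace is an assumption here (ASCII_WS).
cast_date is a hypothetical name. Only the date-only layout is handled,
and range checks are omitted.

```python
import re
from typing import Optional

# Assumption: which characters count as trimmable whitespace.
ASCII_WS = " \t\n\r\v\f"
# Hypothetical regex accepting both zero-padded and lazy dates.
LAZY_DATE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")


def cast_date(s: str) -> Optional[str]:
    """Trim the ends, then match; interior whitespace is not removed."""
    trimmed = s.strip(ASCII_WS)
    m = LAZY_DATE.fullmatch(trimmed)
    if m is None:
        return None
    y, mo, d = (int(g) for g in m.groups())
    return f"{y:04d}-{mo:02d}-{d:02d} 00:00:00"
```

For example, cast_date("  2001-1-2\t") and cast_date("2001-01-02") both
yield "2001-01-02 00:00:00", while cast_date("2001- 1-2") yields None.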

Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f
Reviewed-on: http://gerrit.cloudera.org:8080/7009
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Author: Vincent Tran
Date: 2017-05-27 03:02:19 -04:00
Committed by: Impala Public Jenkins
Parent: 6d8ce64020
Commit: 0d7787fe4d
7 changed files with 398 additions and 7 deletions


@@ -2959,3 +2959,39 @@ from functional.alltypes where id = 7
---- TYPES
BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT , BIGINT
====
---- QUERY
# IMPALA-5315: Test support for non zero-padded date/time strings cast as timestamp
select cast('2001-1-21 12:5:30' as timestamp)
---- RESULTS
2001-01-21 12:05:30
---- TYPES
timestamp
====
---- QUERY
select cast('2001-1-2 1:5:3.123' as timestamp)
---- RESULTS
2001-01-02 01:05:03.123000000
---- TYPES
timestamp
====
---- QUERY
select cast('1:5:3' as timestamp)
---- RESULTS
01:05:03
---- TYPES
timestamp
====
---- QUERY
select cast('1:5:3.1234567' as timestamp)
---- RESULTS
01:05:03.123456700
---- TYPES
timestamp
====
---- QUERY
select cast('2001-1-2' as timestamp)
---- RESULTS
2001-01-02 00:00:00
---- TYPES
timestamp
====


@@ -0,0 +1,20 @@
====
---- QUERY
select ts from lazy_ts
---- RESULTS: VERIFY_IS_EQUAL_SORTED
2001-01-02 00:00:00
2001-01-02 00:00:00
2001-01-02 00:00:00
01:06:08
01:06:08
01:06:08
01:06:08
01:06:08.123456789
01:06:08.123456789
01:06:08.123450000
2001-01-02 01:06:08
2001-01-02 01:06:08.123456000
2001-01-02 01:06:08.123456789
---- TYPES
timestamp
====
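
The RESULTS section above carries the VERIFY_IS_EQUAL_SORTED modifier,
presumably because scan order over lazy_ts is not deterministic, so
expected and actual rows are compared as sorted lists. The Python sketch
below shows how such a .test case splits into sections and how that
comparison could work. It reflects our reading of the modifier's name,
not Impala's test framework code. parse_test_case and results_match are
hypothetical names.

```python
from typing import Dict, List


def parse_test_case(text: str) -> Dict[str, List[str]]:
    """Map each '---- NAME' header (without the dashes) to its body lines."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if line.strip() == "====":
            continue  # case delimiter, not part of any section
        if line.startswith("---- "):
            current = line[len("---- "):].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def results_match(case: Dict[str, List[str]], actual: List[str]) -> bool:
    # The header may carry a modifier, e.g. "RESULTS: VERIFY_IS_EQUAL_SORTED".
    header = next(h for h in case if h.split(":")[0] == "RESULTS")
    expected = case[header]
    if "VERIFY_IS_EQUAL_SORTED" in header:
        # Row order is not guaranteed: compare sorted copies.
        return sorted(expected) == sorted(actual)
    return expected == actual
```

Under this reading, the thirteen rows listed above match regardless of the
order in which the scan returns them.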