mirror of
https://github.com/apache/impala.git
synced 2026-01-08 12:02:54 -05:00
Using TimestampValue (or an equivalent string representation) for timestamps that must identify a point in time doesn't work, because the same timestamp can represent multiple points in time. For example, the timestamp '2016-11-13 01:01 AM' occurred twice last weekend. Instead, we should use unix time directly rather than trying to derive unix time from a (timezone-less) timestamp.

Note that there are other questionable uses of TimestampValue in internal Impala service code, but I want to fix them separately, since they are less important and fixing them does add some risk.

While I'm here, remove a template TimestampValue constructor that was unused and is confusing.

We don't have any end-to-end tests that exercise column lineage, so add a simple custom cluster test that enables lineage and verifies that the start and end unix times are within appropriate bounds. The other column lineage graph fields are at least tested via planner tests.

Automated regression testing for the specific daylight savings issue is difficult, as we'd have to cross the daylight savings boundary at just the right time during query execution in order to reproduce it reliably. But I'm open to ideas.

Testing:
- Looped the new test overnight without any failures.
- Exhaustive run.

Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
Reviewed-on: http://gerrit.cloudera.org:8080/5129
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
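The ambiguity described above can be sketched in a few lines of Python. This is an illustration only, not code from the change. It assumes a US Pacific fall-back transition on 2016-11-06 (an illustrative date chosen here, not taken from the commit) and models the two UTC offsets with fixed-offset zones so that it does not depend on a system timezone database. The names `PDT`, `PST` and `wall_clock` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two sides of a fall-back transition.
# Assumption: US Pacific time, where clocks moved from UTC-7 to UTC-8.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def wall_clock(dt):
    """Render a datetime the way a timezone-less timestamp would look."""
    return dt.strftime("%Y-%m-%d %I:%M %p")


# The same wall-clock reading, once before and once after the clocks go back.
first = datetime(2016, 11, 6, 1, 1, tzinfo=PDT)
second = datetime(2016, 11, 6, 1, 1, tzinfo=PST)

first_unix = int(first.timestamp())
second_unix = int(second.timestamp())

# Identical timezone-less strings, but two distinct points in time.
print(wall_clock(first), first_unix)
print(wall_clock(second), second_unix)
print("seconds apart:", second_unix - first_unix)
```

The two timezone-less strings are identical while the unix times differ by an hour, so a unix time cannot be recovered reliably from such a timestamp. Recording unix time directly avoids the problem.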