<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="timestamp">

  <title>TIMESTAMP Data Type</title>
  <titlealts audience="PDF"><navtitle>TIMESTAMP</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Data Types"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Dates and Times"/>
    </metadata>
  </prolog>

  <conbody>
    <p>
      A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
      statements, representing a point in time.
    </p>

    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

    <p>
      In the column definition of a <codeph>CREATE TABLE</codeph> statement:
    </p>

    <codeblock><varname>column_name</varname> TIMESTAMP</codeblock>
    <p>
      <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range
      is different from the Hive <codeph>TIMESTAMP</codeph> type. Internally, the
      resolution of the time portion of a <codeph>TIMESTAMP</codeph> value is in
      nanoseconds.
    </p>
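    <p>
      For example, a value outside the allowed range is not representable as a
      <codeph>TIMESTAMP</codeph>; as with other values that cannot be converted,
      the result of such a cast is typically <codeph>NULL</codeph> rather than an
      error (an illustrative sketch, not output captured for this topic):
    </p>

    <codeblock>select cast('1400-01-01 00:00:00' as timestamp); -- Earliest allowed value; returns a valid TIMESTAMP.
select cast('1399-12-31 23:59:59' as timestamp); -- Below the supported range; typically returns NULL.</codeblock>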
    <p>
      <b>INTERVAL expressions:</b>
    </p>

    <p>
      You can perform date arithmetic by adding or subtracting a specified number of
      time units, using the <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph>
      and <codeph>-</codeph> operators, or the <codeph>date_add()</codeph> and
      <codeph>date_sub()</codeph> functions. You can specify units as
      <codeph>YEAR[S]</codeph>, <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>,
      <codeph>DAY[S]</codeph>, <codeph>HOUR[S]</codeph>, <codeph>MINUTE[S]</codeph>,
      <codeph>SECOND[S]</codeph>, <codeph>MILLISECOND[S]</codeph>,
      <codeph>MICROSECOND[S]</codeph>, and <codeph>NANOSECOND[S]</codeph>. You can
      specify only one time unit in each interval expression, for example
      <codeph>INTERVAL 3 DAYS</codeph> or <codeph>INTERVAL 25 HOURS</codeph>, but you
      can produce any granularity by adding together successive
      <codeph>INTERVAL</codeph> values, such as
      <codeph><varname>timestamp_value</varname> + INTERVAL 3 WEEKS -
      INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>.
    </p>

    <p>
      For example:
    </p>
    <codeblock>select now() + interval 1 day;
select date_sub(now(), interval 5 minutes);
insert into auction_details
select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
from new_auctions;</codeblock>
    <p>
      <b>Time zones:</b>
    </p>

    <p>
      By default, Impala does not store timestamps using the local time zone, to avoid
      undesired results from unexpected time zone issues. Timestamps are stored and
      interpreted relative to UTC, both when written to or read from data files, and
      when converted to or from Unix time values through functions such as
      <codeph>from_unixtime()</codeph> or <codeph>unix_timestamp()</codeph>. To convert
      such a <codeph>TIMESTAMP</codeph> value to one that represents the date and time
      in a specific time zone, convert the original value with the
      <codeph>from_utc_timestamp()</codeph> function.
    </p>

    <p>
      Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any
      particular time zone, you must be conscious of the time zone aspects of data that
      you query, insert, or convert.
    </p>
    <p>
      For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by
      the <codeph>now()</codeph> function represents the local time in the system time
      zone, rather than in UTC. To store values relative to the current time in a
      portable way, convert any <codeph>now()</codeph> return values using the
      <codeph>to_utc_timestamp()</codeph> function first. For example, the following
      query shows that the current time in California (where this Impala cluster is
      located) is shortly after 2 PM. If that value were written to a data file, and
      shipped off to a distant server to be analyzed alongside other data from far-flung
      locations, the dates and times would not match up precisely because of time zone
      differences. Therefore, the <codeph>to_utc_timestamp()</codeph> function converts
      it using a common reference point, the UTC time zone (descended from the old
      Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that
      the original value is from the Pacific time zone with Daylight Saving Time in
      effect. When servers in all geographic locations run the same transformation on
      any local date and time values (with the appropriate time zone argument), the
      stored data uses a consistent representation. Impala queries can use functions
      such as <codeph>EXTRACT()</codeph>, <codeph>MIN()</codeph>,
      <codeph>AVG()</codeph>, and so on to do time-series analysis on those timestamps.
    </p>
    <codeblock>[localhost:21000] > select now();
+-------------------------------+
| now()                         |
+-------------------------------+
| 2015-04-09 14:07:46.580465000 |
+-------------------------------+
[localhost:21000] > select to_utc_timestamp(now(), 'PDT');
+--------------------------------+
| to_utc_timestamp(now(), 'pdt') |
+--------------------------------+
| 2015-04-09 21:08:07.664547000  |
+--------------------------------+
</codeblock>
    <p>
      The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored
      <codeph>TIMESTAMP</codeph> data or calculated results and convert back to local
      date and time for processing on the application side. The following example shows
      how you might represent some future date (such as the ending date and time of an
      auction) in UTC, and then convert back to local time when convenient for reporting
      or other processing. The final query in the example tests whether this arbitrary
      UTC date and time has passed yet, by converting it back to the local time zone and
      comparing it against the current date and time.
    </p>
    <codeblock>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
+---------------------------------------------------+
| to_utc_timestamp(now() + interval 2 weeks, 'pdt') |
+---------------------------------------------------+
| 2015-04-23 21:08:34.152923000                     |
+---------------------------------------------------+
[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT');
+------------------------------------------------------------+
| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') |
+------------------------------------------------------------+
| 2015-04-23 14:08:34.152923000                              |
+------------------------------------------------------------+
[localhost:21000] > select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') &lt; now();
+--------------------------------------------------------------------+
| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') &lt; now() |
+--------------------------------------------------------------------+
| false                                                              |
+--------------------------------------------------------------------+
</codeblock>
    <p rev="2.2.0">
      If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values
      represent the local time zone of the host where the data was written, potentially
      leading to inconsistent results when processed by Impala. To avoid compatibility
      problems or having to code workarounds, you can specify one or both of these
      <cmdname>impalad</cmdname> startup flags:
      <codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> and
      <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>. Although
      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by
      default to avoid performance overhead, <ph rev="upstream">Cloudera</ph> recommends
      turning it on when processing <codeph>TIMESTAMP</codeph> columns in Parquet files
      written by Hive, to avoid unexpected behavior.
    </p>
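    <p>
      For example, you might pass both flags on the <cmdname>impalad</cmdname> command
      line (a hypothetical invocation; adapt it to however your deployment supplies
      daemon startup flags):
    </p>

    <codeblock>impalad -use_local_tz_for_unix_timestamp_conversions=true \
  -convert_legacy_hive_parquet_utc_timestamps=true</codeblock>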
    <p rev="2.2.0">
      The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting affects
      conversions from <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from
      <codeph>BIGINT</codeph> to <codeph>TIMESTAMP</codeph>. By default, Impala treats
      all <codeph>TIMESTAMP</codeph> values as UTC, to simplify analysis of time-series
      data from different geographic regions. When you enable the
      <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting, these
      operations treat the input values as if they are in the local time zone of the
      host doing the processing. See
      <xref href="impala_datetime_functions.xml#datetime_functions"/> for the list of
      functions affected by the
      <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting.
    </p>
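    <p>
      For example, with the default UTC behavior, <codeph>unix_timestamp()</codeph>
      interprets its argument as a UTC date and time. With the setting enabled, the
      same argument is interpreted as local time, so the result shifts by the host's
      UTC offset (an illustrative sketch, assuming a host in the UTC-7 Pacific
      Daylight Time zone):
    </p>

    <codeblock>-- Default behavior: the string is interpreted as UTC.
select unix_timestamp('2015-04-09 14:07:46'); -- 1428588466
-- With -use_local_tz_for_unix_timestamp_conversions=true on a UTC-7 host,
-- the string is interpreted as local time, so the result is 25200 seconds
-- (7 hours) greater: 1428613666.</codeblock>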
    <p>
      The following sequence of examples shows how the interpretation of
      <codeph>TIMESTAMP</codeph> values in Parquet tables is affected by the
      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting.
    </p>
    <p>
      Regardless of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph>
      setting, <codeph>TIMESTAMP</codeph> columns in text tables can be written and read
      interchangeably by Impala and Hive:
    </p>
    <codeblock>Impala DDL and queries for text table:

[localhost:21000] > create table t1 (x timestamp);
[localhost:21000] > insert into t1 values (now()), (now() + interval 1 day);
[localhost:21000] > select x from t1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
[localhost:21000] > select to_utc_timestamp(x, 'PDT') from t1;
+-------------------------------+
| to_utc_timestamp(x, 'pdt')    |
+-------------------------------+
| 2015-04-07 22:43:02.892403000 |
| 2015-04-08 22:43:02.892403000 |
+-------------------------------+

Hive query for text table:

hive> select * from t1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 1.245 seconds, Fetched: 2 row(s)
</codeblock>
    <p>
      When the table uses Parquet format, Impala expects any time zone adjustment to be
      applied prior to writing, while <codeph>TIMESTAMP</codeph> values written by Hive
      are adjusted to be in the UTC time zone. When Hive queries Parquet data files that
      it wrote, it adjusts the <codeph>TIMESTAMP</codeph> values back to the local time
      zone, while Impala does no conversion. Hive does no time zone conversion when it
      queries Impala-written Parquet files.
    </p>
    <codeblock>Impala DDL and queries for Parquet table:

[localhost:21000] > create table p1 stored as parquet as select x from t1;
+-------------------+
| summary           |
+-------------------+
| Inserted 2 row(s) |
+-------------------+
[localhost:21000] > select x from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+

Hive DDL and queries for Parquet table:

hive> create table h1 (x timestamp) stored as parquet;
OK
hive> insert into h1 select * from p1;
...
OK
Time taken: 35.573 seconds
hive> select x from p1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 0.324 seconds, Fetched: 2 row(s)
hive> select x from h1;
OK
2015-04-07 15:43:02.892403
2015-04-08 15:43:02.892403
Time taken: 0.197 seconds, Fetched: 2 row(s)
</codeblock>
    <p>
      The discrepancy arises when Impala queries the Hive-created Parquet table. The
      underlying values in the <codeph>TIMESTAMP</codeph> column are different from the
      ones written by Impala, even though they were copied from one table to another by
      an <codeph>INSERT ... SELECT</codeph> statement in Hive. Hive did an implicit
      conversion from the local time zone to UTC as it wrote the values to Parquet.
    </p>
    <codeblock>Impala query for TIMESTAMP values from Impala-written and Hive-written data:

[localhost:21000] > select * from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.29s
[localhost:21000] > select * from h1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 22:43:02.892403000 |
| 2015-04-08 22:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.41s

Underlying integer values for Impala-written and Hive-written data:

[localhost:21000] > select cast(x as bigint) from p1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428421382        |
| 1428507782        |
+-------------------+
Fetched 2 row(s) in 0.38s
[localhost:21000] > select cast(x as bigint) from h1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.20s
</codeblock>
    <p>
      When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is
      enabled, Impala recognizes the Parquet data files written by Hive, and applies the
      same UTC-to-local-timezone conversion logic during the query as Hive uses, making
      the contents of the Impala-written <codeph>P1</codeph> table and the Hive-written
      <codeph>H1</codeph> table appear identical, whether represented as
      <codeph>TIMESTAMP</codeph> values or the underlying <codeph>BIGINT</codeph>
      integers:
    </p>
    <codeblock>[localhost:21000] > select x from p1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.37s
[localhost:21000] > select x from h1;
+-------------------------------+
| x                             |
+-------------------------------+
| 2015-04-07 15:43:02.892403000 |
| 2015-04-08 15:43:02.892403000 |
+-------------------------------+
Fetched 2 row(s) in 0.19s
[localhost:21000] > select cast(x as bigint) from p1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.29s
[localhost:21000] > select cast(x as bigint) from h1;
+-------------------+
| cast(x as bigint) |
+-------------------+
| 1428446582        |
| 1428532982        |
+-------------------+
Fetched 2 row(s) in 0.22s
</codeblock>
    <p>
      <b>Conversions:</b>
    </p>

    <p conref="../shared/impala_common.xml#common/timestamp_conversions"/>
    <p>
      In Impala 1.3 and higher, the <codeph>FROM_UNIXTIME()</codeph> and
      <codeph>UNIX_TIMESTAMP()</codeph> functions allow a wider range of format strings,
      with more flexibility in element order, repetition of letter placeholders, and
      separator characters. In <keyword keyref="impala23_full"/> and higher, the
      <codeph>UNIX_TIMESTAMP()</codeph> function also allows a numeric timezone offset
      to be specified as part of the input string. See
      <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
    </p>
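    <p>
      For example (an illustrative sketch; the default UTC interpretation of Unix time
      values is assumed here):
    </p>

    <codeblock>select from_unixtime(1428588466, 'yyyy-MM-dd HH:mm:ss'); -- 2015-04-09 14:07:46
select from_unixtime(1428588466, 'yyyy-MM-dd');           -- Date portion only.
select unix_timestamp('2015-04-09 14:07:46', 'yyyy-MM-dd HH:mm:ss'); -- 1428588466</codeblock>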
    <p conref="../shared/impala_common.xml#common/y2k38"/>
    <p>
      <b>Partitioning:</b>
    </p>

    <p>
      Although you cannot use a <codeph>TIMESTAMP</codeph> column as a partition key,
      you can extract the individual years, months, days, hours, and so on and partition
      based on those columns. Because the partition key column values are represented in
      HDFS directory names, rather than as fields in the data files themselves, you can
      also keep the original <codeph>TIMESTAMP</codeph> values if desired, without
      duplicating data or wasting storage space. See
      <xref href="impala_partitioning.xml#partition_key_columns"/> for more details on
      partitioning with date and time values.
    </p>
    <codeblock>[localhost:21000] > create table timeline (event string) partitioned by (happened timestamp);
ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
</codeblock>
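    <p>
      As a workaround, you can partition by the extracted date parts while keeping the
      original value (a hypothetical sketch; <codeph>staged_timeline</codeph> is an
      assumed source table, not part of the examples in this topic):
    </p>

    <codeblock>create table timeline (event string, happened timestamp)
  partitioned by (year int, month int, day int);
-- Dynamic partition insert: the trailing select columns supply the partition values.
insert into timeline partition (year, month, day)
  select event, happened, year(happened), month(happened), day(happened)
  from staged_timeline;</codeblock>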
    <p conref="../shared/impala_common.xml#common/example_blurb"/>
    <codeblock>select cast('1966-07-30' as timestamp);
select cast('1985-09-25 17:45:30.005' as timestamp);
select cast('08:30:00' as timestamp);
select hour('1970-01-01 15:30:00'); -- Succeeds, returns 15.
select hour('1970-01-01 15:30'); -- Returns NULL because seconds field required.
select hour('1970-01-01 27:30:00'); -- Returns NULL because hour value out of range.
select dayofweek('2004-06-13'); -- Returns 1, representing Sunday.
select dayname('2004-06-13'); -- Returns 'Sunday'.
select date_add('2004-06-13', 365); -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
select day('2004-06-13'); -- Returns 13.
select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
select now(); -- Returns current date and time in local timezone.

create table dates_and_times (t timestamp);
insert into dates_and_times values
  ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
</codeblock>
    <p conref="../shared/impala_common.xml#common/null_bad_timestamp_cast"/>

    <p conref="../shared/impala_common.xml#common/partitioning_worrisome"/>

    <p conref="../shared/impala_common.xml#common/hbase_ok"/>

    <p conref="../shared/impala_common.xml#common/parquet_ok"/>

    <p conref="../shared/impala_common.xml#common/text_bulky"/>

    <!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> -->

    <p conref="../shared/impala_common.xml#common/internals_16_bytes"/>

    <p conref="../shared/impala_common.xml#common/added_forever"/>

    <p conref="../shared/impala_common.xml#common/column_stats_constant"/>

    <p conref="../shared/impala_common.xml#common/sqoop_blurb"/>

    <p conref="../shared/impala_common.xml#common/sqoop_timestamp_caveat"/>

    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
    <p>
      If you cast a <codeph>STRING</codeph> with an unrecognized format to a
      <codeph>TIMESTAMP</codeph>, the result is <codeph>NULL</codeph> rather than an
      error. Test your data pipeline to be sure any textual date and time values are in
      a format that Impala <codeph>TIMESTAMP</codeph> can recognize.
    </p>
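    <p>
      For example, a day/month/year string is not one of the recognized formats, so the
      cast silently produces <codeph>NULL</codeph> (an illustrative sketch):
    </p>

    <codeblock>select cast('1966-07-30' as timestamp); -- Recognized format; returns a valid TIMESTAMP.
select cast('30/07/1966' as timestamp); -- Unrecognized format; returns NULL, not an error.</codeblock>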
    <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>

    <p conref="../shared/impala_common.xml#common/related_info"/>
    <ul>
      <li>
        <!-- The Timestamp Literals topic is pretty brief. Consider adding more examples there. -->
        <xref href="impala_literals.xml#timestamp_literals"/>.
      </li>

      <li>
        To convert to or from different date formats, or perform date arithmetic, use
        the date and time functions described in
        <xref href="impala_datetime_functions.xml#datetime_functions"/>. In particular,
        the <codeph>from_unixtime()</codeph> function requires a case-sensitive format
        string such as <codeph>"yyyy-MM-dd HH:mm:ss.SSSS"</codeph>, matching one of the
        allowed variations of a <codeph>TIMESTAMP</codeph> value (date plus time, only
        date, only time, optional fractional seconds).
      </li>

      <li>
        See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for
        details about differences in <codeph>TIMESTAMP</codeph> handling between Impala
        and Hive.
      </li>
    </ul>

  </conbody>

</concept>