mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
IMPALA-8988: [DOCS] DATE type is supported AVRO tables
Change-Id: I95f37accddadcba436676498d5cbb34cda281846
Reviewed-on: http://gerrit.cloudera.org:8080/14340
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
@@ -104,11 +104,6 @@ under the License.
<p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
</note>

<!--
To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then
doing DESCRIBE and querying the table.
-->

<p>
The following examples demonstrate creating an Avro table in Impala, using either an inline column
specification or one taken from a JSON file stored in HDFS:
@@ -502,41 +497,92 @@ ALTER TABLE avro_table SET TBLPROPERTIES (
<title>Data Type Considerations for Avro Tables</title>

<conbody>

<p>
The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
and the equivalent types in Impala.
</p>

<codeblock><![CDATA[Primitive Types (Avro -> Impala)
--------------------------------
STRING -> STRING
STRING -> CHAR
STRING -> VARCHAR
INT -> INT
BOOLEAN -> BOOLEAN
LONG -> BIGINT
FLOAT -> FLOAT
DOUBLE -> DOUBLE

Logical Types
-------------
BYTES + logicalType = "decimal" -> DECIMAL

Avro Types with No Impala Equivalent
------------------------------------
RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL

Impala Types with No Avro Equivalent
------------------------------------
TIMESTAMP
]]>
</codeblock>

<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>

<p> The Avro format defines a set of data types whose names differ from
the names of the corresponding Impala data types. If you are preparing
Avro files using other Hadoop components such as Pig or MapReduce, you
might need to work with the type names defined by Avro. The following
figure lists the Avro-defined types and the equivalent types in Impala. </p>
<p><b>Primitive types:</b></p>
<table frame="all" rowsep="1" colsep="1" id="table_uvv_plj_gjb">
<tgroup cols="2" align="left">
<colspec colname="c1" colnum="1" colwidth="143.44pt"/>
<colspec colname="c2" colnum="2" colwidth="165.77pt"/>
<thead>
<row>
<entry>Avro type</entry>
<entry>Impala type</entry>
</row>
</thead>
<tbody>
<row>
<entry>STRING</entry>
<entry>STRING</entry>
</row>
<row>
<entry>STRING</entry>
<entry>CHAR</entry>
</row>
<row>
<entry>STRING</entry>
<entry>VARCHAR</entry>
</row>
<row>
<entry>INT</entry>
<entry>INT</entry>
</row>
<row>
<entry>BOOLEAN</entry>
<entry>BOOLEAN</entry>
</row>
<row>
<entry>LONG</entry>
<entry>BIGINT</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>FLOAT</entry>
</row>
<row>
<entry>DOUBLE</entry>
<entry>DOUBLE</entry>
</row>
</tbody>
</tgroup>
</table>
<p>The Avro specification allows string values up to 2**64 bytes in
length. Impala queries for Avro tables use 32-bit integers to hold
string lengths. </p>
<p>In <keyword keyref="impala25_full"/> and higher, Impala truncates
<codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> values in Avro
tables to (2**31)-1 bytes. If a query encounters a
<codeph>STRING</codeph> value longer than (2**31)-1 bytes in an Avro
table, the query fails. In earlier releases, encountering such long
values in an Avro table could cause a crash.</p>
<p><b>Logical types:</b></p>
<table frame="all" rowsep="1" colsep="1" id="table_ch2_1mj_gjb">
<tgroup cols="2" align="left">
<colspec colname="c1" colnum="1" colwidth="151.26pt"/>
<colspec colname="c2" colnum="2" colwidth="149.58pt"/>
<thead>
<row>
<entry>Avro type</entry>
<entry>Impala type</entry>
</row>
</thead>
<tbody>
<row>
<entry>BYTES annotated</entry>
<entry>DECIMAL</entry>
</row>
<row>
<entry>INT32 annotated</entry>
<entry>DATE</entry>
</row>
</tbody>
</tgroup>
</table>
<p>Impala does not support the following Avro data types: RECORD, MAP,
ARRAY, UNION, ENUM, FIXED, NULL</p>
</conbody>
</concept>

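To make the type tables in this hunk concrete, here is a minimal Python sketch that applies the same Avro-to-Impala mapping to an Avro schema. The schema, the field names, and the helper `impala_type_for` are hypothetical and exist only for illustration; this is not how Impala itself resolves types. It assumes that the "annotated" rows correspond to Avro's `logicalType` attribute (`decimal` on `bytes`, `date` on `int`, as the Avro specification defines them).

```python
import json

# Avro primitive -> Impala type, following the "Primitive types" table.
# An Avro string can also back an Impala CHAR or VARCHAR column; this
# sketch simply reports STRING.
PRIMITIVE_TO_IMPALA = {
    "string": "STRING",
    "int": "INT",
    "boolean": "BOOLEAN",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
}

# (Avro base type, logicalType) -> Impala type, following the
# "Logical types" table.
LOGICAL_TO_IMPALA = {
    ("bytes", "decimal"): "DECIMAL",
    ("int", "date"): "DATE",
}

# Hypothetical Avro record schema, used only for illustration.
AVRO_SCHEMA_JSON = """
{
  "type": "record",
  "name": "sales",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "price",
     "type": {"type": "bytes", "logicalType": "decimal",
              "precision": 9, "scale": 2}},
    {"name": "sale_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "tags", "type": {"type": "array", "items": "string"}}
  ]
}
"""


def impala_type_for(avro_type):
    """Return the Impala type name for an Avro field type, or None when
    the tables above list no equivalent (for example ARRAY or MAP)."""
    if isinstance(avro_type, dict):
        key = (avro_type.get("type"), avro_type.get("logicalType"))
        if key in LOGICAL_TO_IMPALA:
            return LOGICAL_TO_IMPALA[key]
        avro_type = avro_type.get("type")
    if not isinstance(avro_type, str):
        return None  # unions and other composite forms are not mapped here
    return PRIMITIVE_TO_IMPALA.get(avro_type)


schema = json.loads(AVRO_SCHEMA_JSON)
columns = {f["name"]: impala_type_for(f["type"]) for f in schema["fields"]}
```

With the sample schema, `columns` maps `id` to `BIGINT`, `price` to `DECIMAL`, `sale_date` to `DATE`, and `tags` to `None`, since ARRAY is among the types listed as unsupported.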
@@ -41,44 +41,38 @@ under the License.

<conbody>

<p>
Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
type is supported for HBase, Text, and Parquet.
</p>
<p> Use the <codeph>DATE</codeph> data type to store date values. The
<codeph>DATE</codeph> type is supported for HBase, Text, Avro, and
Parquet. </p>

<p>
<b>Range:</b>
</p>

<p>
0000-01-01 to 9999-12-31
</p>
<p> 0001-01-01 to 9999-12-31 </p>

<p>
<b>Literals and expressions:</b>
</p>

<p>
The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
For example, <codeph>DATE '2013-01-01'</codeph>
<p> The <codeph>DATE</codeph> literals are in the form of
<codeph>DATE'YYYY-MM-DD'</codeph>. For example, <codeph>DATE
'2013-01-01'</codeph>
</p>

<p>
<b>Parquet considerations:</b>
<b>Parquet and Avro considerations:</b>
</p>

<p>
Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
issue between Impala and older versions of Hive:
</p>
<p> Parquet and Avro use <codeph>DATE</codeph> logical type for dates. The
<codeph>DATE</codeph> logical type annotates an <codeph>INT32</codeph>
that stores the number of days from the Unix epoch, January 1, 1970. This
representation introduces an interoperability issue between Impala and
older versions of Hive: </p>

<p>
If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
higher, this is no longer an issue.
</p>
<p> If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a
Parquet or Avro table, those dates would be read back incorrectly by
Impala and vice versa. In Hive 3.1 and higher, this is no longer an issue. </p>

<p>
<b>Explicit casting between DATE and other data types:</b>

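The storage representation described in this hunk (a 32-bit integer counting days from the Unix epoch) can be sketched in Python as follows. The function names and the `HIVE_CUTOFF` constant are hypothetical helpers for illustration, not part of Impala, Hive, Parquet, or Avro. Python's `datetime.date` happens to cover the same 0001-01-01 to 9999-12-31 range as the updated documentation, which makes it a convenient stand-in.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)

# Dates earlier than this are the ones the documentation flags as being
# read back incorrectly when written by Hive versions lower than 3.1.
HIVE_CUTOFF = date(1582, 10, 15)


def date_to_epoch_days(d):
    """Encode a date as the number of days since the Unix epoch
    (negative for dates before 1970-01-01)."""
    return (d - UNIX_EPOCH).days


def epoch_days_to_date(days):
    """Decode a days-since-epoch integer back into a date."""
    return date.fromordinal(UNIX_EPOCH.toordinal() + days)


def predates_hive_cutoff(d):
    """True when a date falls in the range that the documentation says
    is affected by the interoperability issue with older Hive versions."""
    return d < HIVE_CUTOFF


encoded = date_to_epoch_days(date(2013, 1, 1))
decoded = epoch_days_to_date(encoded)
```

Here `encoded` is 15706 and `decoded` is `date(2013, 1, 1)` again. A value such as `date(1500, 3, 1)` encodes to a negative day count and is reported by `predates_hive_cutoff` as falling in the affected range.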