mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-8988: [DOCS] DATE type is supported AVRO tables
Change-Id: I95f37accddadcba436676498d5cbb34cda281846
Reviewed-on: http://gerrit.cloudera.org:8080/14340
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
@@ -104,11 +104,6 @@ under the License.
 <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
 </note>
 
-<!--
-To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then
-doing DESCRIBE and querying the table.
--->
-
 <p>
 The following examples demonstrate creating an Avro table in Impala, using either an inline column
 specification or one taken from a JSON file stored in HDFS:
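Reading aid for the hunk above: the context lines mention a column specification taken from a JSON file. The following is a minimal sketch of what such an Avro schema could look like, built in Python so it can be checked. The record name, field names, and the decimal precision and scale are hypothetical; only the `logicalType` annotations follow the Avro specification.

```python
import json


def build_schema():
    """Return a hypothetical Avro record schema as a Python dict.

    The top-level record describes one table row. Every name below is made up
    for illustration and does not come from the commit.
    """
    return {
        "type": "record",
        "name": "example_record",  # hypothetical record name
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            # decimal is a logical type that annotates Avro bytes
            {
                "name": "price",
                "type": {
                    "type": "bytes",
                    "logicalType": "decimal",
                    "precision": 10,
                    "scale": 2,
                },
            },
            # date is a logical type that annotates a 32-bit Avro int
            {"name": "order_date", "type": {"type": "int", "logicalType": "date"}},
        ],
    }


def schema_json():
    """Serialize the schema to the JSON text that a schema file would hold."""
    return json.dumps(build_schema(), indent=2)


if __name__ == "__main__":
    print(schema_json())
```

The date field is the case this commit documents: an annotated 32-bit integer that Impala reads as `DATE`.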
@@ -502,41 +497,92 @@ ALTER TABLE avro_table SET TBLPROPERTIES (
 <title>Data Type Considerations for Avro Tables</title>
 
 <conbody>
-
-<p>
-The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
-data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
-might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
-and the equivalent types in Impala.
-</p>
-
-<codeblock><![CDATA[Primitive Types (Avro -> Impala)
---------------------------------
-STRING -> STRING
-STRING -> CHAR
-STRING -> VARCHAR
-INT -> INT
-BOOLEAN -> BOOLEAN
-LONG -> BIGINT
-FLOAT -> FLOAT
-DOUBLE -> DOUBLE
-
-Logical Types
--------------
-BYTES + logicalType = "decimal" -> DECIMAL
-
-Avro Types with No Impala Equivalent
-------------------------------------
-RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
-
-Impala Types with No Avro Equivalent
-------------------------------------
-TIMESTAMP
-]]>
-</codeblock>
-
-<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
-
+<p> The Avro format defines a set of data types whose names differ from
+the names of the corresponding Impala data types. If you are preparing
+Avro files using other Hadoop components such as Pig or MapReduce, you
+might need to work with the type names defined by Avro. The following
+figure lists the Avro-defined types and the equivalent types in Impala. </p>
+<p><b>Primitive types:</b></p>
+<table frame="all" rowsep="1" colsep="1" id="table_uvv_plj_gjb">
+<tgroup cols="2" align="left">
+<colspec colname="c1" colnum="1" colwidth="143.44pt"/>
+<colspec colname="c2" colnum="2" colwidth="165.77pt"/>
+<thead>
+<row>
+<entry>Avro type</entry>
+<entry>Impala type</entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry>STRING</entry>
+<entry>STRING</entry>
+</row>
+<row>
+<entry>STRING</entry>
+<entry>CHAR</entry>
+</row>
+<row>
+<entry>STRING</entry>
+<entry>VARCHAR</entry>
+</row>
+<row>
+<entry>INT</entry>
+<entry>INT</entry>
+</row>
+<row>
+<entry>BOOLEAN</entry>
+<entry>BOOLEAN</entry>
+</row>
+<row>
+<entry>LONG</entry>
+<entry>BIGINT</entry>
+</row>
+<row>
+<entry>FLOAT</entry>
+<entry>FLOAT</entry>
+</row>
+<row>
+<entry>DOUBLE</entry>
+<entry>DOUBLE</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+<p>The Avro specification allows string values up to 2**64 bytes in
+length. Impala queries for Avro tables use 32-bit integers to hold
+string lengths. </p>
+<p>In <keyword keyref="impala25_full"/> and higher, Impala truncates
+<codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> values in Avro
+tables to (2**31)-1 bytes. If a query encounters a
+<codeph>STRING</codeph> value longer than (2**31)-1 bytes in an Avro
+table, the query fails. In earlier releases, encountering such long
+values in an Avro table could cause a crash.</p>
+<p><b>Logical types:</b></p>
+<table frame="all" rowsep="1" colsep="1" id="table_ch2_1mj_gjb">
+<tgroup cols="2" align="left">
+<colspec colname="c1" colnum="1" colwidth="151.26pt"/>
+<colspec colname="c2" colnum="2" colwidth="149.58pt"/>
+<thead>
+<row>
+<entry>Avro type</entry>
+<entry>Impala type</entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry>BYTES annotated</entry>
+<entry>DECIMAL</entry>
+</row>
+<row>
+<entry>INT32 annotated</entry>
+<entry>DATE</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+<p>Impala does not support the following Avro data types: RECORD, MAP,
+ARRAY, UNION, ENUM, FIXED, NULL</p>
 </conbody>
 </concept>
 
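Reading aid for the hunk above: the two new tables and the string-length paragraphs can be restated as a small lookup. This is a sketch of the documented mapping only, not of Impala's reader. The function names are hypothetical, and it assumes that "INT32 annotated" corresponds to the Avro `int` type carrying a `date` logical type.

```python
# Avro primitive type -> Impala types listed for it in the "Primitive types" table.
PRIMITIVE = {
    "string": ["STRING", "CHAR", "VARCHAR"],
    "int": ["INT"],
    "boolean": ["BOOLEAN"],
    "long": ["BIGINT"],
    "float": ["FLOAT"],
    "double": ["DOUBLE"],
}

# (Avro base type, logicalType) -> Impala type, from the "Logical types" table.
LOGICAL = {
    ("bytes", "decimal"): "DECIMAL",
    ("int", "date"): "DATE",  # assumption: "INT32" in the table means Avro int
}

# Avro types the text lists as unsupported by Impala.
UNSUPPORTED = {"record", "map", "array", "union", "enum", "fixed", "null"}

# Largest length Impala can hold for an Avro string value, per the text.
MAX_STRING_BYTES = 2**31 - 1


def impala_types(avro_type):
    """Return the documented Impala types for an Avro type (str or dict)."""
    if isinstance(avro_type, dict):
        base = avro_type.get("type")
        logical = avro_type.get("logicalType")
        if logical is not None:
            if (base, logical) not in LOGICAL:
                raise ValueError(f"no documented mapping for {base}/{logical}")
            return [LOGICAL[(base, logical)]]
        avro_type = base
    if not isinstance(avro_type, str):
        raise ValueError(f"unexpected Avro type: {avro_type!r}")
    if avro_type in UNSUPPORTED:
        raise ValueError(f"Avro type not supported by Impala: {avro_type}")
    if avro_type not in PRIMITIVE:
        raise ValueError(f"no documented mapping for {avro_type}")
    return PRIMITIVE[avro_type]


def read_length(impala_type, nbytes):
    """Model the documented handling of long values in the named release and higher.

    CHAR and VARCHAR values are truncated to MAX_STRING_BYTES; a STRING value
    longer than that makes the query fail, modelled here as an exception.
    """
    if impala_type in ("CHAR", "VARCHAR"):
        return min(nbytes, MAX_STRING_BYTES)
    if impala_type == "STRING" and nbytes > MAX_STRING_BYTES:
        raise OverflowError("STRING value longer than (2**31)-1 bytes")
    return nbytes
```

For example, `impala_types({"type": "int", "logicalType": "date"})` returns `["DATE"]`, while `impala_types("enum")` raises `ValueError`.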
@@ -41,44 +41,38 @@ under the License.
 
 <conbody>
 
-<p>
-Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
-type is supported for HBase, Text, and Parquet.
-</p>
+<p> Use the <codeph>DATE</codeph> data type to store date values. The
+<codeph>DATE</codeph> type is supported for HBase, Text, Avro, and
+Parquet. </p>
 
 <p>
 <b>Range:</b>
 </p>
 
-<p>
-0000-01-01 to 9999-12-31
-</p>
+<p> 0001-01-01 to 9999-12-31 </p>
 
 <p>
 <b>Literals and expressions:</b>
 </p>
 
-<p>
-The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
-For example, <codeph>DATE '2013-01-01'</codeph>
+<p> The <codeph>DATE</codeph> literals are in the form of
+<codeph>DATE'YYYY-MM-DD'</codeph>. For examplep, <codeph>DATE
+'2013-01-01'</codeph>
 </p>
 
 <p>
-<b>Parquet considerations:</b>
+<b>Parquet and Avro considerations:</b>
 </p>
 
-<p>
-Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
-logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
-Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
-issue between Impala and older versions of Hive:
-</p>
+<p> Parquet and Avro use <codeph>DATE</codeph> logical type for dates. The
+<codeph>DATE</codeph> logical type annotates an <codeph>INT32</codeph>
+that stores the number of days from the Unix epoch, January 1, 1970. This
+representation introduces an interoperability issue between Impala and
+older versions of Hive: </p>
 
-<p>
-If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
-those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
-higher, this is no longer an issue.
-</p>
+<p> If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a
+Parquet or Avro table, those dates would be read back incorrectly by
+Impala and vice versa. In Hive 3.1 and higher, this is no longer an issue. </p>
 
 <p>
 <b>Explicit casting between DATE and other data types:</b>
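Reading aid for the hunk above: the `DATE` logical type is described as an `INT32` holding the number of days from the Unix epoch. The sketch below shows that encoding for the documented range and for the literal used in the text. The helper names are hypothetical. Python's `datetime.date` uses the proleptic Gregorian calendar, so the sketch says nothing about how Hive versions lower than 3.1 encoded dates earlier than 1582-10-15.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)       # Unix epoch, day 0 of the encoding
MIN_DATE = date(1, 1, 1)       # 0001-01-01, lower bound of the documented range
MAX_DATE = date(9999, 12, 31)  # 9999-12-31, upper bound of the documented range


def to_epoch_days(d):
    """Encode a date as the number of days from the Unix epoch."""
    return (d - EPOCH).days


def from_epoch_days(days):
    """Decode a day count back into a date."""
    return EPOCH + timedelta(days=days)


def parse_date_literal(text):
    """Parse the 'YYYY-MM-DD' part of a DATE literal."""
    return date.fromisoformat(text)


if __name__ == "__main__":
    d = parse_date_literal("2013-01-01")
    print(to_epoch_days(d))         # 15706
    print(to_epoch_days(MIN_DATE))  # -719162
    print(to_epoch_days(MAX_DATE))  # 2932896
```

Both ends of the documented range fit comfortably in a signed 32-bit integer, which is consistent with the `INT32` annotation described in the text.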