IMPALA-8988: [DOCS] DATE type is supported AVRO tables

Change-Id: I95f37accddadcba436676498d5cbb34cda281846
Reviewed-on: http://gerrit.cloudera.org:8080/14340
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
Alex Rodoni
2019-10-01 16:33:53 -07:00
parent 88cc930a94
commit d24f868cef
2 changed files with 102 additions and 62 deletions


@@ -104,11 +104,6 @@ under the License.
<p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
</note>
<!--
To do: Expand these examples to show switching between impala-shell and Hive, loading some data, and then
doing DESCRIBE and querying the table.
-->
<p>
The following examples demonstrate creating an Avro table in Impala, using either an inline column
specification or one taken from a JSON file stored in HDFS:
@@ -502,41 +497,92 @@ ALTER TABLE avro_table SET TBLPROPERTIES (
<title>Data Type Considerations for Avro Tables</title>
<conbody>
<p>
The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
and the equivalent types in Impala.
</p>
<codeblock><![CDATA[Primitive Types (Avro -> Impala)
--------------------------------
STRING -> STRING
STRING -> CHAR
STRING -> VARCHAR
INT -> INT
BOOLEAN -> BOOLEAN
LONG -> BIGINT
FLOAT -> FLOAT
DOUBLE -> DOUBLE
Logical Types
-------------
BYTES + logicalType = "decimal" -> DECIMAL
Avro Types with No Impala Equivalent
------------------------------------
RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
Impala Types with No Avro Equivalent
------------------------------------
TIMESTAMP
]]>
</codeblock>
<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
<p> The Avro format defines a set of data types whose names differ from
the names of the corresponding Impala data types. If you are preparing
Avro files using other Hadoop components such as Pig or MapReduce, you
might need to work with the type names defined by Avro. The following
tables list the Avro-defined types and the equivalent types in Impala. </p>
<p><b>Primitive types:</b></p>
<table frame="all" rowsep="1" colsep="1" id="table_uvv_plj_gjb">
<tgroup cols="2" align="left">
<colspec colname="c1" colnum="1" colwidth="143.44pt"/>
<colspec colname="c2" colnum="2" colwidth="165.77pt"/>
<thead>
<row>
<entry>Avro type</entry>
<entry>Impala type</entry>
</row>
</thead>
<tbody>
<row>
<entry>STRING</entry>
<entry>STRING</entry>
</row>
<row>
<entry>STRING</entry>
<entry>CHAR</entry>
</row>
<row>
<entry>STRING</entry>
<entry>VARCHAR</entry>
</row>
<row>
<entry>INT</entry>
<entry>INT</entry>
</row>
<row>
<entry>BOOLEAN</entry>
<entry>BOOLEAN</entry>
</row>
<row>
<entry>LONG</entry>
<entry>BIGINT</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>FLOAT</entry>
</row>
<row>
<entry>DOUBLE</entry>
<entry>DOUBLE</entry>
</row>
</tbody>
</tgroup>
</table>
<p>The Avro specification allows string values up to 2**64 bytes in
length. Impala queries for Avro tables use 32-bit integers to hold
string lengths. </p>
<p>In <keyword keyref="impala25_full"/> and higher, Impala truncates
<codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> values in Avro
tables to (2**31)-1 bytes. If a query encounters a
<codeph>STRING</codeph> value longer than (2**31)-1 bytes in an Avro
table, the query fails. In earlier releases, encountering such long
values in an Avro table could cause a crash.</p>
<p><b>Logical types:</b></p>
<table frame="all" rowsep="1" colsep="1" id="table_ch2_1mj_gjb">
<tgroup cols="2" align="left">
<colspec colname="c1" colnum="1" colwidth="151.26pt"/>
<colspec colname="c2" colnum="2" colwidth="149.58pt"/>
<thead>
<row>
<entry>Avro type</entry>
<entry>Impala type</entry>
</row>
</thead>
<tbody>
<row>
<entry>BYTES annotated with logicalType "decimal"</entry>
<entry>DECIMAL</entry>
</row>
<row>
<entry>INT annotated with logicalType "date"</entry>
<entry>DATE</entry>
</row>
</tbody>
</tgroup>
</table>
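As an illustration of the logical types in the table above, the following Python sketch builds an Avro schema whose fields carry the `decimal` and `date` annotations and serializes it to JSON. The record name, field names, precision, and scale are hypothetical and chosen only for the example; the top-level record is simply the wrapper that holds the column definitions.

```python
import json

# Hypothetical schema: names, precision, and scale are illustrative only.
schema = {
    "type": "record",
    "name": "example_record",
    "fields": [
        {
            # An Avro bytes value annotated as decimal maps to Impala DECIMAL.
            "name": "price",
            "type": {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": 9,
                "scale": 2,
            },
        },
        {
            # An Avro int value annotated as date maps to Impala DATE.
            "name": "sale_date",
            "type": {"type": "int", "logicalType": "date"},
        },
    ],
}

# JSON text of the schema, for example to store in a schema file.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

The resulting JSON is the kind of text an Avro schema file would contain; how it is attached to a table is outside the scope of this sketch.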
<p>Impala does not support the following Avro data types: RECORD, MAP,
ARRAY, UNION, ENUM, FIXED, NULL.</p>
</conbody>
</concept>


@@ -41,44 +41,38 @@ under the License.
<conbody>
<p>
Use the <codeph>DATE</codeph> data type to store date values. The <codeph>DATE</codeph>
type is supported for HBase, Text, and Parquet.
</p>
<p> Use the <codeph>DATE</codeph> data type to store date values. The
<codeph>DATE</codeph> type is supported for HBase, Text, Avro, and
Parquet. </p>
<p>
<b>Range:</b>
</p>
<p>
0000-01-01 to 9999-12-31
</p>
<p> 0001-01-01 to 9999-12-31 </p>
<p>
<b>Literals and expressions:</b>
</p>
<p>
The <codeph>DATE</codeph> literals are in the form of <codeph>DATE'YYYY-MM-DD'</codeph>.
For example, <codeph>DATE '2013-01-01'</codeph>
<p> The <codeph>DATE</codeph> literals are in the form of
<codeph>DATE'YYYY-MM-DD'</codeph>. For example, <codeph>DATE
'2013-01-01'</codeph>
</p>
<p>
<b>Parquet considerations:</b>
<b>Parquet and Avro considerations:</b>
</p>
<p>
Parquet uses <codeph>DATE</codeph> logical type for dates. The <codeph>DATE</codeph>
logical type annotates an <codeph>INT32</codeph> that stores the number of days from the
Unix epoch, January 1, 1970. This representation introduces a parquet interoperability
issue between Impala and older versions of Hive:
</p>
<p> Parquet and Avro use the <codeph>DATE</codeph> logical type for dates. The
<codeph>DATE</codeph> logical type annotates an <codeph>INT32</codeph>
that stores the number of days from the Unix epoch, January 1, 1970. This
representation introduces an interoperability issue between Impala and
older versions of Hive: </p>
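The days-from-epoch representation described above can be sketched in a few lines of Python. This is only an illustration of the encoding, not Impala's implementation; the helper names `date_to_days` and `days_to_date` are hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)

def date_to_days(d):
    """Return the number of days from the Unix epoch to d (negative before 1970)."""
    return (d - UNIX_EPOCH).days

def days_to_date(days):
    """Return the calendar date that lies the given number of days from the Unix epoch."""
    return UNIX_EPOCH + timedelta(days=days)

# The literal DATE '2013-01-01' corresponds to this stored integer.
stored = date_to_days(date(2013, 1, 1))
print(stored)                 # 15706
print(days_to_date(stored))   # 2013-01-01
```

Python's `datetime.date` uses the proleptic Gregorian calendar, so the integers produced here assume that calendar for all dates.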
<p>
If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a parquet table,
those dates will be read back incorrectly by Impala and vice versa. In Hive 3.1 and
higher, this is no longer an issue.
</p>
<p> If Hive versions lower than 3.1 wrote dates earlier than 1582-10-15 to a
Parquet or Avro table, those dates would be read back incorrectly by
Impala and vice versa. In Hive 3.1 and higher, this is no longer an issue. </p>
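The sketch below shows one way such a mismatch can arise. It assumes, as an explanation not stated in this text, that the writer interprets dates before 1582-10-15 in the Julian calendar while the reader uses the proleptic Gregorian calendar. All function names are hypothetical, and the Julian Day Number formula is the standard integer arithmetic for Julian-calendar dates.

```python
from datetime import date

EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 in the Gregorian calendar

def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def days_proleptic_gregorian(year, month, day):
    """Days from the Unix epoch, reading the date in the proleptic Gregorian calendar."""
    return date(year, month, day).toordinal() - date(1970, 1, 1).toordinal()

def days_hybrid(year, month, day):
    """Days from the Unix epoch, reading dates before 1582-10-15 in the Julian calendar."""
    if (year, month, day) >= (1582, 10, 15):
        return days_proleptic_gregorian(year, month, day)
    return julian_calendar_to_jdn(year, month, day) - EPOCH_JDN

# From the cutover date onward both interpretations agree.
print(days_hybrid(1582, 10, 15) - days_proleptic_gregorian(1582, 10, 15))  # 0
# Before the cutover the stored integers differ, so the date reads back shifted.
print(days_hybrid(1582, 10, 4) - days_proleptic_gregorian(1582, 10, 4))    # 10
```

Under this assumption, the same calendar date before the cutover is stored as a different integer by each system, which is why it would be read back as a different date.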
<p>
<b>Explicit casting between DATE and other data types:</b>