mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Also, cleaned up confusing examples. Change-Id: Id89dcf44e31f1bc56d888527585b3ec90229981a Reviewed-on: http://gerrit.cloudera.org:8080/11022 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
250 lines
8.4 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="char" rev="2.0.0">
|
|
|
|
<title>CHAR Data Type (<keyword keyref="impala20"/> or higher only)</title>
|
|
|
|
<titlealts audience="PDF">
|
|
|
|
<navtitle>CHAR</navtitle>
|
|
|
|
</titlealts>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Impala Data Types"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Schemas"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p rev="2.0.0">
|
|
A fixed-length character type, padded with trailing spaces if necessary to achieve the
|
|
specified length. If values are longer than the specified length, Impala truncates any
|
|
trailing characters.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
|
|
|
|
<p>
|
|
In the column definition of a <codeph>CREATE TABLE</codeph> statement:
|
|
</p>
|
|
|
|
<codeblock><varname>column_name</varname> CHAR(<varname>length</varname>)</codeblock>
|
|
|
|
<p>
|
|
The maximum <varname>length</varname> you can specify is 255.
|
|
</p>
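<p>
As a minimal sketch of the syntax above, the following statement declares two
<codeph>CHAR</codeph> columns. The table and column names are hypothetical, chosen only
for illustration:
</p>
<codeblock>-- Hypothetical table; both lengths are within the 255 limit.
CREATE TABLE char_demo (
  country_code CHAR(2),
  product_code CHAR(10)
);</codeblock>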
|
|
|
|
<p>
|
|
<b>Semantics of trailing spaces:</b>
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
When you store a <codeph>CHAR</codeph> value shorter than the specified length in a
|
|
table, queries return the value padded with trailing spaces if necessary; the resulting
|
|
value has the same length as specified in the column definition.
|
|
</li>
|
|
|
|
<li>
|
|
Leading spaces in <codeph>CHAR</codeph> are preserved within the data file.
|
|
</li>
|
|
|
|
<li>
|
|
If you store a <codeph>CHAR</codeph> value containing trailing spaces in a table, those
|
|
trailing spaces are not stored in the data file. When the value is retrieved by a query,
|
|
the result could have a different number of trailing spaces. That is, the value includes
|
|
however many spaces are needed to pad it to the specified length of the column.
|
|
</li>
|
|
|
|
<li>
|
|
If you compare two <codeph>CHAR</codeph> values that differ only in the number of
|
|
trailing spaces, those values are considered identical.
|
|
</li>
|
|
|
|
<li>
|
|
When comparing or processing <codeph>CHAR</codeph> values:
|
|
<ul>
|
|
<li>
|
|
<codeph>CAST()</codeph> truncates any longer string to fit within
the defined length and pads any shorter string with trailing spaces. For example:
<codeblock>SELECT CAST('x' AS CHAR(4)) = CAST('x ' AS CHAR(4)); -- Returns TRUE.
</codeblock>
|
|
</li>
|
|
<li>
|
|
If a <codeph>CHAR</codeph> value is shorter than the specified
|
|
length, it is padded on the right with spaces until it matches the
|
|
specified length.
|
|
</li>
|
|
<li>
|
|
<codeph>CHAR_LENGTH()</codeph> returns the length including any
|
|
trailing spaces.
|
|
</li>
|
|
<li>
|
|
<codeph>LENGTH()</codeph> returns the length excluding trailing
|
|
spaces.
|
|
</li>
|
|
<li>
|
|
<codeph>CONCAT()</codeph> returns a string that includes any trailing
spaces.
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
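<p>
The rules in the list above can be sketched with literal values, using only the built-in
functions already named there. The results shown in the comments are the ones those rules
imply, not captured query output:
</p>
<codeblock>-- Longer strings are truncated to the declared length.
SELECT CAST('abcdef' AS CHAR(4));              -- Expected: 'abcd'

-- CHAR_LENGTH() counts the padding; LENGTH() does not.
SELECT CHAR_LENGTH(CAST('x' AS CHAR(4)));      -- Expected: 4
SELECT LENGTH(CAST('x' AS CHAR(4)));           -- Expected: 1

-- CONCAT() keeps the padding of its CHAR argument, so the
-- bracket is expected to appear after the trailing spaces.
SELECT CONCAT(CAST('x' AS CHAR(4)), ']');</codeblock>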
|
|
|
|
<p conref="../shared/impala_common.xml#common/partitioning_bad"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/hbase_no"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/parquet_blurb"/>
|
|
|
|
<ul>
|
|
<li>
|
|
This type can be read from and written to Parquet files.
|
|
</li>
|
|
|
|
<li>
|
|
There is no requirement for a particular level of Parquet.
|
|
</li>
|
|
|
|
<li>
|
|
Parquet files generated by Impala and containing this type can be freely interchanged
|
|
with other components such as Hive and MapReduce.
|
|
</li>
|
|
|
|
<li>
|
|
Any trailing spaces, whether implicitly or explicitly specified, are not written to the
|
|
Parquet data files.
|
|
</li>
|
|
|
|
<li>
|
|
Parquet data files might contain values that are longer than allowed by the
|
|
<codeph>CHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra
|
|
trailing characters when it processes those values during a query.
|
|
</li>
|
|
</ul>
|
|
|
|
<p conref="../shared/impala_common.xml#common/text_blurb"/>
|
|
|
|
<p>
|
|
Text data files might contain values that are longer than allowed for a particular
|
|
<codeph>CHAR(<varname>n</varname>)</codeph> column. Any extra trailing characters are
|
|
ignored when Impala processes those values during a query. Text data files can also
|
|
contain values that are shorter than the defined length limit, and Impala pads them with
|
|
trailing spaces up to the specified length. Any text data files produced by Impala
|
|
<codeph>INSERT</codeph> statements do not include any trailing blanks for
|
|
<codeph>CHAR</codeph> columns.
|
|
</p>
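<p>
A small sketch of the padding behavior for text tables, assuming a hypothetical table
named <codeph>char_text_demo</codeph>. The value is written without trailing blanks and is
expected to be padded again when it is read back:
</p>
<codeblock>-- Hypothetical text-format table with a single CHAR(5) column.
CREATE TABLE char_text_demo (c CHAR(5)) STORED AS TEXTFILE;

-- The explicit CAST() makes the literal a CHAR(5) value.
INSERT INTO char_text_demo VALUES (CAST('ab' AS CHAR(5)));

-- Per the description above, the length includes the padding: expected 5.
SELECT CHAR_LENGTH(c) FROM char_text_demo;</codeblock>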
|
|
|
|
<p>
|
|
<b>Avro considerations:</b>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
|
|
|
|
<p>
|
|
This type is available using <keyword keyref="impala20_full"/> or higher.
|
|
</p>
|
|
|
|
<p>
|
|
Some other database systems make the length specification optional. For Impala, the length
|
|
is required.
|
|
</p>
|
|
|
|
<!--
|
|
<p>
|
|
The Impala maximum length is larger than for the <codeph>CHAR</codeph> data type in Hive.
|
|
If a Hive query encounters a <codeph>CHAR</codeph> value longer than 255 during processing,
|
|
it silently treats the value as length 255.
|
|
</p>
|
|
-->
|
|
|
|
<p conref="../shared/impala_common.xml#common/internals_max_bytes"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/added_in_20"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/udf_blurb_no"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
|
|
|
|
<p>
|
|
<b>Performance consideration:</b>
|
|
</p>
|
|
|
|
<p>
|
|
The <codeph>CHAR</codeph> type currently has no Impala Codegen support. Use
<codeph>VARCHAR</codeph> or <codeph>STRING</codeph> instead of <codeph>CHAR</codeph>
where possible, because the performance gain of Codegen outweighs the benefits of
fixed-width <codeph>CHAR</codeph>.
|
|
</p>
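<p>
One way to follow this recommendation, sketched with a hypothetical table, is to declare
the same columns as <codeph>VARCHAR</codeph> or <codeph>STRING</codeph>:
</p>
<codeblock>-- Hypothetical table; names and lengths are for illustration only.
CREATE TABLE customer_names (
  short_code VARCHAR(10),  -- instead of CHAR(10)
  full_name  STRING        -- instead of a wide CHAR column
);</codeblock>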
|
|
|
|
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
|
|
|
|
<p>
|
|
Because the blank-padding behavior requires allocating the maximum length for each value
|
|
in memory, for scalability reasons, you should avoid declaring <codeph>CHAR</codeph>
|
|
columns that are much longer than typical values in that column.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/blobs_are_strings"/>
|
|
|
|
<p>
|
|
When an expression compares a <codeph>CHAR</codeph> with a <codeph>STRING</codeph> or
<codeph>VARCHAR</codeph>, the <codeph>CHAR</codeph> value is implicitly converted to
<codeph>STRING</codeph> first, with trailing spaces preserved. As a result, the comparison
can return <codeph>FALSE</codeph> when the two values differ only in trailing spaces.
|
|
</p>
|
|
|
|
<p>
|
|
This behavior differs from other popular database systems. To get the expected result of
|
|
<codeph>TRUE</codeph>, cast the expressions on both sides to <codeph>CHAR</codeph> values
|
|
of the appropriate length. For example:
|
|
</p>
|
|
|
|
<codeblock>SELECT CAST('foo ' AS CHAR(5)) = CAST('foo' AS CHAR(3)); -- Returns TRUE.</codeblock>
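<p>
For contrast, the following sketch places an uncast comparison next to a cast one. The
results in the comments follow from the conversion rule described above and are not
captured query output:
</p>
<codeblock>-- The CHAR(5) value is converted to the STRING 'foo  ' (padding kept),
-- which differs from the STRING 'foo'.
SELECT CAST('foo' AS CHAR(5)) = 'foo';                      -- Expected: FALSE

-- With both sides cast to CHAR, trailing spaces are not significant.
SELECT CAST('foo' AS CHAR(5)) = CAST('foo' AS CHAR(5));     -- Expected: TRUE</codeblock>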
|
|
|
|
<p>
|
|
This behavior is subject to change in future releases.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/related_info"/>
|
|
|
|
<p>
|
|
<xref href="impala_string.xml#string"/>, <xref href="impala_varchar.xml#varchar"/>,
|
|
<xref href="impala_literals.xml#string_literals"/>,
|
|
<xref href="impala_string_functions.xml#string_functions"/>
|
|
</p>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|