mirror of
https://github.com/apache/impala.git
synced 2025-12-30 03:01:44 -05:00
For this change to land in master, the audience="hidden" code review needs to be completed first. Otherwise, the doc build would still work but the audience="hidden" content would be visible rather than hidden as desired.

Some work happening in parallel might introduce additional instances of audience="Cloudera". I suggest addressing those in a followup CR so this global change can land quickly.

Since the changes apply across so many different files, but are so narrow in scope, I suggest that the way to validate (check that no extraneous changes were introduced accidentally) is to diff just the changed lines:

  git diff -U0 HEAD^ HEAD

In patch set 2, I updated other topics marked audience="Cloudera" by CRs that were pushed in the meantime.

Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
236 lines
8.0 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="varchar" rev="2.0.0">

  <title>VARCHAR Data Type (<keyword keyref="impala20"/> or higher only)</title>
  <titlealts audience="PDF"><navtitle>VARCHAR</navtitle></titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Data Types"/>
      <data name="Category" value="SQL"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Schemas"/>
    </metadata>
  </prolog>

  <conbody>

    <p rev="2.0.0">
      <indexterm audience="hidden">VARCHAR data type</indexterm>
      A variable-length character type, truncated during processing if necessary to fit within the specified
      length.
    </p>

    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

    <p>
      In the column definition of a <codeph>CREATE TABLE</codeph> statement:
    </p>

<codeblock><varname>column_name</varname> VARCHAR(<varname>max_length</varname>)</codeblock>

    <p>
      The maximum length you can specify is 65,535.
    </p>
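
    <p>
      As a minimal sketch of this syntax, the following statement declares columns with different maximum
      lengths, including the largest allowed value. The table and column names are hypothetical, chosen only
      for illustration.
    </p>

<codeblock>-- Hypothetical table; only the VARCHAR(n) declarations matter here.
create table varchar_lengths
(
  short_code varchar(4),
  description varchar(200),
  max_len varchar(65535)  -- 65,535 is the largest length you can specify
);
</codeblock>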

    <p conref="../shared/impala_common.xml#common/partitioning_bad"/>

<!--
    <p>
      This type can be used for partition key columns.
      Because of the efficiency advantage of numeric values over character-based values,
      if the partition key is a string representation of a number,
      prefer to use an integer data type with sufficient range (<codeph>INT</codeph>,
      <codeph>BIGINT</codeph>, and so on) rather than this type.
    </p>
-->

    <p conref="../shared/impala_common.xml#common/hbase_no"/>

    <p conref="../shared/impala_common.xml#common/parquet_blurb"/>

    <ul>
      <li>
        This type can be read from and written to Parquet files.
      </li>

      <li>
        There is no requirement for a particular level of Parquet.
      </li>

      <li>
        Parquet files generated by Impala and containing this type can be freely interchanged with other components
        such as Hive and MapReduce.
      </li>

      <li>
        Parquet data files can contain values that are longer than allowed by the
        <codeph>VARCHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra trailing characters
        when it processes those values during a query.
      </li>
    </ul>

    <p conref="../shared/impala_common.xml#common/text_blurb"/>

    <p>
      Text data files can contain values that are longer than allowed by the
      <codeph>VARCHAR(<varname>n</varname>)</codeph> length limit. Any extra trailing characters are ignored when
      Impala processes those values during a query.
    </p>

    <p><b>Avro considerations:</b></p>
    <p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>

    <p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>

    <p>
      You can use <codeph>ALTER TABLE ... CHANGE</codeph> to switch column data types to and from
      <codeph>VARCHAR</codeph>. You can convert from <codeph>STRING</codeph> to
      <codeph>VARCHAR(<varname>n</varname>)</codeph>, or from <codeph>VARCHAR(<varname>n</varname>)</codeph> to
      <codeph>STRING</codeph>, or from <codeph>CHAR(<varname>n</varname>)</codeph> to
      <codeph>VARCHAR(<varname>n</varname>)</codeph>, or from <codeph>VARCHAR(<varname>n</varname>)</codeph> to
      <codeph>CHAR(<varname>n</varname>)</codeph>. When switching back and forth between <codeph>VARCHAR</codeph>
      and <codeph>CHAR</codeph>, you can also change the length value. This schema evolution works the same for
      tables using any file format. If a table contains values longer than the maximum length defined for a
      <codeph>VARCHAR</codeph> column, Impala does not return an error. Any extra trailing characters are ignored
      when Impala processes those values during a query.
    </p>
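
    <p>
      The following sketch walks a single column through the conversions described above. The table name
      <codeph>varchar_evolution</codeph> and the column name <codeph>s</codeph> are hypothetical; the lengths
      are arbitrary values chosen for illustration.
    </p>

<codeblock>-- Hypothetical table used only to illustrate the conversions.
create table varchar_evolution (s string);

-- STRING to VARCHAR(n)
alter table varchar_evolution change s s varchar(10);

-- VARCHAR(n) to CHAR(n), changing the length at the same time
alter table varchar_evolution change s s char(5);

-- CHAR(n) back to VARCHAR(n), with a different length
alter table varchar_evolution change s s varchar(20);

-- VARCHAR(n) back to STRING
alter table varchar_evolution change s s string;
</codeblock>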

    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>

    <p>
      This type is available on CDH 5.2 or higher.
    </p>

    <p conref="../shared/impala_common.xml#common/internals_min_bytes"/>

    <p conref="../shared/impala_common.xml#common/added_in_20"/>

    <p conref="../shared/impala_common.xml#common/column_stats_variable"/>

    <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>

    <p conref="../shared/impala_common.xml#common/blobs_are_strings"/>

    <p conref="../shared/impala_common.xml#common/example_blurb"/>

    <p>
      The following examples show how long and short <codeph>VARCHAR</codeph> values are treated. Values longer
      than the maximum specified length are truncated by <codeph>CAST()</codeph>, or when queried from existing
      data files. Values shorter than the maximum specified length are represented as the actual length of the
      value, with no extra padding as seen with <codeph>CHAR</codeph> values.
    </p>

<codeblock>create table varchar_1 (s varchar(1));
create table varchar_4 (s varchar(4));
create table varchar_20 (s varchar(20));

insert into varchar_1 values (cast('a' as varchar(1))), (cast('b' as varchar(1))), (cast('hello' as varchar(1))), (cast('world' as varchar(1)));
insert into varchar_4 values (cast('a' as varchar(4))), (cast('b' as varchar(4))), (cast('hello' as varchar(4))), (cast('world' as varchar(4)));
insert into varchar_20 values (cast('a' as varchar(20))), (cast('b' as varchar(20))), (cast('hello' as varchar(20))), (cast('world' as varchar(20)));

select * from varchar_1;
+---+
| s |
+---+
| a |
| b |
| h |
| w |
+---+
select * from varchar_4;
+------+
| s    |
+------+
| a    |
| b    |
| hell |
| worl |
+------+
select * from varchar_20;
+-------+
| s     |
+-------+
| a     |
| b     |
| hello |
| world |
+-------+
select concat('[',s,']') as s from varchar_20;
+---------+
| s       |
+---------+
| [a]     |
| [b]     |
| [hello] |
| [world] |
+---------+
</codeblock>

    <p>
      The following example shows how identical <codeph>VARCHAR</codeph> values compare as equal, even if the
      columns are defined with different maximum lengths. Both tables contain <codeph>'a'</codeph> and
      <codeph>'b'</codeph> values. The longer <codeph>'hello'</codeph> and <codeph>'world'</codeph> values were
      truncated to <codeph>'h'</codeph> and <codeph>'w'</codeph> when inserted into the
      <codeph>VARCHAR_1</codeph> table, so they no longer match the untruncated values in the
      <codeph>VARCHAR_20</codeph> table and do not appear in the join result.
    </p>

<codeblock>select s from varchar_1 join varchar_20 using (s);
+-------+
| s     |
+-------+
| a     |
| b     |
+-------+
</codeblock>

    <p>
      The following examples show how <codeph>VARCHAR</codeph> values are freely interchangeable with
      <codeph>STRING</codeph> values in contexts such as comparison operators and built-in functions:
    </p>

<codeblock>select length(cast('foo' as varchar(100))) as length;
+--------+
| length |
+--------+
| 3      |
+--------+
select cast('xyz' as varchar(5)) > cast('abc' as varchar(10)) as greater;
+---------+
| greater |
+---------+
| true    |
+---------+
</codeblock>

    <p conref="../shared/impala_common.xml#common/udf_blurb_no"/>

    <p conref="../shared/impala_common.xml#common/related_info"/>

    <p>
      <xref href="impala_string.xml#string"/>, <xref href="impala_char.xml#char"/>,
      <xref href="impala_literals.xml#string_literals"/>,
      <xref href="impala_string_functions.xml#string_functions"/>
    </p>
  </conbody>
</concept>