mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. It is also partly based on the abandoned PR
https://gerrit.cloudera.org/#/c/4474 and on the similar EXTRACT-FROM
functionality from
https://github.com/apache/impala/commit/543fa73f3a846f0e4527514c993cb0985912b06c.
Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);
"where": Case-insensitive trim direction. Valid options are "leading",
"trailing", and "both". "leading" means trimming characters from the
start; "trailing" means trimming characters from the end; "both" means
trimming characters from both sides. For Syntax #2, since no "where"
is specified, the option "both" is implied by default.
"charset": Case-sensitive characters to be removed. This argument is
treated as a set of characters to remove. The order of the characters
does not matter, and duplicate characters are ignored. A NULL argument
implies " " (standard space). An empty argument ("" or '') makes TRIM
return the string unchanged. For Syntax #1, since no "charset" is
specified, TRIM removes " " (standard space) by default.
"string": Case-sensitive target string to trim. This argument can be
NULL.
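Illustrative queries for the three syntaxes (a sketch only: the literals
are made up for this note, and the results in the comments are what the
semantics described above imply, not captured output):
  SELECT TRIM(LEADING FROM '  abc  ');      -- Syntax #1: 'abc  '
  SELECT TRIM('xy' FROM 'yxabcxxy');        -- Syntax #2: 'abc' ("both" implied)
  SELECT TRIM(TRAILING 'x' FROM 'xabcxx');  -- Syntax #3: 'xabc'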
The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().
UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.
Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.
2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.
3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. A keyword would generally make parsing easier
and more robust, but it would also restrict use of the token in
general contexts. However, leading/trailing/both, which were previously
reserved words, are now added as keywords so that they can be used
without escaping.
Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
1879 lines
61 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
|
|
<concept id="string_functions">
|
|
|
|
<title>Impala String Functions</title>
|
|
|
|
<titlealts audience="PDF">
|
|
|
|
<navtitle>String Functions</navtitle>
|
|
|
|
</titlealts>
|
|
|
|
<prolog>
|
|
<metadata>
|
|
<data name="Category" value="Impala"/>
|
|
<data name="Category" value="Impala Functions"/>
|
|
<data name="Category" value="SQL"/>
|
|
<data name="Category" value="Data Analysts"/>
|
|
<data name="Category" value="Developers"/>
|
|
<data name="Category" value="Querying"/>
|
|
</metadata>
|
|
</prolog>
|
|
|
|
<conbody>
|
|
|
|
<p rev="2.0.0">
|
|
String functions are classified as those primarily accepting or returning
|
|
<codeph>STRING</codeph>, <codeph>VARCHAR</codeph>, or <codeph>CHAR</codeph> data types,
|
|
for example to measure the length of a string or concatenate two strings together.
|
|
<ul>
|
|
<li>
|
|
All the functions that accept <codeph>STRING</codeph> arguments also accept the
|
|
<codeph>VARCHAR</codeph> and <codeph>CHAR</codeph> types introduced in Impala 2.0.
|
|
</li>
|
|
|
|
<li>
|
|
Whenever <codeph>VARCHAR</codeph> or <codeph>CHAR</codeph> values are passed to a
|
|
function that returns a string value, the return type is normalized to
|
|
<codeph>STRING</codeph>. For example, a call to <codeph>CONCAT()</codeph> with a mix
|
|
of <codeph>STRING</codeph>, <codeph>VARCHAR</codeph>, and <codeph>CHAR</codeph>
|
|
arguments produces a <codeph>STRING</codeph> result.
|
|
</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/related_info"/>
|
|
|
|
<p>
|
|
The string functions operate mainly on these data types:
|
|
<xref href="impala_string.xml#string"/>, <xref href="impala_varchar.xml#varchar"/>, and
|
|
<xref href="impala_char.xml#char"/>.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Function reference:</b>
|
|
</p>
|
|
|
|
<p>
|
|
Impala supports the following string functions:
|
|
</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<xref href="#string_functions/ascii">ASCII</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/base64decode">BASE64DECODE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/base64encode">BASE64ENCODE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/btrim">BTRIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/bytes">BYTES</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/char_length">CHAR_LENGTH</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/chr">CHR</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/concat">CONCAT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/concat_ws">CONCAT_WS</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/find_in_set">FIND_IN_SET</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/group_concat">GROUP_CONCAT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/initcap">INITCAP</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/instr">INSTR</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/jaro_distance">JARO_DISTANCE, JARO_DST</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/jaro_similarity">JARO_SIMILARITY, JARO_SIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/jaro_winkler_distance">JARO_WINKLER_DISTANCE, JW_DST</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/jaro_winkler_similarity">JARO_WINKLER_SIMILARITY, JW_SIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/left">LEFT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/length">LENGTH</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/levenshtein">LEVENSHTEIN, LE_DST</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/locate">LOCATE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/lower">LOWER, LCASE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/lpad">LPAD</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/ltrim">LTRIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/parse_url">PARSE_URL</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/prettyprint_bytes">PRETTYPRINT_BYTES</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/regexp_escape">REGEXP_ESCAPE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/regexp_extract">REGEXP_EXTRACT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/regexp_like">REGEXP_LIKE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/regexp_replace">REGEXP_REPLACE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/repeat">REPEAT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/replace">REPLACE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/reverse">REVERSE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/right">RIGHT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/rpad">RPAD</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/rtrim">RTRIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/space">SPACE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/split_part">SPLIT_PART</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/strleft">STRLEFT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/strright">STRRIGHT</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/substr">SUBSTR, SUBSTRING</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/translate">TRANSLATE</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/trim">TRIM</xref>
|
|
</li>
|
|
|
|
<li>
|
|
<xref href="#string_functions/upper">UPPER, UCASE</xref>
|
|
</li>
|
|
</ul>
|
|
|
|
<dl>
|
|
<dlentry id="ascii">
|
|
|
|
<dt>
|
|
ASCII(STRING str)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the numeric ASCII code of the first character of the argument.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
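<p>
As a minimal sketch of the behavior described above (the literals are
illustrative only), the result depends solely on the first character of
the argument:
</p>

<codeblock>-- 'A' has ASCII code 65.
select ascii('A');

-- Only the first character, 'a' (code 97), is examined.
select ascii('abc');
</codeblock>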
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="base64decode" rev="2.6.0 IMPALA-2107">
|
|
|
|
<dt>
|
|
BASE64DECODE(STRING str)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b>
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
For general information about Base64 encoding, see
|
|
<xref
|
|
keyref="base64"/>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_use_cases"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_charset"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_alignment"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_error_handling"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_examples"/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="base64encode" rev="2.6.0 IMPALA-2107">
|
|
|
|
<dt>
|
|
BASE64ENCODE(STRING str)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b>
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
For general information about Base64 encoding, see
|
|
<xref
|
|
keyref="base64"/>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_use_cases"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_charset"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_alignment"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/base64_examples"/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="2.3.0" id="btrim">
|
|
|
|
<dt>
|
|
BTRIM(STRING a), BTRIM(STRING a, STRING chars_to_trim)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Removes all instances of one or more characters from the start and end
|
|
of a <codeph>STRING</codeph> value. By default, removes only spaces. If a
|
|
non-<codeph>NULL</codeph> optional second argument is specified, the function removes
|
|
all occurrences of characters in that second argument from the beginning and end of
|
|
the string.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/added_in_230"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p>
|
|
The following examples show the default <codeph>BTRIM()</codeph> behavior, and what
|
|
changes when you specify the optional second argument. All the examples bracket the
|
|
output value with <codeph>[ ]</codeph> so that you can see any leading or trailing
|
|
spaces in the <codeph>BTRIM()</codeph> result. By default, the function removes any
number of both leading and trailing spaces. When the second argument is specified,
|
|
any number of occurrences of any character in the second argument are removed from
|
|
the start and end of the input string; in this case, spaces are not removed (unless
|
|
they are part of the second argument) and any instances of the characters are not
|
|
removed if they do not come right at the beginning or end of the string.
|
|
</p>
|
|
<codeblock>-- Remove multiple spaces before and one space after.
|
|
select concat('[',btrim(' hello '),']');
|
|
+---------------------------------------+
|
|
| concat('[', btrim(' hello '), ']') |
|
|
+---------------------------------------+
|
|
| [hello] |
|
|
+---------------------------------------+
|
|
|
|
-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
|
|
select concat('[',btrim('xy hello zyzzxx','xyz'),']');
|
|
+------------------------------------------------------+
|
|
| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') |
|
|
+------------------------------------------------------+
|
|
| [ hello ] |
|
|
+------------------------------------------------------+
|
|
|
|
-- Remove any instances of x or y or z at beginning or end.
|
|
-- Leave x, y, z alone in the middle of the string.
|
|
select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
|
|
+----------------------------------------------------+
|
|
| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
|
|
+----------------------------------------------------+
|
|
| [helxyzlo] |
|
|
+----------------------------------------------------+
|
|
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="bytes">
|
|
<dt>BYTES</dt>
|
|
<dd><b>Purpose:</b> Returns the number of bytes contained in the specified byte string. <p>
|
|
<b>Syntax:</b> BYTES (<i>byte_expression</i>)</p><p>Where:</p><p><i>byte_expression</i>
|
|
is the byte string for which the number of bytes is to be returned.</p><p
|
|
conref="../shared/impala_common.xml#common/usage_notes_blurb"/><p>The BYTES function is
|
|
similar to the LENGTH() function except that it always returns the number of bytes,
regardless of whether UTF-8 mode is turned ON or OFF.</p><p>The
|
|
following is the list of supported string data types to be used in
|
|
<i>byte_expression</i>:</p><ul>
|
|
<li>STRING</li>
|
|
<li>VARCHAR</li>
|
|
<li>CHAR</li>
|
|
</ul><note>The BYTES (<i>byte_expression</i>) function counts the trailing zeros because
|
|
trailing zero bytes are considered bytes. In the case of a fixed <i>length</i> column,
|
|
the length of the value is always equal to the length defined for the column whereas the
|
|
length of the value in a <i>variable</i> length column is always equal to the number of
|
|
bytes, including any trailing zero bytes, contained in that value.</note><p
|
|
conref="../shared/impala_common.xml#common/example_blurb"/><p>The following example
|
|
obtains the number of bytes in “cloudera” by applying the BYTES function to the literal
<codeph>'cloudera'</codeph> cast to
|
|
VARCHAR.</p><codeblock>SELECT bytes(cast('cloudera' as varchar(20))) FROM xyz;
|
|
+----------------------------------------+
|
|
| bytes(cast('cloudera' as varchar(20))) |
|
|
+----------------------------------------+
|
|
| 8 |
|
|
+----------------------------------------+
|
|
</codeblock><note>Use the TRIM function on the <i>byte_expression</i> to
|
|
exclude the trailing blanks included in the byte count for a data value.</note>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="1.3.0" id="char_length">
|
|
|
|
<dt>
|
|
CHAR_LENGTH(STRING a), CHARACTER_LENGTH(STRING a)
|
|
</dt>
|
|
|
|
<dd rev="IMPALA-6391 IMPALA-2172">
|
|
<b>Purpose:</b> Returns the length in characters of the argument string, including any
|
|
trailing spaces that pad a <codeph>CHAR</codeph> value.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
When applied to a <codeph>STRING</codeph> value, it returns the same result as the
|
|
<codeph>length()</codeph> function. When applied to a <codeph>CHAR</codeph> value,
|
|
it might return a larger value than <codeph>length()</codeph> does, to account for
|
|
trailing spaces in the <codeph>CHAR</codeph>.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/length_demo"/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="2.3.0" id="chr">
|
|
|
|
<dt>
|
|
CHR(INT character_code)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a character specified by a decimal code point value. The
|
|
interpretation and display of the resulting character depends on your system locale.
|
|
Because consistent processing of Impala string values is only guaranteed for values
|
|
within the ASCII range, only use this function for values corresponding to ASCII
|
|
characters. In particular, parameter values greater than 255 return an empty string.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b> Can be used as the inverse of the <codeph>ascii()</codeph>
|
|
function, which converts a character to its numeric ASCII code.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/added_in_230"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
<codeblock>SELECT chr(65);
|
|
+---------+
|
|
| chr(65) |
|
|
+---------+
|
|
| A |
|
|
+---------+
|
|
|
|
SELECT chr(97);
|
|
+---------+
|
|
| chr(97) |
|
|
+---------+
|
|
| a |
|
|
+---------+
|
|
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="concat">
|
|
|
|
<dt>
|
|
CONCAT(STRING a, STRING b...)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a single string representing all the argument values joined
|
|
together. If any argument is <codeph>NULL</codeph>, it returns <codeph>NULL</codeph>.
|
|
<p/>
|
|
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/concat_blurb"/>
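<p>
A short sketch of the <codeph>NULL</codeph> rule stated above, using
made-up literals. The results in the comments follow from that rule
rather than from captured output:
</p>

<codeblock>-- All arguments are non-NULL: the result is 'abc'.
select concat('a', 'b', 'c');

-- One argument is NULL, so the whole result is NULL.
select concat('a', NULL, 'c');
</codeblock>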
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="concat_ws">
|
|
|
|
<dt>
|
|
CONCAT_WS(STRING sep, STRING a, STRING b...)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a single string representing the second and following argument
|
|
values joined together, delimited by a specified separator.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/concat_blurb"/>
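<p>
For illustration (the literals are hypothetical), the first argument is
the separator and is inserted between each pair of the remaining
arguments:
</p>

<codeblock>-- Joins the three values with '-' between them: 'a-b-c'.
select concat_ws('-', 'a', 'b', 'c');
</codeblock>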
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="find_in_set">
|
|
|
|
<dt>
|
|
FIND_IN_SET(STRING str, STRING strList)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the position (starting from 1) of the first occurrence of a
|
|
specified string within a comma-separated string. Returns <codeph>NULL</codeph> if
|
|
either argument is <codeph>NULL</codeph>, 0 if the search string is not found, or 0 if
|
|
the search string contains a comma.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
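<p>
The cases listed above can be sketched as follows. The literals are
illustrative, and the results in the comments restate the rules in the
description:
</p>

<codeblock>-- 'b' is the second item in the list: returns 2.
select find_in_set('b', 'a,b,c');

-- 'z' does not occur in the list: returns 0.
select find_in_set('z', 'a,b,c');

-- The search string itself contains a comma: returns 0.
select find_in_set('a,b', 'a,b,c');

-- Either argument NULL: returns NULL.
select find_in_set(NULL, 'a,b,c');
</codeblock>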
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="1.2" id="group_concat">
|
|
|
|
<dt>
|
|
GROUP_CONCAT(STRING s [, STRING sep])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a single string representing the argument value concatenated
|
|
together for each row of the result set. If the optional separator string is
|
|
specified, the separator is added between each pair of concatenated values.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/concat_blurb"/>
|
|
|
|
<p>
|
|
By default, returns a single string covering the whole result set. To include other
|
|
columns or values in the result set, or to produce multiple concatenated strings for
|
|
subsets of rows, include a <codeph>GROUP BY</codeph> clause in the query.
|
|
</p>
|
|
|
|
<p>
|
|
Strictly speaking, <codeph>GROUP_CONCAT()</codeph> is an aggregate function, not a
|
|
scalar function like the others in this list. For additional details and examples,
|
|
see <xref
|
|
href="impala_group_concat.xml#group_concat"/>.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="1.2" id="initcap">
|
|
|
|
<dt>
|
|
INITCAP(STRING str)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the input string with the first letter of each word
|
|
capitalized and all other letters in lowercase.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Example: </b>
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>INITCAP("i gOt mY ChiCkeNs in tHe yard.")</codeph> returns
<codeph>"I Got My Chickens In The Yard."</codeph>.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="instr">
|
|
|
|
<dt>
|
|
INSTR(STRING str, STRING substr <ph rev="IMPALA-3973">[, BIGINT position [, BIGINT
|
|
occurrence ] ]</ph>)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the position (starting from 1) of the first
|
|
occurrence of a <varname>substr</varname> within a longer string. <p>
|
|
<b>Return type:</b>
|
|
<codeph>INT</codeph>
|
|
</p>
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
<p> If the <varname>substr</varname> is not present in
|
|
<varname>str</varname>, the function returns 0. </p>
|
|
<p rev="IMPALA-3973 2.8.0"> The optional third and fourth arguments
|
|
let you find instances of the <varname>substr</varname> other than
|
|
the first instance starting from the left. </p>
|
|
<ul>
|
|
<li> The third argument, <varname>position</varname>, lets you
|
|
specify a starting point within the <varname>str</varname> other
|
|
than 1.
|
|
<codeblock>
|
|
-- Restricting the search to positions 7..end,
|
|
-- the first occurrence of 'b' is at position 9.
|
|
select instr('foo bar bletch', 'b', 7);
|
|
+---------------------------------+
|
|
| instr('foo bar bletch', 'b', 7) |
|
|
+---------------------------------+
|
|
| 9 |
|
|
+---------------------------------+
|
|
</codeblock>
|
|
</li>
|
|
<li> If there are no more occurrences after the specified position,
|
|
the result is 0. </li>
|
|
<li> If <varname>position</varname> is negative, the search works
|
|
right-to-left starting that many characters from the right. The
|
|
return value still represents the position starting from the left
|
|
side of <varname>str</varname>.
|
|
<codeblock rev="IMPALA-3973 2.8.0">
|
|
-- Scanning right to left, the first occurrence of 'o'
|
|
-- is at position 8. (8th character from the left.)
|
|
select instr('hello world','o',-1);
|
|
+-------------------------------+
|
|
| instr('hello world', 'o', -1) |
|
|
+-------------------------------+
|
|
| 8 |
|
|
+-------------------------------+
|
|
|
|
</codeblock></li>
|
|
<li> The fourth argument, <varname>occurrence</varname>, lets you
|
|
specify an occurrence other than the first.
|
|
<codeblock rev="IMPALA-3973 2.8.0">
|
|
-- 2nd occurrence of 'b' is at position 9.
|
|
select instr('foo bar bletch', 'b', 1, 2);
|
|
+------------------------------------+
|
|
| instr('foo bar bletch', 'b', 1, 2) |
|
|
+------------------------------------+
|
|
| 9 |
|
|
+------------------------------------+
|
|
</codeblock></li>
|
|
<li> If <varname>occurrence</varname> is greater than the number of
|
|
matching occurrences, the function returns 0. </li>
|
|
<li>
|
|
<varname>occurrence</varname> cannot be negative or zero. A
|
|
non-positive value for this argument causes an error. </li>
|
|
<li> If either of the optional arguments,
|
|
<varname>position</varname> or <varname>occurrence</varname>, is
|
|
<codeph>NULL</codeph>, the function also returns
|
|
<codeph>NULL</codeph>.</li>
|
|
</ul>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="jaro_distance">
|
|
|
|
<dt>
|
|
JARO_DISTANCE(STRING str1, STRING str2), JARO_DST(STRING str1, STRING str2)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the
|
|
<xref
|
|
href="https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Jaro%E2%80%93Winkler_distance.html"
|
|
format="html" scope="external">Jaro
|
|
distance</xref> between two input strings. The Jaro distance measures how different
two strings are and is the complement of
<codeph>JARO_SIMILARITY()</codeph>, that is, 1 - <codeph>JARO_SIMILARITY()</codeph>.
|
|
<p>
|
|
<b>Return type:</b> <codeph>DOUBLE</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b>
|
|
</p>
|
|
|
|
<p>
|
|
If the two input strings are identical, the function returns 0.0.
|
|
</p>
|
|
|
|
<p>
|
|
If there is no matching character between the input strings, the function returns
|
|
1.0.
|
|
</p>
|
|
|
|
<p>
If either input string is <codeph>NULL</codeph>, the function returns
<codeph>NULL</codeph>.
</p>

<p>
If either input string is longer than 255 characters, the function
returns an error.
|
|
</p>
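<p>
A minimal sketch of the boundary cases in the usage notes, with
illustrative literals. The commented results restate those notes and are
not captured output:
</p>

<codeblock>-- Identical strings: distance 0.0.
select jaro_distance('impala', 'impala');

-- No characters in common: distance 1.0.
select jaro_dst('abc', 'xyz');

-- A NULL input yields NULL.
select jaro_distance('impala', NULL);
</codeblock>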
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="jaro_similarity">
|
|
|
|
<dt>
|
|
JARO_SIMILARITY(STRING str1, STRING str2), JARO_SIM(STRING str1, STRING str2)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the
|
|
<xref
|
|
href="https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Jaro%E2%80%93Winkler_distance.html"
|
|
format="html" scope="external">Jaro
|
|
similarity</xref> of two strings. The higher the Jaro similarity for two strings is,
|
|
the more similar the strings are.
|
|
<p>
|
|
<b>Return type:</b> <codeph>DOUBLE</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b>
|
|
</p>
|
|
|
|
<p>
|
|
If the two input strings are identical, the function returns 1.0.
|
|
</p>
|
|
|
|
<p>
|
|
If there is no matching character between the input strings, the function returns
|
|
0.0.
|
|
</p>
|
|
|
|
<p>
If either input string is <codeph>NULL</codeph>, the function returns
<codeph>NULL</codeph>.
</p>

<p>
If either input string is longer than 255 characters, the function
returns an error.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="jaro_winkler_distance">
|
|
|
|
<dt>
|
|
JARO_WINKLER_DISTANCE(STRING str1, STRING str2[, DOUBLE scaling_factor, DOUBLE
|
|
boost_threshold]), JW_DST(STRING str1, STRING str2[, DOUBLE scaling_factor, DOUBLE
|
|
boost_threshold])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the
|
|
<xref
|
|
href="https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Jaro%E2%80%93Winkler_distance.html"
|
|
format="html" scope="external">Jaro-Winkler
|
|
distance</xref> of two input strings. It is the complement of
<codeph>JARO_WINKLER_SIMILARITY()</codeph>, that is, 1 -
<codeph>JARO_WINKLER_SIMILARITY()</codeph>.
|
|
<p>
|
|
<b>Return type:</b> <codeph>DOUBLE</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b>
|
|
</p>
|
|
|
|
<p>
|
|
If the two input strings are identical, the function returns 0.0.
|
|
</p>
|
|
|
|
<p>
|
|
If there is no matching character between the input strings, the function returns
|
|
1.0.
|
|
</p>
|
|
|
|
<p>
|
|
The function returns an error in the following cases:
|
|
<ul>
<li>
Either input string is longer than 255 characters.
</li>

<li>
<varname>scaling_factor</varname> &lt; 0.0 or <varname>scaling_factor</varname>
&gt; 0.25
</li>

<li>
<varname>boost_threshold</varname> &lt; 0.0 or
<varname>boost_threshold</varname> &gt; 1.0
</li>
</ul>
|
|
</p>
|
|
|
|
<p>
|
|
If either input string is <codeph>NULL</codeph>, the function returns
<codeph>NULL</codeph>.
</p>

<p>
The default <varname>scaling_factor</varname> is 0.1.
|
|
</p>
|
|
|
|
<p>
|
|
The prefix weight will only be applied if the Jaro-distance exceeds the optional
|
|
<varname>boost_threshold</varname>. By default, the
|
|
<varname>boost_threshold</varname> value is 0.7.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="jaro_winkler_similarity">
|
|
|
|
<dt>
|
|
JARO_WINKLER_SIMILARITY(STRING str1, STRING str2[, DOUBLE scaling_factor, DOUBLE
boost_threshold]), JW_SIM(STRING str1, STRING str2[, DOUBLE scaling_factor, DOUBLE
boost_threshold])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the
|
|
<xref
|
|
href="https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Jaro%E2%80%93Winkler_distance.html"
|
|
format="html" scope="external">Jaro-Winkler
|
|
Similarity</xref> between two input strings. The Jaro-Winkler similarity uses a prefix
|
|
weight, specified by <varname>scaling_factor</varname>, which gives more favorable
|
|
ratings to strings that match from the beginning for a set prefix length, up to a
|
|
maximum of four characters.
|
|
<p>
|
|
Use Jaro or Jaro-Winkler functions to perform fuzzy matches on relatively short
|
|
strings, e.g. to scrub user inputs of names against the records in the database.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Return type:</b> <codeph>DOUBLE</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b>
|
|
</p>
|
|
|
|
<p>
|
|
If the two input strings are identical, the function returns 1.0.
|
|
</p>
|
|
|
|
<p>
|
|
If there is no matching character between the input strings, the function returns
|
|
0.0.
|
|
</p>
|
|
|
|
<p>
|
|
The function returns an error in the following cases:
|
|
<ul>
<li>
Either input string is longer than 255 characters.
</li>

<li>
<varname>scaling_factor</varname> &lt; 0.0 or <varname>scaling_factor</varname>
&gt; 0.25
</li>

<li>
<varname>boost_threshold</varname> &lt; 0.0 or
<varname>boost_threshold</varname> &gt; 1.0
</li>
</ul>
|
|
</p>
|
|
|
|
<p>
|
|
If either input string is <codeph>NULL</codeph>, the function returns
<codeph>NULL</codeph>.
</p>

<p>
The default <varname>scaling_factor</varname> is 0.1.
|
|
</p>
|
|
|
|
<p>
|
|
The prefix weight will only be applied if the Jaro-similarity exceeds the optional
|
|
<varname>boost_threshold</varname>. By default, the
|
|
<varname>boost_threshold</varname> value is 0.7.
|
|
</p>
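<p>
The optional arguments can be sketched as follows. The input strings and
argument values are hypothetical, and only results that follow directly
from the usage notes are given in the comments:
</p>

<codeblock>-- Identical strings: similarity 1.0.
select jw_sim('impala', 'impala');

-- Explicitly passing the documented defaults for scaling_factor (0.1)
-- and boost_threshold (0.7).
select jaro_winkler_similarity('crate', 'trace', 0.1, 0.7);

-- scaling_factor greater than 0.25 is outside the allowed range,
-- so this call is expected to return an error.
select jaro_winkler_similarity('crate', 'trace', 0.3, 0.7);
</codeblock>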
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="left">
|
|
|
|
<dt>
|
|
LEFT(STRING a, INT num_chars)
|
|
</dt>
|
|
|
|
<dd>
|
|
See the <codeph>STRLEFT()</codeph> function.
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="length">
|
|
|
|
<dt>
|
|
LENGTH(STRING a)
|
|
</dt>
|
|
|
|
<dd rev="IMPALA-6391 IMPALA-2172">
|
|
<b>Purpose:</b> Returns the length in characters of the argument string, ignoring any
|
|
trailing spaces in <codeph>CHAR</codeph> values.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
When applied to a <codeph>STRING</codeph> value, it returns the same result as the
|
|
<codeph>CHAR_LENGTH()</codeph> function. When applied to a <codeph>CHAR</codeph>
|
|
value, it might return a smaller value than <codeph>CHAR_LENGTH()</codeph> does,
|
|
because <codeph>LENGTH()</codeph> ignores any trailing spaces in the
|
|
<codeph>CHAR</codeph>.
|
|
</p>
|
|
|
|
<note>
|
|
Because the behavior of <codeph>LENGTH()</codeph> with <codeph>CHAR</codeph> values
|
|
containing trailing spaces is not standardized across the industry, when porting
|
|
code from other database systems, evaluate the behavior of <codeph>LENGTH()</codeph>
|
|
on the source system and switch to <codeph>CHAR_LENGTH()</codeph> for Impala if
|
|
necessary.
|
|
</note>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/length_demo"/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
</dl>
|
|
|
|
<dl>
|
|
<dlentry id="levenshtein" rev="3.2">
|
|
|
|
<dt>
|
|
LEVENSHTEIN(STRING str1, STRING str2), <ph id="le_dst"
|
|
>LE_DST(STRING
|
|
str1, STRING str2)</ph>
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the Levenshtein distance between two input strings. The
Levenshtein distance between two strings is the minimum number of single-character
edits (insertions, deletions, or substitutions) required to transform one string into
the other. The function indicates how different the input strings are.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
|
|
<b>Usage notes:</b>
|
|
<p>
|
|
If the input strings are equal, the function returns 0.
|
|
</p>
|
|
|
|
<p>
|
|
If either input exceeds 255 characters, the function returns an error.
|
|
</p>
|
|
|
|
<p>
|
|
If either input string is <codeph>NULL</codeph>, the function returns
|
|
<codeph>NULL</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
If the length of one input string is zero, the function returns the length of the
|
|
other string.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Example:</b>
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>LEVENSHTEIN ('welcome', 'We come')</codeph> returns 2: one edit replaces
'<codeph>w</codeph>' with '<codeph>W</codeph>', and a second edit replaces
'<codeph>l</codeph>' with a space character.
|
|
</p>
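<p>
The usage notes above can be sketched as the following queries. The string literals
are illustrative only, and the results named in the comments follow from the rules
stated above rather than from a captured session:
</p>
<codeblock>-- Equal inputs: the distance is expected to be 0.
select levenshtein('impala', 'impala');

-- One empty input: expected to return the length of the other string, 6.
select levenshtein('', 'impala');

-- A NULL input: expected to return NULL.
select levenshtein(NULL, 'impala');

-- LE_DST() is the alternative name for the same function.
select le_dst('welcome', 'We come');
</codeblock>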
|
|
</dd>
|
|
|
|
</dlentry>
|
|
</dl>
|
|
|
|
<dl>
|
|
<dlentry id="locate">
|
|
|
|
<dt>
|
|
LOCATE(STRING substr, STRING str[, INT pos])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the position (starting from 1) of the first occurrence of a
|
|
substring within a longer string, optionally after a particular position.
|
|
<p>
|
|
<b>Return type:</b> <codeph>INT</codeph>
|
|
</p>
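<p>
A minimal sketch of both call forms, using illustrative literals. The positions in the
comments are derived by counting characters from 1, not copied from a captured
session:
</p>
<codeblock>-- 'bar' first occurs at position 4 of 'foobarbar'.
select locate('bar', 'foobarbar');

-- Starting the search at position 5 skips the first occurrence,
-- so the match at position 7 is expected.
select locate('bar', 'foobarbar', 5);
</codeblock>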
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="lower">
|
|
|
|
<dt>
|
|
LOWER(STRING a), <ph id="lcase">LCASE(STRING a)</ph>
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string converted to all-lowercase.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p
|
|
conref="../shared/impala_common.xml#common/case_insensitive_comparisons_tip"
|
|
/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="lpad">
|
|
|
|
<dt>
|
|
LPAD(STRING str, INT len, STRING pad)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a string of a specified length, based on the first argument
|
|
string. If the specified string is too short, it is padded on the left with a
|
|
repeating sequence of the characters from the pad string. If the specified string is
|
|
too long, it is truncated on the right.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
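<p>
The padding and truncation cases can be sketched as follows. The literals are
illustrative, and the expected values in the comments are worked out by hand from the
description above:
</p>
<codeblock>-- Too short: padded on the left with repetitions of 'xy'; expected 'xyxyxhello'.
select lpad('hello', 10, 'xy');

-- Too long: truncated on the right; expected 'hel'.
select lpad('hello', 3, 'xy');
</codeblock>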
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="ltrim">
|
|
|
|
<dt>
|
|
LTRIM(STRING a [, STRING chars_to_trim])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string with all occurrences of characters
|
|
specified by the second argument removed from the left side. Removes spaces if the
|
|
second argument is not specified.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
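<p>
A short sketch of both forms, with illustrative literals. The brackets added by
<codeph>CONCAT()</codeph> are there only to make the remaining spaces visible:
</p>
<codeblock>-- No second argument: leading spaces are removed, trailing spaces are kept.
-- Expected '[hello  ]'.
select concat('[', ltrim('   hello  '), ']');

-- Any leading run made up of 'x' and 'y' characters is removed; expected 'helloxy'.
select ltrim('xyxhelloxy', 'xy');
</codeblock>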
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="parse_url">
|
|
|
|
<dt>
|
|
PARSE_URL(STRING urlString, STRING partToExtract [, STRING keyToExtract])
|
|
</dt>
|
|
|
|
<dd>
|
|
<indexterm audience="hidden">parse_url() function</indexterm>
|
|
<b>Purpose:</b> Returns the portion of a URL corresponding to a specified part. The
|
|
part argument can be <codeph>'PROTOCOL'</codeph>, <codeph>'HOST'</codeph>,
|
|
<codeph>'PATH'</codeph>, <codeph>'REF'</codeph>, <codeph>'AUTHORITY'</codeph>,
|
|
<codeph>'FILE'</codeph>, <codeph>'USERINFO'</codeph>, or <codeph>'QUERY'</codeph>.
|
|
Uppercase is required for these literal values. When requesting the
|
|
<codeph>QUERY</codeph> portion of the URL, you can optionally specify a key to
|
|
retrieve just the associated value from the key-value pairs in the query string.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b> This function is important for the traditional Hadoop use case
|
|
of interpreting web logs. For example, if the web traffic data features raw URLs not
|
|
divided into separate table columns, you can count visitors to a particular page by
|
|
extracting the <codeph>'PATH'</codeph> or <codeph>'FILE'</codeph> field, or analyze
|
|
search terms by extracting the corresponding key from the <codeph>'QUERY'</codeph>
|
|
field.
|
|
</p>
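<p>
As an illustration of the part names, the following sketch applies
<codeph>PARSE_URL()</codeph> to a made-up URL. The URL and the key names are
assumptions chosen for the example, and the expected values in the comments are
derived by hand:
</p>
<codeblock><![CDATA[-- Expected 'example.com'.
select parse_url('http://example.com/docs/index.html?k1=v1&k2=v2', 'HOST');

-- Expected '/docs/index.html'.
select parse_url('http://example.com/docs/index.html?k1=v1&k2=v2', 'PATH');

-- Expected 'k1=v1&k2=v2'.
select parse_url('http://example.com/docs/index.html?k1=v1&k2=v2', 'QUERY');

-- With a key, only the associated value is returned; expected 'v2'.
select parse_url('http://example.com/docs/index.html?k1=v1&k2=v2', 'QUERY', 'k2');
]]></codeblock>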
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="prettyprint_bytes">
|
|
|
|
<dt>
|
|
PRETTYPRINT_BYTES(TINYINT / SMALLINT / INT / BIGINT bytes)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Formats a number of bytes in a human-readable form, scaled to a suitable unit.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Examples:</b>
|
|
</p>
|
|
|
|
<p>
|
|
<codeblock>
select prettyprint_bytes(1), prettyprint_bytes(12345678), prettyprint_bytes(1234567890), prettyprint_bytes(1234567890123);
+----------------------+-----------------------------+-------------------------------+----------------------------------+
| prettyprint_bytes(1) | prettyprint_bytes(12345678) | prettyprint_bytes(1234567890) | prettyprint_bytes(1234567890123) |
+----------------------+-----------------------------+-------------------------------+----------------------------------+
| 1.00 B               | 11.77 MB                    | 1.15 GB                       | 1149.78 GB                       |
+----------------------+-----------------------------+-------------------------------+----------------------------------+
</codeblock>
|
|
</p>
|
|
</dd>
|
|
</dlentry>
|
|
|
|
<dlentry rev="3.0" id="regexp_escape">
|
|
|
|
<dt>
|
|
REGEXP_ESCAPE(STRING source)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the source string with every character that has a special
meaning in the RE2 library escaped, so that those characters are interpreted
literally rather than as regular expression operators. The following special
characters are escaped by the function:
|
|
<codeblock><![CDATA[.\+*?[^]$(){}=!<>|:-]]></codeblock>
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_escapes"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p>
|
|
This example shows escaping one of the special characters in RE2.
|
|
</p>
|
|
<codeblock>
select regexp_escape('Hello.world');
+------------------------------------------------------+
| regexp_escape('Hello.world')                         |
+------------------------------------------------------+
| Hello\.world                                         |
+------------------------------------------------------+
</codeblock>
|
|
<p>
|
|
This example shows escaping all the special characters in RE2.
|
|
</p>
|
|
<codeblock><![CDATA[
select regexp_escape('a.b\\c+d*e?f[g]h$i(j)k{l}m=n!o<p>q|r:s-t');
+------------------------------------------------------------+
| regexp_escape('a.b\\c+d*e?f[g]h$i(j)k{l}m=n!o<p>q|r:s-t')  |
+------------------------------------------------------------+
| a\.b\\c\+d\*e\?f\[g\]h\$i\(j\)k\{l\}m\=n\!o\<p\>q\|r\:s\-t |
+------------------------------------------------------------+
]]>
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="regexp_extract">
|
|
|
|
<dt>
|
|
REGEXP_EXTRACT(STRING subject, STRING pattern, INT index)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the portion of a string matched by the specified
<codeph>()</codeph> group of a regular expression pattern. Group 0 refers to the
entire matched string, while groups 1, 2, and so on refer to the first, second, and
subsequent <codeph>(...)</codeph> portions of the pattern.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_escapes"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p>
|
|
This example shows how group 0 matches the full pattern string, including the
|
|
portion outside any <codeph>()</codeph> group:
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
+------------------------------------------------------+
| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
+------------------------------------------------------+
| abcdef123ghi456                                      |
+------------------------------------------------------+
Returned 1 row(s) in 0.11s</codeblock>
|
|
<p>
|
|
This example shows how group 1 matches just the contents inside the first
|
|
<codeph>()</codeph> group in the pattern string:
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
+------------------------------------------------------+
| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
+------------------------------------------------------+
| 456                                                  |
+------------------------------------------------------+
Returned 1 row(s) in 0.11s</codeblock>
|
|
<p rev="2.0.0">
|
|
Unlike in earlier Impala releases, the regular expression library used in Impala 2.0
|
|
and later supports the <codeph>.*?</codeph> idiom for non-greedy matches. This
|
|
example shows how a pattern string starting with <codeph>.*?</codeph> matches the
|
|
shortest possible portion of the source string, returning the rightmost set of
|
|
lowercase letters. A pattern string both starting and ending with
|
|
<codeph>.*?</codeph> finds two potential matches of equal length, and returns the
|
|
first one found (the leftmost set of lowercase letters).
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
+--------------------------------------------------------+
| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
+--------------------------------------------------------+
| def                                                    |
+--------------------------------------------------------+
[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
+-----------------------------------------------------------+
| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
+-----------------------------------------------------------+
| bcd                                                       |
+-----------------------------------------------------------+
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="2.3.0" id="regexp_like">
|
|
|
|
<dt>
|
|
REGEXP_LIKE(STRING source, STRING pattern[, STRING options])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns <codeph>true</codeph> or <codeph>false</codeph> to indicate
whether the regular expression given by the pattern matches anywhere inside the
source string. The optional third argument consists of letter flags that change how the
match is performed, such as <codeph>i</codeph> for case-insensitive matching.
|
|
<p
|
|
conref="../shared/impala_common.xml#common/syntax_blurb"/>
|
|
|
|
<p>
|
|
The flags that you can include in the optional third argument are:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
<codeph>c</codeph>: Case-sensitive matching (the default).
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>i</codeph>: Case-insensitive matching. If multiple instances of
|
|
<codeph>c</codeph> and <codeph>i</codeph> are included in the third argument, the
|
|
last such option takes precedence.
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>m</codeph>: Multi-line matching. The <codeph>^</codeph> and
|
|
<codeph>$</codeph> operators match the start or end of any line within the source
|
|
string, not the start and end of the entire string.
|
|
</li>
|
|
|
|
<li>
|
|
<codeph>n</codeph>: Newline matching. The <codeph>.</codeph> operator can match
|
|
the newline character. A repetition operator such as <codeph>.*</codeph> can match
|
|
a portion of the source string that spans multiple lines.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
<b>Return type:</b> <codeph>BOOLEAN</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_escapes"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p>
|
|
This example shows how <codeph>regexp_like()</codeph> can test for the existence of
|
|
various kinds of regular expression patterns within a source string:
|
|
</p>
|
|
<codeblock><![CDATA[
-- Matches because the 'f' appears somewhere in 'foo'.
select regexp_like('foo','f');
+-------------------------+
| regexp_like('foo', 'f') |
+-------------------------+
| true                    |
+-------------------------+

-- Does not match because the comparison is case-sensitive by default.
select regexp_like('foo','F');
+-------------------------+
| regexp_like('foo', 'f') |
+-------------------------+
| false                   |
+-------------------------+

-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive.
select regexp_like('foo','F','i');
+------------------------------+
| regexp_like('foo', 'f', 'i') |
+------------------------------+
| true                         |
+------------------------------+

-- The familiar regular expression notations work, such as ^ and $ anchors...
select regexp_like('foo','f$');
+--------------------------+
| regexp_like('foo', 'f$') |
+--------------------------+
| false                    |
+--------------------------+

select regexp_like('foo','o$');
+--------------------------+
| regexp_like('foo', 'o$') |
+--------------------------+
| true                     |
+--------------------------+

-- ...and repetition operators such as * and +
select regexp_like('foooooobar','fo+b');
+-----------------------------------+
| regexp_like('foooooobar', 'fo+b') |
+-----------------------------------+
| true                              |
+-----------------------------------+

select regexp_like('foooooobar','fx*y*o*b');
+---------------------------------------+
| regexp_like('foooooobar', 'fx*y*o*b') |
+---------------------------------------+
| true                                  |
+---------------------------------------+
]]>
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="regexp_replace">
|
|
|
|
<dt>
|
|
REGEXP_REPLACE(STRING initial, STRING pattern, STRING replacement)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the initial argument with each substring that matches the
regular expression pattern replaced by the final argument string.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_re2_warning"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/regexp_escapes"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
|
|
<p>
|
|
These examples show how you can replace parts of a string matching a pattern with
|
|
replacement text, which can include backreferences to any <codeph>()</codeph> groups
|
|
in the pattern string. The backreference numbers start at 1, and any
|
|
<codeph>\</codeph> characters must be escaped as <codeph>\\</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
Replace a character pattern with new text:
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
+------------------------------------------+
| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
+------------------------------------------+
| aaaxyzaaa                                |
+------------------------------------------+
Returned 1 row(s) in 0.11s</codeblock>
|
|
<p>
|
|
Replace a character pattern with substitution text that includes the original
|
|
matching text:
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','&lt;\\1&gt;');
+----------------------------------------------+
| regexp_replace('aaabbbaaa', '(b+)', '&lt;\\1&gt;') |
+----------------------------------------------+
| aaa&lt;bbb&gt;aaa                                  |
+----------------------------------------------+
Returned 1 row(s) in 0.11s</codeblock>
|
|
<p>
|
|
Remove all characters that are not digits:
|
|
</p>
|
|
<codeblock>[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
+---------------------------------------------------+
| regexp_replace('123-456-789', '[^[:digit:]]', '') |
+---------------------------------------------------+
| 123456789                                         |
+---------------------------------------------------+
Returned 1 row(s) in 0.12s</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="repeat">
|
|
|
|
<dt>
|
|
REPEAT(STRING str, INT n)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string repeated a specified number of times.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
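<p>
A one-line sketch with an illustrative literal. The expected value in the comment is
worked out by hand:
</p>
<codeblock>-- 'ab' repeated 3 times; expected 'ababab'.
select repeat('ab', 3);
</codeblock>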
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="replace" rev="2.9.0 IMPALA-4729">
|
|
|
|
<dt>
|
|
REPLACE(STRING initial, STRING target, STRING replacement)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the initial argument with all occurrences of the target string
|
|
replaced by the replacement string.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p>
|
|
Because this function does not use any regular expression patterns, it is typically
|
|
faster than <codeph>regexp_replace()</codeph> for simple string substitutions.
|
|
</p>
|
|
|
|
<p>
|
|
If any argument is <codeph>NULL</codeph>, the return value is <codeph>NULL</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
Matching is case-sensitive.
|
|
</p>
|
|
|
|
<p>
|
|
If the replacement string contains another instance of the target string, the
|
|
expansion is only performed once, instead of applying again to the newly constructed
|
|
string.
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/added_in_290"/>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"/>
|
|
<codeblock>-- Replace one string with another.
select replace('hello world','world','earth');
+------------------------------------------+
| replace('hello world', 'world', 'earth') |
+------------------------------------------+
| hello earth                              |
+------------------------------------------+

-- All occurrences of the target string are replaced.
select replace('hello world','o','0');
+----------------------------------+
| replace('hello world', 'o', '0') |
+----------------------------------+
| hell0 w0rld                      |
+----------------------------------+

-- If no match is found, the original string is returned unchanged.
select replace('hello world','xyz','abc');
+--------------------------------------+
| replace('hello world', 'xyz', 'abc') |
+--------------------------------------+
| hello world                          |
+--------------------------------------+
</codeblock>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="reverse">
|
|
|
|
<dt>
|
|
REVERSE(STRING a)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string with characters in reversed order.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="right">
|
|
|
|
<dt>
|
|
RIGHT(STRING a, INT num_chars)
|
|
</dt>
|
|
|
|
<dd>
|
|
See the <codeph>STRRIGHT()</codeph> function.
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="rpad">
|
|
|
|
<dt>
|
|
RPAD(STRING str, INT len, STRING pad)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a string of a specified length, based on the first argument
|
|
string. If the specified string is too short, it is padded on the right with a
|
|
repeating sequence of the characters from the pad string. If the specified string is
|
|
too long, it is truncated on the right.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
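<p>
The padding and truncation cases can be sketched as follows. The literals are
illustrative, and the expected values in the comments are worked out by hand from the
description above:
</p>
<codeblock>-- Too short: padded on the right with repetitions of 'xy'; expected 'helloxyxyx'.
select rpad('hello', 10, 'xy');

-- Too long: truncated on the right; expected 'hel'.
select rpad('hello', 3, 'xy');
</codeblock>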
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="rtrim">
|
|
|
|
<dt>
|
|
RTRIM(STRING a [, STRING chars_to_trim])
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string with all occurrences of characters
|
|
specified by the second argument removed from the right side. Removes spaces if the
|
|
second argument is not specified.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
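<p>
A short sketch of both forms, with illustrative literals. The brackets added by
<codeph>CONCAT()</codeph> are there only to make the remaining spaces visible:
</p>
<codeblock>-- No second argument: trailing spaces are removed, leading spaces are kept.
-- Expected '[  hello]'.
select concat('[', rtrim('  hello   '), ']');

-- Any trailing run made up of 'x' and 'y' characters is removed; expected 'xyhello'.
select rtrim('xyhelloxyx', 'xy');
</codeblock>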
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="space">
|
|
|
|
<dt>
|
|
SPACE(INT n)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns a concatenated string of the specified number of spaces.
|
|
Shorthand for <codeph>REPEAT(' ',<varname>n</varname>)</codeph>.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry rev="2.3.0 IMPALA-2084" id="split_part">
|
|
|
|
<dt>
|
|
SPLIT_PART(STRING source, STRING delimiter, BIGINT index)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the requested <codeph>index</codeph>th part of the input
|
|
<varname>source</varname> string split by the <varname>delimiter</varname>.
|
|
<ul>
|
|
<li>
|
|
If <varname>index</varname> is a positive number, returns the
|
|
<varname>index</varname>th part from the left within the <varname>source</varname>
|
|
string.
|
|
</li>
|
|
|
|
<li>
|
|
If <varname>index</varname> is a negative number, returns the
|
|
<varname>index</varname>th part from the right within the
|
|
<varname>source</varname> string.
|
|
</li>
|
|
|
|
<li>
|
|
If <varname>index</varname> is 0, returns an error.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
The <varname>delimiter</varname> can consist of multiple characters, not just a
|
|
single character.
|
|
</p>
|
|
|
|
<p>
|
|
All matching of the delimiter is done exactly, not using any regular expression
|
|
patterns.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/example_blurb"
|
|
/>
|
|
|
|
<p>
|
|
<codeph>SPLIT_PART('x,y,z',',',2)</codeph> returns <codeph>'y'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>SPLIT_PART('one***two***three','***',2)</codeph> returns
|
|
<codeph>'two'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>SPLIT_PART('abc@@def@@ghi', '@@', 3)</codeph> returns
|
|
<codeph>'ghi'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>SPLIT_PART('abc@@def@@ghi', '@@', -3)</codeph> returns
|
|
<codeph>'abc'</codeph>.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="strleft">
|
|
|
|
<dt>
|
|
STRLEFT(STRING a, INT num_chars)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the leftmost <varname>num_chars</varname> characters of the
string. Shorthand for a call to <codeph>SUBSTR()</codeph> with a starting position of 1
and a length of <varname>num_chars</varname>.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
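<p>
As a sketch with an illustrative literal, the two queries below are expected to return
the same value:
</p>
<codeblock>-- Both are expected to return 'hello'.
select strleft('hello world', 5);
select substr('hello world', 1, 5);
</codeblock>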
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="strright">
|
|
|
|
<dt>
|
|
STRRIGHT(STRING a, INT num_chars)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the rightmost characters of the string. Shorthand for a call
|
|
to <codeph>SUBSTR()</codeph> with 2 arguments.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
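<p>
As a sketch with an illustrative literal, the two queries below are expected to return
the same value. The second query assumes that <codeph>SUBSTR()</codeph> counts a
negative starting position from the end of the string, which is not stated elsewhere
in this topic:
</p>
<codeblock>-- Both are expected to return 'world'.
select strright('hello world', 5);
-- Assumption: a negative start counts back from the end of the string.
select substr('hello world', -5);
</codeblock>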
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="substr">
|
|
|
|
<dt>
|
|
SUBSTR(STRING a, INT start [, INT len]),
|
|
<ph id="substring"
|
|
>SUBSTRING(STRING a, INT start [, INT len])</ph>
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the portion of the string starting at a specified point,
|
|
optionally with a specified maximum length. The characters in the string are indexed
|
|
starting at 1.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
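<p>
A minimal sketch of the two- and three-argument forms, using an illustrative literal.
The expected values in the comments are derived by counting characters from 1:
</p>
<codeblock>-- Two arguments: everything from position 7 onward; expected 'world'.
select substr('hello world', 7);

-- Three arguments: at most 5 characters starting at position 1; expected 'hello'.
select substring('hello world', 1, 5);
</codeblock>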
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="translate">
|
|
|
|
<dt>
|
|
TRANSLATE(STRING input, STRING from, STRING to)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the <codeph>input</codeph> string with each character in the
|
|
<codeph>from</codeph> argument replaced with the corresponding character in the
|
|
<codeph>to</codeph> argument. The characters are matched in the order they appear in
|
|
<codeph>from</codeph> and <codeph>to</codeph>.
|
|
<p>
|
|
For example: <codeph>translate ('hello world','world','earth')</codeph> returns
|
|
<codeph>'hetta earth'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p>
|
|
<b>Usage notes:</b>
|
|
</p>
|
|
|
|
<p>
|
|
If <codeph>from</codeph> contains more characters than <codeph>to</codeph>, the
|
|
<codeph>from</codeph> characters that are beyond the length of <codeph>to</codeph>
|
|
are removed in the result.
|
|
</p>
|
|
|
|
<p>
|
|
For example:
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>translate('abcdedg', 'bcd', '1')</codeph> returns <codeph>'a1eg'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<codeph>translate('Unit Number#2', '# ', '_')</codeph> returns
|
|
<codeph>'UnitNumber_2'</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
If <codeph>from</codeph> is <codeph>NULL</codeph>, the function returns
|
|
<codeph>NULL</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
If <codeph>to</codeph> contains more characters than <codeph>from</codeph>, the
|
|
extra characters in <codeph>to</codeph> are ignored.
|
|
</p>
|
|
|
|
<p>
|
|
If <codeph>from</codeph> contains duplicate characters, the duplicate character is
|
|
replaced with the first matching character in <codeph>to</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
For example: <codeph>translate ('hello','ll','67')</codeph> returns
|
|
<codeph>'he66o'</codeph>.
|
|
</p>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="trim">
|
|
|
|
<dt>
|
|
TRIM(STRING a), TRIM({{LEADING | TRAILING | BOTH} [STRING chars_to_trim] | STRING chars_to_trim} FROM STRING a)
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the input string with both leading and trailing spaces
|
|
removed. The same as passing the string through both <codeph>LTRIM()</codeph> and
|
|
<codeph>RTRIM()</codeph>.
|
|
<p>
|
|
<b>Usage notes:</b> Often used during data cleansing operations during the ETL
|
|
cycle, if input values might still have surrounding spaces. For a more
|
|
general-purpose function that can remove other leading and trailing characters
|
|
besides spaces, see <codeph>BTRIM()</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
The <codeph>TRIM ... FROM</codeph> syntax is a SQL-standard wrapper around
<codeph>LTRIM</codeph> / <codeph>RTRIM</codeph> / <codeph>BTRIM</codeph>.
Depending on the first parameter, <codeph>LEADING</codeph> /
<codeph>TRAILING</codeph> / <codeph>BOTH</codeph>, the wrapper resolves to
the corresponding underlying function. The default is <codeph>BOTH</codeph>.
If present, the <codeph>chars_to_trim</codeph> parameter is passed through to
the underlying function. The syntax therefore comes in three forms:
|
|
</p>
|
|
|
|
<p>
|
|
Syntax #1: <codeph>TRIM(&lt;where&gt; FROM &lt;string&gt;)</codeph> --&gt;
<codeph>{L|R|B}TRIM(string)</codeph>;
|
|
</p>
|
|
|
|
<p>
|
|
Syntax #2: <codeph>TRIM(&lt;charset&gt; FROM &lt;string&gt;)</codeph> --&gt;
<codeph>BTRIM(string, charset)</codeph>;
|
|
</p>
|
|
|
|
<p>
|
|
Syntax #3: <codeph>TRIM(&lt;where&gt; &lt;charset&gt; FROM &lt;string&gt;)</codeph>
--&gt; <codeph>{L|R|B}TRIM(string, charset)</codeph>.
|
|
</p>
|
|
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
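<p>
The three <codeph>TRIM ... FROM</codeph> forms can be sketched as follows. The
literals are illustrative, and each comment names the underlying call that the form is
described as resolving to, rather than output captured from a session:
</p>
<codeblock>-- Syntax #1: direction only, so spaces are trimmed.
-- Resolves to LTRIM('  hello  ').
select trim(leading from '  hello  ');

-- Syntax #2: character set only, so BOTH is implied.
-- Resolves to BTRIM('xyhelloyx', 'xy').
select trim('xy' from 'xyhelloyx');

-- Syntax #3: direction and character set.
-- Resolves to RTRIM('xxhelloxx', 'x').
select trim(trailing 'x' from 'xxhelloxx');
</codeblock>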
|
|
</dd>
|
|
|
|
</dlentry>
|
|
|
|
<dlentry id="upper">
|
|
|
|
<dt>
|
|
UPPER(STRING a), <ph id="ucase">UCASE(STRING a)</ph>
|
|
</dt>
|
|
|
|
<dd>
|
|
<b>Purpose:</b> Returns the argument string converted to all-uppercase.
|
|
<p>
|
|
<b>Return type:</b> <codeph>STRING</codeph>
|
|
</p>
|
|
|
|
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
|
|
|
|
<p
|
|
conref="../shared/impala_common.xml#common/case_insensitive_comparisons_tip"
|
|
/>
|
|
</dd>
|
|
|
|
</dlentry>
|
|
</dl>
|
|
|
|
</conbody>
|
|
|
|
</concept>
|