String functions are classified as those primarily accepting or returning STRING,
VARCHAR, or CHAR data types, for example to measure the length of a string
or concatenate two strings together.
-
All the functions that accept STRING arguments also accept the VARCHAR
and CHAR types introduced in Impala 2.0.
-
Whenever VARCHAR or CHAR values are passed to a function that returns a
string value, the return type is normalized to STRING. For example, a call to
concat() with a mix of STRING, VARCHAR, and
CHAR arguments produces a STRING result.
The string functions operate mainly on these data types: ,
, and .
Function reference:
Impala supports the following string functions:
-
ascii(string str)
-
ascii() function
Purpose: Returns the numeric ASCII code of the first character of the argument.
Return type: int
-
base64decode(string str)
-
base64decode() function
Purpose:
Return type: string
For general information about Base64 encoding, see
.
-
base64encode(string str)
-
base64encode() function
Purpose:
Return type: string
For general information about Base64 encoding, see
.
-
btrim(string a),
btrim(string a, string chars_to_trim)
-
btrim() function
Purpose: Removes all instances of one or more characters
from the start and end of a STRING value.
By default, removes only spaces.
If a non-NULL optional second argument is specified, the function removes all
occurrences of characters in that second argument from the beginning and
end of the string.
Return type: string
The following examples show the default btrim() behavior,
and what changes when you specify the optional second argument.
All the examples bracket the output value with [ ]
so that you can see any leading or trailing spaces in the btrim() result.
By default, the function removes and number of both leading and trailing spaces.
When the second argument is specified, any number of occurrences of any
character in the second argument are removed from the start and end of the
input string; in this case, spaces are not removed (unless they are part of the second
argument) and any instances of the characters are not removed if they do not come
right at the beginning or end of the string.
-- Remove multiple spaces before and one space after.
select concat('[',btrim(' hello '),']');
+---------------------------------------+
| concat('[', btrim(' hello '), ']') |
+---------------------------------------+
| [hello] |
+---------------------------------------+
-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
select concat('[',btrim('xy hello zyzzxx','xyz'),']');
+------------------------------------------------------+
| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') |
+------------------------------------------------------+
| [ hello ] |
+------------------------------------------------------+
-- Remove any instances of x or y or z at beginning or end.
-- Leave x, y, z alone in the middle of the string.
select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
+----------------------------------------------------+
| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
+----------------------------------------------------+
| [helxyzlo] |
+----------------------------------------------------+
-
char_length(string a), character_length(string a)
-
char_length() function
character_length() function
Purpose: Returns the length in characters of the argument string, including any
trailing spaces that pad a CHAR value.
Return type: int
When applied to a STRING value, it returns the
same result as the length() function. When applied
to a CHAR value, it might return a larger value
than length() does, to account for trailing spaces
in the CHAR.
-
chr(int character_code)
-
chr() function
Purpose: Returns a character specified by a decimal code point value.
The interpretation and display of the resulting character depends on your system locale.
Because consistent processing of Impala string values is only guaranteed
for values within the ASCII range, only use this function for values
corresponding to ASCII characters.
In particular, parameter values greater than 255 return an empty string.
Return type: string
Usage notes: Can be used as the inverse of the ascii() function, which
converts a character to its numeric ASCII code.
SELECT chr(65);
+---------+
| chr(65) |
+---------+
| A |
+---------+
SELECT chr(97);
+---------+
| chr(97) |
+---------+
| a |
+---------+
-
concat(string a, string b...)
-
concat() function
Purpose: Returns a single string representing all the argument values joined together.
Return type: string
-
concat_ws(string sep, string a, string b...)
-
concat_ws() function
Purpose: Returns a single string representing the second and following argument values joined
together, delimited by a specified separator.
Return type: string
-
find_in_set(string str, string strList)
-
find_in_set() function
Purpose: Returns the position (starting from 1) of the first occurrence of a specified string
within a comma-separated string. Returns NULL if either argument is
NULL, 0 if the search string is not found, or 0 if the search string contains a comma.
Return type: int
-
group_concat(string s [, string sep])
-
group_concat() function
Purpose: Returns a single string representing the argument value concatenated together for each
row of the result set. If the optional separator string is specified, the separator is added between each
pair of concatenated values.
Return type: string
By default, returns a single string covering the whole result set. To include other columns or values
in the result set, or to produce multiple concatenated strings for subsets of rows, include a
GROUP BY clause in the query.
Strictly speaking, group_concat() is an aggregate function, not a scalar
function like the others in this list.
For additional details and examples, see .
-
initcap(string str)
-
initcap() function
Purpose: Returns the input string with the first letter capitalized.
Return type: string
-
instr(string str, string substr [, bigint position [, bigint occurrence ] ])
-
instr() function
Purpose: Returns the position (starting from 1) of the first occurrence of a substring within a
longer string.
Return type: int
If the substring is not present in the string, the function returns 0:
select instr('foo bar bletch', 'z');
+------------------------------+
| instr('foo bar bletch', 'z') |
+------------------------------+
| 0 |
+------------------------------+
The optional third and fourth arguments let you find instances of the substring
other than the first instance starting from the left:
-
The third argument lets you specify a starting point within the string
other than 1:
-- Restricting the search to positions 7..end,
-- the first occurrence of 'b' is at position 9.
select instr('foo bar bletch', 'b', 7);
+---------------------------------+
| instr('foo bar bletch', 'b', 7) |
+---------------------------------+
| 9 |
+---------------------------------+
-- If there are no more occurrences after the
-- specified position, the result is 0.
select instr('foo bar bletch', 'b', 10);
+----------------------------------+
| instr('foo bar bletch', 'b', 10) |
+----------------------------------+
| 0 |
+----------------------------------+
If the third argument is negative, the search works right-to-left
starting that many characters from the right. The return value still
represents the position starting from the left side of the string.
-- Scanning right to left, the first occurrence of 'o'
-- is at position 8. (8th character from the left.)
select instr('hello world','o',-1);
+-------------------------------+
| instr('hello world', 'o', -1) |
+-------------------------------+
| 8 |
+-------------------------------+
-- Scanning right to left, starting from the 6th character
-- from the right, the first occurrence of 'o' is at
-- position 5 (5th character from the left).
select instr('hello world','o',-6);
+-------------------------------+
| instr('hello world', 'o', -6) |
+-------------------------------+
| 5 |
+-------------------------------+
-- If there are no more occurrences after the
-- specified position, the result is 0.
select instr('hello world','o',-10);
+--------------------------------+
| instr('hello world', 'o', -10) |
+--------------------------------+
| 0 |
+--------------------------------+
-
The fourth argument lets you specify an occurrence other than the first:
-- 2nd occurrence of 'b' is at position 9.
select instr('foo bar bletch', 'b', 1, 2);
+------------------------------------+
| instr('foo bar bletch', 'b', 1, 2) |
+------------------------------------+
| 9 |
+------------------------------------+
-- Negative position argument means scan right-to-left.
-- This example finds second instance of 'b' from the right.
select instr('foo bar bletch', 'b', -1, 2);
+-------------------------------------+
| instr('foo bar bletch', 'b', -1, 2) |
+-------------------------------------+
| 5 |
+-------------------------------------+
If the fourth argument is greater than the number of matching occurrences,
the function returns 0:
-- There is no 3rd occurrence within the string.
select instr('foo bar bletch', 'b', 1, 3);
+------------------------------------+
| instr('foo bar bletch', 'b', 1, 3) |
+------------------------------------+
| 0 |
+------------------------------------+
-- There is not even 1 occurrence when scanning
-- the string starting at position 10.
select instr('foo bar bletch', 'b', 10, 1);
+-------------------------------------+
| instr('foo bar bletch', 'b', 10, 1) |
+-------------------------------------+
| 0 |
+-------------------------------------+
The fourth argument cannot be negative or zero. A non-positive value for
this argument causes an error:
select instr('foo bar bletch', 'b', 1, 0);
ERROR: UDF ERROR: Invalid occurrence parameter to instr function: 0
select instr('aaaaaaaaa','aa', 1, -1);
ERROR: UDF ERROR: Invalid occurrence parameter to instr function: -1
-
If either of the optional arguments is NULL,
the function also returns NULL:
select instr('foo bar bletch', 'b', null);
+------------------------------------+
| instr('foo bar bletch', 'b', null) |
+------------------------------------+
| NULL |
+------------------------------------+
select instr('foo bar bletch', 'b', 1, null);
+---------------------------------------+
| instr('foo bar bletch', 'b', 1, null) |
+---------------------------------------+
| NULL |
+---------------------------------------+
-
left(string a, int num_chars)
-
See the strleft function.
-
length(string a)
-
length() function
Purpose: Returns the length in characters of the argument string,
ignoring any trailing spaces in CHAR values.
Return type: int
When applied to a STRING value, it returns the
same result as the char_length() function. When applied
to a CHAR value, it might return a smaller value
than char_length() does, because length()
ignores any trailing spaces in the CHAR.
Because the behavior of length() with CHAR
values containing trailing spaces is not standardized across the industry,
when porting code from other database systems, evaluate the behavior of
length() on the source system and switch to
char_length() for Impala if necessary.
-
locate(string substr, string str[, int pos])
-
locate() function
Purpose: Returns the position (starting from 1) of the first occurrence of a substring within a
longer string, optionally after a particular position.
Return type: int
-
lower(string a), lcase(string a)
-
lower() function
Purpose: Returns the argument string converted to all-lowercase.
Return type: string
-
lpad(string str, int len, string pad)
-
lpad() function
Purpose: Returns a string of a specified length, based on the first argument string. If the
specified string is too short, it is padded on the left with a repeating sequence of the characters from
the pad string. If the specified string is too long, it is truncated on the right.
Return type: string
-
ltrim(string a [, string chars_to_trim])
-
ltrim() function
Purpose: Returns the argument string with all occurrences
of characters specified by the second argument removed from
the left side. Removes spaces if the second argument is not specified.
Return type: string
-
parse_url(string urlString, string partToExtract [, string keyToExtract])
-
parse_url() function
Purpose: Returns the portion of a URL corresponding to a specified part. The part argument can be
'PROTOCOL', 'HOST', 'PATH', 'REF',
'AUTHORITY', 'FILE', 'USERINFO', or
'QUERY'. Uppercase is required for these literal values. When requesting the
QUERY portion of the URL, you can optionally specify a key to retrieve just the
associated value from the key-value pairs in the query string.
Return type: string
Usage notes: This function is important for the traditional Hadoop use case of interpreting web
logs. For example, if the web traffic data features raw URLs not divided into separate table columns,
you can count visitors to a particular page by extracting the 'PATH' or
'FILE' field, or analyze search terms by extracting the corresponding key from the
'QUERY' field.
-
regexp_escape(string source)
-
Purpose: The regexp_escape function returns
a string escaped for the special character in RE2 library so that the
special characters are interpreted literally rather than as special
characters. The following special characters are escaped by the
function:
|:-]]>
Return type:
string
This example shows escaping one of special characters in RE2.
+------------------------------------------------------+
| regexp_escape('Hello.world') |
+------------------------------------------------------+
| Hello\.world |
+------------------------------------------------------+
This example shows escaping all the special characters in RE2.
q|r:s-t') |
+------------------------------------------------------------+
| a\.b\\c\+d\*e\?f\[g\]h\$i\(j\)k\{l\}m\=n\!o\q\|r\:s\-t |
+------------------------------------------------------------+
]]>
-
regexp_like(string source, string pattern[, string options])
-
regexp_like() function
Purpose: Returns true or false to indicate
whether the source string contains anywhere inside it the regular expression given by the pattern.
The optional third argument consists of letter flags that change how the match is performed,
such as i for case-insensitive matching.
The flags that you can include in the optional third argument are:
-
c: Case-sensitive matching (the default).
-
i: Case-insensitive matching. If multiple instances of c and i
are included in the third argument, the last such option takes precedence.
-
m: Multi-line matching. The ^ and $
operators match the start or end of any line within the source string, not the
start and end of the entire string.
-
n: Newline matching. The . operator can match the
newline character. A repetition operator such as .* can
match a portion of the source string that spans multiple lines.
Return type: boolean
This example shows how regexp_like() can test for the existence
of various kinds of regular expression patterns within a source string:
-
regexp_replace(string initial, string pattern, string replacement)
-
regexp_replace() function
Purpose: Returns the initial argument with the regular expression pattern replaced by the final
argument string.
Return type: string
These examples show how you can replace parts of a string matching a pattern with replacement text,
which can include backreferences to any () groups in the pattern string. The
backreference numbers start at 1, and any \ characters must be escaped as
\\.
Replace a character pattern with new text:
[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
+------------------------------------------+
| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
+------------------------------------------+
| aaaxyzaaa |
+------------------------------------------+
Returned 1 row(s) in 0.11s
Replace a character pattern with substitution text that includes the original matching text:
[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','<\\1>');
+----------------------------------------------+
| regexp_replace('aaabbbaaa', '(b+)', '<\\1>') |
+----------------------------------------------+
| aaa<bbb>aaa |
+----------------------------------------------+
Returned 1 row(s) in 0.11s
Remove all characters that are not digits:
[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
+---------------------------------------------------+
| regexp_replace('123-456-789', '[^[:digit:]]', '') |
+---------------------------------------------------+
| 123456789 |
+---------------------------------------------------+
Returned 1 row(s) in 0.12s
-
repeat(string str, int n)
-
repeat() function
Purpose: Returns the argument string repeated a specified number of times.
Return type: string
-
replace(string initial, string target, string replacement)
-
replace() function
Purpose: Returns the initial argument with all occurrences of the target string
replaced by the replacement string.
Return type: string
Because this function does not use any regular expression patterns, it is typically faster
than regexp_replace() for simple string substitutions.
If any argument is NULL, the return value is NULL.
Matching is case-sensitive.
If the replacement string contains another instance of the target
string, the expansion is only performed once, instead of
applying again to the newly constructed string.
-- Replace one string with another.
select replace('hello world','world','earth');
+------------------------------------------+
| replace('hello world', 'world', 'earth') |
+------------------------------------------+
| hello earth |
+------------------------------------------+
-- All occurrences of the target string are replaced.
select replace('hello world','o','0');
+----------------------------------+
| replace('hello world', 'o', '0') |
+----------------------------------+
| hell0 w0rld |
+----------------------------------+
-- If no match is found, the original string is returned unchanged.
select replace('hello world','xyz','abc');
+--------------------------------------+
| replace('hello world', 'xyz', 'abc') |
+--------------------------------------+
| hello world |
+--------------------------------------+
-
reverse(string a)
-
reverse() function
Purpose: Returns the argument string with characters in reversed order.
Return type: string
-
right(string a, int num_chars)
-
See the strright function.
-
rpad(string str, int len, string pad)
-
rpad() function
Purpose: Returns a string of a specified length, based on the first argument string. If the
specified string is too short, it is padded on the right with a repeating sequence of the characters from
the pad string. If the specified string is too long, it is truncated on the right.
Return type: string
-
rtrim(string a [, string chars_to_trim])
-
rtrim() function
Purpose: Returns the argument string with all occurrences
of characters specified by the second argument removed from
the right side. Removes spaces if the second argument is not specified.
Return type: string
-
space(int n)
-
space() function
Purpose: Returns a concatenated string of the specified number of spaces. Shorthand for
repeat(' ',n).
Return type: string
-
split_part(string source, string delimiter, bigint n)
-
split_part() function
Purpose: Returns the nth field within a delimited string. The
fields are numbered starting from 1. The delimiter can consist of
multiple characters, not just a single character. All matching of the
delimiter is done exactly, not using any regular expression patterns.
Return type: string
These examples show how to retrieve the nth field from a delimited
string:
These examples show what happens for out-of-range field positions.
Specifying a value less than 1 produces an error. Specifying a value
greater than the number of fields returns a zero-length string
(which is not the same as NULL).
These examples show how the delimiter can be a multi-character value:
-
strleft(string a, int num_chars)
-
strleft() function
Purpose: Returns the leftmost characters of the string. Shorthand for a call to
substr() with 2 arguments.
Return type: string
-
strright(string a, int num_chars)
-
strright() function
Purpose: Returns the rightmost characters of the string. Shorthand for a call to
substr() with 2 arguments.
Return type: string
-
substr(string a, int start [, int len]), substring(string a, int start [, int
len])
-
substr() function
Purpose: Returns the portion of the string starting at a specified point, optionally with a
specified maximum length. The characters in the string are indexed starting at 1.
Return type: string
-
translate(string input, string from, string to)
-
translate() function
Purpose: Returns the input string with a set of characters replaced by another set of characters.
Return type: string
-
trim(string a)
-
trim() function
Purpose: Returns the input string with both leading and trailing spaces removed. The same as
passing the string through both ltrim() and rtrim().
Usage notes: Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces.
For a more general-purpose function that can remove other leading and trailing characters besides spaces, see btrim().
Return type: string
-
upper(string a), ucase(string a)
-
upper() function
ucase() function
Purpose: Returns the argument string converted to all-uppercase.
Return type: string