mirror of
https://github.com/apache/impala.git
synced 2026-01-05 12:01:11 -05:00
This patch implements a new built-in function,
regexp_match_count, which returns the number of times a
regular expression pattern matches in the input string.
The regexp_match_count() function has the following syntax:
int = regexp_match_count(string input, string pattern)
int = regexp_match_count(string input, string pattern,
int start_pos, string flags)
The input value specifies the string to search with the
regular expression.
The pattern value specifies the regular expression.
The start_pos value specifies the character position
at which to start the search for a match. It defaults
to 1 if not specified.
The flags value (if specified) dictates the behavior of
the regular expression matcher:
m: Specifies that the input data might contain more than
one line, so '^' and '$' match at the start and end of
each line rather than only at the start and end of the input.
i: Specifies that the regex matcher is case insensitive.
c: Specifies that the regex matcher is case sensitive.
n: Specifies that the '.' character matches newlines.
By default, the flags value is set to 'c'. Note that the
flags are consistent with other existing built-in functions
(e.g. regexp_like), so certain IBM Netezza flags, such as
's', are not supported to avoid confusion.
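
The semantics described above can be approximated with the following
Python sketch. It is an illustration only, not the code in this patch:
it uses Python's re module rather than Impala's regex engine, so
pattern syntax may differ. The flag-to-option mapping, the rule that
the last of 'i' and 'c' wins, and the errors raised for bad arguments
are assumptions made for the example.

```python
import re


def regexp_match_count(text, pattern, start_pos=1, flags="c"):
    """Count non-overlapping matches of pattern in text (illustrative sketch)."""
    # Assumption: start_pos is 1-based, and values below 1 are rejected.
    if start_pos < 1:
        raise ValueError("start_pos must be >= 1")

    re_flags = 0
    for flag in flags:
        if flag == "m":
            re_flags |= re.MULTILINE      # '^' and '$' match at line boundaries
        elif flag == "i":
            re_flags |= re.IGNORECASE     # case-insensitive matching
        elif flag == "c":
            re_flags &= ~re.IGNORECASE    # case-sensitive matching (the default)
        elif flag == "n":
            re_flags |= re.DOTALL         # '.' also matches newlines
        else:
            # Assumption: unknown flags (such as 's') are treated as errors.
            raise ValueError("unsupported flag: %r" % flag)

    compiled = re.compile(pattern, re_flags)
    # Convert the 1-based start position to a 0-based offset and count matches.
    return sum(1 for _ in compiled.finditer(text, start_pos - 1))


if __name__ == "__main__":
    print(regexp_match_count("abcABC", "abc"))          # default flags
    print(regexp_match_count("abcABC", "abc", 1, "i"))  # case insensitive
    print(regexp_match_count("abcABC", "abc", 2, "i"))  # search starts at 'b'
```

Under these assumptions the three calls print 1, 2 and 1: the default
is case sensitive, 'i' also counts "ABC", and starting at position 2
skips the first "abc".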
Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1248
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins