As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://github.com/apache/impala/commit/543fa73f3a846
f0e4527514c993cb0985912b06c.
Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);
"where": Case-insensitive trim direction. Valid options are "leading",
"trailing", and "both". "leading" means trimming characters from the
start; "trailing" means trimming characters from the end; "both" means
trimming characters from both sides. For Syntax #2, since no "where"
is specified, the option "both" is implied by default.
"charset": Case-sensitive characters to be removed. This argument is
regarded as a character set going to be removed. The occurrence order
of each character doesn't matter and duplicated instances of the same
character will be ignored. NULL argument implies " " (standard space)
by default. Empty argument ("" or '') makes TRIM return the string
untouched. For Syntax #1, since no "charset" is specified, it trims
" " (standard space) by default.
"string": Case-sensitive target string to trim. This argument can be
NULL.
The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().
UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.
Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.
2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.
3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although generally a keyword would allow easier
and better parsing, on the negative side it restricts token's usage in
general context. However, leading/trailing/both, being previously
saved as reserved words, are now added as keywords to make possible
their usage with no escaping.
Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
The prettyprint_duration function was originally
implemented in IMPALA-12824 to work with the workload
management tables which stored durations in integer
nanoseconds. These tables have changed to store decimal
seconds.
The prettyprint_duration function would have required a
large investment of time to make it work with decimal
values, and since the new format is more human readable
anyways, this function has been removed.
Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc
Reviewed-on: http://gerrit.cloudera.org:8080/21208
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function takes an integer input containing a
number of nanoseconds and returns a human readable value breaking down
the input by hours, minutes, seconds, milliseconds, microseconds, and
nanoseconds.
The prettyprint_bytes function takes an integer input containing a
number of bytes and returns a human readable values breaking down the
input by gigabytes, megabytes, kilobytes, and bytes.
Functionality tests were added to the existing expr-test suite that
tests built-in functions.
Functional-query workloads were added in two new .test files under the
testdata directory to exercise these two new functions. Corresponding
pytests were added to run the tests in these new .test files.
Benchmarks were added to expr-benchmark, and new benchmarks were
generated with a release build running on a machine with the cpu
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.
Documentation was added to the built-in string functions docs.
Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Reviewed-on: http://gerrit.cloudera.org:8080/21038
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- The function titles were changed to upper case.
- The function titles no longer use <codeph>. <codeph> font appears
smaller than the <p> font.
- Return type were changed to upper case data types.
- Minor typos were fixed, such as extra commas and periods in titles.
- The indexterm dita elememts were removed. Indexterm was incomplete
and WIP. No plan to go ahead and implement it, so removed.
Change-Id: I797532463da8d29fe5bc7543cfdfb5b2b82db197
Reviewed-on: http://gerrit.cloudera.org:8080/11619
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
- Added the following new functions to the string functions doc:
left() and right()
- Added the following new functions to the date time functions doc:
monthname(), quarter(), week()
- Added quarter to extract() and date_part() as a unit
Change-Id: Icf31b50584628603c0c86ff0772a12ac6ac5c7b6
Reviewed-on: http://gerrit.cloudera.org:8080/9880
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Modify both char_length() and length() usage notes to say when they
return the same or different results.
Include the same example, showing both STRING and CHAR types,
under both functions.
Change-Id: I18cabfce66351bb890bfbfc26b93466204a82625
Reviewed-on: http://gerrit.cloudera.org:8080/9014
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Included syntax, mention of performance benefit, and examples.
Also added an item in the "new features" list with a link to the
detailed writeup.
Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Reviewed-on: http://gerrit.cloudera.org:8080/6979
Reviewed-by: Greg Rahn <grahn@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
For history and tracking purposes, there are many
instances of rev="CDH-1234" for various CDH- JIRA
numbers. This produces no visible output, it's just
FYI for the person editing the source. Removing all
these now from the upstream doc source, so as not
to have "CDH" all through the doc source files.
Change-Id: I29089e5a31cd72e876b2ccb8375d1c10693c6aba
Reviewed-on: http://gerrit.cloudera.org:8080/6349
Reviewed-by: Ambreen Kazi <ambreen.kazi@cloudera.com>
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
For this change to land in master, the audience="hidden" code review
needs to be completed first. Otherwise, the doc build would still work
but the audience="hidden" content would be visible rather than hidden as
desired.
Some work happening in parallel might introduce additional instances of
audience="Cloudera". I suggest addressing those in a followup CR so this
global change can land quickly.
Since the changes apply across so many different files, but are so
narrow in scope, I suggest that the way to validate (check that no
extraneous changes were introduced accidentally) is to diff just the
changed lines:
git diff -U0 HEAD^ HEAD
In patch set 2, I updated other topics marked audience="Cloudera"
by CRs that were pushed in the meantime.
Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
This now gives a clean RAT check with bin/check-rat-report.py, which
is one way for the Impala community to check compliance with ASF rules
on intellectual property.
Change-Id: I2ad06435f84a65ba126759e42a18fdaf52cd7036
Reviewed-on: http://gerrit.cloudera.org:8080/5232
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Reviewed-by: John Russell <jrussell@cloudera.com>
These are refugees from doc_prototype. They can be rendered with the
DITA Open Toolkit version 2.3.3 by:
/tmp/dita-ot-2.3.3/bin/dita \
-i impala.ditamap \
-f html5 \
-o $(mktemp -d) \
-filter impala_html.ditaval
Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Reviewed-on: http://gerrit.cloudera.org:8080/5014
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: John Russell <jrussell@cloudera.com>