Daniel Vanko 321429eac6 IMPALA-14237: Fix Iceberg partition values encoding
This patch modifies the string overload of
IcebergFunctions::TruncatePartitionTransform so that it always treats
strings as UTF-8 encoded, because the Iceberg specification states that
strings are UTF-8 encoded.

Also, for Iceberg tables UrlEncode is now called in the standard way,
similar to Java's URLEncoder.encode() (which the Iceberg API also uses),
rather than in the Hive-compatible way, to conform with existing practice
in Hive, Spark and Trino. This includes a change to the set of characters
that are not escaped, to follow the URL Standard's
application/x-www-form-urlencoded format [1]. ShouldNotEscape was also
renamed to IsUrlSafe for better readability.

Testing:
 * add and extend e2e tests to check partitions with Unicode characters
 * add be tests to coding-util-test.cc

[1]: https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set

Change-Id: Iabb39727f6dd49b76c918bcd6b3ec62532555755
Reviewed-on: http://gerrit.cloudera.org:8080/23190
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-08 18:54:07 +00:00