test_unescaped_string_partition in metadata/test_recover_partitions.py
uses HDFS clients to create four partition directories whose names
contain special characters, i.e. single quote, double quote and
backslash. It tests whether ALTER TABLE RECOVER PARTITIONS recognizes
those directories correctly. However, when running against S3, only two
of the directories are created, which causes the test to fail.

The reason is that when running against S3 we use the hadoop CLI for
filesystem operations, and a shell command is launched for each
operation. Passing arguments through the shell unescapes them a second
time, so the 4 dirs [p=', p=", p=\', p=\"] end up as
[p=', p=", p=', p="], i.e. only two distinct directories. When the test
runs against HDFS we use webhdfs_client, so this issue does not occur.

In fact, we shouldn't use raw special characters in partition paths at
all. Hive converts them to their ASCII hex values when creating
partition directories. E.g. the partition paths for
[p=', p=", p=\', p=\"] are [p=%27, p=%22, p=%5C%27, p=%5C%22]. The test
should follow this rule when creating directories. Doing so also avoids
the shell issue on S3.

Tests:
- Added two more special partitions in test_unescaped_string_partition.
- Ran test_unescaped_string_partition on S3.

Change-Id: I63d149c9bdec52c2e1c0b25c8c3f0448cf7bdadb
Reviewed-on: http://gerrit.cloudera.org:8080/15475
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
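The escaping rule described above can be sketched in Python as follows. This is only an illustration, not the code from the change: the helper names escape_partition_value and partition_dir are hypothetical, and CHARS_TO_ESCAPE is an assumed subset covering just the characters mentioned in this message, not Hive's full escape list.

```python
# Assumption: illustrative subset only. Hive escapes more characters
# than the ones listed here.
CHARS_TO_ESCAPE = {"'", '"', "\\", "%"}


def escape_partition_value(value):
    """Replace each special character with '%' plus two uppercase hex digits."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in CHARS_TO_ESCAPE else ch
        for ch in value
    )


def partition_dir(key, value):
    """Build the directory name for one partition key/value pair."""
    return "{}={}".format(key, escape_partition_value(value))


if __name__ == "__main__":
    # The four partition values from the test: ', ", \' and \"
    for raw in ["'", '"', "\\'", '\\"']:
        print(partition_dir("p", raw))
```

Because the resulting names contain only characters such as '%', letters and digits, they pass through a shell command line unchanged, which is why the double unescaping described above no longer matters.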