impala/testdata/workloads/functional-query/queries/QueryTest/special-strings.test
stiga-huang e7839c4530 IMPALA-10416: Add raw string mode for testfiles to verify non-ascii results
Currently, the result section of the testfile is required to use
escaped strings. Take the following result section as an example:
  ---- RESULTS
  'Alice\nBob'
  'Alice\\nBob'
The first line is a string with a newline character. The second line is
a string with a '\' and an 'n' character. When comparing with the actual
query results, we need to escape the special characters in the actual
results, e.g. replace newline characters with '\n'. This is done by
invoking encode('unicode_escape') on the actual result strings. However,
the input type of this method is unicode instead of str. When it is called
on a str, Python implicitly converts the input to unicode using the
default encoding, ascii, which raises UnicodeDecodeError when the str
contains non-ascii bytes. To fix this, this patch explicitly decodes the
input str using 'utf-8' encoding.
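
The failure and the fix described above can be sketched as follows. This is an illustration in Python 3, where the Python 2 str of the original code corresponds to bytes; the explicit decode('ascii') stands in for Python 2's implicit conversion, and escape_result is a hypothetical name, not the test framework's actual function.

```python
def escape_result(raw_bytes):
    """Decode UTF-8 bytes, then escape special and non-ASCII characters."""
    return raw_bytes.decode('utf-8').encode('unicode_escape').decode('ascii')


# An actual result as raw UTF-8 bytes, containing a real newline.
actual = "你好\n你好".encode('utf-8')

# Decoding with ascii (what the implicit conversion amounted to) fails.
try:
    actual.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# Decoding with utf-8 first makes the escaping step work.
escaped = escape_result(actual)
print(escaped)
```

The escaped value is what the default (escaped) string mode compares against the expected line in the RESULTS section.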

After fixing the logic of escaping the actual result strings, the next
problem is that it's painful to write unicode-escaped expected results.
Here is an example:
  ---- QUERY
  select "你好\n你好"
  ---- RESULTS
  '\u4f60\u597d\n\u4f60\u597d'
  ---- TYPES
  STRING
Every non-ascii character has to be translated to its escape sequence by
hand.

This patch adds a new comment, RAW_STRING, for the result section to use
raw strings instead of unicode-escaped strings. Here is an example:
  ---- QUERY
  select "你好"
  ---- RESULTS: RAW_STRING
  '你好'
  ---- TYPES
  STRING
If the result contains special characters, it's recommended to use the
default string mode. If the only special characters are newlines,
RAW_STRING can be combined with the existing MULTI_LINE comment.
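
The difference between the two modes can be sketched as below. This is a simplified illustration under stated assumptions: row_matches is a hypothetical helper, both arguments are already text with the surrounding quotes stripped, and the real framework additionally handles quoting, column types and MULTI_LINE.

```python
def row_matches(expected, actual, raw_string=False):
    """Compare one expected string from the testfile with one actual value.

    In raw string mode the two are compared as-is. In the default mode the
    actual value is unicode-escaped first, so the expected string must be
    written with escape sequences.
    """
    if raw_string:
        return expected == actual
    escaped = actual.encode('unicode_escape').decode('ascii')
    return expected == escaped


# Default mode: the expected value is written with escapes.
print(row_matches(r"\u4f60\u597d", "你好"))
# Raw string mode: the expected value is written literally.
print(row_matches("你好", "你好", raw_string=True))
# A literal backslash followed by 'n' in the actual result, in both modes.
print(row_matches(r"\u4f60\u597d\\n\u4f60\u597d", "你好\\n你好"))
print(row_matches(r"你好\n你好", "你好\\n你好", raw_string=True))
```

All four calls are expected to match, mirroring the paired test cases in special-strings.test.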

This patch also fixes the issue that pytest fails to report assertion
failures if any of the compared str values contain non-ascii bytes
(IMPALA-10419). However, pytest works if the compared values are both
of unicode type, so we explicitly convert the actual and expected str
values to unicode type.
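
A minimal sketch of that conversion, written for Python 3 where the unicode type is str and the byte-string type is bytes. The helper name to_text and the choice of 'utf-8' as the default are assumptions made for this illustration.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Expected and actual values may arrive as bytes or as text; normalizing
# both sides means the comparison only ever sees text.
expected = to_text("'你好'".encode('utf-8'))
actual = to_text("'你好'")
assert expected == actual
```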

Test:
 - Add tests in special-strings.test for raw string mode and the escaped
   string mode (default).
 - Run test_exprs.py::TestExprs::test_special_strings locally.

Change-Id: I7cc2ea3e5849bd3d973f0cb91322633bcc0ffa4b
Reviewed-on: http://gerrit.cloudera.org:8080/16919
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-06 04:39:56 +00:00

====
---- QUERY
# Reproduces IMPALA-6734. Before the fix, this would pass if the results had a single
# quote for each value instead of the correct string.
select "a comma,"
from alltypestiny
---- RESULTS
'a comma,'
'a comma,'
'a comma,'
'a comma,'
'a comma,'
'a comma,'
'a comma,'
'a comma,'
---- TYPES
STRING
====
---- QUERY
# Test that escaping single quotes in result string works.
select "'"
---- RESULTS
''''
---- TYPES
STRING
====
---- QUERY
select "你好"
---- RESULTS
'\u4f60\u597d'
---- TYPES
STRING
====
---- QUERY
select "你好"
---- RESULTS: RAW_STRING
'你好'
---- TYPES
STRING
====
---- QUERY
select "你好\n你好"
---- RESULTS
'\u4f60\u597d\n\u4f60\u597d'
---- TYPES
STRING
====
---- QUERY
select "你好\n你好"
---- RESULTS: RAW_STRING,MULTI_LINE
['你好
你好']
---- TYPES
STRING
====
---- QUERY
select "你好\\n你好"
---- RESULTS
'\u4f60\u597d\\n\u4f60\u597d'
---- TYPES
STRING
====
---- QUERY
select "你好\\n你好"
---- RESULTS: RAW_STRING
'你好\n你好'
---- TYPES
STRING
====
---- QUERY
select "你好", "Halló", "여보세요"
---- RESULTS
'\u4f60\u597d','Hall\xf3','\uc5ec\ubcf4\uc138\uc694'
---- TYPES
STRING,STRING,STRING
====
---- QUERY
select "你好", "Halló", "여보세요"
---- RESULTS: RAW_STRING
'你好','Halló','여보세요'
---- TYPES
STRING,STRING,STRING
====
---- QUERY
values ("你好"), ("Halló"), ("여보세요")
---- RESULTS: RAW_STRING,VERIFY_IS_SUBSET
'你好'
'여보세요'
---- TYPES
STRING
====
---- QUERY
values ("你好"), ("Halló"), ("여보세요")
---- RESULTS: RAW_STRING,VERIFY_IS_SUPERSET
'你好'
'여보세요'
'Halló'
'hello'
---- TYPES
STRING
====
---- QUERY
values ("你好"), ("Halló"), ("여보세요")
---- RESULTS: RAW_STRING,VERIFY_IS_NOT_IN
'hello'
---- TYPES
STRING
====