mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
IMPALA-11325: Fix UnicodeDecodeError for shell file output

When using the --output_file command-line option for
impala-shell, the shell fails with UnicodeDecodeError
if the output contains Unicode characters.

For example, running this command:
  impala-shell -B -q "select '引'" --output_file=output.txt
fails with:
  UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

This happens because OutputStream::write() calls encode('utf-8')
on a string that is already UTF-8 encoded.
This change skips the encode('utf-8') call for Python 2.
Python 3 uses a Unicode string and still needs the encode call.
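
The failure described above can be imitated on Python 3. This is a rough sketch, not the code from this change. It assumes the usual Python 2 behavior in which encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec. Both implicit_py2_encode and to_utf8_bytes are hypothetical names used only for illustration; neither exists in impala-shell.

```python
# -*- coding: utf-8 -*-
# Hypothetical illustration only; not the impala-shell implementation.


def implicit_py2_encode(already_encoded):
    """Imitate Python 2's str.encode('utf-8') on a byte string:
    an implicit ASCII decode runs first, then the UTF-8 encode."""
    return already_encoded.decode('ascii').encode('utf-8')


def to_utf8_bytes(data):
    """Return UTF-8 bytes: pass bytes through unchanged, encode text."""
    if isinstance(data, bytes):
        return data
    return data.encode('utf-8')


# The query result from the example, already encoded as UTF-8 bytes.
encoded = u'引'.encode('utf-8')

try:
    implicit_py2_encode(encoded)
    error_message = None
except UnicodeDecodeError as e:
    # The first byte (0xe5) is outside the ASCII range, so the decode fails.
    error_message = str(e)

print(error_message)
print(to_utf8_bytes(encoded) == to_utf8_bytes(u'引'))
```

A type check like the one in to_utf8_bytes is one possible way to get a single conversion point; the change itself only skips the encode call on Python 2.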

This is mostly a pragmatic fix to make the code a little bit
more functional, and there is more work to be done to have
clear contracts for the format() methods and clear points
of conversion to/from bytes.

Testing:
 - Ran shell tests with Python 2 and Python 3 on Ubuntu 18
 - Added a shell test that outputs a Unicode character
   to an output file. Without the fix, this test fails.

Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
Reviewed-on: http://gerrit.cloudera.org:8080/18576
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
@@ -1202,6 +1202,23 @@ class TestImpalaShell(ImpalaTestSuite):
       rows_from_file = [line.rstrip() for line in f]
       assert rows_from_stdout == rows_from_file
 
+  def test_output_file_utf8(self, vector, tmp_file):
+    """Test that writing UTF-8 output to a file using '--output_file' produces the
+    same output as written to stdout."""
+    # This is purely about UTF-8 output, so it doesn't need multiple rows.
+    query = "select '引'"
+    # Run the query normally and keep the stdout
+    output = run_impala_shell_cmd(vector, ['-q', query, '-B', '--output_delimiter=;'])
+    assert "Fetched 1 row(s)" in output.stderr
+    rows_from_stdout = output.stdout.strip().split('\n')
+    # Run the query with output sent to a file using '--output_file'.
+    result = run_impala_shell_cmd(vector, ['-q', query, '-B', '--output_delimiter=;',
+                                           '--output_file=%s' % tmp_file])
+    assert "Fetched 1 row(s)" in result.stderr
+    with open(tmp_file, "r") as f:
+      rows_from_file = [line.rstrip() for line in f]
+      assert rows_from_stdout == rows_from_file
+
   def test_http_socket_timeout(self, vector):
     """Test setting different http_socket_timeout_s values."""
     if (vector.get_value('strict_hs2_protocol') or
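
The comparison in test_output_file_utf8 depends on how the output file is decoded when it is read back. The following is a minimal sketch of a UTF-8 round trip that behaves the same on Python 2 and Python 3, assuming the file holds UTF-8 bytes. write_rows and read_rows are hypothetical helpers for illustration and are not part of the Impala test suite.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_rows(path, rows):
    """Write each text row to the file as UTF-8 bytes, one row per line."""
    with io.open(path, 'wb') as f:
        for row in rows:
            f.write(row.encode('utf-8') + b'\n')


def read_rows(path):
    """Read a UTF-8 file and return its lines without trailing whitespace."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip() for line in f]


# Create a temporary file, write one Unicode row, and read it back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    write_rows(path, [u'引'])
    rows = read_rows(path)
finally:
    os.remove(path)

print(rows == [u'引'])
```

Naming the encoding explicitly on read keeps the result independent of the platform's default locale, which may matter when the same test runs under both interpreters.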