mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
IMPALA-14514: Handle serializing bytes in bin/run-workload.py
On Python 3, when Impyla receives a result with a string that is not valid
UTF-8, it returns that value as bytes. TPC-DS Q30 on scale 20 has a result
that contains invalid UTF-8, so bin/run-workload.py can fail while trying to
dump it to JSON. This modifies CustomJSONEncoder to serialize bytes by
decoding them to a string, with invalid byte sequences replaced by backslash
escapes.

Testing:
- Ran bin/run-workload.py against TPC-DS scale 20

Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
Reviewed-on: http://gerrit.cloudera.org:8080/23602
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
@@ -145,6 +145,11 @@ class CustomJSONEncoder(json.JSONEncoder):
     if isinstance(obj, datetime):
       # Convert datetime into an standard iso string
       return obj.isoformat()
+    if isinstance(obj, bytes):
+      # Impyla can leave a string value as bytes when it is unable to decode it to UTF-8.
+      # TPC-DS has queries that produce non-UTF-8 results (e.g. Q30 on scale 20)
+      # Convert bytes to strings to make JSON encoding work
+      return obj.decode(encoding="utf-8", errors="backslashreplace")
     elif isinstance(obj, (Query, HiveQueryResult, QueryExecConfig, TableFormatInfo)):
       # Serialize these objects manually by returning their __dict__ methods.
       return obj.__dict__
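The effect of the new bytes branch can be sketched in isolation. The class name BytesTolerantEncoder and the sample row below are hypothetical stand-ins, not part of bin/run-workload.py; only the obj.decode(...) call is taken from the change itself.

```python
import json
from datetime import datetime


class BytesTolerantEncoder(json.JSONEncoder):
    """Hypothetical stand-in for CustomJSONEncoder, reduced to two branches."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, bytes):
            # Same call as in the change: bytes that are not valid UTF-8
            # become literal \xNN escapes instead of raising UnicodeDecodeError.
            return obj.decode(encoding="utf-8", errors="backslashreplace")
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


# Made-up sample value: the byte 0xff never appears in valid UTF-8.
row = {"value": b"Caf\xff", "count": 1}

# Without a custom encoder, json.dumps cannot serialize bytes at all.
try:
    json.dumps(row)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

encoded = json.dumps(row, cls=BytesTolerantEncoder)
decoded = json.loads(encoded)
print(encoded)
```

One consequence of this approach is that the conversion is one-way: after a round trip through JSON the value is a str containing a literal backslash followed by "xff", not the original byte, so it is suited to reporting results rather than to recovering the raw bytes.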