IMPALA-6543: Limit RowBatch serialization size to INT_MAX
The serialization format of a row batch relies on tuple offsets. In its
current form, the tuple offsets are int32s. This means that it is
impossible to generate a valid serialization of a row batch that is
larger than INT_MAX.

This changes RowBatch::SerializeInternal() to return an error if trying
to serialize a row batch larger than INT_MAX. This prevents a DCHECK on
debug builds when creating a row batch larger than 2GB.

This also changes the compression logic in RowBatch::Serialize() to
avoid a DCHECK if LZ4 will not be able to compress the row batch.
Instead, it returns an error.

This modifies row-batch-serialize-test to verify behavior at each of
the limits. Specifically:
- RowBatches up to size LZ4_MAX_INPUT_SIZE succeed.
- RowBatches with size in the range [LZ4_MAX_INPUT_SIZE+1, INT_MAX] fail on LZ4 compression.
- RowBatches with size > INT_MAX fail with "Row batch too large".

Change-Id: I3b022acdf3bc93912d6d98829b30e44b65890d91
Reviewed-on: http://gerrit.cloudera.org:8080/9367
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
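For reference, a minimal self-contained sketch (not Impala's actual code) of the two
limits described above. CheckSerializedSize() and SerializeOutcome are hypothetical
names introduced for illustration; LZ4_MAX_INPUT_SIZE and LZ4_compressBound() come
from lz4.h, and the real logic lives in RowBatch::SerializeInternal() /
RowBatch::Serialize().

#include <climits>
#include <cstdint>
#include <iostream>

#include <lz4.h>  // provides LZ4_MAX_INPUT_SIZE and LZ4_compressBound()

// Hypothetical helper mirroring the checks the commit message describes.
enum class SerializeOutcome { OK, TOO_LARGE_TO_SERIALIZE, TOO_LARGE_TO_COMPRESS };

SerializeOutcome CheckSerializedSize(int64_t uncompressed_size) {
  // Tuple offsets in the serialized format are int32s, so anything past INT_MAX
  // cannot be represented at all -> hard error (the new ROW_BATCH_TOO_LARGE code).
  if (uncompressed_size > INT_MAX) return SerializeOutcome::TOO_LARGE_TO_SERIALIZE;
  // LZ4 refuses inputs above LZ4_MAX_INPUT_SIZE (LZ4_compressBound() returns 0 for
  // them), so sizes in (LZ4_MAX_INPUT_SIZE, INT_MAX] fail at the compression step
  // instead of tripping a DCHECK.
  if (uncompressed_size > LZ4_MAX_INPUT_SIZE) return SerializeOutcome::TOO_LARGE_TO_COMPRESS;
  return SerializeOutcome::OK;
}

int main() {
  // Boundary cases matching the test expectations listed above.
  std::cout << (CheckSerializedSize(int64_t{LZ4_MAX_INPUT_SIZE}) == SerializeOutcome::OK) << "\n";
  std::cout << (CheckSerializedSize(int64_t{LZ4_MAX_INPUT_SIZE} + 1)
                == SerializeOutcome::TOO_LARGE_TO_COMPRESS) << "\n";
  std::cout << (CheckSerializedSize(int64_t{INT_MAX} + 1)
                == SerializeOutcome::TOO_LARGE_TO_SERIALIZE) << "\n";
  return 0;
}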
committed by Impala Public Jenkins
parent 46f74df0ad
commit 93e7a72dba
@@ -352,6 +352,9 @@ error_codes = (
   ("PARQUET_BIT_PACKED_LEVELS", 115,
    "Can not read Parquet file $0 with deprecated BIT_PACKED encoding for rep or "
    "def levels. Support was removed in Impala 3.0 - see IMPALA-6077."),
+
+  ("ROW_BATCH_TOO_LARGE", 116,
+   "Row batch cannot be serialized: size of $0 bytes exceeds supported limit of $1"),
 )
 
 import sys
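For context on the $0/$1 placeholders in the new entry: they are positional arguments
filled in when the backend reports the error (Impala generates TErrorCode::ROW_BATCH_TOO_LARGE
from this table and builds a Status from it). The stand-alone sketch below only illustrates
that substitution; FormatRowBatchTooLarge() is a hypothetical helper, not Impala's error
framework.

#include <climits>
#include <cstdint>
#include <iostream>
#include <string>

// Toy stand-in for the $0/$1 substitution performed when the error message is rendered.
static std::string FormatRowBatchTooLarge(int64_t size, int64_t limit) {
  std::string msg =
      "Row batch cannot be serialized: size of $0 bytes exceeds supported limit of $1";
  msg.replace(msg.find("$0"), 2, std::to_string(size));
  msg.replace(msg.find("$1"), 2, std::to_string(limit));
  return msg;
}

int main() {
  // A row batch one byte past the INT_MAX limit, as in the >2GB scenario above.
  std::cout << FormatRowBatchTooLarge(int64_t{INT_MAX} + 1, INT_MAX) << "\n";
  return 0;
}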