mirror of
https://github.com/apache/impala.git
synced 2026-01-02 03:00:32 -05:00
Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext(), so that the legacy interface (ProcessSplit()) and the new GetNext() interface share the core scanning code.

Summary of changes:
- Slightly change the code flow for the initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSplit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Reviewed-on: http://gerrit.cloudera.org:8080/6527
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
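The commit message above describes how ProcessSplit() becomes a loop over GetNext(), with a header-only scan range handled through 'only_parsing_header_' and completion signalled by 'eos_'. Below is a minimal C++ sketch of that control flow under simplifying assumptions. `RowBatch`, `SketchScanner`, the batch capacity and every signature are hypothetical stand-ins, not Impala's actual classes. Only the names GetNext(), GetNextInternal(), ProcessSplit(), Close(), 'only_parsing_header_' and 'eos_' come from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch (NOT Impala's RowBatch class).
struct RowBatch {
  std::vector<std::string> rows;
  std::size_t capacity = 2;  // assumed tiny capacity for illustration
  bool AtCapacity() const { return rows.size() >= capacity; }
};

// Hypothetical scanner that mirrors the control flow described in the commit
// message. Signatures are assumptions, not Impala's HdfsScanner API.
class SketchScanner {
 public:
  SketchScanner(std::vector<std::string> data, bool header_only_range)
      : data_(std::move(data)), header_only_range_(header_only_range) {}

  // Open() decides up front whether this range only parses the file header.
  void Open() { only_parsing_header_ = header_only_range_; }

  // New interface: fills at most one batch per call and sets eos_ when done.
  void GetNext(RowBatch* batch) { GetNextInternal(batch); }

  // Legacy interface, implemented by calling GetNext() until eos_ so that both
  // interfaces share the same core scanning code.
  std::vector<RowBatch> ProcessSplit() {
    std::vector<RowBatch> out;
    while (!eos_) {
      RowBatch batch;
      GetNext(&batch);
      out.push_back(std::move(batch));
    }
    return out;
  }

  // Per the commit message, a nullptr batch means "free all resources".
  // The non-null case is not modeled in this sketch.
  void Close(RowBatch* batch) {
    if (batch == nullptr) data_.clear();
  }

  bool eos() const { return eos_; }

 private:
  void GetNextInternal(RowBatch* batch) {
    if (only_parsing_header_) {
      // Header-only range: (pretend to) parse the header, emit no rows, finish.
      eos_ = true;
      return;
    }
    while (next_ < data_.size() && !batch->AtCapacity()) {
      batch->rows.push_back(data_[next_++]);
    }
    if (next_ >= data_.size()) eos_ = true;
  }

  std::vector<std::string> data_;
  bool header_only_range_;
  bool only_parsing_header_ = false;
  bool eos_ = false;
  std::size_t next_ = 0;
};
```

With three rows and a capacity of two, ProcessSplit() returns two batches: the first holds two rows and the second holds one. A header-only range returns a single empty batch, because GetNextInternal() sets eos_ on the first call.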
====
---- QUERY
# Read from the corrupt files. We may get partial results.
select * from bad_avro_snap_strings
---- RESULTS: VERIFY_IS_SUPERSET
'valid'
---- TYPES
string
---- ERRORS
row_regex: .*Problem parsing file $NAMENODE/.*
File '$NAMENODE/test-warehouse/bad_avro_snap_strings_avro_snap/truncated_string.avro' is corrupt: truncated data block at offset 155
File '$NAMENODE/test-warehouse/bad_avro_snap_strings_avro_snap/negative_string_len.avro' is corrupt: invalid length -7 at offset 164
File '$NAMENODE/test-warehouse/bad_avro_snap_strings_avro_snap/invalid_union.avro' is corrupt: invalid union value 4 at offset 174 (1 of 2 similar)
====
---- QUERY
# Read from the corrupt files. We may get partial results.
select * from bad_avro_snap_floats
---- RESULTS: VERIFY_IS_SUPERSET
1
---- TYPES
float
---- ERRORS
Problem parsing file $NAMENODE/test-warehouse/bad_avro_snap_floats_avro_snap/truncated_float.avro at 159
File '$NAMENODE/test-warehouse/bad_avro_snap_floats_avro_snap/truncated_float.avro' is corrupt: truncated data block at offset 159
====
---- QUERY
select * from bad_avro_decimal_schema
---- TYPES
string,decimal
---- RESULTS
---- ERRORS
Column 'value': invalid Avro decimal type with precision = '5' scale = '7'
====
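The expected errors above come from decoding corrupt Avro data. For background, the Avro specification encodes `int`/`long` values as zig-zag varints. A string is written as such a length followed by that many bytes, and a union value starts with such a zero-based branch index. A `decimal` must have a positive precision and a scale between 0 and the precision. The C++ sketch below uses only those rules to show how a decoder could detect the same kinds of corruption this test expects: a truncated block, a negative string length such as -7, an out-of-range union branch such as 4, and precision 5 with scale 7. Every function and enum name here is hypothetical. This is not Impala's scanner code and it does not reproduce Impala's error messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result codes for this sketch (not Impala's Status type).
enum class DecodeError { kOk, kTruncated, kMalformed, kNegativeLength, kBadUnion };

// Decode an Avro long: little-endian base-128 varint, then zig-zag decoding.
DecodeError ReadZigZagLong(const std::vector<uint8_t>& buf, std::size_t* pos,
                           int64_t* out) {
  uint64_t raw = 0;
  for (int shift = 0;; shift += 7) {
    if (shift > 63) return DecodeError::kMalformed;  // more than 10 bytes
    if (*pos >= buf.size()) return DecodeError::kTruncated;  // ran off the block
    uint8_t byte = buf[(*pos)++];
    raw |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last byte of the varint
  }
  // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
  *out = static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
  return DecodeError::kOk;
}

// Avro string: a long length followed by that many bytes.
DecodeError ReadString(const std::vector<uint8_t>& buf, std::size_t* pos,
                       std::string* out) {
  int64_t len = 0;
  DecodeError err = ReadZigZagLong(buf, pos, &len);
  if (err != DecodeError::kOk) return err;
  if (len < 0) return DecodeError::kNegativeLength;  // e.g. "invalid length -7"
  if (static_cast<uint64_t>(len) > buf.size() - *pos) return DecodeError::kTruncated;
  out->assign(reinterpret_cast<const char*>(buf.data() + *pos),
              static_cast<std::size_t>(len));
  *pos += static_cast<std::size_t>(len);
  return DecodeError::kOk;
}

// Avro union: a zero-based branch index that must be < the number of branches.
DecodeError ReadUnionBranch(const std::vector<uint8_t>& buf, std::size_t* pos,
                            int64_t num_branches, int64_t* branch) {
  DecodeError err = ReadZigZagLong(buf, pos, branch);
  if (err != DecodeError::kOk) return err;
  if (*branch < 0 || *branch >= num_branches) return DecodeError::kBadUnion;
  return DecodeError::kOk;
}

// Avro decimal schema rule: precision > 0 and 0 <= scale <= precision.
bool IsValidAvroDecimal(int precision, int scale) {
  return precision > 0 && scale >= 0 && scale <= precision;
}
```

For example, the single byte `0x0D` zig-zag decodes to -7, so reading it as a string length is rejected. The byte `0x08` decodes to branch 4, which is out of range for a two-branch union.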