mirror of
https://github.com/apache/impala.git
synced 2025-12-19 18:12:08 -05:00
TestTupleCacheFullCluster.test_scan_range_distributed is flaky on s3 builds. The addition of a single file is changing scheduling significantly even with scan ranges sorted oldest to newest. This is because modification times on S3 have a granularity of one second. Multiple files have the same modification time, and the fix for IMPALA-13548 did not properly break ties for sorting. This adds logic to break ties for files with the same modification time. It compares the path (absolute path or relative path + partition) as well as the offset within the file. These should be enough to break all conceivable ties, as it is not possible to have two scan ranges with the same file at the same offset. In debug builds, this does additional validation to make sure that when a != b, comp(a, b) != comp(b, a). The test requires that adding a single file to the table changes exactly one cache key. If that final file has the same modification time as an existing file, scheduling may still mix up the files and change more than one cache key, even with tie-breaking. This adds a sleep just before generating the final file to guarantee that it gets a newer modification time. Testing: - Ran TestTupleCacheFullCluster.test_scan_range_distributed for 15 iterations on S3 Change-Id: I3f2e40d3f975ee370c659939da0374675a28cd38 Reviewed-on: http://gerrit.cloudera.org:8080/23458 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
46 KiB
46 KiB