Mirror of https://github.com/apache/impala.git (synced 2025-12-19 09:58:28 -05:00)
Fixes writing an empty parquet page when a page fills (or reaches
parquet_page_row_count_limit) at the same time that its dictionary fills.

When a page filled (or reached parquet_page_row_count_limit) at the same
time that the dictionary filled, Impala would first detect the page was
full and create a new page. It would then detect the dictionary is full
and create another page, resulting in an empty page. Parquet readers like
Hive error if they encounter an empty page.

This patch attempts to make it impossible to generate an empty page by
reworking AppendRow and adding DCHECKs for empty pages. Dictionary size is
checked on FinalizeCurrentPage so whenever a page is written, we also
flush the dictionary if full.

Addresses clang-tidy by adding override in source files.

Testing:
- new test for full page size reached with full dictionary
- new test for parquet_page_row_count_limit with full dictionary
- new test for parquet_page_row_count_limit followed by large value.
  This seems useful as a theoretical corner-case; it currently writes the
  too-large value to the page anyway, but if we ever start checking
  whether the first value will fit the page this could become an issue.

Change-Id: I90d30d958f07c6289a1beba1b5df1ab3d7213799
Reviewed-on: http://gerrit.cloudera.org:8080/19898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
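
Illustrative sketch (not the patch itself): the control flow described in
this message can be modeled with a toy writer. Everything below is an
assumption made for illustration: ToyColumnWriter, WritePages, the
int-valued column, and both limits are hypothetical names, and clearing
the dictionary stands in for whatever the real writer does when it flushes
a full dictionary. The point is only that "page full" and "dictionary
full" lead to a single finalize call, so the two conditions cannot each
start a page.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy model only: names, limits, and the int-valued column are hypothetical
// and do not mirror Impala's actual writer classes.
class ToyColumnWriter {
 public:
  ToyColumnWriter(std::size_t page_row_limit, std::size_t dict_limit)
    : page_row_limit_(page_row_limit), dict_limit_(dict_limit) {
    assert(page_row_limit_ > 0 && dict_limit_ > 0);
  }

  void AppendRow(int value) {
    const bool page_full = rows_in_page_ >= page_row_limit_;
    // The dictionary only blocks the row if it is full and lacks this value.
    const bool dict_full =
        dict_.size() >= dict_limit_ && dict_.count(value) == 0;
    // Both conditions share one finalize call, so at most one page is closed.
    if (page_full || dict_full) FinalizeCurrentPage();
    dict_.insert(value);
    ++rows_in_page_;
  }

  // Writes any buffered rows and returns the row count of each page written.
  std::vector<std::size_t> Close() {
    if (rows_in_page_ > 0) FinalizeCurrentPage();
    return pages_;
  }

 private:
  void FinalizeCurrentPage() {
    assert(rows_in_page_ > 0);  // stands in for the DCHECK on empty pages
    pages_.push_back(rows_in_page_);
    rows_in_page_ = 0;
    // Whenever a page is written, also flush the dictionary if it is full.
    if (dict_.size() >= dict_limit_) dict_.clear();
  }

  const std::size_t page_row_limit_;
  const std::size_t dict_limit_;
  std::size_t rows_in_page_ = 0;
  std::unordered_set<int> dict_;
  std::vector<std::size_t> pages_;
};

// Hypothetical helper: feeds all values through the toy writer and returns
// the number of rows in each page it produced.
std::vector<std::size_t> WritePages(const std::vector<int>& values,
                                    std::size_t page_row_limit,
                                    std::size_t dict_limit) {
  ToyColumnWriter writer(page_row_limit, dict_limit);
  for (int v : values) writer.AppendRow(v);
  return writer.Close();
}
```

In this model, WritePages({1, 2, 3, 4, 5, 6}, 3, 3) hits the row limit and
the dictionary limit on the same row and still produces two pages of three
rows each, with no empty page in between.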