Files
impala/testdata/empty_parquet_page_source_impala10186
Michael Smith 2fc4f74796 IMPALA-10186: Fix writing empty parquet page
Fixes writing an empty parquet page when a page fills (or reaches
parquet_page_row_count_limit) at the same time that its dictionary
fills.

When a page filled (or reached parquet_page_row_count_limit) at the same
time that the dictionary filled, Impala would first detect the page was
full and create a new page. It would then detect the dictionary is full
and create another page, resulting in an empty page.

Parquet readers like Hive error if they encounter an empty page. This
patch attempts to make it impossible to generate an empty page by
reworking AppendRow and adding DCHECKs for empty pages. Dictionary size
is checked on FinalizeCurrentPage so whenever a page is written, we also
flush the dictionary if full.

Addresses clang-tidy by adding override in source files.

Testing:
- new test for full page size reached with full dictionary
- new test for parquet_page_row_count_limit with full dictionary
- new test for parquet_page_row_count_limit followed by large value.
  This seems useful as a theoretical corner-case; it currently writes
  the too-large value to the page anyway, but if we ever start checking
  whether the first value will fit the page this could become an issue.

Change-Id: I90d30d958f07c6289a1beba1b5df1ab3d7213799
Reviewed-on: http://gerrit.cloudera.org:8080/19898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 12:02:42 +00:00
..