mirror of
https://github.com/apache/impala.git
synced 2026-02-02 15:00:38 -05:00
Adds a new query option 'topn_bytes_limit' that limits the estimated number of bytes a TopN operator can process. If the Impala planner estimates that a TopN operator will process more bytes than this limit, it replaces the TopN operator with a sort operator.

The TopN operator cannot spill to disk, so it has to buffer everything in memory. This can cause frequent OOM issues when running with a large limit + offset. Switching to a sort operator allows Impala to spill to disk. We prefer the TopN operator when possible because it performs better than the sort operator for 'order by limit [offset]' queries.

The default limit is 512MB, based on micro-benchmarks of the TopN vs. sort operator for various limits (see the JIRA for full details). The default is intentionally high in order to avoid performance regressions.

Testing:
* Added a new planner test to functional-planner/ to validate that 'topn_bytes_limit' properly switches between the TopN and sort operators.

Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Reviewed-on: http://gerrit.cloudera.org:8080/11698
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
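
The planner decision described in this commit message can be illustrated with a minimal Python sketch. This is not the Impala planner code: the names `OrderByLimit`, `estimated_topn_bytes`, and `choose_operator` are hypothetical, and the byte estimate used here, (limit + offset) times an average row size, is an assumption made for illustration only. The only details taken from the commit message are the 512MB default and the rule that an estimate above the limit switches the plan from TopN to sort.

```python
from dataclasses import dataclass

# 512MB default, as stated in the commit message.
DEFAULT_TOPN_BYTES_LIMIT = 512 * 1024 * 1024


@dataclass
class OrderByLimit:
    """Hypothetical stand-in for an 'order by limit [offset]' query."""
    limit: int
    offset: int
    avg_row_bytes: int  # assumed per-row size estimate, not an Impala field


def estimated_topn_bytes(query: OrderByLimit) -> int:
    # Assumption: a TopN operator buffers limit + offset rows in memory.
    return (query.limit + query.offset) * query.avg_row_bytes


def choose_operator(query: OrderByLimit,
                    topn_bytes_limit: int = DEFAULT_TOPN_BYTES_LIMIT) -> str:
    # Above the limit, fall back to a sort operator, which can spill to disk.
    if estimated_topn_bytes(query) > topn_bytes_limit:
        return "SORT"
    # Otherwise keep the faster in-memory TopN operator.
    return "TOPN"
```

Under these assumptions, a query with a small limit stays on TopN, while the same query with a very large limit + offset, or with a much lower 'topn_bytes_limit', is planned as a sort.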