This change aims to decrease back-pressure in the sorter. It offers an
alternative in-memory run formation strategy and sorting algorithm by
introducing a new in-memory merge level between the in-memory quicksort
and the external merge phase.

Instead of forming one big run, the sorter produces many smaller
in-memory runs (called miniruns), sorts each of them with quicksort,
then merges them in memory before spilling or serving GetNext().
The external merge phase remains the same.

The new behavior is controlled by the MAX_SORT_RUN_SIZE development
query option, which sets the maximum number of pages in a minirun. The
default value of 0 keeps the original implementation of one big initial
in-memory run. Other valid values are integers of 2 and above. A value
of 10 or more is recommended to avoid high fragmentation with large
workloads and variable-length data.
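
The run formation described above can be illustrated with a minimal
Python sketch. It is not Impala's C++ sorter: list elements stand in for
pages, `max_run_size` is a hypothetical stand-in for MAX_SORT_RUN_SIZE,
Python's built-in sort replaces quicksort, and `heapq.merge` plays the
role of the new in-memory merge level.

```python
import heapq


def form_sorted_run(rows, max_run_size=0):
    """Return one sorted in-memory run built from `rows`.

    max_run_size == 0 mimics the original behavior: one big run that
    is sorted in a single pass. A value >= 2 splits the input into
    miniruns of at most that many elements, sorts each one, and then
    merges them in memory.
    """
    if max_run_size == 0:
        return sorted(rows)
    # Assumption: the text lists only 0 and integers >= 2 as options,
    # so this sketch rejects everything else.
    if max_run_size < 2:
        raise ValueError("max_run_size must be 0 or an integer >= 2")

    # Cut the input into miniruns and sort each one independently.
    miniruns = [
        sorted(rows[i:i + max_run_size])
        for i in range(0, len(rows), max_run_size)
    ]
    # K-way merge of the sorted miniruns (the in-memory merge level).
    return list(heapq.merge(*miniruns))


rows = [7, 3, 9, 1, 8, 2, 6, 4, 5]
one_big_run = form_sorted_run(rows, max_run_size=0)
merged_miniruns = form_sorted_run(rows, max_run_size=2)
print(one_big_run == merged_miniruns)
```

Both strategies yield the same sorted output; only the way the
in-memory run is built differs.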

Testing:
- added MAX_SORT_RUN_SIZE as an additional test dimension to
  test_sort.py with values [0, 2, 20]
- added a partial sort test case (inserting into a partitioned
  Kudu table)
- manual E2E testing

Change-Id: I58c0ae112e279b93426752895ded7b1a3791865c
Reviewed-on: http://gerrit.cloudera.org:8080/18393
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>

This patch adds two perf tools to the repository. Both can be used to
generate flame graphs (https://www.brendangregg.com/flamegraphs.html).

perf-record.sh:
  Samples CPU stack traces for the entire system, or for a specific
  PID, until the user hits Ctrl+C. It is useful when a developer wants
  to see what Impala is doing. The resulting flame graph is written
  to an SVG file.

perf-query.sh:
  Takes a query string as a parameter and passes it to the Impala
  shell to execute. While the query is executing, the script samples
  CPU stack traces for the entire system. The resulting flame graph
  is written to an SVG file.

Example:
  perf-query.sh "select count(*) from tpch.lineitem group by l_returnflag"
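
The scripts themselves are not reproduced in this message. As a rough
illustration of the usual perf-to-flame-graph pipeline, the Python
sketch below only builds the command lines and does not run them. It
assumes Linux `perf` and the FlameGraph scripts `stackcollapse-perf.pl`
and `flamegraph.pl` are on PATH; the 99 Hz sampling rate, file names,
and helper names are assumptions, not taken from the patch.

```python
def perf_record_cmd(pid=None, freq_hz=99, data_file="perf.data"):
    """Build a `perf record` command that samples call stacks."""
    cmd = ["perf", "record", "-F", str(freq_hz), "-g", "-o", data_file]
    # Sample one process if a PID is given, else the whole system.
    cmd += ["-p", str(pid)] if pid is not None else ["-a"]
    return cmd


def flame_graph_pipeline(data_file="perf.data", svg_file="flame.svg"):
    """Build the shell pipeline that turns perf samples into an SVG."""
    return (
        f"perf script -i {data_file} | stackcollapse-perf.pl | "
        f"flamegraph.pl > {svg_file}"
    )


print(" ".join(perf_record_cmd(pid=1234)))
print(flame_graph_pipeline())
```

A wrapper script would run the first command until interrupted (or
while a query executes) and then run the pipeline to produce the SVG.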
Change-Id: Ib3da696b939204d23c5285dcf1bf6ee3a3738415
Reviewed-on: http://gerrit.cloudera.org:8080/17834
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>