Commit Graph

2 Commits

Author SHA1 Message Date
Aaron Davidson
cafb7b72f8 External sorting
This is an experimental implementation of external sorting. This patch includes the following additions:
(1) creation and implementation of the Sorter interface, which can sort Impala Tuples,
(2) normalization of Tuples to allow memcmp-able sorting,
(3) a testing framework for the Sorter,
(4) a benchmark to compare the current state of the Sorter with other sorts,
(5) an implementation of a Vector that can store elements whose size is known only at runtime,
(6) a sorting algorithm (basically a simplified STL sort) that can operate over such a Vector,
(7) implementation of a simple in-memory Merger, and
(8) logic to stream blocks of data in and out of memory for the actual external merging.
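
As an illustration of item (2), the sketch below shows one common way to make integer keys memcmp-able: flip the sign bit and store the bytes most-significant first, so that byte-wise comparison agrees with numeric order. It is not taken from this patch. The names AppendNormalizedInt32, NormalizeRow, and CompareKeys are hypothetical, and the two-column int32 row is an assumption made only for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not Impala's API): appends a memcmp-comparable
// encoding of a signed 32-bit integer to `key`.
void AppendNormalizedInt32(int32_t value, std::vector<uint8_t>* key) {
  assert(key != nullptr);
  // Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX
  // while preserving order.
  uint32_t bits = static_cast<uint32_t>(value) ^ 0x80000000u;
  // Big-endian layout: the most significant byte comes first, so a
  // byte-wise comparison matches the numeric comparison.
  for (int shift = 24; shift >= 0; shift -= 8) {
    key->push_back(static_cast<uint8_t>(bits >> shift));
  }
}

// Hypothetical row of two int32 sort columns, normalized into one key.
// Columns are concatenated in sort-priority order.
std::vector<uint8_t> NormalizeRow(int32_t first, int32_t second) {
  std::vector<uint8_t> key;
  AppendNormalizedInt32(first, &key);
  AppendNormalizedInt32(second, &key);
  return key;
}

// Compares two normalized keys of equal length with a single memcmp.
// Returns <0, 0, or >0, like memcmp itself.
int CompareKeys(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  return std::memcmp(a.data(), b.data(), a.size());
}
```

With keys built this way, CompareKeys(NormalizeRow(-5, 0), NormalizeRow(3, 0)) is negative, matching the order of the original integers. Variable-length and nullable columns need extra handling that this sketch leaves out.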

I have a local branch for experimental optimizations and benchmarking -- this should be considered
a "basic", working sort.

The following optimizations have been implemented:
(i)   Optionally extract keys instead of writing them in place.
(ii)  Optionally and opportunistically parallelize run building (sorting & preparing for output).
(iii) Maximize disk IO and minimize buffer recycling by writing buffers out while also keeping
      them in memory until the moment they're needed.
(iv)  Prepare auxiliary data backwards so the buffers can be released as we go, and still
      go out in an order which preserves the first buffers of the run.
(v)   Always merge the maximum number of runs at a time, taking from the next merge level if
      available.
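
One way to read optimization (v) is sketched below, entirely in memory and under stated assumptions. Runs sit in a FIFO queue, each step merges up to max_fan_in of them with a min-heap, and the merged run goes to the back of the queue. When the current level has fewer than max_fan_in runs left, the batch is topped up with runs already produced for the next level. The names Run, MergeRuns, and MergeAll are hypothetical, runs are plain integer vectors, and the patch itself may schedule merges differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

using Run = std::vector<int>;  // one sorted run (assumption: int keys)

// Merges any number of sorted runs into one sorted run using a min-heap.
Run MergeRuns(const std::vector<Run>& runs) {
  // Heap entry: (value, index of the run, offset within that run).
  using Entry = std::tuple<int, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.emplace(runs[r][0], r, std::size_t{0});
  }
  Run merged;
  while (!heap.empty()) {
    const Entry top = heap.top();
    heap.pop();
    const std::size_t r = std::get<1>(top);
    const std::size_t next = std::get<2>(top) + 1;
    merged.push_back(std::get<0>(top));
    // Refill the heap from the run that just supplied the smallest value.
    if (next < runs[r].size()) heap.emplace(runs[r][next], r, next);
  }
  return merged;
}

// Repeatedly merges up to `max_fan_in` runs from the front of the queue.
// Merged output is appended to the back, so a short batch at the end of
// one level is filled with runs from the next level.
Run MergeAll(std::deque<Run> pending, std::size_t max_fan_in) {
  assert(max_fan_in >= 2);
  while (pending.size() > 1) {
    std::vector<Run> batch;
    while (batch.size() < max_fan_in && !pending.empty()) {
      batch.push_back(std::move(pending.front()));
      pending.pop_front();
    }
    pending.push_back(MergeRuns(batch));
  }
  if (pending.empty()) return Run{};
  return std::move(pending.front());
}
```

For example, calling MergeAll on five runs with max_fan_in set to 2 first merges two pairs, then merges the leftover fifth run with the first merged pair instead of carrying it alone to the next level.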

Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/126
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>
2014-01-08 10:52:27 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00