This patch does a few things:
1. Moves the buffer block mgr from the sorter to the runtime state, so a single
block mgr is now shared across the query fragment. The partitioned hash join and
agg will use it as well.
2. Adds a Client interface to the block mgr. Each exec node is a different client
and can reserve a minimum number of buffers. This avoids starvation.
3. Updates the BufferedBlockMgr interface for getting pinned blocks, collapsing
two existing APIs into one.
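The client reservation idea in item 2 can be sketched roughly as follows. This is a toy model in Python, not the BufferedBlockMgr code: the class name `BlockMgr`, the methods `register_client`, `try_pin` and `unpin`, and the accounting scheme are all hypothetical and only illustrate how a per-client minimum keeps one exec node from starving another.

```python
class BlockMgr:
    """Toy shared buffer pool. All names here are hypothetical."""

    def __init__(self, total_buffers):
        self.total_buffers = total_buffers
        # client name -> {"reserved": min buffers, "pinned": buffers in use}
        self.clients = {}

    def register_client(self, name, min_reserved):
        # Reject a client whose reservation could never be honored.
        already = sum(c["reserved"] for c in self.clients.values())
        if already + min_reserved > self.total_buffers:
            raise MemoryError("cannot reserve %d buffers for %s"
                              % (min_reserved, name))
        self.clients[name] = {"reserved": min_reserved, "pinned": 0}

    def _unused_reservation_of_others(self, name):
        # Buffers that other clients have reserved but not pinned yet.
        return sum(max(c["reserved"] - c["pinned"], 0)
                   for n, c in self.clients.items() if n != name)

    def try_pin(self, name):
        """Pin one buffer for `name`; return False if that would eat into
        another client's unused reservation."""
        pinned_total = sum(c["pinned"] for c in self.clients.values())
        free = self.total_buffers - pinned_total
        if free - self._unused_reservation_of_others(name) <= 0:
            return False
        self.clients[name]["pinned"] += 1
        return True

    def unpin(self, name):
        if self.clients[name]["pinned"] == 0:
            raise ValueError("%s has no pinned buffers" % name)
        self.clients[name]["pinned"] -= 1


if __name__ == "__main__":
    mgr = BlockMgr(total_buffers=5)
    mgr.register_client("sorter", min_reserved=2)
    mgr.register_client("join", min_reserved=2)
    # The sorter can grow past its minimum, but not into the join's share.
    print([mgr.try_pin("sorter") for _ in range(4)])
    print([mgr.try_pin("join") for _ in range(3)])
```

In this sketch a client can always pin up to its reserved minimum, because the manager never hands out a buffer that another client still has reserved and unused.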
Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574
This patch does two things in preparation for external joins. The
hash table used to contain a directory structure (buckets and nodes),
both of which were contiguous. The nodes contained the tuple ptrs
within them.
The first change is that the nodes are no longer stored contiguously
but are allocated in pages (this structure is dense and does not require
random lookups by index). The bucket structure is still contiguous
since we rely on the doubling property and random lookup by index.
The second change is that the nodes no longer store the tuple ptrs
within them. This makes it easier to build the hash table on top of
existing data.
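A rough sketch of the layout described above, in Python rather than the actual C++ hash table. The names `PagedHashTable`, `PAGE_SIZE`, `insert` and `find` are hypothetical, and representing "no tuple ptrs inside the node" as a row index into existing data is an assumption made for illustration. Bucket resizing (the doubling step) is not shown.

```python
class PagedHashTable:
    """Toy hash table: contiguous buckets, nodes allocated in pages.
    All names are hypothetical."""

    PAGE_SIZE = 4  # nodes per page; kept tiny for illustration

    def __init__(self, rows, num_buckets=8):
        self.rows = rows                      # existing data, owned elsewhere
        self.buckets = [None] * num_buckets   # contiguous, looked up by index
        self.pages = []                       # each page is a list of nodes

    def _alloc_node(self, row_idx, next_node):
        # Nodes are appended densely; a new page is added when the last is full.
        if not self.pages or len(self.pages[-1]) == self.PAGE_SIZE:
            self.pages.append([])
        node = [row_idx, next_node]           # reference to the row, not a copy
        self.pages[-1].append(node)
        return node

    def insert(self, row_idx, key):
        b = hash(key) % len(self.buckets)
        # Push the new node onto the front of the bucket's chain.
        self.buckets[b] = self._alloc_node(row_idx, self.buckets[b])

    def find(self, key, key_of):
        """Return all rows whose key matches, walking one bucket's chain."""
        node = self.buckets[hash(key) % len(self.buckets)]
        matches = []
        while node is not None:
            row = self.rows[node[0]]
            if key_of(row) == key:
                matches.append(row)
            node = node[1]
        return matches


if __name__ == "__main__":
    rows = [(1, "a"), (2, "b"), (1, "c"), (9, "d"), (17, "e")]
    table = PagedHashTable(rows)
    for i, row in enumerate(rows):
        table.insert(i, row[0])
    print(table.find(1, key_of=lambda r: r[0]))
    print(len(table.pages))
```

Because the nodes only reference rows, the table can be built over data that already exists without copying it into the nodes.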
Here's a quick benchmark doing a self join on tpch lineitem. Both
build and probe times decreased a bit.
Before:
HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%)
- BuildBuckets: 2.10M (2097152)
- BuildRows: 6.00M (6001215)
- BuildTime: 527.991ms
- LeftChildRows: 6.00M (6001215)
- LeftChildTime: 451.964ms
- LoadFactor: 0.50
- RowsReturned: 30.01M (30012985)
- RowsReturnedRate: 26.33 M/sec
After:
HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%)
- BuildBuckets: 2.10M (2097152)
- BuildRows: 6.00M (6001215)
- BuildTime: 423.175ms
- LeftChildRows: 6.00M (6001215)
- LeftChildTime: 406.67ms
- LoadFactor: 0.50
- RowsReturned: 30.01M (30012985)
- RowsReturnedRate: 29.45 M/sec
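For reference, the relative change between the two profiles above can be computed as below. The helper name `pct_decrease` is hypothetical; the input values are copied from the Before and After counters.

```python
def pct_decrease(before_ms, after_ms):
    """Relative decrease from before to after, in percent."""
    return (before_ms - after_ms) / before_ms * 100.0


# Values taken from the Before/After profiles above (milliseconds).
build_time = pct_decrease(527.991, 423.175)
left_child_time = pct_decrease(451.964, 406.67)
non_child_time = pct_decrease(985.939, 835.350)

if __name__ == "__main__":
    print("BuildTime:     %.1f%% lower" % build_time)
    print("LeftChildTime: %.1f%% lower" % left_child_time)
    print("non-child:     %.1f%% lower" % non_child_time)
```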
Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Re-order union operands descending by their estimated per-host memory,
so that parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
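The ordering rule can be sketched as follows. This is an illustrative Python model, not the planner code: `Operand`, `order_union_operands` and the field names are hypothetical, and ordering the scan operands among themselves by descending memory is an assumption (the description above only says scans go last).

```python
from collections import namedtuple

# Hypothetical stand-in for a planned union operand.
Operand = namedtuple("Operand", ["name", "is_scan", "est_per_host_mem"])


def order_union_operands(operands):
    """Non-scan operands first, by descending estimated per-host memory;
    scan operands last (also by descending memory, as an assumption)."""
    return sorted(operands,
                  key=lambda op: (op.is_scan, -op.est_per_host_mem))


if __name__ == "__main__":
    operands = [
        Operand("scan_big", True, 900),
        Operand("agg", False, 100),
        Operand("join", False, 500),
        Operand("scan_small", True, 10),
    ]
    for op in order_union_operands(operands):
        print(op.name, op.est_per_host_mem)
```

With this order, the operand opened first by the MergeNode is the non-scan operand with the largest estimate, so a parent sees something close to the peak early.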
Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
the top-n.test file
Extra time required:
Serial:
scratch disk: 42 seconds
test queries sort: 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 sec
Parallel (8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec
Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205