Commit Graph

9 Commits

Tim Armstrong
89aa6597f4 IMPALA-3354: bad sorter pivot selection on some inputs
Switch to using the median of three random tuples, which should be very robust
across a range of inputs. It may be slightly worse than the existing pivot
selection on some inputs where the original algorithm is close to optimal
(e.g. already sorted inputs), but should typically be better overall.

Always recurse on the smaller partition: this prevents stack
overflow even with bad pivot selection.
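
As an illustration only (not Impala's actual Sorter code; the names and the
partitioning details are hypothetical), a minimal C++ sketch of median-of-three
pivot selection over randomly chosen elements, combined with recursing only on
the smaller partition, could look like this:

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // Hypothetical sketch: pick the median of three randomly chosen elements
    // as the pivot, then recurse only on the smaller partition and loop on
    // the larger one so recursion depth stays O(log n) even with bad pivots.
    static int MedianOfThreeRandom(const std::vector<int>& v, int lo, int hi) {
      int a = lo + std::rand() % (hi - lo + 1);
      int b = lo + std::rand() % (hi - lo + 1);
      int c = lo + std::rand() % (hi - lo + 1);
      int x = v[a], y = v[b], z = v[c];
      return std::max(std::min(x, y), std::min(std::max(x, y), z));
    }

    static void QuickSort(std::vector<int>* v, int lo, int hi) {
      while (lo < hi) {
        int pivot = MedianOfThreeRandom(*v, lo, hi);
        // Hoare-style partition around the pivot value.
        int i = lo, j = hi;
        while (i <= j) {
          while ((*v)[i] < pivot) ++i;
          while ((*v)[j] > pivot) --j;
          if (i <= j) std::swap((*v)[i++], (*v)[j--]);
        }
        // Recurse on the smaller side, iterate on the larger side.
        if (j - lo < hi - i) {
          QuickSort(v, lo, j);
          lo = i;
        } else {
          QuickSort(v, i, hi);
          hi = j;
        }
      }
    }

For a non-empty vector v, QuickSort(&v, 0, v.size() - 1) sorts it in place with
a stack depth bounded by O(log n) regardless of pivot quality.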

The overhead is minimal - in profiles for small sorts I'm seeing pivot
selection take at most 0.5% of CPU time.

The improved pivot selection gives modest improvements of 2-5% on the
targeted perf order by benchmarks on a single-node run with TPC-H
scale factor 20.

Change-Id: Iae50112b6deca3d6268e18b6f4daae1af279b452
Reviewed-on: http://gerrit.cloudera.org:8080/2824
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:39 -07:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
Nong Li
f96fb27982 BlockMgr reservation improvements.
Change-Id: I9c7fa0ce54cbbf5b2a7a368c27c379c2a5241fc8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4732
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5092
2014-11-03 22:32:57 -08:00
Nong Li
e08ffde009 PA/PHJ: Increase fanout to 32 and fix interaction with small buffers.
Small buffers introduced an issue that is exacerbated by the large fanout. A stream can
only be appended to indefinitely once it has grabbed its initial IO-sized buffer. With
small buffers, we no longer grab that buffer up front; before this patch, it was grabbed
when the stream first needed it. This means that by the time one stream needs it, another
stream could already have grabbed it (leaving that stream pinned with multiple buffers).

This patch has all the streams grab an IO buffer as soon as the first stream needs one.
This guarantees that every stream gets one IO buffer before any stream gets two.
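
A hedged sketch of this acquisition policy (the Stream type and GrabIoBuffer()
are placeholders, not the real BufferedTupleStream/BufferedBlockMgr API):

    #include <vector>

    // Hypothetical sketch: when the first stream needs its IO-sized buffer,
    // every stream acquires one, so no stream ends up holding two IO buffers
    // while another still has none.
    struct Stream {
      bool has_io_buffer = false;
      void GrabIoBuffer() { has_io_buffer = true; }  // stand-in for the real allocation
    };

    void EnsureIoBuffers(std::vector<Stream*>* streams) {
      // Called the first time any stream needs an IO buffer.
      for (Stream* s : *streams) {
        if (!s->has_io_buffer) s->GrabIoBuffer();
      }
    }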

Change-Id: I1be1219fc5f1fa3ceedd4d5e76ae056c8bb8ff3d
2014-10-06 15:16:16 -07:00
Nong Li
28e16b02bb Change BufferedBlockMgr to be a query wide singleton.
Similar to some of our other resource management objects, the buffered block mgr
will be shared by all fragments within a query.

The memory given to the block mgr is based on the query limit (e.g. 80% of the query
limit). We can't have each fragment own a block mgr that uses 80% of the query limit,
and we probably don't want to impose per-fragment limits.
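
For illustration only (the class names and the 0.8 fraction are placeholders;
the commit only says "e.g. 80% of query limit"), a query-wide singleton whose
budget is a fraction of the query memory limit could be sketched like this:

    #include <cstdint>
    #include <memory>
    #include <mutex>
    #include <string>
    #include <unordered_map>

    // Hypothetical sketch: one block manager per query, shared by all
    // fragments, sized as a fraction of the query's memory limit.
    class BlockMgrSketch {
     public:
      explicit BlockMgrSketch(int64_t mem_budget) : mem_budget_(mem_budget) {}
      int64_t mem_budget() const { return mem_budget_; }
     private:
      int64_t mem_budget_;
    };

    class BlockMgrRegistry {
     public:
      // Returns the query-wide instance, creating it on first use.
      std::shared_ptr<BlockMgrSketch> GetOrCreate(const std::string& query_id,
                                                  int64_t query_mem_limit) {
        std::lock_guard<std::mutex> l(lock_);
        auto it = mgrs_.find(query_id);
        if (it != mgrs_.end()) return it->second;
        auto mgr = std::make_shared<BlockMgrSketch>(
            static_cast<int64_t>(query_mem_limit * 0.8));  // placeholder fraction
        mgrs_[query_id] = mgr;
        return mgr;
      }
     private:
      std::mutex lock_;
      std::unordered_map<std::string, std::shared_ptr<BlockMgrSketch>> mgrs_;
    };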

Change-Id: Idcd89f302534b37ed236cdd42784ae8d717ec29e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3965
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4179
2014-09-05 00:08:05 -07:00
Nong Li
7dc57aaa9e Change buffered block mgr to support multiple clients.
This patch does a few things:
1. Moves the buffered block mgr from the sorter to the runtime state. It is now
   shared across the query fragment. The partitioned hash join and agg will use
   this as well.
2. Adds a Client interface to the block mgr. Each exec node is a different client
   and can reserve a minimum number of buffers, which avoids starvation (see the
   sketch after this list).
3. Updates the BufferedBlockMgr interface for getting pinned blocks to collapse
   two existing APIs.
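
A minimal sketch of the per-client reservation idea from point 2 above; the
names and accounting are hypothetical, not the real BufferedBlockMgr interface:

    #include <algorithm>
    #include <cstdint>
    #include <memory>
    #include <vector>

    // Hypothetical sketch: each exec node registers as a client with a minimum
    // buffer reservation, so one client cannot starve the others.
    class BufferPoolSketch {
     public:
      struct Client {
        int64_t min_reserved;  // buffers guaranteed to this client
        int64_t pinned;        // buffers it currently holds
      };

      explicit BufferPoolSketch(int64_t total_buffers)
          : total_buffers_(total_buffers) {}

      Client* RegisterClient(int64_t min_reserved) {
        clients_.push_back(std::unique_ptr<Client>(new Client{min_reserved, 0}));
        return clients_.back().get();
      }

      // Grant a buffer if it fits within the client's reservation, or if enough
      // free buffers remain after setting aside the unused reservations of the
      // other clients.
      bool TryGetBuffer(Client* c) {
        int64_t pinned_total = 0;
        int64_t unused_reservation_others = 0;
        for (const auto& other : clients_) {
          pinned_total += other->pinned;
          if (other.get() != c) {
            unused_reservation_others +=
                std::max<int64_t>(0, other->min_reserved - other->pinned);
          }
        }
        int64_t free = total_buffers_ - pinned_total;
        bool within_reservation = c->pinned < c->min_reserved;
        if (free > 0 && (within_reservation || free > unused_reservation_others)) {
          ++c->pinned;
          return true;
        }
        return false;
      }

     private:
      int64_t total_buffers_;
      std::vector<std::unique_ptr<Client>> clients_;
    };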

Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574
2014-07-22 12:45:37 -07:00
Nong Li
188a0ea833 Rework structure of hash table.
This patch does two things in preparation for external joins. The
hash table used to contain a directory structure (buckets and nodes),
both of which were contiguous. The nodes contained the tuple ptrs
within them.

This patch changes it so that the nodes are no longer stored contiguously
but are allocated in pages (this structure is dense and does not require
random lookup by index). The bucket structure is still contiguous
since we rely on the doubling property and random lookup by index.

The second change is that the nodes no longer store the tuple ptrs
within them. This makes it easier to build the hash table on top of
existing data.
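
An illustrative sketch of that layout (hypothetical names, not Impala's actual
HashTable code): a contiguous bucket array that grows by doubling, nodes
allocated in fixed-size pages, and nodes referring to rows by index instead of
embedding tuple pointers:

    #include <cstdint>
    #include <vector>

    // Hypothetical sketch of the layout described above.
    struct NodeSketch {
      uint32_t hash;    // cached hash of the row
      int32_t row_idx;  // index into externally stored row data
      int32_t next;     // index of the next node in the chain, or -1
    };

    class HashTableSketch {
     public:
      explicit HashTableSketch(int num_buckets)  // assumed to be a power of two
          : buckets_(num_buckets, -1) {}

      void Insert(uint32_t hash, int32_t row_idx) {
        int32_t node_idx = AllocNode();
        NodeSketch* node = GetNode(node_idx);
        node->hash = hash;
        node->row_idx = row_idx;  // the row data stays where it already is
        size_t bucket = hash & (buckets_.size() - 1);
        node->next = buckets_[bucket];
        buckets_[bucket] = node_idx;
      }

     private:
      static const size_t kPageSize = 1024;

      // Nodes live in fixed-size pages, so growing the table never moves them
      // and never needs one large contiguous allocation.
      int32_t AllocNode() {
        if (pages_.empty() || pages_.back().size() == kPageSize) {
          pages_.emplace_back();
          pages_.back().reserve(kPageSize);
        }
        pages_.back().push_back(NodeSketch{0, -1, -1});
        return static_cast<int32_t>((pages_.size() - 1) * kPageSize +
                                    pages_.back().size() - 1);
      }

      NodeSketch* GetNode(int32_t idx) {
        return &pages_[idx / kPageSize][idx % kPageSize];
      }

      std::vector<int32_t> buckets_;                // contiguous; doubled on resize (not shown)
      std::vector<std::vector<NodeSketch>> pages_;  // paged node storage
    };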

Here's a quick benchmark doing a self join on TPC-H lineitem. Both
build and probe times decreased a bit.

Before:
 HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 527.991ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 451.964ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 26.33 M/sec
After:
HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%)
         - BuildBuckets: 2.10M (2097152)
         - BuildRows: 6.00M (6001215)
         - BuildTime: 423.175ms
         - LeftChildRows: 6.00M (6001215)
         - LeftChildTime: 406.67ms
         - LoadFactor: 0.50
         - RowsReturned: 30.01M (30012985)
         - RowsReturnedRate: 29.45 M/sec

Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-07-15 16:57:09 -07:00
Alex Behm
881f3a8c33 Re-order union operands descending by their estimated per-host memory.
Re-order union operands in descending order of their estimated per-host memory,
so that parent nodes can gauge the peak memory consumption of a MergeNode after
opening it during execution (a MergeNode opens its first operand in Open()).
Scan nodes are always ordered last because they can dynamically scale down their
memory usage, whereas many other nodes cannot (e.g., joins, aggregations).
One goal is to decrease the likelihood of a SortNode parent claiming too much
memory in its Open(), possibly causing the mem limit to be hit when subsequent
union operands are executed.
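
As a hedged illustration of this ordering policy (the real change is in the
planner; PlanNodeSketch and its fields are hypothetical), the comparator could
look like:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: non-scan operands first, in descending order of
    // estimated per-host memory; scan operands last because they can scale
    // their memory usage down dynamically.
    struct PlanNodeSketch {
      bool is_scan;
      int64_t per_host_mem_estimate;
    };

    void ReorderUnionOperands(std::vector<PlanNodeSketch>* operands) {
      std::stable_sort(operands->begin(), operands->end(),
                       [](const PlanNodeSketch& a, const PlanNodeSketch& b) {
                         if (a.is_scan != b.is_scan) return !a.is_scan;  // scans last
                         return a.per_host_mem_estimate > b.per_host_mem_estimate;
                       });
    }

Using stable_sort keeps the original relative order of operands whose estimates
tie, which only affects plan reproducibility in this sketch, not correctness.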

Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213
Tested-by: jenkins
2014-06-20 18:46:10 -07:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk, and failpoint tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00