2051 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
68fef6a5bf IMPALA-2213: make Parquet scanner fail query if the file size metadata is stale
This patch changes the Parquet scanner to check whether it can read the
full footer scan range. If it can't, the file has been overwritten by a
shorter file without refreshing the table metadata. Previously it would
DCHECK. This patch adds a test for this case, as well as the case
where the new file is longer than the metadata states (which fails
with an existing error).

Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07
Reviewed-on: http://gerrit.cloudera.org:8080/1084
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-10-01 13:58:39 -07:00
Juan Yu
6bac14a283 IMPALA-2005: Cleanup the newly created table if CTAS fails.
If a CTAS query fails during the DML part, Impala
should drop the newly created table.

Change-Id: I39e04a6923a36afa48f3252addd50ddda83d1706
(cherry picked from commit e03ce43585f68590a95038341e74db458f34bf32)
Reviewed-on: http://gerrit.cloudera.org:8080/870
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:38 -07:00
Skye Wanderman-Milne
0c5e6a804f IMPALA-2443: add support for more Parquet array encodings
This patch adds full support for the various Parquet array encodings,
as well as tests that use files from
https://github.com/apache/hive/tree/master/data/files. This should
allow us to read any existing array data.

Change-Id: I3d22ae237b1dc82ee75a83c1d4890d76316fadee
Reviewed-on: http://gerrit.cloudera.org:8080/826
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:37 -07:00
Skye Wanderman-Milne
eb9db01092 IMPALA-2377: fix error case in ArrayValueBuilder::GetFreeMemory()
We weren't actually returning and would try to allocate an array
larger than INT_MAX. I tested by hand that a very large array (200M
elements) would fail to be allocated and the appropriate error is
returned to the shell, as well as adding an ArrayValueBuilder unit test.

Change-Id: Iedc3b3ca8c9c07100d6602f8e8cc9cfd57151747
Reviewed-on: http://gerrit.cloudera.org:8080/886
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:35 -07:00
Skye Wanderman-Milne
4d68fcc87e Fix MemPool allocations of INT_MAX.
Since MemPool::Allocate() takes an int, INT_MAX is the largest
possible requested allocation. This and slightly lower values would
cause the 'num_bytes' variable in the private Allocate() function
to overflow, which would yield a valid pointer to a buffer too small to
hold the requested number of bytes. This patch fixes this problem by
making 'num_bytes' an int64_t, and adds a test for this behavior.

This may help with some problems related to IMPALA-1619, although it's not
quite the same since MemPool was never limited to 1GB, only FreePool.

Change-Id: Ia8040d87d3be32896944b5d44ff0c1f2667837e0
Reviewed-on: http://gerrit.cloudera.org:8080/1107
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:54 -07:00
Skye Wanderman-Milne
62d4e5406b Change bad Status in LlvmCodeGen::LoadImpalaIR() to CHECK
This should help diagnose IMPALA-2439 if we see this condition again.

Change-Id: I37e7bf0e8a1a620a9dddcaacf37290dafe9c61cc
Reviewed-on: http://gerrit.cloudera.org:8080/1108
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:53 -07:00
Matthew Jacobs
70b9954593 IMPALA-2189: [RM] Retry logic for Llama RPC may throw exception
The code in resource-broker.cc that makes RPCs to Llama will
attempt to retry the RPC some number of times (which is
configurable) if the RPC returns a failure. If the RPC
throws (which thrift may do), we try to reset the connection
and then make the RPC again, but this time not guarded by a
try/catch block. If this RPC throws, the process will crash.

This fixes the issue by removing the try/catch and instead
using the ClientCache DoRpc function which handles this
already. Some additional Llama RPC calling wrappers were
removed as well.
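
The failure mode above (a retry issued outside any try/catch) can be sketched as follows. This is a minimal illustration, not Impala's ClientCache::DoRpc(): the helper name RetryRpc and its callback parameters are hypothetical. The point is only that every attempt, including the retries, runs inside the same guard.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical helper (not the ClientCache API): runs 'rpc' up to
// 'max_attempts' times. Returns true as soon as one attempt succeeds and
// false once every attempt has failed or thrown.
bool RetryRpc(const std::function<bool()>& rpc,
              const std::function<void()>& reopen, int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    try {
      if (rpc()) return true;  // the RPC reported success
    } catch (const std::exception&) {
      // Transport-level failure: reset the connection before the next
      // attempt, which is again guarded by this try/catch.
      reopen();
    }
  }
  return false;
}
```

In this sketch an RPC that throws on its final attempt yields a false return value instead of an uncaught exception.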

Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a
Reviewed-on: http://gerrit.cloudera.org:8080/881
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:52 -07:00
Tim Armstrong
4ebe8de6a1 IMPALA-2444: Part 2: Parquet scanner perf improvements
This implements some miscellaneous improvements for the inner loops
of the Parquet scanner.

Hoist out a definition level check. This only matters when scanning
collections.

Pull out status construction from the hot parts of the code into
separate functions to avoid polluting instruction caches.
Also add unlikely annotations. This gives ~2% improvement.

Negate conjuncts_failed to be conjuncts_passed, allowing a branch
to be replaced with bitwise &. This gives ~0.5% improvement.
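
A rough sketch of the "passed" formulation follows. It is not the scanner's code: the function CountInRange and its two predicates are made up for illustration. Each predicate result is folded into a single flag with bitwise &, so the loop body needs no per-predicate branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: counts the values that lie in [lo, hi]. Tracking
// "passed" lets each predicate be combined with &= instead of branching
// on a "failed" flag after every predicate.
std::size_t CountInRange(const std::vector<int>& values, int lo, int hi) {
  assert(lo <= hi);
  std::size_t count = 0;
  for (int v : values) {
    bool passed = true;
    passed &= (v >= lo);
    passed &= (v <= hi);
    count += passed;  // adds 1 when every predicate passed, else 0
  }
  return count;
}
```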

Change-Id: I787a2d125998fca49de03b34c372729a30ef090b
Reviewed-on: http://gerrit.cloudera.org:8080/1098
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:51 -07:00
ishaan
fcd642caac IMPALA-2282: Use kerberos client defaults while kinit'ing
This patch subtly changes the behaviour of how Impala renews its tickets. Previously, it
would choose its own default for renew_lifetime. With this patch, it now issues a kinit
without the -r flag, which enables kinit to use the default in the krb5.conf. This is
consistent with the rest of the ecosystem.

Testing:

Case 1:
renew_lifetime unset (or 0)
ticket_lifetime = 15 minutes.
reinit_interval = 5 minutes.
Result: impalad fails to create a renewable ticket.

Case 2:
renew_lifetime = 30 minutes
ticket_lifetime = 15 minutes.
reinit_interval = 5 minutes.
Result: Impala starts up, is able to renew its tickets and issue queries.

Case 3:
renew_lifetime = 1day
ticket_lifetime = 15 minutes
reinit_interval = 4 minutes.
Result: Impala starts up and is able to renew its tickets.
Additionally, verified that Impala works fine in the interval between max_lifetime and
renew_lifetime.

Case 4:
renew_lifetime = 1d
ticket_lifetime = 1 minute.
reinit_interval = 5 minutes.
Result: Impala's ok, but all queries will fail after a minute because the
ticket is dead (this is expected, and a bad config).

Change-Id: I32b059bde0d565d31ead7edf38b0ca74c555121d
Reviewed-on: http://gerrit.cloudera.org:8080/879
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:50 -07:00
Tim Armstrong
24d39886c8 IMPALA-2444: Part 1: Parquet scanner perf improvements
This is a set of related changes that avoid executing nested
types-specific logic in cases when it is not needed. The additional
branches had a measurable performance overhead.

Add template arguments for AssembleRows and ReadRow so that code is
separately optimized for the case when we're assembling collection rows
versus top-level rows. Several branches are not needed when assembling
top-level rows.

Reorganise ColumnReader so that ReadValue is the virtual function rather
than ReadSlot, and so that ReadValue is called even if we are not
materializing slots. This lets us add a template argument to
ScalarColumnReader so we can produce a specialized class to handle
non-materialized slots, and so we can avoid checking whether it has a
slot descriptor in ReadRow. This also lets us explicitly specify which
version of NextLevels is being called to guarantee it won't be called
virtually.

This brings the regression down to ~15% from ~20%.

Change-Id: I494f1c8e139e6d39c0fc93a2ca841f59ff867717
Reviewed-on: http://gerrit.cloudera.org:8080/1088
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:49 -07:00
Matthew Jacobs
851056489d IMPALA-2440: Fix old HJ full outer join with no rows
When a full outer join runs on the old (non-partitioned)
HashJoinNode and any join fragment has 0 build rows and 0
probe rows, an extra null row is produced.

Change-Id: I75373edc4f6b3b0c23afba3c1fa363c613f23507
Reviewed-on: http://gerrit.cloudera.org:8080/1068
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:47 -07:00
Juan Yu
7c498627f6 IMPALA-2249: Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()
Due to IMPALA-1619, allocating a StringBuffer larger than 1GB could
cause Impala to crash. Check the requested buffer size in advance and
fail the request if it is larger than 1GB. Once IMPALA-1619 is
fixed, we should revert this change.

Change-Id: Iffd1e701614b520ce58922ada2400386661eedb1
(cherry picked from commit 74ba16770eeade36ab77c86ed99d9248c60b0131)
Reviewed-on: http://gerrit.cloudera.org:8080/869
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:46 -07:00
Tim Armstrong
2d29aff8cd IMPALA-2378: account bitmap memory usage in NLJ
Count size of row bitmaps towards memory limit in nested loop join.

Rename size_ member of bitmap to num_bits_ to be less ambiguous.

Update comments to clarify when to do query maintenance.

Change-Id: I911e3cfd0cc2a6f4f794a2df5bc3c9520708f0a1
Reviewed-on: http://gerrit.cloudera.org:8080/914
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:45 -07:00
Alex Behm
cb713840b7 IMPALA-2434: Always set the eos return value in SubplanNode::GetNext().
The bug was that SubplanNode::GetNext() was not explicitly setting the returned
eos to false if eos had not been reached yet. As a result, a UnionNode with
a SubplanNode as its second operand could return fewer rows than expected because
the eos was carried over from the previous union operand, and only a single batch
was returned from the SubplanNode.

The fix is to always set the eos return value in SubplanNode::GetNext().

Change-Id: I9f7516d7b740b9e089ea29b9fe416f3f47314e2c
Reviewed-on: http://gerrit.cloudera.org:8080/1076
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:44 -07:00
Dimitris Tsirogiannis
7d338af638 IMPALA-2369, IMPALA-2435: Impala crashes when the sorter hits an OOM error
This commit fixes the issue where the impalad will crash if the sort
node can't get a new block from the buffered block manager. The fix
removes the DCHECKS after the calls to BufferedBlockMgr::GetNewBlock()
and returns a proper OOM error instead. It also ensures that the callers
of Sorter::Run::Init() don't ignore the returned status (IMPALA-2435).

Change-Id: I611f173fac3add770988e9d4aaa48efc4229fbd6
Reviewed-on: http://gerrit.cloudera.org:8080/976
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:43 -07:00
Tim Armstrong
f5db872fa2 IMPALA-2417: crash when updating counter in DataStreamRecvr
There were several possible races in DataStreamRecvr where the
RuntimeState for the query was torn down while a thread was waiting in
DataStreamRecvr. This is possible when waiting on a condition variable
because the lock is released. After the thread wakes up from the
condition variable, it can crash by updating a counter on a
RuntimeProfile that no longer exists.

To address the problem this patch adds CANCEL_SAFE_SCOPED_TIMER, a
variant of SCOPED_TIMER that checks a boolean variable for cancellation
before updating the RuntimeProfile. This should be used instead of
SCOPED_TIMER anywhere that it is possible that the RuntimeProfile will
be torn down while the timer is active.
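
The idea can be sketched as an RAII timer whose destructor consults a cancellation flag before touching the counter. This is a simplified stand-in, not the actual CANCEL_SAFE_SCOPED_TIMER macro: the class name CancelSafeTimer and its constructor arguments are hypothetical, and the sketch assumes the flag is set before the counter is destroyed. The real synchronization between the two is not shown.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a cancellation-aware scoped timer. On scope
// exit it adds the elapsed nanoseconds to 'counter' unless 'cancelled'
// is set, in which case the counter is never dereferenced.
class CancelSafeTimer {
 public:
  CancelSafeTimer(std::atomic<int64_t>* counter,
                  const std::atomic<bool>* cancelled)
      : counter_(counter),
        cancelled_(cancelled),
        start_(std::chrono::steady_clock::now()) {
    assert(cancelled != nullptr);
  }

  ~CancelSafeTimer() {
    // After cancellation the counter may no longer exist: do not touch it.
    if (cancelled_->load()) return;
    auto elapsed = std::chrono::steady_clock::now() - start_;
    counter_->fetch_add(
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
  }

  CancelSafeTimer(const CancelSafeTimer&) = delete;
  CancelSafeTimer& operator=(const CancelSafeTimer&) = delete;

 private:
  std::atomic<int64_t>* counter_;
  const std::atomic<bool>* cancelled_;
  std::chrono::steady_clock::time_point start_;
};
```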

Change-Id: Ib4339090d3dfb097e4c160a21b470f00b9c44bbf
Reviewed-on: http://gerrit.cloudera.org:8080/1061
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:41 -07:00
Skye Wanderman-Milne
dbd7a01023 IMPALA-2327: fix Parquet scanner exit conditions
Change-Id: I14969235702f5dd2edeb3feb332e03fbf2444c18
Reviewed-on: http://gerrit.cloudera.org:8080/1043
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:38 -07:00
Skye Wanderman-Milne
7a1336dbfd IMPALA-2376: remove parse_status_ from HdfsParquetScanner
It shadows the parse_status_ defined in HdfsScanner, making it so the
Parquet scanner doesn't pick up a bad parse status set in an
HdfsScanner function.

Change-Id: I1e64f1909eaf1af1b1130dda2380db7157a64c2b
Reviewed-on: http://gerrit.cloudera.org:8080/884
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:33 -07:00
Matthew Jacobs
325916eefe IMPALA-2046: Print hostname along with backend number
When the coordinator prints the 'backend number' of
fragments that are finished or result in an error, the
hostname associated with that backend is also printed.

Change-Id: I0b27549bd9155ab9b077933ab6f621f4f0887371
Reviewed-on: http://gerrit.cloudera.org:8080/912
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:31 -07:00
Sailesh Mukil
b0bfe41046 IMPALA-2252: Crash (likely race) tearing down BufferedBlockMgr on query failure
When running certain heavy workloads it was observed that a
'RuntimeState' member variable was being accessed in WriteComplete()
on a RequestContext::Cancel() path after that RuntimeState object was
destroyed. This is a temporary fix to ensure that the destroyed member
variable is not accessed if the write status is cancelled.

A test is not included as it is not deterministically reproducible.

Change-Id: I8a55c070d25f0ca5c830a955e84df450061753a3
Reviewed-on: http://gerrit.cloudera.org:8080/897
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:30 -07:00
Skye Wanderman-Milne
24d63dd3b2 Fix reporting of custom OOM error messages.
Before this patch, a common pattern was:
  Status status = Status::MemLimitExceeded();
  status.SetErrorMsg(<custom error msg>);
  state_->SetMemLimitExceeded();
  return status;

This could cause the custom error message to be dropped, since
RuntimeState::SetMemLimitExceeded() sets query_status_ to the generic
"Memory limit exceeded" error, which then prevents query_status_ from
being set to the custom error status. (The custom error message is
often logged in the runtime state, but not always.)

This patch has RuntimeState::SetMemLimitExceeded() take an optional
ErrorMsg argument which is used to construct the new query status, and
changes existing uses of SetErrorMsg() + SetMemLimitExceeded() to use
this new functionality.
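
A simplified sketch of why the message was dropped, and of the shape of the fix, follows. QueryState is a hypothetical stand-in for RuntimeState, and a plain string stands in for Status and ErrorMsg. It assumes, as the description implies, that the first error recorded in the query status wins.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the runtime state: the first error recorded
// wins, so the custom text must be supplied at the moment the
// memory-limit error is recorded.
class QueryState {
 public:
  // Records a memory-limit error. When 'custom_msg' is given, it is used
  // in place of the generic text.
  void SetMemLimitExceeded(const std::string* custom_msg = nullptr) {
    assert(custom_msg == nullptr || !custom_msg->empty());
    if (!query_status_.empty()) return;  // an earlier error is kept
    query_status_ = (custom_msg != nullptr)
                        ? *custom_msg
                        : std::string("Memory limit exceeded");
  }

  const std::string& query_status() const { return query_status_; }

 private:
  std::string query_status_;  // empty means OK
};
```

With the old pattern, the generic text would be recorded first and a later attempt to set the custom status would be ignored. Passing the message in the same call avoids that ordering problem.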

Change-Id: I9fe20da0bcc2cf01f2fd1fe29ae32c1a00708da1
Reviewed-on: http://gerrit.cloudera.org:8080/885
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2015-09-27 15:13:29 -07:00
Alex Behm
e00130e39a IMPALA-2368: Prevent double Reset() with nested subplans.
ExecNode::Reset() is not idempotent. It always requires a preceding
call to ExecNode::Open(). The bug was that with nested subplans the
following situation could lead to calling Reset() on the same exec
node multiple times, resulting in a DCHECK or crash depending on the node
being reset. The scenario is illustrated as follows.

Example Plan:

01:subplan
  05:nested-loop join
     02:singular row src
  04:subplan
     ...
     arbitrary exec node tree
     ...
  03:unnest
00:scan

Sequence of calls leading to double Reset():

1. Node 04 calls Reset() on its child(1)
2. Node 04 returns from GetNext() without calling Open() on its child(1)
3. Node 01 calls Reset() on its child(1)

The problem was that SubplanNode::Reset() of node 04 used to call Reset()
on its child(1) even though child(1) had not been opened yet.

The fix is to only call Reset() on child(1) if it has been opened.
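
The guard can be sketched with a toy node type. The struct Node and the function ResetChildIfOpened are hypothetical, not ExecNode or SubplanNode, and the sketch assumes the parent can observe whether the child has been opened.

```cpp
#include <cassert>

// Hypothetical miniature exec node: Reset() is only legal after Open().
struct Node {
  bool opened = false;
  int resets = 0;

  void Open() { opened = true; }

  void Reset() {
    assert(opened);  // mirrors the requirement of a preceding Open()
    opened = false;
    ++resets;
  }
};

// Parent-side guard: reset the child only if it was actually opened.
void ResetChildIfOpened(Node* child) {
  if (child->opened) child->Reset();
}
```

Calling ResetChildIfOpened() twice in a row is harmless in this sketch, which corresponds to steps 1 and 3 of the sequence above.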

Change-Id: I1d6f5f893c9b773c05dc6cbfe99de9f74e47a888
Reviewed-on: http://gerrit.cloudera.org:8080/916
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:28 -07:00
Alex Behm
14aef7f6a7 IMPALA-2357: Fix spilling sorts with var-len slots that are NULL or empty.
The bug: Several places in the sort assumed that the presence of var-len
slots in the tuple to be sorted implied having var-len blocks.
However, if all var-len slots are NULL or empty, then we can have no
var-len blocks. This case is now properly handled.

Change-Id: Ia3ad3669313e9d494ce2472af7775febfa6f247c
Reviewed-on: http://gerrit.cloudera.org:8080/913
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:27 -07:00
Tim Armstrong
d37bf390a8 IMPALA-2406: avoid rows with no tuples
In some cases the planner generated plans with rows with no
materialized tuples. Recent changes to the backend caused these to
hit a DCHECK. This patch addresses one case in the planner where it
was possible to create such plans: when the planner generated an
empty node from a select subquery with no from clause. The fix is to
create a materialized tuple based on the select list expressions, in
the same way as we handle these selects when the planner cannot
statically determine they have no result rows.

An example query is included as a test.

It also adds additional checks to the frontend and backend to catch
these invalid rows earlier.

Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4
Reviewed-on: http://gerrit.cloudera.org:8080/911
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:25 -07:00
Alex Behm
7122f5a58f Nested Types: Simple BE performance improvements.
This patch avoids a few unnecessary memory allocations, locks and atomics
that can become a bottleneck when executing subplans.

On the following benchmark the end-to-end runtime of a subplan-heavy
query was improved from 37s to 27s.

Benchmark:
select count(*) from huge_customer c, c.c_orders o, o.o_lineitems

The huge_customer table had 48 files totalling 6.5GB. The table was created
by copying the files of tpch_nested_parquet.customer several times into the
huge_customer table. I ran the benchmark with a single impalad.

There are still several easy opportunities for improving the performance
of subplan execution.

Change-Id: I9fce1c2857a8f8e6ed3f1b4842d07fd80c11296a
Reviewed-on: http://gerrit.cloudera.org:8080/894
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:58 -07:00
Matthew Jacobs
59df1b83e3 IMPALA-2245: Fix for client->num_tmp_reserved_buffers_ accounting
Fixes some logic in the buffered-block-mgr that we believe
is wrong, and leads to the following DCHECK failing:

  Check failed: client->num_tmp_reserved_buffers_ == 0
  (buffered-block-mgr.cc:259)

It may be possible for other issues to exist in the
accounting for num_tmp_reserved_buffers_, but this fix seems
to work as evidenced by the stress tests running without
hitting this DCHECK.

Change-Id: Ic6415afc6722461dc57f763a46e3abbc3aa09af6
Reviewed-on: http://gerrit.cloudera.org:8080/880
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:55 -07:00
Matthew Jacobs
b8d15adcaf IMPALA-2375: Legacy agg does not handle parent Prepare failure
When the legacy agg node's parent fails to Prepare, the
agg Close needs to check before touching memory that would
be allocated by its Prepare but didn't get allocated
because Prepare returned early when the parent Prepare
failed.

test_failpoints tests this case

Change-Id: I660e9653958bcfe1f44c20275abe35235de94329
Reviewed-on: http://gerrit.cloudera.org:8080/888
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:54 -07:00
Martin Grund
579be1c542 IMPALA-2284: Disallow long (1<<30) strings in group_concat()
This is the first step to fix issues with large memory allocations. In
this patch, the built-in `group_concat` is no longer allowed to allocate
arbitrarily large strings and crash Impala, but is limited to the upper
bound of possible allocations in Impala.

This patch does not perform any functional change, but rather avoids
unnecessary crashes. However, it changes the parameter type of
FindChunk() in MemPool to be a signed 64bit integer. This change allows
the mempool to internally allocate more than 1GB of memory, but the
public interface of Allocate() is not changed, so the general limitation
remains. The reason for this change is as follows:

  1) In a UDF FunctionContext::Reallocate() would allocate slightly more
  than 512MB from the FreePool.
  2) The free pool tries to double this size to allocate 1GB from the
  MemPool.
  3) The MemPool doubles the size again and overflows the signed 32bit
  integer in the FindChunk() method. This will then only allocate 1GB
  instead of the expected 2GB.

What happens is that one of the callers expected a larger allocation
than actually happened, which will in turn lead to memory corruption as
soon as the memory is accessed.
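
The arithmetic in steps 1) to 3) can be checked with a short sketch. It assumes the free pool rounds a request up to the next power of two, which the word "double" above suggests but does not state. The helper names are hypothetical. All sizes are computed in 64 bits, which is what makes the final value representable at all.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Smallest power of two that is >= n, computed in 64 bits.
int64_t RoundUpToPowerOfTwo(int64_t n) {
  assert(n > 0 && n <= (int64_t{1} << 62));
  int64_t p = 1;
  while (p < n) p *= 2;
  return p;
}

// True if 'v' can be stored in a signed 32-bit integer without overflow.
bool FitsInInt32(int64_t v) {
  return v <= std::numeric_limits<int32_t>::max();
}
```

For a request of 512MB plus one byte, RoundUpToPowerOfTwo() gives 2^30 (1GB), which still fits in 32 bits. Doubling that gives 2^31, one more than the largest signed 32-bit value, so it only fits once the parameter is 64 bits wide.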

Change-Id: I068835dfa0ac8f7538253d9fa5cfc3fb9d352f6a
Reviewed-on: http://gerrit.cloudera.org:8080/858
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2015-09-23 15:15:55 -07:00
Tim Armstrong
474026e785 Nested types: reduce query maintenance for small arrays
This patch makes a couple of adjustments to reduce query maintenance when
processing small arrays in subplans.

In UnnestNode, we skip query maintenance on the first call to GetNext().

In NestedLoopJoinNode, query maintenance was previously done whenever
the iteration was a multiple of the row batch size, i.e. on
iterations 0, 1024, 2048, etc. This patch adjusts it so
that it performs query maintenance on iterations 1023, 2047, etc.
instead, which saves some overhead for subplans processing small numbers
of rows.
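
The two cadences can be compared with a small sketch. The batch size of 1024 is taken from the iteration numbers above. The function names are hypothetical and the loop only counts how often a check would fire.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kBatchSize = 1024;  // assumed row batch size

// Old cadence: fires on iterations 0, 1024, 2048, ...
bool FiresAtBatchStart(int64_t i) { return i % kBatchSize == 0; }

// New cadence: fires on iterations 1023, 2047, ...
bool FiresAtBatchEnd(int64_t i) { return i % kBatchSize == kBatchSize - 1; }

// Counts how many maintenance checks a loop of 'iterations' would run.
int64_t CountChecks(int64_t iterations, bool (*fires)(int64_t)) {
  assert(iterations >= 0);
  int64_t checks = 0;
  for (int64_t i = 0; i < iterations; ++i) {
    if (fires(i)) ++checks;
  }
  return checks;
}
```

A loop of 10 iterations runs one check under the old cadence and none under the new one, while a loop of 2048 iterations runs two checks either way.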

Change-Id: I210bfbbddb83580aa784fc327927d08333b1b5ed
Reviewed-on: http://gerrit.cloudera.org:8080/901
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:54 -07:00
Tim Armstrong
8bdc7a2a33 IMPALA-2381: Include tuple pointers in mem limit
A recent change that allocated a row batch's tuple pointers with malloc
rather than a MemPool meant that the tuple pointers were no longer
counted towards query or process memory limits. For some workloads the
number of row batches and therefore the aggregate size of the tuple
pointers can be significant, so it is important to correctly account for
the memory.

This change simply pairs malloc() and free() calls with matching calls
to MemTracker::Consume() and MemTracker::Release().
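
One way to keep such pairs from drifting apart is an RAII wrapper, sketched below. This is not how the patch is written (it pairs the calls directly). Tracker is a hypothetical, much reduced stand-in for MemTracker, and TrackedBuffer is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical minimal tracker: only keeps a running byte count.
class Tracker {
 public:
  void Consume(int64_t bytes) { consumption_ += bytes; }
  void Release(int64_t bytes) {
    assert(bytes <= consumption_);
    consumption_ -= bytes;
  }
  int64_t consumption() const { return consumption_; }

 private:
  int64_t consumption_ = 0;
};

// Owns a malloc'd buffer and keeps the tracker in step with it:
// Consume() after a successful malloc(), Release() alongside free().
class TrackedBuffer {
 public:
  TrackedBuffer(Tracker* tracker, int64_t bytes)
      : tracker_(tracker),
        bytes_(bytes),
        data_(std::malloc(static_cast<std::size_t>(bytes))) {
    if (data_ != nullptr) tracker_->Consume(bytes_);
  }

  ~TrackedBuffer() {
    if (data_ != nullptr) {
      std::free(data_);
      tracker_->Release(bytes_);
    }
  }

  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

  void* data() const { return data_; }

 private:
  Tracker* tracker_;
  int64_t bytes_;
  void* data_;
};
```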

Change-Id: I7fb5f2844ad0d51f71a3701d8f0897d8f0b24e18
Reviewed-on: http://gerrit.cloudera.org:8080/895
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:53 -07:00
Tim Armstrong
f61b86f060 Update and reenable some scratch file tests.
These tests were disabled because they relied on blacklisting, which
was disabled. It is still useful to have tests to exercise the error
handling code and ensure that scratch directories are used or not
used when they should be.

Change-Id: I89195a9fdd7ed858ae2addce93413871cffaf29b
Reviewed-on: http://gerrit.cloudera.org:8080/846
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:52 -07:00
Tim Armstrong
549b01c233 IMPALA-2378: check memory limit in DeepCopyBuildBatches()
NestedLoopJoinNode::DeepCopyBuildBatches() previously copied a
potentially unbounded number of row batches without checking memory
limits or doing other query maintenance. This fix does the usual query
maintenance once per copied batch. This will detect any memory limit
overruns before we copy the next batch.

Change-Id: Iab2e4e639a8a840f872d3283a186f49e68db89c9
Reviewed-on: http://gerrit.cloudera.org:8080/896
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:51 -07:00
Ippokratis Pandis
48699de6e3 IMPALA-1621,2241,2271,2330,2352: Lazy switch to IO buffers to reduce min mem needed for PAGG/PHJ
PAGG and PHJ were using an all-or-nothing approach wrt spilling. In
particular, they were trying to switch to IO-sized buffers for both
streams (aggregated and unaggregated in PAGG; build and probe in PHJ)
of every partition (currently 16 partitions for a total of 32
streams), even if some of the streams had very few rows, were
empty, or simply would not spill, so there was no need to allocate
IO-buffers for them. That was increasing the min mem needed by those
operators in many queries.

This patch decouples the decision to switch to IO-buffers for each
stream of each partition. Streams will switch to IO-sized buffers
whenever the rows they contain do not fit in the first two small
buffers (64KB and 512KB respectively). When we decide to spill a
partition, we switch both streams to IO buffers.

With these changes, many streams of PAGG and PHJ nodes do not need to
use IO-sized buffers, reducing the min mem requirement. For example,
below is the min mem needed (in MBs) for some of the TPC-H queries.
Some need half or less of the memory they needed before:

  TPC-H Q3: 645 -> 240
  TPC-H Q5: 375 -> 245
  TPC-H Q7: 685 -> 265
  TPC-H Q8: 740 -> 250
  TPC-H Q9: 650 -> 400
  TPC-H Q18: 1100 -> 425
  TPC-H Q20: 420 -> 250
  TPC-H Q21: 975 -> 620

To make this small buffer optimization work, we had to fix
IMPALA-2352. That is, the AllocateRow() call of
PAGG::ConstructIntermediateTuple() could return unsuccessfully just
because the small buffers of the stream were exhausted. In that case,
previously we would treat it as an indication that there is no memory
left, start spilling a partition and switching all streams to
IO-buffers. Now we make a best effort, trying to first
SwitchToIoBuffers() and if that is successful, we re-attempt the
AllocateRow() call. See IMPALA-2352 for more details.

Another change is that now SwitchToIoBuffers() will reset the flag
using_small_buffers_ back to false, in case we are in a very low
memory situation and it fails to get a buffer. That allows us to
retry calling SwitchToIoBuffers() once we free up some space. See
IMPALA-2330 for more details.

With the above fixes we should also have fixed IMPALA-2241 and
IMPALA-2271 that are essentially stream::using_small_buffers_-related
DCHECKs.

This patch adds all 22 TPC-H queries in test_mem_usage_scaling test
and updates the per-query min mem limits in it. Additionally, it adds
a new aggregation test that uses the TPC-H dataset for larger
aggregations (TestTPCHAggregationQueries). It also removes some
dead test code.

Change-Id: Ia8ccd0b76f6d37562be21fd4539aedbc2a864d38
Reviewed-on: http://gerrit.cloudera.org:8080/818
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins

Conflicts:

	tests/query_test/test_aggregation.py
2015-09-23 11:07:42 -07:00
Sailesh Mukil
04eecf1f08 IMPALA-2158: DCHECK failure in AnalyticEvalNode::GetNext() expects memory to already be transferred.
A set of DCHECKs in GetNext() wrongly expected memory to already be transferred. In cases
of queries with limits, the resources most likely will not be transferred. This patch
removes the DCHECKS and transfers the memory at that point if needed.

Change-Id: Ie95ade79db125e0e63c2e45d1b27649b1874d510
Reviewed-on: http://gerrit.cloudera.org:8080/837
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 10:38:59 -07:00
Ippokratis Pandis
4d5ee2b3a2 IMPALA-2364: Wrong DCHECK in PHJ::ProcessProbeBatch
There was a dcheck in PHJ::ProcessProbeBatch() that was expecting that
the state of PHJ was PROCESSING_PROBE. It looks like we can hit the
same dcheck when we are in REPARTITIONING phase.
This patch fixes this dcheck. It also adds tpc-ds q53 in the
test_mem_usage_scaling test (along with the needed refactoring in this
test) because tpc-ds q53 hit this dcheck in an endurance test.

Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28
Reviewed-on: http://gerrit.cloudera.org:8080/893
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 10:38:58 -07:00
Tim Armstrong
0e1161b161 IMPALA-2366: check fread return code correctly
DiskIoMgr did not correctly check the return code of fread. As a result,
if fread() returned an error, DiskIoMgr would think it had succeeded and
filled the buffer with valid data, whereas in reality the buffer would
be full of garbage data and some error occurred.

This change will detect and report errors correctly.
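
For reference, a correct fread() check distinguishes a short read at end of file from a read error, since the returned count alone is ambiguous. The sketch below uses only the C standard library. The function ReadFully and the ReadResult enum are hypothetical names, not DiskIoMgr code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

enum class ReadResult { kOk, kEof, kError };

// Reads up to 'len' bytes into 'buf' and stores the number of bytes
// actually read in '*bytes_read'. A short count is classified with
// ferror() rather than being treated as success.
ReadResult ReadFully(std::FILE* f, char* buf, std::size_t len,
                     std::size_t* bytes_read) {
  assert(f != nullptr && buf != nullptr && bytes_read != nullptr);
  *bytes_read = std::fread(buf, 1, len, f);
  if (*bytes_read == len) return ReadResult::kOk;
  if (std::ferror(f)) return ReadResult::kError;
  return ReadResult::kEof;  // short read because the file ended
}
```

On kEof only the first *bytes_read bytes of the buffer hold file data. On kError the buffer contents should not be trusted.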

Added a test for the eof case that failed before the fix and now passes.
The error case is difficult to test without modifying DiskIoMgr or
injecting faults at the filesystem level.

Change-Id: Ic4822183a9cd228da670b5fd130e34ca875e8c80
Reviewed-on: http://gerrit.cloudera.org:8080/882
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
68bd90dbc2 IMPALA-2350: malformed tuples in buffered-tuple-stream-test
The buffered tuple stream tests wrote out the memory for integer tuples
as a contiguous array of 4-byte integers, neglecting null indicators.
The same bad assumption was present in both the read and write code,
so the test passed in many circumstances. However, buffer overruns were
sometimes detected in the ASAN build. This bug was present for a long
time but a recent change made the ASAN build fail consistently.
IMPALA-1688, an unexplained failure in buffered-tuple-stream-test, may
have the same cause.

This fix also changes the integers written out so that they are not all
small positive integers with many zero bytes.

Change-Id: I4a158751e8a9c934c912831032e83ec85056f06e
Reviewed-on: http://gerrit.cloudera.org:8080/876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
2d0ed379a2 Add a TODO about duplicate tuples for RowBatch::DeepCopyTo
RowBatch::DeepCopyTo doesn't deduplicate tuples when copying. If not
used carefully, this could lead to us producing oversized row batches
(similar to the problem solved by dedup in RowBatch::Serialize).

Currently this could only occur if the build-side child of a nested loop
join both produces batches with many duplicate tuples and sets
MarkNeedToReturn.

Also fix error in comment.

Change-Id: Ib59c92f42ee4d491c9a9d55e0b125af90f2b1d48
Reviewed-on: http://gerrit.cloudera.org:8080/874
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Ippokratis Pandis
b6db42e678 IMPALA-2314: LargestSpilledPartition was not checking if partition is closed
In both PHJ and PAGG we call LargestSpilledPartition() in order to
figure out if there was any reduction in the size of the partitions
after a repartition. This code had a mistake as it was not checking
first whether a partition has been closed, which could happen if, for
example, a partition ended up empty after repartitioning. The result
was a SEGV, as we were trying to reference a stream that had
already been set to NULL.

Change-Id: Ia369b0b3ad19c62da05e7ed1bcc4d28ebbe8c9df
Reviewed-on: http://gerrit.cloudera.org:8080/875
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Dan Hecht
f732fe2cdf Log backtrace on all MEM_LIMIT_EXCEEDED errors
Normally, error Status logs backtrace, but this doesn't happen for
MEM_LIMIT_EXCEEDED because these were copied from a global status.
Instead, construct them on the fly so that we get the normal Status
backtrace logging.  Also, remove some special case backtrace logging
that is now redundant from buffered-block-mgr.cc.

Primary motivation for this is to help debug IMPALA-2327, where it
appears a MEM_LIMIT_EXCEEDED status is dropped.  But I think this will
be generally useful for debugging problems that happen after
MEM_LIMIT_EXCEEDED.

Testing: Run test_mem_scaling.py and see all "Memory limit exceeded"
         now have backtraces.

Change-Id: I4cd04426e63397c24d3e16faa33caafc2a608c0c
Reviewed-on: http://gerrit.cloudera.org:8080/872
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Sailesh Mukil
0d46129458 IMPALA-1746: QueryExecState doesn't check for query cancellation or errors
QueryExecState::FetchRowsInternal() doesn't check the query state after evaluating the
select statement expressions with GetRowValue(). This means that, e.g., UDFs that call
SetError() in the select list will not fail the query.
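
A rough sketch of the pattern, assuming a simplified model in which
select-list expressions can flag an error on shared query state. QueryState
and FetchRow are hypothetical names for illustration and do not mirror the
real QueryExecState interface.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical query state that an expression (e.g. a UDF) may mark as failed.
struct QueryState {
  bool ok = true;
  std::string error;
};

using Expr = std::function<int(QueryState*)>;

// Evaluates every select-list expression for one row, then re-checks the
// query state. Returns false (and discards the row) if any expression
// reported an error during evaluation.
bool FetchRow(const std::vector<Expr>& exprs, QueryState* state,
              std::vector<int>* row) {
  row->clear();
  for (const Expr& e : exprs) row->push_back(e(state));
  if (!state->ok) {  // the check performed after evaluation
    row->clear();
    return false;
  }
  return true;
}
```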

Change-Id: I120d7abbee2a3ed5c5c66ec0a3a9b6e9a6ab10bf
Reviewed-on: http://gerrit.cloudera.org:8080/815
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Alex Behm
1c528492d3 Nested Types: Fix projection of collection-typed slots.
There was a bug with projecting collection-typed slots in the UnnestNode by setting
them to NULL. The problem was that the same tuple/slot could be referenced by multiple
input rows. As a result, all unnests after the first that operate on the same collection
value would incorrectly return an empty row batch because the slot had been set to NULL
by the first unnesting.

The fix is to ignore the null bit when retrieving a collection-typed slot's value in
the UnnestNode. We still set the null bit after retrieving the value for projection.
This solution purposely ignores the conventional NULL semantics of slots. It is a
temporary hack which must be removed eventually.

We rely on the producer of collection-typed slot values (scan node) to write an empty
array value into such slots when they are NULL in addition to setting the null bit.

Change-Id: Ie6dc671b3d031f1dfe4d95090b1b6987c2c974da
Reviewed-on: http://gerrit.cloudera.org:8080/859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Alex Behm
0c90bf7ef5 IMPALA-2340: Fix NOT IN subquery planning and execution with nested types.
Fixes:
1. Change the planner to not invert null-aware anti join because there is
   only a left version. Also, always use a hash join because the
   nested-loop join does not support that join mode.
2. Fix PartitionedJoinNode::Reset() and related calls to make the join
   usable in subplans with the left null-aware anti join mode.

Change-Id: I8da50747f6a0412c5858fd32b9498f58ed779712
Reviewed-on: http://gerrit.cloudera.org:8080/847
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
db7519df24 IMPALA-2207: memory corruption on build side of NLJ
The NLJ node did not follow the expected protocol when need_to_return
is set on a row batch, which means that memory referenced by a rowbatch
can be freed or reused the next time GetNext() is called on the child.

This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.
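
A minimal sketch of the deep-copy idea, assuming a simplified row batch
whose rows point into memory owned by the child. RowBatch and BuildSide are
hypothetical types for illustration, not the real NLJ classes.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical row batch: rows point into buffers owned by the child node.
struct RowBatch {
  std::vector<const std::string*> rows;
  bool need_to_return = false;  // child may free or reuse the memory afterwards
};

// Hypothetical build side that accumulates rows across GetNext() calls.
class BuildSide {
 public:
  void AddBatch(const RowBatch& batch) {
    for (const std::string* row : batch.rows) {
      if (batch.need_to_return) {
        // Deep copy: the child may recycle this memory on its next GetNext().
        owned_.push_back(*row);
        rows_.push_back(&owned_.back());
      } else {
        // Safe to keep referencing the child's memory.
        rows_.push_back(row);
      }
    }
  }

  const std::vector<const std::string*>& rows() const { return rows_; }

 private:
  // std::deque keeps references to existing elements valid on push_back.
  std::deque<std::string> owned_;
  std::vector<const std::string*> rows_;
};
```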

Reenable test that was disabled because of IMPALA-2332 since this was
the root cause.

Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Ippokratis Pandis
1da025fb0a Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails
This patch provides a temporary work-around for IMPALA-2344. In case
block->Pin() fails in Sorter::Run::GetNext() we will fail the query
with OOM instead of DCHECK'ing. This work-around is similar to those
we did for IMPALA-1590 and IMPALA-1868. When we rework the buffer
management those failures should not happen.

This patch also has some minor formatting changes. More importantly
it adds a DCHECK in BufferedBlockMgr::FindBufferForBlock() that
ensures we do not call PinBuffer() on an already pinned block.

Change-Id: I5a43302b807972e39f4f6c98ec5ecb1eee5fe056
Reviewed-on: http://gerrit.cloudera.org:8080/861
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Alex Behm
ef29c976df IMPALA-2320: Use a separate MemPool for the FunctionContexts in AnalyticEvalNode.
The bug:
There was a MemPool in AnalyticEvalNode with a dual purpose:
(1) Allocate temporary tuples.
(2) Back the FunctionContexts of the aggregate function evaluators.
FunctionContexts use FreePools to do their own memory management using a
pointer-based structure that is stored in the memory blocks themselves.
When calling AnalyticEvalNode::Reset() we reset that mem pool backing
that pointer-based structure. Those pointers were then clobbered by
subsequent allocations (and writes) for temporary tuples, ultimately
resulting in the FreePool incorrectly reporting a double free
while doing a Finalize() of an aggregate function.

The fix:
While there are several other ways to address this issue, I chose to
use a different MemPool for the FunctionContexts because that seemed
to be the most sane and minimally invasive fix. That MemPool is not
reset during AnalyticEvalNode::Reset() because the memory is
ultimately managed by the FreePools of the FunctionContexts.

Change-Id: I42fd60785d3c6dec93436cd9ca64de58d1b15c7e
Reviewed-on: http://gerrit.cloudera.org:8080/857
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Alex Behm
057b0b7dba IMPALA-2322: Set new pointer for ArrayValue in Tuple::DeepCopyVarlenData().
The bug was a simple oversight where we copied the array data, but forgot
to update the pointer of the corresponding ArrayValue.
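
A small sketch of the two steps involved, under the assumption that an
array value is just a pointer plus a byte length. The ArrayValue layout and
the DeepCopyArrayData helper are hypothetical, not the real
Tuple::DeepCopyVarlenData() code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical simplified array value: a pointer to its data plus a length.
struct ArrayValue {
  uint8_t* ptr;
  size_t num_bytes;
};

// Copies the bytes behind 'array' into 'dst' and repoints 'array' at the copy.
// Returns the first unused byte of 'dst'. The caller must ensure 'dst' has
// room for array->num_bytes bytes.
uint8_t* DeepCopyArrayData(ArrayValue* array, uint8_t* dst) {
  std::memcpy(dst, array->ptr, array->num_bytes);
  array->ptr = dst;  // the step that was forgotten: point at the new copy
  return dst + array->num_bytes;
}
```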

Change-Id: Ib6ec0380f66194efc7ea3eb989535652eb8b526f
Reviewed-on: http://gerrit.cloudera.org:8080/855
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Tim Armstrong
c5b9b7a97d Added DCHECKS for non-null array values in sorter
The sorter does not currently support sorting tuples with collection
slots because the necessary deep copy logic is not implemented.
Fortunately, projection should ensure that all array values that reach
the sorter have been set to null. This patch adds DCHECKs to ensure
that this is the case. Variables are also renamed to reflect that with
nested types string values are a subset of variable-length values.

Change-Id: If617abe678903c69d12d1c65062c8063ae137296
Reviewed-on: http://gerrit.cloudera.org:8080/844
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-09-22 10:58:32 -07:00
Tim Armstrong
9ebe92c4f9 IMPALA-2299: dedup zero-length tuples correctly
The dedup logic in row batch serialisation incorrectly assumed that two
distinct tuples must have two distinct memory addresses. This is not
true if one tuple has zero length.

Update the serialisation logic to check for this case and insert a
NULL.
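
A hedged sketch of address-based dedup with the zero-length check, assuming
a simplified format where each tuple is recorded as an offset into the
output and -1 stands for NULL. TupleRef and SerializeOffsets are
hypothetical names, and the real row batch serialisation differs in detail.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference to a tuple: its address and its byte length.
struct TupleRef {
  const uint8_t* ptr;
  int size;
};

// Appends tuple bytes to 'out' and returns one offset per input tuple.
// Consecutive tuples at the same address are written once (dedup).
// Zero-length tuples are recorded as NULL (-1) and never take part in the
// address comparison, since they can share an address with a distinct tuple.
std::vector<int> SerializeOffsets(const std::vector<TupleRef>& tuples,
                                  std::vector<uint8_t>* out) {
  std::vector<int> offsets;
  const uint8_t* prev_ptr = nullptr;
  int prev_offset = -1;
  for (const TupleRef& t : tuples) {
    if (t.ptr == nullptr || t.size == 0) {
      offsets.push_back(-1);  // insert a NULL instead of comparing addresses
      continue;
    }
    if (t.ptr == prev_ptr) {
      offsets.push_back(prev_offset);  // duplicate of the previous tuple
      continue;
    }
    prev_ptr = t.ptr;
    prev_offset = static_cast<int>(out->size());
    out->insert(out->end(), t.ptr, t.ptr + t.size);
    offsets.push_back(prev_offset);
  }
  return offsets;
}
```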

Adds a unit test that exercises this bug prior to the fix and a query
test that also hit a DCHECK prior to the fix.

Change-Id: If163274b3a6c10f8ac6b6bc90eee9ec95830b7dd
Reviewed-on: http://gerrit.cloudera.org:8080/849
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:31 -07:00
Huaisi Xu
0d45555a74 IMPALA-1800: Fixed PrettyPrinter::Print(double, TUnit::TIME_NS) output error
Previously, PrettyPrinter::Print(double_val, TUnit::TIME_NS) produced output with
several decimal points: e.g., PrettyPrinter::Print(1234567.890, TUnit::TIME_NS)
produced 1.234568.234.567890ms. After this simple fix it returns 1.234568ms.
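
For illustration only, a sketch that formats a nanosecond count as
milliseconds with a single decimal point. PrintMillis is a hypothetical
helper covering just the millisecond case; it makes no claim about how the
real PrettyPrinter picks units for other magnitudes.

```cpp
#include <cstdio>
#include <string>

// Formats a duration given in nanoseconds as milliseconds, emitting the
// fractional part exactly once.
std::string PrintMillis(double nanos) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.6fms", nanos / 1e6);
  return std::string(buf);
}
```

With the input from the commit message, PrintMillis(1234567.890) yields
1.234568ms.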

Change-Id: Ic34f49c71b79057651619da00dee5948046da1a2
Reviewed-on: http://gerrit.cloudera.org:8080/891
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 03:29:50 +00:00