Commit Graph

808 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
68fef6a5bf IMPALA-2213: make Parquet scanner fail query if the file size metadata is stale
This patch changes the Parquet scanner to check if it can't read the
full footer scan range, indicating that file has been overwritten by a
shorter file without refreshing the table metadata. Before it would
DCHECK. This patch adds a test for this case, as well as the case
where the new file is longer than the metadata states (which fails
with an existing error).

Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07
Reviewed-on: http://gerrit.cloudera.org:8080/1084
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-10-01 13:58:39 -07:00
Skye Wanderman-Milne
eb9db01092 IMPALA-2377: fix error case in ArrayValueBuilder::GetFreeMemory()
We weren't actually returning and would try to allocate an array
larger than INT_MAX. I tested by hand that a very large array (200M
elements) would fail to be allocated and the appropriate error is
returned to the shell, as well as adding an ArrayValueBuilder unit test.

Change-Id: Iedc3b3ca8c9c07100d6602f8e8cc9cfd57151747
Reviewed-on: http://gerrit.cloudera.org:8080/886
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-10-01 13:58:35 -07:00
Skye Wanderman-Milne
4d68fcc87e Fix MemPool allocations of INT_MAX.
Since MemPool::Allocate() takes an int, INT_MAX is the largest
possible requested allocation. This and slightly lower values would
cause the 'num_bytes' variable in the private Allocate() function
overflow, which would yield a valid pointer to a buffer too small to
hold the requested number of bytes. This patch fixes this problem by
making 'num_bytes' an int64_t, and adds a test for this behavior.

This may help some problems related to IMPALA-1619, although it's not
quite the same since MemPool was never limited to 1GB, only FreePool.

Change-Id: Ia8040d87d3be32896944b5d44ff0c1f2667837e0
Reviewed-on: http://gerrit.cloudera.org:8080/1107
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:54 -07:00
Matthew Jacobs
70b9954593 IMPALA-2189: [RM] Retry logic for Llama RPC may throw exception
The code in resource-broker.cc that makes RPCs to Llama will
attempt to retry the RPC some number of times (which is
configurable) if the RPC returns a failure. If the RPC
throws (which thrift may do), we try to reset the connection
and then make the RPC again, but this time not guarded by a
try/catch block. If this RPC throws, the process will crash.

This fixes the issue by removing the try/catch and instead
using the ClientCache DoRpc function which handles this
already. Some additional Llama RPC calling wrappers were
removed as well.

Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a
Reviewed-on: http://gerrit.cloudera.org:8080/881
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:52 -07:00
Juan Yu
7c498627f6 IMPALA-2249: Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()
Due to IMPALA-1619, allocating StringBuffer larger than 1GB could
cause Impala crash. Check the requested buffer size in advance and
fail the request if it is larger than 1GB. Once IMPALA-1619 is
fixed, we should revert this change.

Change-Id: Iffd1e701614b520ce58922ada2400386661eedb1
(cherry picked from commit 74ba16770eeade36ab77c86ed99d9248c60b0131)
Reviewed-on: http://gerrit.cloudera.org:8080/869
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:46 -07:00
Dimitris Tsirogiannis
7d338af638 IMPALA-2369, IMPALA-2435: Impala crashes when the sorter hits an OOM error
This commit fixes the issue where the impalad will crash if the sort
node can't get a new block from the buffered block manager. The fix
removes the DCHECKS after the calls to BufferedBlockMgr::GetNewBlock()
and returns a proper OOM error instead. It also ensures that the callers
of Sorter::Run::Init() don't ignore the returned status (IMPALA-2435).

Change-Id: I611f173fac3add770988e9d4aaa48efc4229fbd6
Reviewed-on: http://gerrit.cloudera.org:8080/976
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:43 -07:00
Tim Armstrong
f5db872fa2 IMPALA-2417: crash when updating counter in DataStreamRecvr
There were several possible races in DataStreamRecvr where the
RuntimeState for the query was torn down while a thread was waiting in
DataStreamRecvr. This is possible when waiting on a condition variable
because the lock is released. After the thread wakes up from the
condition variable, it can crash by updating a counter on a
RuntimeProfile that no longer exists.

To address the problem this patch adds CANCEL_SAFE_SCOPED_TIMER, a
variant of SCOPED_TIMER that checks a boolean variable for cancellation
before updating the RuntimeProfile. This should be used instead of
SCOPED_TIMER anywhere that it is possible that the RuntimeProfile will
be torn down while the timer is active.

Change-Id: Ib4339090d3dfb097e4c160a21b470f00b9c44bbf
Reviewed-on: http://gerrit.cloudera.org:8080/1061
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-30 17:17:41 -07:00
Matthew Jacobs
325916eefe IMPALA-2046: Print hostname along with backend number
When the coordinator prints the 'backend number' of
fragments that are finished or result in an error, the
hostname associated with that backend is also printed.

Change-Id: I0b27549bd9155ab9b077933ab6f621f4f0887371
Reviewed-on: http://gerrit.cloudera.org:8080/912
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:31 -07:00
Sailesh Mukil
b0bfe41046 IMPALA-2252: Crash (likely race) tearing down BufferedBlockMgr on query failure
When running certain heavy workloads it was observed that a
'RuntimeState' member variable was being accessed in WriteComplete()
on a RequestContext::Cancel() path after that RuntimeState object was
destroyed. This is a temporary fix to ensure that the destroyed member
variable is not accessed if the write status is cancelled.

A test is not included as it is not deterministically reproducible.

Change-Id: I8a55c070d25f0ca5c830a955e84df450061753a3
Reviewed-on: http://gerrit.cloudera.org:8080/897
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:30 -07:00
Skye Wanderman-Milne
24d63dd3b2 Fix reporting of custom OOM error messages.
Before this patch, a common pattern was:
  Status status = Status::MemLimitExceeded();
  status.SetErrorMsg(<custom error msg>);
  state_->SetMemLimitExceeded();
  return status;

This could cause the custom error message to be dropped, since
RuntimeState::SetMemLimitExceeded() sets query_status_ to the generic
"Memory limit exceeded" error, which then prevents query_status_ from
being set to the custom error status. (The custom error message is
often logged in the runtime state, but not always.)

This patch has RuntimeState::SetMemLimitExceeded() take an optional
ErrorMsg argument which is used to construct the new query status, and
changes existing uses of SetErrorMsg() + SetMemLimitExceeded() to use
this new functionality.

Change-Id: I9fe20da0bcc2cf01f2fd1fe29ae32c1a00708da1
Reviewed-on: http://gerrit.cloudera.org:8080/885
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2015-09-27 15:13:29 -07:00
Alex Behm
14aef7f6a7 IMPALA-2357: Fix spilling sorts with var-len slots that are NULL or empty.
The bug: Several places in the sort assumed that the presence of var-len
slots in the tuple to be sorted implied having var-len blocks.
However, if all var-len slots are NULL or empty, then we can have no
var-len blocks. This case is now properly handled.

Change-Id: Ia3ad3669313e9d494ce2472af7775febfa6f247c
Reviewed-on: http://gerrit.cloudera.org:8080/913
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:27 -07:00
Tim Armstrong
d37bf390a8 IMPALA-2406: avoid rows with no tuples
In some cases the planner generated plans with rows with no
materialized tuples. Recent changes to the backend caused these to
hit a DCHECK. This patch addresses one case in the planner where it
was possible to create such plans: when the planner generated an
empty node from a select subquery with no from clause. The fix is to
create a materialized tuple based on the select list expressions, in
the same way as we handle these selects when the planner cannot
statically determine they have no result rows.

An example query is included as a test.

It also adds additional checks to the frontend and backend to catch
these invalid rows earlier.

Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4
Reviewed-on: http://gerrit.cloudera.org:8080/911
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-27 15:13:25 -07:00
Alex Behm
7122f5a58f Nested Types: Simple BE performance improvements.
This patch avoids a few unecessary memory allocations, locks and atomics
that can become a bottleneck when executing subplans.

On the following benchmark the end-to-end runtime of a subplan-heavy
query was improved from 37s to 27s.

Benchmark:
select count(*) from huge_customer c, c.c_orders o, o.o_lineitems

The huge_customer table had 48 files totalling 6.5GB. The table was created
by copying the files of tpch_nested_parquet.customer several times into the
huge_customer table. I ran the benchmark with a single impalad.

There are still several easy opportunities for improving the performance
of subplan execution.

Change-Id: I9fce1c2857a8f8e6ed3f1b4842d07fd80c11296a
Reviewed-on: http://gerrit.cloudera.org:8080/894
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:58 -07:00
Matthew Jacobs
59df1b83e3 IMPALA-2245: Fix for client->num_tmp_reserved_buffers_ accounting
Fixes some logic in the buffered-block-mgr that we believe
is wrong, and leads to the following DCHECK failing:

  Check failed: client->num_tmp_reserved_buffers_ == 0
  (buffered-block-mgr.cc:259)

It may be possible for other issues to exist in the
accounting for num_tmp_reserved_buffers_, but this fix seems
to work as evidenced by the stress tests running without
hitting this DCHECK.

Change-Id: Ic6415afc6722461dc57f763a46e3abbc3aa09af6
Reviewed-on: http://gerrit.cloudera.org:8080/880
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-09-24 10:58:55 -07:00
Martin Grund
579be1c542 IMPALA-2284: Disallow long (1<<30) strings in group_concat()
This is the first step to fix issues with large memory allocations. In
this patch, the built-in `group_concat` is no longer allowed to allocate
arbitraryly large strings and crash impala, but is limited to the upper
bound of possible allocations in Impala.

This patch does not perform any functional change, but rather avoids
unnecessary crashes. However, it changes the parameter type of
FindChunk() in MemPool to be a signed 64bit integer. This change allows
the mempool to allocate internally memory of more than one 1GB, but the
public interface of Allocate() is not changed, so the general limitation
remains. The reason for this change is as follows:

  1) In a UDF FunctionContext::Reallocate() would allocate slightly more
  than 512MB from the FreePool.
  2) The free pool tries to double this size to alloocate 1GB from the
  MemPool.
  3) The MemPool doubles the size again and overflows the signed 32bit
  integer in the FindChunk() method. This will then only allocate 1GB
  instead of the expected 2GB.

What happens is that one of the callers expected a larger allocation
than actually happened, which will in turn lead to memory corruption as
soon as the memory is accessed.

Change-Id: I068835dfa0ac8f7538253d9fa5cfc3fb9d352f6a
Reviewed-on: http://gerrit.cloudera.org:8080/858
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2015-09-23 15:15:55 -07:00
Tim Armstrong
8bdc7a2a33 IMPALA-2381: Include tuple pointers in mem limit
A recent change that allocated a row batch's tuple pointers with malloc
rather than a MemPool meant that the tuple pointers were no longer
counted towards query or process memory limits. For some workload the
number of row batches and therefore the aggregate size of the tuple
pointers can be significant, so it is important to correctly account for
the memory.

This change simply pairs malloc() and free() calls with matching calls
to MemTracker::Consume() and MemTracker::Release().

Change-Id: I7fb5f2844ad0d51f71a3701d8f0897d8f0b24e18
Reviewed-on: http://gerrit.cloudera.org:8080/895
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:53 -07:00
Tim Armstrong
f61b86f060 Update and reenable some scratch file tests.
These tests were disabled because they relied on blacklisting, which
was disabled. It is still useful to have tests to exercise the error
handling code and ensure that scratch directories are used or not
used when they should be.

Change-Id: I89195a9fdd7ed858ae2addce93413871cffaf29b
Reviewed-on: http://gerrit.cloudera.org:8080/846
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 15:15:52 -07:00
Ippokratis Pandis
48699de6e3 IMPALA-1621,2241,2271,2330,2352: Lazy switch to IO buffers to reduce min mem needed for PAGG/PHJ
PAGG and PHJ were using an all-or-nothing approach wrt spilling. In
particular, they were trying to switch to IO-sized buffers for both
streams (aggregated and unaggregated in PAGG; build and probe in PHJ)
of every partition (currently 16 partitions for a total of 32
streams), even if some of the streams had very few rows, they were
empty or simply they would not spill so there was no need to allocate
IO-buffers for them. That was increasing the min mem needed by those
operators in many queries.

This patch decouples the decision to switch to IO-buffers for each
stream of each partition. Streams will switch to IO-sized buffers
whenever the rows they contain do not fit in the first two small
buffers (64KB and 512KB respectively). When we decide to spill a
partition, we switch to IO buffers both streams.

With these change many streams of PAGG and PHJ nodes do not need to
use IO-sized buffers, reducing the min mem requirement. For example,
below is the min mem needed (in MBs) for some of the TPC-H queries.
Some need half or less mem from the mem they needed before:

  TPC-H Q3: 645 -> 240
  TPC-H Q5: 375 -> 245
  TPC-H Q7: 685 -> 265
  TPC-H Q8: 740 -> 250
  TPC-H Q9: 650 -> 400
  TPC-H Q18: 1100 -> 425
  TPC-H Q20: 420 -> 250
  TPC-H Q21: 975 -> 620

To make this small buffer optimization to work, we had to fix
IMPALA-2352. That is, the AllocateRow() call of
PAGG::ConstructIntermediateTuple() could return unsuccessfully just
because the small buffers of the stream were exhausted. In that case,
previously we would treat it as an indication that there is no memory
left, start spilling a partition and switching all stream to
IO-buffes. Now we make a best effort, trying to first
SwitchToIoffers() and if that is successful, we re-attempt the
AllocateRow() call. See IMPALA-2352 for more details.

Another change is that now SwitchToIoBuffers() will reset the flag
using_small_buffers_ back to false, in case we are in a very low
memory situation and it fails to get a buffer. That allows us to
retry calling SwitchToIoBuffers() once we free up some space. See
IMPALA-2330 for more details.

With the above fixes we should also have fixed IMPALA-2241 and
IMPALA-2271 that are essentially stream::using_small_buffers_-related
DCHECKs.

This patch adds all 22 TPC-H queries in test_mem_usage_scaling test
and updates the per-query min mem limits in it. Additionally, it adds
a new aggregation test that uses the TPC-H dataset for larger
aggregations (TestTPCHAggregationQueries). It also removes some
dead test code.

Change-Id: Ia8ccd0b76f6d37562be21fd4539aedbc2a864d38
Reviewed-on: http://gerrit.cloudera.org:8080/818
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins

Conflicts:

	tests/query_test/test_aggregation.py
2015-09-23 11:07:42 -07:00
Tim Armstrong
0e1161b161 IMPALA-2366: check fread return code correctly
DiskIoMgr did not correctly check the return code of fread. As a result,
if fread() return an error, DiskIoMgr would think it had succeeded and
filled the buffer with valid data, whereas in reality the buffer would
be full of garbage data and some error occurred.

This change will detect and report errors correctly.

Added a test for the eof case that failed before the fix and now passed.
The error case is difficult to test without modifying DiskIoMgr or
injecting faults at the filesystem level.

Change-Id: Ic4822183a9cd228da670b5fd130e34ca875e8c80
Reviewed-on: http://gerrit.cloudera.org:8080/882
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
68bd90dbc2 IMPALA-2350: malformed tuples in buffered-tuple-stream-test
The buffered tuple stream tests wrote out the memory for integer tuples
as a contiguous array of 4-byte integers, neglecting null indicators.
The same bad assumption was present in both the read and write code,
so the test passed in many circumstances. However, buffer overruns were
sometimes detected in the ASAN build. This bug was present for a long
time but a recent change made the ASAN build fail consistently.
IMPALA-1688, an unexplained failure in buffered-tuple-stream-test may
have the same cause.

This fix also changes the integers written out so that they are not all
small positive integers with many zero bytes.

Change-Id: I4a158751e8a9c934c912831032e83ec85056f06e
Reviewed-on: http://gerrit.cloudera.org:8080/876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
2d0ed379a2 Add a TODO about duplicate tuples for RowBatch::DeepCopyTo
RowBatch::DeepCopyTo doesn't deduplicate tuples when copying. If not
used carefully, this could lead to us producing oversized row batches
(similar to the problem solved by dedup in RowBatch::Serialize).

Currently this could only occur if the build-side child of a nested loop
join both produces batches with many duplicate tuples and sets
MarkNeedToReturn.

Also fix error in comment.

Change-Id: Ib59c92f42ee4d491c9a9d55e0b125af90f2b1d48
Reviewed-on: http://gerrit.cloudera.org:8080/874
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Dan Hecht
f732fe2cdf Log backtrace on all MEM_LIMIT_EXCEEDED errors
Normally, error Status logs backtrace, but this doesn't happen for
MEM_LIMIT_EXCEEDED because these were copied from a global status.
Instead, construct them on the fly so that we get the normal Status
backtrace logging.  Also, remove some special case backtrace logging
that is now redundant from buffered-block-mgr.cc.

Primary motivation for this is to help debug IMPALA-2327, where it
appears a MEM_LIMIT_EXCEEDED status is dropped.  But I think this will
be generally useful for debugging problems that happen after
MEM_LIMIT_EXCEEDED.

Testing: Run test_mem_scaling.py and see all "Memory limit exceeded"
         now have backtraces.

Change-Id: I4cd04426e63397c24d3e16faa33caafc2a608c0c
Reviewed-on: http://gerrit.cloudera.org:8080/872
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Alex Behm
1c528492d3 Nested Types: Fix projection of collection-typed slots.
There was a bug with projecting collection-typed slots in the UnnestNode by setting
them to NULL. The problem was that the same tuple/slot could be referenced by multiple
input rows. As a result, all unnests after the first that operate on the same collection
value would incorrectly return an empty row batch because the slot had been set to NULL
by the first unnesting.

The fix is to ignore the null bit when retrieving a collection-typed slot's value in
the UnnestNode. We still set the null bit after retrieving the value for projection.
This solution purposely ignores the conventional NULL semantics of slots. It is a
temporary hack which must be removed eventually.

We rely on the producer of collection-typed slot values (scan node) to write an empty
array value into such slots when the they are NULL in addition to setting the null bit.

Change-Id: Ie6dc671b3d031f1dfe4d95090b1b6987c2c974da
Reviewed-on: http://gerrit.cloudera.org:8080/859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:33 -07:00
Tim Armstrong
db7519df24 IMPALA-2207: memory corruption on build side of NLJ
The NLJ node did not follow the expected protocol when need_to_return
is set on a row batch, which means that memory referenced by a rowbatch
can be freed or reused the next time GetNext() is called on the child.

This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.

Reenable test that was disabled because of IMPALA-2332 since this was
the root cause.

Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Ippokratis Pandis
1da025fb0a Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails
This patch provides a temporary work-around IMPALA-2344. In case
block->Pin() fails in Sorter::Run::GetNext() we will fail the query
with OOM instead of DCHECK'ing. This work-around is similar to those
we did for IMPALA-1590 and IMPALA-1868. When we rework the buffer
management those failures should not happen.

This patch also has some minor formatting changes. More importantly
it adds a DCHECK in BufferedBlockMgr::FindBufferForBlock() that
ensures we do not call PinBuffer() in an already pinned block.

Change-Id: I5a43302b807972e39f4f6c98ec5ecb1eee5fe056
Reviewed-on: http://gerrit.cloudera.org:8080/861
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Alex Behm
057b0b7dba IMPALA-2322: Set new pointer for ArrayValue in Tuple::DeepCopyVarlenData().
The bug was a simple oversight where copied the array data, but forgot
to update the pointer of the corresponding ArrayValue.

Change-Id: Ib6ec0380f66194efc7ea3eb989535652eb8b526f
Reviewed-on: http://gerrit.cloudera.org:8080/855
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:32 -07:00
Tim Armstrong
c5b9b7a97d Added DCHECKS for non-null array values in sorter
The sorter does not currently support sorting tuples with collection
slots because the necessary deep copy logic is not implemented.
Fortunately, projection should ensure that all array values that reach
the sorter have been set to null. This patch adds DCHECKs to ensure
that this is the case. Variables are also renamed to reflect that with
nested types string values are a subset of variable-length values.

Change-Id: If617abe678903c69d12d1c65062c8063ae137296
Reviewed-on: http://gerrit.cloudera.org:8080/844
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-09-22 10:58:32 -07:00
Tim Armstrong
9ebe92c4f9 IMPALA-2299: dedup zero-length tuples correctly
The dedup logic in row batch serialisation incorrectly assumed that two
distinct tuples must have two distinct memory addresses. This is not
true if one tuple has zero length.

Update the serialisation logic to check for this case and insert a
NULL.

Adds a unit test that exercises this bug prior to the fix and a query
test that also hit a DCHECK prior to the fix.

Change-Id: If163274b3a6c10f8ac6b6bc90eee9ec95830b7dd
Reviewed-on: http://gerrit.cloudera.org:8080/849
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-22 10:58:31 -07:00
Nong Li
e06764fb16 Fix race in DiskIoMgr.
I discovered this while reading through the code to help with an Impala issue. The
problem is that in the cancellation path, EnqueueBuffer() will return the buffer
descriptor to the cache, which is effectively the same as freeing it. The caller
would then try to read some state from it and the state can be corrupt. From a conceptual
point of view, EnqueueBuffer() represents a memory hand-off so this behavior makes
sense. The caller just needs to store the state it needs before EnqueueBuffer().

This fix is similar to that for IMPALA-2101.

Change-Id: I2deab4923d1324fd275b8dd4ad37eedb6d39922c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/8023
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/834
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-09-15 08:38:17 -07:00
Tim Armstrong
1a72618fa0 IMPALA-2339: DCHECK in sorter should allow VARCHAR
This DCHECK was overly strict - this code path is correct for any
variable-length string type, not just STRING.

Change-Id: I724ea4c9d97056dc1e977124dd9ec0f41cf7a2ce
Reviewed-on: http://gerrit.cloudera.org:8080/840
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-15 08:38:11 -07:00
Tim Armstrong
d7ae529dac IMPALA-2335: DCHECK for zero-length tuple
This DCHECK condition was overly strict - a non-nullable tuple pointer
can be NULL if the tuple is zero bytes long (since there is no memory
backing the tuple).

Adds a test query that hit the DCHECK.

Change-Id: I16e8bc0db747b83c239de931a4fc9677d5c85ae6
Reviewed-on: http://gerrit.cloudera.org:8080/836
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-15 08:38:09 -07:00
Tim Armstrong
79e1ca689f IMPALA-2331: set null indicators in RowBatchSerializeTest
This correctly initialises all null indicators in RowBatchSerialize test
to not-NULL (i.e. 0 bytes). Previously they were not initialised, so the
tests only passed if the memory happened to be filled with zeroes.

Change-Id: I12edad82ebb8cd7a2fb91b6686555b82f9e133cf
Reviewed-on: http://gerrit.cloudera.org:8080/825
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:02 -07:00
Alex Behm
930f31c28f Fix mismatched malloc/delete.
Change-Id: I049ae522adf81a8c9a378b33fb80076492442726
Reviewed-on: http://gerrit.cloudera.org:8080/824
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:02 -07:00
Alex Behm
f46aa38161 IMPALA-2325: Skip NULL tuples in Coordinator::ValidateCollectionSlots().
Change-Id: I4b3e07f1ebe0d4244f7386f54d7b74b403b798e2
Reviewed-on: http://gerrit.cloudera.org:8080/817
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:01 -07:00
Ippokratis Pandis
f57ce3436c IMPALA-2256: Handle joins with right side of high cardinality and zero materialized slots
The hash join and tuple stream code was not handling correctly the
case of joins whose right side had very high cardinality but where
tuple had zero footprint. Any such join with more than 16M tuples
on the right side would crash. In particular, if the tuple footprint
is zero, an infinite number of rows fit in one block. But according to
the old way we were iterating over the rows of the stream, we would
increment by 1 the idx to get the next "row" eventually overflowing
and hitting dcheck.

Another, second, problem was the calculation of the size of the hash
table in such where the footprint of tuples is zero. In such case,
a hash table of minimum size would suffice. Instead we would try to
create a very large hash table to fit the large number of tuples,
resulting to OOM errors.

This patch fixes the two problems by having specific calculation of
the next idx in the stream as well as the size of the hash table in
case the stream contains tuples with zero footprint.

Change-Id: I12469b9c63581fcbc78c87200de7797eac3428c9
Reviewed-on: http://gerrit.cloudera.org:8080/811
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:01 -07:00
Alex Behm
ecdd5688b9 Nested Types: Tuple pointers are owned by the containing RowBatch by default.
This patch makes the ownership of the memory backing the tuple pointers of
a RowBatch dependent on whether the legacy joins and aggs are enabled:

By default, the memory is malloc'd and owned by the RowBatch:
If enable_partitioned_hash_join=true and enable_partitioned_aggregation=true
then the memory is owned by the RowBatch and is freed upon its destruction.
This mode is more performant especially with SubplanNodes in the ExecNode tree
because the tuple pointers are not transferred and do not have to be re-created
in every Reset().

Memory is allocated from MemPool:
Otherwise, the memory is allocated from the RowBatch's tuple pool. As a result,
the pointer memory is transferred just like tuple data, and must be re-created
in Reset(). This mode is required for the legacy join and agg which rely on the
tuple pointers being allocated from the RowBatch's tuple pool, so they can
acquire ownership of the tuple pointers.

Performance impact for nested types:
Initial cluster runs and profiling on nested TPCH identified excessive
malloc/frees as a major performance bottleneck. This change paves the way
for further optimizations which yielded a 2x improvement in response time
for most nested TPCH queries.

Change-Id: I4ac58b18058ce46b4db89fbe117b0bcad19e9ee7
Reviewed-on: http://gerrit.cloudera.org:8080/807
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:01 -07:00
Tim Armstrong
25e7454bc9 IMPALA-2319: correctly enforce NLJ limit
In some cases in the NLJ node eos_ wasn't set even though the limit was
reached. This prevented the limit from being handled correctly before
returning rows to the caller of GetNext(). This could result in either
too many rows being returned, or a crash when the row batch size was
set to an invalid negative number.

The fix is to always check for whether the limit was reached before
returning from GetNext().

Change-Id: I660e774787870213ada9f2d3e6f10953d9937022
Reviewed-on: http://gerrit.cloudera.org:8080/797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-11 22:45:39 +00:00
Tim Armstrong
302df6be5d Cleanup of DeepCopy
Rename some functions and consistently skip the variable length code
path when there are no variable length slots.

Change-Id: I2f3405fcc5f545b207fa48e17f37fe968208d94c
Reviewed-on: http://gerrit.cloudera.org:8080/773
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-10 21:27:52 +00:00
Alex Behm
deb9c6f8e6 Nested Types: Poor man's projection for collection-typed slots.
Collection-typed slots are expensive to copy, e.g., during data
exchanges or when writing into a buffered-tuple-stream. Even worse,
such slots could be duplicated many times after unnesting in a
subplan. To alleviate this problem, this patch implements a
poor man's projection where collection-typed slots are set to NULL
inside the SubplanNode that flattens them.

The FE guarantees that the contents of an array-typed slot are never
referenced outside of the single UnnestNode that access them, so when
returning eos in UnnestNode::GetNext() we also set the unnested array
slot to NULL to avoid those expensive copies in downstream exec nodes.

The FE provides that guarantee by creating a new slot in the parent
scan for every relative CollectionTableRef. For example, for a table
't' with a collection-typed column 'c' the following query would have
two separate slots in the tuple of 't', one for 'c1' and one for 'c2':

select * from t, t.c c1, t.c c2

Change-Id: I90e5b86463019c9ed810c299945c831c744ff563
Reviewed-on: http://gerrit.cloudera.org:8080/763
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-10 05:44:55 +00:00
Dan Hecht
c404f2ea2e IMPALA-2286: Fix race between ~BufferedBlockMgr() and BufferedBlockMgr::Create()
The race, described below, can lead to:
   DCHECK: query_to_block_mgrs_.find(query_id_) != query_to_block_mgrs_.end()

Another fragment may have called Create() for this query_id_ and
saw that this BufferedBlockMgr is being destructed.  That fragement will
overwrite the map entry for query_id_, pointing it to a different
BufferedBlockMgr object.  We should let that object's destructor remove the
entry.  On the other hand, if the second BufferedBlockMgr is destructed before
this thread acquires the lock, then we'll remove the entry (because we can't
distinguish between the two expired pointers), and when the other
~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_.

Change-Id: I18a9d965e0689adc4ee9b837eef19a21e0683a64
Reviewed-on: http://gerrit.cloudera.org:8080/772
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-09-09 17:48:36 +00:00
Tim Armstrong
235a8d08da IMPALA-2295: deep copy arrays in BufferedTupleStream
This was unimplemented and is used on some code paths. Arrays were not
properly copied into the BufferedTupleStream, potentially leaving stray
pointers to invalid or reused memory. Arrays are now correctly deep
copied.  Includes a unit test that copys rows containing arrays in and
out of a BufferedTupleStream.

Also implement matching optimisation for deep copy in RowBatch.

Change-Id: I75d91a6b439450c5b47646b73bc528cfb8f7109b
Reviewed-on: http://gerrit.cloudera.org:8080/751
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-09 02:39:14 +00:00
Tim Armstrong
1ae79dbb0a Switch std::list to std::deque in Sorter
Sorter calls std::list::size() in multiple places. This is an O(n)
operation pre-C++11. std::deque is a good alternative here, since all
inserts and removals are at the beginning and end of the list.

This should give a small performance boost.

Change-Id: I1387299796742673c6ed03f6b47a0d41b88c2563
Reviewed-on: http://gerrit.cloudera.org:8080/768
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-09 01:33:37 +00:00
Tim Armstrong
5ac55f24cc IMPALA-2296: missing DeepCopy array support
Implement Tuple-to-Tuple DeepCopy for collections. Add query test
that uses the TOP-N node, which deep copies tuples in this way.
Confirmed that the query test failed before this fix.

Change-Id: I3fea860d8251038d7b5eb85c77973939abe9dbf8
Reviewed-on: http://gerrit.cloudera.org:8080/757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-08 23:40:53 +00:00
Tim Armstrong
af85fa63da Fix missing space in error message
Change-Id: I193b84537e8b7c9066bbde2956353eb5d5283a77
Reviewed-on: http://gerrit.cloudera.org:8080/765
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-09-08 20:08:21 +00:00
Tim Armstrong
4c95469a80 IMPALA-2305: avoid blacklisting to prevent test failures
Blacklisting appears to be causing some tests to fail (perhaps because
of a write error encountered by another test). For now, disable
blacklisting.

Change-Id: I06265eac465f58a836b69b01bcb60a8f875a22cc
Reviewed-on: http://gerrit.cloudera.org:8080/760
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-06 22:48:25 +00:00
Tim Armstrong
5b1157b44b IMPALA-2079: Part 2: handling tmp device errors
Tmp devices are blacklisted when a write error is encountered for that
device. No more scratch space will be allocated on the blacklisted
device, based on the assumption that the device is likely to be
misconfigured or failing.

This patch does not attempt to recover the query that experienced the
write error. It also does not attempt to remap any existing blocks away
from the temporary device.

This behaviour is unit tested for several failure scenarios.

This patch adds additional test infrastructure required for testing
BufferedBlockMgr behavior in the presence of faults and in
configurations with multiple tmp directories.

Adds metrics tmp-file-mgr.active-scratch-dirs and
tmp-file-mgr.active-scratch-dirs.list that track the number and set of
active scratch dirs and expose it in the Impala web UI.

Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746
Reviewed-on: http://gerrit.cloudera.org:8080/579
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-03 01:54:32 +00:00
Skye Wanderman-Milne
bcc73a36da Nested types: read and materialize nested types in Parquet scanner
This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.

This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).

I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.

Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-02 19:23:54 +00:00
Tim Armstrong
9a80402e47 IMPALA-2270: avoid FnvHash64to32 with empty inputs
FnvHash64to32 produces pathologically bad results when hashing zero-byte
input: it always returns 0 regardless of the input hash seed. This
is a result of it xoring the 32-bit hash seed with itself. This patch
adds a DCHECK to this function to verify that this function is not
invoked with zero-byte inputs, and updates all callsites to check for
the zero-length case.

This patch also improves hashing of booleans: false and NULL no longer
hash to the same value.

Change-Id: I6706f6ea167e5362d55351f7cc0c637c680a315d
Reviewed-on: http://gerrit.cloudera.org:8080/720
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-09-02 11:07:03 +00:00
Skye Wanderman-Milne
62b31bdb52 Nested types: send materialized path to BE
See comment in Descriptors.thrift for what the materialized path is.

Change-Id: I64d00cf1bc2edcbbed3b6cdd5e934c55fff70a49
Reviewed-on: http://gerrit.cloudera.org:8080/650
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-09-02 10:30:24 +00:00
Tim Armstrong
b402003532 Nested Types: Dedup tuples when de/serializing
This patch extends the deduplication of tuples in row batches to work on
non-adjacent tuples. This deduplication requires an additional data
structure (a hash table) and adds additional performance overhead (up to
3x serialization time), so it is only enabled for row batches with
compositions that are likely to blow up due to non-adjacent duplication
of large tuples. This avoids performance regression in typical cases,
while preventing size blow-ups in problematic cases, such as joining
three streams of tuples some of which contain may contain large
collections.

A test is included that ensures that adjacent deduplication is enabled.
The row batch serialize benchmark shows that deduplication does not regress
performance of serialization or deserialization.

Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407
Reviewed-on: http://gerrit.cloudera.org:8080/573
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-08-31 18:45:57 +00:00