This patch changes the Parquet scanner to check whether it can read the
full footer scan range; a short read indicates that the file has been
overwritten by a shorter file without the table metadata being
refreshed. Previously this would hit a DCHECK. This patch also adds a
test for this case, as well as the case
where the new file is longer than the metadata states (which fails
with an existing error).
Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07
Reviewed-on: http://gerrit.cloudera.org:8080/1084
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
We weren't actually returning after detecting the error and would try
to allocate an array larger than INT_MAX. I tested by hand that a very
large array (200M elements) fails to be allocated and that the
appropriate error is returned to the shell, and added an
ArrayValueBuilder unit test.
Change-Id: Iedc3b3ca8c9c07100d6602f8e8cc9cfd57151747
Reviewed-on: http://gerrit.cloudera.org:8080/886
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Since MemPool::Allocate() takes an int, INT_MAX is the largest
possible requested allocation. This and slightly lower values would
cause the 'num_bytes' variable in the private Allocate() function to
overflow, which would yield a valid pointer to a buffer too small to
hold the requested number of bytes. This patch fixes this problem by
making 'num_bytes' an int64_t, and adds a test for this behavior.
This may help some problems related to IMPALA-1619, although it's not
quite the same since MemPool was never limited to 1GB, only FreePool.
Change-Id: Ia8040d87d3be32896944b5d44ff0c1f2667837e0
Reviewed-on: http://gerrit.cloudera.org:8080/1107
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The code in resource-broker.cc that makes RPCs to Llama attempts to
retry the RPC a configurable number of times if the RPC returns a
failure. If the RPC throws (which Thrift may do), we try to reset the
connection and then make the RPC again, but this time the call is not
guarded by a try/catch block. If this second RPC throws, the process
crashes.
This fixes the issue by removing the try/catch and instead
using the ClientCache DoRpc function which handles this
already. Some additional Llama RPC calling wrappers were
removed as well.
Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a
Reviewed-on: http://gerrit.cloudera.org:8080/881
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Due to IMPALA-1619, allocating a StringBuffer larger than 1GB could
cause Impala to crash. Check the requested buffer size in advance and
fail the request if it is larger than 1GB. Once IMPALA-1619 is
fixed, we should revert this change.
Change-Id: Iffd1e701614b520ce58922ada2400386661eedb1
(cherry picked from commit 74ba16770eeade36ab77c86ed99d9248c60b0131)
Reviewed-on: http://gerrit.cloudera.org:8080/869
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes the issue where the impalad will crash if the sort
node can't get a new block from the buffered block manager. The fix
removes the DCHECKs after the calls to BufferedBlockMgr::GetNewBlock()
and returns a proper OOM error instead. It also ensures that the callers
of Sorter::Run::Init() don't ignore the returned status (IMPALA-2435).
Change-Id: I611f173fac3add770988e9d4aaa48efc4229fbd6
Reviewed-on: http://gerrit.cloudera.org:8080/976
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
There were several possible races in DataStreamRecvr where the
RuntimeState for the query was torn down while a thread was waiting in
DataStreamRecvr. This is possible when waiting on a condition variable
because the lock is released. After the thread wakes up from the
condition variable, it can crash by updating a counter on a
RuntimeProfile that no longer exists.
To address the problem this patch adds CANCEL_SAFE_SCOPED_TIMER, a
variant of SCOPED_TIMER that checks a boolean variable for cancellation
before updating the RuntimeProfile. This should be used instead of
SCOPED_TIMER anywhere that it is possible that the RuntimeProfile will
be torn down while the timer is active.
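A minimal sketch of the idea behind such a timer (hypothetical names,
not the actual Impala macro): the destructor consults a cancellation
flag before touching the counter, so a torn-down profile is never
updated.

  #include <atomic>
  #include <chrono>

  // Hypothetical stand-ins for RuntimeProfile::Counter and the macro body.
  struct Counter { std::atomic<long> value{0}; };

  class CancelSafeScopedTimer {
   public:
    CancelSafeScopedTimer(Counter* counter, const bool* is_cancelled)
      : counter_(counter), is_cancelled_(is_cancelled),
        start_(std::chrono::steady_clock::now()) {}
    ~CancelSafeScopedTimer() {
      // Skip the update if the query was cancelled and the profile may be gone.
      if (is_cancelled_ != nullptr && *is_cancelled_) return;
      auto elapsed = std::chrono::steady_clock::now() - start_;
      counter_->value +=
          std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    }
   private:
    Counter* counter_;
    const bool* is_cancelled_;
    std::chrono::steady_clock::time_point start_;
  };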
Change-Id: Ib4339090d3dfb097e4c160a21b470f00b9c44bbf
Reviewed-on: http://gerrit.cloudera.org:8080/1061
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
When the coordinator prints the 'backend number' of
fragments that are finished or result in an error, the
hostname associated with that backend is also printed.
Change-Id: I0b27549bd9155ab9b077933ab6f621f4f0887371
Reviewed-on: http://gerrit.cloudera.org:8080/912
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
When running certain heavy workloads it was observed that a
'RuntimeState' member variable was being accessed in WriteComplete()
on a RequestContext::Cancel() path after that RuntimeState object was
destroyed. This is a temporary fix to ensure that the destroyed member
variable is not accessed if the write status is cancelled.
A test is not included as it is not deterministically reproducible.
Change-Id: I8a55c070d25f0ca5c830a955e84df450061753a3
Reviewed-on: http://gerrit.cloudera.org:8080/897
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
Before this patch, a common pattern was:
Status status = Status::MemLimitExceeded();
status.SetErrorMsg(<custom error msg>);
state_->SetMemLimitExceeded();
return status;
This could cause the custom error message to be dropped, since
RuntimeState::SetMemLimitExceeded() sets query_status_ to the generic
"Memory limit exceeded" error, which then prevents query_status_ from
being set to the custom error status. (The custom error message is
often logged in the runtime state, but not always.)
This patch has RuntimeState::SetMemLimitExceeded() take an optional
ErrorMsg argument which is used to construct the new query status, and
changes existing uses of SetErrorMsg() + SetMemLimitExceeded() to use
this new functionality.
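As an illustration only (hypothetical stand-in types, not the actual
RuntimeState/Status classes), the new shape of the call lets the custom
message become the query status instead of being overwritten by the
generic one:

  #include <string>

  // Minimal stand-ins for illustration.
  struct ErrorMsg { std::string msg; };
  struct Status {
    std::string msg;
    static Status MemLimitExceeded(const std::string& m = "Memory limit exceeded") {
      return Status{m};
    }
  };

  struct RuntimeState {
    Status query_status;
    bool has_query_status = false;
    // New overload: an optional custom message constructs the query status.
    Status SetMemLimitExceeded(const ErrorMsg* custom = nullptr) {
      Status s = Status::MemLimitExceeded(
          custom != nullptr ? custom->msg : "Memory limit exceeded");
      if (!has_query_status) { query_status = s; has_query_status = true; }
      return s;
    }
  };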
Change-Id: I9fe20da0bcc2cf01f2fd1fe29ae32c1a00708da1
Reviewed-on: http://gerrit.cloudera.org:8080/885
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
The bug: Several places in the sort assumed that the presence of var-len
slots in the tuple to be sorted implied having var-len blocks.
However, if all var-len slots are NULL or empty, then we can have no
var-len blocks. This case is now properly handled.
Change-Id: Ia3ad3669313e9d494ce2472af7775febfa6f247c
Reviewed-on: http://gerrit.cloudera.org:8080/913
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
In some cases the planner generated plans with rows with no
materialized tuples. Recent changes to the backend caused these to
hit a DCHECK. This patch addresses one case in the planner where it
was possible to create such plans: when the planner generated an
empty node from a select subquery with no from clause. The fix is to
create a materialized tuple based on the select list expressions, in
the same way as we handle these selects when the planner cannot
statically determine they have no result rows.
An example query is included as a test.
It also adds additional checks to the frontend and backend to catch
these invalid rows earlier.
Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4
Reviewed-on: http://gerrit.cloudera.org:8080/911
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch avoids a few unnecessary memory allocations, locks and atomics
that can become a bottleneck when executing subplans.
On the following benchmark the end-to-end runtime of a subplan-heavy
query was improved from 37s to 27s.
Benchmark:
select count(*) from huge_customer c, c.c_orders o, o.o_lineitems
The huge_customer table had 48 files totalling 6.5GB. The table was created
by copying the files of tpch_nested_parquet.customer several times into the
huge_customer table. I ran the benchmark with a single impalad.
There are still several easy opportunities for improving the performance
of subplan execution.
Change-Id: I9fce1c2857a8f8e6ed3f1b4842d07fd80c11296a
Reviewed-on: http://gerrit.cloudera.org:8080/894
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Fixes some logic in the buffered-block-mgr that we believe
is wrong, and leads to the following DCHECK failing:
Check failed: client->num_tmp_reserved_buffers_ == 0
(buffered-block-mgr.cc:259)
It may be possible for other issues to exist in the
accounting for num_tmp_reserved_buffers_, but this fix seems
to work as evidenced by the stress tests running without
hitting this DCHECK.
Change-Id: Ic6415afc6722461dc57f763a46e3abbc3aa09af6
Reviewed-on: http://gerrit.cloudera.org:8080/880
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This is the first step to fix issues with large memory allocations. In
this patch, the built-in `group_concat` is no longer allowed to allocate
arbitrarily large strings and crash Impala, but is limited to the upper
bound of possible allocations in Impala.
This patch does not perform any functional change, but rather avoids
unnecessary crashes. However, it changes the parameter type of
FindChunk() in MemPool to be a signed 64-bit integer. This change allows
the MemPool to internally allocate memory of more than 1GB, but the
public interface of Allocate() is not changed, so the general limitation
remains. The reason for this change is as follows:
1) In a UDF FunctionContext::Reallocate() would allocate slightly more
than 512MB from the FreePool.
2) The FreePool tries to double this size to allocate 1GB from the
MemPool.
3) The MemPool doubles the size again and overflows the signed 32-bit
integer in the FindChunk() method. This will then only allocate 1GB
instead of the expected 2GB.
What happens is that one of the callers expected a larger allocation
than actually happened, which will in turn lead to memory corruption as
soon as the memory is accessed.
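For illustration (standalone arithmetic, not the actual MemPool/FreePool
code): the three-step doubling chain described above overflows a signed
32-bit size, which is why FindChunk() now takes an int64_t.

  #include <cstdint>
  #include <cstdio>

  int main() {
    int64_t udf_request = (512LL << 20) + 16;  // slightly more than 512MB (step 1)
    int64_t free_pool = udf_request * 2;       // FreePool doubles: ~1GB (step 2)
    // Step 3: MemPool doubles again. In 32-bit arithmetic the result wraps and
    // no longer matches the ~2GB the caller expects; in int64_t it is correct.
    int32_t chunk32 = static_cast<int32_t>(static_cast<uint32_t>(free_pool) * 2u);
    int64_t chunk64 = free_pool * 2;
    std::printf("32-bit chunk bytes: %d\n", chunk32);
    std::printf("64-bit chunk bytes: %lld\n", static_cast<long long>(chunk64));
    return 0;
  }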
Change-Id: I068835dfa0ac8f7538253d9fa5cfc3fb9d352f6a
Reviewed-on: http://gerrit.cloudera.org:8080/858
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
A recent change that allocated a row batch's tuple pointers with malloc
rather than a MemPool meant that the tuple pointers were no longer
counted towards query or process memory limits. For some workloads the
number of row batches and therefore the aggregate size of the tuple
pointers can be significant, so it is important to correctly account for
the memory.
This change simply pairs malloc() and free() calls with matching calls
to MemTracker::Consume() and MemTracker::Release().
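A sketch of the pairing (hypothetical MemTracker stand-in, not the
Impala class itself): every malloc of the tuple-pointer array is matched
by Consume(), and the free by Release(), so the bytes count against the
query and process limits.

  #include <cstdint>
  #include <cstdlib>

  // Hypothetical stand-in for impala::MemTracker.
  struct MemTracker {
    int64_t consumed = 0;
    void Consume(int64_t bytes) { consumed += bytes; }
    void Release(int64_t bytes) { consumed -= bytes; }
  };

  struct Tuple;  // opaque

  Tuple** AllocateTuplePtrs(MemTracker* tracker, int capacity, int tuples_per_row) {
    int64_t bytes = static_cast<int64_t>(capacity) * tuples_per_row * sizeof(Tuple*);
    tracker->Consume(bytes);  // account for the bytes alongside the malloc
    return static_cast<Tuple**>(std::malloc(bytes));
  }

  void FreeTuplePtrs(MemTracker* tracker, Tuple** ptrs, int capacity, int tuples_per_row) {
    std::free(ptrs);
    tracker->Release(static_cast<int64_t>(capacity) * tuples_per_row * sizeof(Tuple*));
  }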
Change-Id: I7fb5f2844ad0d51f71a3701d8f0897d8f0b24e18
Reviewed-on: http://gerrit.cloudera.org:8080/895
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
These tests were disabled because they relied on blacklisting, which
was disabled. It is still useful to have tests to exercise the error
handling code and ensure that scratch directories are used or not
used when they should be.
Change-Id: I89195a9fdd7ed858ae2addce93413871cffaf29b
Reviewed-on: http://gerrit.cloudera.org:8080/846
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
PAGG and PHJ were using an all-or-nothing approach to spilling. In
particular, they were trying to switch to IO-sized buffers for both
streams (aggregated and unaggregated in PAGG; build and probe in PHJ)
of every partition (currently 16 partitions for a total of 32
streams), even if some of the streams had very few rows, were empty,
or simply would not spill, so there was no need to allocate IO-buffers
for them. That increased the minimum memory needed by those operators
in many queries.
This patch decouples the decision to switch to IO-buffers for each
stream of each partition. Streams will switch to IO-sized buffers
whenever the rows they contain do not fit in the first two small
buffers (64KB and 512KB respectively). When we decide to spill a
partition, we switch both of its streams to IO buffers.
With this change many streams of PAGG and PHJ nodes do not need to
use IO-sized buffers, reducing the minimum memory requirement. For
example, below is the minimum memory needed (in MB) for some of the
TPC-H queries. Some now need half or less of the memory they needed
before:
TPC-H Q3: 645 -> 240
TPC-H Q5: 375 -> 245
TPC-H Q7: 685 -> 265
TPC-H Q8: 740 -> 250
TPC-H Q9: 650 -> 400
TPC-H Q18: 1100 -> 425
TPC-H Q20: 420 -> 250
TPC-H Q21: 975 -> 620
To make this small-buffer optimization work, we had to fix
IMPALA-2352. That is, the AllocateRow() call of
PAGG::ConstructIntermediateTuple() could return unsuccessfully just
because the small buffers of the stream were exhausted. Previously we
would treat that as an indication that there was no memory left, start
spilling a partition, and switch all streams to IO-buffers. Now we
make a best effort: we first try SwitchToIoBuffers() and, if that is
successful, we re-attempt the AllocateRow() call. See IMPALA-2352 for
more details.
Another change is that now SwitchToIoBuffers() will reset the flag
using_small_buffers_ back to false, in case we are in a very low
memory situation and it fails to get a buffer. That allows us to
retry calling SwitchToIoBuffers() once we free up some space. See
IMPALA-2330 for more details.
With the above fixes we should also have fixed IMPALA-2241 and
IMPALA-2271, which are essentially stream::using_small_buffers_-related
DCHECKs.
This patch adds all 22 TPC-H queries to the test_mem_usage_scaling
test and updates the per-query minimum memory limits in it.
Additionally, it adds
a new aggregation test that uses the TPC-H dataset for larger
aggregations (TestTPCHAggregationQueries). It also removes some
dead test code.
Change-Id: Ia8ccd0b76f6d37562be21fd4539aedbc2a864d38
Reviewed-on: http://gerrit.cloudera.org:8080/818
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
tests/query_test/test_aggregation.py
DiskIoMgr did not correctly check the return code of fread(). As a
result, if fread() returned an error, DiskIoMgr would think it had
succeeded and filled the buffer with valid data, whereas in reality the
buffer would be full of garbage data and an error had occurred.
This change detects and reports such errors correctly.
Added a test for the EOF case that failed before the fix and now passes.
The error case is difficult to test without modifying DiskIoMgr or
injecting faults at the filesystem level.
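A standalone sketch of the kind of check that was missing (names
hypothetical, not the DiskIoMgr code): compare fread's return value
against the requested count and distinguish EOF from a real error
before treating the buffer as valid.

  #include <cstdio>

  // Returns 0 on success, -1 on error; *bytes_read may be short only at EOF.
  int ReadChunk(FILE* file, char* buffer, size_t requested, size_t* bytes_read) {
    *bytes_read = fread(buffer, 1, requested, file);
    if (*bytes_read < requested) {
      if (ferror(file)) return -1;  // real I/O error: buffer contents are not valid
      // feof(file): legitimate short read at end of file.
    }
    return 0;
  }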
Change-Id: Ic4822183a9cd228da670b5fd130e34ca875e8c80
Reviewed-on: http://gerrit.cloudera.org:8080/882
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The buffered tuple stream tests wrote out the memory for integer tuples
as a contiguous array of 4-byte integers, neglecting null indicators.
The same bad assumption was present in both the read and write code,
so the test passed in many circumstances. However, buffer overruns were
sometimes detected in the ASAN build. This bug was present for a long
time but a recent change made the ASAN build fail consistently.
IMPALA-1688, an unexplained failure in buffered-tuple-stream-test may
have the same cause.
This fix also changes the integers written out so that they are not all
small positive integers with many zero bytes.
Change-Id: I4a158751e8a9c934c912831032e83ec85056f06e
Reviewed-on: http://gerrit.cloudera.org:8080/876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
RowBatch::DeepCopyTo doesn't deduplicate tuples when copying. If not
used carefully, this could lead to us producing oversized row batches
(similar to the problem solved by dedup in RowBatch::Serialize).
Currently this could only occur if the build-side child of a nested loop
join both produces batches with many duplicate tuples and sets
MarkNeedToReturn.
Also fix error in comment.
Change-Id: Ib59c92f42ee4d491c9a9d55e0b125af90f2b1d48
Reviewed-on: http://gerrit.cloudera.org:8080/874
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Normally, an error Status logs a backtrace, but this doesn't happen for
MEM_LIMIT_EXCEEDED because these were copied from a global status.
Instead, construct them on the fly so that we get the normal Status
backtrace logging. Also, remove some special case backtrace logging
that is now redundant from buffered-block-mgr.cc.
Primary motivation for this is to help debug IMPALA-2327, where it
appears a MEM_LIMIT_EXCEEDED status is dropped. But I think this will
be generally useful for debugging problems that happen after
MEM_LIMIT_EXCEEDED.
Testing: Run test_mem_scaling.py and see all "Memory limit exceeded"
now have backtraces.
Change-Id: I4cd04426e63397c24d3e16faa33caafc2a608c0c
Reviewed-on: http://gerrit.cloudera.org:8080/872
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
There was a bug with projecting collection-typed slots in the UnnestNode by setting
them to NULL. The problem was that the same tuple/slot could be referenced by multiple
input rows. As a result, all unnests after the first that operate on the same collection
value would incorrectly return an empty row batch because the slot had been set to NULL
by the first unnesting.
The fix is to ignore the null bit when retrieving a collection-typed slot's value in
the UnnestNode. We still set the null bit after retrieving the value for projection.
This solution purposely ignores the conventional NULL semantics of slots. It is a
temporary hack which must be removed eventually.
We rely on the producer of collection-typed slot values (the scan node) to write an empty
array value into such slots when they are NULL, in addition to setting the null bit.
Change-Id: Ie6dc671b3d031f1dfe4d95090b1b6987c2c974da
Reviewed-on: http://gerrit.cloudera.org:8080/859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The NLJ node did not follow the expected protocol when need_to_return
is set on a row batch, which means that memory referenced by a rowbatch
can be freed or reused the next time GetNext() is called on the child.
This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.
Re-enable the test that was disabled because of IMPALA-2332, since
this was the root cause.
Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch provides a temporary work-around for IMPALA-2344. In case
block->Pin() fails in Sorter::Run::GetNext() we will fail the query
with OOM instead of DCHECK'ing. This work-around is similar to those
we did for IMPALA-1590 and IMPALA-1868. When we rework the buffer
management those failures should not happen.
This patch also has some minor formatting changes. More importantly
it adds a DCHECK in BufferedBlockMgr::FindBufferForBlock() that
ensures we do not call PinBuffer() on an already pinned block.
Change-Id: I5a43302b807972e39f4f6c98ec5ecb1eee5fe056
Reviewed-on: http://gerrit.cloudera.org:8080/861
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
The bug was a simple oversight: we copied the array data but forgot
to update the pointer of the corresponding ArrayValue.
Change-Id: Ib6ec0380f66194efc7ea3eb989535652eb8b526f
Reviewed-on: http://gerrit.cloudera.org:8080/855
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The sorter does not currently support sorting tuples with collection
slots because the necessary deep copy logic is not implemented.
Fortunately, projection should ensure that all array values that reach
the sorter have been set to null. This patch adds DCHECKs to ensure
that this is the case. Variables are also renamed to reflect that with
nested types string values are a subset of variable-length values.
Change-Id: If617abe678903c69d12d1c65062c8063ae137296
Reviewed-on: http://gerrit.cloudera.org:8080/844
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The dedup logic in row batch serialisation incorrectly assumed that two
distinct tuples must have two distinct memory addresses. This is not
true if one tuple has zero length.
Update the serialisation logic to check for this case and insert a
NULL.
Adds a unit test that exercises this bug prior to the fix and a query
test that also hit a DCHECK prior to the fix.
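A minimal sketch of one way to express the corrected assumption
(hypothetical helper, not the actual serialisation code): a repeated
pointer only counts as a duplicate if the tuple actually occupies
bytes, since distinct zero-length tuples can legally share an address.

  #include <cstdint>
  #include <unordered_map>

  struct Tuple;  // opaque

  // Returns the previously assigned index if this tuple is a genuine
  // duplicate, or -1 if it must be serialised on its own.
  int FindDuplicate(std::unordered_map<Tuple*, int>* seen, Tuple* tuple,
                    int64_t byte_size, int next_index) {
    if (tuple == nullptr || byte_size == 0) return -1;  // never dedup zero-byte tuples
    auto it = seen->find(tuple);
    if (it != seen->end()) return it->second;
    (*seen)[tuple] = next_index;
    return -1;
  }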
Change-Id: If163274b3a6c10f8ac6b6bc90eee9ec95830b7dd
Reviewed-on: http://gerrit.cloudera.org:8080/849
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
I discovered this while reading through the code to help with an Impala issue. The
problem is that in the cancellation path, EnqueueBuffer() will return the buffer
descriptor to the cache, which is effectively the same as freeing it. The caller
would then try to read some state from it and the state can be corrupt. From a conceptual
point of view, EnqueueBuffer() represents a memory hand-off so this behavior makes
sense. The caller just needs to store the state it needs before EnqueueBuffer().
This fix is similar to that for IMPALA-2101.
Change-Id: I2deab4923d1324fd275b8dd4ad37eedb6d39922c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/8023
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/834
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
This DCHECK was overly strict - this code path is correct for any
variable-length string type, not just STRING.
Change-Id: I724ea4c9d97056dc1e977124dd9ec0f41cf7a2ce
Reviewed-on: http://gerrit.cloudera.org:8080/840
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This DCHECK condition was overly strict - a non-nullable tuple pointer
can be NULL if the tuple is zero bytes long (since there is no memory
backing the tuple).
Adds a test query that hit the DCHECK.
Change-Id: I16e8bc0db747b83c239de931a4fc9677d5c85ae6
Reviewed-on: http://gerrit.cloudera.org:8080/836
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This correctly initialises all null indicators in RowBatchSerialize test
to not-NULL (i.e. 0 bytes). Previously they were not initialised, so the
tests only passed if the memory happened to be filled with zeroes.
Change-Id: I12edad82ebb8cd7a2fb91b6686555b82f9e133cf
Reviewed-on: http://gerrit.cloudera.org:8080/825
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The hash join and tuple stream code did not correctly handle joins
whose right side had very high cardinality but whose tuples had zero
footprint. Any such join with more than 16M tuples on the right side
would crash. In particular, if the tuple footprint is zero, an
infinite number of rows fit in one block. But with the old way of
iterating over the rows of the stream, we would increment the idx by 1
to get the next "row", eventually overflowing and hitting a DCHECK.
A second problem was the calculation of the size of the hash table
when the footprint of the tuples is zero. In that case a hash table of
minimum size would suffice. Instead we would try to create a very
large hash table to fit the large number of tuples, resulting in OOM
errors.
This patch fixes both problems with specific calculations of the next
idx in the stream and of the size of the hash table when the stream
contains tuples with zero footprint.
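A standalone sketch of the two calculations (hypothetical helpers, not
the BufferedTupleStream/HashTable code): both the per-block row
accounting and the hash-table sizing need an explicit branch for a
zero-byte row footprint.

  #include <algorithm>
  #include <cstdint>

  // Rows addressable in one block: with a zero-byte footprint, dividing the
  // block size by the row size is meaningless; use the recorded row count so
  // the per-block index cannot grow without bound.
  int64_t RowsInBlock(int64_t block_bytes, int64_t row_byte_size,
                      int64_t recorded_num_rows) {
    if (row_byte_size == 0) return recorded_num_rows;
    return std::min(recorded_num_rows, block_bytes / row_byte_size);
  }

  // Hash table sizing: zero-footprint tuples all hash and compare the same, so
  // a minimum-sized table suffices regardless of the row count.
  int64_t HashTableBuckets(int64_t num_rows, int64_t row_byte_size,
                           int64_t min_buckets) {
    if (row_byte_size == 0) return min_buckets;
    return std::max(min_buckets, num_rows * 2);  // e.g. target 50% fill
  }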
Change-Id: I12469b9c63581fcbc78c87200de7797eac3428c9
Reviewed-on: http://gerrit.cloudera.org:8080/811
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
This patch makes the ownership of the memory backing the tuple pointers of
a RowBatch dependent on whether the legacy joins and aggs are enabled:
By default, the memory is malloc'd and owned by the RowBatch:
If enable_partitioned_hash_join=true and enable_partitioned_aggregation=true
then the memory is owned by the RowBatch and is freed upon its destruction.
This mode is more performant especially with SubplanNodes in the ExecNode tree
because the tuple pointers are not transferred and do not have to be re-created
in every Reset().
Memory is allocated from MemPool:
Otherwise, the memory is allocated from the RowBatch's tuple pool. As a result,
the pointer memory is transferred just like tuple data, and must be re-created
in Reset(). This mode is required for the legacy join and agg which rely on the
tuple pointers being allocated from the RowBatch's tuple pool, so they can
acquire ownership of the tuple pointers.
Performance impact for nested types:
Initial cluster runs and profiling on nested TPCH identified excessive
malloc/frees as a major performance bottleneck. This change paves the way
for further optimizations which yielded a 2x improvement in response time
for most nested TPCH queries.
Change-Id: I4ac58b18058ce46b4db89fbe117b0bcad19e9ee7
Reviewed-on: http://gerrit.cloudera.org:8080/807
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
In some cases in the NLJ node eos_ wasn't set even though the limit was
reached. This prevented the limit from being handled correctly before
returning rows to the caller of GetNext(). This could result in either
too many rows being returned, or a crash when the row batch size was
set to an invalid negative number.
The fix is to always check for whether the limit was reached before
returning from GetNext().
Change-Id: I660e774787870213ada9f2d3e6f10953d9937022
Reviewed-on: http://gerrit.cloudera.org:8080/797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Rename some functions and consistently skip the variable length code
path when there are no variable length slots.
Change-Id: I2f3405fcc5f545b207fa48e17f37fe968208d94c
Reviewed-on: http://gerrit.cloudera.org:8080/773
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Collection-typed slots are expensive to copy, e.g., during data
exchanges or when writing into a buffered-tuple-stream. Even worse,
such slots could be duplicated many times after unnesting in a
subplan. To alleviate this problem, this patch implements a
poor man's projection where collection-typed slots are set to NULL
inside the SubplanNode that flattens them.
The FE guarantees that the contents of an array-typed slot are never
referenced outside of the single UnnestNode that accesses them, so when
returning eos in UnnestNode::GetNext() we also set the unnested array
slot to NULL to avoid those expensive copies in downstream exec nodes.
The FE provides that guarantee by creating a new slot in the parent
scan for every relative CollectionTableRef. For example, for a table
't' with a collection-typed column 'c' the following query would have
two separate slots in the tuple of 't', one for 'c1' and one for 'c2':
select * from t, t.c c1, t.c c2
Change-Id: I90e5b86463019c9ed810c299945c831c744ff563
Reviewed-on: http://gerrit.cloudera.org:8080/763
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The race, described below, can lead to:
DCHECK: query_to_block_mgrs_.find(query_id_) != query_to_block_mgrs_.end()
Another fragment may have called Create() for this query_id_ and
saw that this BufferedBlockMgr is being destructed. That fragment will
overwrite the map entry for query_id_, pointing it to a different
BufferedBlockMgr object. We should let that object's destructor remove the
entry. On the other hand, if the second BufferedBlockMgr is destructed before
this thread acquires the lock, then we'll remove the entry (because we can't
distinguish between the two expired pointers), and when the other
~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_.
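A minimal sketch of the destructor-side check (hypothetical names and a
string key as a stand-in for the query id, not the actual
BufferedBlockMgr code): only erase the map entry if no newer live
manager has taken over the slot.

  #include <map>
  #include <memory>
  #include <mutex>
  #include <string>

  struct BlockMgr;  // stand-in for BufferedBlockMgr

  std::mutex registry_lock;
  std::map<std::string, std::weak_ptr<BlockMgr>> query_to_block_mgrs;

  // Called from ~BlockMgr(): remove our entry only if it does not refer to a
  // newer live BlockMgr registered under the same query id.
  void Unregister(const std::string& query_id) {
    std::lock_guard<std::mutex> l(registry_lock);
    auto it = query_to_block_mgrs.find(query_id);
    if (it == query_to_block_mgrs.end()) return;  // already replaced and erased
    std::shared_ptr<BlockMgr> current = it->second.lock();
    // A non-null result means a newer live BlockMgr owns this query_id; let
    // its destructor clean up the entry instead.
    if (current != nullptr) return;
    query_to_block_mgrs.erase(it);
  }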
Change-Id: I18a9d965e0689adc4ee9b837eef19a21e0683a64
Reviewed-on: http://gerrit.cloudera.org:8080/772
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This was unimplemented and is used on some code paths. Arrays were not
properly copied into the BufferedTupleStream, potentially leaving stray
pointers to invalid or reused memory. Arrays are now correctly deep
copied. Includes a unit test that copies rows containing arrays in and
out of a BufferedTupleStream.
Also implement matching optimisation for deep copy in RowBatch.
Change-Id: I75d91a6b439450c5b47646b73bc528cfb8f7109b
Reviewed-on: http://gerrit.cloudera.org:8080/751
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Sorter calls std::list::size() in multiple places. This is an O(n)
operation pre-C++11. std::deque is a good alternative here, since all
inserts and removals are at the beginning and end of the list.
This should give a small performance boost.
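For context (not the Sorter code itself), the change is simply the
container choice: pre-C++11 std::list::size() walks the list, while
std::deque::size() is O(1) and still supports cheap push/pop at both
ends, which matches the sorter's access pattern.

  #include <deque>

  struct Run;  // opaque sorted-run handle

  // Runs are added at the back and consumed from the front; size() is O(1).
  std::deque<Run*> sorted_runs;

  void AddRun(Run* run) { sorted_runs.push_back(run); }

  Run* NextRun() {
    if (sorted_runs.empty()) return nullptr;
    Run* run = sorted_runs.front();
    sorted_runs.pop_front();
    return run;
  }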
Change-Id: I1387299796742673c6ed03f6b47a0d41b88c2563
Reviewed-on: http://gerrit.cloudera.org:8080/768
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Implement Tuple-to-Tuple DeepCopy for collections. Add query test
that uses the TOP-N node, which deep copies tuples in this way.
Confirmed that the query test failed before this fix.
Change-Id: I3fea860d8251038d7b5eb85c77973939abe9dbf8
Reviewed-on: http://gerrit.cloudera.org:8080/757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Blacklisting appears to be causing some tests to fail (perhaps because
of a write error encountered by another test). For now, disable
blacklisting.
Change-Id: I06265eac465f58a836b69b01bcb60a8f875a22cc
Reviewed-on: http://gerrit.cloudera.org:8080/760
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Tmp devices are blacklisted when a write error is encountered for that
device. No more scratch space will be allocated on the blacklisted
device, based on the assumption that the device is likely to be
misconfigured or failing.
This patch does not attempt to recover the query that experienced the
write error. It also does not attempt to remap any existing blocks away
from the temporary device.
This behaviour is unit tested for several failure scenarios.
This patch adds additional test infrastructure required for testing
BufferedBlockMgr behavior in the presence of faults and in
configurations with multiple tmp directories.
Adds metrics tmp-file-mgr.active-scratch-dirs and
tmp-file-mgr.active-scratch-dirs.list that track the number and set of
active scratch dirs, and exposes them in the Impala web UI.
Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746
Reviewed-on: http://gerrit.cloudera.org:8080/579
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.
This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).
I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.
Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
FnvHash64to32 produces pathologically bad results when hashing zero-byte
input: it always returns 0 regardless of the input hash seed. This
is a result of it xoring the 32-bit hash seed with itself. This patch
adds a DCHECK to verify that the function is never invoked with
zero-byte inputs, and updates all callsites to check for
the zero-length case.
This patch also improves hashing of booleans: false and NULL no longer
hash to the same value.
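An illustrative sketch (the hash body below is generic FNV-1a folded to
32 bits, not the exact codebase routine, and the helper names are
hypothetical): the hash asserts on zero-length input, and callers
special-case empty values so the seed is never folded against itself.

  #include <cassert>
  #include <cstdint>

  // Illustrative FNV-1a hash folded to 32 bits; not the codebase's FnvHash64to32.
  uint32_t FnvFold64to32(const void* data, int len, uint32_t seed) {
    assert(len > 0);  // mirrors the DCHECK added by this patch
    uint64_t hash = 0xcbf29ce484222325ULL ^ seed;
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    for (int i = 0; i < len; ++i) {
      hash = (hash ^ bytes[i]) * 0x100000001b3ULL;
    }
    return static_cast<uint32_t>(hash >> 32) ^ static_cast<uint32_t>(hash);
  }

  // Caller-side pattern: handle zero-length input before hashing.
  uint32_t HashBytes(const void* data, int len, uint32_t seed) {
    if (len == 0) return seed * 0x9e3779b1u;  // any fixed perturbation of the seed
    return FnvFold64to32(data, len, seed);
  }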
Change-Id: I6706f6ea167e5362d55351f7cc0c637c680a315d
Reviewed-on: http://gerrit.cloudera.org:8080/720
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
See comment in Descriptors.thrift for what the materialized path is.
Change-Id: I64d00cf1bc2edcbbed3b6cdd5e934c55fff70a49
Reviewed-on: http://gerrit.cloudera.org:8080/650
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
This patch extends the deduplication of tuples in row batches to work on
non-adjacent tuples. This deduplication requires an additional data
structure (a hash table) and adds additional performance overhead (up to
3x serialization time), so it is only enabled for row batches with
compositions that are likely to blow up due to non-adjacent duplication
of large tuples. This avoids performance regression in typical cases,
while preventing size blow-ups in problematic cases, such as joining
three streams of tuples, some of which may contain large
collections.
A test is included that ensures that adjacent deduplication is enabled.
The row batch serialize benchmark shows that deduplication does not regress
performance of serialization or deserialization.
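A minimal sketch of the non-adjacent dedup bookkeeping (hypothetical
names, not the RowBatch code): a hash map from tuple pointer to the
offset of its first serialised copy lets later occurrences be encoded
as back-references.

  #include <cstdint>
  #include <unordered_map>

  struct Tuple;  // opaque

  struct DedupState {
    std::unordered_map<const Tuple*, int64_t> first_offset;

    // Returns -1 if the tuple must be serialised at 'offset_if_new'; otherwise
    // the offset of the already-serialised copy to reference instead.
    int64_t Lookup(const Tuple* tuple, int64_t offset_if_new) {
      if (tuple == nullptr) return -1;
      auto inserted = first_offset.emplace(tuple, offset_if_new);
      return inserted.second ? -1 : inserted.first->second;
    }
  };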
Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407
Reviewed-on: http://gerrit.cloudera.org:8080/573
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>