Returns the date of the weekday that follows a particular date.
The weekday argument is a string literal indicating the day of the week.
This argument is case-insensitive. Available values are:
"Sunday"/"SUN", "Monday"/"MON", "Tuesday"/"TUE",
"Wednesday"/"WED", "Thursday"/"THU", "Friday"/"FRI", "Saturday"/"SAT".
For example, the first Saturday after Wednesday, 25 December 2013
is on 28 December 2013.
select next_day('2013-12-25','Saturday') returns '2013-12-28 00:00:00'
select next_day(to_timestamp('08-1987-21', 'MM-yyyy-dd'), 'FRIDAY')
returns '1987-08-28 00:00:00'
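The weekday arithmetic behind the two examples can be illustrated with a
short Python sketch. This is not Impala's implementation: the function
name mirrors the SQL builtin, and the _WEEKDAYS lookup table is a
hypothetical helper.

```python
from datetime import date, timedelta

# Hypothetical lookup table: full names and three-letter abbreviations,
# mapped to datetime's Monday=0 .. Sunday=6 numbering.
_WEEKDAYS = {}
for _i, _name in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
):
    _WEEKDAYS[_name] = _i
    _WEEKDAYS[_name[:3]] = _i


def next_day(d: date, weekday: str) -> date:
    """Return the first date strictly after `d` that falls on `weekday`."""
    target = _WEEKDAYS[weekday.strip().lower()]  # case-insensitive lookup
    # Always 1..7 days ahead: the same weekday maps to a full week later.
    delta = (target - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=delta)
```

Because the result is always strictly after the input date, a Friday input
with 'FRIDAY' yields the following Friday, as in the 1987 example.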
Change-Id: I2721d236c096639a9e7d2df8a45ca888c6b3e83e
Reviewed-on: http://gerrit.cloudera.org:8080/1943
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values.
The problematic scenario is when the repeat or literal count of
the RLE-encoded dictionary indexes is decoded as 0 - an invalid value.
There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.
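As an illustration of the check, here is a minimal Python sketch of
decoding one run header of the Parquet RLE/bit-packed hybrid encoding and
rejecting a count of 0. The function names are hypothetical and this is
not the code in Impala's decoder.

```python
def read_uleb128(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_run_header(buf: bytes, pos: int = 0):
    """Return (kind, num_values, next_pos) for one run header."""
    header, pos = read_uleb128(buf, pos)
    count = header >> 1
    if count == 0:
        # A run that encodes no values can only come from corrupt data.
        raise ValueError("corrupt data: run count of 0")
    if header & 1:
        return "literal", count * 8, pos  # bit-packed groups of 8 values
    return "repeat", count, pos
```

Raising an error here stands in for returning a non-OK status to the
caller instead of continuing with an invalid count.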
Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Reviewed-on: http://gerrit.cloudera.org:8080/3299
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to his/her own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. a
particular component in toolchain is at a version not
checked into S3).
$IMPALA_TOOLCHAIN holds some third party binaries which
Impala relies on. They can be compiled from source in the
native toolchain which is public. This commit also removes
build_thirdparty.sh as it's no longer used.
By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.
Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bug,
which occurs when scratch cannot be written.
Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits). Ran an exhaustive test
run before posting for review.
This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other bug was
reproduced by the test change in IMPALA-3669: test_scratch_disk fix.
Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Reviewed-on: http://gerrit.cloudera.org:8080/3315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
The Sorter's memory management logic failed to correctly manage buffers
when spilling. It would try to make use of all buffers in the system,
neglecting to account for other operators' buffer usage.
This patch adjusts the logic so that it handles contention for buffers
so long as it can get enough buffers to make progress. Instead of
precalculating the number of buffers it thinks it should be able to
pin, it just makes a best-effort attempt to pin the initial buffers
of as many runs as possible, up to a limit. As long as it can pin three
runs, it can make progress.
Testing:
Added an additional test that failed before the patch without OOM.
An analytic function test that was meant to fail also started succeeding
so I had to adjust the limit there too.
Change-Id: Idfe55cc13c7f2b54cba1d05ade44cbcf6bb573c0
Reviewed-on: http://gerrit.cloudera.org:8080/2908
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Clarify relationships between classes, clean up the previous mess
where every class was friends with the other so there's an actual
distinction between public and private members. TupleIterator
is now no longer tied to TupleSorter, just Run.
Document and enforce invariants in many cases.
Factor out some functions from large functions.
Simplify and document iterator logic.
Make management of buffers when iterating over output stream more
explicitly correct: either use MarkNeedToReturn() or attach block
to the batch as appropriate. The SortedRunMerger didn't handle
resource transfer correctly, except if all the memory came from
the batch's MemPool. This patch fixes the cases when resources
are attached to the batches, but not the 'need_to_return' case.
Document that SortedRunMerger requires 'deep_copy_input' to be true
if batches can have the 'need_to_return' flag set.
Also use the atomic block exchange operation when moving between
blocks in unpinned runs to prevent pin failures at that point.
I explicitly have avoided changing the hairy block management logic
when allocating buffers for merging, that will need addressing in
a follow-up patch.
Add a SpilledRuns counter so that it's more explicit that spilling
occurred.
Testing:
Added some tests for corner cases with empty and NULL strings.
Fixed a test that previously failed with OOM but now succeeds.
Performance:
Benchmarking against the old code initially revealed some regressions from
changes in inlining. Force inlining the TupleComparator::operator() and
iterator Next()/Prev() functions helped and performance seems similar or
slightly better on the targeted orderby benchmarks.
Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Reviewed-on: http://gerrit.cloudera.org:8080/2826
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
With the prefetching changes, the probe expressions' local
allocations are no longer freed via QueryMaintenance() in
PHJ. Instead, they are freed explicitly in GetNext() after
an entire probe batch has been processed. Due to this
change in how we handle local allocations of probe expressions,
a DCHECK was added to verify that there is no local allocation
from the probe expression in ProcessBuildInput(). Turns out that
Expr::Open() called in ConstructBuildSide() on the probe
expressions may have caused local allocations to occur for
certain UDFs (e.g. extract()).
This change handles the situation above by freeing local
allocations of the probe expressions once before calling
ProcessBuildInput() in ConstructBuildSide(). A new regression
test is also added for this specific case.
Change-Id: I2096ca3e2093c5ab0ecc0e7ca4cd1b5f3c1ed1ed
Reviewed-on: http://gerrit.cloudera.org:8080/3253
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
In ExprTest::GetValue, we create a local string and then end up returning
a reference to that string, resulting in a memory error. The mistake
wasn't obvious from looking at the code due to the convoluted way
that GetValue and ConvertValue work. This patch modifies GetValue
and ConvertValue to be simpler and eliminates the memory error.
Change-Id: I040179ee44782a22c88b810ff97612aaa89839f4
Reviewed-on: http://gerrit.cloudera.org:8080/3278
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
This commit adds noexcept specifier to some cross-compiled
functions which are known to not throw exceptions. This helps
avoid some exception related instructions (e.g. invoke,
landingpad) in the IR.
Change-Id: I96bd2fec6c14771acae1e700bed958951368ee77
Reviewed-on: http://gerrit.cloudera.org:8080/3256
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Currently, we never populate the errorMessage or sqlState
fields of TGetOperationStatusResp when the GetOperationStatus
HiveServer2 rpc is called. This patch checks if the query has
an error status and if so sets errorMessage and sqlState.
GetOperationStatus also now takes the QueryExecState lock since
QueryExecState::query_state_ and QueryExecState::query_status_
are supposed to be protected by it.
Additionally, this patch performs some cleanup and adds some
documentation around our behavior for updating
QueryExecState::query_state_/query_status_.
This also addresses IMPALA-3298: TGetOperationStatusResp missing
error message when data is expired
Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed
Reviewed-on: http://gerrit.cloudera.org:8080/3094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
This change breaks out runtime filter memory consumption from the
query-wide tracker to improve debuggability of memory limit exceeded
errors.
Testing: ran exhaustive tests, ran local and cluster stress tests.
Change-Id: I9f28f3b55b5c62e6f0f9838c5947c9446d444d20
Reviewed-on: http://gerrit.cloudera.org:8080/3247
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
The bug is that return_val.status is an optional field, so setting
the status without __isset is equivalent to Status::OK(). This
meant that the fragment did not get notified when reporting status
if the coordinator had gone away. This means that if a cancel
RPC was lost, we could be left with zombie fragments with no
coordinator that kept on running until completion.
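The failure mode can be sketched in Python with a toy model of an optional
field. ReportResult, serialize() and deserialize() are hypothetical
stand-ins, not Thrift-generated code; the only assumption carried over is
that an optional field is written to the wire only when its isset flag is
set.

```python
from dataclasses import dataclass, field

OK = "OK"


@dataclass
class ReportResult:
    """Toy struct with one optional field, `status`."""
    status: str = OK
    isset: set = field(default_factory=set)

    def set_status(self, value: str) -> None:
        # Setting the value and the isset flag together is the fix.
        self.status = value
        self.isset.add("status")


def serialize(result: ReportResult) -> dict:
    # Only fields flagged as set are written to the wire.
    return {name: getattr(result, name) for name in sorted(result.isset)}


def deserialize(wire: dict) -> ReportResult:
    result = ReportResult()
    if "status" in wire:
        result.set_status(wire["status"])
    return result
```

Assigning result.status directly leaves the flag unset, so the receiver
sees the default OK status; going through set_status() preserves the error.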
Testing:
I couldn't see a way to replicate this reliably with our existing test
setup, since it requires some RPCs to be dropped to get into this state.
I manually tested by commenting out CancelRemoteFragments(), starting a
long-running query then cancelling it. Before the patch, perf top showed
that the fragments continue to execute the query. After the patch, the
fragments stopped executing quickly.
Change-Id: I62ab6f4df7c0ee60c6aa6291513f9f0cbfac3fe7
Reviewed-on: http://gerrit.cloudera.org:8080/3238
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change is a pre-requisite for IMPALA-2550.
Change-Id: I0659c94f6b80bd7bbe0bd150ce243f9efa9a41ad
TODO: Write commit message
Reviewed-on: http://gerrit.cloudera.org:8080/3202
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Before this patch, we would first convert the Decimal to Double, then
Double to Timestamp. This resulted in imprecise results.
I ran a benchmark where we read decimal values from a large parquet
table and cast them to timestamp. The new correct implementation is
about 44% slower than the old one (101 seconds vs 70 seconds).
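The precision loss is easy to reproduce outside Impala. The Python sketch
below compares a conversion that goes through a double with one that stays
in exact decimal arithmetic; both function names are hypothetical and the
sample value is arbitrary.

```python
from decimal import Decimal

NANOS_PER_SEC = 10**9


def unix_nanos_via_double(seconds: Decimal) -> int:
    """Old-style path: Decimal -> double -> nanoseconds (lossy)."""
    return round(float(seconds) * NANOS_PER_SEC)


def unix_nanos_exact(seconds: Decimal) -> int:
    """Exact path: scale the decimal directly, truncating below 1ns."""
    return int(seconds * NANOS_PER_SEC)


sample = Decimal("1464000000.123456789")
exact = unix_nanos_exact(sample)
lossy = unix_nanos_via_double(sample)
```

A double has roughly 15 to 17 significant decimal digits, so a value with
ten integer digits cannot also carry nine fractional digits exactly, and
the two results differ in the low-order nanoseconds.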
Change-Id: Iabeea9f4ab4880b2f814408add63c77916e2dba9
Reviewed-on: http://gerrit.cloudera.org:8080/3154
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
These come with significant memory overhead, meaning that the memory
usage of the debug build diverges significantly from the release build.
We should disable them by default. They can be enabled by setting
ENABLE_IMPALA_IR_DEBUG_INFO=true.
Change-Id: Ia5426fe3f8be0b7a100c0c3683c8ef1eaf507146
Reviewed-on: http://gerrit.cloudera.org:8080/3223
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on
reboot on various distributions. This change moves the default location to
FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent
name collisions in case of local test clusters and strangely configured installations.
For local test clusters the minidumps will be written to
$IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}.
Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f
Reviewed-on: http://gerrit.cloudera.org:8080/3171
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The HdfsTableSink usually creates an HDFS connection to the filesystem
that the base table resides in. However, if we create a partition in
a FS different than that of the base table and set
S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to
a different filesystem with the wrong filesystem connector.
This patch allows the table sink itself to work with different
filesystems by getting rid of a single FS connector and getting a
connector per partition.
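The idea of one connector per partition can be sketched in Python as a
small cache keyed by the filesystem a path belongs to. ConnectorCache and
its connect callback are hypothetical names for illustration and do not
correspond to classes in the table sink.

```python
from urllib.parse import urlsplit


class ConnectorCache:
    """Hands out one connection per distinct (scheme, authority) pair."""

    def __init__(self, connect):
        self._connect = connect  # callable: (scheme, authority) -> connection
        self._connections = {}

    def for_path(self, path: str):
        parts = urlsplit(path)
        key = (parts.scheme, parts.netloc)
        if key not in self._connections:
            # First partition seen on this filesystem: open a connection.
            self._connections[key] = self._connect(*key)
        return self._connections[key]
```

Each partition's location then selects its own connector, so a partition
on S3 is no longer written through the base table's HDFS connection.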
This also reenables the multiple_filesystems test and modifies it to
use the unique_database fixture so that parallel runs on the same
bucket do not clash and end up in failures.
This patch also introduces a SECONDARY_FILESYSTEM environment variable
which will be set by the test to allow S3, Isilon and the localFS to
be used as the secondary filesystems.
All jobs with HDFS as the default filesystem need to set the
appropriate environment for S3 and Isilon, i.e. the following:
- export AWS_SECRET_ACCESS_KEY
- export AWS_ACCESS_KEY_ID
- export SECONDARY_FILESYSTEM (to whatever filesystem needs to be
tested)
TODO: SECONDARY_FILESYSTEM and FILESYSTEM_PREFIX and NAMENODE have a
lot of similarities. Need to clean them up in a following patch.
Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Reviewed-on: http://gerrit.cloudera.org:8080/3146
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
For kerberized clusters, users expect the Catalog service to use
the Kerberos principal instead of the operating system user that runs
the Catalog process. This patch fixes that.
Change-Id: I842e558e59023c7d937796a4cac51a013d948e02
Reviewed-on: http://gerrit.cloudera.org:8080/3165
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
This change moves the source and header files of squeasel
and mustache to be/src/thirdparty. This is a step towards
removing thirdparty as a preparation to move to ASF.
There is also a corresponding change to Impala-lzo to update
its include path.
Change-Id: I782e493bc28086a1587274b3c474ea6b6f201855
Reviewed-on: http://gerrit.cloudera.org:8080/3206
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
Previously including runtime-state.h or exec-env.h pulled in a huge
number of headers. By replacing all of those includes with forward
declarations, we can reduce the number of headers included when building
each source file.
This required various changes, including splitting header files, and in
one case extracting the nested DiskIoMgr::RequestContext class so that
the RequestContext can be instantiated without the full DiskIoMgr
header.
The payoff is that touching many header files results in significantly
smaller incremental builds. E.g. changes to bloom-filter.h only require
recompiling a handful of files, instead of 100+.
Build time of individual files should also be slightly quicker, since
they pull in fewer headers.
Change-Id: I3b246ad9c3681d649e7bfc969c7fa885c6242d84
Reviewed-on: http://gerrit.cloudera.org:8080/3108
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Adds a query option 'strict_mode' which treats integer and
floating-point overflows as parse errors. In the past,
overflows were ignored and the max value was returned. When
this query option is set, overflowing values are treated as if
they were completely invalid data, i.e. NULL is returned.
When abort_on_error is enabled, this means the query is
aborted.
Notes:
* DECIMAL overflow/underflow is already treated as an error.
* The handling in text-converter treats underflows the same
as overflows, so they would result in the same behavior.
However, floating point parsing never returns an underflow
today.
* We may also want to handle numeric values that are truncated
when parsing to integer types, e.g. 10.5 -> 10.
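The difference between the two modes can be sketched in Python for a
32-bit integer column. parse_int32() is a hypothetical helper, None stands
in for SQL NULL, and clamping negative overflow to the minimum value is an
assumption (the text above only mentions the max value).

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def parse_int32(text: str, strict_mode: bool):
    """Parse `text` as a 32-bit integer; None stands in for SQL NULL."""
    try:
        value = int(text)
    except ValueError:
        return None  # not an integer at all: a parse error in both modes
    if INT32_MIN <= value <= INT32_MAX:
        return value
    if strict_mode:
        return None  # overflow is treated like completely invalid data
    # Legacy behavior: ignore the overflow and clamp to the nearest bound.
    return INT32_MAX if value > 0 else INT32_MIN
```

With abort_on_error enabled, the None result in strict mode corresponds to
the query being aborted rather than a NULL being returned.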
Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836
Reviewed-on: http://gerrit.cloudera.org:8080/3150
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Bug: Impalads crash if we query an Avro table with stale metadata
Cause: This happens because avroSchema_ is not set in HdfsTable,
which is not propagated to the avro scanner and it doesn't have
appropriate checks to make sure the schema is non-null.
The patch fixes the following.
1. Avro scanner should gracefully handle the case where the avro schema
is not set. Appropriate null checks and a meaningful error message have
been added.
2. This is a special case with multi-fileformat partitioned tables.
avroSchema_ should be set in HdfsTable even if any subset of the
partitions are backed by avro. Without this patch, we only set it
if the base table file format is Avro.
Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Reviewed-on: http://gerrit.cloudera.org:8080/3136
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
There are multiple places in the code which call
RuntimeState::SetMemLimitExceeded(). Most of them are
unnecessary as the error status constructed will eventually
be propagated up the tree of exec nodes. There is no obvious
reason to treat query memory limit exceeded differently.
In some cases such as scan-node, calling SetMemLimitExceeded()
is actually confusing as all scanner threads may pick up error
status when any thread exceeds query memory limit, causing a
lot of noise in the log.
This change replaces most calls to RuntimeState::SetMemLimitExceeded()
with MemTracker::MemLimitExceeded(). The remaining places are:
the old hash table code, the UDF framework and QueryMaintenance()
which checks for memory limit periodically. The query maintenance
case will be removed eventually once IMPALA-2399 is fixed.
Change-Id: Ic0ca128c768d1e73713866e8c513a1b75e6b4b59
Reviewed-on: http://gerrit.cloudera.org:8080/3140
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This is an incremental improvement towards IMPALA-3090. Where possible
we use MemTracker::MemLimitExceeded() instead of directly constructing
the Status object.
The remaining cases where we directly construct the state are related
to the BufferedBlockMgr, which will be deprecated: either they are
produced by the BufferedBlockMgr, or produced when a Pin() unexpectedly
fails. Both of these will go away anyway.
Change-Id: I77c37f86dd15ace39e28b5cc72d37bc8d4109041
Reviewed-on: http://gerrit.cloudera.org:8080/3148
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Fixes a typo in ImpalaServer::AuthorizeProxyUser where we
check that the 'user' parameter isn't empty twice instead
of also checking the 'do_as_user' parameter.
Change-Id: I8e3962f6f397804e37d4f2c667e97b55bd3ca2bf
Reviewed-on: http://gerrit.cloudera.org:8080/3120
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Sorter can have runaway memory consumption as it never frees
local allocations made in comparator_.Less(). In addition, it
doesn't check for errors generated during expression evaluation
so it may keep sorting even after failures have occurred.
This change fixes the problem by freeing local allocations for
every n invocations of comparator_.Less() where n is the row
batch size specified in the query options. Various error checks
are also added to return early if any error is encountered.
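The periodic freeing can be sketched in Python as a wrapper around the
comparison function. PeriodicFreeComparator and its callbacks are
hypothetical names; the sketch only shows the counting scheme, not the
sorter's actual code.

```python
class PeriodicFreeComparator:
    """Calls `free_local_allocations` once every `batch_size` comparisons."""

    def __init__(self, less, free_local_allocations, batch_size: int):
        self._less = less
        self._free = free_local_allocations
        self._batch_size = batch_size
        self._calls = 0

    def __call__(self, a, b) -> bool:
        self._calls += 1
        if self._calls % self._batch_size == 0:
            # Bound the memory held by allocations made during comparisons.
            self._free()
        # An exception raised by `less` propagates to the caller, which
        # stands in for returning early on an expression evaluation error.
        return self._less(a, b)
```

Freeing once per n comparisons instead of after every comparison keeps
the bookkeeping cost small while still capping memory growth.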
Change-Id: I941729b4836e5dbb827d4313a0b45bc5df2fa8e1
Reviewed-on: http://gerrit.cloudera.org:8080/3116
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
With RHEL5 on AWS EC2 for example, sysconf() returns bad info about
cache line sizes. We should tolerate this instead of bringing down
impalad.
Change-Id: Id4d61b05fe213028a7e9aaabe98adc2792b90e07
Reviewed-on: http://gerrit.cloudera.org:8080/3111
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Enabling this revealed a latent bug where a #include was wrapped in the
impala namespace, resulting in the functions being defined in the wrong
namespace.
Change-Id: If723167b2d03da7592b64a204e31e81ea868e4f2
Reviewed-on: http://gerrit.cloudera.org:8080/3024
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
With this commit runtime filters can be assigned to multiple destination
nodes (scans). For each filter, the destination nodes are determined
using equivalence classes during planning. For each filter, all its
destination nodes are in the left subtree rooted at the join node
that constructs this filter. A runtime filter may have both
local and remote targets. The backend determines how to route each
filter depending on the number and type (local, remote) of its destination
nodes.
With this commit, we enable runtime filter propagation in all the
operands of UNION [ALL|DISTINCT] nodes.
Change-Id: Iad2ce4e579a30616c469312a4e658140d317507b
Reviewed-on: http://gerrit.cloudera.org:8080/2932
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
This addresses the regression for small-ndv aggs resulting from
prefetching. The idea is that for small-ndv aggs prefetching
increases the # of instructions and memory references, but
doesn't provide any compensating benefit. This change replaces
constant values in the hash-table code, which reduces the
instruction count and # of memory references in aggregations.
Preliminary perf results show that a low-NDV decimal agg is
around 20% faster (2.1s -> 1.7s) and a high-NDV decimal agg
is around 7% faster (15s -> 14s). I haven't investigated
how much of the speedup is reduced codegen time.
Change-Id: I483a19662c90ca54bc21d60fd6ba97dbed93eaef
Reviewed-on: http://gerrit.cloudera.org:8080/3088
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.
We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.
There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().
Co-author: Michael Ho <kwho@cloudera.com>
Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Reviewed-on: http://gerrit.cloudera.org:8080/3070
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The thrift client incorrectly wraps the TSaslTransport
around the TBufferedTransport which leads to significant
performance issues. (Note that the server-side wraps the
transports in the correct order already.)
Currently: TSaslTransport(TBufferedTransport(socket))
Should be: TBufferedTransport(TSaslTransport(socket))
As a result, when we write a structure, we end up doing lots
of write calls which hit the TSaslTransport which does no
buffering. So it ends up producing output that looks like:
[0, 0, 0, 1], <one char>, [0, 0, 0, 1], <one char>, etc.
for each individual write call.
These end up buffered so we don't get lots of tiny packets
on the send side. However, on the receiver side we are doing
one recv call per Sasl frame.
This patch reorders the wrapping of transports in the thrift
client, so that it matches the order on the thrift server
which improves exchange performance, bringing it to within 10% of
non-kerberos.
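The effect of the wrapping order can be reproduced with a toy model in
Python. Sink, FramingTransport and BufferedTransport are hypothetical
stand-ins for the socket, TSaslTransport and TBufferedTransport; the
framing here is just a 4-byte big-endian length prefix, with no actual
SASL processing.

```python
class Sink:
    """Stand-in for a socket: records the bytes that reach the wire."""

    def __init__(self):
        self.wire = bytearray()

    def write(self, data: bytes) -> None:
        self.wire += data

    def flush(self) -> None:
        pass


class FramingTransport:
    """Every write becomes one length-prefixed frame; no buffering."""

    def __init__(self, inner):
        self.inner = inner

    def write(self, data: bytes) -> None:
        self.inner.write(len(data).to_bytes(4, "big") + data)

    def flush(self) -> None:
        self.inner.flush()


class BufferedTransport:
    """Collects writes and passes them on as one write when flushed."""

    def __init__(self, inner):
        self.inner = inner
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer += data

    def flush(self) -> None:
        if self.buffer:
            self.inner.write(bytes(self.buffer))
            self.buffer.clear()
        self.inner.flush()


def send_fields(transport, fields) -> None:
    """Write each field separately, as a struct serializer would."""
    for f in fields:
        transport.write(f)
    transport.flush()
```

Sending three one-byte fields through FramingTransport(BufferedTransport(sink))
puts three frames on the wire, while BufferedTransport(FramingTransport(sink))
produces a single three-byte frame.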
Change-Id: I81d30b3d8d10fe6dcd8eb88cca49734af09f9d91
Reviewed-on: http://gerrit.cloudera.org:8080/3093
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
In 2.5 we added the ability to set per-pool default query
options. A string of key-value pairs can be specified with a
pool configuration. However, if any options fail to parse,
then all the options are ignored. We want that behavior (and
returning an error) when parsing the process-wide default
query options on startup and when parsing the options sent
from a client (e.g. in beeswax server) because an error can
be returned immediately for the triggering action at that
time (i.e. starting the impalad or submitting a query with
the options set). This behavior is bad for the pool default
query options because (a) the configuration is set by the
administrator and there's nothing we can do until a query is
submitted and (b) one invalid option shouldn't mean that
other valid options aren't set.
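One way to get the behavior argued for in (b) is to parse each pair
independently and collect the failures. The Python sketch below assumes a
comma-separated key=value string and a caller-supplied table of known
options; parse_pool_options() is a hypothetical helper, not the parsing
code in impalad.

```python
def parse_pool_options(text: str, known_options: dict):
    """Return (options, errors); invalid pairs never discard valid ones.

    `known_options` maps an option name to a callable that parses its value
    and raises ValueError on bad input.
    """
    options, errors = {}, []
    for pair in (p.strip() for p in text.split(",")):
        if not pair:
            continue
        key, sep, raw = pair.partition("=")
        key = key.strip().lower()
        parser = known_options.get(key)
        if not sep or parser is None:
            errors.append(f"invalid option: {pair!r}")
            continue
        try:
            options[key] = parser(raw.strip())
        except ValueError:
            errors.append(f"invalid value for {key}: {raw.strip()!r}")
    return options, errors
```

The collected errors can be logged for the administrator, while callers
that need the strict behavior can still fail whenever the error list is
non-empty.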
Change-Id: If04733b775963091b0314c65286df126fd812358
Reviewed-on: http://gerrit.cloudera.org:8080/3056
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This change pipelines the code which probes the hash tables.
This is based on the idea which Mostafa presented earlier.
Essentially, all rows in a row batch will be evaluated and
hashed first before being probed against the hash tables.
Hash table buckets are prefetched as hash values of rows are
computed.
To avoid re-evaluating the rows again during probing (as the rows
have been evaluated once to compute the hash values), hash table
context has been updated to cache the evaluated expression values,
null bits and hash values of some number of rows. Hash table context
provides a new iterator-like interface to iterate through the cached
values.
A PREFETCH_MODE query option has also been added to disable prefetching
if necessary. The default mode is 1 which means hash table buckets will
be prefetched. In the future, this mode may be extended to support hash
table buckets' data prefetching too.
Combined with the build side prefetching, a self join of table lineitem
improves by 40% on a single node run on average:
select count(*)
from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
o1.l_linenumber = o2.l_linenumber;
Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Reviewed-on: http://gerrit.cloudera.org:8080/2959
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Several places in the tests didn't check return statuses.
KUDU_ASSERT_OK can only be used in functions that return void,
KUDU_CHECK_OK is used otherwise.
2) The forward declared "class ColumnType" should have actually been a
struct.
Now there aren't any more Kudu related warnings from clang.
Change-Id: Id3e2f5ec9925c3cf81c7f4048decc6a5f97eee66
Reviewed-on: http://gerrit.cloudera.org:8080/3062
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The information in the JIRA is consistent with a failure to allocate
memory for the def level cache. There was a bug where this failure status
was not properly propagated, so eventually a DCHECK was hit that expected
the cache memory to be allocated.
Change-Id: I38856e6e1f5fbdbf5327cf31a2a109e6c930901d
Reviewed-on: http://gerrit.cloudera.org:8080/3065
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The args[] buffer was too small. I also reverted to the usual style of
commenting out each unused argument Value*, which makes it slightly easier
to spot this kind of bug.
Change-Id: Ic2546b8f42ac0a4e0715b134c384ccf311f663c2
Reviewed-on: http://gerrit.cloudera.org:8080/3051
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
In order to perform round-robin backend selection, the simple
scheduler uses an iterator to the next backend entry to be selected.
This iterator needs to be reset whenever it is invalidated by changes
to the underlying map. The current behavior resets the pointer on
every message of the statestored, even if the message was empty and
thus did not result in any changes to the map.
After every reset of the iterator round-robin selection starts from
the first backend in the scheduler's backend map. As the statestored
sends empty keepalive messages every couple of seconds, this
effectively limits scheduling of remote reads to only a few backends.
This change introduces a check to prevent those unnecessary iterator
resets, which will spread remote reads more evenly over all backends.
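The check can be illustrated with a small Python model of round-robin
selection. RoundRobinBackends is a hypothetical class; an index stands in
for the map iterator, and an update with no additions or removals stands
in for an empty statestore message.

```python
class RoundRobinBackends:
    """Round-robin selection that survives empty membership updates."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0

    def update(self, added=(), removed=()) -> None:
        if not added and not removed:
            return  # empty keepalive: keep the current position
        gone = set(removed)
        self._backends = [b for b in self._backends if b not in gone]
        self._backends.extend(added)
        self._next = 0  # membership changed: the old position is invalid

    def next_backend(self):
        backend = self._backends[self._next]
        self._next = (self._next + 1) % len(self._backends)
        return backend
```

Without the early return in update(), a keepalive between every selection
would make next_backend() return the first backend every time.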
Change-Id: I831d485b46c7d9460fb014a302a26864b6bd573e
Reviewed-on: http://gerrit.cloudera.org:8080/2330
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/3031
This patch improves our ByteSwap() function by handling
more byte sizes in the fast path, as opposed to the
loop-based slow path.
ByteSwap() is used heavily when scanning Parquet decimals.
Before this patch, VTune showed ByteSwap() among the top
three worst cycle offenders when running TPCH-Q6 on my local
setup with a large lineitem table.
After this patch, ByteSwap() shows no significant contribution
to the overall cycles spent.
There was a measurable improvement of a few percent for TPCH-Q6.
Change-Id: I4f462e6bdb022db46b48889a6a7426120a80d9b4
Reviewed-on: http://gerrit.cloudera.org:8080/3033
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The problem: varlen data (e.g. strings) produced by aggregations is
freed by FreeLocalAllocations() after passing up the output
batch. This works for streaming operators or blocking operators that
copy their input, but results in memory corruption when the output
reaches non-copying blocking operators, e.g. SubplanNode and
NestedLoopJoinNode.
The fix: this patch makes the PartitionedAggregationNode copy out
produced string data if the node is in a subplan. Otherwise it calls
MarkNeedsToReturn() on the output batch. Marking the batch would work
in the subplan case as well, but would likely be less efficient since
it would result in many small batches coming out of the subplan.
The patch includes a test case. However, this test only exposes the
problem with an ASAN build and the --disable_mem_pools flag, which we
don't currently have automated testing for.
Change-Id: Iada891504c261ba54f4eb8c9d7e4e5223668d7b9
Reviewed-on: http://gerrit.cloudera.org:8080/2929
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.
Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The lifetime of a scanner thread is decoupled from that of row batches that
it produces. That means that all resources associated with row batches
produced by the scanner thread should be transferred to those batches.
The bug was that we were not transferring the ownership of memory from the
scratch batch to the final row batch returned in HdfsParquetScanner::Close().
Triggering an event that would cause the freed memory to be dereferenced is
possible, but very difficult. My understanding is that it is only possible
in exceptional non-deterministic scenarios, e.g., a query is cancelled just
at the right time, or the scanner hits a parse/decoding error.
Testing: I tested this change locally by running the scanner and nested
types test as well as TPCH, nested TPCH, and TPC-DS.
Change-Id: Ic34d32c9a41ea66b2b2d8f5e187cc84d4cb569b2
Reviewed-on: http://gerrit.cloudera.org:8080/3041
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
We observed that some spilling joins started returning incorrect
results. The behaviour seems to happen when a codegen'd insert and a
non-codegen'd probe function is used (or vice-versa). This only seems to
happen in a subset of cases.
The bug appears to be a result of the implicit cast of the uint32_t seed
value to the int32_t hash argument to HashTable::Hash(). The result is
implementation-defined if the uint32_t does not fit in the int32_t. In
Murmur hash, this value is subsequently cast to a uint64_t, so we have a
chain of uint32_t->int32_t->uint64_t conversions. It would require a very
careful reading of the C++ standard to understand what the expected
result is, and whether we're seeing a compiler bug or just
implementation-defined behaviour, but we can avoid it entirely by keeping
the values unsigned.
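To make the conversion chain concrete, here is a small standalone sketch.
It is not the HashTable::Hash() code; WidenViaSigned and WidenUnsigned are
hypothetical names. It assumes a mainstream two's-complement compiler, where
an out-of-range uint32_t wraps to a negative int32_t (this is the defined
result from C++20 onwards).

```cpp
#include <cstdint>

// Mirrors the problematic chain uint32_t -> int32_t -> uint64_t.
inline uint64_t WidenViaSigned(uint32_t seed) {
  // Before C++20 this result is implementation-defined when seed > INT32_MAX;
  // mainstream compilers wrap to a negative value.
  int32_t as_signed = static_cast<int32_t>(seed);
  // Converting a negative int32_t to uint64_t sign-extends the value.
  return static_cast<uint64_t>(as_signed);
}

// Keeps the value unsigned throughout, so the upper 32 bits stay zero.
inline uint64_t WidenUnsigned(uint32_t seed) {
  return static_cast<uint64_t>(seed);
}
```

For seeds that fit in an int32_t both functions agree. For a seed with the
top bit set, the signed path sets the upper 32 bits of the 64-bit value, so
two code paths that differ in this conversion can compute different hashes.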
Testing:
I was able to reproduce the issue under a very specific set of circumstances,
listed below. Before this change it consistently returned 0 rows. After the
change it consistently returned the correct results. I haven't had much
luck creating a suitable regression test.
* 1 impalad
* --disable_mem_pools=true
* use tpch_20_parquet;
* set mem_limit=1275mb;
* TPC-H query 7:
  select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
  from (
    select
      n1.n_name as supp_nation,
      n2.n_name as cust_nation,
      year(l_shipdate) as l_year,
      l_extendedprice * (1 - l_discount) as volume
    from
      supplier,
      lineitem,
      orders,
      customer,
      nation n1,
      nation n2
    where
      s_suppkey = l_suppkey
      and o_orderkey = l_orderkey
      and c_custkey = o_custkey
      and s_nationkey = n1.n_nationkey
      and c_nationkey = n2.n_nationkey
      and (
        (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
        or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
      )
      and l_shipdate between '1995-01-01' and '1996-12-31'
  ) as shipping
  group by
    supp_nation,
    cust_nation,
    l_year
  order by
    supp_nation,
    cust_nation,
    l_year
Change-Id: I952638dc94119a4bc93126ea94cc6a3edf438956
Reviewed-on: http://gerrit.cloudera.org:8080/3034
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch adds two query options for runtime filters:
RUNTIME_FILTER_MAX_SIZE
RUNTIME_FILTER_MIN_SIZE
These options define the minimum and maximum filter sizes for a filter,
no matter what the estimates produced by the planner are. Filter sizes
are rounded up to the nearest power of two.
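One way to picture how the two options interact with the planner estimate
is the sketch below. It is only an illustration: ChooseFilterSize and
RoundUpToPowerOfTwo are hypothetical helpers, and the order of operations
(round up first, then clamp to the rounded bounds) is an assumption rather
than something this patch states.

```cpp
#include <algorithm>
#include <cstdint>

// Rounds v up to the nearest power of two; returns v itself if it already
// is one, and 1 for v <= 1. Assumes v <= 2^62 so the shift cannot overflow.
inline int64_t RoundUpToPowerOfTwo(int64_t v) {
  int64_t p = 1;
  while (p < v) p <<= 1;
  return p;
}

// Hypothetical helper: picks a filter size from the planner estimate while
// honouring the minimum and maximum sizes given by the query options.
inline int64_t ChooseFilterSize(int64_t planner_estimate, int64_t min_size,
                                int64_t max_size) {
  int64_t lo = RoundUpToPowerOfTwo(min_size);
  // Guard against a maximum that is smaller than the minimum.
  int64_t hi = std::max(lo, RoundUpToPowerOfTwo(max_size));
  int64_t size = RoundUpToPowerOfTwo(planner_estimate);
  return std::min(std::max(size, lo), hi);
}
```

With a minimum of 4096 and a maximum of 65536, an estimate of 10000 would
yield 16384, an estimate of 3000 would yield 4096, and an estimate of
100000 would be capped at 65536.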
Change-Id: I5c13c200a0f1855f38a5da50ca34a737e741868b
Reviewed-on: http://gerrit.cloudera.org:8080/2966
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This change builds on top of the recent move to column-wise
materialization of scalar values in the Parquet scanner.
The goal of this patch is to improve the scan efficiency, and
show the future direction for all column readers.
Major TODO:
The current patch has minor code duplication/redundancy,
and the new ReadValueBatch() departs from (but improves) the
existing column reader control flow. To improve code reuse
and readability we should overhaul all column readers to be
more uniform.
Summary of changes:
- refactor ReadValueBatch() to simplify control flow
- introduce caching of def/rep levels for faster level
decoding, and for a tighter value materialization loop
- new templated function for value materialization that
takes the value encoding as a template argument
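The last two items can be illustrated with a simplified sketch. None of the
names below come from the Parquet scanner: ColumnChunk, Encoding and
MaterializeBatch are hypothetical, values are plain int32_t, and the input
is assumed to be well formed. The point is only that the encoding is a
template argument, so each instantiation gets a loop without a per-value
runtime switch on the encoding, and that the loop iterates over
already-decoded (cached) definition levels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Encoding { PLAIN, DICTIONARY };

// Hypothetical, simplified column data for one batch.
struct ColumnChunk {
  std::vector<int32_t> plain_values;   // Used when the encoding is PLAIN.
  std::vector<int32_t> dictionary;     // Used when the encoding is DICTIONARY.
  std::vector<uint32_t> dict_indexes;  // Indexes into 'dictionary'.
  std::vector<uint8_t> def_levels;     // Cached levels: 1 = value, 0 = NULL.
};

// Appends one output slot per cached definition level to 'out' and returns
// the number of non-NULL values. ENCODING is a compile-time constant, so the
// branch on it can be resolved by the compiler for each instantiation.
template <Encoding ENCODING>
int MaterializeBatch(const ColumnChunk& chunk, int32_t null_value,
                     std::vector<int32_t>* out) {
  size_t next_value = 0;
  int num_non_null = 0;
  for (uint8_t def_level : chunk.def_levels) {
    if (def_level == 0) {
      out->push_back(null_value);
      continue;
    }
    if (ENCODING == Encoding::PLAIN) {
      out->push_back(chunk.plain_values[next_value]);
    } else {
      out->push_back(chunk.dictionary[chunk.dict_indexes[next_value]]);
    }
    ++next_value;
    ++num_non_null;
  }
  return num_non_null;
}
```

A caller would pick the instantiation once per data page, for example
MaterializeBatch<Encoding::DICTIONARY>(chunk, -1, &out), rather than
checking the encoding for every value.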
Mini benchmark vs. cdh5-trunk
I ran the following queries on a single impalad before and after my
change using a synthetic 'huge_lineitem' table.
I modified hdfs-scan-node.cc to set the number of rows of any row
batch to 0 to focus the measurement on the scan time.
Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;
select * from huge_lineitem;
Before: 22.39s
After: 13.62s
select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After: 17.73s
select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After: 16.68s
select l_linenumber from huge_lineitem;
Before: 1.74s
After: 0.92s
Testing:
I ran a private exhaustive build and all tests passed.
Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Reviewed-on: http://gerrit.cloudera.org:8080/2843
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
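The effect of this policy on the amount of stack memory captured can be
estimated with a short sketch. This is not breakpad code:
EstimateStackBytes is a hypothetical helper, and the two constants simply
restate the numbers given above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr size_t kFullStackThreads = 20;         // Threads dumped in full.
constexpr size_t kReducedStackBytes = 2 * 1024;  // 2KB per additional thread.

// Hypothetical helper: estimates the total stack memory written to a
// minidump, given the stack size of each thread. 'limit_exceeded' says
// whether the minidump is expected to exceed the configured size hint.
size_t EstimateStackBytes(const std::vector<size_t>& thread_stack_bytes,
                          bool limit_exceeded) {
  size_t total = 0;
  for (size_t i = 0; i < thread_stack_bytes.size(); ++i) {
    size_t bytes = thread_stack_bytes[i];
    if (limit_exceeded && i >= kFullStackThreads) {
      bytes = std::min(bytes, kReducedStackBytes);
    }
    total += bytes;
  }
  return total;
}
```

For 100 threads with 64KB of stack each, the estimate drops from 6553600
bytes to 1474560 bytes (20 full stacks plus 80 stacks of 2KB) once the
limit applies.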
Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins