Commit Graph

2562 Commits

Author SHA1 Message Date
Hayabusa-intel
4e7172f6f5 IMPALA-2459: Implement next_day date/time UDF
Returns the date of the weekday that follows a particular date.
The weekday argument is a string literal indicating the day of the week.
It is case-insensitive. Available values are:
"Sunday"/"SUN", "Monday"/"MON", "Tuesday"/"TUE",
"Wednesday"/"WED", "Thursday"/"THU", "Friday"/"FRI", "Saturday"/"SAT".
For example, the first Saturday after Wednesday, 25 December 2013
is on 28 December 2013.
select next_day('2013-12-25','Saturday') returns '2013-12-28 00:00:00'
select next_day(to_timestamp('08-1987-21', 'MM-yyyy-dd'), 'FRIDAY')
returns '1987-08-28 00:00:00'
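A minimal Python sketch of the semantics described above. It assumes that a date which already falls on the requested weekday maps to that weekday one week later, as the 1987-08-21 example (a Friday) shows. It works on dates only. This is not Impala's C++ implementation, and the lookup table is only an illustration.

```python
from datetime import date, timedelta

# Accepted names and abbreviations, mapped to date.weekday() numbers
# (Monday == 0 ... Sunday == 6). Lookups are case-insensitive.
_WEEKDAYS = {
    "monday": 0, "mon": 0,
    "tuesday": 1, "tue": 1,
    "wednesday": 2, "wed": 2,
    "thursday": 3, "thu": 3,
    "friday": 4, "fri": 4,
    "saturday": 5, "sat": 5,
    "sunday": 6, "sun": 6,
}

def next_day(d: date, weekday: str) -> date:
    """Return the first date strictly after d that falls on the given weekday."""
    target = _WEEKDAYS.get(weekday.lower())
    if target is None:
        raise ValueError(f"invalid weekday: {weekday!r}")
    # Days to advance, always in 1..7: if d is already on the target
    # weekday, the result is exactly one week later.
    delta = (target - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=delta)
```

For example, `next_day(date(2013, 12, 25), 'Saturday')` yields `date(2013, 12, 28)`, matching the first SQL example.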

Change-Id: I2721d236c096639a9e7d2df8a45ca888c6b3e83e
Reviewed-on: http://gerrit.cloudera.org:8080/1943
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
2016-06-09 04:30:48 -07:00
Alex Behm
025fd3bd7f IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.
Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values.

The problematic scenario is when the repeat or literal count of
the RLE-encoded dictionary indexes is decoded as 0 - an invalid value.

There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.
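For context, Parquet's RLE/bit-packing hybrid encoding prefixes each run with a ULEB128 varint header. If the low bit is 0, the remaining bits are the repeat count of an RLE run. If it is 1, they are the number of 8-value groups in a bit-packed (literal) run. The following minimal Python sketch parses such a header and rejects a zero count. It is written from the Parquet encoding description, not from Impala's decoder, and the function names are hypothetical.

```python
def read_uleb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7

def read_run_header(buf: bytes, pos: int):
    """Return (is_bit_packed, num_values, new_pos) for the run at buf[pos]."""
    header, pos = read_uleb128(buf, pos)
    is_bit_packed = bool(header & 1)
    count = header >> 1
    if is_bit_packed:
        count *= 8  # bit-packed runs are stored in groups of 8 values
    if count == 0:
        # A zero repeat or literal count is invalid: report it as corrupt
        # data instead of passing it on to the value decoder.
        raise ValueError("corrupt RLE data: run count of 0")
    return is_bit_packed, count, pos
```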

Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Reviewed-on: http://gerrit.cloudera.org:8080/3299
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-06-07 17:29:59 -07:00
Michael Ho
86ff18eee9 IMPALA-3223: Removal of non-toolchain builds.
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to his/her own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. if a
particular toolchain component is at a version that has not
been checked into S3).

$IMPALA_TOOLCHAIN holds third-party binaries that Impala
relies on. They can be compiled from source using the
native toolchain, which is public. This commit also removes
build_thirdparty.sh, as it is no longer used.

By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN, but this can be overridden by setting
the environment variable $USE_SYSTEM_GCC to 1.

Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-06-07 17:29:59 -07:00
Tim Armstrong
d23e5505c8 IMPALA-3670: fix sorter buffer mgmt bugs
Also make test_scratch_disk.py more deterministic, by using
max_block_mgr_memory, which doesn't include scanner memory.
The fixed test_scratch_disk.py exercises the other sorter bug, which
occurs when scratch cannot be written.

Testing:
Added a test that does a sort with various memory limits and consumes
the whole output of the sorter (we have many tests of sorts with limits
but limited coverage of sorts without limits).  Ran an exhaustive test
run before posting for review.

This added test reproduced one of the sorter bugs, where var-len blocks
were not always attached to the output batch. The other bug was
reproduced by the test change in IMPALA-3669 (test_scratch_disk fix).

Change-Id: Ia1a0ddffa0a5b157ab86a376b7b7360a923698d6
Reviewed-on: http://gerrit.cloudera.org:8080/3315
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-06 22:34:19 -07:00
Tim Armstrong
ee53ddb389 IMPALA-1346/1590/2344: fix sorter buffer mgmt when spilling
The Sorter's memory management logic failed to correctly manage buffers
when spilling. It would try to make use of all buffers in the system,
neglecting to account for other operators' buffer usage.

This patch adjusts the logic so that it handles contention for buffers
so long as it can get enough buffers to make progress. Instead of
precalculating the number of buffers it thinks it should be able to
pin, it makes a best-effort attempt to pin the initial buffers of
as many runs as possible, up to a limit. As long as it can pin three
runs, it can make progress.
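The best-effort strategy can be sketched in Python. The `try_pin` callback and the use of MemoryError are hypothetical stand-ins for the buffer manager. The minimum of three runs comes from the text above, and relaxing it when fewer runs exist is an assumption of this sketch.

```python
from typing import Callable, List, Sequence

def pin_runs_best_effort(runs: Sequence[object],
                         try_pin: Callable[[object], bool],
                         max_runs: int,
                         min_runs: int = 3) -> List[object]:
    """Pin the initial buffer of as many runs as possible, up to max_runs.

    try_pin(run) is a hypothetical callback that returns False when no
    buffer is available, e.g. because other operators are holding them.
    """
    pinned: List[object] = []
    for run in runs[:max_runs]:
        if not try_pin(run):
            break  # contention: stop here instead of assuming more buffers
        pinned.append(run)
    # Progress is possible once enough runs are pinned (or all of them,
    # if there are fewer runs than the minimum).
    if len(pinned) < min(min_runs, len(runs)):
        raise MemoryError(f"pinned only {len(pinned)} runs; need {min_runs}")
    return pinned
```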

Testing:
Added an additional test that failed before the patch without OOM.
An analytic function test that was meant to fail also started succeeding,
so I had to adjust the limit there too.

Change-Id: Idfe55cc13c7f2b54cba1d05ade44cbcf6bb573c0
Reviewed-on: http://gerrit.cloudera.org:8080/2908
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-06 17:34:07 -07:00
Tim Armstrong
37ec25396f IMPALA-3344: Simplify sorter and document/enforce invariants.
Clarify relationships between classes and clean up the previous mess,
where every class was a friend of the others, so that there is an actual
distinction between public and private members. TupleIterator is no
longer tied to TupleSorter, only to Run.

Document and enforce invariants in many cases.

Factor out some functions from large functions.

Simplify and document iterator logic.

Make management of buffers when iterating over the output stream more
explicitly correct: either use MarkNeedToReturn() or attach the block
to the batch, as appropriate. The SortedRunMerger didn't handle
resource transfer correctly unless all the memory came from
the batch's MemPool. This patch fixes the cases where resources
are attached to the batches, but not the 'need_to_return' case.
Document that SortedRunMerger requires 'deep_copy_input' to be true
if batches can have the 'need_to_return' flag set.

Also use the atomic block exchange operation when moving between
blocks in unpinned runs to prevent pin failures at that point.
I have explicitly avoided changing the hairy block management logic
used when allocating buffers for merging; that will need to be
addressed in a follow-up patch.

Add a SpilledRuns counter so that it's more explicit that spilling
occurred.

Testing:
Added some tests for corner cases with empty and NULL strings.
Fixed a test that previously failed with OOM but now succeeds.

Performance:
Benchmarking against the old code initially revealed some regressions
caused by changes in inlining. Force-inlining TupleComparator::operator()
and the iterator Next()/Prev() functions helped, and performance now seems
similar or slightly better on the targeted orderby benchmarks.

Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Reviewed-on: http://gerrit.cloudera.org:8080/2826
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-06-02 21:33:08 -07:00
Michael Ho
b14ca6d09f IMPALA-3645: Free probe expressions' local allocations in ConstructBuildSide()
With the prefetching changes, the probe expressions' local
allocations are no longer freed via QueryMaintenance() in
PHJ. Instead, they are freed explicitly in GetNext() after
an entire probe batch has been processed. Due to this
change in how we handle local allocations of probe expressions,
a DCHECK was added to verify that there is no local allocation
from the probe expression in ProcessBuildInput(). It turns out that
Expr::Open(), called on the probe expressions in ConstructBuildSide(),
can make local allocations for certain UDFs (e.g. extract()).

This change handles the situation above by freeing local
allocations of the probe expressions once before calling
ProcessBuildInput() in ConstructBuildSide(). A new regression
test is also added for this specific case.

Change-Id: I2096ca3e2093c5ab0ecc0e7ca4cd1b5f3c1ed1ed
Reviewed-on: http://gerrit.cloudera.org:8080/3253
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-06-02 09:32:54 -07:00
Thomas Tauber-Marshall
710fa06b7c IMPALA-3639: expr-test fails on ASAN
In ExprTest::GetValue, we create a local string and then end up returning
a reference to that string, resulting in a memory error. The mistake
wasn't obvious from looking at the code due to the convoluted way
that GetValue and ConvertValue work. This patch modifies GetValue
and ConvertValue to be simpler and eliminates the memory error.

Change-Id: I040179ee44782a22c88b810ff97612aaa89839f4
Reviewed-on: http://gerrit.cloudera.org:8080/3278
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-06-02 09:32:54 -07:00
Michael Ho
5f3996e6d1 IMPALA-3181: Add noexcept to some functions
This commit adds the noexcept specifier to some cross-compiled
functions that are known not to throw exceptions. This helps
avoid some exception-related instructions (e.g. invoke,
landingpad) in the IR.

Change-Id: I96bd2fec6c14771acae1e700bed958951368ee77
Reviewed-on: http://gerrit.cloudera.org:8080/3256
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-06-02 09:32:54 -07:00
Thomas Tauber-Marshall
5231301084 IMPALA-1633: GetOperationStatus should set errorMessage and sqlState
Currently, we never populate the errorMessage or sqlState
fields of TGetOperationStatusResp when the GetOperationStatus
HiveServer2 RPC is called. This patch checks whether the query has
an error status and, if so, sets errorMessage and sqlState.

GetOperationStatus also now takes the QueryExecState lock since
QueryExecState::query_state_ and QueryExecState::query_status_
are supposed to be protected by it.

Additionally, this patch performs some cleanup and adds some
documentation around our behavior for updating
QueryExecState::query_state_/query_status_.

This also addresses IMPALA-3298: TGetOperationStatusResp missing
error message when data is expired

Change-Id: Icb792f88286779fcf2ce409828de818bc4e80bed
Reviewed-on: http://gerrit.cloudera.org:8080/3094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-06-01 19:32:39 -07:00
Tim Armstrong
585ee48dc7 IMPALA-3647: track runtime filter memory in separate tracker
This change breaks out runtime filter memory consumption from the
query-wide tracker to improve debuggability of memory limit exceeded
errors.

Testing: ran exhaustive tests, ran local and cluster stress tests.

Change-Id: I9f28f3b55b5c62e6f0f9838c5947c9446d444d20
Reviewed-on: http://gerrit.cloudera.org:8080/3247
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:12 -07:00
Tim Armstrong
4edb8bb60d IMPALA-3633: cancel fragment if coordinator is gone
The bug is that return_val.status is an optional field, so setting
the status without also setting __isset leaves the field unset, which
the receiver treats as Status::OK(). As a result, the fragment was not
notified when reporting status if the coordinator had gone away. This
means that if a cancel RPC was lost, we could be left with zombie
fragments with no coordinator that kept running until completion.
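The pitfall can be modeled in Python. The struct, its flag, and the dictionary wire format below are hypothetical simplifications of Thrift-generated code. They assume, as the text describes, that an optional field is only sent when its isset flag is true, so the receiver falls back to the default.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ReportResult:
    """Hypothetical stand-in for a Thrift struct with one optional field."""
    status: str = "OK"          # default the receiver falls back to
    isset_status: bool = False  # stands in for __isset.status

def serialize(result: ReportResult) -> Dict[str, str]:
    # Optional fields are written only when their isset flag is true.
    return {"status": result.status} if result.isset_status else {}

def deserialize(wire: Dict[str, str]) -> ReportResult:
    # An absent optional field keeps its default on the receiving side.
    if "status" in wire:
        return ReportResult(status=wire["status"], isset_status=True)
    return ReportResult()
```

Setting `status="CANCELLED"` without `isset_status=True` round-trips as `"OK"`, which is the silent failure described above.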

Testing:
I couldn't see a way to replicate this reliably with our existing test
setup, since it requires some RPCs to be dropped to get into this state.
I manually tested by commenting out CancelRemoteFragments(), starting a
long-running query, then cancelling it. Before the patch, perf top showed
that the fragments continued to execute the query. After the patch, the
fragments stopped executing quickly.

Change-Id: I62ab6f4df7c0ee60c6aa6291513f9f0cbfac3fe7
Reviewed-on: http://gerrit.cloudera.org:8080/3238
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:12 -07:00
Lars Volker
5be7c68ed8 IMPALA-3627: Clean up RPC structures in ImpalaInternalService
This change is a prerequisite for IMPALA-2550.

Change-Id: I0659c94f6b80bd7bbe0bd150ce243f9efa9a41ad
TODO: Write commit message
Reviewed-on: http://gerrit.cloudera.org:8080/3202
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:12 -07:00
Taras Bobrovytsky
98d7b8a90d IMPALA-3163: Fix Decimal to Timestamp casting
Before this patch, we would first convert the Decimal to Double, then
Double to Timestamp. The intermediate Double loses precision, so the
results were imprecise.

I ran a benchmark where we read decimal values from a large Parquet
table and cast them to timestamp. The new, correct implementation is
noticeably slower than the old one (101 seconds vs. 70 seconds).
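To see why the Double detour loses precision, here is a Python sketch. It converts a DECIMAL, given as an unscaled integer and a scale, into (seconds, nanoseconds) using integer arithmetic, alongside a version that goes through a binary double first. This sketch makes three assumptions that do not describe Impala's code: the value is interpreted as seconds since the epoch, negative values are floored, and digits beyond nanosecond precision are truncated.

```python
def decimal_to_timestamp_exact(unscaled: int, scale: int) -> tuple:
    """Split a DECIMAL into (seconds, nanoseconds) with integer math only."""
    seconds, frac = divmod(unscaled, 10 ** scale)  # floor; 0 <= frac < 10**scale
    if scale <= 9:
        nanos = frac * 10 ** (9 - scale)
    else:
        nanos = frac // 10 ** (scale - 9)  # truncate sub-nanosecond digits
    return seconds, nanos

def decimal_to_timestamp_via_double(unscaled: int, scale: int) -> tuple:
    """The old approach: round to the nearest double first."""
    value = unscaled / 10 ** scale  # only about 15-17 significant digits survive
    seconds = int(value // 1)
    nanos = round((value - seconds) * 1e9)
    return seconds, nanos
```

For 1234567890.123456789 (scale 9), the exact version returns `(1234567890, 123456789)`. The double has too few significant digits to keep the last nanosecond digits, so the second version does not.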

Change-Id: Iabeea9f4ab4880b2f814408add63c77916e2dba9
Reviewed-on: http://gerrit.cloudera.org:8080/3154
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Tim Armstrong
4896895988 IMPALA-3619: disable IR symbols by default
These come with significant memory overhead, meaning that the memory
usage of the debug build diverges significantly from the release build.

We should disable them by default. They can be enabled by setting
ENABLE_IMPALA_IR_DEBUG_INFO=true.

Change-Id: Ia5426fe3f8be0b7a100c0c3683c8ef1eaf507146
Reviewed-on: http://gerrit.cloudera.org:8080/3223
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Lars Volker
d16e83214a IMPALA-3581: Change location of minidump folders to log_dir
Currently the default minidump location is /tmp/impala-minidumps, which can be wiped on
reboot on various distributions. This change moves the default location to
FLAGS_log_dir/minidumps/$daemon. The additional trailing $daemon folder is kept to prevent
name collisions in case of local test clusters and strangely configured installations.

For local test clusters the minidumps will be written to
$IMPALA_HOME/logs/cluster/minidumps/{catalogd,impalad,statestored}.

Change-Id: Idecf5a314bfb8b0870e8aa4819c4fb39a107702f
Reviewed-on: http://gerrit.cloudera.org:8080/3171
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Sailesh Mukil
6f1fe4ebe7 IMPALA-3577, IMPALA-3486: Partitions on multiple filesystems breaks with S3_SKIP_INSERT_STAGING
The HdfsTableSink usually creates an HDFS connection to the filesystem
that the base table resides in. However, if we create a partition on
a different filesystem from the base table's and set
S3_SKIP_INSERT_STAGING to "true", the table sink will try to write to
that filesystem with the wrong filesystem connector.

This patch allows the table sink itself to work with different
filesystems by replacing the single FS connector with a connector
per partition.
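The per-partition connector idea can be sketched in Python. The sketch keeps one connection per filesystem, keyed by the scheme and authority of each partition's location. The `connect` factory and the class name are hypothetical stand-ins for whatever actually opens an HDFS or S3 connection.

```python
from urllib.parse import urlparse

class PartitionFsConnections:
    """Hands out one connection per distinct filesystem (hypothetical)."""

    def __init__(self, connect):
        self._connect = connect  # factory: "scheme://authority" -> connection
        self._cache = {}

    def for_partition(self, location: str):
        parsed = urlparse(location)
        fs_key = f"{parsed.scheme}://{parsed.netloc}"
        if fs_key not in self._cache:
            # First partition on this filesystem: open a matching connection.
            self._cache[fs_key] = self._connect(fs_key)
        return self._cache[fs_key]
```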

This also re-enables the multiple_filesystems test and modifies it to
use the unique_database fixture so that parallel runs on the same
bucket do not clash and fail.

This patch also introduces a SECONDARY_FILESYSTEM environment variable
which will be set by the test to allow S3, Isilon and the localFS to
be used as the secondary filesystems.

All jobs with HDFS as the default filesystem need to set the
appropriate environment for S3 and Isilon, i.e. the following:
 - export AWS_SECRET_ACCESS_KEY
 - export AWS_ACCESS_KEY_ID
 - export SECONDARY_FILESYSTEM (to whatever filesystem needs to be
   tested)

TODO: SECONDARY_FILESYSTEM, FILESYSTEM_PREFIX and NAMENODE have a
lot of similarities. They need to be cleaned up in a follow-up patch.

Change-Id: Ib13b610eb9efb68c83894786cea862d7eae43aa7
Reviewed-on: http://gerrit.cloudera.org:8080/3146
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:11 -07:00
Tim Armstrong
8d2320df26 IMPALA-3597: mislabelled cache levels on debug webpage
Change-Id: I638f518b6f460bea6724c1b1efd4c4aefecf5219
Reviewed-on: http://gerrit.cloudera.org:8080/3210
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:10 -07:00
Bharath Vissapragada
e26dc85684 IMPALA-3554: Use kerberos principal in SentryProxy class
For kerberized clusters, users expect the Catalog service to use
the Kerberos principal instead of the operating system user that runs
the Catalog process. This patch fixes that.

Change-Id: I842e558e59023c7d937796a4cac51a013d948e02
Reviewed-on: http://gerrit.cloudera.org:8080/3165
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-31 23:32:10 -07:00
Michael Ho
0b7ae6e4eb IMPALA-3223: Relocate squeasel and mustache directories
This change moves the source and header files of squeasel
and mustache to be/src/thirdparty. This is a step towards
removing thirdparty in preparation for the move to the ASF.

There is also a corresponding change to Impala-lzo to update
its include path.

Change-Id: I782e493bc28086a1587274b3c474ea6b6f201855
Reviewed-on: http://gerrit.cloudera.org:8080/3206
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-05-31 23:31:41 -07:00
Tim Armstrong
6198d9262e Refactor RuntimeState and ExecEnv dependencies
Previously, including runtime-state.h or exec-env.h pulled in a huge
number of headers. By replacing all of those includes with forward
declarations, we can reduce the number of headers included when building
each source file.

This required various changes, including splitting header files, and in
one case extracting the nested DiskIoMgr::RequestContext class so that
the RequestContext can be instantiated without the full DiskIoMgr
header.

The payoff is that touching many header files results in significantly
smaller incremental builds. E.g. changes to bloom-filter.h only require
recompiling a handful of files, instead of 100+.

Build time of individual files should also be slightly quicker, since
they pull in fewer headers.

Change-Id: I3b246ad9c3681d649e7bfc969c7fa885c6242d84
Reviewed-on: http://gerrit.cloudera.org:8080/3108
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-25 19:41:45 -07:00
Matthew Jacobs
f413e236a8 IMPALA-3579: Strict handling of numeric overflow in text parsing
Adds a query option 'strict_mode' which treats integer and
floating-point overflows as parse errors. In the past,
overflows were ignored and the max value was returned. When
this query option is set, overflowing values are treated as if
they were completely invalid data, i.e. NULL is returned.
When abort_on_error is enabled, this means the query is
aborted.

Notes:
* DECIMAL overflow/underflow is already treated as an error.
* The handling in text-converter treats underflows the same
  as overflows, so they would result in the same behavior.
  However, floating point parsing never returns an underflow
  today.
* We may also want to handle numeric values that are truncated
  when parsing to integer types, e.g. 10.5 -> 10.
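The two modes can be sketched in Python for a 32-bit INT column. Clamping negative overflow to the minimum value is an assumption, since the text only mentions the max value. Python's int() is also more permissive than Impala's text parser; for example, it accepts underscores and surrounding whitespace.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_int32(text: str, strict_mode: bool):
    """Parse a text field as INT; None stands for SQL NULL."""
    try:
        value = int(text)
    except ValueError:
        return None  # completely invalid data
    if INT32_MIN <= value <= INT32_MAX:
        return value
    if strict_mode:
        return None  # overflow is treated like any other parse error
    # Non-strict behavior: silently clamp to the type's range.
    return INT32_MAX if value > INT32_MAX else INT32_MIN
```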

Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836
Reviewed-on: http://gerrit.cloudera.org:8080/3150
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:20 -07:00
Bharath Vissapragada
49610e2cfa IMPALA-3314/IMPALA-3513: Fix querying tables/partitions altered to Avro format
Bug: Impalads crash if we query an Avro table with stale metadata

Cause: This happens because avroSchema_ is not set in HdfsTable,
so it is not propagated to the Avro scanner, which doesn't have
the appropriate checks to make sure the schema is non-null.

The patch fixes the following.

1. The Avro scanner should gracefully handle the case where the Avro
   schema is not set. Appropriate null checks and a meaningful error
   message have been added.

2. This is a special case with multi-fileformat partitioned tables.
   avroSchema_ should be set in HdfsTable even if only a subset of the
   partitions is backed by Avro. Without this patch, we only set it
   if the base table file format is Avro.

Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Reviewed-on: http://gerrit.cloudera.org:8080/3136
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:20 -07:00
Michael Ho
0243a21da8 IMPALA-3242: Remove most usages of RuntimeState::SetMemLimitExceeded()
There are multiple places in the code which call
RuntimeState::SetMemLimitExceeded(). Most of them are
unnecessary as the error status constructed will eventually
be propagated up the tree of exec nodes. There is no obvious
reason to treat query memory limit exceeded differently.
In some cases, such as the scan node, calling SetMemLimitExceeded()
is actually confusing, as all scanner threads may pick up the error
status when any thread exceeds the query memory limit, causing a
lot of noise in the log.

This change replaces most calls to RuntimeState::SetMemLimitExceeded()
with MemTracker::MemLimitExceeded(). The remaining places are:
the old hash table code, the UDF framework and QueryMaintenance()
which checks for memory limit periodically. The query maintenance
case will be removed eventually once IMPALA-2399 is fixed.

Change-Id: Ic0ca128c768d1e73713866e8c513a1b75e6b4b59
Reviewed-on: http://gerrit.cloudera.org:8080/3140
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Tim Armstrong
7d5d36a6e4 Use MemTracker::MemLimitExceeded() where appropriate
This is an incremental improvement towards IMPALA-3090. Where possible
we use MemTracker::MemLimitExceeded() instead of directly constructing
the Status object.

The remaining cases where we directly construct the status are related
to the BufferedBlockMgr, which will be deprecated: either they are
produced by the BufferedBlockMgr, or produced when a Pin() unexpectedly
fails. Both of these will go away anyway.

Change-Id: I77c37f86dd15ace39e28b5cc72d37bc8d4109041
Reviewed-on: http://gerrit.cloudera.org:8080/3148
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Tim Armstrong
1ccfc45d41 IMPALA-3569: handle errors in timezone db initialization
Change-Id: I6b4d5e6b992ea023f801edb7b487e57f39920c03
Reviewed-on: http://gerrit.cloudera.org:8080/3125
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Thomas Tauber-Marshall
51869eac56 IMPALA-3542: do_as_user empty check missing
Fixes a typo in ImpalaServer::AuthorizeProxyUser where we
checked twice that the 'user' parameter isn't empty, instead
of also checking the 'do_as_user' parameter.

Change-Id: I8e3962f6f397804e37d4f2c667e97b55bd3ca2bf
Reviewed-on: http://gerrit.cloudera.org:8080/3120
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Michael Ho
f7501d2ec1 IMPALA-3332: Free local allocations in sorter.
The sorter can have runaway memory consumption, as it never frees
local allocations made in comparator_.Less(). In addition, it
doesn't check for errors generated during expression evaluation,
so it may keep sorting even after failures have occurred.

This change fixes the problem by freeing local allocations after
every n invocations of comparator_.Less(), where n is the row
batch size specified in the query options. Various error checks
are also added to return early if any error is encountered.
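The periodic freeing can be sketched in Python. `less` and `free_local_allocations` are hypothetical stand-ins for comparator_.Less() and the expression-context cleanup, and the early-return error checks are omitted.

```python
class FreeingComparator:
    """Wraps a less-than function and frees local allocations after every
    batch_size comparisons (all names here are hypothetical)."""

    def __init__(self, less, free_local_allocations, batch_size: int):
        self._less = less
        self._free = free_local_allocations
        self._batch_size = batch_size
        self._calls = 0

    def __call__(self, a, b) -> bool:
        result = self._less(a, b)
        self._calls += 1
        if self._calls % self._batch_size == 0:
            self._free()  # release memory allocated by the last batch of calls
        return result
```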

Change-Id: I941729b4836e5dbb827d4313a0b45bc5df2fa8e1
Reviewed-on: http://gerrit.cloudera.org:8080/3116
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:18 -07:00
Tim Armstrong
38416eeeb9 IMPALA-3546: don't die if sysconf() reports bogus cache info
On RHEL5 on AWS EC2, for example, sysconf() returns bad info about
cache line sizes. We should tolerate this instead of bringing down
impalad.
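One way to tolerate a bogus value is to validate it and fall back to a default. In C, the value would typically come from glibc's sysconf(_SC_LEVEL1_DCACHE_LINESIZE). The Python sketch below shows only the validation step. Both the power-of-two check and the 64-byte fallback are assumptions, not necessarily what the patch does.

```python
from typing import Optional

DEFAULT_CACHE_LINE_SIZE = 64  # assumed fallback, in bytes

def sanitize_cache_line_size(reported: Optional[int]) -> int:
    """Return the reported cache line size if plausible, else a default.

    Plausible here means a positive power of two; values such as 0 or -1
    are treated as bogus instead of being fatal.
    """
    if reported is None or reported <= 0 or (reported & (reported - 1)) != 0:
        return DEFAULT_CACHE_LINE_SIZE
    return reported
```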

Change-Id: Id4d61b05fe213028a7e9aaabe98adc2792b90e07
Reviewed-on: http://gerrit.cloudera.org:8080/3111
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:15 -07:00
Tim Armstrong
cb2a3aacd6 Turn on C++14 in cross-compiled code
Enabling this revealed a latent bug where a #include was wrapped in the
impala namespace, resulting in the functions being defined in the wrong
namespace.

Change-Id: If723167b2d03da7592b64a204e31e81ea868e4f2
Reviewed-on: http://gerrit.cloudera.org:8080/3024
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-18 14:40:34 -07:00
Dimitris Tsirogiannis
f992dc7f88 IMPALA-2956: Filters should be able to target multiple scan nodes
With this commit, runtime filters can be assigned to multiple destination
nodes (scans). For each filter, the destination nodes are determined
using equivalence classes during planning. All of a filter's
destination nodes are in the left subtree of the join node
that constructs the filter. A runtime filter may have both
local and remote targets. The backend determines how to route each
filter depending on the number and type (local, remote) of its destination
nodes.

With this commit, we enable runtime filter propagation in all the
operands of UNION [ALL|DISTINCT] nodes.

Change-Id: Iad2ce4e579a30616c469312a4e658140d317507b
Reviewed-on: http://gerrit.cloudera.org:8080/2932
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-05-18 01:40:22 -07:00
Tim Armstrong
265e39f89a IMPALA-3168: replace HashTable parameters with constants
This addresses the regression for small-ndv aggs resulting from
prefetching. The idea is that for small-ndv aggs prefetching
increases the # of instructions and memory references, but
doesn't provide any compensating benefit. This change replaces
parameters in the hash-table code with constant values, which reduces
the instruction count and # of memory references in aggregations.

Preliminary perf results show that a low-NDV decimal agg is
around 20% faster (2.1s -> 1.7s) and a high-NDV decimal agg
is around 7% faster (15s -> 14s). I haven't investigated
how much of the speedup comes from reduced codegen time.

Change-Id: I483a19662c90ca54bc21d60fd6ba97dbed93eaef
Reviewed-on: http://gerrit.cloudera.org:8080/3088
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2016-05-17 10:09:06 -07:00
Tim Armstrong
9f4276eea8 IMPALA-3286: prefetching for PartitionedAggregationNode
This patch builds on top of the prefetching infrastructure to add
prefetching to PartitionedAggregationNode. Input batches are evaluated
in prefetch groups and hash table buckets are prefetched if the
prefetch_mode query option is set to HT_BUCKET.

We avoid some pointer indirections on the critical path by caching hash
tables in a 'hash_tbls_' array.

There is also a bit of cleanup to directly instantiate the templated
ProcessBatch() method to remove the ProcessBatch_true() and
ProcessBatch_false() hack, and also to separate out
ProcessBatchNoGrouping() so that it doesn't have to have the same
argument list as ProcessBatch().

Co-author: Michael Ho <kwho@cloudera.com>

Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Reviewed-on: http://gerrit.cloudera.org:8080/3070
Reviewed-by: Michael Ho <kwho@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 10:09:06 -07:00
Matthew Jacobs
9172f4b824 IMPALA-1928: Fix Thrift client transport wrapping order
The thrift client incorrectly wraps the TSaslTransport
around the TBufferedTransport which leads to significant
performance issues. (Note that the server-side wraps the
transports in the correct order already.)

Currently: TSaslTransport(TBufferedTransport(socket))
Should be: TBufferedTransport(TSaslTransport(socket))

As a result, when we write a structure, we end up doing lots
of write calls that go straight to the TSaslTransport, which does no
buffering. So it ends up producing output that looks like:
[0, 0, 0, 1], <one char>, [0, 0, 0, 1], <one char>, etc.
for each individual write call.

These end up buffered so we don't get lots of tiny packets
on the send side. However, on the receiver side we are doing
one recv call per Sasl frame.

This patch reorders the wrapping of transports in the Thrift
client so that it matches the order on the Thrift server,
which improves exchange performance, bringing it within 10% of
non-Kerberos performance.
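The effect of the wrapping order can be reproduced with a small Python model. `FramingTransport` stands in for TSaslTransport, writing length-prefixed frames without any SASL negotiation or security layer. `BufferedTransport` stands in for TBufferedTransport. These are illustrative classes, not the Thrift library's.

```python
import struct

class Socket:
    """Records every byte written, standing in for a network socket."""
    def __init__(self):
        self.sent = bytearray()
    def write(self, data: bytes):
        self.sent += data
    def flush(self):
        pass

class FramingTransport:
    """Like the SASL transport described above: each write call becomes its
    own frame with a 4-byte big-endian length prefix; nothing is buffered."""
    def __init__(self, inner):
        self.inner = inner
    def write(self, data: bytes):
        self.inner.write(struct.pack(">I", len(data)) + data)
    def flush(self):
        self.inner.flush()

class BufferedTransport:
    """Accumulates writes and passes them on as one write at flush()."""
    def __init__(self, inner):
        self.inner = inner
        self.buf = bytearray()
    def write(self, data: bytes):
        self.buf += data
    def flush(self):
        if self.buf:
            self.inner.write(bytes(self.buf))
            self.buf.clear()
        self.inner.flush()

def count_frames(wire: bytes) -> int:
    """Count the length-prefixed frames in a recorded byte stream."""
    frames, pos = 0, 0
    while pos < len(wire):
        (length,) = struct.unpack_from(">I", wire, pos)
        pos += 4 + length
        frames += 1
    return frames
```

Writing "hello" one byte at a time through `FramingTransport(BufferedTransport(sock))` yields five 1-byte frames, which is the `[0, 0, 0, 1], <one char>` pattern above. `BufferedTransport(FramingTransport(sock))` yields a single frame.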

Change-Id: I81d30b3d8d10fe6dcd8eb88cca49734af09f9d91
Reviewed-on: http://gerrit.cloudera.org:8080/3093
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 10:09:06 -07:00
Matthew Jacobs
f067929f3a IMPALA-3535: Ignore invalid per-pool default query options
In 2.5 we added the ability to set per-pool default query
options. A string of key-value pairs can be specified with a
pool configuration. However, if any options fail to parse,
then all the options are ignored. We want that behavior (and
to return an error) when parsing the process-wide default
query options on startup and when parsing the options sent
from a client (e.g. in the Beeswax server), because an error can
be returned immediately for the triggering action at that
time (i.e. starting the impalad or submitting a query with
the options set). This behavior is bad for the pool default
query options because (a) the configuration is set by the
administrator, and there's nothing we can do until a query is
submitted, and (b) one invalid option shouldn't prevent the
other, valid options from being set.
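A Python sketch of the more forgiving behavior for pool defaults follows. Each comma-separated key=value pair is validated on its own: valid options are kept and invalid ones are collected, for example to be logged. The validator table is a hypothetical simplification, not Impala's option parser.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified validators keyed by option name.
VALIDATORS = {
    "batch_size": lambda v: v.isdigit(),
    "abort_on_error": lambda v: v.lower() in ("true", "false", "0", "1"),
}

def parse_pool_default_options(spec: str) -> Tuple[Dict[str, str], List[str]]:
    """Keep every valid key=value option; collect invalid ones separately."""
    options: Dict[str, str] = {}
    invalid: List[str] = []
    for item in filter(None, (s.strip() for s in spec.split(","))):
        key, sep, value = item.partition("=")
        key, value = key.strip().lower(), value.strip()
        check = VALIDATORS.get(key)
        if not sep or check is None or not check(value):
            invalid.append(item)  # skip only this option
            continue
        options[key] = value
    return options, invalid
```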

Change-Id: If04733b775963091b0314c65286df126fd812358
Reviewed-on: http://gerrit.cloudera.org:8080/3056
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 10:09:05 -07:00
Michael Ho
a59408b575 IMPALA-3286: Prefetching for PHJ probing.
This change pipelines the code which probes the hash tables.
This is based on the idea which Mostafa presented earlier.
Essentially, all rows in a row batch will be evaluated and
hashed first before being probed against the hash tables.
Hash table buckets are prefetched as hash values of rows are
computed.

To avoid re-evaluating the rows again during probing (as the rows
have been evaluated once to compute the hash values), hash table
context has been updated to cache the evaluated expression values,
null bits and hash values of some number of rows. The hash table context
provides a new iterator-like interface to iterate through the cached
values.
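The two-phase structure can be sketched in Python. The whole batch is evaluated and hashed first, which is where a C++ implementation would issue prefetches, and then probed using the cached keys and hashes. The `prefetch` hook is a no-op placeholder here, NULL handling is omitted, and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = dict
HashTable = Dict[int, List[Tuple[Row, Hashable]]]

def build_hash_table(rows: Sequence[Row], eval_key: Callable[[Row], Hashable]) -> HashTable:
    """Bucket build rows by the hash of their evaluated key."""
    table: HashTable = {}
    for row in rows:
        key = eval_key(row)
        table.setdefault(hash(key), []).append((row, key))
    return table

def probe_batch(rows: Sequence[Row],
                eval_key: Callable[[Row], Hashable],
                table: HashTable,
                prefetch: Callable[[int], None] = lambda h: None) -> List[Tuple[Row, Row]]:
    # Phase 1: evaluate and hash every probe row once, caching the results.
    cache = []
    for row in rows:
        key = eval_key(row)
        h = hash(key)
        prefetch(h)  # placeholder for prefetching the bucket for h
        cache.append((row, key, h))
    # Phase 2: probe with the cached keys and hashes; no re-evaluation.
    matches = []
    for row, key, h in cache:
        for build_row, build_key in table.get(h, []):
            if build_key == key:  # guard against hash collisions
                matches.append((row, build_row))
    return matches
```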

A PREFETCH_MODE query option has also been added to disable prefetching
if necessary. The default mode is 1, which means hash table buckets will
be prefetched. In the future, this mode may be extended to also prefetch
the data in hash table buckets.

Combined with the build-side prefetching, a self-join of the lineitem table
improves by 40% on average in a single-node run:

select count(*)
from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
      o1.l_linenumber = o2.l_linenumber;

Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Reviewed-on: http://gerrit.cloudera.org:8080/2959
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 01:30:12 -07:00
Casey Ching
b634a55b92 Kudu: Fix warnings from clang
Changes:
1) Several places in the tests didn't check return statuses.
   KUDU_ASSERT_OK can only be used in functions that return void;
   KUDU_CHECK_OK is used otherwise.
2) The forward declared "class ColumnType" should have actually been a
   struct.

There are now no more Kudu-related warnings from clang.

Change-Id: Id3e2f5ec9925c3cf81c7f4048decc6a5f97eee66
Reviewed-on: http://gerrit.cloudera.org:8080/3062
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 01:30:12 -07:00
Alex Behm
b4558d384e IMPALA-3539: Return error status if def/rep level caches failed to allocate.
The information in the JIRA is consistent with a failure to allocate
memory for the def level cache. There was a bug where this failure status
was not properly propagated, so eventually a DCHECK was hit that expected
the cache memory to be allocated.

Change-Id: I38856e6e1f5fbdbf5327cf31a2a109e6c930901d
Reviewed-on: http://gerrit.cloudera.org:8080/3065
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-14 01:30:01 -07:00
Skye Wanderman-Milne
1aeda141aa IMPALA-3533: fix Tuple::CodegenMaterializeExprs()
The args[] buffer was too small. I also reverted to the usual style of
commenting out each unused argument Value*, which makes it slightly easier
to spot this kind of bug.

Change-Id: Ic2546b8f42ac0a4e0715b134c384ccf311f663c2
Reviewed-on: http://gerrit.cloudera.org:8080/3051
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-13 15:52:53 -07:00
Lars Volker
3649ff89e1 IMPALA-3019: Fix unnecessary resets of iterator
In order to perform round-robin backend selection, the simple
scheduler uses an iterator to the next backend entry to be selected.
This iterator needs to be reset whenever it is invalidated by changes
to the underlying map. Currently, the iterator is reset on every
message from the statestored, even if the message was empty and
thus did not change the map.

After every reset of the iterator, round-robin selection starts again
from the first backend in the scheduler's backend map. Because the
statestored sends empty keepalive messages every couple of seconds, this
effectively limits scheduling of remote reads to only a few backends.

This change introduces a check to prevent those unnecessary iterator
resets, which will spread remote reads more evenly over all backends.
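
A minimal sketch of the check. It assumes a hypothetical ToyScheduler
that keeps backends in an ordered std::set and receives membership deltas
as lists of added and removed hosts. Only a non-empty delta resets the
round-robin iterator. Empty keepalive updates therefore no longer send
selection back to the first backend.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified scheduler that selects backends round-robin.
class ToyScheduler {
 public:
  // Applies a membership delta from the statestore. Only a non-empty delta can
  // change the set, so an empty keepalive message leaves the iterator alone.
  void UpdateMembership(const std::vector<std::string>& added,
                        const std::vector<std::string>& removed) {
    if (added.empty() && removed.empty()) return;  // no change: keep position
    for (const std::string& b : added) backends_.insert(b);
    for (const std::string& b : removed) backends_.erase(b);
    next_backend_ = backends_.begin();  // set changed: restart from the front
  }

  std::string SelectBackend() {
    assert(!backends_.empty());
    if (next_backend_ == backends_.end()) next_backend_ = backends_.begin();
    return *next_backend_++;
  }

 private:
  std::set<std::string> backends_;
  std::set<std::string>::iterator next_backend_ = backends_.end();
};
```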

Change-Id: I831d485b46c7d9460fb014a302a26864b6bd573e
Reviewed-on: http://gerrit.cloudera.org:8080/2330
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/3031
2016-05-13 15:52:53 -07:00
Youwei Wang
0306dd576d IMPALA-2809: Improve scalar ByteSwap().
This patch improves our ByteSwap() function by handling
more byte sizes in the fast path, as opposed to the
loop-based slow path.
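
A sketch of the fast-path/slow-path split, using the GCC/Clang
__builtin_bswap16/32/64 builtins. The message does not list which sizes
the patch handles in the fast path, so the 2, 4, 8 and 16 byte cases
below are illustrative only. Other lengths fall back to a byte loop.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Writes the 'len' bytes at 'src' to 'dst' in reverse order.
void ByteSwap(void* dst, const void* src, int len) {
  assert(len >= 0);
  switch (len) {
    case 2: {
      uint16_t v;
      std::memcpy(&v, src, 2);
      v = __builtin_bswap16(v);
      std::memcpy(dst, &v, 2);
      return;
    }
    case 4: {
      uint32_t v;
      std::memcpy(&v, src, 4);
      v = __builtin_bswap32(v);
      std::memcpy(dst, &v, 4);
      return;
    }
    case 8: {
      uint64_t v;
      std::memcpy(&v, src, 8);
      v = __builtin_bswap64(v);
      std::memcpy(dst, &v, 8);
      return;
    }
    case 16: {
      // Swap each 8-byte half, then exchange the halves.
      uint64_t lo, hi;
      std::memcpy(&lo, src, 8);
      std::memcpy(&hi, static_cast<const char*>(src) + 8, 8);
      lo = __builtin_bswap64(lo);
      hi = __builtin_bswap64(hi);
      std::memcpy(dst, &hi, 8);
      std::memcpy(static_cast<char*>(dst) + 8, &lo, 8);
      return;
    }
    default: {
      // Slow path for other sizes; assumes 'src' and 'dst' do not overlap.
      const uint8_t* s = static_cast<const uint8_t*>(src);
      uint8_t* d = static_cast<uint8_t*>(dst);
      for (int i = 0; i < len; ++i) d[i] = s[len - 1 - i];
    }
  }
}
```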

ByteSwap() is used heavily when scanning Parquet decimals.

Before this patch, VTune showed ByteSwap() among the top
three worst cycle offenders when running TPCH-Q6 on my local
setup with a large lineitem table.

After this patch, ByteSwap() shows no significant contribution
to the overall cycles spent.

There was a measurable improvement of a few percent for TPCH-Q6.

Change-Id: I4f462e6bdb022db46b48889a6a7426120a80d9b4
Reviewed-on: http://gerrit.cloudera.org:8080/3033
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-13 15:52:53 -07:00
Skye Wanderman-Milne
7767d300a3 IMPALA-3311: fix string data coming out of aggs in subplans
The problem: varlen data (e.g. strings) produced by aggregations is
freed by FreeLocalAllocations() after passing up the output
batch. This works for streaming operators or blocking operators that
copy their input, but results in memory corruption when the output
reaches non-copying blocking operators, e.g. SubplanNode and
NestedLoopJoinNode.

The fix: this patch makes the PartitionedAggregationNode copy out
produced string data if the node is in a subplan. Otherwise it calls
MarkNeedsToReturn() on the output batch. Marking the batch would work
in the subplan case as well, but would likely be less efficient since
it would result in many small batches coming out of the subplan.
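
A toy illustration of the two options. It assumes a hypothetical
ToyRowBatch whose rows are string_views and which can own copies of
string data. In the subplan case, the strings are deep-copied into
batch-owned storage. The views then stay valid after the aggregation's
local memory is freed. Otherwise the batch is only flagged, as a stand-in
for MarkNeedsToReturn().

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical output batch: rows reference string bytes through views, and
// the batch can own copies of string data so that it outlives the producer.
struct ToyRowBatch {
  std::vector<std::string_view> rows;
  std::deque<std::string> owned;  // push_back never moves existing elements
  bool needs_to_return = false;   // stand-in for MarkNeedsToReturn()
};

// Emits aggregated strings that live in 'local_pool', memory the caller may
// free soon after this returns (as FreeLocalAllocations() does).
void EmitResults(const std::vector<std::string>& local_pool, bool in_subplan,
                 ToyRowBatch* out) {
  for (const std::string& s : local_pool) {
    if (in_subplan) {
      out->owned.push_back(s);  // deep copy into batch-owned memory
      out->rows.emplace_back(out->owned.back());
    } else {
      out->rows.emplace_back(s);  // still points into local memory
    }
  }
  // Outside a subplan, flag the batch instead of copying.
  if (!in_subplan) out->needs_to_return = true;
}
```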

The patch includes a test case. However, this test only exposes the
problem with an ASAN build and the --disable_mem_pools flag, which we
don't currently have automated testing for.

Change-Id: Iada891504c261ba54f4eb8c9d7e4e5223668d7b9
Reviewed-on: http://gerrit.cloudera.org:8080/2929
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Lars Volker
cb377741ec Remove replica_preference query option
Change-Id: I5a3134b874a53241706d850d186acbfed768f5ee
Reviewed-on: http://gerrit.cloudera.org:8080/2323
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/3030
Reviewed-by: Lars Volker <lv@cloudera.com>
2016-05-12 23:06:36 -07:00
Skye Wanderman-Milne
9174dee395 IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Alex Behm
e96b463587 IMPALA-3528: Transfer scratch tuple memory in Close() of Parquet scanner.
The lifetime of a scanner thread is decoupled from that of row batches that
it produces. That means that all resources associated with row batches
produced by the scanner thread should be transferred to those batches.

The bug was that we were not transferring the ownership of memory from the
scratch batch to the final row batch returned in HdfsParquetScanner::Close().
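
A minimal sketch of the ownership transfer. It assumes a hypothetical
ToyBatch that owns raw buffers through std::unique_ptr, and a ToyScanner
whose scratch batch hands its memory to the final output batch in
Close(). The buffers therefore outlive the scanner.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::unique_ptr<char[]>;

// Hypothetical batch that owns the memory backing its tuples.
struct ToyBatch {
  std::vector<Buffer> buffers;

  // Moves every owned buffer into 'dst', leaving this batch empty.
  void TransferResourceOwnership(ToyBatch* dst) {
    for (Buffer& b : buffers) dst->buffers.push_back(std::move(b));
    buffers.clear();
  }
};

// Hypothetical scanner whose scratch batch holds memory that returned rows
// may still point into.
class ToyScanner {
 public:
  ToyScanner() { scratch_.buffers.push_back(std::make_unique<char[]>(64)); }

  // Hands the scratch memory to the final batch, so it lives as long as the
  // batch rather than being freed together with the scanner.
  void Close(ToyBatch* final_batch) {
    scratch_.TransferResourceOwnership(final_batch);
  }

  size_t num_scratch_buffers() const { return scratch_.buffers.size(); }

 private:
  ToyBatch scratch_;
};
```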

Triggering an event that would cause the freed memory to be dereferenced is
possible, but very difficult. My understanding is that it is only possible
in exceptional non-deterministic scenarios, e.g., a query is cancelled just
at the right time, or the scanner hits a parse/decoding error.

Testing: I tested this change locally by running the scanner and nested
types test as well as TPCH, nested TPCH, and TPC-DS.

Change-Id: Ic34d32c9a41ea66b2b2d8f5e187cc84d4cb569b2
Reviewed-on: http://gerrit.cloudera.org:8080/3041
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Tim Armstrong
6910f4975a IMPALA-3527: use codegen'd ProcessProbeBatch() when spilling.
Change-Id: I92ebfb01e370d0a842270771c9e5f1a4610dc16a
Reviewed-on: http://gerrit.cloudera.org:8080/3035
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:35 -07:00
Tim Armstrong
a2e88f0e6c IMPALA-3495: incorrect join result due to implicit cast in Murmur hash
We observed that some spilling joins started returning incorrect
results. The behaviour seems to happen when a codegen'd insert and a
non-codegen'd probe function is used (or vice-versa). This only seems to
happen in a subset of cases.

The bug appears to be a result of the implicit cast of the uint32_t seed
value to the int32_t hash argument to HashTable::Hash(). The result is
implementation-defined if the uint32_t value does not fit in an int32_t.
In Murmur hash, this value is subsequently cast to a uint64_t, so we have
a chain of uint32_t->int32_t->uint64_t conversions. It would require
careful analysis to determine whether we're seeing a compiler bug or just
implementation-defined behaviour, but we can avoid the question entirely
by keeping the values unsigned.
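
The conversion chain can be shown in isolation. The helper names below
are hypothetical. Before C++20, converting a uint32_t above INT32_MAX to
int32_t is implementation-defined; C++20 defines it as modular. GCC and
Clang wrap modulo 2^32. The int32_t->uint64_t step then sign-extends, so
the two paths disagree for seeds with the top bit set.

```cpp
#include <cassert>
#include <cstdint>

// Widening the seed directly preserves its value.
uint64_t WidenUnsigned(uint32_t seed) { return static_cast<uint64_t>(seed); }

// Widening through int32_t, as happens when a uint32_t seed is passed to an
// int32_t parameter and later cast to uint64_t. For seeds above INT32_MAX the
// first step is implementation-defined before C++20 (GCC and Clang wrap it
// modulo 2^32); the second step then sign-extends the negative value.
uint64_t WidenViaSigned(uint32_t seed) {
  int32_t as_signed = static_cast<int32_t>(seed);
  return static_cast<uint64_t>(as_signed);
}
```

For example, with GCC or Clang, WidenUnsigned(0x80000000u) is
0x0000000080000000, while WidenViaSigned(0x80000000u) is
0xFFFFFFFF80000000. Seeds below 0x80000000 give the same result either
way.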

Testing:
I was able to reproduce the issue under a very specific set of
circumstances, listed below. Before this change the query consistently
returned 0 rows; after the change it consistently returned the correct
results. I haven't had much luck creating a suitable regression test.

* 1 impalad
* --disable_mem_pools=true
* use tpch_20_parquet;
* set mem_limit=1275mb;
* TPC-H query 7:

select
  supp_nation,
  cust_nation,
  l_year,
  sum(volume) as revenue
from (
  select
    n1.n_name as supp_nation,
    n2.n_name as cust_nation,
    year(l_shipdate) as l_year,
    l_extendedprice * (1 - l_discount) as volume
  from
    supplier,
    lineitem,
    orders,
    customer,
    nation n1,
    nation n2
  where
    s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and (
      (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
      or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
    )
    and l_shipdate between '1995-01-01' and '1996-12-31'
  ) as shipping
group by
  supp_nation,
  cust_nation,
  l_year
order by
  supp_nation,
  cust_nation,
  l_year

Change-Id: I952638dc94119a4bc93126ea94cc6a3edf438956
Reviewed-on: http://gerrit.cloudera.org:8080/3034
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:35 -07:00
Henry Robinson
df1412c962 IMPALA-3480: Add query options for min/max filter sizes
This patch adds two query options for runtime filters:

  RUNTIME_FILTER_MAX_SIZE
  RUNTIME_FILTER_MIN_SIZE

These options set the minimum and maximum size of each filter,
regardless of the planner's estimates. Filter sizes are rounded up to
the nearest power of two.
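
A sketch of one way the two bounds could be applied. It assumes the
planner's estimate is clamped to [RUNTIME_FILTER_MIN_SIZE,
RUNTIME_FILTER_MAX_SIZE] and then rounded up. It also assumes both bounds
are powers of two, so rounding cannot push the result past the maximum.
The order of operations and the function names are assumptions, not
taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Smallest power of two >= v (v > 0 and small enough not to overflow).
int64_t RoundUpToPowerOfTwo(int64_t v) {
  assert(v > 0);
  int64_t p = 1;
  while (p < v) p <<= 1;
  return p;
}

// Hypothetical: clamps the planner's estimate to [min_size, max_size], then
// rounds up. With power-of-two bounds the result stays within the bounds.
int64_t ChooseFilterSize(int64_t estimate, int64_t min_size, int64_t max_size) {
  assert(min_size <= max_size);
  int64_t size = std::clamp(estimate, min_size, max_size);
  return RoundUpToPowerOfTwo(size);
}
```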

Change-Id: I5c13c200a0f1855f38a5da50ca34a737e741868b
Reviewed-on: http://gerrit.cloudera.org:8080/2966
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-05-12 23:06:35 -07:00
Alex Behm
14cdb0497c IMPALA-2736: Optimized ReadValueBatch() for Parquet scalar column readers.
This change builds on top of the recent move to column-wise
materialization of scalar values in the Parquet scanner.

The goal of this patch is to improve the scan efficiency, and
show the future direction for all column readers.

Major TODO:
The current patch has minor code duplication/redundancy,
and the new ReadValueBatch() departs from (but improves) the
existing column reader control flow. To improve code reuse
and readability we should overhaul all column readers to be
more uniform.

Summary of changes:
- refactor ReadValueBatch() to simplify control flow
- introduce caching of def/rep levels for faster level
  decoding, and for a tighter value materialization loop
- new templated function for value materialization that
  takes the value encoding as a template argument
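
A sketch of the "encoding as a template argument" idea, under
simplifying assumptions. A hypothetical ToyDecoder handles int32 values
and supports only PLAIN and PLAIN_DICTIONARY. Definition levels are
already decoded into a vector, loosely mirroring the level cache. The
encoding is a compile-time parameter, so 'if constexpr' removes the
per-value encoding branch from the materialization loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class Encoding { PLAIN, PLAIN_DICTIONARY };

// Hypothetical decoder for a column of int32 values.
struct ToyDecoder {
  const uint8_t* plain_data = nullptr;  // PLAIN: packed int32 values
  std::vector<int32_t> dict;            // PLAIN_DICTIONARY: dictionary values
  std::vector<uint32_t> dict_indices;   // PLAIN_DICTIONARY: decoded indices
  size_t pos = 0;                       // next non-NULL value to read

  // The encoding is a template argument, so 'if constexpr' picks the branch
  // at compile time and no encoding check remains in the per-value loop.
  template <Encoding ENCODING>
  int32_t Next() {
    if constexpr (ENCODING == Encoding::PLAIN) {
      int32_t v;
      // Host byte order; Parquet PLAIN is little-endian.
      std::memcpy(&v, plain_data + 4 * pos++, sizeof(v));
      return v;
    } else {
      return dict[dict_indices[pos++]];
    }
  }
};

// Materializes one value per cached definition level; level 0 means NULL.
template <Encoding ENCODING>
void MaterializeValues(ToyDecoder* dec, const std::vector<uint8_t>& def_levels,
                       std::vector<int32_t>* values, std::vector<bool>* is_null) {
  for (uint8_t level : def_levels) {
    bool null = (level == 0);
    is_null->push_back(null);
    values->push_back(null ? 0 : dec->Next<ENCODING>());
  }
}
```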

Mini benchmark vs. cdh5-trunk
I ran the following queries on a single impalad before and after my
change using a synthetic 'huge_lineitem' table.
I modified hdfs-scan-node.cc to set the number of rows of any row
batch to 0 to focus the measurement on the scan time.

Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;

select * from huge_lineitem;
Before: 22.39s
After:  13.62s

select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After:  17.73s

select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After:  16.68s

select l_linenumber from huge_lineitem;
Before: 1.74s
After:  0.92s

Testing:
I ran a private exhaustive build and all tests passed.

Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Reviewed-on: http://gerrit.cloudera.org:8080/2843
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-05-12 14:18:05 -07:00
Lars Volker
df8bf3a965 IMPALA-3490: Add flag to reduce minidump size
IMPALA-2686 added the breakpad library to all Impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
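
A rough model of the truncation policy described above. It is not
breakpad's implementation. If the estimated dump size exceeds the hint,
the first 20 threads keep their full stacks and every later thread is
capped at 2KB. The function name is hypothetical, and so is the way the
size estimate is supplied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kFullStackThreads = 20;           // threads whose stacks stay whole
constexpr size_t kTruncatedStackBytes = 2 * 1024;  // cap for every later thread

// Returns the number of stack bytes to write for each thread, given each
// thread's stack size, an estimate of the dump size, and the size hint in KB
// (mirroring the units of minidump_size_limit_hint_kb).
std::vector<size_t> StackBytesToWrite(const std::vector<size_t>& stack_sizes,
                                      size_t estimated_dump_bytes,
                                      size_t size_limit_hint_kb) {
  const bool limit = estimated_dump_bytes > size_limit_hint_kb * 1024;
  std::vector<size_t> result;
  result.reserve(stack_sizes.size());
  for (size_t i = 0; i < stack_sizes.size(); ++i) {
    size_t n = stack_sizes[i];
    if (limit && i >= kFullStackThreads && n > kTruncatedStackBytes) {
      n = kTruncatedStackBytes;
    }
    result.push_back(n);
  }
  return result;
}
```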

Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00