Commit Graph

4691 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
7767d300a3 IMPALA-3311: fix string data coming out of aggs in subplans
The problem: varlen data (e.g. strings) produced by aggregations is
freed by FreeLocalAllocations() after passing up the output
batch. This works for streaming operators or blocking operators that
copy their input, but results in memory corruption when the output
reaches non-copying blocking operators, e.g. SubplanNode and
NestedLoopJoinNode.

The fix: this patch makes the PartitionedAggregationNode copy out
produced string data if the node is in a subplan. Otherwise it calls
MarkNeedsToReturn() on the output batch. Marking the batch would work
in the subplan case as well, but would likely be less efficient since
it would result in many small batches coming out of the subplan.

The patch includes a test case. However, this test only exposes the
problem with an ASAN build and the --disable_mem_pools flag, which we
don't currently have automated testing for.

Change-Id: Iada891504c261ba54f4eb8c9d7e4e5223668d7b9
Reviewed-on: http://gerrit.cloudera.org:8080/2929
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Lars Volker
cb377741ec Remove replica_preference query option
Change-Id: I5a3134b874a53241706d850d186acbfed768f5ee
Reviewed-on: http://gerrit.cloudera.org:8080/2323
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/3030
Reviewed-by: Lars Volker <lv@cloudera.com>
2016-05-12 23:06:36 -07:00
Skye Wanderman-Milne
9174dee395 IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.
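
The ownership rule above can be sketched as follows. This is only an
illustration, not the HdfsTextScanner code: the helper name and the way the
bytes are passed in are hypothetical.

```cpp
#include <string>

// Hypothetical helper, not part of HdfsTextScanner. Reports whether a "\r\n"
// row delimiter straddles the boundary between two adjacent scan ranges.
// 'range' holds the bytes of the current scan range and 'next_byte' is the
// first byte after it (the scanner already reads past the end of its range
// to finish the last tuple, so this byte is assumed to be available).
// When this returns true, the tuple that starts after the '\n' is left to the
// next scan range's scanner, as if "\r\n" sat entirely in that range.
inline bool DelimiterIsSplit(const std::string& range, char next_byte) {
  return !range.empty() && range.back() == '\r' && next_byte == '\n';
}
```

A lone '\r' at the end of a range followed by any other byte is not a split
delimiter under this rule.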

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Tim Armstrong
2b61ae7f2a IMPALA-3534: allow overriding of CMAKE_CXX_COMPILER for ASAN
This makes it consistent with the regular toolchain and makes it easier
to use wrapper scripts like distcc.

Change-Id: I3ab488182c46f9ccb1850a0a2b064653e7e3da26
Reviewed-on: http://gerrit.cloudera.org:8080/3050
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Alex Behm
e96b463587 IMPALA-3528: Transfer scratch tuple memory in Close() of Parquet scanner.
The lifetime of a scanner thread is decoupled from that of row batches that
it produces. That means that all resources associated with row batches
produced by the scanner thread should be transferred to those batches.

The bug was that we were not transferring the ownership of memory from the
scratch batch to the final row batch returned in HdfsParquetScanner::Close().

Triggering an event that would cause the freed memory to be dereferenced is
possible, but very difficult. My understanding is that it is only possible
in exceptional non-deterministic scenarios, e.g., a query is cancelled just
at the right time, or the scanner hits a parse/decoding error.

Testing: I tested this change locally by running the scanner and nested
types test as well as TPCH, nested TPCH, and TPC-DS.

Change-Id: Ic34d32c9a41ea66b2b2d8f5e187cc84d4cb569b2
Reviewed-on: http://gerrit.cloudera.org:8080/3041
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Sailesh Mukil
7e0cbaf1a0 IMPALA-3459: Add test for DROP TABLE PURGE for S3
It was previously thought that PURGE had no effect on S3. However,
the Hive Metastore actually created a .Trash directory and copied the
files there when a DROP TABLE was conducted from Impala.

This patch just enables the existing PURGE tests for S3. There were a
few reasons this wasn't working before. The paths given to the S3
client (boto3) should not have a leading "/". This has been fixed as
it doesn't make a difference for HDFS if that exists or not.

Also, PURGE is a pure delete whereas a regular DROP is a copy. A copy
is consistent whereas a delete is only eventually consistent, so when
we PURGE a table or partition, the files will still be visible for
some time after the query has completed. The tests have been modified
to accommodate this case as well.

Change-Id: I52d2451e090b00ae2fd9a879c28defa6c940047c
Reviewed-on: http://gerrit.cloudera.org:8080/3036
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Tim Armstrong
6910f4975a IMPALA-3527: use codegen'd ProcessProbeBatch() when spilling.
Change-Id: I92ebfb01e370d0a842270771c9e5f1a4610dc16a
Reviewed-on: http://gerrit.cloudera.org:8080/3035
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:35 -07:00
Tim Armstrong
a2e88f0e6c IMPALA-3495: incorrect join result due to implicit cast in Murmur hash
We observed that some spilling joins started returning incorrect
results. The behaviour seems to happen when a codegen'd insert and a
non-codegen'd probe function is used (or vice-versa). This only seems to
happen in a subset of cases.

The bug appears to be a result of the implicit cast of the uint32_t seed
value to the int32_t hash argument to HashTable::Hash(). The behaviour
is unspecified if the uint32_t does not fit in the int32_t. In Murmur
hash, this value is subsequently cast to a uint64_t, so we have a chain
of uint32_t->int32_t->uint64_t conversions. It would require a very
careful reading of the C++ standard to understand what the expected
result is, and whether we're seeing a compiler bug or just unspecified
behaviour, but we can avoid it entirely by keeping the values unsigned.
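
A small sketch of one way the two conversion paths could disagree. The
function names are hypothetical stand-ins, not the real HashTable::Hash() or
Murmur hash signatures, and the comments assume the usual two's-complement
compilers.

```cpp
#include <cstdint>

// Hypothetical "old" shape: the seed passes through a signed 32-bit
// parameter before being widened. A negative value is sign-extended.
inline uint64_t WidenThroughSigned(int32_t seed) {
  return static_cast<uint64_t>(seed);
}

// Hypothetical "new" shape: the seed stays unsigned all the way, so
// widening zero-extends and the upper 32 bits are always zero.
inline uint64_t WidenUnsigned(uint32_t seed) {
  return static_cast<uint64_t>(seed);
}
```

For a seed with the top bit set, e.g. 0x80000001, the unsigned path yields
0x0000000080000001, while the signed path typically yields
0xFFFFFFFF80000001. If one code path widens one way and another widens the
other way, the two would compute different hashes for the same row.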

Testing:
I was able to reproduce the issue under a very specific set of circumstances,
listed below. Before this change it consistently returned 0 rows. After the
change it consistently returned the correct results. I haven't had much
luck creating a suitable regression test.

* 1 impalad
* --disable_mem_pools=true
* use tpch_20_parquet;
* set mem_limit=1275mb;
* TPC-H query 7:

select
  supp_nation,
  cust_nation,
  l_year,
  sum(volume) as revenue
from (
  select
    n1.n_name as supp_nation,
    n2.n_name as cust_nation,
    year(l_shipdate) as l_year,
    l_extendedprice * (1 - l_discount) as volume
  from
    supplier,
    lineitem,
    orders,
    customer,
    nation n1,
    nation n2
  where
    s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and (
      (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
      or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
    )
    and l_shipdate between '1995-01-01' and '1996-12-31'
  ) as shipping
group by
  supp_nation,
  cust_nation,
  l_year
order by
  supp_nation,
  cust_nation,
  l_year

Change-Id: I952638dc94119a4bc93126ea94cc6a3edf438956
Reviewed-on: http://gerrit.cloudera.org:8080/3034
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:35 -07:00
Henry Robinson
df1412c962 IMPALA-3480: Add query options for min/max filter sizes
This patch adds two query options for runtime filters:

  RUNTIME_FILTER_MAX_SIZE
  RUNTIME_FILTER_MIN_SIZE

These options define the minimum and maximum filter sizes for a filter,
no matter what the estimates produced by the planner are. Filter sizes
are rounded up to the nearest power of two.
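
A minimal sketch of how such bounds might be applied. The helper names are
hypothetical, and the order of the steps (clamp first, then round up) is an
assumption that the description above does not confirm.

```cpp
#include <algorithm>
#include <cstdint>

// Rounds 'v' up to the nearest power of two. Returns 1 for v <= 1.
// Assumes v <= 2^63 so that the shift cannot overflow.
inline uint64_t RoundUpToPowerOfTwo(uint64_t v) {
  uint64_t p = 1;
  while (p < v) p <<= 1;
  return p;
}

// Hypothetical helper: clamps the planner's estimate to
// [min_size, max_size] and then rounds the result up to a power of two.
inline uint64_t EffectiveFilterSize(uint64_t estimate, uint64_t min_size,
                                    uint64_t max_size) {
  uint64_t clamped = std::max(min_size, std::min(estimate, max_size));
  return RoundUpToPowerOfTwo(clamped);
}
```

With a minimum of 4096 and a maximum of 1048576, an estimate of 1000 becomes
4096 and an estimate of 5000 becomes 8192 under these assumptions.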

Change-Id: I5c13c200a0f1855f38a5da50ca34a737e741868b
Reviewed-on: http://gerrit.cloudera.org:8080/2966
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-05-12 23:06:35 -07:00
Alex Behm
14cdb0497c IMPALA-2736: Optimized ReadValueBatch() for Parquet scalar column readers.
This change builds on top of the recent move to column-wise
materialization of scalar values in the Parquet scanner.

The goal of this patch is to improve the scan efficiency, and
show the future direction for all column readers.

Major TODO:
The current patch has minor code duplication/redundancy,
and the new ReadValueBatch() departs from (but improves) the
existing column reader control flow. To improve code reuse
and readability we should overhaul all column readers to be
more uniform.

Summary of changes:
- refactor ReadValueBatch() to simplify control flow
- introduce caching of def/rep levels for faster level
  decoding, and for a tighter value materialization loop
- new templated function for value materialization that
  takes the value encoding as a template argument

Mini benchmark vs. cdh5-trunk
I ran the following queries on a single impalad before and after my
change using a synthetic 'huge_lineitem' table.
I modified hdfs-scan-node.cc to set the number of rows of any row
batch to 0 to focus the measurement on the scan time.

Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;

select * from huge_lineitem;
Before: 22.39s
After:  13.62s

select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After:  17.73s

select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After:  16.68s

select l_linenumber from huge_lineitem;
Before: 1.74s
After:  0.92s

Testing:
I ran a private exhaustive build and all tests passed.

Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Reviewed-on: http://gerrit.cloudera.org:8080/2843
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2016-05-12 14:18:05 -07:00
Lars Volker
df8bf3a965 IMPALA-3490: Add flag to reduce minidump size
IMPALA-2686 added the breakpad library to all impala daemons, thus enabling them
to write minidump files. This change introduces a flag
'minidump_size_limit_hint_kb', which causes breakpad to reduce the amount of
thread stack memory it includes in a minidump, aiming to reduce the minidump
size during crashes with a lot of threads. Once a minidump is expected to
exceed the configured value, breakpad will include the full stack memory for the
first 20 threads, and afterwards capture only 2KB of stack memory for each
additional thread.
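
To get a feel for the effect, here is a rough estimate of the stack memory
that ends up in a limited minidump. The constant and function names are
hypothetical (this is not breakpad's API); the values 20 and 2KB come from the
description above. Taking the smaller of 2KB and the real stack size for the
additional threads is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Values from the description above; the names are hypothetical.
constexpr std::size_t kFullStackThreads = 20;
constexpr std::size_t kReducedStackBytes = 2 * 1024;

// Estimates the bytes of thread stack memory written once the size limit is
// expected to be exceeded. 'stack_sizes' holds the full stack size of each
// thread, in the order the threads are written to the minidump.
inline std::size_t LimitedStackBytes(const std::vector<std::size_t>& stack_sizes) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < stack_sizes.size(); ++i) {
    if (i < kFullStackThreads) {
      total += stack_sizes[i];  // first 20 threads: full stack
    } else {
      total += std::min(stack_sizes[i], kReducedStackBytes);  // others: 2KB cap
    }
  }
  return total;
}
```

For 25 threads with 8KB of stack each, this gives 20 * 8192 + 5 * 2048 =
174080 bytes instead of 204800.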

Change-Id: I2f3aa0df51be9f0bf0755fb288702911cdb88052
Reviewed-on: http://gerrit.cloudera.org:8080/2990
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00
Dimitris Tsirogiannis
eaa3926452 IMPALA-3502: Fix race in the coordinator while updating filter routing table
This commit fixes an issue where a fragment may send an
UpdateFilter message to the coordinator while the latter is still
updating the runtime filter routing table. The fix is to decouple the
update of the filter routing table from starting the fragments. With
this fix, the coordinator will finish populating the filter routing
table before it starts any remote fragments.

Change-Id: Iecc737106fd38aa4af0c72959a577adfb413728d
Reviewed-on: http://gerrit.cloudera.org:8080/3018
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00
Sailesh Mukil
02d3e93774 IMPALA-3453: S3: Uneven split sizes are generated for Parquet causing execution skew
Previously the Parquet file format was considered by us as a
non-splittable file format. However, we have since done some work on
our parquet scanner that will assign row groups based on the split
that contains them. This allows for us to chop up a parquet file into
multiple splits and still have the file be scanned reliably.

This patch makes us treat Parquet as a splittable file
format, which now allows synthesizeBlockMetadata() to split a parquet
file on S3 into multiple "blocks" instead of assigning one scan range
per file, so that there is an even distribution of scan ranges across
the cluster, hence minimizing skew greatly.

P.S: To control the size of scan ranges for splittable files on S3,
you can change the default "block" size for the S3A filesystem which
is governed by "fs.s3a.block.size". Its default value is 32MB.
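
As an illustration of the splitting idea only: the function below is a
hypothetical sketch and not the real synthesizeBlockMetadata(). It cuts a file
into fixed-size "blocks", each of which could back one scan range.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: returns (offset, length) pairs that cover a file of
// 'file_len' bytes with ranges of at most 'block_size' bytes each.
inline std::vector<std::pair<int64_t, int64_t>> SplitIntoBlocks(
    int64_t file_len, int64_t block_size) {
  std::vector<std::pair<int64_t, int64_t>> blocks;
  if (block_size <= 0) return blocks;  // nothing sensible to do
  for (int64_t offset = 0; offset < file_len; offset += block_size) {
    // The last block may be shorter than block_size.
    blocks.emplace_back(offset, std::min(block_size, file_len - offset));
  }
  return blocks;
}
```

With the 32MB default mentioned above, a 100MB file yields four ranges: three
of 32MB and a final one of 4MB.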

Change-Id: Ib1518ad0c89ef35a3b0567c3902e85a41e34bc3d
Reviewed-on: http://gerrit.cloudera.org:8080/2968
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:04 -07:00
Tim Armstrong
1c704f3cfd IMPALA-3166: basic perf support and asm dumps for codegened code
Adds support for communicating function-level symbols to perf by writing
/tmp/perf-<pid>.map if the --perf_map=true argument is set. Perf must
be run under the same user as Impala. I.e. 'sudo perf top' does not
work. To get perf to work under a non-root user you will probably need
to disable some kernel security features that perf complains about:

sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c 'echo 0 > /proc/sys/kernel/kptr_restrict'

Once you get it working you should see IR function names concatenated with
the fragment instance id in 'perf top'. 'perf annotate' does not work.
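
For reference, perf's map file for JIT-compiled code is plain text with one
symbol per line: start address, size, and symbol name, with the two numbers in
hexadecimal without a 0x prefix. A minimal sketch of formatting one such line
(the helper name is hypothetical and this is not the code added by the patch):

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Formats one perf map entry: "START SIZE symbolname\n", where START and
// SIZE are lowercase hexadecimal without a 0x prefix.
inline std::string PerfMapLine(uint64_t start, uint64_t size,
                               const std::string& symbol) {
  std::ostringstream out;
  out << std::hex << start << ' ' << size << ' ' << symbol << '\n';
  return out.str();
}
```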

Implements --asm_module_dir, analogous to --opt_module_dir. We dump
disassembly to files there. Debug symbols are interleaved with the
assembly if they are available. I enabled them for the debug
build, now that we have some purpose for them.  In some cases
it would be useful to have them for the release build, but
they make the llvm module much larger so I haven't enabled them
there.

The asm dump for a random exception constructor looks like this:

Disassembly for __cxx_global_var_init.165:324bc8754182e7c6:22735c36d7a2bc0 (0x7f50f2140300):
        date_facet.hpp:date_facet.hpp:<invalid>:363:0
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
0:              movabsq $0, %rax
10:             movb    (%rax), %cl
12:             cmpb    $0, %cl
15:             jne     17
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
17:             movabsq $0, %rax
27:             movq    $1, (%rax)
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
34:             retq

Change-Id: If25de61e46f4db005956686cddbd4d71a1424528
Reviewed-on: http://gerrit.cloudera.org:8080/2793
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:03 -07:00
Skye Wanderman-Milne
8e8df2f2f6 IMPALA-2548: Codegen Tuple::MaterializeExprs() and use in TopN node
For the following benchmark query:
 select count(*) from (select l_orderkey from biglineitem order by l_orderkey limit 1000) a

The overall query time goes from 2.74s to 1.74s, with the top-n node
time going from 2.2s to 1.0s. There is no effect on sort node time.

The overall approach of this patch is to move the
TopNNode::InsertTupleRow() call into a cross-compiled batched function
(InsertBatch()), and then replace the MaterializeExprs() calls with
new functions built using the IRBuilder. This involves new codegen
utilities, such as CodegenAnyVal::WriteToSlot() and the ability to
hardcode in a MemPool pointer from which to make varlen data
allocations. This patch also adds a new timer measuring the time spent
inserting tuple rows.

The existing TestQueries::test_top_n and TestQueries::test_sort tests
pass with this patch.

Change-Id: Ib422a8d50303c21c6a228675157bf867e8619444
Reviewed-on: http://gerrit.cloudera.org:8080/1901
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:03 -07:00
Sailesh Mukil
da49a37edc IMPALA-3517: S3: Get rid of hdfsDelete() warning message
In the patch for IMPALA-3452, a bug was introduced which does an
unnecessary call to hdfsDelete() on the staging directory of a
partition(s), when the user selects the option of skipping the
staging step. Also, this delete will not happen when it should happen,
e.g. on HDFS partitions. (Although it will be deleted by the
coordinator later).

This patch fixes this bug by making sure that we delete the staging
directory when it is necessary and skip the call to delete when we
do not want it.

Change-Id: I0a81ba0abfc24ee56689211579f46ac353e98adb
Reviewed-on: http://gerrit.cloudera.org:8080/3019
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:03 -07:00
Tim Armstrong
5c56ec0997 Fix some ASAN compile warnings and remove redundant flags
Change-Id: I7b2772d917449ca747820641c56e65545f610b23
Reviewed-on: http://gerrit.cloudera.org:8080/3025
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:02 -07:00
Alex Behm
1c19c232f3 IMPALA-3491: Use unique_database fixture in test_views_compatibility.
Testing: Ran the test locally. It was already possible to run the
test in parallel before.

Change-Id: I68a1349276c90a42c238bed40a1c7c221199a67a
Reviewed-on: http://gerrit.cloudera.org:8080/3009
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:02 -07:00
Matthew Jacobs
ca3f911f1a IMPALA-3500: GetEffectiveUser() segfault using -enable_rm
A previous change to improve admission control broke Llama
integration. The Coordinator's runtime_state has a partial
TQueryCtx, it doesn't have the TSessionState which is used
by RuntimeState::effective_user(), so calling that in the
SimpleScheduler caused the process to crash.

This is easily fixed by using the TSessionState from the
schedule (which is what the Admission Control code was
doing).

Llama integration is no longer supported, so this was only
tested manually.

Change-Id: Ia04b6aec35ae794d7062fd32104f3964f397f00c
Reviewed-on: http://gerrit.cloudera.org:8080/3016
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:02 -07:00
Jim Apple
18f8f46be2 IMPALA-3507: Use toolchain linker only if using gold
This is a workaround for extremely slow linking when not using gold.

Change-Id: I822a78642993e95abc279944f454fdf67dd8e1d5
Reviewed-on: http://gerrit.cloudera.org:8080/3014
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:01 -07:00
Bharath Vissapragada
3092c96619 IMPALA-2660: Respect auth_to_local configs from hdfs configs
This patch implements a new feature to read the auth_to_local
configs from hdfs configuration files, using the parameter
hadoop.security.auth_to_local. This is done by modifying the
User#getShortName() method to use its hdfs equivalent.

This patch includes an end to end authorization test using
sentry where we add specific auth_to_local setting for a certain
user and test if the sentry authorization passes for this user
after applying these rules. Given we don't have tests that run
on a kerberized mini-cluster, this patch adds a hack to load this
configuration even during non-kerberized 'test runs'.

However this feature is disabled by default to preserve the
existing behavior. To enable it,

1. Use kerberos as authentication mechanism (by setting --principal) and
2. Add "--load_auth_to_local_rules=true" to the cluster startup args

Change-Id: I76485b83c14ba26f6fce66e5f83e8014667829e0
Reviewed-on: http://gerrit.cloudera.org:8080/2800
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:01 -07:00
Alex Behm
c0ee93bbbe IMPALA-3491: Use unique_database fixture in test_recover_partitions.py.
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.

Change-Id: I5be5fa5d20bc6ed5b7830e0ce90201431d6aa008
Reviewed-on: http://gerrit.cloudera.org:8080/3003
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Sailesh Mukil
27815818b9 IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup
INSERTs on S3 are slower because of double buffering where we buffer
once locally and once in a staging directory in S3 before moving the
file(s) to the final location. Also, moving the file from the staging
directory to the final location in HDFS is a quick rename which is
only a metadata operation. However, on S3, renames are not supported,
thus becoming a full file copy instead of just a metadata rename
operation.

This patch introduces a boolean query option "s3_skip_insert_staging"
which avoids the staging step on S3 and allows the sinks to write to
the final location directly.

This trades in consistency for the sake of performance. If a node(s)
fails during the query, then we will end up with inconsistent results
in the final location.

P.S: This option is disabled for INSERT OVERWRITE queries as that
would require cleaning the destination directory before moving the
final files there. However, the coordinator is responsible for the
cleaning which takes place only after the table sinks have moved
the files to the final location. Thus, INSERT OVERWRITE queries must
still have their files moved to a staging location by the table sinks.

Performance gains:
 - For non-partitioned tables, the INSERT queries run 4-4.5x faster on
   S3. (Tested on a 63GB INSERT to a table)
 - For heavily partitioned tables, there is considerable improvement
   in the order of 4-5 minutes on queries that take ~27 minutes but
   queries are still slow because of IMPALA-3482 where the catalog
   takes too long to update all the metadata. (Tested with a query
   that creates 2.4K partitions in a table totalling ~19GB).

Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Reviewed-on: http://gerrit.cloudera.org:8080/2905
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Alex Behm
616eb2fcce IMPALA-3491: Use unique_database fixture in test_partition_metadata.py
Also changes the test to use beeline instead of Hive -e
for the portions executed in Hive because beeline is
significantly faster.

Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.

Change-Id: I70d87941fbfc30f41e1c6fcfee8d8c7f16b88831
Reviewed-on: http://gerrit.cloudera.org:8080/2962
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Jim Apple
30cc3f5ae0 IMPALA-3484: Use the linker from the toolchain
Change-Id: Idf1db88c48ad18521d69903838d33cdb6f45b0fa
Reviewed-on: http://gerrit.cloudera.org:8080/2983
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:00 -07:00
Alex Behm
12097a0707 IMPALA-3491: Use unique_database fixture in test_hidden_files.py.
Testing: Tested the changes locally by running them in a loop
10 times. Also did a private core/hdfs run.

Change-Id: I37e1528c02e598f3fb2d673b6559d55a34bf79b4
Reviewed-on: http://gerrit.cloudera.org:8080/3002
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:59 -07:00
Alex Behm
96e18f9e62 IMPALA-3491: Use unique_database fixture in test_stale_metadata.py
Testing: I ran the test 10 times in a loop locally and ran
a private core/hdfs run.

Change-Id: Ibd058853e6b48671838e5b51611b6c34a7a8d39d
Reviewed-on: http://gerrit.cloudera.org:8080/2982
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:59 -07:00
Casey Ching
07bdb6d484 Add .impala_compiler_opts to .gitignore
Change-Id: I164a077a91fcbe2cd445637ce958e91082bd56e0
Reviewed-on: http://gerrit.cloudera.org:8080/3012
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
2016-05-12 14:17:58 -07:00
Sailesh Mukil
0f1dd55c79 IMPALA-3488 (follow up): test_ddl.py failure on LocalFS run
There was another test which used the hdfs_client and which was not
skipped for localFS. It should never have run on localFS, but it
did not fail earlier for the same reasons as mentioned in the previous
patch and in the JIRA. Marking as SkipIfLocal.

Change-Id: I3436e80ccd380ecc5f5d28053b3563db2319f9e9
Reviewed-on: http://gerrit.cloudera.org:8080/2991
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:58 -07:00
Jim Apple
b204d5dea3 IMPALA-3375: Improve TopN performance with a trivial Compare object.
The C++ standard requires that priority_queue operations behave as
wrappers to {push,pop,make,sort}_heap, which take their comparator
object by value. This is expensive for objects like
TupleRowComparator.

This patch creates a wrapper type that is trivial to copy. It also
renames the operator()s of stateful comparator types to prevent their
accidental use in STL functions that take comparators by value.
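
The idea can be sketched as follows. All names here are hypothetical stand-ins
rather than the types touched by the patch: HeavyComparator plays the role of
a stateful comparator such as TupleRowComparator, and ComparatorRef is the
cheap wrapper handed to the STL.

```cpp
#include <cassert>
#include <queue>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical heavyweight comparator. Its comparison method is named Less()
// rather than operator(), so it cannot be passed to an STL algorithm (and
// copied by value) by accident.
class HeavyComparator {
 public:
  explicit HeavyComparator(std::vector<int> weights)
      : weights_(std::move(weights)) {}
  bool Less(int lhs, int rhs) const { return weights_[lhs] < weights_[rhs]; }

 private:
  std::vector<int> weights_;  // state that is expensive to copy
};

// Wrapper that holds only a pointer, so the copies made by the heap
// operations cost no more than copying a pointer.
struct ComparatorRef {
  const HeavyComparator* comp;
  bool operator()(int lhs, int rhs) const { return comp->Less(lhs, rhs); }
};
static_assert(std::is_trivially_copyable<ComparatorRef>::value,
              "wrapper must stay trivially copyable");

// Returns the index in [0, n) with the largest weight. Requires n > 0 and
// n no larger than the number of weights held by 'comp'.
inline int IndexOfLargestWeight(const HeavyComparator& comp, int n) {
  assert(n > 0);
  std::priority_queue<int, std::vector<int>, ComparatorRef> heap(
      ComparatorRef{&comp});
  for (int i = 0; i < n; ++i) heap.push(i);
  return heap.top();
}
```

The wrapper must not outlive the comparator it points to, which is the price
paid for the cheap copies.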

This speeds up primitive_topn_bigint by 39% locally with scale factor
13 and 31% on the 16-node with scale factor 300. It speeds up
primitive_top-n_all by 13% locally and 7% on the 16-node.

Change-Id: I24755227b5bbbca6ad7c7d31d9bb8e132ca89e11
Reviewed-on: http://gerrit.cloudera.org:8080/2936
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:58 -07:00
Alex Behm
bff194ce17 IMPALA-3491: Use unique_database fixture in test_col_stats.py.
The patch also addresses a TODO asking for
test_col_stats.py to be merged into test_compute_stats.py

Testing: I ran the test by itself in a loop 10 times,
and the whole test_compute_stats.py locally. Also did
a private core/hdfs run.

Change-Id: I88aa77464a95993c018e19a52eeb496d7c3eef08
Reviewed-on: http://gerrit.cloudera.org:8080/2963
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:58 -07:00
Dimitris Tsirogiannis
5cae398a48 IMPALA-3133: Wrong privileges after a REVOKE ALL ON SERVER statement
This commit fixes an issue where a GRANT ALL ON SERVER to role_name statement
followed by a REVOKE ALL ON SERVER from role_name statement would not revoke all
privileges from role_name. The problem was triggered by a specific
combination of Sentry client API calls used in Impala during
grant/revoke statements at server scope. In particular, during GRANT, Impala was using
an API call that didn't explicitly specify the privilege action (Sentry uses '*' if
no action is specified). In contrast, the corresponding REVOKE call was explicitly
specifying the privilege action to be 'ALL'. Sentry doesn't seem to
handle this case correctly, thereby failing to remove all the privileges
after a REVOKE ALL ON SERVER call. The fix from the Impala side, that
results in the correct behavior, is to always specify the privilege
action by using the appropriate API calls.

Change-Id: I6b3a0d10f5e88c6a0a10bd20f620562d2de7ab25
Reviewed-on: http://gerrit.cloudera.org:8080/2979
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Sailesh Mukil
41e31439d3 IMPALA-3488: test_ddl.py failure on LocalFS run
Our test_ddl.py always had a bug where in the _cleanup() function,
we used the hdfs_client on local FS runs. It always ended up passing
because we caught generic exceptions in hdfs_client.delete_file_dir()
while checking if a file existed which always caused the test to pass.

With the introduction of an hdfs_client.exists() function in
IMPALA-1878 which catches only the right FileNotFound exception, this
bug was exposed causing our local FS runs to fail.

This patch returns from the _cleanup() function if it's a local FS run
because the directories of the tables it cleans up are not used in
these runs.

Change-Id: Ie0c9eec31a90e8f66102d18d900c613bd1306968
Reviewed-on: http://gerrit.cloudera.org:8080/2980
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Taras Bobrovytsky
2bb98a08c0 IMPALA-3155: Disable implicit casting of CHAR to STRING in CASE statements
All CHAR children of a CASE expression were automatically implicitly
cast to STRING. This commit removes this behavior for the THEN clause of
a CASE statement, i.e., if all THEN exprs of a CASE expression are CHAR,
then the type of the CASE expression will be CHAR.

Change-Id: I4aebac6849898693570bc3164fff40786c215358
Reviewed-on: http://gerrit.cloudera.org:8080/2762
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Dan Hecht
a0d4249652 IMPALA-3337: fix "Cancelled" warnings when LIMIT clause is specified
The cancelled status is propagated in scanner threads to cause them to
shut down once the limit has been satisfied, but depending on the code
path and when abort_on_error=false, this internal status would sometimes
incorrectly end up in the error log. Fix this by factoring out the
abort_on_error handling code so that it's handled more consistently
across scanners. Parquet, RC, and Avro all suffered from this bug.

Testing: exhaustive

Change-Id: I4a91a22608e346ca21a23ea66c855eae54bbced6
Reviewed-on: http://gerrit.cloudera.org:8080/2964
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:57 -07:00
Marcel Kornacker
3b7d5b7c17 MT: Planner for multi-threaded execution
New classes:
- ParallelPlanner: creates build plans, assigns plans to cohorts
- JoinBuildSink: DataSink for plan fragments that materialize build sides
- ids for plans, hash tables, plan fragments

Tests: this adds a new test file section PARALLELPLANS and augments the tpc-h/-ds tests with
those sections.

In the interest of keeping this patch small I didn't augment other test files with that
section yet (which will happen at a later date, to cover more corner cases).

Change-Id: Ic3c34dd3f9190a131e6f03d901b4bfcd164a5174
Reviewed-on: http://gerrit.cloudera.org:8080/2846
Tested-by: Internal Jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2016-05-12 14:17:56 -07:00
Tim Armstrong
8e64273fee Fix Kudu hole punch check to work if /tmp is on different fs
/tmp isn't necessarily on the same filesystem as the Kudu data
directory. Fix the check so that it checks the actual Kudu directory.

Change-Id: Ic6aa27569a0650db7dcf5759952cd50c8e47f8c9
Reviewed-on: http://gerrit.cloudera.org:8080/2967
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:56 -07:00
Henry Robinson
5454086c74 IMPALA-3443: Replace BOOST_FOREACH with ranged for()
This patch doesn't use 'auto' for the loop index type, as it's not clear
yet where the savings in typing outweigh the cost of eliding the type.
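
A small before/after sketch of the kind of rewrite involved. The function is a
made-up example, not code from the patch; it keeps the element type spelled
out instead of using 'auto', in line with the choice described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before (requires <boost/foreach.hpp>):
//   BOOST_FOREACH(const std::string& name, names) { total += name.size(); }
//
// After: a range-based for loop with an explicit element type.
inline std::size_t TotalLength(const std::vector<std::string>& names) {
  std::size_t total = 0;
  for (const std::string& name : names) total += name.size();
  return total;
}
```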

Change-Id: Iae1ca36313e3562311b6418478bf54b6d9b0bf7d
Reviewed-on: http://gerrit.cloudera.org:8080/2890
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-05-12 14:17:56 -07:00
Sailesh Mukil
b5d77f43f4 IMPALA-3472: hdfs-util-test failing on local FS
Our GetFilesystemNameLength() function had a special case for
locations starting with "file:". However, if a scheme had only one
'/' in the scheme delimiter (i.e. ':/' vs '://'), we would return '0'
as the length for that URI.

This is exactly what one of our test cases in hdfs-util-test was
doing. I think it's safe to say we will never get a scheme with only
one following '/' for schemes other than "file:", so I've changed that
testcase.

Change-Id: I0f1232345c62db48575530785c79c0ffe77c2772
Reviewed-on: http://gerrit.cloudera.org:8080/2958
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:55 -07:00
Tim Armstrong
34c95c9590 IMPALA-2345,2991: test coverage for spilling and sorts
Add missing coverage for sorting by CHAR and VARCHAR.

Add more coverage for spilling sorts.

Fix spilling tests: ensure that they actually reliably spill (many of
them had memory limits high enough that they could run entirely in
memory).

I ran this in a loop for a while to flush out flaky tests. The tests
should be fairly predictable given that they're not run concurrently
with other tests and we allocate enough block manager memory so that
each operator can obtain its reservation.

Change-Id: Ia2d2627a2c327dcdf269ea3216385b1af9dfa305
Reviewed-on: http://gerrit.cloudera.org:8080/2877
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:55 -07:00
Henry Robinson
a805e100b2 IMPALA-3397: Source query files from shell.
This patch allows you to write SOURCE <file> or SRC <file>, and have the
shell read the file and execute all the queries in it.
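
A minimal sketch of the idea, assuming statements are separated by ';' and
that the caller supplies an execute callback. Both the function name and the
naive splitting are hypothetical; a real shell also has to cope with
semicolons inside string literals and comments.

```python
def source_queries(path, execute):
    """Hypothetical sketch: run every statement found in a query file."""
    with open(path) as query_file:
        text = query_file.read()
    results = []
    # Naive split on ';'. Quoted semicolons and comments are not handled.
    for statement in text.split(";"):
        statement = statement.strip()
        if statement:  # skip the empty pieces left by trailing delimiters
            results.append(execute(statement))
    return results
```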

Change-Id: Ib05df3e755cd12e9e9562de6b353857940eace03
Reviewed-on: http://gerrit.cloudera.org:8080/2663
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:54 -07:00
Kapil Rastogi
e1c5959b4d Reuse session for executing queries (Hive on Spark)
Change-Id: I06c798dc311d63eb0a875450fd26d06db4e84a03
Reviewed-on: http://gerrit.cloudera.org:8080/2374
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:54 -07:00
Sailesh Mukil
3093054e95 IMPALA-3460: test_grant_revoke: remove S3-specific workload
Now that Impala functionally supports writes to S3, test_grant_revoke
should no longer have a special case for S3. Until this patch, the S3
variant ran the test without INSERTs.

Change-Id: Id981e7f83bf86b32d1a5b267ad3781db02337e86
Reviewed-on: http://gerrit.cloudera.org:8080/2949
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:54 -07:00
Thomas Tauber-Marshall
8c2bf9769a IMPALA-2805: Order conjuncts based on selectivity and cost
Added costs to all Exprs, which estimate the relative cost of evaluating
an expression and all of its children. Costs are calculated during
analysis. For now, these costs are intended as a simple way to order
expressions from cheap to expensive, not necessarily to be a precise
reflection of running times.

In general, expressions that deal with variable length types like strings
will have higher cost than those dealing with fixed length types
like numbers and booleans. Additionally, expressions with complicated
subexpressions will have higher cost than simpler expressions.

Also added PlanNode.orderConjunctsByCost, which takes a list of Exprs and
returns a new list sorted according to an estimate of the cheapest order to
evaluate the conjuncts in, based on their cost and selectivity.

The conjuncts are sorted by repeatedly iterating over them and choosing the
conjunct that would result in the least total estimated work were it to be
applied before the remaining conjuncts. Selectivities are exponentially
backed off, and Exprs without selectivity estimates are given a reasonable
default.
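
The greedy ordering described above can be sketched as follows. This is an
illustration in Python, not the planner code: the work estimate (own cost
plus the remaining conjuncts' cost scaled by the backed-off selectivity), the
1/(k+1) backoff exponent, and the default selectivity of 0.1 are all
assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_SELECTIVITY = 0.1  # assumed default for conjuncts without an estimate


@dataclass
class Conjunct:
    name: str
    cost: float
    selectivity: Optional[float] = None  # fraction of rows expected to pass


def order_conjuncts_by_cost(conjuncts: List[Conjunct]) -> List[Conjunct]:
    """Hypothetical sketch of a greedy cheapest-total-work ordering."""
    remaining = list(conjuncts)
    ordered = []
    while remaining:
        # Back off selectivities exponentially: the k-th pick (0-based)
        # uses selectivity ** (1 / (k + 1)).
        backoff_exp = 1.0 / (len(ordered) + 1)
        total_cost = sum(c.cost for c in remaining)

        def work_if_first(c):
            sel = DEFAULT_SELECTIVITY if c.selectivity is None else c.selectivity
            # This conjunct runs on all rows; the rest run on the survivors.
            return c.cost + (total_cost - c.cost) * sel ** backoff_exp

        best = min(remaining, key=work_if_first)
        ordered.append(best)
        remaining.remove(best)
    return ordered
```

With equal selectivities the sketch puts the cheaper conjunct first. An
expensive but highly selective conjunct can still be placed ahead of a cheap
one when it prunes enough rows to pay for itself.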

Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034
Reviewed-on: http://gerrit.cloudera.org:8080/2598
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:53 -07:00
Tim Armstrong
6e89f1a250 Add ninja support for faster incremental builds
Ninja resolves dependencies much faster, so if only a couple of files
are changed "ninja -j ${IMPALA_BUILD_THREADS} impalad" returns within a
second or two, while make can take tens of seconds to resolve all the
dependencies.

This requires ninja to be installed. It is widely available, e.g. in the
ninja-build package on Ubuntu.

Ninja can be enabled by passing "-ninja" to buildall.sh or
make_impala.sh. The same targets should work as with make.

The default Ninja status output is fairly terse. It can be customised
with an environment variable. E.g. I have

export NINJA_STATUS="[%u to run/%r running/%f finished] "

Also fixes a bug in make_impala.sh where invalid arguments were ignored.

Change-Id: I2cea479615fe850c98d30110de043ecb6358dcda
Reviewed-on: http://gerrit.cloudera.org:8080/2923
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-05-12 14:17:53 -07:00
Lars Volker
c9df348c38 IMPALA-2686: Add breakpad crash handler to all daemons
This change adds breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
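
As a rough illustration of the rotation policy (not the daemon code, which
is C++), the Python sketch below keeps the newest files and removes the rest.
The function name, the ".dmp" suffix filter, and treating a non-positive
limit as "keep everything" are assumptions.

```python
import os


def rotate_minidumps(minidump_path, max_minidumps):
    """Hypothetical sketch: keep only the newest max_minidumps dump files."""
    # An empty path means minidumps are disabled, so there is nothing to do.
    # Assumption: a non-positive limit means no rotation.
    if not minidump_path or max_minidumps <= 0:
        return []
    dumps = [
        os.path.join(minidump_path, name)
        for name in os.listdir(minidump_path)
        if name.endswith(".dmp")
    ]
    # Newest first, so everything past the limit is the oldest surplus.
    dumps.sort(key=os.path.getmtime, reverse=True)
    removed = dumps[max_minidumps:]
    for path in removed:
        os.remove(path)
    return removed
```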

Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:52 -07:00
Lars Volker
05acec5b00 IMPALA-2918: Unit test framework for simple scheduler
The SimpleScheduler class is currently hard to change because of the lack of
comprehensive tests. This change adds support classes to make writing tests
easier.

The overall testing approach looks like this: Each test builds a list of hosts,
a physical schema, and a plan, to all of which elements can be added using
various helper methods. Then scheduling can be tested by instantiating
SchedulerWrapper and calling Compute(...). The result can be verified using a
set of helper methods. There are also helper methods to modify the internal
state of the scheduler between subsequent calls to SchedulerWrapper::Compute().

The model currently comes with some known limitations:

- Files map 1:1 to blocks and to scan ranges.
- All files have the same size (i.e., 1MB). Tables that differ in size can be
  expressed as having a different number of files.
- We don't support multiple backends on a single host.
- Ports are assigned to hosts automatically and are not configurable by the
  test.

Change-Id: Ia7aee9b16067e8728a8e96d4def3e568ad21f4bf
Reviewed-on: http://gerrit.cloudera.org:8080/2431
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:52 -07:00
Henry Robinson
f6fcee9a7a IMPALA-3462: Fix exec option text for old HJ w/ runtime filters
Change-Id: I737e261ce251b05dd89bce939ad5df8d95d39b61
Reviewed-on: http://gerrit.cloudera.org:8080/2933
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:51 -07:00
Michael Ho
e32720e022 IMPALA-3286: Software prefetching for hash table build.
This change pipelines the code which builds the hash table.
This is based on the idea which Mostafa presented earlier.
Essentially, the pipelined code will first evaluate all the
rows to be inserted, compute their hash values and prefetch
the corresponding hash table buckets before going through
all the rows again to insert them into the hash table. This
change also introduces lazy evaluation of the build-side
expression in Equals(), so that the build-side expression is
not evaluated a second time when the hash table bucket is
empty or the hash doesn't match due to a collision.

With this change, the hash table build time of a self-join
with lineitem reduces by more than half (going from 10.5s to 4.5s).
The overall query time drops from 37.28s to 31.15s (~16% reduction).

select count(*) from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
o1.l_linenumber = o2.l_linenumber

TPCH(15) also improves by 2.5% overall, with certain queries
improving up to 8%:

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(15) | parquet / none / none | 14.34   | -2.49%     | 9.36       | -1.65%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(15) | TPCH-Q1  | parquet / none / none | 8.44   | 8.05        |   +4.92%   |   2.89%   |   1.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q11 | parquet / none / none | 1.85   | 1.76        |   +4.86%   |   3.88%   |   3.93%        | 1           | 10    |
| TPCH(15) | TPCH-Q2  | parquet / none / none | 2.90   | 2.78        |   +4.41%   |   8.68%   | * 15.78% *     | 1           | 10    |
| TPCH(15) | TPCH-Q19 | parquet / none / none | 39.46  | 38.53       |   +2.40%   |   2.21%   |   2.23%        | 1           | 10    |
| TPCH(15) | TPCH-Q16 | parquet / none / none | 1.90   | 1.86        |   +1.81%   |   2.54%   |   2.74%        | 1           | 10    |
| TPCH(15) | TPCH-Q15 | parquet / none / none | 5.50   | 5.43        |   +1.32%   |   2.62%   |   3.34%        | 1           | 10    |
| TPCH(15) | TPCH-Q6  | parquet / none / none | 3.03   | 3.01        |   +0.61%   |   3.54%   |   2.14%        | 1           | 10    |
| TPCH(15) | TPCH-Q17 | parquet / none / none | 31.22  | 31.13       |   +0.29%   |   0.32%   |   0.49%        | 1           | 10    |
| TPCH(15) | TPCH-Q14 | parquet / none / none | 3.63   | 3.64        |   -0.21%   |   2.22%   |   2.70%        | 1           | 10    |
| TPCH(15) | TPCH-Q12 | parquet / none / none | 3.88   | 3.89        |   -0.31%   |   1.90%   |   1.82%        | 1           | 10    |
| TPCH(15) | TPCH-Q7  | parquet / none / none | 26.25  | 26.64       |   -1.50%   |   2.30%   |   2.40%        | 1           | 10    |
| TPCH(15) | TPCH-Q20 | parquet / none / none | 6.26   | 6.42        |   -2.45%   |   1.44%   |   1.81%        | 1           | 10    |
| TPCH(15) | TPCH-Q9  | parquet / none / none | 30.56  | 31.43       |   -2.77%   |   0.41%   |   0.64%        | 1           | 10    |
| TPCH(15) | TPCH-Q13 | parquet / none / none | 13.53  | 13.94       |   -3.00%   |   1.02%   |   0.50%        | 1           | 10    |
| TPCH(15) | TPCH-Q8  | parquet / none / none | 24.93  | 25.76       |   -3.22%   |   0.95%   |   1.00%        | 1           | 10    |
| TPCH(15) | TPCH-Q10 | parquet / none / none | 6.58   | 6.89        |   -4.50%   |   1.37%   |   1.24%        | 1           | 10    |
| TPCH(15) | TPCH-Q18 | parquet / none / none | 31.44  | 33.12       |   -5.05%   |   0.50%   |   0.66%        | 1           | 10    |
| TPCH(15) | TPCH-Q21 | parquet / none / none | 31.56  | 33.55       |   -5.92%   |   4.31%   |   5.01%        | 1           | 10    |
| TPCH(15) | TPCH-Q22 | parquet / none / none | 4.17   | 4.44        |   -5.98%   |   0.59%   |   0.75%        | 1           | 10    |
| TPCH(15) | TPCH-Q5  | parquet / none / none | 14.67  | 15.66       |   -6.34%   |   8.08%   |   1.13%        | 1           | 10    |
| TPCH(15) | TPCH-Q3  | parquet / none / none | 11.25  | 12.01       |   -6.38%   |   1.17%   |   0.85%        | 1           | 10    |
| TPCH(15) | TPCH-Q4  | parquet / none / none | 12.38  | 13.49       |   -8.19%   |   1.44%   |   0.70%        | 1           | 10    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Reviewed-on: http://gerrit.cloudera.org:8080/2896
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:51 -07:00
Tim Armstrong
01baf57aa4 Enable BOOST_NO_EXCEPTIONS for codegened code
BOOST_NO_EXCEPTIONS lets us provide a handler for errors instead of
having boost throw exceptions. This lets us crash the process in a
slightly nicer way and also greatly reduces the number of static
exception objects littering the cross-compiled IR module, which helps
with codegen time.

Also turn on colour diagnostics for cross-compiled clang (it's already
enabled for ASAN clang).

Change-Id: Iaff17b502a752963346b3a2f17fc58d22e778d50
Reviewed-on: http://gerrit.cloudera.org:8080/2909
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:51 -07:00