The stress test never expected to see memory estimates on the order
of petabytes. Apparently this can happen with TPC-DS 10000, so update
the pattern.
It's not clear how to quickly write a test to catch this, because it
involves crossing language boundaries and possibly having a
massively-scaled dataset. I think leaving a comment in both places is
good enough for now.
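For illustration, the pattern now has to accept a petabyte unit. A hedged
Python sketch of the kind of regex involved (the regex and sample text are
illustrative, not the exact stress-test code):

  import re

  # The unit group must include P now that PB-scale estimates occur.
  MEM_ESTIMATE_RE = re.compile(
      r"Estimated Per-Host Requirements: Memory=(\d+(?:\.\d+)?)(P|T|G|M|K)?B")

  m = MEM_ESTIMATE_RE.search(
      "Estimated Per-Host Requirements: Memory=1.21PB VCores=1")
  print(m.groups())  # ('1.21', 'P')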
Change-Id: I317c271888584ed2a817ee52ad70267eae64d341
Reviewed-on: http://gerrit.cloudera.org:8080/9846
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
On platforms without krb5_get_init_creds_opt_set_out_ccache(),
krb5_cc_store_cred() is called to insert the newly acquired
credential into the ccache. However, there was a typo in the code
which resulted in inserting the old credential into the ccache.
This change fixes the typo to make sure the new credential is
inserted into the ccache.
Testing done: confirmed on SLES11 that the new credential
is being inserted by checking the 'auth time' of the ticket
in ccache. Impala uses a slightly different #ifdef which
explicitly checks if krb5_get_init_creds_opt_set_out_ccache()
is defined on the platform so this code path is actually
used when running Impala on SLES11.
Change-Id: I3a22b8d41d15eb1982a3fd5b96575e28edaad31c
Reviewed-on: http://gerrit.cloudera.org:8080/9840
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9842
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This commit adds the necessary tooling to automate diagnostics
collection for Impala daemons. The following diagnostics are supported:
1. Native core dump (+ shared libs)
2. GDB/Java thread dump (pstack + jstack)
3. Java heap dump (jmap)
4. Minidumps (using breakpad) *
5. Profiles
Given the required inputs, the script outputs a zip-compressed
Impala diagnostic bundle with all the diagnostics collected.
The script can be run manually with the following command.
python collect_diagnostics.py --help
* minidumps collected here correspond to the state of the Impala
process at the time this script is triggered. This is different
from collect_minidumps.py which archives the entire minidump
directory.
Change-Id: Ib29caec7c3be5b6a31e60461294979c318300f64
Reviewed-on: http://gerrit.cloudera.org:8080/9815
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
The stress test never expected to see memory estimates on the order
of petabytes. Apparently this can happen with TPC-DS 10000, so update
the pattern.
It's not clear how to quickly write a test to catch this, because it
involves crossing language boundaries and possibly having a
massively-scaled dataset. I think leaving a comment in both places is
good enough for now.
Change-Id: I08976f261582b379696fd0e81bc060577e552309
The bug is that child profiles can be re-ordered when being sent between
an executor and a coordinator. This occurs if child profile A is present
in one update, then another child profile B is inserted at a position
before A and is sent to the coordinator in a subsequent update. The
algorithm for merging profiles did not preserve the order in that case.
The algorithm is fixed to preserve order when the relative order of
child profiles is consistent between all updates.
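As a hedged illustration (the real fix is in Impala's C++ RuntimeProfile
code), an order-preserving merge over child-profile names, assuming unique
names and a consistent relative order across updates, might look like:

  def merge_child_order(existing, update):
      # Start from the coordinator's current child order.
      merged = list(existing)
      insert_idx = 0
      for child in update:
          if child in merged:
              # Known child: later new children go after it.
              insert_idx = merged.index(child) + 1
          else:
              # New child: insert right after the last matched child,
              # preserving the relative order seen in the update.
              merged.insert(insert_idx, child)
              insert_idx += 1
      return merged

  # B arrives before the already-known A in a later update:
  print(merge_child_order(["A"], ["B", "A"]))  # ['B', 'A']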
Testing:
Added a targeted unit test.
Change-Id: I230f0673edf20a846fdb13191b7a292d329c1bb8
Reviewed-on: http://gerrit.cloudera.org:8080/9749
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Updates from 5.15.0-SNAPSHOT to 5.16.0-SNAPSHOT
Cherry-picks: not for 2.x
Change-Id: I0bbd41484a491990417565ed4dceabd12e29d4c5
Reviewed-on: http://gerrit.cloudera.org:8080/9836
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
The key facts here are:
* --no-cache-dir is crucial because it prevents us pulling in a
cached package compiled with the wrong compiler.
* --no-binary takes an argument specifying the set of packages it
should apply to.
The latent bug was that we didn't provide an argument to --no-binary,
and instead it took --no-index as the argument, which was a no-op
because there are no packages of that name. IMPALA-6731 moved the
arguments around, and --no-cache-dir became the argument to
--no-binary.
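A hedged sketch of a corrected invocation (":all:", the local wheel
directory, and the package name are illustrative; the real bootstrap
passes a specific package set):

  import subprocess, sys

  cmd = [sys.executable, "-m", "pip", "install",
         "--no-cache-dir",        # never reuse a wheel built with the wrong compiler
         "--no-binary", ":all:",  # explicit argument, so it can't swallow --no-index
         "--no-index", "--find-links", "wheelhouse/",
         "some-package"]
  subprocess.check_call(cmd)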
Testing:
I could reliably reproduce the failure in my environment by deleting
infra/python/env then running a test with impala-py.test. This patch
is sufficient to solve it.
Change-Id: I118738347ca537b2dddfa6142c3eb5608c49c2e0
Reviewed-on: http://gerrit.cloudera.org:8080/9829
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Switches the default from MINICLUSTER_PROFILE=2 to
MINICLUSTER_PROFILE=3.
This change is separate from the preceding change which does
all the heavy lifting, just for convenience.
Cherry-picks: not for 2.x
Change-Id: I424657abebe1c4d6c360b81dab42c2f7b54f8a3e
Reviewed-on: http://gerrit.cloudera.org:8080/9743
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
In Australian time zones where Daylight Saving Time is used (except
LHDT), DST should end at 3am on the first Sunday of April, when the
clock is set back to 2am. However, the current time zone DB contains
the wrong DST end time for these zones. This fix sets the DST end
time to 3am for the mentioned time zones.
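The intended rule, illustrated with Python's stdlib time-zone data as an
analogy (this is not Impala's time-zone DB; requires Python 3.9+):

  from datetime import datetime
  from zoneinfo import ZoneInfo

  syd = ZoneInfo("Australia/Sydney")
  # First Sunday of April 2018: DST ends at 3am and clocks go back to
  # 2am, so the 02:00-03:00 hour occurs twice.
  first = datetime(2018, 4, 1, 2, 30, fold=0, tzinfo=syd)   # still AEDT
  second = datetime(2018, 4, 1, 2, 30, fold=1, tzinfo=syd)  # back on AEST
  print(first.utcoffset(), second.utcoffset())  # 11:00:00 10:00:00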
Change-Id: I461cd4a9057dfebfe8dd85b568cba4f1e87ad215
Reviewed-on: http://gerrit.cloudera.org:8080/9724
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Initially I didn't want to fully implement this, as the metadata
for these tables can't even be fully stored in Postgres; however
after digging into some older documentation, it appears that the
ASCII NUL character actually has been used as a field separator
in various vendors' CSV implementations.
Therefore, this patch attempts to make things as non-broken as
possible and allows \0 as a field or tuple delimiter. Collection
column delimiters are not allowed to be \0, as they genuinely may
not exist and we don't want to force special escaping on an
arbitrary character. Note that the field delimiter must be distinct
from the tuple delimiter when they both exist; if it is not, the
effect will be that there is no field delimiter (this is actually
possible with single column tables).
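A hedged sketch of the resulting rules (not Impala's actual analysis
code):

  def validate_delimiters(field_delim, tuple_delim, collection_delim):
      # \0 is now allowed as a field or tuple delimiter...
      if collection_delim == "\0":
          raise ValueError("collection delimiter may not be \\0")
      # ...but field and tuple delimiters must differ when both exist.
      if (field_delim is not None and tuple_delim is not None
              and field_delim == tuple_delim):
          raise ValueError("field and tuple delimiters must be distinct")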
Testing: Created a zero delimited table as described in the JIRA,
using MySQL backed Hive metastore; ran select * from tab_separated
on the table, updated the unit test.
Change-Id: I4b6f38cbe3f1036f60efd31a31d82d0cd8f3d2a8
Reviewed-on: http://gerrit.cloudera.org:8080/9525
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes the exit code for EE tests when no tests are
collected. After this change, the return code will be 0 if no tests
are expected to be collected (a dry run) and 1 if tests are expected
to be collected but are not, due to some error.
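The contract, as a hedged sketch (names illustrative):

  def exit_code(num_collected, is_dry_run):
      # A dry run may legitimately collect zero tests; a real run that
      # collects zero tests signals a collection error.
      if num_collected == 0 and not is_dry_run:
          return 1
      return 0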
Testing:
- Ran end-to-end shell, hs2 tests for IMPALA-5886 with debug
statements to verify the exit_codes
- Ran end-to-end shell tests with collect-only for IMPALA-4812
Change-Id: If82f974cc2d1e917464d4053563eaf4afc559150
Reviewed-on: http://gerrit.cloudera.org:8080/9494
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
This pins a version of a transitive dependency that we likely don't even
use. Doing so avoids an issue where Maven tries to talk to all
configured repositories (including those transitively picked up in
artifacts we depend on) to find the "latest" version, which can
lead to errors if those repositories are unavailable. This works
around the following message in particular:
Failed to execute goal on project impala-frontend: Could not resolve dependencies for project org.apache.impala:impala-frontend:jar:0.1-SNAPSHOT: Failed to collect dependencies at org.apache.sentry:sentry-binding-hive:jar:2.0.0-cdh6.x-SNAPSHOT -> org.apache.hive.hcatalog:hive-hcatalog-server-extensions:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.hive.hcatalog:hive-hcatalog-core:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.hive:hive-cli:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.hive:hive-service:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.hive:hive-llap-server:jar:2.1.1-cdh6.x-SNAPSHOT -> org.apache.hbase:hbase-server:jar:2.0.0-cdh6.x-SNAPSHOT -> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from/to ...
The alternative I considered was to blacklist the specific broken
dependency, but I prefer this approach as it's more specific and less
likely to cause trouble, even if left around.
I tested this by running the build.
Change-Id: If744ccca193f96e1998bbcc35403a09e0c83cc74
Reviewed-on: http://gerrit.cloudera.org:8080/9808
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Removed the refresh_after_connect option from the impala-shell options.
Removed refresh_after_connect from the INVALIDATE METADATA doc.
Cherry-picks: not for 2.x
Change-Id: I7bd49cb32a952362dcefc230d8feb1a7d6c13ea0
Reviewed-on: http://gerrit.cloudera.org:8080/9813
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
We rely on the KRPC logic to do the Kerberos authentication
when KRPC is enabled. Therefore, when FLAGS_use_krpc=true,
we must always call kudu::security::InitKerberosForServer()
to initialize the Kerberos related logic. This change makes
Impala ignore FLAGS_use_kudu_kinit=false when FLAGS_use_krpc=true.
Change-Id: Ia7086e5c9b460233e9e957f886141b3e6bba414b
Reviewed-on: http://gerrit.cloudera.org:8080/9797
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Maven's batch (or non-interactive) mode suppresses progress bar output
while Maven downloads artifacts; that output isn't generally useful.
Now that we keep Maven logs in logs/mvn/mvn.log, this makes
them slightly more tidy.
Change-Id: I5aa117272c2a86b63b0f9062099a4145324eb6fc
Reviewed-on: http://gerrit.cloudera.org:8080/9792
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds RpcContext::GetTimeReceived() which returns
the time at which the inbound call associated with the RpcContext
was received. It's helpful to make this accessible to the RPC
handlers for their own book-keeping purposes (e.g. reporting the
average dispatch latency as part of query profile in Impala).
Change-Id: I6b39c7f2ea856eccfdab8c1bb1433829e979ae13
Reviewed-on: http://gerrit.cloudera.org:8080/9796
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9807
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
This change switches to using a private pypi index url when using a
private pypi mirror. This allows running the tests without relying on the
public Python pypi mirrors.
Some packages cannot detect their dependencies correctly when they get
installed together with the dependencies in the same call to pip. This
change adds a second stage of package installation to separate these
packages from their dependencies.
It also adds a few missing packages and updates some packages to newer
versions.
Testing: Ran this on a box where I blocked DNS resolution to Python's
upstream pypi.
Change-Id: I85f75f1f1a305f3043e0910ab88a880eeb30f00b
Reviewed-on: http://gerrit.cloudera.org:8080/9798
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
Currently it uses beeswax's QueryState enum, but the TOperationState
is a superset. In order to remove dependencies on beeswax, and also
set things up for a future change to use the TOperationState explicit
CANCELED_STATE (see IMPALA-1262), migrate CLR to use TOperationState.
The intent of this change is to make no client visible change.
Change-Id: I36287eaf8f1dac23c306b470f95f379dfdc6bb5b
Reviewed-on: http://gerrit.cloudera.org:8080/9501
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-6715:
The commit "IMPALA-6551: Change Kudu TPCDS and TPCH columns to DECIMAL"
added additional decimal_v2 queries to the stress test that amount to
running the same query twice. This makes the binary search run
incredibly slowly.
- Fix the query selection. Add additional queries that weren't matching
before, like the tpcds-q[0-9]+a.test series (see the sketch after this
list).
- Add a test that will at least ensure if
testdata/workloads/tpc*/queries is modified, the stress test will
still find the same number of queries for the given workload. There's
no obvious place to put this test: it's not testing the product at
all, so:
- Add a new directory tests/infra for such tests and add it to
tests/run-tests.py.
- Move the test from IMPALA-6441 into tests/infra.
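A hedged sketch of the kind of query-file selection involved (the regex
and paths are illustrative, not the stress test's exact code):

  import os, re

  # Match both tpcds-q19.test and the previously missed tpcds-q19a.test.
  QUERY_FILE_RE = re.compile(r"tpcds-q[0-9]+a?\.test$")

  def find_query_files(workload_dir):
      for root, _, files in os.walk(workload_dir):
          for name in files:
              if QUERY_FILE_RE.search(name):
                  yield os.path.join(root, name)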
Testing:
- Core private build passed. I manually looked to make sure the moved
and new tests ran.
- Short stress test run. I checked the runtime info and saw the new
TPCDS queries in the JSON.
- While testing on hardware clusters downstream, I noticed...
IMPALA-6736:
TPC-DS Q67A is 10x more expensive to run without spilling than any
other query. I fixed the --filter-query-mem-ratio option to work. This
will still run Q67A during the binary search phase, but if a cluster
is too small, the query will be skipped.
Change-Id: I3e26b64d38aa8d63a176daf95c4ac5dee89508da
Reviewed-on: http://gerrit.cloudera.org:8080/9758
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds support for the multi-space separator and the 'T'
separator between the date and time components of a datetime string
in the timestamp format during a cast (x as timestamp).
Testing:
Added valid and invalid tests to expr-test.cc to validate
the functionality during a cast.
Change-Id: Id2ce3ba09256b3996170e42d42d49d12776cab97
Reviewed-on: http://gerrit.cloudera.org:8080/9725
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch allows executing CREATE statements by granting CREATE
privilege.
These are the new GRANT/REVOKE statements introduced at server and
database scopes.
GRANT CREATE on SERVER svr TO ROLE testrole;
GRANT CREATE on DATABASE db TO ROLE testrole;
REVOKE CREATE on SERVER svr FROM ROLE testrole;
REVOKE CREATE on DATABASE db FROM ROLE testrole;
Testing:
- Ran front-end tests
Cherry-picks: not for 2.x
Change-Id: Id540e78fc9201fc1b4e6cac9b81ea54b8ae9eecd
Reviewed-on: http://gerrit.cloudera.org:8080/9738
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, the output type of round(), ceil(), floor(), and
trunc() was not always the same as the input type, and it was
inconsistent in general. For example, round(double) returned an
integer, but round(double, int) returned a double.
After looking at other database systems, we decided that the guideline
should be that the output type should be the same as the input type. In
this patch, we change the behavior of the previously mentioned functions
so that if a double is given then a double is returned.
We also modify the rounding behavior to always round away from zero.
Before, we were rounding towards positive infinity in some cases.
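Round-half-away-from-zero on doubles can be sketched as follows (a
Python analogy, not Impala's C++ implementation; the usual
floating-point caveats apply):

  import math

  def round_away_from_zero(x, digits=0):
      # Ties move away from zero; a double in yields a double out.
      scale = 10.0 ** digits
      return math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale

  print(round_away_from_zero(2.5))   # 3.0
  print(round_away_from_zero(-2.5))  # -3.0 (not -2.0, i.e. not toward +inf)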
Testing:
- Updated tests
- Ran an exhaustive build which passed.
Cherry-picks: not for 2.x
Change-Id: I77541678012edab70b182378b11ca8753be53f97
Reviewed-on: http://gerrit.cloudera.org:8080/9346
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When an impalad is in executor-only mode, it receives no
catalog updates. As a result, lib-cache entries are never
refreshed. A consequence is that UDF queries can return
incorrect results or may fail to run due to resolution issues.
Both cases are caused by the executor using a stale copy
of the lib file. For incorrect results, an old version of
the method may be used. Resolution issues can come up if
a method is added to a lib file.
The solution in this change is to capture the coordinator's
view of the lib file's last modified time when planning.
This last modified time is then shipped with the plan to
executors. Executors must then use both the lib file path
and the last modified time as a key for the lib-cache.
If the coordinator's last modified time is more recent than
the executor's lib-cache entry, then the entry is refreshed.
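A minimal sketch of the keying scheme (Python pseudocode for the C++
lib-cache; names illustrative):

  class LibCache:
      def __init__(self):
          self._entries = {}  # lib path -> (last modified time, handle)

      def get(self, path, coord_mtime, load_fn):
          cached = self._entries.get(path)
          if cached is None or cached[0] < coord_mtime:
              # The coordinator saw a newer lib file: refresh the entry.
              self._entries[path] = (coord_mtime, load_fn(path))
          return self._entries[path][1]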
Brief discussion of alternatives:
- lib-cache always checks last modified time
+ easy/local change to lib-cache
- adds an fs lookup always. rejected for this reason
- keep the last modified time in the catalog
- bound on staleness is too loose. consider the case where
fn's f1, f2, f3 are created with last modified times of
t1, t2, t3. treat the fn's last modified time as a low-watermark;
if the cache entry has a more recent time, use it. Such a scheme
would allow the version at t2 to persist. An old fn may keep the
state from converging to the latest. This could end up with strange
cases where different versions of the lib are used across executors
for a single query.
In contrast, the change in this patch relies on the statestore to
push versions forward at all coordinators, so will push all
versions at all caches forward as well.
Testing:
- added an e2e custom cluster test
Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6
Reviewed-on: http://gerrit.cloudera.org:8080/9697
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).
We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support for profile 2 will be removed soon in the Impala 3.x line.
The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.
There are relatively few incompatible APIs. This implementation
encapsulates them by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. This follows the pattern
established by IMPALA-5184 (building the frontend against both Hive 1
& 2 APIs), but, to avoid a proliferation of directories, I've
consolidated the Hive changes into the same directory structure.
For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.
For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:
diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java
The Sentry work is based on a change by Zach Amsden.
In addition, we recently added an explicit "refresh" permission. In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.
For Parquet, the difference is even more mechanical. The package names
changed from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.
In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE encapsulates which profile we
were built against. One of those cases is the results expected by
various frontend tests. I avoided the issue by translating one error
string into another, which handled the divergence in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.
The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.
To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.
We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences. Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accommodations exist in the code to deal with that.
parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.
For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.
For AuthorizationTest, different hive versions show slightly different
things for extended output.
To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing
to see here.
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)
CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.
rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)
Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.
Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
test_native_functions_race failed because it did not
include a path prefix. The fix uses get_fs_path to include
the fs prefix.
Change-Id: I314d8c32e4bc3857aefd244b524fb6718d234f30
Reviewed-on: http://gerrit.cloudera.org:8080/9782
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When passing command line options to a new instance of the
ImpalaShell, we usually transfer the options to member
variables of that new instance. We weren't doing that with
all of the LDAP-related options, even though we wanted to
access them later. In some environments and under certain
conditions, this could then lead to a NameError exception
being thrown.
This patch removes any reliance on the original options
object returned by parse_args() beyond the __init__()
method of the ImpalaShell class, by transferring all LDAP
options to member variables. Also, a test has been added to
exercise the code path where the exception had been occurring.
Change-Id: I810850f569ef3f4487f7eeba81ca520dc955ac2e
Reviewed-on: http://gerrit.cloudera.org:8080/9744
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
Inspection of the code revealed some other local variables
that could overflow with large messages. This patch takes
two approaches to eliminate the issues.
First, it limits the total size of the messages by limiting
the total size of the sidecars to INT_MAX. The total size
of the protobuf and header components of the message
should be considerably smaller, so limiting the sidecars
to INT_MAX eliminates messages that are larger than UINT_MAX.
This also means that the sidecar offsets, which are unsigned
32-bit integers, are also safe. Given that
FLAGS_rpc_max_message_size is limited to INT_MAX at startup,
the receiver would reject any message this large anyway.
This also helps with the networking codepath, as any given
sidecar will have a size less than INT_MAX, so every Slice
that interacts with Writev() is shorter than INT_MAX.
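The accounting for the first limit, as a hedged Python sketch (not
Kudu's C++):

  INT_MAX = 2**31 - 1

  def total_sidecar_len(sidecar_lens):
      # Cap the summed sidecar payload at INT_MAX so the unsigned 32-bit
      # sidecar offsets stay valid and no single Slice exceeds INT_MAX.
      total = 0
      for n in sidecar_lens:
          total += n
          if total > INT_MAX:
              raise ValueError("sidecars exceed INT_MAX bytes")
      return total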
Second, even with sidecars limited to INT_MAX, the headers
and protobuf parts of the messages mean that certain messages
could still exceed INT_MAX. This patch changes some of the sockets
codepath to tolerate iovec's that reference more than INT_MAX
bytes total. Specifically, it changes Writev()'s nwritten bytes
to an int64_t for both TlsSocket and Socket. TlsSocket works
because it is sending each Slice individually. The first change
limited any given Slice to INT_MAX, so each individual Write()
should not be impacted. For Socket, Writev() uses sendmsg(). It
should do partial network sends to handle this case. Any Write()
call specifies its size with a 32-bit integer, and that will
not be impacted by this patch.
Testing:
- Modified TestRpcSidecarLimits() to verify that sidecars are
limited to INT_MAX bytes.
- Added a test mode to TestRpcSidecarLimits() where it
overrides rpc_max_message_size and sends the maximal
message. This verifies that the client send codepath
can handle the maximal message.
Reviewed-on: http://gerrit.cloudera.org:8080/9601
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
Changes from Kudu version:
- Updated declaration of FLAGS_rpc_max_message_size
in rpc-mgr.cc and added a warning not to set it
larger than INT_MAX.
Change-Id: I469feff940fdd07e1e407c9df49de79ed303151e
Reviewed-on: http://gerrit.cloudera.org:8080/9748
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Just like expr-test, we can skip config checking when creating
the InProcessImpalaServer in session-expiry-test. This fixes
an issue where the test would fail when there is no minicluster.
(The test itself would actually race and only fail sometimes,
on some machines.)
Testing: ran session-expiry-test.
Change-Id: Ieff6d4c2451ad26925dacfcdaa337b9e8d32c39d
Reviewed-on: http://gerrit.cloudera.org:8080/9746
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Impala already supported RLE encoding for levels and dictionary pages, so
the only task was to integrate it into BoolColumnReader.
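A much-simplified illustration of run-length decoding (Parquet's actual
encoding is an RLE/bit-packed hybrid; this sketch shows only the
repeated-run half):

  def decode_runs(runs):
      # Each run is (value, repeat_count); a long run of identical
      # booleans decodes in one step instead of bit by bit.
      out = []
      for value, count in runs:
          out.extend([value] * count)
      return out

  print(decode_runs([(True, 3), (False, 2)]))  # three Trues, two Falses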
A new benchmark, rle-benchmark.cc is added to test the speed of RLE
decoding for different bit widths and run lengths.
There might be a small performance impact on PLAIN encoded booleans,
because of the additional branch when the cache of BoolColumnReader is
filled. As the cache size is 128, I considered this to be outside the
"hot loop".
Testing:
As Impala cannot write RLE encoded bool columns at the moment, parquet-mr
was used to create a test file, testdata/data/rle_encoded_bool.parquet
tests/query_test/test_scanners.py#test_rle_encoded_bools creates a table
that uses this file, and tries to query from it.
Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f
Reviewed-on: http://gerrit.cloudera.org:8080/9403
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
In wait-hdfs-replication, the frequent and eager restart might slow
down HDFS replication. HDFS should be restarted only if no progress is
made in a certain amount of time, and we should wait longer before
failing the data loading.
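A hedged sketch of the restart policy (function names and timeouts are
illustrative):

  import time

  def wait_for_replication(under_replicated, restart_hdfs,
                           stall_timeout_s=300, total_timeout_s=3600):
      deadline = time.time() + total_timeout_s
      last, last_change = under_replicated(), time.time()
      while time.time() < deadline:
          time.sleep(10)
          remaining = under_replicated()
          if remaining == 0:
              return
          if remaining != last:         # progress: reset the stall clock
              last, last_change = remaining, time.time()
          elif time.time() - last_change > stall_timeout_s:
              restart_hdfs()            # restart only after a real stall
              last_change = time.time()
      raise RuntimeError("HDFS replication did not complete in time")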
Testing: It's tested with a fake HDFS fsck script.
Change-Id: Ib059480254643dc032731b4b3c55204a93b61e77
Reviewed-on: http://gerrit.cloudera.org:8080/9698
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes a format string which incorrectly used
$2 instead of $1 when there are only two arguments to the
format string.
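The bug class, shown with Python's positional formatting as an analogy
(strings::Substitute() placeholders are likewise zero-indexed, so $0
and $1 are the only valid references when two arguments are supplied):

  print("Error writing {0}: errno={1}".format("/tmp/scratch", 28))  # fine
  try:
      "Error writing {0}: errno={2}".format("/tmp/scratch", 28)
  except IndexError as e:
      print(e)  # index 2 is out of range for two arguments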
Change-Id: Icdaa781ced755c896cdc9a1fff690811a59e0492
Reviewed-on: http://gerrit.cloudera.org:8080/9740
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
The Kerberos version used in SLES 11 seems to have quite a few
undocumented bugs.
They have krb5-1.6 (krb5-client-1.6.3-133.49.112.1.x86_64).
With KRPC we see a new error "GSSAPI Error: A required input parameter
could not be read", which we've never seen before.
I looked into the krb5 codebase and between krb5-1.6 and krb5-1.7,
the code causing the above error (GSSAPI Error: A required input
parameter could not be read) has changed subtly without any
explanation as to why.
That error string corresponds to GSS_S_CALL_INACCESSIBLE_READ.
In 1.6, it returns an error if the 'input_token_buffer' string
is empty. krb5-1.6:
https://github.com/krb5/krb5/blob/krb5-1.6/src/lib/gssapi/mechglue/g_accept_sec_context.c#L149-L150
In 1.7, it returns an error only if the 'input_token_buffer' string
is NULL. krb5-1.7:
https://github.com/krb5/krb5/blob/krb5-1.7/src/lib/gssapi/mechglue/g_accept_sec_context.c#L149-L150
With KRPC, we test if Kerberos works by passing an empty string to SASL:
https://github.com/apache/impala/blob/master/be/src/kudu/rpc/server_negotiation.cc#L289
In 1.6, this is counted as an error, but in 1.7, this is completely
fine. I'm not sure why since they haven't documented it.
We can attempt to get KRPC working for SLES11 by removing the
PreflightGSSAPI() check for any Kerberos version older than 1.7. A
function that is unavailable in krb5-1.6 is
krb5_get_init_creds_opt_set_fast_ccache_name(), and it is available
from krb5-1.7 onwards. The preflight check is compiled in only if
this function exists.
(However there may be more issues on SLES11 that we're not yet aware of)
Change-Id: Ic4cc7f0702f605fca02a2ff5d3d2735e6e080668
Reviewed-on: http://gerrit.cloudera.org:8080/9696
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds an impala-shell option -b / --kerberos_host_fqdn for
clusters configured with a load balancer and Kerberos. It allows the
user to optionally specify the load balancer's host so that
impala-shell will accept a direct connection to Impala daemons in a
kerberized cluster.
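For example, a hypothetical invocation (host names illustrative) that
authenticates against the load balancer's principal while connecting
directly to one daemon:
impala-shell -k -b loadbalancer.example.com -i impalad-02.example.com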
Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Reviewed-on: http://gerrit.cloudera.org:8080/7241
Reviewed-by: <andy@phdata.io>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
The error messages coming from DiskIoMgr::Write() are enhanced by this
change. A mapping is introduced between the errno set by open(),
fdopen(), fseek(), fwrite(), or fclose() low-level functions and an
error message for display purposes. If any of these functions
fails, the returned error message is taken from this mapping.
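The idea, as a hedged Python sketch (the mapping contents are
illustrative):

  import errno, os

  ERRNO_TO_MSG = {
      errno.EACCES: "access denied for the scratch file",
      errno.ENOENT: "scratch path does not exist",
      errno.ENOSPC: "no space left on the scratch device",
  }

  def errno_status(err):
      # Fall back to the generic strerror text for unmapped errnos.
      return ERRNO_TO_MSG.get(err, os.strerror(err))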
In addition there were two functions, NewFile() and
FileAllocateSpace() that always returned Status::OK(). I made them
void and removed the status checks from the call sites.
For testing purposes a fault injection mechanism is introduced to
simulate the cases when the above mentioned functions fail.
Change-Id: I5aa7b424209b1a5ef8dc7d04c5ba58788e91aad7
Reviewed-on: http://gerrit.cloudera.org:8080/9420
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
I noticed that we could remove this part of the interface and instead
do the "seeding" in ParquetColumnReader::Read*ValueBatch(). It should
be easier to understand with level reading and consumption driven by
the same function instead of split between files.
Testing:
Ran core tests. This code path should be thoroughly exercised by
the regular scanner tests.
Change-Id: I98f65d0d72e86b1e3db1f3543a03873afb9da062
Reviewed-on: http://gerrit.cloudera.org:8080/9636
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The current conref returned an error and the conref text was not
rendered. In impala_aliases.xml, the current <ph conref> has to
be changed to <p conref>.
Cherry-picks: not for 2.x
Change-Id: I6768f336559eeac41f7f32f989d106740eccdc88
Reviewed-on: http://gerrit.cloudera.org:8080/9731
Reviewed-by: Alex Rodoni <arodoni@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
If we can't identify a device id for /home, don't crash the test.
A recent commit for "IMPALA-6500: gracefully handle invalid
sched_getcpu() values" added tests for DiskInfo::device_name.
These tests happen to fail in Docker containers, where you
can have directories that don't get mapped to a device.
Impala is dealing with this reasonably (returning -1), so
this amends the test.
Testing: ran test inside of Docker and out.
Change-Id: I6cfa5fd9feb1c75fbb7bd70ad952ef1650d8b69f
Reviewed-on: http://gerrit.cloudera.org:8080/9723
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
For INSERT/UPSERT statements where a column permutation is specified
but some columns are excluded, Impala would fail with an
AnalysisException if a partition column for an HDFS table (or
equivalently, a row-key column for an HBase table or a primary key
column for a Kudu table) that was specified in the column permutation,
i.e. not in the partition clause, was written with some upper case
letters.
Testing:
- Added analysis tests for all of the above scenarios.
Change-Id: If6975c2978850381904a45107f76850640aff52e
Reviewed-on: http://gerrit.cloudera.org:8080/9728
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Currently when using a SET_LOOKUP strategy for in-predicates in impala
we use a std:set object for checking membership. This patch takes a
hybrid approach based on benchmarking results and uses boost::flat_set
for int, big int, and float datatypes and boost::unordered_set for the
rest (tiny int, small int, double, string, timestamp, decimal).
The intent of this change is to fix a regression when upgrading the
toolchain to use LLVM 5.0.1 (IMPALA-5980).
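A hedged Python sketch of the dispatch (boost::flat_set is approximated
here by binary search over a sorted list; names are illustrative):

  from bisect import bisect_left

  class FlatSet:
      # Sorted-vector membership test, analogous to boost::flat_set.
      def __init__(self, values):
          self._v = sorted(values)

      def __contains__(self, x):
          i = bisect_left(self._v, x)
          return i < len(self._v) and self._v[i] == x

  SET_IMPL = {
      "int": FlatSet, "bigint": FlatSet, "float": FlatSet,  # flat_set wins
      "tinyint": set, "smallint": set, "double": set,       # hash set wins
      "string": set, "timestamp": set, "decimal": set,
  }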
Performance:
Ran a query for each data type with a large in predicate containing
500 elements on a single node with mt_dop set to 1.
+-----------+---------------+----------+---------------+----------+
| Data Type | Llvm 3 hybrid | Llvm 3   | Llvm 5 hybrid | Llvm 5   |
+-----------+---------------+----------+---------------+----------+
| Table used: tpch100_parquet.lineitem                            |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
| Table used: alltypes with 33638400 rows                         |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+
Also added a targeted perf query that uses a large in-predicate
over a decimal column.
Testing:
- Ran expr-test and test_exprs successfully.
Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Reviewed-on: http://gerrit.cloudera.org:8080/9570
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, ALL privilege was required to execute INVALIDATE
METADATA and having any privilege allowed executing REFRESH <table>
and INVALIDATE METADATA <table>. With this patch, REFRESH privilege
is now required to execute INVALIDATE METADATA or REFRESH statement.
These are the new GRANT/REVOKE statements introduced at server,
database, and table scopes.
GRANT REFRESH on SERVER svr TO ROLE testrole;
GRANT REFRESH on DATABASE db TO ROLE testrole;
GRANT REFRESH on TABLE db.tbl TO ROLE testrole;
REVOKE REFRESH on SERVER svr FROM ROLE testrole;
REVOKE REFRESH on DATABASE db FROM ROLE testrole;
REVOKE REFRESH on TABLE db.tbl FROM ROLE testrole;
Testing:
- Ran front-end tests
Cherry-picks: not for 2.x
Change-Id: I4c3c5a51fe493d39fd719c7a388d4d5760049ce4
Reviewed-on: http://gerrit.cloudera.org:8080/9589
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
If the value of the LDAP password in the Impala shell contains an
extra line break, authentication fails, but the user can't detect
the cause of the failure.
I fixed the issue by adding inspection of the password for common
pitfalls and issuing a warning in the shell when authentication
fails.
Change-Id: Ie570166aea62af223905b7f0124e9efb15a88ac7
Reviewed-on: http://gerrit.cloudera.org:8080/9506
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
pytest-runner, which is required by kudu-python, requires a more
recent version of setuptools. Adding an explicit dependency required
an update to the regular expression that parses PyPI URLs.
Change-Id: Ia67189f81a31a9a5a0ed80cd4d6661762ef427b2
Reviewed-on: http://gerrit.cloudera.org:8080/9711
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
We use Java 1.7 in fe/pom.xml, where most of our Java code is. For
consistency, this updates the rest of our Maven configurations to use
the same version of Java. A change I'm working on uses
try-with-resources in HBase splitting, which is how I ran into
this.
Testing: ran core tests
Change-Id: I6cecddf367f00185a14a8b08c03456e3b756bd70
Reviewed-on: http://gerrit.cloudera.org:8080/9600
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
The DCHECK was only valid if the Parquet file metadata is internally
consistent, with the number of values reported by the metadata
matching the number of encoded levels.
The DCHECK was intended to directly detect misuse of the RleBatchDecoder
interface, which would lead to incorrect results. However, our other
test coverage for reading Parquet files is sufficient to test the
correctness of level decoding.
Testing:
Added a minimal corrupt test file that reproduces the issue.
Change-Id: Idd6e09f8c8cca8991be5b5b379f6420adaa97daa
Reviewed-on: http://gerrit.cloudera.org:8080/9556
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins