Mirror of https://github.com/apache/impala.git, synced 2025-12-22 03:18:15 -05:00
7273cfdfb901b9ef564c2737cf00c7a8abb57f07
10 Commits

374783c55e
IMPALA-10898: Add runtime IN-list filters for ORC tables
ORC files have optional bloom filter indexes for each column. Since ORC-1.7.0, the C++ reader supports pushing down predicates to skip unrelated RowGroups. The pushed-down predicates are evaluated against file indexes (i.e. statistics and bloom filter indexes). Note that only EQUALS and IN-list predicates can leverage bloom filter indexes.

Currently Impala has two kinds of runtime filters: bloom filters and min-max filters. Unfortunately, they can't be converted into EQUALS or IN-list predicates, so they can't leverage the file-level bloom filter indexes. This patch adds runtime IN-list filters for this purpose.

Currently they are generated for the build side of a broadcast join. They are only applied on ORC tables and are pushed down to the ORC reader (i.e. the ORC lib). To avoid exploding the IN-list, if the number of distinct values on the build side exceeds a threshold (default 1024), we set the filter to ALWAYS_TRUE and clear its entries. The threshold can be configured by a new query option, RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT.

Evaluating runtime IN-list filters is much slower than evaluating runtime bloom filters due to the current simple implementation (i.e. std::unordered_set) and the lack of codegen, so we disable them at the row level.

For visibility, this patch adds two counters to the HdfsScanNode:
- NumPushedDownPredicates
- NumPushedDownRuntimeFilters
They reflect the predicates and runtime filters that are pushed down to the ORC reader.

Currently, runtime IN-list filters are disabled by default. This patch extends the query option ENABLED_RUNTIME_FILTER_TYPES to support a comma-separated list of filter types. It defaults to "BLOOM,MIN_MAX". Add "IN_LIST" to it to enable runtime IN-list filters.

Ran perf tests on a 3-instance cluster on my desktop using TPC-DS with scale factor 20. It shows significant improvements in some queries:

| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TPCDS(20) | TPCDS-Q67A | orc / snap / block | 35.07 | 44.01 | I -20.32% | 0.38% | 1.38% | 10 | I -25.69% | -3.58 | -45.33 |
| TPCDS(20) | TPCDS-Q37 | orc / snap / block | 1.08 | 1.45 | I -25.23% | 7.14% | 3.09% | 10 | I -34.09% | -3.58 | -12.94 |
| TPCDS(20) | TPCDS-Q70A | orc / snap / block | 6.30 | 8.60 | I -26.81% | 5.24% | 4.21% | 10 | I -36.67% | -3.58 | -14.88 |
| TPCDS(20) | TPCDS-Q16 | orc / snap / block | 1.33 | 1.85 | I -28.28% | 4.98% | 5.92% | 10 | I -39.38% | -3.58 | -12.93 |
| TPCDS(20) | TPCDS-Q18A | orc / snap / block | 5.70 | 8.06 | I -29.25% | 3.00% | 4.12% | 10 | I -40.30% | -3.58 | -19.95 |
| TPCDS(20) | TPCDS-Q22A | orc / snap / block | 2.01 | 2.97 | I -32.21% | 6.12% | 5.94% | 10 | I -47.68% | -3.58 | -14.05 |
| TPCDS(20) | TPCDS-Q77A | orc / snap / block | 8.49 | 12.44 | I -31.75% | 6.44% | 3.96% | 10 | I -49.71% | -3.58 | -16.97 |
| TPCDS(20) | TPCDS-Q75 | orc / snap / block | 7.76 | 12.27 | I -36.76% | 5.01% | 3.87% | 10 | I -59.56% | -3.58 | -23.26 |
| TPCDS(20) | TPCDS-Q21 | orc / snap / block | 0.71 | 1.27 | I -44.26% | 4.56% | 4.24% | 10 | I -77.31% | -3.58 | -28.31 |
| TPCDS(20) | TPCDS-Q80A | orc / snap / block | 9.24 | 20.42 | I -54.77% | 4.03% | 3.82% | 10 | I -123.12% | -3.58 | -40.90 |
| TPCDS(20) | TPCDS-Q39-1 | orc / snap / block | 1.07 | 2.26 | I -52.74% | * 23.83% * | 2.60% | 10 | I -149.68% | -3.58 | -14.43 |
| TPCDS(20) | TPCDS-Q39-2 | orc / snap / block | 1.00 | 2.33 | I -56.95% | * 19.53% * | 2.07% | 10 | I -151.89% | -3.58 | -20.81 |

"Base Avg" is the average of the original time. "Avg" is the current time.

However, we also see some regressions due to the suboptimal implementation. Follow-up JIRAs will focus on improvements:
- IMPALA-11140: Codegen InListFilter::Insert() and InListFilter::Find()
- IMPALA-11141: Use exact data types in IN-list filters instead of casting data to a set of int64_t or a set of string.
- IMPALA-11142: Consider IN-list filters in partitioned joins.

Tests:
- Test IN-list filter on string, date and all integer types
- Test IN-list filter with NULL
- Test IN-list filter on complex expr targets

Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Reviewed-on: http://gerrit.cloudera.org:8080/18141
Reviewed-by: Qifan Chen <qchen@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
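To make the entry-limit behaviour above concrete, here is a minimal C++ sketch of the idea, using std::unordered_set as the commit describes. The class and method names are hypothetical stand-ins, not Impala's actual InListFilter:

```cpp
// Minimal sketch of a runtime IN-list filter with an entry limit.
// Hypothetical code for illustration only; not Impala's InListFilter.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_set>

class InListFilterSketch {
 public:
  explicit InListFilterSketch(size_t entry_limit) : entry_limit_(entry_limit) {}

  // Insert a build-side value. If the number of distinct values exceeds the
  // limit, give up and make the filter pass everything (ALWAYS_TRUE).
  void Insert(int64_t v) {
    if (always_true_) return;
    values_.insert(v);
    if (values_.size() > entry_limit_) {
      always_true_ = true;
      values_.clear();  // drop the entries, as the commit message describes
    }
  }

  // Probe-side check; an ALWAYS_TRUE filter rejects nothing.
  bool Find(int64_t v) const { return always_true_ || values_.count(v) > 0; }

  bool always_true() const { return always_true_; }

 private:
  size_t entry_limit_;
  bool always_true_ = false;
  std::unordered_set<int64_t> values_;  // simple implementation, as noted above
};

int main() {
  InListFilterSketch f(/*entry_limit=*/1024);  // RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT default
  for (int64_t v = 0; v < 100; ++v) f.Insert(v);
  std::cout << f.Find(42) << " " << f.Find(4242) << "\n";  // 1 0
  for (int64_t v = 0; v < 2000; ++v) f.Insert(v);          // exceed the limit
  std::cout << f.always_true() << " " << f.Find(4242) << "\n";  // 1 1
  return 0;
}
```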

f38da0df8e
IMPALA-4400: aggregate runtime filters locally
Move RuntimeFilterBank to QueryState. Implement fine-grained locking for each filter to mitigate any increased lock contention from the change.

Make RuntimeFilterBank handle multiple producers of the same filter, e.g. multiple instances of a partitioned join. It computes the expected number of filters upfront, then sends the filter to the coordinator once all the local instances have been merged together. The merging can be done in parallel locally to improve the latency of filter propagation.

Add Or() methods to MinMaxFilter and BloomFilter, since we now need to merge those, not just the Thrift versions.

Update coordinator filter routing to expect only one instance of a filter from each producer backend and to only send one instance to each consumer backend (instead of sending one per fragment).

Update memory reservations and estimates to be lower to account for sharing of filters between fragment instances. mt_dop plans are modified to show these shared and non-shared resources separately. Enable waiting for runtime filters for the Kudu scanner with mt_dop. Make min/max filters const-correct.

Testing:
* Added unit tests for Or() methods.
* Added some additional e2e test coverage for mt_dop queries.
* Updated planner tests with new estimates and reservations.
* Ran a single-node 3-impalad stress test with TPC-H kudu and TPC-DS parquet.
* Ran exhaustive tests.
* Ran core tests with ASAN.

Perf:
* Did a single-node perf run on TPC-H with default settings. No perf change.
* A single-node perf run with mt_dop=8 showed significant speedups:

| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
|---|---|---|---|---|---|
| TPCH(30) | parquet / none / none | 10.14 | -7.29% | 5.05 | -11.68% |

| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TPCH(30) | TPCH-Q7 | parquet / none / none | 38.87 | 38.44 | +1.13% | 7.17% | * 10.92% * | 20 | +0.72% | 0.72 | 0.39 |
| TPCH(30) | TPCH-Q1 | parquet / none / none | 4.28 | 4.26 | +0.50% | 1.92% | 1.09% | 20 | +0.03% | 0.31 | 1.01 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.32 | 2.32 | +0.05% | 2.01% | 1.89% | 20 | -0.03% | -0.36 | 0.08 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.73 | 3.75 | -0.42% | 0.84% | 1.05% | 20 | -0.25% | -0.77 | -1.40 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.80 | 9.83 | -0.38% | 0.51% | 0.80% | 20 | -0.32% | -1.30 | -1.81 |
| TPCH(30) | TPCH-Q2 | parquet / none / none | 1.98 | 2.00 | -1.32% | 1.74% | 2.81% | 20 | -0.64% | -1.71 | -1.79 |
| TPCH(30) | TPCH-Q6 | parquet / none / none | 1.22 | 1.25 | -2.14% | 2.66% | 4.15% | 20 | -0.96% | -2.00 | -1.95 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 5.13 | 5.22 | -1.65% | 1.20% | 1.40% | 20 | -1.76% | -3.34 | -4.02 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.46 | 2.56 | -4.13% | 2.49% | 1.99% | 20 | -4.31% | -4.04 | -5.94 |
| TPCH(30) | TPCH-Q9 | parquet / none / none | 81.63 | 85.07 | -4.05% | 4.94% | 3.06% | 20 | -5.46% | -3.28 | -3.21 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.07 | 5.50 | I -7.92% | 0.96% | 1.33% | 20 | I -8.51% | -5.27 | -22.14 |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.00 | 26.24 | I -8.57% | 0.46% | 0.38% | 20 | I -9.34% | -5.27 | -67.47 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.66 | 9.50 | I -8.86% | 0.62% | 0.44% | 20 | I -9.75% | -5.27 | -55.17 |
| TPCH(30) | TPCH-Q3 | parquet / none / none | 6.01 | 6.70 | I -10.19% | 1.01% | 0.90% | 20 | I -11.25% | -5.27 | -35.76 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.98 | 3.39 | I -12.23% | 1.48% | 1.48% | 20 | I -13.56% | -5.27 | -27.75 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.69 | 2.00 | I -15.55% | 1.63% | 1.47% | 20 | I -18.09% | -5.27 | -34.60 |
| TPCH(30) | TPCH-Q4 | parquet / none / none | 2.42 | 2.87 | I -15.69% | 1.48% | 1.26% | 20 | I -18.61% | -5.27 | -39.50 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 4.64 | 6.27 | I -26.02% | 1.35% | 0.73% | 20 | I -35.37% | -5.27 | -94.07 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.19 | 4.37 | I -27.01% | 1.54% | 0.99% | 20 | I -36.85% | -5.27 | -80.74 |
| TPCH(30) | TPCH-Q5 | parquet / none / none | 4.57 | 6.39 | I -28.36% | 1.04% | 0.75% | 20 | I -39.56% | -5.27 | -120.02 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 3.15 | 4.71 | I -33.06% | 1.59% | 1.31% | 20 | I -49.43% | -5.27 | -87.64 |
| TPCH(30) | TPCH-Q8 | parquet / none / none | 5.25 | 7.95 | I -33.95% | 0.95% | 0.53% | 20 | I -51.11% | -5.27 | -185.02 |

Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Reviewed-on: http://gerrit.cloudera.org:8080/14538
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
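The "merge locally, send once" flow above can be sketched in a few lines. The following is a hypothetical illustration (not Impala's RuntimeFilterBank), assuming all bloom filters for a given filter ID are built with identical parameters so that merging is a bitwise OR:

```cpp
// Sketch of local runtime-filter aggregation: each fragment instance produces
// a filter, and the last producer to arrive triggers a single update to the
// coordinator. Hypothetical illustration only; not Impala's RuntimeFilterBank.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <vector>

struct BloomFilterSketch {
  std::vector<uint64_t> directory;  // bit directory
  // Merging two bloom filters built with identical parameters is a bitwise OR.
  void Or(const BloomFilterSketch& other) {
    for (size_t i = 0; i < directory.size(); ++i) directory[i] |= other.directory[i];
  }
};

class LocalFilterAggregator {
 public:
  LocalFilterAggregator(int expected_producers, size_t words)
      : pending_(expected_producers) {
    merged_.directory.assign(words, 0);
  }

  // Called once per producing fragment instance; per-filter (fine-grained) lock.
  void AddProducedFilter(const BloomFilterSketch& f) {
    bool send = false;
    {
      std::lock_guard<std::mutex> l(lock_);
      merged_.Or(f);
      send = (--pending_ == 0);
    }
    if (send) SendToCoordinator();  // only one update per backend
  }

 private:
  void SendToCoordinator() { std::puts("sending merged filter to coordinator"); }

  std::mutex lock_;
  int pending_;
  BloomFilterSketch merged_;
};

int main() {
  LocalFilterAggregator agg(/*expected_producers=*/3, /*words=*/4);
  BloomFilterSketch f{{1, 0, 0, 8}};
  agg.AddProducedFilter(f);
  agg.AddProducedFilter(f);
  agg.AddProducedFilter(f);  // third producer triggers the send
  return 0;
}
```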

702e6c4fa8
IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
Previously, the aggregation and propagation of a runtime filter in Impala was implemented using Thrift RPC, which suffers from the disadvantage that the number of connections in a cluster grows with both the number of queries and the cluster size. This patch ports the functions that implement the aggregation and propagation of a runtime filter, i.e. UpdateFilter() and PublishFilter() respectively, to KRPC, which requires only one connection per direction between every pair of hosts, thus reducing the number of connections in a cluster.

In addition, this patch also incorporates a KRPC sidecar when the runtime filter is a Bloom filter. The KRPC sidecar eliminates the need for an extra copy of the Bloom filter contents when a Bloom filter is serialized for transmission, and hence reduces the serialization overhead. Due to the incorporation of the KRPC sidecar, a SpinLock is also added to prevent a BloomFilter from being deallocated before its associated KRPC call finishes.

Two related BE tests, bloom-filter-test.cc and bloom-filter-benchmark.cc, are also modified accordingly because of the changes to the signatures of some functions in BloomFilter.

Testing: This patch has passed the exhaustive tests.

Change-Id: I11a2f92a91750c2470fba082c30f97529524b9c8
Reviewed-on: http://gerrit.cloudera.org:8080/13882
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/14974
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
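The reason the commit needs a lock around the Bloom filter's buffer can be shown with a small, hypothetical sketch. It does not use the real Kudu RPC API, and it uses std::mutex where Impala uses its own SpinLock: a buffer handed to an asynchronous, zero-copy send (as a sidecar would be) must not be deallocated until the completion callback has run.

```cpp
// Sketch of the lifetime issue the SpinLock addresses: a buffer lent to an
// in-flight asynchronous RPC must stay alive until that RPC completes.
// Hypothetical illustration; not Impala's or Kudu RPC's actual API.
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class BloomFilterBuffer {
 public:
  explicit BloomFilterBuffer(size_t bytes) : data_(bytes) {}

  // Start an "RPC" that reads data_ without copying it. Mark the buffer busy.
  void SendAsync(std::function<void(const uint8_t*, size_t)> rpc) {
    {
      std::lock_guard<std::mutex> l(lock_);
      in_flight_ = true;
    }
    std::thread([this, rpc] {
      rpc(data_.data(), data_.size());  // zero-copy read, like a KRPC sidecar
      std::lock_guard<std::mutex> l(lock_);
      in_flight_ = false;               // completion callback
      cv_.notify_all();
    }).detach();
  }

  // Deallocation must wait until no RPC still references the buffer.
  void Close() {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return !in_flight_; });
    data_.clear();
  }

 private:
  std::mutex lock_;  // Impala uses a SpinLock here; std::mutex in this sketch
  std::condition_variable cv_;
  bool in_flight_ = false;
  std::vector<uint8_t> data_;
};

int main() {
  BloomFilterBuffer buf(1 << 20);
  buf.SendAsync([](const uint8_t*, size_t) { /* pretend to transmit the bytes */ });
  buf.Close();  // safe: blocks until the in-flight send has completed
  return 0;
}
```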

e716e76ccc
IMPALA-9154: Revert "IMPALA-7984: Port runtime filter from Thrift RPC to KRPC"
The previous patch porting the runtime filter code from Thrift RPC to KRPC
introduces a deadlock if there is a very limited number of threads on the
Impala cluster.

Specifically, in that patch a Coordinator used a synchronous KRPC to
propagate an aggregated filter to other hosts. A deadlock can happen if
there is no thread available on the receiving side to answer that KRPC,
especially when the calling and receiving threads come from the same
thread pool. One possible way to address this issue is to make the call
that propagates a runtime filter asynchronous, freeing the calling
thread. Until that issue is resolved, we revert this patch for now.
This reverts commit ec11c18884.
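To make the failure mode concrete, here is a tiny, self-contained illustration (hypothetical code, not Impala's): every thread of a small pool issues a blocking call that only another thread of the same pool could answer, so once the pool is saturated nothing can make progress. A timed wait is used so the demo terminates instead of hanging.

```cpp
// Toy illustration of the deadlock described above: all handler threads of a
// fixed-size pool block on a synchronous call whose reply could only be
// produced by another thread from the same pool.
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
  const int kPoolSize = 2;  // a very limited number of service threads
  std::mutex mu;
  std::condition_variable cv;
  int replies = 0;          // no thread is ever free to produce a reply

  auto handler = [&](int id) {
    // Synchronous "PublishFilter": wait for a reply that would have to be
    // produced by some other thread of this same pool.
    std::unique_lock<std::mutex> l(mu);
    if (!cv.wait_for(l, std::chrono::seconds(1), [&] { return replies > 0; })) {
      std::cout << "thread " << id << " timed out: would deadlock\n";
    }
  };

  std::vector<std::thread> pool;
  for (int i = 0; i < kPoolSize; ++i) pool.emplace_back(handler, i);
  for (auto& t : pool) t.join();
  return 0;
}
```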

ec11c18884
IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
Previously, the aggregation and propagation of a runtime filter in Impala was implemented using Thrift RPC, which suffers from the disadvantage that the number of connections in a cluster grows with both the number of queries and the cluster size. This patch ports the functions that implement the aggregation and propagation of a runtime filter, i.e. UpdateFilter() and PublishFilter() respectively, to KRPC, which requires only one connection per direction between every pair of hosts, thus reducing the number of connections in a cluster.

In addition, this patch also incorporates a KRPC sidecar when the runtime filter is a Bloom filter. The KRPC sidecar eliminates the need for an extra copy of the Bloom filter contents when a Bloom filter is serialized for transmission, and hence reduces the serialization overhead. Due to the incorporation of the KRPC sidecar, a SpinLock is also added to prevent a BloomFilter from being deallocated before its associated KRPC call finishes.

Two related BE tests, bloom-filter-test.cc and bloom-filter-benchmark.cc, are also modified accordingly because of the changes to the signatures of some functions in BloomFilter.

Testing: This patch has passed the exhaustive tests.

Change-Id: I6b394796d250286510e157ae326882bfc01d387a
Reviewed-on: http://gerrit.cloudera.org:8080/13882
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
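As a rough, illustrative calculation of the connection-count argument (the 100-host figure is an assumption, not from the commit): with one connection per direction between every pair of hosts, a 100-node cluster needs at most 100 × 99 = 9,900 backend connections for this traffic regardless of how many queries run concurrently, whereas a scheme whose connection count also grows with the number of concurrent queries keeps adding connections on top of that.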

5391100c7e
IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
This change converts the ReportExecStatus() RPC from Thrift-based RPC to KRPC. This is done as part of the preparation for fixing IMPALA-2990, as we can take advantage of TCP connection multiplexing in KRPC to avoid overwhelming the coordinator with too many connections, reducing the number of TCP connections to one per executor.

This patch also introduces a new service pool for all query execution control related RPCs in the future, so that control commands from coordinators aren't blocked by long-running DataStream service RPCs. To avoid unnecessary delays due to sharing network connections between the DataStream service and the Control service, this change adds the service name as part of the user credentials for the ConnectionId, so each service uses a separate connection.

The majority of this patch is a mechanical conversion of some Thrift structures used in the ReportExecStatus() RPC to Protobuf. Note that the runtime profile is still retained as a Thrift structure, as Impala clients will still fetch query profiles using Thrift RPCs. This also avoids duplicating the serialization implementation in both Thrift and Protobuf for the runtime profile. The Thrift runtime profiles are serialized and sent as a sidecar in the ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, which may lead to duplicated dml stats being applied. The fix adds a monotonically increasing version number to fragment instances' reports. The coordinator will ignore any report whose version is smaller than or equal to the version of the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Reviewed-on: http://gerrit.cloudera.org:8080/10855
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
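The duplicate-DML-stats guard is easy to illustrate. Below is a minimal, hypothetical sketch of the version check (not Impala's actual Coordinator code):

```cpp
// Sketch of the duplicate-report guard described above: each fragment instance
// stamps its status reports with a monotonically increasing version, and the
// coordinator ignores any report whose version is not newer than the last one
// it applied for that instance.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct StatusReport {
  int64_t instance_id;
  int64_t version;        // monotonically increasing per fragment instance
  int64_t rows_modified;  // stand-in for DML stats carried by the report
};

class CoordinatorSketch {
 public:
  // Returns true if the report was applied, false if it was stale/duplicated.
  bool ApplyReport(const StatusReport& r) {
    int64_t& last = last_applied_version_[r.instance_id];  // defaults to 0
    if (r.version <= last) return false;  // already applied: skip the DML stats
    last = r.version;
    total_rows_modified_ += r.rows_modified;
    return true;
  }

  int64_t total_rows_modified() const { return total_rows_modified_; }

 private:
  std::unordered_map<int64_t, int64_t> last_applied_version_;
  int64_t total_rows_modified_ = 0;
};

int main() {
  CoordinatorSketch coord;
  coord.ApplyReport({/*instance_id=*/1, /*version=*/1, /*rows_modified=*/10});
  coord.ApplyReport({1, 1, 10});  // retried duplicate: ignored
  coord.ApplyReport({1, 2, 5});
  std::cout << coord.total_rows_modified() << "\n";  // 15, not 25
  return 0;
}
```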

5c541b9604
Add missing authorization in KRPC
In 2.12.0, Impala adopted the Kudu RPC library for certain backend services (TransmitData(), EndDataStream()). While the implementation uses Kerberos for authenticating users connecting to the backend services, there is no authorization implemented. This is a regression from the Thrift-based implementation, because that implementation registered a SASL callback (SaslAuthorizeInternal) to be invoked during connection negotiation. With this regression, an unauthorized but authenticated user may invoke RPC calls to Impala backend services.

This change fixes the issue above by overriding the default authorization method for the DataStreamService. The authorization method only lets an authenticated principal that matches FLAGS_principal / FLAGS_be_principal access the service. Also added a new startup flag, --krb5_ccname, to allow users to customize the location of the Kerberos credentials cache.

Testing done:
1. Added a new test case in rpc-mgr-kerberized-test.cc to confirm an unauthorized user is not allowed to access the service.
2. Ran some queries in a Kerberos-enabled cluster to make sure there is no error.
3. Exhaustive builds.

Thanks to Todd Lipcon for pointing out the problem and his guidance on the fix.

Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Reviewed-on: http://gerrit.cloudera.org:8080/11331
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
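The authorization rule can be sketched as follows. The types and the Authorize() hook below are hypothetical stand-ins for illustration; the real change hooks into the Kudu RPC service's authorization method, and its exact matching logic may differ:

```cpp
// Sketch of the rule described above: an authenticated caller is only
// authorized if its Kerberos principal matches the service's configured
// principal (FLAGS_principal / FLAGS_be_principal).
#include <iostream>
#include <string>
#include <utility>

struct AuthenticatedCall {
  std::string remote_principal;  // e.g. "impala/host1.example.com@EXAMPLE.COM"
};

class DataStreamServiceSketch {
 public:
  explicit DataStreamServiceSketch(std::string expected_principal)
      : expected_principal_(std::move(expected_principal)) {}

  // Override of the default "authenticated implies authorized" behaviour:
  // reject callers whose principal does not match the expected one.
  bool Authorize(const AuthenticatedCall& call) const {
    return call.remote_principal == expected_principal_;
  }

 private:
  std::string expected_principal_;
};

int main() {
  DataStreamServiceSketch svc("impala/host1.example.com@EXAMPLE.COM");
  std::cout << svc.Authorize({"impala/host1.example.com@EXAMPLE.COM"}) << "\n";  // 1
  std::cout << svc.Authorize({"mallory@EXAMPLE.COM"}) << "\n";                   // 0
  return 0;
}
```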

421af4e40a
IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender
This change implements a couple of improvements to the profiles of
KrpcDataStreamRecvr and KrpcDataStreamSender:
- track pending number of deferred row batches over time in KrpcDataStreamRecvr
- track the number of bytes dequeued over time in KrpcDataStreamRecvr
- track the total time the deferred RPC queues are not empty
- track the number of bytes sent from KrpcDataStreamSender over time
- track the total amount of time spent in KrpcDataStreamSender, including time
spent waiting for RPC completion.
Sample profile of an Exchange node instance:
EXCHANGE_NODE (id=21):(Total: 2s284ms, non-child: 64.926ms, % non-child: 2.84%)
- ConvertRowBatchTime: 44.380ms
- PeakMemoryUsage: 124.04 KB (127021)
- RowsReturned: 287.51K (287514)
- RowsReturnedRate: 125.88 K/sec
Buffer pool:
- AllocTime: 1.109ms
- CumulativeAllocationBytes: 10.96 MB (11493376)
- CumulativeAllocations: 562 (562)
- PeakReservation: 112.00 KB (114688)
- PeakUnpinnedBytes: 0
- PeakUsedReservation: 112.00 KB (114688)
- ReadIoBytes: 0
- ReadIoOps: 0 (0)
- ReadIoWaitTime: 0.000ns
- WriteIoBytes: 0
- WriteIoOps: 0 (0)
- WriteIoWaitTime: 0.000ns
Dequeue:
BytesDequeued(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 700.00 KB, 2.00 MB, 3.49 MB, 4.39 MB, 5.86 MB, 6.85 MB
- FirstBatchWaitTime: 0.000ns
- TotalBytesDequeued: 6.85 MB (7187850)
- TotalGetBatchTime: 2s237ms
- DataWaitTime: 2s219ms
Enqueue:
BytesReceived(500.000ms): 0, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 328.73 KB, 963.79 KB, 1.64 MB, 2.09 MB, 2.76 MB, 3.23 MB
DeferredQueueSize(500.000ms): 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0
- DispatchTime: (Avg: 108.593us ; Min: 30.525us ; Max: 1.524ms ; Number of samples: 281)
- DeserializeRowBatchTime: 8.395ms
- TotalBatchesEnqueued: 281 (281)
- TotalBatchesReceived: 281 (281)
- TotalBytesReceived: 3.23 MB (3387144)
- TotalEarlySenders: 0 (0)
- TotalEosReceived: 1 (1)
- TotalHasDeferredRPCsTime: 15s446ms
- TotalRPCsDeferred: 38 (38)
Sample sender's profile:
KrpcDataStreamSender (dst_id=21):(Total: 17s923ms, non-child: 604.494ms, % non-child: 3.37%)
BytesSent(500.000ms): 0, 0, 0, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 46.54 KB, 46.54 KB, 46.54 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 974.44 KB, 2.82 MB, 4.93 MB, 6.27 MB, 8.28 MB, 9.69 MB
- EosSent: 3 (3)
- NetworkThroughput: 4.61 MB/sec
- PeakMemoryUsage: 22.57 KB (23112)
- RowsSent: 287.51K (287514)
- RpcFailure: 0 (0)
- RpcRetry: 0 (0)
- SerializeBatchTime: 329.162ms
- TotalBytesSent: 9.69 MB (10161432)
- UncompressedRowBatchSize: 20.56 MB (21563550)
Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd
Reviewed-on: http://gerrit.cloudera.org:8080/9690
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
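The time-series lines in the profile above (e.g. `BytesDequeued(500.000ms): ...`) come from counters whose cumulative value is sampled at a fixed period. A minimal, hypothetical sketch of such a counter (not Impala's RuntimeProfile classes):

```cpp
// Sketch of a periodically sampled counter: a cumulative byte count that is
// recorded into a list of samples once per period (500ms in the profile above).
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

class SampledCounter {
 public:
  void Add(int64_t bytes) { value_.fetch_add(bytes, std::memory_order_relaxed); }

  // Called once per sample period (e.g. every 500ms) by a maintenance thread.
  void TakeSample() { samples_.push_back(value_.load(std::memory_order_relaxed)); }

  const std::vector<int64_t>& samples() const { return samples_; }

 private:
  std::atomic<int64_t> value_{0};
  std::vector<int64_t> samples_;
};

int main() {
  SampledCounter bytes_dequeued;
  // In Impala the sampling is driven by a periodic timer thread; here we just
  // interleave "work" and samples by hand.
  for (int i = 0; i < 3; ++i) {
    bytes_dequeued.Add(700 * 1024);  // bytes dequeued during this period
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    bytes_dequeued.TakeSample();
  }
  for (int64_t s : bytes_dequeued.samples()) std::cout << s << " ";
  std::cout << "\n";  // 716800 1433600 2150400
  return 0;
}
```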

3bfda33487
IMPALA-6193: Track memory of incoming data streams
This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called "Data Stream Queued RPC Calls" until a receiver registers and takes over the early sender RPCs. Inside the receiver, memory for deferred RPCs is tracked against the fragment instance's memtracker until we unpack the batches and add them to the row batch queue.

The DCHECK in MemTracker::Close() covers that all memory consumed by a tracker gets released eventually. In addition to that, this change adds a custom cluster test that makes sure that queued memory gets tracked, by inspecting the peak consumption of the new memtrackers.

Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Reviewed-on: http://gerrit.cloudera.org:8080/8914
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
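The hand-over chain above (service tracker, then queued-RPC tracker, then the fragment instance's memtracker) boils down to accounting each byte against exactly one tracker at a time and transferring the accounting when ownership moves. A minimal, hypothetical sketch (not Impala's MemTracker):

```cpp
// Sketch of the hand-over pattern described above: moving an RPC between
// stages releases from the old tracker and consumes against the new one.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>

class TrackerSketch {
 public:
  explicit TrackerSketch(std::string name) : name_(std::move(name)) {}
  void Consume(int64_t bytes) {
    consumption_ += bytes;
    peak_ = std::max(peak_, consumption_);
  }
  void Release(int64_t bytes) { consumption_ -= bytes; }
  int64_t consumption() const { return consumption_; }
  int64_t peak() const { return peak_; }

 private:
  std::string name_;
  int64_t consumption_ = 0;
  int64_t peak_ = 0;
};

// Move accounting for `bytes` from one stage's tracker to the next.
void Transfer(TrackerSketch& from, TrackerSketch& to, int64_t bytes) {
  to.Consume(bytes);
  from.Release(bytes);
}

int main() {
  TrackerSketch service("Data Stream Service");
  TrackerSketch queued("Data Stream Queued RPC Calls");
  TrackerSketch instance("Fragment Instance");

  const int64_t kBatchBytes = 1 << 20;
  service.Consume(kBatchBytes);            // RPC arrives at the service
  Transfer(service, queued, kBatchBytes);  // queued as an early sender
  Transfer(queued, instance, kBatchBytes); // receiver registers, takes ownership
  instance.Release(kBatchBytes);           // batch unpacked into the row batch queue

  std::cout << service.consumption() << " " << queued.consumption() << " "
            << instance.consumption() << "\n";  // 0 0 0
  return 0;
}
```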

b4ea57a7e3
IMPALA-4856: Port data stream service to KRPC
This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.
Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multiplex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
avoids the possibility that a thread is stuck in the RPC code for an
extended amount of time without checking for cancellation. A TransmitData()
call with KRPC is in essence a trio of RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid an extra copy into the serialized
request buffer.
Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, the number of service threads equals the
number of logical cores. The service threads are shared across all queries so
the RPC handler should avoid blocking as much as possible. In thrift RPC
implementation, we make a thrift thread handling a TransmitData() RPC to block
for extended period of time when the receiver is not yet created when the call
arrives. In KRPC implementation, we store TransmitData() or EndDataStream()
requests which arrive before the receiver is ready in a per-receiver early
sender list stored in KrpcDataStreamMgr. These RPC calls will be processed
and responded to when the receiver is created or when timeout occurs.
Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr would cause the buffer
limit to be exceeded, the request is stashed in a queue for deferred processing.
The stashed RPC requests are not responded to until they are processed,
so as to exert back pressure on the senders. An alternative would be to reply with
an error, in which case the request / row batches would need to be sent again.
This may end up consuming more network bandwidth than the Thrift RPC
implementation. This change
adopts the behavior of allowing one stashed request per sender.
All rpc requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header about the
meta-data of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).
This patch is based on an abandoned patch by Henry Robinson.
TESTING
-------
* Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true.
TO DO
-----
* Port some BE tests to KRPC services.
Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Reviewed-on: http://gerrit.cloudera.org:8080/8023
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
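The early-sender handling described above can be sketched as follows; the names are hypothetical stand-ins, not Impala's KrpcDataStreamMgr, and the timeout path is omitted:

```cpp
// Sketch of early-sender handling: TransmitData() calls that arrive before
// their receiver exists are stashed per receiver and only responded to once
// the receiver registers.
#include <cstdint>
#include <functional>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

struct TransmitDataCtx {
  std::vector<uint8_t> serialized_batch;  // what a sidecar would carry
  std::function<void()> respond;          // completes the pending RPC
};

class StreamMgrSketch {
 public:
  // Called from an RPC handler; must not block the shared service threads.
  void AddData(int64_t recvr_id, TransmitDataCtx ctx) {
    auto it = receivers_.find(recvr_id);
    if (it == receivers_.end()) {
      early_senders_[recvr_id].push_back(std::move(ctx));  // defer, don't reply yet
      return;
    }
    it->second(ctx.serialized_batch);  // enqueue into the receiver
    ctx.respond();
  }

  // Called when the fragment instance creates its receiver.
  void RegisterRecvr(int64_t recvr_id,
                     std::function<void(const std::vector<uint8_t>&)> enqueue) {
    receivers_[recvr_id] = std::move(enqueue);
    for (auto& ctx : early_senders_[recvr_id]) {
      receivers_[recvr_id](ctx.serialized_batch);
      ctx.respond();  // replying releases back pressure on the sender
    }
    early_senders_.erase(recvr_id);
  }

 private:
  std::unordered_map<int64_t, std::function<void(const std::vector<uint8_t>&)>> receivers_;
  std::unordered_map<int64_t, std::vector<TransmitDataCtx>> early_senders_;
};

int main() {
  StreamMgrSketch mgr;
  mgr.AddData(42, {{1, 2, 3}, [] { std::cout << "responded to early sender\n"; }});
  mgr.RegisterRecvr(42, [](const std::vector<uint8_t>& b) {
    std::cout << "enqueued batch of " << b.size() << " bytes\n";
  });
  return 0;
}
```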