Commit Graph

20 Commits

Author SHA1 Message Date
Attila Jeges
c8aa5796d9 IMPALA-10879: Add parquet stats to iceberg manifest
This patch adds Parquet stats to the Iceberg manifest as per-datafile
metrics.

The following metrics are supported:
- column_sizes:
  Map from column id to the total size on disk of all regions that
  store the column. Does not include bytes necessary to read other
  columns, like footers.

- null_value_counts:
  Map from column id to number of null values in the column.

- lower_bounds:
  Map from column id to lower bound in the column serialized as
  binary. Each value must be less than or equal to all non-null,
  non-NaN values in the column for the file.

- upper_bounds:
  Map from column id to upper bound in the column serialized as
  binary. Each value must be greater than or equal to all non-null,
  non-NaN values in the column for the file.

The corresponding parquet stats are collected by 'ColumnStats'
(in 'min_value_', 'max_value_', 'null_count_' members) and
'HdfsParquetTableWriter::BaseColumnWriter' (in
'total_compressed_byte_size_' member).
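
As an illustration of how a collected min value becomes a lower_bounds
entry, here is a minimal sketch of the single-value binary
serialization (a hypothetical helper, not Impala's actual code; assumes
a little-endian host):

  #include <cstdint>
  #include <cstring>
  #include <string>

  // Iceberg single-value serialization (spec Appendix D): an INT bound
  // is stored as its 4 bytes in little-endian order. The manifest maps
  // the column's Iceberg field id to these bytes.
  std::string SerializeInt32Bound(int32_t min_or_max) {
    std::string out(sizeof(min_or_max), '\0');
    std::memcpy(&out[0], &min_or_max, sizeof(min_or_max));  // LE host assumed
    return out;
  }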

Testing:
- New e2e test was added to verify that the metrics are written to the
  Iceberg manifest upon inserting data.
- New e2e test was added to verify that lower_bounds/upper_bounds
  metrics are used to prune data files on querying iceberg tables.
- Existing e2e tests were updated to work with the new behavior.
- BE test for single-value serialization.

Relevant Iceberg documentation:
- Manifest:
  https://iceberg.apache.org/spec/#manifests
- Values in lower_bounds and upper_bounds maps should be Single-value
  serialized to binary:
  https://iceberg.apache.org/spec/#appendix-d-single-value-serialization

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Reviewed-on: http://gerrit.cloudera.org:8080/17806
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
2021-09-02 21:34:41 +00:00
Csaba Ringhofer
c65d7861d9 IMPALA-10656: Fire insert events before commit
Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots (e.g. which files were absent from the old
snapshot but present in the new one).

Hive replication expects the insert events before the commit, so the
old ordering could lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.
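
A rough sketch of the unified flow, with hypothetical names (the real
patch routes this through the existing file-collection machinery):

  #include <string>
  #include <vector>

  struct CreatedFile { std::string partition; std::string path; };

  void FireInsertEvents(const std::vector<CreatedFile>& files);  // hypothetical
  void CommitInsert(const std::vector<CreatedFile>& files);      // hypothetical

  // The backend collects the created files during the insert, so the
  // events can be generated from this set and fired *before* the
  // commit, instead of being derived from an HMS snapshot diff after.
  void FinalizeInsert(const std::vector<CreatedFile>& new_files) {
    FireInsertEvents(new_files);  // notify HMS first
    CommitInsert(new_files);      // then make the files visible
  }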

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Reviewed-on: http://gerrit.cloudera.org:8080/17313
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-04-27 00:41:05 +00:00
wzhou-code
b5e2a0ce2e IMPALA-9224: Blacklist nodes with faulty disk for spilling
This patch extends the blacklist functionality by adding an executor
node to the blacklist if a query fails due to a disk failure during
spill-to-disk. It also classifies disk error codes and defines a set of
blacklistable errors for non-transient disk failures. The coordinator
blacklists an executor only if the executor hit a blacklistable error
during spill-to-disk.
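
A minimal sketch of that classification, with an assumed error set for
illustration (the exact set lives in the disk-error handling code):

  #include <cerrno>
  #include <unordered_set>

  // Non-transient disk errors justify blacklisting the executor;
  // transient conditions do not. The codes below are assumed examples,
  // not the exact set in the patch.
  bool IsBlacklistableDiskError(int posix_err) {
    static const std::unordered_set<int> kBlacklistable = {EIO, EROFS, ENODEV};
    return kBlacklistable.count(posix_err) > 0;
  }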

Adds a new debug action to simulate disk write errors during
spill-to-disk. To use it, specify in the query options:
  'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'

  where <hostname> and <port> identify the impalad that executes the
  fragment instances; <port> is the BE KRPC port (default 27000).

Adds new test cases for blacklist and query-retry to cover the code
changes.

Testing:
 - Passed new test cases.
 - Passed exhaustive test.
 - Manually simulated disk failures in scratch directories on nodes
   of a cluster, verified that the nodes were blacklisted as
   expected.

Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-02-04 05:12:42 +00:00
Fucun Chu
4099a60689 IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

The query profile is updated to include query-wide and per-backend
metrics for rows produced (RowsReturned). Example from:
  set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
  select count(*) from tpch_parquet.lineitem l1 cross join
  (select * from tpch_parquet.lineitem l2 limit 5) l3;

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms
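
A sketch of the periodic check with hypothetical member names; because
it runs per output batch, the produced rows can overshoot the limit
somewhat before the query is terminated:

  #include <cstdint>

  struct JoinNodeSketch {
    int64_t rows_returned_ = 0;
    int64_t join_rows_produced_limit_ = 0;  // from the query option; 0 = off

    // Called periodically (per output batch), which is why the actual
    // rows produced can exceed the limit somewhat.
    bool ExceededLimit(int64_t batch_rows) {
      rows_returned_ += batch_rows;
      return join_rows_produced_limit_ > 0 &&
             rows_returned_ > join_rows_produced_limit_;
    }
  };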

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-22 06:10:39 +00:00
Tim Armstrong
9429bd779d IMPALA-9382: part 2/3: aggregate profiles sent to coordinator
This reworks the status reporting so that serialized
AggregatedRuntimeProfile objects are sent from executors
to coordinators. These profiles are substantially denser
and faster to process for higher mt_dop values. The aggregation
is also done in a single step, merging the aggregated thrift
profile from the executor directly into the final aggregated
profile, instead of converting it to an unaggregated profile
first.

The changes required were:
* A new Update() method for AggregatedRuntimeProfile that
  updates the profile from a serialized AggregatedRuntimeProfile
  for a subset of the instances. The code is generalized from the
  existing InitFromThrift() code path.
* Per-fragment reports included in the status report protobuf
  when --gen_experimental_profile=true.
* Logic on the coordinator that either consumes serialized
  AggregatedRuntimeProfile per fragment, when
  --gen_experimental_profile=true, or consumes a serialized
  RuntimeProfile per finstance otherwise.
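
To illustrate the aggregation step, a toy merge of a single counter
over a set of instance values (the real Update() walks entire profile
trees; the names here are made up):

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  // One counter aggregated over N fragment instances: the dense form
  // keeps summary stats instead of one counter per instance.
  struct AggCounterSketch {
    int64_t min = INT64_MAX, max = INT64_MIN, sum = 0;
    void Merge(const std::vector<int64_t>& instance_values) {
      for (int64_t v : instance_values) {
        min = std::min(min, v);
        max = std::max(max, v);
        sum += v;
      }
    }
  };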

This also adds support for event sequences and time series
in the aggregated profile, so the amount of information
in the aggregated profile is now on par with the basic profile.

We also finish off support for JSON profile. The JSON profile is
more stripped down because we do not need to round-trip profiles
via JSON and it is a much less dense profile representation.

Part 3 will clean up and improve the display of the profile.

Testing:
* Add sanity tests for aggregated runtime profile.
* Add unit tests to exercise aggregation of the various counter types
* Ran core tests.

Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca
Reviewed-on: http://gerrit.cloudera.org:8080/16057
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-26 06:50:41 +00:00
wzhou-code
1af60a1560 IMPALA-9180 (part 3): Remove legacy backend port
The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.

This patch sets the flag be_port as a REMOVED_FLAG and cleans up all
infrastructure around it. StatestoreSubscriber::subscriber_id is now
set as hostname + krpc_port.

Testing:
 - Passed the exhaustive test.

Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-03 00:56:26 +00:00
Zoltan Borok-Nagy
981ef10465 IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
This commit adds support for INSERT INTO statements against Iceberg
tables when the table is non-partitioned and the underlying file format
is Parquet.

We still use Impala's HdfsParquetTableWriter to write the data files,
though they needed some modifications to conform to the Iceberg spec,
namely:
 * write Iceberg/Parquet 'field_id' for the columns
 * TIMESTAMPs are encoded as INT64 micros (without time zone)
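
For the second point, a minimal sketch of the micros encoding
(hypothetical helper; the real conversion happens inside the modified
HdfsParquetTableWriter):

  #include <cstdint>

  // Iceberg 'timestamp' (without zone) is written to Parquet as INT64
  // microseconds since the Unix epoch.
  int64_t ToIcebergMicros(int64_t unix_seconds, int32_t nanos) {
    return unix_seconds * 1000000LL + nanos / 1000;
  }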

We use DmlExecState to transfer information from the table sink
operators to the coordinator, then updateCatalog() invokes the
AppendFiles API to add files atomically. DmlExecState is encoded in
protobuf, while communication with the Frontend uses Thrift; therefore,
to avoid defining the Iceberg DataFile in multiple IDLs, the DataFiles
are stored as FlatBuffers.

The commit also does some corrections on Impala type <-> Iceberg type
mapping:
 * Impala TIMESTAMP is Iceberg TIMESTAMP (without time zone)
 * Impala CHAR is Iceberg FIXED

Testing:
 * Added INSERT tests to iceberg-insert.test
 * Added negative tests to iceberg-negative.test
 * I also did some manual testing with Spark. Spark is able to read
   Iceberg tables written by Impala unless they contain TIMESTAMPs. In
   that case Spark rejects the data files because it only accepts
   TIMESTAMPs with time zone.
 * Added concurrent INSERT tests to test_insert_stress.py

Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Reviewed-on: http://gerrit.cloudera.org:8080/16545
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-26 20:01:09 +00:00
wzhou-code
01316ad1c9 IMPALA-10154: Fix data race on coord_backend_id in TSAN build
This issue was introduced by the patch for IMPALA-5746.
QueryState::exec_rpc_params_.coord_backend_id is set in function
QueryState::Init(), but it could be accessed by the QueryExecMgr object
in QueryExecMgr::CancelQueriesForFailedCoordinators() before or during
the call to QueryState::Init(), hence causing a data race.
To fix this, move coord_backend_id from class ExecQueryFInstancesRequestPB
to class TQueryCtx. QueryState::query_ctx_ is a constant member set in
the QueryState c'tor, so QueryState::query_ctx_.coord_backend_id is
valid and will not change once the QueryState object is created.
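
The shape of the fix, sketched with simplified types (not the actual
class definitions):

  #include <cstdint>

  struct UniqueIdPB { int64_t hi = 0; int64_t lo = 0; };
  struct TQueryCtx { UniqueIdPB coord_backend_id; };

  class QueryStateSketch {
   public:
    // query_ctx_ is const and fully initialized in the constructor, so
    // a concurrent reader can never observe a half-set coord_backend_id.
    explicit QueryStateSketch(const TQueryCtx& ctx) : query_ctx_(ctx) {}
    const UniqueIdPB& coord_backend_id() const {
      return query_ctx_.coord_backend_id;
    }
   private:
    const TQueryCtx query_ctx_;
  };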

Testing:
 - Passed tests/custom_cluster/test_process_failures.py.
 - Passed the core tests for normal build.
 - Passed the core tests against a TSAN build.

Change-Id: I1c4b51e741a28b80bf3485adff8c97aabe0a3f67
Reviewed-on: http://gerrit.cloudera.org:8080/16437
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-09-14 21:50:37 +00:00
wzhou-code
9d43cfdaee IMPALA-5746: Cancel all queries scheduled by failed coordinators
Executors register for updates to the cluster membership. When
coordinators are absent from the active cluster membership list, an
executor cancels all running fragments of the queries scheduled by the
inactive coordinators, since the executor cannot send results back to
the inactive/failed coordinators. This lets executors quickly release
the resources allocated for those cancelled fragments.
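
A rough sketch of the membership callback, with hypothetical types and
helpers:

  #include <string>
  #include <unordered_set>
  #include <vector>

  struct RunningQuery { std::string coordinator_addr; /* ... */ };

  void CancelFragments(RunningQuery& q);  // hypothetical

  // Registered as a cluster-membership callback: cancel the fragments
  // of every query whose coordinator left the membership list, freeing
  // the resources those fragments hold.
  void OnMembershipUpdate(
      const std::unordered_set<std::string>& active_coordinators,
      std::vector<RunningQuery>* running_queries) {
    for (RunningQuery& q : *running_queries) {
      if (active_coordinators.count(q.coordinator_addr) == 0) {
        CancelFragments(q);
      }
    }
  }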

Testing:
- Added new test case TestProcessFailures::test_kill_coordinator
  and ran the test case as following command:
    ./bin/impala-py.test tests/custom_cluster/test_process_failures.py\
      ::TestProcessFailures::test_kill_coordinator \
      --exploration_strategy=exhaustive.
- Passed the core tests.

Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Reviewed-on: http://gerrit.cloudera.org:8080/16215
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-07-31 21:39:08 +00:00
Tim Armstrong
67b4764853 IMPALA-9752: aggregate profile stats on executor
Before this change the coordinator depended on getting the full
fragment instance profiles from executors to pull out various
things. This change removes that dependency by pulling out the
information on the executor, and including it in the status
report protobuf. This should slightly reduce the amount of work
done on the coordinator, but more importantly, makes it easier
to switch to sending aggregated profiles from executor to
coordinator, because the coordinator no longer depends on
receiving individual instance profiles.

Per-host peak memory is included directly in the status report.

Per-backend stats - where the per-backend total is needed -
are aggregated on the executor with the result included in the
status report. These counters are: BytesRead, ScanRangesComplete,
TotalBytesSent, TotalThreads{User,Sys}Time.

One subtlety to keep in mind is that status reports don't include
stats for instances whose final update was sent in a previous
status report. So the executor needs to ensure that stats for
finished fragment instances are included in updates. This is
achieved by caching those values in FragmentInstanceState.
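
A sketch of that caching idea with made-up names: final counter values
of finished instances are remembered and folded into every later
per-backend total:

  #include <cstdint>
  #include <map>
  #include <string>

  struct BackendTotalsSketch {
    // Final values of instances whose last update was already sent;
    // folded into every subsequent per-backend total.
    std::map<std::string, int64_t> finished_bytes_read;

    int64_t TotalBytesRead(
        const std::map<std::string, int64_t>& live_instances) const {
      int64_t total = 0;
      for (const auto& kv : finished_bytes_read) total += kv.second;
      for (const auto& kv : live_instances) total += kv.second;
      return total;
    }
  };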

The stats used in the exec summary were previously also plucked
out of the profile on the coordinator. This change moves the work
to the executor, and includes the per-node stats in the status
report.

I did a little cleanup of the profile counter declarations,
making sure they were consistently inside the impala namespace
in the files that I touched.

Testing:
Ran core tests.

Manually checked exec summary, query profiles and backends
page for a running query.

Change-Id: Ia2aca354d803ce78a798a1a64f9f98353b813e4a
Reviewed-on: http://gerrit.cloudera.org:8080/16050
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-12 04:01:37 +00:00
Thomas Tauber-Marshall
b67c0906f5 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf
The new admission control service will be written in protobuf, so
there are various admission control related structures currently
stored in Thrift that it would be convenient to convert to protobuf,
to minimize the amount of converting back and forth that needs to be
done.

This patch converts some portions of TExecPlanFragmentInfo to
protobuf. TExecPlanFragmentInfo is sent as a sidecar with the Exec()
rpc, so the refactored parts are now just directly included in the
ExecQueryFInstancesRequestPB.

The portions that are converted are those that are part of the
QuerySchedule, in particular the TPlanFragmentDestination,
TScanRangeParams, and TJoinBuildInput.

This patch is just a refactor and doesn't contain any functional
changes.

One notable related change is that DataSink::CreateSink() has two
parameters removed - TPlanFragmentCtx (which no longer exists) and
TPlanFragmentInstanceCtx. These variables and the new PB equivalents
are available via the RuntimeState that was already being passed in as
another parameter and don't need to be individually passed in.

Testing:
- Passed a full run of existing tests.
- Ran the single node perf test and didn't detect any regressions.

Change-Id: I3a8e46767b257bbf677171ac2f4efb1b623ba41b
Reviewed-on: http://gerrit.cloudera.org:8080/15844
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-29 01:26:58 +00:00
Sahil Takiar
7f7743dcc6 IMPALA-9296: Move AuxErrorInfo to StatefulStatus
This patch moves AuxErrorInfoPB from FragmentInstanceExecStatusPB to
StatefulStatusPB. This is necessary because if the report with the
AuxErrorInfoPB is dropped (e.g. due to backpressure at the Coordinator
or a flaky network), the next report won't contain the AuxErrorInfoPB,
and the error info will be lost. StatefulStatus solves this by detecting
any reports that may not have been received by the Coordinator, and
re-transmitting any StatefulStatuses that were not successfully
delivered.

This change also makes the setting of AuxErrorInfoPB stateful, so that
the error info only shows up in one report and is then dropped from the
RuntimeState.
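
The retransmission idea, sketched with hypothetical names and an
assumed sequence-number scheme (the commit does not spell out the
mechanism):

  #include <cstdint>
  #include <vector>

  struct StatefulEntrySketch { int64_t seq; /* e.g. AuxErrorInfoPB */ };

  // Executor side: stateful entries are kept until the coordinator
  // acknowledges them, so an entry lost with a dropped report is
  // included again in the next one.
  std::vector<StatefulEntrySketch> ToResend(
      const std::vector<StatefulEntrySketch>& pending, int64_t acked_seq) {
    std::vector<StatefulEntrySketch> out;
    for (const StatefulEntrySketch& s : pending) {
      if (s.seq > acked_seq) out.push_back(s);
    }
    return out;
  }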

Change-Id: Iabbb48dd3ab58ba7b76b1ab6979b92d0e25e72e3
Reviewed-on: http://gerrit.cloudera.org:8080/15046
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-18 02:05:45 +00:00
Sahil Takiar
8a4fececcf IMPALA-9137: Blacklist node if a DataStreamService RPC to the node fails
Introduces a new optional field to FragmentInstanceExecStatusPB:
AuxErrorInfoPB. AuxErrorInfoPB contains optional metadata associated
with a failed fragment instance. Currently, AuxErrorInfoPB only contains
one field: RPCErrorInfoPB, which is only set if the fragment failed
because an RPC to another impalad failed. The RPCErrorInfoPB contains
the destination node of the failed RPC and the posix error code of the
failed RPC.

Coordinator::UpdateBackendExecStatus(ReportExecStatusRequestPB, ...)
uses the information in RPCErrorInfoPB (if one is set) to blacklist
the target node. While RPCErrorInfoPB::dest_node can be set to the address
of the Coordinator, the Coordinator will not blacklist itself. The
Coordinator only blacklists the node if the RPC failed with a specific
error code (currently either ENOTCONN, ECONNREFUSED, ESHUTDOWN).
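
The blacklisting decision, sketched with hypothetical names (the error
codes are the ones listed above):

  #include <cerrno>
  #include <string>

  // Blacklist the RPC target only for the hard connectivity errors
  // listed above, and never when the failed destination is the
  // coordinator itself.
  bool ShouldBlacklist(int posix_code, const std::string& dest_node,
                       const std::string& coord_node) {
    if (dest_node == coord_node) return false;
    return posix_code == ENOTCONN || posix_code == ECONNREFUSED ||
           posix_code == ESHUTDOWN;
  }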

Testing:
* Ran core tests
* Added new test to test_blacklist.py

Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Reviewed-on: http://gerrit.cloudera.org:8080/14677
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-12-20 02:50:46 +00:00
Thomas Tauber-Marshall
a1588e4498 IMPALA-9181: Serialize TQueryCtx once per query
When issuing Exec() rpcs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions leads to a descriptor table size of ~20MB.

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().
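
The shape of the change, sketched with hypothetical signatures (the
real code uses Impala's Thrift serialization):

  #include <string>
  #include <vector>

  struct TQueryCtx { /* query text, descriptor table, ... */ };
  struct BackendState { void Exec(const std::string& serialized_ctx); };

  std::string SerializeThrift(const TQueryCtx& ctx);  // hypothetical

  // Serialize once in the coordinator, then hand the same bytes to
  // every BackendState instead of re-serializing per backend.
  void StartBackends(const TQueryCtx& ctx,
                     std::vector<BackendState>* backends) {
    const std::string serialized = SerializeThrift(ctx);
    for (BackendState& be : *backends) be.Exec(serialized);
  }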

Followup work might consider if we really need all of the info in the
TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Reviewed-on: http://gerrit.cloudera.org:8080/14777
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-12-03 23:26:25 +00:00
Thomas Tauber-Marshall
30c3cd95a4 IMPALA-7467: Port ExecQueryFInstances to krpc
This patch ports the ExecQueryFInstances rpc to use KRPC. The
parameters for this call contain a huge number of Thrift structs
(eg. everything related to TPlanNode and TExpr), so to avoid
converting all of these to protobuf and the resulting effect that
would have on the FE and catalog, this patch stores most of the
parameters in a sidecar (in particular the TQueryCtx,
TPlanFragmentCtx's, and TPlanFragmentInstanceCtx's).
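
The sidecar idea, sketched generically as a data layout (not the exact
KRPC API):

  #include <string>
  #include <vector>

  // The stable, small part of the request is a protobuf message; the
  // bulky Thrift structures (TQueryCtx, TPlanFragmentCtx, ...) travel
  // as opaque serialized blobs attached to the call, so they never
  // need to be redefined in protobuf.
  struct ExecRequestSketch {
    std::string request_pb;              // serialized protobuf header
    std::vector<std::string> sidecars;   // serialized Thrift payloads
  };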

Testing:
- Passed a full exhaustive run on the minicluster.
Set up a ten node cluster with tpch 500:
- Ran perf tests: 3 iterations per tpch query, 4 concurrent streams,
  no perf change.
- Ran the stress test for 1000 queries, passed.

Change-Id: Id3f1c6099109bd8e5361188005a7d0e892147570
Reviewed-on: http://gerrit.cloudera.org:8080/13583
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 07:21:01 +00:00
Andrew Sherman
adde66b37c IMPALA-7985: Port RemoteShutdown() to KRPC.
The :shutdown command is used to shutdown a remote server. The common
case is that a user specifies the impalad to shutdown by specifying a
host e.g. :shutdown('host100'). If a user has more than one impalad on a
remote host then the form :shutdown('<host>:<port>') can be used to
specify the port by which the impalad can be contacted. Prior to
IMPALA-7985 this port was the backend port, e.g.
:shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC
port, e.g. :shutdown('host100:27000').

Shutdown is implemented by making an rpc call to the target impalad.
This changes the implementation of this call to use KRPC.

To aid the user in finding the KRPC port, the KRPC address is added to
the /backends section of the debug web page.

We attempt to detect the case where :shutdown is pointed at a thrift
port (like the backend port) and print an informative message.

Documentation of this change will be done in IMPALA-8098.
Further improvements to DoRpcWithRetry() will be done in IMPALA-8143.

For discussion of why it was chosen to implement this change in an
incompatible way, see comments in
https://issues.apache.org/jira/browse/IMPALA-7985.

TESTING

Ran all end-to-end tests.
Enhanced the test for /backends in test_web_pages.py.
In test_restart_services.py added a call to the old backend port to the
test. Some expected error messages were changed in line with what KRPC
returns.

Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Reviewed-on: http://gerrit.cloudera.org:8080/12260
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-09 02:18:09 +00:00
Thomas Tauber-Marshall
b1e4957ba7 IMPALA-4555: Make QueryState's status reporting more robust
QueryState periodically collects runtime profiles from all of its
fragment instances and sends them to the coordinator. Previously, each
time this happened, if the rpc failed, QueryState would retry twice
after a configurable timeout and then cancel the fragment instances
under the assumption that the coordinator no longer exists.

We've found in real clusters that this logic is too sensitive to
failed rpcs and can result in fragment instances being cancelled even
in cases where the coordinator is still running.

This patch makes a few improvements to this logic:
- When a report fails to send, instead of retrying the same report
  quickly (after waiting report_status_retry_interval_ms), we wait the
  regular reporting interval (status_report_interval_ms), regenerate
  any stale portions of the report, and then retry.
- A new flag, --status_report_max_retries, is introduced, which
  controls the number of failed reports that are allowed before the
  query is cancelled. --report_status_retry_interval_ms is removed.
- Backoff is used for repeated failed attempts, such that for a period
  between retries of 't', on try 'n' the actual timeout will be t * n.
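
A tiny sketch of the backoff rule from the last point (hypothetical
helper):

  #include <cstdint>

  // For a base reporting interval t, the wait before retry n is t * n.
  int64_t RetryBackoffMs(int64_t status_report_interval_ms, int64_t try_num) {
    return status_report_interval_ms * try_num;
  }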

Testing:
- Added a test which results in a large number of failed intermediate
  status reports but still succeeds.

Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Reviewed-on: http://gerrit.cloudera.org:8080/12049
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-07 22:21:51 +00:00
Andrew Sherman
e4cff7d0d6 IMPALA-7468: Port CancelQueryFInstances() to KRPC.
When the Coordinator needs to cancel a query (for example because a user
has hit Control-C), it does this by sending a CancelQueryFInstances
message to each fragment instance. This change switches this code to use
KRPC.

Add new protobuf definitions for the messages, and remove the old thrift
definitions. Move the server-side implementation of Cancel() from
ImpalaInternalService to ControlService. Rework the scheduler so
that the FInstanceExecParams always contains the KRPC address of the
fragment executors; this address can then be used if a query is to be
cancelled.

For now keep the KRPC calls to CancelQueryFInstances() as synchronous.

While moving the client-side code, remove the fault injection code that
was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and
FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with
--fault_injection_rpc_exception_type=1) as this tickles code in
client-cache.h which is now not used.

TESTING:
  Ran all end-to-end tests.
  No new tests as test_cancellation.py provides good coverage.
  Checked in debugger that DebugAction style fault injection (triggered
  from test_cancellation.py) was working correctly.

Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Reviewed-on: http://gerrit.cloudera.org:8080/12142
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-08 01:05:54 +00:00
Michael Ho
941038229a IMPALA-4063: Merge report of query fragment instances per executor
Previously, each fragment instance executing on an executor would
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send reports at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
no longer reports its errors individually. Instead, a cumulative status
is maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error, or to any general error (e.g.
failure to start a thread) encountered when starting fragment
instances. With this change, the per-instance status reporting threads
are also removed.
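
The per-QueryState reporting loop, sketched with made-up names:

  #include <atomic>
  #include <chrono>
  #include <thread>

  void SendMergedStatusReport();  // hypothetical: one report, all instances

  // One thread per QueryState replaces one report stream per fragment
  // instance: it periodically sends a single report covering every
  // instance of the query running on this backend.
  void ReportLoop(std::atomic<bool>* query_done, int interval_ms) {
    while (!query_done->load()) {
      SendMergedStatusReport();
      std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
    }
    SendMergedStatusReport();  // final report once the query finishes
  }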

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-06 01:01:07 +00:00
Michael Ho
5391100c7e IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC
This change converts the ReportExecStatus() RPC from Thrift-based
RPC to KRPC. This is done as part of the preparation
for fixing IMPALA-2990, as we can take advantage of TCP connection
multiplexing in KRPC to avoid overwhelming the coordinator
with too many connections, reducing the number of TCP connections
to one for each executor.

This patch also introduces a new service pool that will host all query
execution control related RPCs in the future, so that control commands
from coordinators aren't blocked by long-running DataStream services'
RPCs. To avoid unnecessary delays due to sharing network connections
between the DataStream service and the Control service, this change
adds the service name as part of the user credentials for the
ConnectionId, so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift
structures used in ReportExecStatus() RPC to Protobuf. Note that the
runtime profile is still retained as a Thrift structure as Impala
clients will still fetch query profiles using Thrift RPCs. This also
avoids duplicating the serialization implementation in both Thrift
and Protobuf for the runtime profile. The Thrift runtime profiles
are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241, a bug which may lead to duplicated
dml stats being applied. The fix adds a monotonically increasing
version number to fragment instances' reports. The coordinator ignores
any report whose version is smaller than or equal to the version in the
last report.
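
The versioning fix on the coordinator side, sketched (hypothetical
helper):

  #include <cstdint>

  // Coordinator side: each fragment instance's reports carry a
  // monotonically increasing version; stale or duplicated reports are
  // dropped so dml stats are never applied twice.
  bool ShouldApplyReport(int64_t report_version, int64_t* last_applied) {
    if (report_version <= *last_applied) return false;
    *last_applied = report_version;
    return true;
  }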

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure
   and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Reviewed-on: http://gerrit.cloudera.org:8080/10855
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-01 21:12:12 +00:00