4 Commits

Author SHA1 Message Date
Joe McDonnell
ba4cb95b62 IMPALA-11257: Fix CMake warnings for module names and cmake_minimum_required
This fixes a few different CMake warnings:
1. This removes cmake_minimum_required invocations except for the
   top-most CMakeLists.txt. This eliminates the warnings like this:
     Compatibility with CMake < 2.8.12 will be removed from a future version of
     CMake.

     Update the VERSION argument <min> value or use a ...<max> suffix to tell
     CMake that the project does not need compatibility with older versions.
   Moving to a later version also required setting CMAKE_ENABLE_EXPORTS
   to continue exporting symbols.
2. This modifies the module names so that they match the corresponding
   module names from Find*.cmake. This is mostly dealing with case
   differences. This addresses warnings like:
     The package name passed to `find_package_handle_standard_args` (PROTOBUF)
     does not match the name of the calling package (Protobuf).  This can lead
     to problems in calling code that expects `find_package` result variables
     (e.g., `_FOUND`) to follow a certain pattern.
   This fixed the detection logic for KerberosPrograms, and so it required
   adding more Kerberos packages to bin/bootstrap_build.sh.
3. This adds a missing .cc suffix. This addresses the following warning:
     CMake Warning (dev) at be/src/util/CMakeLists.txt:141 (add_library):
     Policy CMP0115 is not set: Source file extensions must be explicit.  Run
     "cmake --help-policy CMP0115" for policy details.  Use the cmake_policy
     command to set the policy and suppress this warning.

These fixes mostly match how these warnings were handled in
Apache Kudu.

Testing:
 - Ran GVO

Change-Id: I2a97dd07cdd0831e90882a2035415ac71d670147
Reviewed-on: http://gerrit.cloudera.org:8080/18444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-11 05:48:36 +00:00
Michael Ho
b4ea57a7e3 IMPALA-4856: Port data stream service to KRPC
This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.

Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multiplex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
avoids the possibility that a thread is stuck in the RPC code for an
extended amount of time without checking for cancellation. A TransmitData()
call with KRPC is in essence a trio of RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid an extra copy into the serialized
request buffer.

Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, the number of service threads equals the
number of logical cores. The service threads are shared across all queries so
the RPC handler should avoid blocking as much as possible. In the thrift RPC
implementation, a thrift thread handling a TransmitData() RPC blocks for an
extended period of time if the receiver has not yet been created when the call
arrives. In the KRPC implementation, we store TransmitData() or EndDataStream()
requests which arrive before the receiver is ready in a per-receiver early
sender list stored in KrpcDataStreamMgr. These RPC calls will be processed
and responded to when the receiver is created or when a timeout occurs.
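
The early sender list can be illustrated with a minimal, single-threaded
sketch. This is not Impala's KrpcDataStreamMgr: ToyStreamMgr, EarlyRequest, the
"OK"/"TIMEOUT" reply strings and the millisecond clock are all hypothetical
names chosen for illustration, and locking is omitted. It only shows the idea
that the handler parks a request instead of blocking a service thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an RPC that arrived before its receiver exists.
struct EarlyRequest {
  std::string payload;
  long arrival_ms;                                  // arrival time, for timeouts
  std::function<void(const std::string&)> respond;  // completes the RPC
};

// Toy manager (assumption: single-threaded, no locking).
class ToyStreamMgr {
 public:
  // Called by the RPC handler. Never blocks: it either replies or parks.
  void OnTransmit(int receiver_id, EarlyRequest req) {
    if (receivers_.count(receiver_id) > 0) {
      req.respond("OK");
    } else {
      early_senders_[receiver_id].push_back(std::move(req));
    }
  }

  // Called when the receiver is created: drain its early sender list.
  void RegisterReceiver(int receiver_id) {
    receivers_.insert(receiver_id);
    auto it = early_senders_.find(receiver_id);
    if (it == early_senders_.end()) return;
    std::vector<EarlyRequest> parked = std::move(it->second);
    early_senders_.erase(it);
    for (EarlyRequest& req : parked) req.respond("OK");
  }

  // Called periodically: fail requests parked for at least timeout_ms.
  void ExpireEarlySenders(long now_ms, long timeout_ms) {
    assert(timeout_ms >= 0);
    for (auto it = early_senders_.begin(); it != early_senders_.end();) {
      std::vector<EarlyRequest> keep;
      for (EarlyRequest& req : it->second) {
        if (now_ms - req.arrival_ms >= timeout_ms) {
          req.respond("TIMEOUT");
        } else {
          keep.push_back(std::move(req));
        }
      }
      if (keep.empty()) {
        it = early_senders_.erase(it);
      } else {
        it->second = std::move(keep);
        ++it;
      }
    }
  }

  std::size_t NumParked(int receiver_id) const {
    auto it = early_senders_.find(receiver_id);
    return it == early_senders_.end() ? 0 : it->second.size();
  }

 private:
  std::unordered_set<int> receivers_;
  std::unordered_map<int, std::vector<EarlyRequest>> early_senders_;
};
```

In this sketch a request for an unknown receiver stays parked until either
RegisterReceiver() or ExpireEarlySenders() invokes its respond callback, so the
calling thread returns immediately in both cases.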

Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr would exceed the buffer
limit, the request will be stashed in a queue for deferred processing.
The stashed RPC requests will not be responded to until they are processed
so as to exert back pressure on the senders. An alternative would be to reply with
an error, requiring the request and its row batches to be sent again. This may end up
consuming more network bandwidth than the thrift RPC implementation. This change
adopts the behavior of allowing one stashed request per sender.
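
The back-pressure scheme can be sketched in the same spirit. ToyRecvrQueue and
ToyBatch below are hypothetical names, not Impala classes; the sketch is
single-threaded and assumes a sender does not send its next batch until the
previous one has been acknowledged, which is what limits it to one stashed
request.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical row batch RPC: who sent it, its size, and how to acknowledge it.
struct ToyBatch {
  int sender_id;
  std::size_t bytes;
  std::function<void()> respond;  // replying lets the sender transmit again
};

// Toy receiver queue with a byte limit (assumption: no locking).
class ToyRecvrQueue {
 public:
  explicit ToyRecvrQueue(std::size_t limit_bytes) : limit_(limit_bytes) {}

  // RPC handler path: admit the batch if it fits, otherwise stash it
  // WITHOUT responding, so the sender stays blocked on its outstanding RPC.
  void AddBatch(ToyBatch b) {
    if (deferred_.empty() && CanAdmit(b.bytes)) {
      Admit(std::move(b));
    } else {
      deferred_.push_back(std::move(b));  // keeps arrival order
    }
  }

  // Consumer path: take one batch, then admit stashed requests that now fit.
  bool GetBatch(int* sender_id) {
    if (queue_.empty()) return false;
    *sender_id = queue_.front().sender_id;
    queued_bytes_ -= queue_.front().bytes;
    queue_.pop_front();
    while (!deferred_.empty() && CanAdmit(deferred_.front().bytes)) {
      ToyBatch b = std::move(deferred_.front());
      deferred_.pop_front();
      Admit(std::move(b));
    }
    return true;
  }

  std::size_t NumDeferred() const { return deferred_.size(); }

 private:
  // An empty queue always admits, so an oversized batch cannot starve.
  bool CanAdmit(std::size_t bytes) const {
    return queue_.empty() || queued_bytes_ + bytes <= limit_;
  }

  void Admit(ToyBatch b) {
    queued_bytes_ += b.bytes;
    std::function<void()> respond = std::move(b.respond);
    queue_.push_back(std::move(b));
    respond();  // only now does the sender learn the batch was accepted
  }

  std::size_t limit_;
  std::size_t queued_bytes_ = 0;
  std::deque<ToyBatch> queue_;
  std::deque<ToyBatch> deferred_;
};
```

Compared with replying with an error, the stashed batch in this sketch is
transferred once and simply waits; the cost is the memory held by the deferred
queue while the sender is stalled.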

All RPC requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header holding the
metadata of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).

This patch is based on an abandoned patch by Henry Robinson.

TESTING
-------

* Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true.

TO DO
-----

* Port some BE tests to KRPC services.

Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Reviewed-on: http://gerrit.cloudera.org:8080/8023
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-09 20:05:08 +00:00
Henry Robinson
ed0aa66ee1 IMPALA-4650: Allow protobuf to find non-system libraries and binaries
This change makes PROTOBUF_GENERATE_CPP able to pick up Protobuf
libraries and binaries that are found by CMake but not installed on the
system LD_LIBRARY_PATH.

Change-Id: I942b3f18e25e2abc5aac167412b65abb680d3c5a
Reviewed-on: http://gerrit.cloudera.org:8080/5658
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 05:18:33 +00:00
Henry Robinson
4b3fdc3301 IMPALA-4650: Add Protobuf to build
This patch adds Protobuf 2.6.1 to Impala's build, and bumps the
toolchain version so that the dependency is available. Protobuf is
unused in this commit, but is required for KRPC.

FindProtobuf.cmake includes some utility CMake methods to generate
source code from Protobuf definitions. It is taken from Kudu.

Change-Id: Ic9357fe0f201cbf7df1ba19fe4773dfb6c10b4ef
Reviewed-on: http://gerrit.cloudera.org:8080/5657
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 05:18:17 +00:00