The decoders aren't used yet, but will be when we materialize
arrays. In addition, setting up the repetition level decoder makes
sure the definition level encoder is initialized to the right place in
the data buffer.
Change-Id: Ic85ae812b10c747b36d884794d8dcf5976dfe74f
Reviewed-on: http://gerrit.cloudera.org:8080/405
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the function GetConstant(), which is used by
expr compute function and UDFs to access query constants. There is a
corresponding GetIrConstant() function that returns the IR versions of
the same constants. Currently the only implemented constants are the
expr's return type and argument types, but other constants can be
easily be added to these functions. Interpreted expr functions run
normally, but cross-compiled functions can be passed to
InlineConstants(), which looks for calls to GetConstant() and replaces
them with the result of calling GetIrConstant().
I used this technique in the decimal functions that previously were
not switching on the type at all. The performance of LeastGreatest()
after this patch is the same as it was before it switched on the type.
Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41
Reviewed-on: http://gerrit.cloudera.org:8080/352
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
If a Heartbeat() RPC appears hung, the statestore should abort that RPC
so as not to hold on to a sender thread, and to trigger the failure
detector to evict the hung node.
We could just add a TCP timeout to the client cache used by the
statestore, but doing so would mean that all RPCs were subject to the
timeout, and UpdateState() typically takes *much* longer than
Heartbeat() by design, so setting a reasonable timeout would be
impossible. Instead, this patch adds a second client cache designed only
for Heartbeat() RPCs, with an aggressive timeout of 3s by
default. (Heartbeat() usually takes ~1-2ms). A timeout for UpdateState()
is also set to avoid thread starvation, but this is much less aggressive
at 300s.
This patch also adds ClientConnection::DoRpc(), which calls an RPC and
handles various failure modes, including timeout. If DoRpc() returns an
error, the statestore handles it in the usual way, including updating
the failure detector if the failed RPC is Heartbeat().
Change-Id: I2f2462278e59581937c9c10910625d2724a11efa
Reviewed-on: http://gerrit.cloudera.org:8080/206
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce the spill on the shell and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible to
represent an error and capture the parameters that are used to format
the error message string.
When error messages are recorded they are recorded based on the
following algorithm:
- If an error message is of type GENERAL, do not aggregate this message
and simply add it to the total number of messages
- If an error messages is of specific type, record the first error
message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins