Normally, an error Status logs a backtrace, but this doesn't happen for
MEM_LIMIT_EXCEEDED because those statuses were copied from a global
Status. Instead, construct them on the fly so that we get the normal
Status backtrace logging. Also, remove some special-case backtrace
logging from buffered-block-mgr.cc that is now redundant.
The primary motivation is to help debug IMPALA-2327, where it appears a
MEM_LIMIT_EXCEEDED status is dropped, but this should also be generally
useful for debugging problems that happen after MEM_LIMIT_EXCEEDED.
Testing: ran test_mem_scaling.py and verified that all "Memory limit
exceeded" statuses now have backtraces.
Change-Id: I4cd04426e63397c24d3e16faa33caafc2a608c0c
Reviewed-on: http://gerrit.cloudera.org:8080/872
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
FnvHash64to32 produces pathologically bad results when hashing zero-byte
input: it always returns 0 regardless of the input hash seed, because
the 32-bit hash seed ends up XORed with itself. This patch adds a DCHECK
to verify that the function is not invoked with zero-byte input, and
updates all call sites to check for the zero-length case.
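For reference, a rough sketch of the failure mode (the seed handling
shown is an assumption about the implementation, not the verbatim code):

  #include <cstdint>

  // With zero input bytes the 64-bit FNV loop never executes, so the seed
  // comes back unchanged and the final XOR-fold cancels it to 0.
  uint32_t FnvHash64to32_EmptyInput(uint32_t seed) {
    uint64_t h = (static_cast<uint64_t>(seed) << 32) | seed;  // 64-bit seed
    // FnvHash64(data, /*bytes=*/0, h) would return h untouched.
    return static_cast<uint32_t>(h >> 32) ^ static_cast<uint32_t>(h);  // seed ^ seed == 0
  }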
This patch also improves hashing of booleans: false and NULL no longer
hash to the same value.
Change-Id: I6706f6ea167e5362d55351f7cc0c637c680a315d
Reviewed-on: http://gerrit.cloudera.org:8080/720
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch extends the deduplication of tuples in row batches to work on
non-adjacent tuples. This deduplication requires an additional data
structure (a hash table) and adds performance overhead (up to 3x
serialization time), so it is only enabled for row batches whose
composition is likely to blow up due to non-adjacent duplication of
large tuples. This avoids a performance regression in typical cases,
while preventing size blow-ups in problematic cases, such as joining
three streams of tuples, some of which may contain large collections.
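Roughly, the full deduplication works as sketched below; the names are
placeholders, not the actual RowBatch serialization code:

  #include <cstdint>
  #include <unordered_map>
  #include <vector>

  struct Tuple { char data[16]; };  // stand-in for a fixed-size tuple

  // Serialize each distinct tuple once; any duplicate (adjacent or not) is
  // encoded as the offset of the first copy in the output buffer.
  void SerializeWithDedup(const std::vector<const Tuple*>& tuples,
                          std::vector<char>* out, std::vector<int64_t>* offsets) {
    std::unordered_map<const Tuple*, int64_t> first_offset;
    for (const Tuple* t : tuples) {
      auto it = first_offset.find(t);
      if (it == first_offset.end()) {
        int64_t off = static_cast<int64_t>(out->size());
        out->insert(out->end(), t->data, t->data + sizeof(t->data));
        it = first_offset.emplace(t, off).first;
      }
      offsets->push_back(it->second);  // duplicates point at the earlier copy
    }
  }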
A test is included that ensures that adjacent deduplication is enabled.
The row batch serialize benchmark shows that deduplication does not regress
performance of serialization or deserialization.
Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407
Reviewed-on: http://gerrit.cloudera.org:8080/573
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Before this patch, the row-batch serialization procedure 'blindly'
copied the data of all tuples to the serialization output buffer,
even if the same tuple appeared multiple times in the row batch
(e.g., as a result of a join we can have repeated tuples that are
backed by the same tuple data).
This patch addresses the most common case of the problem by checking
tuples in adjacent rows to see if they are duplicates and, if so,
referring back to the previously serialized tuple.
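A minimal sketch of the adjacent-row check (placeholder names, not the
actual RowBatch code):

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct Tuple { char data[16]; };  // stand-in for a fixed-size tuple

  // Serialize row by row (all rows share the same layout); if a tuple pointer
  // matches the pointer in the same position of the previous row, reuse the
  // offset written for that row instead of copying the data again.
  void SerializeAdjacentDedup(const std::vector<std::vector<const Tuple*>>& rows,
                              std::vector<char>* out, std::vector<int64_t>* offsets) {
    std::vector<const Tuple*> prev_tuple;
    std::vector<int64_t> prev_offset;
    for (const auto& row : rows) {
      prev_tuple.resize(row.size(), nullptr);
      prev_offset.resize(row.size(), -1);
      for (size_t i = 0; i < row.size(); ++i) {
        if (row[i] != prev_tuple[i]) {
          int64_t off = static_cast<int64_t>(out->size());
          out->insert(out->end(), row[i]->data, row[i]->data + sizeof(row[i]->data));
          prev_tuple[i] = row[i];
          prev_offset[i] = off;
        }
        offsets->push_back(prev_offset[i]);  // duplicate rows get a back-reference
      }
    }
  }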
Deduping adjacent tuples has minimal performance overhead, and offers
significant performance improvements when duplicates are present.
Tests are included to validate the correctness of deduplication, and a
benchmark is included to show that deduplication does not regress
serialization or deserialization performance.
Change-Id: I0e4153c7f73685a116dd3e70072a0895b4daa561
Reviewed-on: http://gerrit.cloudera.org:8080/659
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This allows users to convert a string with a timezone offset to a
timestamp. The supported formats are: +|-hh:mm, +|-hhmm, and +|-hh.
The unix timestamp is calculated as the time minus the timezone offset,
for example:
Without timezone offset:
Query: select unix_timestamp('2001-01-01 09:00:00', 'yyyy-MM-dd HH:mm:ss')
Result: 978339600
With timezone offset "+01:00":
Query: select unix_timestamp('2001-01-01 09:00:00+01:00', 'yyyy-MM-dd HH:mm:ss+hh:mm')
Result: 978336000
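The offset arithmetic is simple; a hedged sketch (the helper name is
hypothetical, not the actual implementation):

  #include <string>

  // Parse a timezone offset of the form +|-hh:mm, +|-hhmm or +|-hh into seconds.
  int ParseOffsetSeconds(const std::string& s) {
    int sign = (s[0] == '-') ? -1 : 1;
    int hours = std::stoi(s.substr(1, 2));
    int minutes = 0;
    if (s.size() >= 4) {
      size_t pos = (s[3] == ':') ? 4 : 3;
      minutes = std::stoi(s.substr(pos, 2));
    }
    return sign * (hours * 3600 + minutes * 60);
  }

  // From the example above: '2001-01-01 09:00:00' read as UTC is 978339600;
  // subtracting the +01:00 offset (3600 seconds) gives 978336000.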
Change-Id: Id3b5f9c354131d2ed6ee677a9c5709a808ac0b15
Reviewed-on: http://gerrit.cloudera.org:8080/441
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
By doing so, we avoid unnecessarily calling the copy constructor for OK
Status objects and loading the value from memory (the old Status::OK was
a global). The impact of this patch was validated by inspecting both the
optimized assembly code and the generated IR code.
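A minimal sketch of the idea (not the real Status class):

  #include <string>
  #include <utility>

  class Status {
   public:
    Status() {}                                // OK: empty message
    explicit Status(std::string msg) : msg_(std::move(msg)) {}
    static Status OK() { return Status(); }    // inlined; no global load, no copy ctor
    bool ok() const { return msg_.empty(); }
   private:
    std::string msg_;
  };

  Status DoWork() {
    // Before: return Status::OK;   (copy-constructs from a global object)
    return Status::OK();            // after: constructed in place, trivially inlined
  }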
Applying this patch has some effect on the amount of generated code. The
new tool `get_code_size` will list the text, data, and bss sizes for all
archives that we produce in a release build. This patch reduces the code
size by ~20 kB.
        Text    Data    BSS
Old 10578622  576864  40825
New 10559367  576864  40809
The majority of the changes in this patch have been mechanically applied
using:
find be/src -name "*.cc" -or -name "*.h" | xargs sed -i 's/Status::OK;/Status::OK\(\);/'
A new micro-benchmark was added to determine the overhead of using
Status in hot code sections.
Machine Info: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
status: Function Rate (iters/ms) Comparison
----------------------------------------------------------------------
Call Status::OK() 9.555e+08 1X
Call static Status::Error 4.515e+07 0.04725X
Call Status(Code, 'string') 9.873e+06 0.01033X
Call w/ Assignment 5.422e+08 0.5674X
Call Cond Branch OK 5.941e+06 0.006218X
Call Cond Branch ERROR 7.047e+06 0.007375X
Call Cond Branch Bool (false) 1.914e+10 20.03X
Call Cond Branch Bool (true) 1.491e+11 156X
Call Cond Boost Optional (true) 3.935e+09 4.118X
Call Cond Boost Optional (false) 2.147e+10 22.47X
Change-Id: I1be6f4c52e2db8cba35b3938a236913faa321e9e
Reviewed-on: http://gerrit.cloudera.org:8080/351
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This change allows a string like "123e-4" to be parsed into a DECIMAL.
The new parser is about 5% slower than the old one for Decimal16 and
Decimal8 and about 10% faster for Decimal4.
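The exponent handling amounts to shifting the scale; a rough sketch
(overflow and precision checks omitted, names are placeholders):

  #include <cctype>
  #include <cstdint>
  #include <cstdlib>
  #include <string>

  struct SimpleDecimal {
    int64_t value;  // unscaled digits
    int scale;      // the decimal is value * 10^-scale
  };

  SimpleDecimal ParseDecimal(const std::string& s) {
    SimpleDecimal d{0, 0};
    size_t i = 0;
    bool negative = false, after_dot = false;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) negative = (s[i++] == '-');
    for (; i < s.size() &&
           (std::isdigit(static_cast<unsigned char>(s[i])) || s[i] == '.'); ++i) {
      if (s[i] == '.') { after_dot = true; continue; }
      d.value = d.value * 10 + (s[i] - '0');
      if (after_dot) ++d.scale;
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
      int exp = std::atoi(s.c_str() + i + 1);
      d.scale -= exp;  // "123e-4": value 123, scale 0 - (-4) = 4, i.e. 0.0123
      while (d.scale < 0) { d.value *= 10; ++d.scale; }  // positive leftover exponent
    }
    if (negative) d.value = -d.value;
    return d;
  }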
Machine Info: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
atod: Function Rate (iters/ms) Comparison
----------------------------------------------------------------------
Old Decimal16Value 18.03 1X
New Decimal16Value 17.14 0.9505X
----------------------------------------------------------------------
Old Decimal8Value 48.14 1X
New Decimal8Value 46.09 0.9573X
----------------------------------------------------------------------
Old Decimal4Value 71.17 1X
New Decimal4Value 79.58 1.118X
Change-Id: If4f818eaa15f51b50afec2a047c63ed1c25c0239
Reviewed-on: http://gerrit.cloudera.org:8080/365
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
To be able to use our own spinlock implementation with the std / boost
lock_guards, it needs to provide the standard lock interface. This patch
adds the three required methods: lock(), unlock(), and try_lock().
Furthermore, the old ScopedSpinLock class is removed to avoid code
duplication.
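A minimal sketch of the interface (the real SpinLock also backs off under
contention, which is omitted here):

  #include <atomic>
  #include <mutex>  // std::lock_guard

  class SpinLock {
   public:
    void lock() { while (locked_.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { locked_.clear(std::memory_order_release); }
    bool try_lock() { return !locked_.test_and_set(std::memory_order_acquire); }
   private:
    std::atomic_flag locked_ = ATOMIC_FLAG_INIT;
  };

  void Example(SpinLock& lock) {
    std::lock_guard<SpinLock> guard(lock);  // works because of lock()/unlock()
    // ... critical section ...
  }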
Change-Id: Icb082b573e5ee71752f5da65a21c7753f40a4a4b
Reviewed-on: http://gerrit.cloudera.org:8080/304
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This patch removes all occurrences of "using namespace std" and "using
namespace boost(.*)" from the codebase. However, there are still cases
where namespace directives are used (e.g. for rapidjson, thrift,
gutil). These have to be tackled in subsequent patches.
To reduce the patch size, this patch introduces a new header file,
"common/names.h", that pulls in many of our most frequently used symbols,
but only if the corresponding standard header was already included: for
example, the using declaration for vector is added only if <vector> was
already included. This requires "common/names.h" to be the last include.
After including names.h, a new block contains a sorted list of using
declarations. (This patch does not fix namespace directives for
namespaces other than std / boost.)
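The pattern in names.h looks roughly like the following; the libstdc++
include-guard macros shown are an assumption used for illustration, not
the actual header contents:

  #ifdef _GLIBCXX_STRING
  using std::string;
  #endif

  #ifdef _GLIBCXX_VECTOR
  using std::vector;
  #endif

  #ifdef _GLIBCXX_MAP
  using std::map;
  #endif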
Change-Id: Iebe4c054670d655bc355347e381dae90999cfddf
Reviewed-on: http://gerrit.cloudera.org:8080/338
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages, to reduce the clutter in the shell output and increase the
usefulness of the messages. By splitting the message string from the
implementation, it becomes possible to edit the string independently of
the code, paving the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file, which is automatically
generated by the script common/thrift/generate_error_codes.py. The goal
of the script is to have a central, understandable repository of error
messages; adding new messages to this file requires rebuilding the
thrift part. The proxy class ErrorMessage is responsible for representing
an error and capturing the parameters that are used to format the error
message string.
When error messages are recorded, the following algorithm is used (a
rough sketch follows the list):
- If an error message is of type GENERAL, do not aggregate it; simply
  add it to the total number of messages.
- If an error message is of a specific type, record the first occurrence
  as a sample and increment the count for all further occurrences.
- The coordinator merges all error messages except the ones of type
  GENERAL and displays a count.
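A rough sketch of that aggregation (enum values and field names are
assumptions, not the generated thrift code):

  #include <map>
  #include <string>
  #include <vector>

  enum class ErrorCode { GENERAL, PARQUET_MULTIPLE_BLOCKS /* , ... */ };

  struct ErrorAggregate {
    std::string first_message;  // sample message for this code
    int count = 0;
  };

  struct ErrorLog {
    std::vector<std::string> general_errors;      // never aggregated
    std::map<ErrorCode, ErrorAggregate> by_code;  // one entry per specific code

    void Record(ErrorCode code, const std::string& msg) {
      if (code == ErrorCode::GENERAL) {
        general_errors.push_back(msg);
        return;
      }
      ErrorAggregate& agg = by_code[code];
      if (agg.count == 0) agg.first_message = msg;  // keep the first as a sample
      ++agg.count;                                  // later occurrences just count
    }
  };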
For example, in the case of a Parquet file spanning multiple blocks, the
output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
All messages are always logged to VLOG. In the coordinator, error
messages are merged across all backends to retain readability on large
clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Currently, Impala binaries are compiled with -msse4.2 so that the SSE
4.2 intrinsics can be used on the SSE-optimized paths. While these
paths correctly have runtime checks for SSE 4.2 support, the -msse4.2
flag allows the compiler to automatically emit SSE 4.2 instructions.
Compilers have become smart enough to do exactly that, so various builds
of Impala don't work on CPUs that lack SSE 4.2 (or 4.1, etc.) support.
Fix this by defining our own implementations of the intrinsics so that
we don't have to use -msse4.2 during the native compile. This allows us
to continue to inline these paths. We were already cross-compiling for
SSE 4.2 and non-SSE 4.2, so we continue to leverage that for IR.
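The replacement intrinsics look roughly like this (the function name is
an assumption); the instruction is still guarded by the existing runtime
SSE 4.2 check:

  #include <cstdint>

  // Implement the intrinsic with inline assembly so the translation unit can
  // be compiled without -msse4.2; the caller only reaches this after the
  // runtime CPUID check, and the compiler never auto-emits SSE 4.2 here.
  inline uint32_t Crc32U32(uint32_t crc, uint32_t value) {
    __asm__("crc32l %1, %0" : "+r"(crc) : "rm"(value));
    return crc;
  }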
Additional testing:
- spot check the SSE 4.2 optimized paths and see that the new code is
the same as old code.
- mask support for SSE 4.2 and run tests to verify non-SSE 4.2 code
paths work. (A follow on change will add an option to do this so
we can add it to our regression tests).
- Spot check a few locations where we know the compiler was
auto-emitting SSE 4 instructions and verify that's not happening now.
(Unfortunately, we don't have any such machines in our test cluster).
Change-Id: I85afd8f0c880e4dfe86b110e09061593f7698619
Reviewed-on: http://gerrit.cloudera.org:8080/111
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Otherwise multiple plan fragments' modules clobber each other. This
patch changes the -unopt_module and -opt_module flags to
-unopt_module_dir and -opt_module_dir respectively, and the flags
cause each fragment's module to be written to the specified directory
with a filename including the fragment instance ID. These are still
debugging flags.
Change-Id: I38b1970c2507e7e545ce8c08a4be6cc5a20ff7a4
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5950
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
This patch reworks a lot of the metrics subsystem, laying much of the
groundwork for unifying runtime profiles and metrics in the future, as
well as enabling better rendering of metric data in our webpages, and
richer integration with thirdparty monitoring tools like CM.
There are lots of changes. The most significant are below.
TODO (incomplete list):
* Add descriptions for all metrics
* Settle on a standard hierarchy for process-wide metric groups
* Add path-based resolution for searching for metrics (i.e. resolve
"group1.group2.metric_name")
* Add a histogram metric type
Improvements for all metrics:
** New 'description' field, which allows a human-readable description to
be provided for each metric.
** Metrics must serialise themselves to JSON via the RapidJson
library (all by-hand JSON serialisation has been removed).
** Metrics are contained in MetricGroups (replacing the old 'Metrics'
class), which are hierarchically arranged to make grouping metrics
into smaller subsystems more natural.
** Metrics are rendered via the new webserver templating engine,
replacing the old /metrics endpoint. The old /jsonmetrics endpoint is
retained for backwards compatibility.
Improvements for 'simple' metrics:
** SimpleMetrics replace the old PrimitiveMetric class (using much of
the same code), and are metrics whose value does not itself have
relevant structure (as opposed to sets, lists, etc.).
** SimpleMetrics have 'kinds' (counter, gauge, property, etc.)
** ... and units (from TCounterType), to make pretty-printing easier
(a rough sketch of the new metric classes follows).
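A rough sketch of the SimpleMetric shape described above (class and
method names are assumptions, not the actual API):

  #include <string>
  #include <utility>
  #include <rapidjson/document.h>

  enum class MetricKind { COUNTER, GAUGE, PROPERTY };

  template <typename T>  // assumes a numeric T for the JSON value
  class SimpleMetric {
   public:
    SimpleMetric(std::string name, std::string description, MetricKind kind, T value)
      : name_(std::move(name)), description_(std::move(description)),
        kind_(kind), value_(value) {}

    void set_value(T v) { value_ = v; }

    // Every metric renders itself to JSON via RapidJson.
    void ToJson(rapidjson::Document* document, rapidjson::Value* out) const {
      auto& alloc = document->GetAllocator();
      out->SetObject();
      rapidjson::Value name(name_.c_str(), alloc);
      rapidjson::Value description(description_.c_str(), alloc);
      out->AddMember("name", name, alloc);
      out->AddMember("description", description, alloc);
      out->AddMember("value", value_, alloc);
    }

   private:
    std::string name_, description_;
    MetricKind kind_;
    T value_;
  };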
Change-Id: Ida1d125172d8572dfe9541b4271604eff95cfea6
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5722
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This was one idea: just cast to __int128_t as a poor man's int96.
Unfortunately, it seems too slow: ~15x slower for add, ~10x for multiply
and 3x for divide compared to plain __int128_t.
Change-Id: I06eb3fa3ac1edc2c174873a73a252a0165911b1c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2433
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
The main improvement introduced by this patch is speeding up
compilation time by pruning the module of unused functions before
running the rest of the optimization passes. This eliminates most of
the functions in the module since it includes all cross-compiled
functions. This requires that all codegen'd functions are registered
with AddFunctionToJit(), so we know which functions can't be deleted.
With this change, the compilation time decreased from 131.398ms to
36.579ms on a simple "select *" query.
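A hedged sketch of the pruning step (not the actual LlvmCodeGen code, and
the LLVM API use is simplified):

  #include <vector>
  #include <llvm/ADT/SmallPtrSet.h>
  #include <llvm/IR/Function.h>
  #include <llvm/IR/Instructions.h>
  #include <llvm/IR/Module.h>

  // Starting from the functions registered via AddFunctionToJit(), mark
  // everything transitively called, then strip the bodies of unmarked
  // functions so the optimization passes have far less code to process.
  static void MarkReachable(llvm::Function* fn,
                            llvm::SmallPtrSetImpl<llvm::Function*>* live) {
    if (fn == nullptr || !live->insert(fn).second) return;  // null or already seen
    for (llvm::BasicBlock& bb : *fn) {
      for (llvm::Instruction& instr : bb) {
        if (auto* call = llvm::dyn_cast<llvm::CallInst>(&instr)) {
          MarkReachable(call->getCalledFunction(), live);
        }
      }
    }
  }

  void PruneUnusedFunctions(llvm::Module* module,
                            const std::vector<llvm::Function*>& jitted_fns) {
    llvm::SmallPtrSet<llvm::Function*, 64> live;
    for (llvm::Function* fn : jitted_fns) MarkReachable(fn, &live);
    for (llvm::Function& fn : *module) {
      // Turn unreachable definitions into declarations; the optimizer then
      // has almost nothing left to optimize or emit for them.
      if (!fn.isDeclaration() && live.count(&fn) == 0) fn.deleteBody();
    }
  }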
The rest of the changes are minor additions to LlvmCodegen that will
be used in the expr refactoring.
Change-Id: If08000d3dc3fd4d777f6d1f7a30639badad89d6c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2378
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit a1bd583fd743ce39b233f2917e13385179a8c217)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2572
Tested-by: jenkins
The __int128_t type is exactly what we want: it is 16 bytes, stored as
two's complement little endian (the exact extension of the native int
types). It outperforms the boost library we were using (see benchmark),
and looking at the assembly for some of the operators, I doubt we can do
better. This also seems like the kind of thing hardware might be able to
do natively in the future if we stick with the standard implementation.
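For illustration (a trivial standalone example, not Impala code):

  #include <cstdint>
  #include <cstdio>

  // __int128_t behaves like the native integer types: two's complement,
  // 16 bytes, and the compiler emits plain add-with-carry style code for it.
  static_assert(sizeof(__int128_t) == 16, "exactly 16 bytes");

  __int128_t MultiplyAdd(__int128_t a, __int128_t b, __int128_t c) {
    return a * b + c;  // a handful of instructions, no library call
  }

  int main() {
    __int128_t v = static_cast<__int128_t>(INT64_MAX) * 10;  // exceeds 64 bits
    printf("low 64 bits: %llu\n", static_cast<unsigned long long>(v));
    return 0;
  }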
Because the multi-int library is abstracted away, this requires minimal
changes to the rest of our code.
Only int128 is provided, not 96 or any other width, so we will still
need the boost library for some cases, but nothing in the hot path. We
might want to revisit implementing an int96 of the same format in the
future to get some space and efficiency savings, but I think we can live
with just int128 for a while.
Change-Id: I137ef7be812675036dd9b6e5b48dfc5c7aa9ab37
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2200
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2249
FNV hash has the property that the least significant bit of the hashed value
is just the XOR of the LSBs of its input bytes. This results in poor
distribution of rows when the partition keys are duplicated -- for example,
if the partition key is (l_orderkey, l_orderkey). A recommended technique
to mitigate this is to generate a larger hash and use XOR-folding to reduce
it to the desired length.
In this patch FnvHash has been modified to generate a 64-bit hash and
fold the result down to 32 bits. It has been renamed FnvHash64to32 to
make this explicit.
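Sketch of the folding scheme (FNV-1a constants; not the exact Impala
code):

  #include <cstddef>
  #include <cstdint>

  // Hash to 64 bits first, then XOR-fold the two halves so every output bit
  // depends on many input bits rather than just the LSBs.
  uint64_t FnvHash64(const void* data, size_t bytes, uint64_t hash) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    for (size_t i = 0; i < bytes; ++i) hash = (hash ^ p[i]) * 0x100000001b3ULL;
    return hash;
  }

  uint32_t FnvHash64to32(const void* data, size_t bytes, uint32_t seed) {
    uint64_t h = FnvHash64(data, bytes, seed);
    return static_cast<uint32_t>(h >> 32) ^ static_cast<uint32_t>(h);  // XOR-fold
  }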
Change-Id: Ie12ad3f863fca15092803d3e4d616a654cb8d244
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2220
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Adds a builtin FNV hash function. Also renames HashUtil::FvnHash to HashUtil::FnvHash
since it was spelled incorrectly.
Change-Id: Ic6dbfbce58ceeded72442ff22d3cd04f1010ea78
Reviewed-on: http://gerrit.ent.cloudera.com:8080/995
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
Strings with leading and trailing whitespace are accepted. Branching is
heavily optimized for the successful, non-whitespace case: all of the
StringTo<Type> functions first attempt the parse assuming there is no
leading whitespace, and if that fails, they trim the leading whitespace
and parse again. Therefore, strings with whitespace take a hit from
branch misprediction.
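A hedged sketch of the parse-then-retry strategy (names are assumptions;
sign and overflow handling omitted):

  #include <cctype>
  #include <cstdint>

  static bool ParseDigits(const char* s, int len, int32_t* result) {
    if (len == 0) return false;
    int64_t value = 0;
    for (int i = 0; i < len; ++i) {
      if (s[i] < '0' || s[i] > '9') return false;  // any non-digit fails the fast path
      value = value * 10 + (s[i] - '0');
    }
    *result = static_cast<int32_t>(value);
    return true;
  }

  bool StringToInt32(const char* s, int len, int32_t* result) {
    if (ParseDigits(s, len, result)) return true;  // common case: no whitespace
    while (len > 0 && std::isspace(static_cast<unsigned char>(*s))) { ++s; --len; }
    while (len > 0 && std::isspace(static_cast<unsigned char>(s[len - 1]))) --len;
    return ParseDigits(s, len, result);            // retry on trimmed input
  }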
Change-Id: Ie65da37a3c220e019f6dd9e3fed4baea9fb4460c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/661
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
Changes MemLimit to MemTracker:
- the limit is optional
- it also records a label and an optional parent
- Consume() and Release() now update the ancestors as well, and a new
  AnyLimitExceeded() checks the ancestors too (sketched below)
- the consumption counter is a HighwaterMarkCounter and can optionally
  be created as part of a profile
Each fragment instance now has a MemTracker that is part of a 3-level
hierarchy: process, query, fragment instance.
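A hedged sketch of the tracker hierarchy (field and method names are
assumptions, not the real MemTracker API):

  #include <atomic>
  #include <cstdint>
  #include <string>

  class MemTracker {
   public:
    MemTracker(std::string label, int64_t limit /* -1 = no limit */,
               MemTracker* parent)
        : label_(std::move(label)), limit_(limit), parent_(parent) {}

    // Updates this tracker and every ancestor.
    void Consume(int64_t bytes) {
      for (MemTracker* t = this; t != nullptr; t = t->parent_) t->consumption_ += bytes;
    }
    void Release(int64_t bytes) { Consume(-bytes); }

    // True if this tracker or any ancestor is over its (optional) limit.
    bool AnyLimitExceeded() const {
      for (const MemTracker* t = this; t != nullptr; t = t->parent_) {
        if (t->limit_ >= 0 && t->consumption_.load() > t->limit_) return true;
      }
      return false;
    }

   private:
    std::string label_;
    int64_t limit_;
    MemTracker* parent_;
    std::atomic<int64_t> consumption_{0};
  };

  // Typical hierarchy: process -> query -> fragment instance, e.g.:
  // MemTracker process("process", process_limit, nullptr);
  // MemTracker query("query", query_limit, &process);
  // MemTracker fragment("fragment instance", -1, &query);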
Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07
Reviewed-on: http://gerrit.ent.cloudera.com:8080/339
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
* Allows access to the thread ID for assignment to cgroups and getting
stats via /proc
* Adds /threadz page to debug webserver that shows threads by group, and
breaks down CPU usage by thread
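The kernel thread ID in question can be obtained roughly as follows
(a sketch, not the actual Thread class code):

  #include <cstdint>
  #include <sys/syscall.h>
  #include <unistd.h>

  // On Linux, the kernel thread ID (the value visible under
  // /proc/<pid>/task/<tid> and accepted by cgroups) comes from gettid,
  // reached via syscall() since older glibc has no wrapper for it.
  int64_t CurrentThreadId() {
    return static_cast<int64_t>(syscall(SYS_gettid));
  }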
Change-Id: Id3b9aae92f4ae5c01ed1aab2185bbaa99fce4385
Reviewed-on: http://gerrit.ent.cloudera.com:8080/94
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
"distinctpc" and "distinctpcsa".
We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400 MB, with ~100 columns of
int/bigint/string type.
On Hive, it took roughly 64 sec.
On this Impala implementation, it took 35 sec. By adding inline to
hash-util.h (which we don't do), we can achieve 24-26 sec.
Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3