Augment the error message to mention that oversubscription is likely the
problem and hint at solutions.
Change-Id: I8e367e1b0cb08e11fdd0546880df23b785e3b7c9
Reviewed-on: http://gerrit.cloudera.org:8080/7861
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Sometimes the client is not open when the debug action fires at the
start of Open() or Prepare(). In that case we should set the
probability when the client is opened later.
This caused one of the large row tests to start failing with a "failed
to repartition" error in the aggregation. The error is a false positive
caused by two distinct keys hashing to the same partition. Removing the
check allows the query to succeed because the keys hash to different
partitions in the next round of repartitioning.
If we repeatedly get unlucky and have collisions, the query will still
fail when it reaches MAX_PARTITION_DEPTH.
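For illustration, a sketch of level-based partition hashing (constants
and names are assumptions, not Impala's actual code):

#include <cstdint>
#include <functional>

// Each round of repartitioning hashes with a different seed, so two distinct
// keys that collide at one level usually separate at the next. Only if they
// collide at every level does the query fail at MAX_PARTITION_DEPTH.
constexpr int kNumPartitions = 16;                     // assumed fanout
constexpr uint64_t kSeedMixer = 0x9e3779b97f4a7c15ULL;

int PartitionIndex(uint64_t key_hash, int level) {
  uint64_t seeded = std::hash<uint64_t>{}(key_hash ^ (kSeedMixer * (level + 1)));
  return static_cast<int>(seeded % kNumPartitions);
}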
Testing:
Ran TestSpilling in a loop for a couple of hours, including the
exhaustive-only tests.
Change-Id: Ib26b697544d6c2312a8e1fe91b0cf8c0917e5603
Reviewed-on: http://gerrit.cloudera.org:8080/7771
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handle larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64KB but should be large enough for almost all reasonable
workloads.
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
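A minimal sketch of that reservation rule with simplified names (not
the planner's actual code):

#include <cstdint>

// One default-sized page per active read/write iterator, plus a
// temporary top-up to a max-sized page while a larger-than-default row
// is in flight; the top-up is released once the row is appended or
// advanced past.
int64_t StreamReservation(int64_t default_page_len, int64_t max_page_len,
                          int num_read_iterators, int num_write_iterators,
                          bool large_row_in_flight) {
  int64_t reservation =
      (num_read_iterators + num_write_iterators) * default_page_len;
  if (large_row_in_flight) reservation += max_page_len - default_page_len;
  return reservation;
}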
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
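A sketch of the two checks, with assumed names and std::optional in
place of Impala's Status (this is not the actual admission control
code):

#include <cstdint>
#include <optional>
#include <string>

// Returns a rejection reason, or nullopt to admit the query.
std::optional<std::string> CheckRejection(int64_t largest_min_reservation,
                                          int64_t sum_min_reservations,
                                          int64_t query_mem_limit,
                                          int64_t pool_max_mem_resources) {
  if (query_mem_limit > 0 && largest_min_reservation > query_mem_limit) {
    return "largest min buffer reservation exceeds query mem_limit";
  }
  if (pool_max_mem_resources > 0 &&
      sum_min_reservations > pool_max_mem_resources) {
    return "sum of min buffer reservations exceeds pool max mem resources";
  }
  return std::nullopt;
}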
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backends'
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Remove the BTS_BLOCK_OVERFLOW error code, which is no longer used and
which referenced --read_size.
Improve the flag description. The output is now:
-read_size ((Advanced) The preferred I/O request size in bytes to issue to
HDFS or the local filesystem. Increasing the read size will increase
memory requirements. Decreasing the read size may decrease I/O
throughput.) type: int32 default: 8388608
Testing:
Tested that Impala built and basic queries could run.
Change-Id: I3c20a9d55f89170b11f569c90b7f2949ddbe4211
Reviewed-on: http://gerrit.cloudera.org:8080/7623
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.
This change includes the following:
1. Include aggregate functions (AggFn) as Expr. This separates
AggFn from its evaluator. AggFn is similar to existing Expr
as both are represented as a tree of Expr nodes but it doesn't
really make sense to call Get*Val() on AggFn. This change
restructures the class hierarchy: much of the existing Expr
class is now renamed to ScalarExpr. Expr is the parent class
of both AggFn and ScalarExpr. Expr is defined to be a tree
with root of either AggFn or ScalarExpr and all descendants
being ScalarExpr (see the class sketch below).
2. ExprContext is renamed to ScalarExprEvaluator which is the
interface for evaluating ScalarExpr; AggFnEvaluator is the
interface for evaluating AggFn. Multiple evaluators can be
instantiated per Expr. Expr contains static states of an
expression while evaluator contains runtime states needed
for execution (i.e. evaluating the expression).
3. Update all exec nodes to instantiate Expr and their evaluators
separately. ExecNode::Init() will be responsible for creating
all the Exprs in an ExecNode while their evaluators are created
in ExecNode::Prepare(). Certain evaluators are also moved into
the data structures which actually utilize them. For instance,
HashTableCtx now owns the build and probe expression evaluators.
Similarly, TupleRowComparator and Sorter also own the evaluators.
ExecNodes which utilize these data structures are only responsible
for creating the expressions used by these data structures.
4. All codegen functions take Exprs instead of evaluators. Also, codegen
functions will not return an error status should the IR function fail the
LLVM verification step.
5. The assignment of index into the FunctionContext vector is now done
during the construction of ScalarExpr. Evaluators are only responsible
for allocating and initializing the FunctionContexts.
6. Open(), Prepare() are now removed from Expr classes. The interface
for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
which will convert a thrift Expr into an initialized Expr object.
Similarly, the Create() interface is used for creating evaluators from an
initialized Expr object.
This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.
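A skeleton of the restructured hierarchy from items 1 and 2 above
(simplified sketch; member details elided):

// Expr is a tree whose root is either AggFn or ScalarExpr and whose
// descendants are all ScalarExpr. Evaluators hold per-instance runtime
// state and are created separately; multiple evaluators may share one Expr.
class Expr { /* static state: type, children, fn descriptor */ };
class ScalarExpr : public Expr { /* Get*Val() makes sense here */ };
class AggFn : public Expr { /* update/merge/finalize functions */ };
class ScalarExprEvaluator { /* runtime state to evaluate a ScalarExpr */ };
class AggFnEvaluator { /* runtime state to evaluate an AggFn */ };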
Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Reviewed-on: http://gerrit.cloudera.org:8080/5483
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The stream defaults to pages of default_page_len_. If a row doesn't
fit in that page, it will allocate another page up to max_page_len_
bytes and append a single row to that page, then immediately unpin
the page. This means that when writing a stream, the large
page only needs to be kept in memory temporarily, which helps with
memory requirements. E.g. consider a hash join that is repartitioning
1 unpinned stream into 16 unpinned streams. We will need
default_page_len_ * 15 + max_page_len_ * 2 bytes of reservation because
when processing a large row we only need one large write buffer at a
time.
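A worked version of that arithmetic (the concrete 64KB default and
512KB maximum page sizes are illustrative, borrowed from the buffer
sizes discussed elsewhere in this log):

#include <cstdint>

// Repartitioning 1 unpinned stream into 16 unpinned streams: only one
// large write buffer and one large read buffer are needed at any
// moment, so the other 15 write buffers stay default-sized.
int64_t RepartitionReservation(int64_t default_page_len, int64_t max_page_len) {
  return default_page_len * 15 + max_page_len * 2;
}
// e.g. 64KB * 15 + 512KB * 2 = 1984KB, versus 512KB * 17 = 8704KB if
// every buffer had to be max-sized.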
Also switches the stream to lazily allocating write pages, so that
we don't need to allocate a page until we know the size of the row
to go in it. This required a mechanism to "save" reservation in
PrepareForRead()/PrepareForWrite(). A SubReservation API is added
to BufferPool for this purpose and the stream now saves read and
write reservation for lazy page allocation. It also saves reservation
instead of double-pinning pages in the read/write case.
The large row cases are not as optimised for memory consumption or
performance - queries processing very large numbers of large rows
are an extreme edge case that is likely to hit other performance
bottlenecks first. Pages with large rows can have up to 50%
internal fragmentation.
To avoid duplicating more logic between AddRow() and AllocateRow()
I restructured things so that AddRowSlow() is implemented in terms
of AllocateRowSlow(). AllocateRow() now takes a function as an
argument to populate the row.
Testing:
* Added tests for the case where 0 rows are added to the stream
* Extend BigRow to exercise the new code.
* Also test large strings and read/write streams.
Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Reviewed-on: http://gerrit.cloudera.org:8080/6638
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds Impala support for TIMESTAMP types stored in Kudu.
Impala stores TIMESTAMP values in 96 bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.
When writing to Kudu, TIMESTAMP values in nanoseconds are
rounded to the nearest microsecond.
When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
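A sketch of the write-path conversion (the helper name is an
assumption, and the exact rounding of ties and negative values may
differ from Impala's):

#include <cstdint>

// Impala TIMESTAMP: nanosecond precision. Kudu UNIXTIME_MICROS: 64-bit
// microseconds since the Unix epoch. Round to the nearest microsecond.
int64_t NanosToUnixtimeMicros(int64_t nanos_since_epoch) {
  return nanos_since_epoch >= 0 ? (nanos_since_epoch + 500) / 1000
                                : (nanos_since_epoch - 500) / 1000;
}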
Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.
TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions
Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.
After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.
New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE <file> will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.
HDFS tables created by Impala using CREATE TABLE LIKE <other table>
will copy the property of the table that is copied.
This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if the
parquet.mr.int96.write.zone table property is not set, and ignored
otherwise.
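The resulting precedence, sketched with assumed names (not the actual
Impala code):

#include <string>

enum class TsAdjustment { kNone, kFromTableProperty, kLocalToUtc };

// Data written by Impala is never adjusted; otherwise the
// parquet.mr.int96.write.zone table property wins, and the legacy
// --convert_legacy_hive_parquet_utc_timestamps flag applies only when
// the property is unset.
TsAdjustment ChooseAdjustment(bool written_by_impala,
                              const std::string& table_property_zone,
                              bool convert_legacy_flag) {
  if (written_by_impala) return TsAdjustment::kNone;
  if (!table_property_zone.empty()) return TsAdjustment::kFromTableProperty;
  if (convert_legacy_flag) return TsAdjustment::kLocalToUtc;
  return TsAdjustment::kNone;
}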
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Support allocating with mmap instead of TCMalloc to give more control
over memory usage. Also tell Linux to back larger buffers with huge
pages when possible to reduce TLB pressure. The main complication is
that memory returned by mmap() is not necessarily aligned to a huge
page boundary, so we need to "fix up" the mapping ourselves.
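A sketch of that fix-up, simplified from the idea described above
(Impala's actual code differs): over-allocate by one huge page, then
trim the unaligned head and tail.

#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

constexpr size_t kHugePageSize = 2 * 1024 * 1024;  // x86-64 huge page

// 'len' is assumed to be a multiple of the huge page size.
void* MmapHugeAligned(size_t len) {
  size_t padded_len = len + kHugePageSize;
  void* raw = mmap(nullptr, padded_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED) return nullptr;
  uintptr_t addr = reinterpret_cast<uintptr_t>(raw);
  uintptr_t aligned = (addr + kHugePageSize - 1) & ~(kHugePageSize - 1);
  size_t head = aligned - addr;
  if (head > 0) munmap(raw, head);  // trim unaligned prefix
  uint8_t* start = reinterpret_cast<uint8_t*>(aligned);
  size_t tail = padded_len - head - len;
  if (tail > 0) munmap(start + len, tail);  // trim the leftover suffix
  madvise(start, len, MADV_HUGEPAGE);  // ask Linux for transparent huge pages
  return start;
}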
Adds additional memory metrics, since we previously relied on the
assumption that all memory was allocated through TCMalloc.
memory.total-used tracks the total across the buffer pool and
TCMalloc. When the buffer pool is not present, they just report
the TCMalloc values.
This can be enabled with the --mmap_buffers flag. The transparent
huge pages support can be disabled with the --madvise_huge_pages
startup flag.
At some point this should become the default, but it requires
more work to validate perf and resource usage (virtual address
space, etc).
Testing:
Added some unit tests to test edge cases and the different supported
flags. Many pre-existing tests also exercise the modified code.
Change-Id: Ifbc748f74adcbbdcfa45f3ec7df98284925acbd6
Reviewed-on: http://gerrit.cloudera.org:8080/6474
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds tests for read errors from permissions (i.e. open() fails),
corrupt data (integrity check fails) and truncated files (read() fails).
Fixes a couple of bugs:
* Truncated reads were not detected in TmpFileMgr
* IoMgr buffers weren't returned on error paths (this isn't a true leak
but results in DCHECKs being hit).
Change-Id: I3f2b93588dd47f70a4863ecad3b5556c3634ccb4
Reviewed-on: http://gerrit.cloudera.org:8080/6562
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The test should allow Unpin() to fail with a scratch allocation error to
handle the case where the first write fails and blacklists the scratch
disk around the same time that the second write starts. Usually either
the second write succeeds because it started before the first write
failed or it fails with CANCELLED because the
BufferedBlockMgr::is_cancelled_ flag is set. There is a small
window for a race after the disk is blacklisted in TmpFileMgr but
before BufferedBlockMgr::WriteComplete() is called.
Testing:
I was able to reproduce the problem locally by adding some delays
to the test. I added a variant of the WriteError test that more reliably
reproduces the bug. Ran both WriteError tests in a loop locally to try
to flush out flakiness.
Change-Id: I9878d7000b03a64ee06c2088a8c30e318fe1d2a3
Reviewed-on: http://gerrit.cloudera.org:8080/5940
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Ho <kwho@cloudera.com>
- Removes the runtime unknown disk ID reporting and instead moves
it to the explain plan as a counter that prints the number of
scan ranges missing disk IDs in the corresponding HDFS scan nodes.
- Adds a warning to the header of query profile/explain plan with a
list of tables missing disk ids.
- Removes reference to enabling dfs block metadata configuration,
since it doesn't apply anymore.
- Removes VolumeId terminology from the runtime profile.
Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Reviewed-on: http://gerrit.cloudera.org:8080/5828
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes the
problem by validating that the date falls into the 1400..9999 year
range as we are scanning Parquet.
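A sketch of the range check (the Julian-day constants below are my own
approximations, not necessarily the values Impala uses):

#include <cstdint>

// The Parquet INT96 timestamp carries a 4-byte Julian day. Validate that
// it maps into the supported 1400..9999 year range before constructing a
// boost date object.
constexpr int32_t kMinSupportedJulianDay = 2232400;  // ~ 1400-01-01
constexpr int32_t kMaxSupportedJulianDay = 5373484;  // ~ 9999-12-31

bool IsValidJulianDay(int32_t julian_day) {
  return julian_day >= kMinSupportedJulianDay &&
         julian_day <= kMaxSupportedJulianDay;
}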
Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
Reviewed-on: http://gerrit.cloudera.org:8080/5343
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist
This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
violations.
4) Rows with PKs that are in an 'uncovered range'.
It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.
This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.
An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).
Testing: Updated kudu_crud tests to check the number of rows
with errors.
Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Reviewed-on: http://gerrit.cloudera.org:8080/4985
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch prevents an invalid decimal type in an Avro file schema from
crashing Impala. Most invalid Avro schemas are caught by the frontend,
but file schemas still need to be validated by the backend.
After this patch, files with bad schemas are skipped.
Testing:
This was hit very rarely by the scanner fuzzing. Added a regression test that
scans a file with a bad schema.
Change-Id: I25a326ee2220bc14d3b5f887dc288b4adf859cfc
Reviewed-on: http://gerrit.cloudera.org:8080/4876
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu by up to 4.2x in
bulk data loading tests (loading 200 million rows from
lineitem).
2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, inserts/updates/deletes could return an error status
with thousands of errors reported. This changes the error
handling to log all reported errors as warnings and
return only the first error in the query error status.
3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.
Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.
Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Reviewed-on: http://gerrit.cloudera.org:8080/4728
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The scheduler crashed with a segmentation fault when there were no
backends registered: After not being able to find a local backend (none
are configured at all) in ComputeScanRangeAssignment(), the previous
code would eventually try to return the top of
assignment_ctx.assignment_heap in SelectRemoteBackendHost(), but that
heap would be empty. Subsequently, when using the IP address of that
heap node, a segmentation fault would occur.
This change adds a check and aborts scheduling with an error. It also
contains a test.
Change-Id: I6d93158f34841ea66dc3682290266262c87ea7ff
Reviewed-on: http://gerrit.cloudera.org:8080/4776
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Adds code comments and issues a warning for Parquet files
with num_rows=0 but at least one non-empty row group.
Change-Id: I72ccf00191afddb8583ac961f1eaf11e5eb28791
Reviewed-on: http://gerrit.cloudera.org:8080/4696
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*
A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:
find . | grep "\.java\|\.py\|\.h\|\.cc" | \
xargs sed -i s/'com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g
along with some manual fixes.
After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/
Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements basic in-memory buffer management, with
reservations managed by ReservationTrackers.
Locks are fine-grained so that the buffer pool can scale to many
concurrent queries.
Includes basic tests for buffer pool setup, allocation and reservations.
Change-Id: I4bda61c31cc02d26bc83c3d458c835b0984b86a0
Reviewed-on: http://gerrit.cloudera.org:8080/4070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Currently we can only disable spilling via a startup option, which means
we need to restart the cluster for this.
This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of
scratch directory space that can be used. This would be useful to prevent
runaway queries or to prevent queries from spilling when that is not desired.
This also adds a 'ScratchSpace' counter to the runtime profile of the
BlockMgr that keeps track of the scratch space allocated.
Valid values for the SCRATCH_LIMIT query option are:
- unspecified or a limit of -1 means no limit
- a limit of 0 (zero) means spilling is disabled
- an int (= number of bytes)
- a float followed by "M" (MB) or "G" (GB)
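A minimal parsing sketch for those values (an assumed helper, not the
actual option parser; error handling omitted):

#include <cstdint>
#include <string>

// Returns the limit in bytes; 0 disables spilling, -1 means no limit.
int64_t ParseScratchLimit(const std::string& value) {
  if (value.empty()) return -1;  // unspecified: no limit
  char suffix = value.back();
  if (suffix == 'M' || suffix == 'G') {
    double num = std::stod(value.substr(0, value.size() - 1));
    int64_t scale = suffix == 'M' ? (1LL << 20) : (1LL << 30);
    return static_cast<int64_t>(num * scale);
  }
  return std::stoll(value);  // plain byte count, 0, or -1
}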
Testing:
A new test file "test_scratch_limit.py" was added to test this functionality.
Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470
Reviewed-on: http://gerrit.cloudera.org:8080/4497
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch adds a configurable timeout for all backend client
RPCs to avoid query hang issues.
Prior to this change, Impala didn't set a socket send/recv timeout for
the backend client, so an RPC would wait forever for data. In extreme
cases of a bad network or a kernel panic on the destination host, the
sender would get no response and the RPC would hang. Query hangs are
hard to detect. If the hang happens in ExecRemoteFragment() or
CancelPlanFragments(), the query cannot be cancelled unless you
restart the coordinator.
Added send/recv timeouts to all RPCs to avoid query hangs. For the
catalog client, the default timeout is kept at 0 (no timeout) because
ExecDdl() can take a very long time if a table has many partitions,
mainly waiting for HMS API calls.
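With Thrift's C++ TSocket, setting those timeouts is a one-liner per
direction. A minimal sketch (the function name and flag plumbing are
assumptions; the TSocket setters are Thrift's API):

#include <thrift/transport/TSocket.h>

// 0 means no timeout; that default is kept for the catalog client
// because ExecDdl() can legitimately take a very long time.
void SetBackendClientTimeouts(apache::thrift::transport::TSocket* socket,
                              int send_timeout_ms, int recv_timeout_ms) {
  socket->setSendTimeout(send_timeout_ms);
  socket->setRecvTimeout(recv_timeout_ms);
}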
Added a wrapper, RetryRpcRecv(), to wait longer for the receiver's
response. This is needed by certain RPCs, e.g. TransmitData() from
DataStreamSender, where the receiver can hold its response to apply
back pressure.
If an RPC fails, the connection is left in an unrecoverable state. We
close the underlying connection instead of putting it back in the
cache, to make sure a broken connection won't cause more RPC failures.
Added a retry for the CancelPlanFragment RPC. This reduces the chance
that a cancel request gets lost due to an unstable network, but it can
make cancellation take longer and makes test_lifecycle.py more flaky.
The metric num-fragments-in-flight might not be 0 yet due to previous tests.
Modified the test to check the metric delta instead of comparing to 0
to reduce flakiness. However, this might not capture some failures.
Besides the new EE test, I used the following iptables rule to
inject network failure to verify RPCs never hang.
1. Block network traffic on a port completely
iptables -A INPUT -p tcp -m tcp --dport 22002 -j DROP
2. Randomly drop 5% of TCP packets to slowdown network
iptables -A INPUT -p tcp -m tcp --dport 22000 -m statistic --mode random --probability 0.05 -j DROP
Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Reviewed-on: http://gerrit.cloudera.org:8080/3343
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
A new test has been added to test the decompression of a 2GB
snappy block compressed text file.
Change-Id: Ic1af1564953ac02aca2728646973199381c86e5f
Reviewed-on: http://gerrit.cloudera.org:8080/3575
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e.
Unbreak the packaging builds for now.
Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807
Reviewed-on: http://gerrit.cloudera.org:8080/3543
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This change extends MemPool, FreePool and StringBuffer to support
64-bit allocations, fixes a bug in decompressor and extends various
places in the code to support 64-bit allocation sizes. With this
change, the text scanner can now decompress compressed files larger
than 1GB.
Note that the UDF interfaces FunctionContext::Allocate() and
FunctionContext::Reallocate() still use 32-bit for the input
argument to avoid breaking compatibility.
In addition, the byte size of a tuple is still assumed to be
within 32-bit. If it needs to be upgraded to 64-bit, it will be
done in a separate change.
Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20
Reviewed-on: http://gerrit.cloudera.org:8080/2781
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Added checks/error handling:
* Negative string lengths while decoding dictionary or data page.
* Buffer overruns while decoding dictionary or data page.
* Some metadata FILECHECKs were converted to statuses.
Testing:
Unit tests for:
* decoding of strings with negative lengths
* truncation of all parquet types
* dictionary creation correctly handling error returns from Decode().
End-to-end tests for handling of negative string lengths in
dictionary- and plain-encoded data in corrupt files, and for
handling of buffer overruns for string data. The corrupted
parquet files were generated by hacking Impala's parquet
writer to write invalid lengths, and by hacking it to
write plain-encoded data instead of dictionary-encoded
data by default.
Performance:
set num_nodes=1;
set num_scanner_threads=1;
select * from biglineitem where l_orderkey = -1;
I inspected MaterializeTupleTime. Before the average was 8.24s and after
was 8.36s (a 1.4% slowdown, within the standard deviation of 1.8%).
Change-Id: Id565a2ccb7b82f9f92cc3b07f05642a3a835bece
Reviewed-on: http://gerrit.cloudera.org:8080/3387
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Avro string lengths are encoded as 64-bit integers. Impala can only
handle up to 32-bit integers, so we need to be careful about handling
out-of-range integers. Negative integers were already handled by a
previous patch, but if a positive 64-bit integer is truncated to a
32-bit integer, the result can be a negative length.
This patch fixes CHAR/VARCHAR behaviour, where we can just truncate
the string, and STRING behaviour, where we can't truncate the string
and so must return an error.
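A sketch of the length handling (names are assumptions; std::optional
stands in for an error status):

#include <cstdint>
#include <limits>
#include <optional>

// Avro lengths are 64-bit; Impala string lengths are 32-bit. CHAR and
// VARCHAR can be truncated to their declared length; STRING cannot, so
// an out-of-range length is an error (nullopt here).
std::optional<int32_t> CheckAvroStringLength(int64_t len, bool truncatable,
                                             int32_t declared_len) {
  if (len < 0) return std::nullopt;  // corrupt data
  if (len > std::numeric_limits<int32_t>::max()) {
    if (!truncatable) return std::nullopt;  // STRING: must return an error
    return declared_len;                    // CHAR/VARCHAR: truncate
  }
  return static_cast<int32_t>(len);
}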
Testing:
Added unit tests for STRING, CHAR, and VARCHAR that exercise the string
overflow handling.
Change-Id: If6541e7c68255bf599b26386a55057c93e62af51
Reviewed-on: http://gerrit.cloudera.org:8080/3383
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
If a thrift client can't create a socket, all subsequent calls to Open()
should fail fast since socket creation errors are treated as
unrecoverable.
Testing: manual testing with a bad SSL configuration. Impalad startup
fails fast, rather than retrying 10 times as previously.
Change-Id: I394be287143eefc79cf22865898b71ca24c41328
Reviewed-on: http://gerrit.cloudera.org:8080/3317
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
This patch adds error checking to the Avro scanner (both the codegen'd
and interpreted paths), including out-of-bounds checks and data
validity checks.
I ran a local benchmark using the following queries:
set num_scanner_threads=1;
select count(i) from default.avro_bigints_big; # file contains only longs
select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema
Both benchmark queries see negligible or no performance impact.
This patch adds a new Avro scanner unit test and an end-to-end test
that queries several corrupted files, as well as updates the zig-zag
varlen int unit test.
Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Reviewed-on: http://gerrit.cloudera.org:8080/3072
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
There are multiple places in the code which call
RuntimeState::SetMemLimitExceeded(). Most of them are
unnecessary as the error status constructed will eventually
be propagated up the tree of exec nodes. There is no obvious
reason to treat query memory limit exceeded differently.
In some cases such as scan-node, calling SetMemLimitExceeded()
is actually confusing as all scanner threads may pick up error
status when any thread exceeds query memory limit, causing a
lot of noise in the log.
This change replaces most calls to RuntimeState::SetMemLimitExceeded()
with MemTracker::MemLimitExceeded(). The remaining places are:
the old hash table code, the UDF framework and QueryMaintenance()
which checks for memory limit periodically. The query maintenance
case will be removed eventually once IMPALA-2399 is fixed.
Change-Id: Ic0ca128c768d1e73713866e8c513a1b75e6b4b59
Reviewed-on: http://gerrit.cloudera.org:8080/3140
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
The stubs in Impala broke during the merge commit. This commit removes
the stubs in hopes of improving robustness of the build. The original
problem (Kudu clients are only available for some OSs) is now addressed
by moving the stubbing into a dummy Kudu client. The dummy client only
allows linking to succeed, if any client method is called, Impala will
crash. Before calling any such method, Kudu availability must be
checked.
Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712
Reviewed-on: http://gerrit.cloudera.org:8080/2585
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This change replaces all calls to MemPool::Allocate() with
MemPool::TryAllocate() in the parquet scanner and the decompressor.
Also streamlines CheckQueryState() to avoid unnecessary spinlock
acquisition in the common case where there is no error, and
removes some dead code in the text converter.
MemPool::Allocate() is also updated to return a valid pointer
instead of NULL when the allocation size is zero. NULL is only
returned during allocation failure.
This change also updates CollectionValueBuilder::GetFreeMemory()
to return Status in case it exceeds memory limit. As part of the
change, the max allocation limit (2 GB) is also removed from it
as 64-bit allocations are supported in MemPool with this change.
Change-Id: Ic70400407b7662999332448f4d1bce2cc344ca89
Reviewed-on: http://gerrit.cloudera.org:8080/2203
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This is for review purposes only. This patch will be merged with David's
big merge patch.
Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
set KUDU_CLIENT_DIR to the path Kudu was installed in.
Example:
git clone https://github.com/cloudera/kudu.git
...build 3rd party etc...
mkdir -p $KUDU_BUILD_DIR
cd $KUDU_BUILD_DIR
cmake <path to Kudu source dir>
make
DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
actually a parcel that has been renamed (the contents were not
modified in any way), that mean the Kudu service binaries are there.
Those binaries are now used to run the Kudu service.
Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183
This patch is not a pure merge patch in the sense that goes beyond conflict
resolution to also address reviews to the 'feature/kudu' branch as a whole.
The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/
Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
This patch adds an output parameter 'already_unregistered' to
FindRecvrOrWait() to signal to the caller in which of two cases it may
have returned NULL. If 'already_unregistered' is true, the receiver has
already been setup and closed (possibly by cancellation, possibly by
the fragment deliberately closing its inputs in the case of a
limit). This is not an error - cancellation will be signalled to the
sender from the coordinator, and deliberate closure means the
coordinator will tear down the query shortly.
If 'already_unregistered' is set to false by FindRecvrOrWait(), the
DataStreamMgr has never seen the intended receiver. This means the
sender has waited for a full timeout period without the upstream
receiver being established; this signals a likely query setup
problem (as long as datastream_sender_timeout_ms is set sufficiently
large) and so we return an error.
We need to tweak the two timeout parameters here:
* datastream_sender_timeout_ms needs to be large enough to avoid false
negatives for problems during query setup (otherwise queries that would
have succeeded, if slowly, will be unexpectedly cancelled).
* STREAM_EXPIRATION_TIME_MS needs to be set high enough that a query
will not continue executing for longer than STREAM_EXPIRATION_TIME_MS
after it closes its input (otherwise the sender will get
already_unregistered=false, and cancel). This case will only trigger
when a sender tries to call TransmitData() after the receiver has been
closed for STREAM_EXPIRATION_TIME_MS; this should not happen in
non-error cases as receivers are not closed before consuming their
entire input.
In this patch the former has been set to 2 minutes, and the latter to 5
minutes.
Change-Id: Ib1734992c7199b9dd4b03afca5372022051b6fbd
Reviewed-on: http://gerrit.cloudera.org:8080/2305
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
Fix a bug in which Impala only reads the first stream
of a multi-stream bz2/gzip file.
Changes the bz2 decoder to read the file in a streaming
fashion rather than reading the entire file into memory
before it can be decompressed.
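For illustration, here is the equivalent multi-stream loop written
against zlib's gzip support (this is not Impala's decompressor; the
point is that inflateReset() on Z_STREAM_END keeps decoding past the
first stream instead of stopping there):

#include <zlib.h>
#include <cstddef>
#include <vector>

std::vector<unsigned char> GunzipAllStreams(const unsigned char* in,
                                            size_t in_len) {
  std::vector<unsigned char> out;
  unsigned char buf[64 * 1024];
  z_stream strm = {};
  if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK) return out;  // 16+: gzip
  strm.next_in = const_cast<unsigned char*>(in);
  strm.avail_in = static_cast<uInt>(in_len);
  while (strm.avail_in > 0) {
    strm.next_out = buf;
    strm.avail_out = sizeof(buf);
    int ret = inflate(&strm, Z_NO_FLUSH);
    out.insert(out.end(), buf, buf + (sizeof(buf) - strm.avail_out));
    if (ret == Z_STREAM_END) {
      inflateReset(&strm);  // another concatenated stream may follow
    } else if (ret != Z_OK) {
      break;  // corrupt input; real code would surface an error status
    }
  }
  inflateEnd(&strm);
  return out;
}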
Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8
(cherry picked from commit b6d0b4e059329633dc50f1f73ebe35b7ac317a8e)
Reviewed-on: http://gerrit.cloudera.org:8080/2219
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This change includes the potential space needed for null indicators
when determining whether a row can fit in a new I/O write block or
small buffers in the buffered tuple stream. For rows with sizes close to
the I/O block size, there may not be enough space to hold the entire
row in a block after reserving the header space for null indicators.
This change also updates the buffered tuple stream BE test to test
for this corner case.
Change-Id: I256974281d555f9a015c17ea23a1b4d5e9055c97
Reviewed-on: http://gerrit.cloudera.org:8080/1973
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
Fixes a bug in generate_error_codes where the enum value specified
in generate_error_codes.py was not actually used.
Change-Id: If7e3269d12a839106c595d44da09c573a8a177f2
Reviewed-on: http://gerrit.cloudera.org:8080/1894
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
One of the error codes was removed, leaving a gap in the error code
numbers. This change just closes the gap.
Change-Id: I2e424e55439459d4c7a84dd393f55d72400dabf0
Reviewed-on: http://gerrit.cloudera.org:8080/1891
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch removes the workaround that disallows SSL and Kerberos
being enabled at the same time. It was previously disallowed because
SSL and Kerberos wouldn't work together for server-server
(or daemon-daemon) communication, causing a hang.
The issue has been addressed in the following patches:
http://gerrit.cloudera.org:8080/#/c/1594/
http://gerrit.cloudera.org:8080/#/c/1599/
This patch should be merged only after the above 2 are merged.
Change-Id: I63d492d1733204edd1249aff2cb3b168ec82ea92
Reviewed-on: http://gerrit.cloudera.org:8080/1772
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
FunctionContext::Allocate(), FunctionContextImpl::AllocateLocal()
and FunctionContext::Reallocate() allocate memory without taking
memory limits into account. The problem is that these functions
invoke FreePool::Allocate() which may call MemPool::Allocate()
that doesn't check against the memory limits. This patch fixes
the problem by making these FunctionContext functions check for
memory limits and set an error in the FunctionContext object if
memory limits are exceeded.
An alternative would be for these functions to call
MemPool::TryAllocate() instead and return NULL if memory limits
are exceeded. However, this may break some existing external
UDAs which don't check for allocation failures, leading to
unexpected crashes of Impala. Therefore, we stick with this
ad hoc approach until the UDF/UDA interfaces are updated in
the future releases.
Callers of these FunctionContext functions are also updated to
handle potential failed allocations instead of operating on
NULL pointers. The query status will be polled at various
locations, terminating the query.
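A self-contained sketch of that approach (all names are assumptions,
not Impala's UDF API):

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Stand-in for the FunctionContext state relevant to this change.
struct UdfContext {
  int64_t consumed_bytes = 0;
  int64_t mem_limit = 0;  // 0 means unlimited
  std::string error_msg;
};

// Check the limit before allocating and record an error on the context
// instead of changing Allocate()'s signature, which would break
// existing UDAs that don't check for allocation failures.
uint8_t* Allocate(UdfContext* ctx, int64_t byte_size) {
  if (ctx->mem_limit > 0 &&
      ctx->consumed_bytes + byte_size > ctx->mem_limit) {
    ctx->error_msg = "Memory limit exceeded";
    return nullptr;  // the polled query status will terminate the query
  }
  ctx->consumed_bytes += byte_size;
  return static_cast<uint8_t*>(std::malloc(static_cast<std::size_t>(byte_size)));
}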
This patch also fixes MemPool to handle the case in which malloc
may return NULL. It propagates the failure to the callers instead
of continuing to run with NULL pointers. In addition, errors during
aggregate functions' initialization are now properly propagated.
Change-Id: Icefda795cd685e5d0d8a518cbadd37f02ea5e733
Reviewed-on: http://gerrit.cloudera.org:8080/1445
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This patch does the following:
* Prevents Impala from starting if 'internal' Kerberos and SSL are
enabled at the same time.
* Changes the required configuration to enable 'internal' SSL to include
--ssl_client_ca_certificate. This allows 'external' SSL to be
configured without enabling 'internal' SSL.
Tests are included for the first item. For the second, the appropriate
test is to try to connect with an internal SSL client to a non-SSL
server. However, this causes the connection to hang, which is not an
easy condition to detect in a test case.
Change-Id: I7fa545045fed57e161fb37898d5782937c710a0c
Reviewed-on: http://gerrit.cloudera.org:8080/1318
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/1323
This makes cases like IMPALA-2402 much easier to diagnose, since this
message will mostly occur if Impala tries to read a file that's not an
Avro data file.
Change-Id: I6504e668905ecc6964b77a6fe0cfc9c7511fd5c0
Reviewed-on: http://gerrit.cloudera.org:8080/1202
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Stream::ReadBytes() could fail for reasons other than
'stale metadata'. Add an error code check to make sure
Impala returns the proper error message.
It also fixes IMPALA-2488: metadata.test_stale_metadata
fails on non-HDFS filesystems.
Change-Id: I9a25df3fb49f721bf68d1b07f42a96ce170abbaa
Reviewed-on: http://gerrit.cloudera.org:8080/1166
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins