test_low_mem_limit_orderby_all is flaky when test_mem_limit equals 100
or 120 in the test vector. The minimum mem_limit to run this test is
120MB + 30MB = 150MB. Thus, these test vectors expect one of the
MEM_LIMIT_ERROR_MSGS to be thrown because mem_limit (test_mem_limit)
is not sufficient.
A Parquet scan under this low mem_limit sometimes throws a "Couldn't
skip rows in column" error instead. This possibly indicates memory
exhaustion while reading the Parquet page index or during late
materialization (see IMPALA-5843, IMPALA-9873, IMPALA-11134). This
patch attempts to deflake the test by adding "Couldn't skip rows in
column" to MEM_LIMIT_ERROR_MSGS.
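A minimal sketch of the resulting list (MEM_LIMIT_ERROR_MSGS is the name
used by the test; the pre-existing entries shown here are placeholders):
  # Error substrings accepted as memory-related failures under a low mem_limit.
  MEM_LIMIT_ERROR_MSGS = [
      "Memory limit exceeded",          # placeholder for the existing entries
      "Couldn't skip rows in column",   # newly accepted Parquet error
  ]
  def is_expected_mem_error(error_text):
      return any(msg in error_text for msg in MEM_LIMIT_ERROR_MSGS)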
Change-Id: I43a953bc19b40256e3a8fe473b1498bbe477c54d
Reviewed-on: http://gerrit.cloudera.org:8080/22932
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the final patch to move all Impala e2e and custom cluster tests
to use the HS2 protocol by default. Only the beeswax-specific tests
still run against the beeswax protocol by default. They can be removed
once Impala officially drops beeswax support.
HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match the formatting in impala-beeswax-server.cc.
Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.
Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
tests.
Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking those tests is out of scope for this patch.
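A minimal sketch of the new default (illustrative; the real method lives
in impala_test_suite.py, and suites with a genuinely different workload
still override it):
  class ImpalaTestSuite(object):
      @classmethod
      def get_workload(cls):
          # Default workload for all suites that previously returned this value.
          return 'functional-query'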
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ImpalaTestSuite.change_database is responsible for pointing the Impala
client to the database under test. However, it left the client pointing
to that database after the test without reverting it back to the
default database. This patch performs the reversal by turning
ImpalaTestSuite.change_database into a context manager.
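A minimal sketch of the context-manager shape (illustrative; the real
helper also deals with the different client protocols):
  from contextlib import contextmanager
  @contextmanager
  def change_database(client, db_name):
      client.execute("use `%s`" % db_name)
      try:
          yield client
      finally:
          # Revert to the default database once the test body finishes.
          client.execute("use default")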
This patch changes the behavior of execute_query_using_client() and
execute_query_async_using_client(). They used to change the database
according to the given vector parameter, but no longer do after this
patch. In practice, this behavior change does not affect many tests
because most queries going through these functions already use fully
qualified table names. Going forward, queries issued through functions
other than run_test_case() should use fully qualified table names as
much as possible.
The behavior of ImpalaTestSuite._get_table_location() is retained since
a considerable number of tests rely on it (changing the database when
called).
Removed unused test fixtures and fixed several flake8 issues in modified
test files.
Testing:
- Moved nested-types-subplan-single-node.test. This allows the test
framework to point to the right tpch_nested* database.
- Pass exhaustive tests except IMPALA-13752 and IMPALA-13761. They will
be fixed in a separate patch.
Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2
Reviewed-on: http://gerrit.cloudera.org:8080/22487
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13323 added a WARNING log if, for an independently declared
query option 'key',
vector.get_value('exec_option')['key'] != vector.get_value('key').
This patch eliminates such WARNING logs by fixing the exec option
declarations in test_mem_usage_scaling.py and test_update_stress.py,
the only tests producing the WARNING log. Here is a summary of the
patch:
- Declare 'mem_limit' using the add_exec_option_dimension helper
function in TestQueryMemLimitScaling so that the 'mem_limit' dimension
is not silently ignored (see the sketch below).
- Declare 'batch_size' using the create_exec_option_dimension helper
function in TestIcebergV2UpdateStress to override the default
'exec_option' dimension (containing batch_size=0) that is initialized
by ImpalaTestSuite.add_test_dimensions().
- Rename the 'mem_limit' dimension to 'test_mem_limit' for subclasses
of TestLowMemoryLimits. The final 'mem_limit' option is still
calculated from the 'test_mem_limit' dimension.
- Change the LOG.warn() into pytest.fail() to prevent new tests from
repeating the same issue.
- Address a few flake8 warnings and errors.
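A minimal sketch of the first item above (the class and helper names are
from the patch; the import path, the exact helper signature and the
mem_limit values shown are assumptions):
  from tests.common.impala_test_suite import ImpalaTestSuite
  from tests.common.test_dimensions import add_exec_option_dimension
  class TestQueryMemLimitScaling(ImpalaTestSuite):
      @classmethod
      def add_test_dimensions(cls):
          super(TestQueryMemLimitScaling, cls).add_test_dimensions()
          # Declare 'mem_limit' as an exec-option dimension so the framework
          # applies it to each query instead of silently ignoring it.
          add_exec_option_dimension(cls, 'mem_limit', ['-1', '512m', '256m'])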
Testing:
- Pass exhaustive tests for test_mem_usage_scaling.py and
test_update_stress.py.
Change-Id: Ic34187782c51c6d6fc0a688c9c5f72bf0cb2d45c
Reviewed-on: http://gerrit.cloudera.org:8080/21733
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prior deflake attempt in IMPALA-12499 does not seem sufficient.
There are still sporadic failures in
test_hdfs_scanner_thread_non_reserved_bytes. This patch further
attempts to deflake it by:
- Injecting a 100ms sleep every time a scanner thread obtains a new
scan range.
- Running it serially.
- Skipping it in dockerized environments.
This patch also fixes small comment mistakes in hdfs-scan-node.cc.
Testing:
- Loop and pass the test 100 times in local minicluster environment.
Change-Id: I5715cf16c87ff0de51afd2fa778c5b591409d376
Reviewed-on: http://gerrit.cloudera.org:8080/20640
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11068 added three new tests to
hdfs-scanner-thread-mem-scaling.test. The first one fails
intermittently, most likely because the fragment right above the scan
does not pull row batches fast enough. This patch attempts to deflake
the tests by replacing it with a simple count star query. The three
test cases are now contained in their own
test_hdfs_scanner_thread_non_reserved_bytes and will be skipped for
sanitized builds.
Testing:
- Loop and pass test_hdfs_scanner_thread_non_reserved_bytes a hundred
times.
Change-Id: I7c99b2ef70b71e148cedb19037e2d99702966d6e
Reviewed-on: http://gerrit.cloudera.org:8080/20593
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects these operators to produce lists
immediately will fail. e.g.
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
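For example, a typical conversion looks like this (a minimal sketch that
assumes the 'future' package is installed for Python 2):
  from builtins import range  # no-op on Python 3; provided by 'future' on Python 2
  # Previously: for i in xrange(5): ...
  for i in range(5):
      print(i)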
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
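A minimal illustration of the resulting header and a converted division
site (not taken from a specific file):
  from __future__ import absolute_import, division, print_function
  records = [10, 20, 30, 40, 50]
  # True division would give 2.5; '//' keeps an integer for use as an index.
  mid = len(records) // 2
  print(records[mid])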
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Enables tests guarded by SkipIfNotHdfsMinicluster to run on Ozone as
well as HDFS. Plans are still skipped for Ozone because there's
Ozone-specific text in the plan output.
Updates explain output to allow for Ozone, which has a block size of
256MB instead of 128MB. One of the partitions read in test_explain is
~180MB, straddling the difference between Ozone and HDFS.
Testing: ran affected tests with Ozone.
Change-Id: I6b06ceacf951dbc966aa409cf24a310c9676fe7f
Reviewed-on: http://gerrit.cloudera.org:8080/19250
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
PlanNode does not consider some factors when estimating memory, which
can cause a large error rate.
AggregationNode
1. MemoryEstimate = Ndv * (AvgRowSize + SizeOfBucket)
2. When estimating the Ndv of a merge aggregation, Ndv should be
divided only once.
3. If there are no grouping exprs, MemoryEstimate = MIN_PLAIN_AGG_MEM
SortNode
1. MemoryEstimate = Cardinality * AvgRowSize. This is the memory used
when there is enough memory.
HashJoinNode
1. MemoryEstimate = DataRows + Buckets + DuplicateNodes, where
DataRows = RightTableCardinality * AvgRowSize,
Buckets = roundUpToPowerOf2(RightTableCardinality) * SizeOfBucket,
DuplicateNodes = (RightTableCardinality - RightNdv) *
SizeOfDuplicateNode
KuduScanNode
1. MemoryEstimate = Columns * BytesPerColumn * MaxScannerThreads, where
Columns counts only the columns scanned by the query, not all the
columns of the table.
UnitTest
1. CardinalityTest adds test cases to test memory estimation.
Existing test cases related to memory estimation are updated.
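A minimal sketch of the HashJoinNode formula above (names are
illustrative and the byte constants are assumptions, not the planner's
actual values):
  def hash_join_mem_estimate(right_cardinality, right_ndv, avg_row_size,
                             size_of_bucket=16, size_of_duplicate_node=16):
      def round_up_to_power_of_2(n):
          p = 1
          while p < n:
              p *= 2
          return p
      data_rows = right_cardinality * avg_row_size
      buckets = round_up_to_power_of_2(right_cardinality) * size_of_bucket
      duplicate_nodes = (right_cardinality - right_ndv) * size_of_duplicate_node
      return data_rows + buckets + duplicate_nodes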
Change-Id: Ic01db168ff2c6d6de33ee553a8175599f035d7a1
Reviewed-on: http://gerrit.cloudera.org:8080/16842
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enables the
result spooling (SPOOL_QUERY_RESULTS) query option by default.
Furthermore, some tests need to be adjusted to account for result
spooling being on by default. The following are the adjustment
categories and the tests that fall under each category.
Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits
Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling
Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row
Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size
Testing:
- Pass exhaustive tests.
Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.
The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
the default webserver port. It failed previously because it
relied on start-impala-cluster.py setting -webserver_port
for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
it.
* Fix some tests that had 'localhost' hardcoded - instead it should
be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
the same for all dockerised impalads.
Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:
./buildall.sh -noclean -notests -ninja
ninja -j $IMPALA_BUILD_THREADS docker_images
export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
export FE_TEST=false
export BE_TEST=false
export JDBC_TEST=false
export CLUSTER_TEST=false
./bin/run-all-tests.sh
Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds test coverage for some cases where queries are currently
expected to fail with out-of-memory:
* memory limit exceeded in exchange node
* aggregation with large var-len intermediate values
* top N with large limit
* hash join with many duplicates on right side
* analytic with a large window that needs to be buffered in-memory
Note that it's not always totally deterministic where the query hits
'memory limit exceeded' so we don't include the node ID or name in the
expected error message.
Testing:
* ran exhaustive tests
* looped modified tests locally overnight
Change-Id: Icd1a7eb97837b742a967c260cafb5a7f4f45412e
Reviewed-on: http://gerrit.cloudera.org:8080/11564
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The issue was that the row batch queue could grow a lot if the consumer
was slow.
Also add a test to exercise the OOM code path in Kudu for
completeness.
Testing:
Added sleep to kudu-scan-node.cc that reproduced the problem.
Looped modified test to flush out flakiness.
Change-Id: Ic4a95b6b6d96a447df68ef4912a86f1e11f219ca
Reviewed-on: http://gerrit.cloudera.org:8080/11285
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This restores some of the heuristics removed in IMPALA-4835 that can
help prevent scans from hitting OOM conditions. The heuristics are
implemented at the query level rather than in each scan node in isolation.
Introduce a ScannerMemLimiter class, owned by the QueryState, that
tracks the amount of memory estimated to be consumed by all scanner
threads running for the query on the current backend.
Also check soft memory limits to see if scanner threads should be
started or the current scanner thread should stop.
The long-term plan is to switch to the MT scan node implementations.
When that happens this code can be removed. In the meantime this
code is imperfect but will help avoid OOM in many scenarios.
Testing:
Added regression tests for HDFS and Kudu where we previously could
run out of memory with a low mem_limit.
Manual testing:
* Ran query tests with --thread_creation_fault_injection=true for a
bit, confirmed no crashes.
* ran single-node stress test for Kudu and Parquet for 10-20 min each.
Change-Id: Ib9907fa8c4d2b0b85f67f4f160899c1c258ad82b
Reviewed-on: http://gerrit.cloudera.org:8080/11103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
https://goo.gl/N9LgQt summarises the memory problems I'm trying to solve
here.
Limit the number of enqueued row batches to a number of bytes,
instead of limiting the total number of batches. This helps
avoid pathologically high memory consumption for wide rows where the #
batches limit does not effectively limit the memory consumption.
The bytes limit only lowers the effective capacity of the queue
for wider rows, typically 150 bytes or wider. These are the
cases when we want to reduce the queue's capacity.
E.g. on a system with 10 disks, the previous sizing gave a queue
of 100 batches. If we assume rows with 10x16 byte columns, then
100 batches is ~16MB of data.
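As a worked version of that example (the 1024-row batch size is an
assumption about the default batch size, not stated above):
  num_batches = 100         # previous queue sizing on a 10-disk system
  rows_per_batch = 1024     # assumed default batch size
  row_size_bytes = 10 * 16  # 10 columns of 16 bytes each
  queue_bytes = num_batches * rows_per_batch * row_size_bytes
  print(queue_bytes)        # 16384000 bytes, i.e. ~16MB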
Remove the RowBatchQueueCapacity counter, which is less relevant now
and was not correctly initialised.
Testing:
Added some basic unit tests.
Add regression test that fails reliably before this change.
Ran exhaustive build.
Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681
Reviewed-on: http://gerrit.cloudera.org:8080/10977
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Moved a number of tests with tuned mem_limits. In some cases
this required separating the tests from non-tuned functional
tests.
TestQueryMemLimit used very high and very low limits only, so seemed
safe to run in all configurations.
Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662
Reviewed-on: http://gerrit.cloudera.org:8080/10370
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This has two related changes.
IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.
For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoids the need to
estimate the ideal reservation in the planner.
We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.
IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
can increase the reservation so we can do "efficient" 8MB I/Os for
large columns.
* The ideal reservation is not available - query performance is affected
because we can't overlap I/O and compute as much and may do smaller
(probably 4MB) I/Os. However, we should avoid pathological behaviour
like tiny I/Os.
When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when some
heuristics tell us with high confidence that the column data for most
or all files is smaller than 4MB.
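A minimal sketch of that idea (the 4MB default matches the text; the
stats-based reduction shown is an illustrative simplification, not the
planner's exact heuristic):
  DEFAULT_COLUMN_RESERVATION = 4 * 1024 * 1024  # 4MB per column
  def column_reservation(estimated_on_disk_bytes=None):
      if estimated_on_disk_bytes is None:
          # No stats: fall back to the 4MB default.
          return DEFAULT_COLUMN_RESERVATION
      # With stats: only go below the default when the estimate is
      # confidently smaller than 4MB.
      return min(DEFAULT_COLUMN_RESERVATION, estimated_on_disk_bytes)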
The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).
Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.
Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Looped the test for a while to flush
out flakiness.
Added targeted planner and query tests for reservation calculations and
increases.
Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the following squashed patches that were reverted.
I will fix the known issues with some follow-on patches.
======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.
* Remove the free buffer cache (which will be replaced by buffer pool's
own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
interfaces - previously callers of ReturnBuffer() were fudging
the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
'is_cancelled_'.
* Clarify the logic around calling Close() for the last
BufferDescriptor.
-> There appeared to be an implicit assumption that buffers would be
freed in the order they were returned from the scan range, so that
the "eos" buffer was returned last. Instead just count the number
of outstanding buffers to detect the last one.
-> Touching the is_cancelled_ field without holding a lock was hard to
reason about - violated locking rules and it was unclear that it
was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
inline at the callsites.
This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.
Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight
======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront
This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.
One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.
There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.
One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.
I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.
Testing:
* Ran core and exhaustive tests.
======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool
This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.
* The planner reserves enough memory to run a single scanner per
scan node.
* The multi-threaded scan node must increase reservation before
spinning up more threads.
* The scanner implementations must be careful to stay within their
assigned reservation.
The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.
Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.
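A minimal sketch of the divvying idea (a proportional split by on-disk
column size; an illustration of the described approach, not the
scanner's exact algorithm):
  def divide_reservation(total_reservation, column_sizes_on_disk):
      # Split the scan's reservation across columns proportionally to size.
      total_size = sum(column_sizes_on_disk)
      if total_size == 0:
          return [0] * len(column_sizes_on_disk)
      return [total_reservation * size // total_size
              for size in column_sizes_on_disk]
  # e.g. a 24MB reservation over columns of 8MB, 4MB and 12MB on disk
  print(divide_reservation(24 << 20, [8 << 20, 4 << 20, 12 << 20]))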
I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.
Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
since the algorithm is non-trivial.
Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.
Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"
This reverts commit 25bcb258df.
Revert "IMPALA-6588: don't add empty list of ranges in text scan"
This reverts commit d57fbec6f6.
Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"
This reverts commit 24b4ed0b29.
Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"
This reverts commit 5699b59d0c.
Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"
This reverts commit 65680dc421.
Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Testing:
I could reproduce the failure locally after a few iterations before the
change. After the change, I cannot reproduce it.
Change-Id: I8c721a154e7f8fbb19d043e03fd001990be3f5fd
Reviewed-on: http://gerrit.cloudera.org:8080/9449
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Skip most mem_usage_scaling tests, which are tuned for the memory
requirements on 3 daemons. Also skip test_sort_reservation_usage,
which similarly is tuned for 3 daemons.
Fix the other test to wrap the HDFS path correctly.
Testing:
Ran the modified tests by hand against the minicluster to confirm
they still ran as expected. Running full set of local tests.
Change-Id: I76086ed695bf78e3e0f2745c1964dac8330d6c19
Reviewed-on: http://gerrit.cloudera.org:8080/9463
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.
* The planner reserves enough memory to run a single scanner per
scan node.
* The multi-threaded scan node must increase reservation before
spinning up more threads.
* The scanner implementations must be careful to stay within their
assigned reservation.
The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.
Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.
I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.
Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
since the algorithm is non-trivial.
Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.
Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, the tuple pointers of a row batch were allocated from
the heap via malloc() and the tuple data was allocated from the
MemPool associated with the RowBatch. This change converts
the allocations of tuple pointers and tuple data to use the
BufferPool for row batches allocated from KrpcDataStreamRecvr.
The primary motivation for this change is to take advantage of
the fact that buffers allocated from BufferPool always go back
to the per-core arena they came from when they are freed. This
alleviates the TCMalloc imbalance between the RPC service threads
and the fragment execution threads. As described in IMPALA-5518,
row batches are always allocated from the service threads' TCMalloc
cache and placed into the fragment execution threads' TCMalloc cache
when they're freed. This leads to underflow and overflow in those
threads' caches and high contention for the spinlock of the central
free list. With BufferPool, the memory always goes back to its
originating arena, so this kind of imbalance is less likely to occur.
This also dovetails with the long-term plan to put most allocations
under BufferPool and have each operator in the plan reserve an
appropriate amount of memory before execution.
Note that the proper reservation mechanism of the exchange node
hasn't yet been implemented in this change, so the buffer pool client
handle used for allocating buffers has an ad-hoc set-up of no
reservation limit and uses the root reservation tracker as its parent.
This needs to be fixed as part of IMPALA-6524. The default buffer pool
limit is also bumped to 85% to account for the extra usage from the
exchange nodes. The minimum buffer size is also lowered to 8KB to
reduce memory wastage, as a row batch's tuple pointers / tuple data
can sometimes be much smaller than 64KB.
Testing done: Debug core build.
Change-Id: If4b1a45f68b9df0d3b539511e15aff15700246f2
Reviewed-on: http://gerrit.cloudera.org:8080/9344
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
In this commit we enable Decimal_V2 by default. We also update the
expected results in many of our tests.
Testing:
Ran an exhaustive test, which almost passed. Updated the few
failing tests in it.
Cherry-pick: not for 2.x
Change-Id: Ibbdd05bf986b7947f106b396017faa3a0bd87fd7
Reviewed-on: http://gerrit.cloudera.org:8080/9062
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control (sketched below) if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
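A minimal sketch of those two checks (illustrative names, not the actual
admission controller code):
  def rejection_reason(per_backend_min_reservations, query_mem_limit,
                       buffer_pool_limit, pool_max_mem_resources):
      # Returns a reason string, or None if the query passes these checks.
      largest = max(per_backend_min_reservations)
      if largest > min(query_mem_limit, buffer_pool_limit):
          return "min reservation exceeds query mem_limit or buffer_pool_limit"
      if sum(per_backend_min_reservations) > pool_max_mem_resources:
          return "cluster-wide min reservation exceeds pool max mem resources"
      return None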
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backend's
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.
* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic
Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.
Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.
Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, the preagg will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
I ran the stress test binary search locally and it produced a slightly
higher number for Q18 than the hardcoded value. This is enough to move
it above one of the thresholds, so may reduce flakiness.
Testing:
I wasn't able to reproduce the flakiness locally, so can't confirm
this fixes it.
Change-Id: I1ffa969061a52730c5147d142dcd2e3cb3626590
Reviewed-on: http://gerrit.cloudera.org:8080/7512
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of the E2E tests were compared against runs done
with Netezza and Vertica. The divergences were all due to the truncation
behavior of decimal types in DECIMAL_V1.
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
'Impala' to the class names:
git grep -l 'TestDimension' | xargs \
sed -i 's/TestDimension/ImpalaTestDimension/g'
git grep -l 'TestMatrix' | xargs \
sed -i 's/TestMatrix/ImpalaTestMatrix/g'
git grep -l 'TestVector' | xargs \
sed -i 's/TestVector/ImpalaTestVector/g'
The tests all passed in an exhaustive run on the upstream jenkins
server:
http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/
Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
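For example (a generic before/after illustration, not a specific file
from this patch):
  # Before: a wildcard import pulls in everything, including unused names.
  #   from math import *
  # After: name exactly what is needed.
  from math import sqrt
  print(sqrt(16))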
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
The test is flaky due to nondeterministic memory consumption under ASAN.
Xfail until we have more concrete guarantees on mem usage.
Change-Id: Ieefcb8f8ecc179f483f6d06af80c814fe0ef728e
Reviewed-on: http://gerrit.cloudera.org:8080/2770
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The test has not xfailed in a long time, so we believe that various
memory usage fixes have fixed the flakiness.
Change-Id: Idff06791e9d880cc8ddf54c0c977a556d3701bea
Reviewed-on: http://gerrit.cloudera.org:8080/2442
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
1. Improved join cardinality estimation.
For each equi join predicate we try to determine whether it is
a foreign/primary key (FK/PK) join condition, and either use a
special FK/PK estimation or a generic estimation method. We
maintain the minimum cardinality for each method separately,
and finally return in order of preference:
- the FK/PK estimate, if there was at least one FK/PK predicate
- the generic estimate, if there was at least one predicate with
sufficient stats
- otherwise, we optimistically assume a FK/PK join with a join
selectivity of 1, and return the left-hand side cardinality
2. More robust handling of conjuncts with unknown selectivities,
and conjuncts that are not independent. Uses exponential backoff
(see the sketch after this list).
3. More accurate broadcast vs. partitioned join cost estimation.
We now account for the 4 byte per-tuple overhead when serializing
rows over an exchange. This change is especially helpful in cases
where one side of the join has no materialized slots, i.e., it
has a row size of 0, and an exchange used to appear free.
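A minimal sketch of the exponential backoff mentioned in point 2 (the
combination rule shown is an assumption about the general technique, not
necessarily the planner's exact formula):
  def combined_selectivity(selectivities):
      # Combine conjunct selectivities without assuming full independence:
      # each additional conjunct contributes progressively less.
      result = 1.0
      for i, sel in enumerate(sorted(selectivities)):
          result *= sel ** (1.0 / (2 ** i))
      return result
  print(combined_selectivity([0.1, 0.2, 0.5]))  # larger than 0.1 * 0.2 * 0.5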
We are obviously not done with improving join cardinality estimates.
This patch is merely a step in the right direction, in particular,
the code and behavior are now more explicit and easier to reason about
than before, and better reflects the original intent (i.e., fixes the
IMPALA-976 bug).
Change-Id: I00d8e8230e2844cb807d128d82b35ee78db7d774
Reviewed-on: http://gerrit.cloudera.org:8080/1668
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This change only affects behaviour when the query is expected to succeed
at the given memory limit and it instead fails with memory limit
exceeded. In this case the test is xfailed.
Also remove unnecessary semicolons in the Python file.
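A minimal sketch of the pattern (pytest.xfail is a real pytest call; the
surrounding names are illustrative):
  import pytest
  def check_mem_limit_result(expected_to_succeed, hit_mem_limit_exceeded):
      # Turn an unexpected "memory limit exceeded" into an xfail, not a failure.
      if expected_to_succeed and hit_mem_limit_exceeded:
          pytest.xfail("query expected to succeed hit the memory limit")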
Change-Id: Ifae64b2653ee3ab7b59d27b6abbb5215db838190
Reviewed-on: http://gerrit.cloudera.org:8080/1737
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
PAGG and PHJ were using an all-or-nothing approach wrt spilling. In
particular, they were trying to switch to IO-sized buffers for both
streams (aggregated and unaggregated in PAGG; build and probe in PHJ)
of every partition (currently 16 partitions for a total of 32
streams), even if some of the streams had very few rows, were empty,
or simply would not spill, in which case there was no need to allocate
IO-buffers for them. That was increasing the min mem needed by those
operators in many queries.
This patch decouples the decision to switch to IO-buffers for each
stream of each partition. Streams will switch to IO-sized buffers
whenever the rows they contain do not fit in the first two small
buffers (64KB and 512KB respectively). When we decide to spill a
partition, we switch both of its streams to IO buffers.
With this change, many streams of PAGG and PHJ nodes do not need to
use IO-sized buffers, reducing the min mem requirement. For example,
below is the min mem needed (in MBs) for some of the TPC-H queries.
Some need half or less of the memory they needed before:
TPC-H Q3: 645 -> 240
TPC-H Q5: 375 -> 245
TPC-H Q7: 685 -> 265
TPC-H Q8: 740 -> 250
TPC-H Q9: 650 -> 400
TPC-H Q18: 1100 -> 425
TPC-H Q20: 420 -> 250
TPC-H Q21: 975 -> 620
To make this small buffer optimization work, we had to fix
IMPALA-2352. That is, the AllocateRow() call of
PAGG::ConstructIntermediateTuple() could return unsuccessfully just
because the small buffers of the stream were exhausted. In that case,
previously we would treat it as an indication that there is no memory
left, start spilling a partition and switch all streams to
IO-buffers. Now we make a best effort, trying SwitchToIoBuffers()
first, and if that is successful, we re-attempt the
AllocateRow() call. See IMPALA-2352 for more details.
Another change is that now SwitchToIoBuffers() will reset the flag
using_small_buffers_ back to false, in case we are in a very low
memory situation and it fails to get a buffer. That allows us to
retry calling SwitchToIoBuffers() once we free up some space. See
IMPALA-2330 for more details.
With the above fixes we should also have fixed IMPALA-2241 and
IMPALA-2271 that are essentially stream::using_small_buffers_-related
DCHECKs.
This patch adds all 22 TPC-H queries to the test_mem_usage_scaling test
and updates the per-query min mem limits in it. Additionally, it adds
a new aggregation test that uses the TPC-H dataset for larger
aggregations (TestTPCHAggregationQueries). It also removes some
dead test code.
Change-Id: Ia8ccd0b76f6d37562be21fd4539aedbc2a864d38
Reviewed-on: http://gerrit.cloudera.org:8080/818
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
tests/query_test/test_aggregation.py
There was a dcheck in PHJ::ProcessProbeBatch() that was expecting the
state of the PHJ to be PROCESSING_PROBE. It looks like we can hit the
same dcheck when we are in the REPARTITIONING phase.
This patch fixes this dcheck. It also adds TPC-DS q53 to the
test_mem_usage_scaling test (along with the needed refactoring in this
test) because TPC-DS q53 hit this dcheck in an endurance test.
Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28
Reviewed-on: http://gerrit.cloudera.org:8080/893
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
The HdfsParquetScanner would exit with the wrong error that it read
fewer rows than what was stated in the metadata of the file, when
the ReadValue() call failed with a memory limit exceeded error.
One of the effects of this wrong error reporting was that tests
like test_mem_usage_scaling would sometimes fail, especially under
ASAN.
With this patch the parquet scanner checks whether the memory limit was
exceeded before checking the difference between the number of rows
read and the number of expected rows according to the metadata.
This patch also adds another value to the test_mem_usage_scaling test;
that value (20MB) would normally trigger this misleading error.
Change-Id: Iad008d7af1993b88ac4dc055f595cfdbc62a6b79
Reviewed-on: http://gerrit.cloudera.org:8080/557
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The beeswax interface in the test infrastructure was not
closing queries that encountered exceptions. This was
problematic because failed queries would remain open, and
due to IMPALA-2060, resources wouldn't be released. If
admission control or RM is enabled, the test run may
eventually fail if resources continue to be held.
Regardless, failed queries should be closed.
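A minimal sketch of the fix pattern (illustrative client and handle
names, not the actual beeswax wrapper code):
  def run_and_close(client, query):
      handle = client.execute_async(query)
      try:
          return client.fetch_results(handle)
      finally:
          # Close even when fetch raises, so resources are released (IMPALA-2060).
          client.close_query(handle)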
Change-Id: I5077023b1d95d1ce45a92009666448fdc6e83542
Reviewed-on: http://gerrit.cloudera.org:8080/530
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
In the case that the BlockingJoinNode runs the build asynchronously in
one fragment instance but not another, a deadlock between the instances
is possible (see IMPALA-1863 for the details). To avoid this deadlock
potential, close the build side child on error, which will deregister
any datastream receivers from that side, breaking the cycle that leads
to the deadlock.
Change-Id: I2de06615897b4bcaa5855449a98984f11c948dc4
Reviewed-on: http://gerrit.cloudera.org:8080/242
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
be/src/exec/blocking-join-node.cc
This patch removes the logic from the python test file; it should
really live in the code that sets up the test-warehouse.
Change-Id: Id04dc90c7ab813af2f347ec79e9e43d76de794a2
Reviewed-on: http://gerrit.cloudera.org:8080/224
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins