After inserting all of its input into its Aggregators,
StreamingAggregationNode performs some cleanup, such as calling
InputDone() on each Aggregator.
Previously, StreamingAggregationNode only checked that all of the
child's batches had been fetched before doing this cleanup, which
caused problems if the final child batch wasn't processed fully in a
single GetNext() call. In that case, multiple calls to InputDone()
led to a DCHECK failure.
The solution is to only perform the cleanup once the final child batch
has been fully processed.
Testing:
- Added an e2e test with a query that hits this condition.
Change-Id: I851007a60472d0e53081c076c863c866c516677c
Reviewed-on: http://gerrit.cloudera.org:8080/11626
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes one of the tests in test_resource_limits that expects a
query to run for more than 2 seconds but currently fails because it
sometimes completes earlier than that.
Change-Id: I2ba7080f62f0af3e16ef6c304463ebf78dec1b0c
Reviewed-on: http://gerrit.cloudera.org:8080/11741
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When run on a test build with HDFS erasure coding turned on, a spilling
test hits an out-of-memory error on the HDFS scan node. This happens
because the test is tuned for a regular 3-node minicluster without
HDFS erasure coding. The fix is to simply increase the memory limit on
the test to accommodate this difference while keeping it small enough to
still achieve the desired spilling on the hash join node.
Testing:
Ran it on an EC-enabled minicluster to make sure it works
Change-Id: I207569822ba7388e78936d25e2311fa09c7a1b9a
Reviewed-on: http://gerrit.cloudera.org:8080/11740
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
StringMinMaxFilter uses a MemPool to allocate space for StringBuffers.
Previously, the MemPool was created by RuntimeFilterBank and passed to
each StringMinMaxFilter. In queries with multiple StringMinMaxFilters
being generated by the same fragment instance, this leads to
overlapping use of the MemPool by different threads, which is
incorrect as MemPools are not thread-safe.
The solution is to have each StringMinMaxFilter create its own
MemPool.
This patch also documents MemPool as not thread-safe and introduces a
DFAKE_MUTEX to help enforce correct usage. Doing this requires
modifying our CMakeLists.txt to pass '-DNDEBUG' to clang only in
RELEASE builds, so that the DFAKE_MUTEX will be present in the
compiled IR for DEBUG builds.
Testing:
- I have been unable to repro the actual crash despite trying a large
variety of different things. However, with additional logging added
it's clear that the MemPool is being used concurrently, which is
incorrect.
- Added an e2e test that covers the potential issue. It hits the
DFAKE_MUTEX with a sleep added to MemPool::Allocate.
- Ran a full exhaustive build in both DEBUG and RELEASE.
Change-Id: I751cad7e6b75c9d95d7ad029bbd1e52fe09e8a29
Reviewed-on: http://gerrit.cloudera.org:8080/11650
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds the command line flag enable_parquet_page_index_writing
to the Impala daemon, which toggles whether Impala writes the
Parquet page index. By default the flag is false, i.e. Impala doesn't
write the page index.
This flag is only temporary; we plan to remove it once Impala is able to
read the page index and has better testing around it.
Because of this change I had to move test_parquet_page_index.py to the
custom_cluster test suite since I need to set this command line flag
in order to test the functionality. I also merged most of the test cases
because we don't want to restart the cluster too many times.
I removed 'num_data_pages_' from BaseColumnWriter since it was rather
confusing and didn't provide any measurable performance improvement.
This commit fixes the ASAN error produced by the first IMPALA-7644
commit which was reverted later.
Change-Id: Ib4a9098a2085a385351477c715ae245d83bf1c72
Reviewed-on: http://gerrit.cloudera.org:8080/11694
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us to implement yet another Thrift client.
This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.
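As a rough illustration of the approach, a query can be run over HS2
through Impyla's standard dbapi interface roughly as below; the host,
port and query are placeholders, not values from this patch:

  # Minimal sketch of running a query over HS2 via Impyla's dbapi interface.
  from impala.dbapi import connect

  conn = connect(host='localhost', port=21050)  # default Impala HS2 port
  try:
      cursor = conn.cursor()
      cursor.execute(
          "select id, bool_col, timestamp_col from functional.alltypes limit 5")
      # Column metadata and rows come back through the generic dbapi surface,
      # so the same code works for any dbapi-compliant client.
      column_names = [desc[0] for desc in cursor.description]
      for row in cursor.fetchall():
          print(dict(zip(column_names, row)))
  finally:
      conn.close()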
I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
result sets with NULLs was limited so I added a couple of queries.
Places where results differ from Beeswax:
* Impyla is a Python dbapi client, so it must convert timestamps into Python
datetime objects, which only have microsecond precision. Therefore result
timestamps with nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
IMPALA-914.
* The Beeswax interface reports VARCHAR as STRING, but HS2 reports
VARCHAR.
I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.
Limitations:
* Not all of the methods implemented for Beeswax are implemented for HS2,
so some tests that have more complicated interactions with the client will
not work with HS2 yet.
* We don't have a way to get the affected row count for inserts.
I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.
Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.
Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds the command line flag enable_parquet_page_index_writing
to the Impala daemon, which toggles whether Impala writes the
Parquet page index. By default the flag is false, i.e. Impala doesn't
write the page index.
This flag is only temporary; we plan to remove it once Impala is able to
read the page index and has better testing around it.
Because of this change I had to move test_parquet_page_index.py to the
custom_cluster test suite since I need to set this command line flag
in order to test the functionality. I also merged most of the test cases
because we don't want to restart the cluster too many times.
Change-Id: If9994882aa59cbaf3ae464100caa8211598287bc
Reviewed-on: http://gerrit.cloudera.org:8080/11563
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the broken SHOW GRANT USER ON <object> statement, which
always showed an empty result due to an incorrect comparison between the
TPrivilege used as the filter and the TPrivilege for the actual privilege;
the comparison should not consider the "grantoption" field.
Testing:
- Added new E2E tests
- Ran all FE tests
- Ran all authorization E2E tests
Change-Id: I7adc403caddd18e5a954cf6affd5d1d555b9f5f0
Reviewed-on: http://gerrit.cloudera.org:8080/11598
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
per host memory limit for a query
With this patch the per host memory limit of a query is automatically
set using the mem_limit set in the query options and the mem_estimate
calculated by the planner, based on the following pseudocode (a concrete
sketch follows the pool option descriptions below):
  if mem_limit is set in query options:
    use that, and if 'clamp-mem-limit-query-option' is true:
      enforce the min/max query mem limits defined in the pool config.
  else:
    mem_limit = max(mem_estimate,
        min_mem_limit_required_to_accommodate_largest_initial_reservation)
    finally, enforce the min/max query mem limits defined in the pool
    config on this value.
This calculated mem limit will also be used for admission accounting
and consequently for admission control. Moreover, three new pool
configuration options have been added to enable this behaviour:
"min-query-mem-limit" & "max-query-mem-limit" => help
clamp the per host memory limit for a query. If both these limits
are not configured, then the estimates from planning are not used
as a memory limit and only used for making admission decisions.
Moreover the estimates will no longer have a lower bound based
on the largest initial reservation.
"clamp-mem-limit-query-option" => if false, the mem_limit defined in
the query options is used directly and the max/min query mem limits
are not enforced on it.
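As referenced above, a minimal sketch of the limit computation, assuming
the pool options described here (the helper name and its signature are
illustrative, not actual Impala code):

  # Illustrative sketch of the per-host mem limit computation described above.
  def compute_per_host_mem_limit(query_mem_limit, mem_estimate,
                                 largest_initial_reservation,
                                 min_query_mem_limit, max_query_mem_limit,
                                 clamp_mem_limit_query_option):
      def clamp(value):
          # Enforce the pool's min/max query mem limits, if configured.
          if min_query_mem_limit is not None:
              value = max(value, min_query_mem_limit)
          if max_query_mem_limit is not None:
              value = min(value, max_query_mem_limit)
          return value

      if query_mem_limit is not None:
          # mem_limit set in query options: use it, optionally clamped.
          if clamp_mem_limit_query_option:
              return clamp(query_mem_limit)
          return query_mem_limit
      if min_query_mem_limit is None and max_query_mem_limit is None:
          # Neither pool limit configured: the planner estimate is only used
          # for admission decisions, not enforced as a memory limit.
          return None
      # Otherwise derive the limit from the planner estimate, with a lower
      # bound that accommodates the largest initial reservation, then clamp.
      return clamp(max(mem_estimate, largest_initial_reservation))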
Testing:
Added e2e test cases.
Added frontend tests for changes to RequestPoolService.
Successfully passed exhaustive tests.
Change-Id: Ifec00141651982f5975803c2165b7d7a10ebeaa6
Reviewed-on: http://gerrit.cloudera.org:8080/11157
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds test coverage for some cases where queries are currently
expected to fail with out-of-memory:
* memory limit exceeded in exchange node
* aggregation with large var-len intermediate values
* top N with large limit
* hash join with many duplicates on right side
* analytic with a large window that needs to be buffered in-memory
Note that it's not always totally deterministic where the query hits
'memory limit exceeded' so we don't include the node ID or name in the
expected error message.
Testing:
* ran exhaustive tests
* looped modified tests locally overnight
Change-Id: Icd1a7eb97837b742a967c260cafb5a7f4f45412e
Reviewed-on: http://gerrit.cloudera.org:8080/11564
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds memory estimates for kudu scan nodes based on
empirically derived estimates for the scan's memory consumption
that were added in IMPALA-7096.
Testing:
Modified resource requirements planner test.
Change-Id: If9bb52530271b0bff91311a67d222a2e9fac1229
Reviewed-on: http://gerrit.cloudera.org:8080/11440
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this fix Impala did not check whether a timestamp's time part
is out of the valid [0, 24 hour) range when reading Parquet files,
so these timestamps were memcpy'd as-is into slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00
Different parts of Impala treat these timestamps differently:
- string conversion leads to invalid representation that cannot be
converted back to timestamp
- timezone conversions handle the overflowing time part and give
a valid timestamp result (at least since CCTZ, I did not check
older versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
resulting Parquet file will also contain corrupt timestamps
The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. A new error code is added for this case. If both the date
and the time part are corrupt, then the error about the corrupt time
part is returned.
Testing:
- added a new scanner test that reads a corrupted Parquet file
with edge values
Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Reviewed-on: http://gerrit.cloudera.org:8080/11521
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements the same function as the Hive UDF get_json_object.
We reuse RapidJSON to parse the JSON string. In order to track the
memory used by RapidJSON, we wrap FunctionContext into an allocator.
get_json_object accepts two parameters: a JSON string and a selector
(JSON path). We parse the JSON string into a Document tree and then
perform BFS according to the selector. For example, to process
get_json_object('[{\"a\":1}, {\"a\":2}, {\"a\":3}]', '$[*].a'),
we first apply '$[*]' to extract all the items in the root array.
Then we get a queue consisting of {a:1},{a:2},{a:3} and apply the '.a'
selector to all values in the queue. The final results in the queue are
1,2,3. As there are multiple results, they are encapsulated into
an array. The output is the string '[1,2,3]'.
More examples can be found in expr-test.cc.
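A rough Python model of the selector evaluation described above; the real
implementation is a C++ builtin using RapidJSON, and this toy only handles
the '$', '[*]' and '.key' pieces of the path syntax:

  import json, re

  def get_json_object(json_str, selector):
      # Toy model of the BFS over the parsed document described above.
      queue = [json.loads(json_str)]
      # Split a path like '$[*].a' into tokens: ['[*]', '.a']
      for token in re.findall(r'\[\*\]|\.\w+', selector):
          next_queue = []
          for value in queue:
              if token == '[*]' and isinstance(value, list):
                  next_queue.extend(value)           # expand every array element
              elif token.startswith('.') and isinstance(value, dict):
                  key = token[1:]
                  if key in value:
                      next_queue.append(value[key])  # descend into the object
          queue = next_queue
      if not queue:
          return None
      # Multiple results are encapsulated into an array.
      return json.dumps(queue if len(queue) > 1 else queue[0])

  print(get_json_object('[{"a":1}, {"a":2}, {"a":3}]', '$[*].a'))  # -> [1, 2, 3]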
Test:
* Add unit tests in expr-test
* Add e2e tests in exprs.test
* Add tests in test_alloc_fail.py to check handling of out of memory
Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Reviewed-on: http://gerrit.cloudera.org:8080/10950
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the SHOW GRANT USER statement to show all privileges
granted to a user, either directly via object ownership, or granted
through a role via a group the user belongs to. The output for SHOW
GRANT USER will have two additional columns, principal_type and
principal_name, so the user can know where each privilege comes from.
Truncated sample showing two columns that are different from role:
+----------------+----------------+--------+----------+-...
| principal_type | principal_name | scope | database | ...
+----------------+----------------+--------+----------+-...
| USER | foo | table | foo_db | ...
| ROLE | foo_role | server | | ...
+----------------+----------------+--------+----------+-...
Testing:
- Create new custom cluster test with custom group mapping.
- Ran FE and custom cluster tests.
Change-Id: Ie9f6c88f5569e1c414ceb8a86e7b013eaa3ecde1
Reviewed-on: http://gerrit.cloudera.org:8080/11531
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently in DECIMAL V2 mode, typeof(9.9 % 3) is DECIMAL(2,1) and typeof(mod(9.9, 3)) is
DECIMAL(4,1), while both are expected to be DECIMAL(2,1). This patch fixes V2 mode by
replacing "mod" with "%" at the parser stage so that they share the same code path afterwards.
Testing:
Added unit tests and did real cluster testing.
Change-Id: Ib0067da04083859ffbf662a29007154461bea2ba
Reviewed-on: http://gerrit.cloudera.org:8080/11443
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Currently for HBase scan nodes we use a constant estimate of 1GB, which
is generally a gross over-estimation. This patch improves upon those
estimates by using heuristics based on how HBase rows are stored and
fetched and how the scanners interact with the internal memory pool.
Testing:
Added/Modified resource requirements planner test.
Added a junit test for the estimation logic.
Change-Id: I583545c3f5e454854f111871c5fbc4f108ae4bff
Reviewed-on: http://gerrit.cloudera.org:8080/11306
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The estimate for memory consumption for this scan is 9 columns * 384 KB
per column = 3.375 MB. So if we set the mem_limit to 6.5 MB, we should
still not get more than one scanner thread, but we can avoid hitting
out-of-memory.
The issue in the JIRA was queued row batches. With this change, and
num_scanner_threads=2, there should be at most 12 row batches
(10 in the queue, 2 in the scanner threads about to be enqueued),
and based on the column stats I'd estimate that each row batch is
around 200 KB, so this change should provide significantly more headroom.
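A quick back-of-the-envelope check of the numbers above (illustrative
arithmetic only, not code from the test):

  # Back-of-the-envelope check of the headroom reasoning above.
  KB, MB = 1024, 1024 * 1024

  scan_estimate = 9 * 384 * KB          # 9 columns * 384 KB = 3.375 MB
  mem_limit = 6.5 * MB
  queued_batches = 12 * 200 * KB        # ~12 row batches at ~200 KB each

  headroom = mem_limit - scan_estimate  # ~3.125 MB left for row batches
  print(scan_estimate / MB, headroom / MB, queued_batches / MB)
  # -> 3.375 3.125 2.34375 : the queued batches fit comfortably in the headroom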
Change-Id: I6d992cc076bc8678089f765bdffe92e877e9d229
Reviewed-on: http://gerrit.cloudera.org:8080/11513
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for having multiple aggregate functions in a
single SELECT block that use DISTINCT over different sets of columns.
Planner design:
- The existing tree-based plan shape with a two-phased
aggregation is maintained.
- Existing plans are not changed.
- Aggregates are grouped into 'aggregation classes' based on their
expressions in the distinct portion which may be empty for
non-distinct aggregates.
- The aggregation framework is generalized to simultaneously process
multiple aggregation classes within the tree-based plan. This
process splits the results of different aggregation classes into
separate rows, so a final aggregation is needed to transpose the
results into the desired form.
- Main challenge: Each aggregation class consumes and produces
different tuples, so conceptually a union-type of tuples flows
through the runtime. The tuple union is represented by a TupleRow
with one tuple per aggregation class. Only one tuple in such a
TupleRow is non-NULL.
- Backend exec nodes in the aggregation plan will be aware of this
tuple-union either explicitly in their implementation or by relying
on expressions that distinguish the aggregation classes.
- To distinguish the aggregation classes, e.g. in hash exchanges,
CASE expressions are crafted to hash/group on the appropriate slots.
Deferred FE work:
- Beautify/condense the long CASE exprs
- Push applicable conjuncts into individual aggregators before
the transposition step
- Added a few testing TODOs to reduce the size of this patch
- Decide whether we want to change existing plans to the new model
Execution design:
- Previous patches separated out aggregation logic from the exec node
into Aggregators. This is extended to support multiple Aggregators
per node, with different grouping and aggregating functions.
- There is a fast path for aggregations with only one aggregator,
which leaves the execution essentially unchanged from before.
- When there are multiple aggregators, the first aggregation node in
the plan replicates its input to each aggregator. The output of this
step is rows where only a single tuple is non-null, corresponding to
the aggregator that produced the row.
- A new expr is introduced, ValidTupleId, which takes one of these
rows and returns which tuple is non-null.
- For additional aggregation nodes, the input is split apart into
'mini-batches' according to which aggregator the row corresponds to.
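A simplified model of that mini-batch splitting, using ValidTupleId to
route rows (Python pseudocode; names other than ValidTupleId are
illustrative, and the real implementation operates on TupleRows in the BE):

  # Each "row" is a list with one slot per aggregation class; exactly one
  # slot is non-None, and valid_tuple_id tells us which one.
  def valid_tuple_id(row):
      for idx, tup in enumerate(row):
          if tup is not None:
              return idx
      raise ValueError("row has no valid tuple")

  def split_into_mini_batches(batch, num_aggregators):
      mini_batches = [[] for _ in range(num_aggregators)]
      for row in batch:
          # Route each row to the aggregator that produced its tuple.
          mini_batches[valid_tuple_id(row)].append(row)
      return mini_batches

  # Example: two aggregation classes; rows carry a tuple for class 0 or class 1.
  batch = [[("a", 1), None], [None, ("b", 2)], [("c", 3), None]]
  print(split_into_mini_batches(batch, 2))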
Testing:
- Added analyzer and planner tests
- Added end-to-end queries tests
- Ran hdfs/core tests
- Added support in the query generator and ran in a loop.
Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Reviewed-on: http://gerrit.cloudera.org:8080/10771
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, HdfsTableSink::computeResourceProfile() didn't account
for clustering while estimating the memory requirement of an
insert fragment. This change ensures that the resource estimates
produced account for the fact that clustered inserts produce a
single partition at a time.
Testing: Modified testResourceRequirements PlannerTest to account
for clustering while generating insert plans.
Change-Id: I75f8baf5fc3e1c357edf6d0cebd1e5dbafc8a3a8
Reviewed-on: http://gerrit.cloudera.org:8080/11485
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In every execution of an Impala query, one of the impalad daemons acts
as the coordinator node. In some cases, such as when using a proxy, a
user cannot predict which host will act as the coordinator. To aid in
diagnosis, we provide a SQL function which returns the name of the host
on which the coordinator is running.
EXTERNAL DESCRIPTION:
Add a builtin function called coordinator(), which returns the name of
the host which is running the impalad that is acting as the coordinator
for the current query.
TESTING:
- Added a basic unit test for the new function.
- Added a unit test which simulates the case when coord_address is
unset.
- Added a query that uses coordinator() to exprs.test
- Hand tested in a development deployment.
- Ran regression tests and got a clean run.
Change-Id: I94d6e2664ba659b48df53c5c06f67b502c533e47
Reviewed-on: http://gerrit.cloudera.org:8080/11459
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Running this test on an actual cluster results in a failure,
since the pool name differs. Removing the name allows this
test to pass on clusters. Also tested that minicluster tests
still pass.
Change-Id: I1529c040520a1d8e7ca47960c76028b2579c8d03
Reviewed-on: http://gerrit.cloudera.org:8080/11476
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
GroupingAggregator::Reset() doesn't call Close() on output_partition_,
which can lead to hitting a DCHECK(is_closed) in the Partition
destructor.
Testing:
- Added an e2e regression test.
Change-Id: I6db8ec8479b18b5ed681d7ac438480711ab7a1ba
Reviewed-on: http://gerrit.cloudera.org:8080/11446
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The error message described in IMPALA-6442 incorrectly reported the file offset where the
Parquet footer starts, as if the offset is counted from the file end instead of from the
file beginning. The fix changed the reported file offset to be counted from the beginning
of the Parquet file.
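For reference, a sketch of the corrected offset arithmetic, assuming the
standard Parquet file layout (the file ends with the 4-byte footer length
followed by the 4-byte "PAR1" magic); the helper is illustrative, not the
code touched by this fix:

  # Where the footer starts, counted from the beginning of the file,
  # which is what the fixed error message now reports.
  def footer_start_offset(file_size, footer_size):
      return file_size - 8 - footer_size  # 8 = footer length field + magic

  # E.g. a 1000-byte file with a 200-byte footer: footer starts at byte 792.
  print(footer_start_offset(1000, 200))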
Testing:
Created a small table that contains one row of data with a single BIGINT
column and stored it as Parquet. Manually changed the footer size field to be
1) smaller than the original footer size by 1, to trigger the error message
fixed by this JIRA and verify that the fix functions correctly;
2) bigger than the file size, to trigger another related error message to be
printed.
Change-Id: I35235e99ea9ceb0d31961dd3b8069f7194f5a2de
Reviewed-on: http://gerrit.cloudera.org:8080/11379
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements support for primitive type widening on Parquet
tables. It only supports conversions that do not lose any precision:
- tinyint (INT32) -> smallint (INT32), int (INT32), bigint (INT64),
double (DOUBLE)
- smallint (INT32) -> int (INT32), bigint (INT64), double (DOUBLE)
- int (INT32) -> bigint (INT64), double (DOUBLE)
- float (FLOAT) -> double (DOUBLE)
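A compact sketch of the widening rules listed above (the helper and the
type-name strings are illustrative, not Impala code):

  # Allowed lossless widenings from the file's declared type to the table type.
  ALLOWED_WIDENINGS = {
      "tinyint":  {"smallint", "int", "bigint", "double"},
      "smallint": {"int", "bigint", "double"},
      "int":      {"bigint", "double"},
      "float":    {"double"},
  }

  def widening_allowed(file_type, table_type):
      return (table_type == file_type
              or table_type in ALLOWED_WIDENINGS.get(file_type, set()))

  print(widening_allowed("int", "bigint"))  # True
  print(widening_allowed("bigint", "int"))  # False: narrowing would lose precision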
Testing:
- Added BE test
- Added E2E test
- Ran core tests
Change-Id: If93394b035c64cf6fc5f37b54d29c034cc1f86e4
Reviewed-on: http://gerrit.cloudera.org:8080/11268
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We had quite a few tests that created a table and used
"hdfs dfs -copyFromLocal" to copy data files to the
warehouse directory for this table.
This operation needs some boilerplate code, which I
refactored into the new functions
create_table_from_parquet() and
create_table_and_copy_files().
Change-Id: Ie00a4561825facf8abe2e8e74a6b6e93194f416f
Reviewed-on: http://gerrit.cloudera.org:8080/11127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If result.ptr allocation fails for some reason inside the StringVal
constructor, we still overwrite result.len and continue.
This change checks that the StringVal pointer is not NULL before
dereferencing it, and returns NULL if it is.
Testing: Added a test case of the to_date() function to
alloc-fail-init.test to leverage the fault injector
--stress_fn_ctx_alloc.
Change-Id: I14cfb29a592885bb2f39958c8644f93db5220a68
Reviewed-on: http://gerrit.cloudera.org:8080/11286
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The issue was that the row batch queue could grow a lot if the consumer
was slow.
Also add an additional test to exercise the OOM code path in Kudu for
completeness.
Testing:
Added sleep to kudu-scan-node.cc that reproduced the problem.
Looped modified test to flush out flakiness.
Change-Id: Ic4a95b6b6d96a447df68ef4912a86f1e11f219ca
Reviewed-on: http://gerrit.cloudera.org:8080/11285
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, when creating a new database, the CatalogOpExecutor would
create an HMS Database object, issue the HMS createDatabase call, and
then create a Catalog entry from that same Database object. The
resulting Catalog entry would be missing certain fields that are
auto-created by the HMS itself, most importantly the location field.
The code for CTAS seems to have contained a workaround for this issue
ever since catalogd was first introduced: rather than using the location
stored in the Db object, it would re-fetch the Database from HMS.
Now that this is fixed, that workaround could be removed and some code
simplified.
A new test verifies that a newly-created database has the appropriate
location, and existing CTAS tests verify that functionality didn't
regress.
Change-Id: I13df31cee1e5768b073e0e35c4c16ebf1892be23
Reviewed-on: http://gerrit.cloudera.org:8080/11229
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This restores some of the heuristics removed in IMPALA-4835 that help
prevent scans from hitting OOM conditions. The heuristics are implemented
at the query level rather than in each scan node in isolation.
Introduce a ScannerMemLimiter class, owned by the QueryState, that
tracks the amount of memory estimated to be consumed by all scanner
threads running for the query on the current backend.
Also check soft memory limits to see if scanner threads should be
started or the current scanner thread should stop.
The long-term plan is to switch to the MT scan node implementations.
When that happens this code can be removed. In the meantime this
code is imperfect but will help avoid OOM in many scenarios.
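A rough model of the decision the ScannerMemLimiter makes when a scan wants
to start another scanner thread (method names and bookkeeping are
illustrative, not the actual class interface):

  class ScannerMemLimiter:
      def __init__(self, query_mem_limit):
          self.query_mem_limit = query_mem_limit
          self.reserved_bytes = 0  # estimated consumption of running scanner threads

      def claim_memory_for_thread(self, estimated_thread_bytes, current_consumption):
          # Only start another scanner thread if its estimated consumption,
          # plus what is already reserved and consumed, stays under the limit.
          total = current_consumption + self.reserved_bytes + estimated_thread_bytes
          if total > self.query_mem_limit:
              return False
          self.reserved_bytes += estimated_thread_bytes
          return True

      def release_memory_for_thread(self, estimated_thread_bytes):
          self.reserved_bytes -= estimated_thread_bytes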
Testing:
Added regression tests for HDFS and Kudu where we previously could
run out of memory with a low mem_limit.
Manual testing:
* Ran query tests with --thread_creation_fault_injection=true for a
bit, confirmed no crashes.
* ran single-node stress test for Kudu and Parquet for 10-20 min each.
Change-Id: Ib9907fa8c4d2b0b85f67f4f160899c1c258ad82b
Reviewed-on: http://gerrit.cloudera.org:8080/11103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test started running into IMPALA-7446, maybe because of
a timing change. This appears to have always been possible.
The fix is to reduce the memory requirement of the test.
IMPALA-2256 is no longer really possible because the
BufferedTupleStream code was simplified to avoid the 32-bit
row index limitation, so we're not losing important coverage
on the current code with this change.
Testing:
Ran test in a loop to confirm it did not OOM.
Change-Id: I9d9480cad6bf8222abe990e7046498a0531e2849
Reviewed-on: http://gerrit.cloudera.org:8080/11223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds some informational output to explain plans and
sends the information to the backend.
The idea is that this will make it easier to explain how Impala's
pipelined execution works and also enable future work on profile
analysis that can more intelligently group plan nodes.
Tests:
* Updated planner tests to include new output.
Change-Id: I1d10eb14d997242f445e5c5fc5362d5410370721
Reviewed-on: http://gerrit.cloudera.org:8080/10848
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit it is terminated. The
checks are periodic so the query may go somewhat over the limits.
SCAN_BYTES_LIMIT is exposed as an advanced query option.
CPU_LIMIT_S is hidden as a development query option because it is flawed:
the CPU user/sys time is only updated upon thread completion, so in
many cases the limit will not take effect until well after the resources
have been used. IMPALA-7318 tracks enabling this.
The query profile is updated to include query-wide and per-backend metrics
for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":
Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
- NumBackends: 3 (3)
- NumFragmentInstances: 4 (4)
- NumFragments: 2 (2)
- TotalBytesRead: 300.00 KB (307200)
- TotalCpuTime: 96.000ms
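Conceptually, the periodic enforcement described above amounts to something
like the sketch below; the query option names are from this patch, but the
helper function and its inputs are illustrative:

  # Conceptual model of the periodic resource limit check; because it runs
  # periodically, a query can overshoot a limit somewhat before termination.
  def check_resource_limits(query_options, scan_bytes, cpu_user_s, cpu_sys_s):
      scan_limit = query_options.get("SCAN_BYTES_LIMIT", 0)
      cpu_limit = query_options.get("CPU_LIMIT_S", 0)
      if scan_limit and scan_bytes > scan_limit:
          return "terminated: scan bytes limit exceeded"
      if cpu_limit and cpu_user_s + cpu_sys_s > cpu_limit:
          # user/sys time is only updated on thread completion (IMPALA-7318),
          # so this can trigger much later than the limit was actually exceeded.
          return "terminated: CPU time limit exceeded"
      return None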
Testing:
Added tests for various permutations of CPU_LIMIT_S and
SCAN_BYTES_LIMIT
Based on a previous patch by Mostafa Mokhtar
<mmokhtar@cloudera.com>
Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The new test added by IMPALA-7308 failed on local-filesystem builds
because the warehouse path is not directly at /test-warehouse. This
fix prefixes the path appropriately with $FILESYSTEM_PREFIX.
Additionally, the fix for IMPALA-5542 made a similar mistake
constructing a path on the Python side of a test case. Fixed by using
the get_fs_path function.
Change-Id: I6922e24a74576d0d000e8e2645a235868583c1e1
Reviewed-on: http://gerrit.cloudera.org:8080/11164
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The core of the fix is in ProcessChildBatches(), where we copy
'prev_input_tuple_' to 'prev_input_tuple_pool_' and reset
the child batch *before* the call to child(0)->GetNext(). This
solves a couple of problems:
* prev_input_tuple_ may be referencing memory from the child
that had the needs_deep_copy() flag set and therefore will
be freed or recycled when calling child(0)->GetNext() again.
* 'prev_child_batch_' may have been holding onto resources that
the child had flushed, which means they need to be freed before
the next GetNext() call.
Also refactors the logic around child_cmp_row_ to make the variable
lifetime and data flow clearer.
Testing:
Add regression test. The test passes both with this patch alone and with
IMPALA-7333 reapplied.
Change-Id: I09eb6213d47287f2addb72f8c1304085d2d48c55
Reviewed-on: http://gerrit.cloudera.org:8080/11155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:
- if an explicit avro schema is specified, it overrides the schema
provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
inferred schema takes the place of the one provided by the HMS (thus
promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
schema and the inferred schema, an error is emitted.
The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:
The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.
Thus, the LocalCatalog implementation differs as follows:
- the "schema override" behavior ONLY occurs if the Avro file format has
been selected at a table level.
- if an Avro partition is added to a non-Avro table, and that partition
has a schema that isn't compatible with the table's schema, an error
will occur on read.
The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.
A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.
[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E
Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Adds the resource estimates for key benchmark workloads:
TPC-H, TPC-DS, TPC-H Nested and TPC-H Kudu to the planner
test so that we can track changes in resource requirements
and estimates for these queries.
Also don't show decimal places for MB and KB estimates. The
estimates are not accurate to that level and displaying
extra precision has some disadvantages:
* It communicates to readers that the estimates have a high level of
precision.
* It increases the odds of small variations in file sizes, etc.,
causing test failures.
Also fixed a regex in the stress test that didn't escape the decimal
point correctly.
Testing:
Ran core tests.
Change-Id: I6a9f836699200ea87fb03bf36abad0e23949ac26
Reviewed-on: http://gerrit.cloudera.org:8080/11087
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As part of IMPALA-110, all of the logic of performing aggregations was
refactored out of the aggregation ExecNode and into Aggregators. Each
Aggregator manages its own memory, so a DCHECK was added in
AggregationNode::Close to ensure that no allocations were
made from the regular ExecNode mem pools.
This DCHECK is violated if the node was assigned conjuncts that
allocate mem in Prepare - even though the conjuncts are evaluated in
the Aggregator, we still initialize them in ExecNode::Prepare.
This patch solves the problem by creating a copy of the TPlanNode
without the conjuncts to pass to ExecNode. In the future, when
TAggregator is introduced, we can get rid of this by directly
assigning conjuncts to Aggregators.
Note that this doesn't affect StreamingAggregationNode, which never
has conjuncts assigned to it, but this patch fixes an incorrect DCHECK
that enforces this.
Testing:
- Added a regression test for this case.
Change-Id: I65ae00ac23a62ab9f4a7ff06ac9ad5cece80c39d
Reviewed-on: http://gerrit.cloudera.org:8080/11132
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a new query option "timezone" which
defines the timezone used for utc<->local conversions.
The main goal is to simplify testing, but I think that
some users may also find it useful so it is added as a
"general" query option.
Examples:
set timezone=UTC;
set timezone="Europe/Budapest"
The timezones are validated, but as query options are not
sent to the coordinator immediately, the error checking
will only happen when running a query.
Leading/trailing " and 'characters are stripped because the
/ character cannot be entered unquoted in some contexts.
Currently the timezone has effect in the following cases:
-function now()
-conversions between unix time and timestamp if flag
use_local_tz_for_unix_timestamp_conversions is true
-reading parquet timestamps written by Hive if flag
convert_legacy_hive_parquet_utc_timestamps is true
In the near future the Parquet timestamp isAdjustedToUTC
property will be supported, which will decide whether
to do utc->local conversion on a per file+column basis.
This conversion will also be affected.
Testing:
- Extended test_local_tz_conversion.py to actually
test utc<->local conversion. Until now the effect
of flag use_local_tz_for_unix_timestamp_conversions
was practically untested.
- Added a shell test to check that the default of the
query option is the system's timezone.
- Added a shell test to check timezone validation.
Change-Id: I73de86eff096e1c581d3b56a0d9330d686f77272
Reviewed-on: http://gerrit.cloudera.org:8080/11064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Decimal type in Parquet is a logical type. That means
the Parquet file stores some physical/primitive type that
is annotated by the DECIMAL tag to make it behave like
decimals.
The allowed physical types for decimals are INT32, INT64,
FIXED, and BINARY. Before this commit Impala could only
read decimals stored as FIXED or BINARY.
Spark decided to write decimals as INT32 or INT64 when
their precision allows it:
(1 <= precision <= 9) ==> INT32
(10 <= precision <= 18) ==> INT64
I updated our column readers to accept INT32 and INT64
as valid physical types for decimals.
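For reference, a small sketch of the precision-to-physical-type mapping
above, as Spark applies it when writing (the helper name is illustrative):

  # Physical type Spark chooses for a DECIMAL(precision, scale) column,
  # per the ranges listed above.
  def spark_decimal_physical_type(precision):
      if 1 <= precision <= 9:
          return "INT32"
      if 10 <= precision <= 18:
          return "INT64"
      return "FIXED_LEN_BYTE_ARRAY or BINARY"  # larger precisions

  print(spark_decimal_physical_type(7))   # INT32
  print(spark_decimal_physical_type(15))  # INT64
  print(spark_decimal_physical_type(30))  # FIXED_LEN_BYTE_ARRAY or BINARY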
Testing:
* extended parquet-plain-test.cc
* added Parquet files generated by Spark 2.3.1
and updated test_scanners.py
Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Reviewed-on: http://gerrit.cloudera.org:8080/11000
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, the planner used getMajorityFormat to estimate
the memory requirements of a scan's partitions. Additionally, before
IMPALA-6625 was merged, the majority format for a multi-format
table with no numerical majority was calculated using a HashMap,
thus producing non-deterministic results. This change ensures that
the memory estimate is deterministic and always based on the partition
that has the maximum memory requirement.
Testing: Ran all PlannerTests. Also, modified plans of scans with
multiple partitions to ensure that the memory estimate produced
corresponds to the partition with the maximum requirement.
Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b
Reviewed-on: http://gerrit.cloudera.org:8080/11001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The handling of sync markers after processing a block was broken: eos_
was set if the sync marker straddled the block boundary. The expected
behaviour (documented by comments) in this case is that the current
scanner should process the next block, if there is one.
If you look at the logic before the IMPALA-3905 change in commit
931bf49cd9, it split the checking
of eosr() and eof() similarly to this patch.
Testing:
Add regression tests that scan a large table with a variety of
different scan range lengths, with some randomisation to exercise
different edge cases. This reliably triggered the bug.
Change-Id: I49a70a4925b0271204b8eea4f980299d7654805a
Reviewed-on: http://gerrit.cloudera.org:8080/11062
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
IMPALA_MINICLUSTER_PROFILE=3. To avoid having too many code
changes in this patch, there is no code change for the shims. The shims
for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
implementation.
Testing:
- Ran core and exhaustive tests
Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some downstream tests run this with a fair-scheduler.xml set that, while
not changing admission control behavior, does change the name of the
pool. Omit the pool name to permit that downstream test to succeed.
Testing:
- local with change in minicluster
- downstream in environment as well
Change-Id: I3fe6beb169dc6bfefabde9dc7a4632c1a5e63fa7
Reviewed-on: http://gerrit.cloudera.org:8080/10942
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent change, IMPALA-110 (part 2), refactored
PartitionedAggregationNode into several classes, including a new type
'Aggregator'. During this refactor, code that makes local allocations
while evaluating exprs was moved from the ExecNode (now
AggregationNode/StreamingAggregationNode) into the Aggregators, but
code related to cleaning up these allocations (i.e. QueryMaintenance())
was not, resulting in some queries using an excessive amount of
memory.
This patch removes all calls to QueryMaintenance() from the exec nodes
and moves them into the Aggregators.
Testing:
- Added new test cases with a mem limit that fails if the expr
allocations aren't released in a timely manner.
- Passed a full exhaustive run.
Change-Id: I4dac2bb0a15cdd7315ee15608bae409c125c82f5
Reviewed-on: http://gerrit.cloudera.org:8080/10871
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change ensures that the planner computes parquet conjuncts
only for scans containing parquet files. Additionally, it
also handles PARQUET_DICTIONARY_FILTERING and
PARQUET_READ_STATISTICS query options in the planner.
Testing was carried out independently on parquet and non-parquet
scans:
1. Parquet scans were tested via the existing parquet-filtering
planner test. Additionally, a new test
[parquet-filtering-disabled] was added to ensure that the
explain plan generated skips parquet predicates based on the
query options.
2. Non-parquet scans were tested manually to ensure that the
functions to compute parquet conjuncts were not invoked.
Additional test cases were added to the parquet-filtering
planner test to scan non parquet tables and ensure that the
plans do not contain conjuncts based on parquet statistics.
3. A parquet partition was added to the alltypesmixedformat
table in the functional database. Planner tests were added
to ensure that Parquet conjuncts are constructed only when
the Parquet partition is included in the query.
Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>