When converting a decimal to a double, we incorrectly used the powf()
function in the backend, which returns a float instead of a double.
This caused us to lose precision.
We fix the problem by replacing the powf() function with a pow()
function, which returns a double.
Testing:
- Added an EE test.
Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3
Reviewed-on: http://gerrit.cloudera.org:8080/8547
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.
Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.
Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Reviewed-on: http://gerrit.cloudera.org:8080/8417
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.
Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.
Performance:
Initial perf testing using tpcds_1000 shows no regression.
Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Reviewed-on: http://gerrit.cloudera.org:8080/7822
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
Currently a "select *" query in test_inline_view_limit fails during
exhaustive testing because Impala returns columns from HBase tables
in a different order (IMPALA-886) than the one expected. This fix
ensures the column order is consistent by specifying the output
columns in the right order in the select query.
Testing:
Tested locally, with and without exhaustive exploration strategy.
Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330
Reviewed-on: http://gerrit.cloudera.org:8080/8409
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
'Test case 16' in test_row_filters has been failing occasionaly on
ASAN as the runtime filters are not generated within the specified
RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase
RUNTIME_FILTER_WAIT_TIME_MS.
This patch updates all of the tests in test_row_filters to use the
same timeout, which is set to a higher value for ASAN builds.
Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Reviewed-on: http://gerrit.cloudera.org:8080/8358
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Testing:
Added test case to verify that CopyRows in select node is successfully
codegened.
Improved test coverage for select node with limit.
Performance:
Queries used (num_nodes set to 1):
500 Predicates: select * from (select * from tpch_parquet.lineitem
limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 10000 and
l_linenumber > 1 and l_comment >'foo0' .... and l_comment >'foo500'
order by l_orderkey limit 10;
1 Predicate: select * from (select * from tpch_parquet.lineitem
limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10;
+--------------+-----------------------------------------------------+
| | 500 Predicates | 1 Predicate |
| +------------+-------------+------------+-------------+
| | After | Before | After | Before |
+--------------+------------+-------------+------------+-------------+
| Select Node | 12s385ms | 1m1s | 234ms | 797ms |
| Codegen time | 2s619ms | 1s962ms | 200ms | 181ms |
+--------------+------------+-------------+------------+-------------+
Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Reviewed-on: http://gerrit.cloudera.org:8080/8196
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
Before this patch, decimal operations would never produce an error.
Division by and modulo zero would result in a NULL.
In this patch, we change this behavior so that we raise an error
instead of returning a NULL. We also modify the format of the decimal
expr tests format to also include an error field.
Testing:
- Added several expr and end to end tests.
Change-Id: If7a7131e657fcdd293ade78d62f851dac0f1e3eb
Reviewed-on: http://gerrit.cloudera.org:8080/8344
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros()
are incorrect only in case of the last second of 1399, because these
sub-second values are rounded first towards 1400-01-01 00:00:00, which
is accepted as a valid date, and the sub-second part is subtracted
afterwards, leading to a date outside the valid interval. The maximum
case, 9999-12-31 59:59:59 is a bit different, because as I understand,
with nanosecond precision posix times, the maximum value is actually
10000-01-01. 00:00:00 minus 1 nanosec.
TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues
both <1400 and 10000<=.
These timestamps can cause problems, because most code assumes that if
HasDate/HasTime is true, then it really is a valid timestamp.
To fix this, the posix times are checked in the constructor of
TimestampValue, and if it is outside the valid interval,both time_
and date_ are set to not_a_date_time.
Test:
select cast(-17987443200-0.1 as timestamp);
This query no longer crashes, but returns NULL, similarly to other
< 1400 timestamps.
Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Reviewed-on: http://gerrit.cloudera.org:8080/7954
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds multiple query option validation testcases to
be/src/service/query-options-test.cc
The test cases include parsing edge cases, bondary values, special
cases for some options and some testcases moved from
testdata/workloads/functional-query/queries/QueryTest/set.test
This patch also fixes a bug generating wrong error message for
query option RUNTIME_FILTER_WAIT_TIME_MS.
Change-Id: I510e02bb0776673d8cbfc22b903831882c6908d7
Reviewed-on: http://gerrit.cloudera.org:8080/7805
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Parquet splits with multi columns are marked as completed by using
HdfsScanNodeBase::RangeComplete(). It duplicately counts the file types
as column codec types. Thus the number of parquet splits are the real count
multiplies number of materialized columns.
Furthermore, according to the Parquet definition, it allows mixed compression
codecs on different columns. This's handled in this patch as well. A parquet file
using gzip and snappy compression codec will be reported as:
FileFormats: PARQUET/(GZIP,SNAPPY):1
This patch introduces a compression types set for the above cases.
Testing:
Add end-to-end tests handling parquet files with all columns compressed in
snappy, and handling parquet files with multi compression codec.
Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Reviewed-on: http://gerrit.cloudera.org:8080/8147
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This is also a step towards IMPALA-2399 (remove QueryMaintenance()).
"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.
A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.
The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
MemPool::Clear()
More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
StringFunctions::Replace() do the doubling itself to avoid
the need for a special interface. Worst-case this doubles
the memory requirements for Replace() since n / 2 + n / 4
+ n / 8 + .... bytes of memory could be wasted instead of recycled
for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
to come directly from the output RowBatch's MemPool. This is
also potentially applicable to other places where we currently
copy out strings from local allocations, e.g.
AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
FunctionContext layer so that it retains the old behaviour even
though allocations do not all come from FreePools.
The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.
Testing:
* ran exhaustive and ASAN
Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Reviewed-on: http://gerrit.cloudera.org:8080/8025
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The query 'SET <option>=""' will now unset an option within the session,
reverting it to its default state.
This change became necessary when "SET" started returning an empty
string for unset options which don't have a default. The test
infrastructure (impala_test_suite.py) resets options to what it thinks
is its defaults, and, when this broke, some ASAN builds started to fail,
presumably due to a timing issue with how we re-use connections between
tests.
Previously, SessionState copied over the default options from the server
when the session was created and then mutated that. To support unsetting
options at the session layer, this change keeps a pointer to the default
server settings, keeps separately the mutations, and overlays the
options each time they're requested. Similarly, for configuration
overlays that happen per-query, the overlay is now done explicitly,
because empty per-query overlay values (key=..., value="") now have no effect.
Because "set key=''" is ambiguous between "set to the empty string" and
"unset", it's now impossible to set to the empty string, at the session
layer, an option that is configured at a previous layer. In practice,
this is just debug_action and request_pool. debug_action is essentially
an internal tool. For request_pool, this means that setting the default
request_pool via impalad command line is now a bad idea, as it can't
be cleared at a per-session level. For request_pool, the correct
course of action for users is to use placement rules, and to have a
default placement rule.
Testing:
* Added a simple test that triggered this side-effect without this code.
Specifically, "impala-python infra/python/env/bin/py.test tests/metadata/test_set.py -s"
with the modified set.test triggers.
* Amended tests/custom_cluster/test_admission_controller.py; it was
useful for testing these code paths.
* Added cases to query-options-test to check behavior for both
defaulted and non-defaulted values.
* Added a custom cluster test that checks that overlays are
working against
* Ran an ASAN build where this was triggering previously.
Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Reviewed-on: http://gerrit.cloudera.org:8080/8070
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Currently a database is not visible to a user that only has column
level privileges for tables in that database. This patch will make
the database visible, which is the expected behavior in this case.
Testing: added a test case to verify the same.
Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Reviewed-on: http://gerrit.cloudera.org:8080/8168
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
(Re-applies reverted commit 387bde0639.
The commit broke ASAN tests due to a race in how test infrastructure
re-uses connections. The fix for that is in an adjacent commit.)
When converting TQueryOptions to a map<string,string>, we now convert
unset options to the empty string. Within TQueryOptions we have optional
options (like mt_dop or compression_codec) with no default specified. In
this case, the user was seeing 0 for numeric types and the first enum
option for enumeration types (e.g., "NONE" in the compression case).
This was confusing as the implementation handles this "null" case
differently (e.g., using SNAPPY as the default codec in the case
reported in the JIRA).
When running "set" in impala-shell, the difference is as
follows:
- BUFFER_POOL_LIMIT: [0]
+ BUFFER_POOL_LIMIT: []
- COMPRESSION_CODEC: [NONE]
+ COMPRESSION_CODEC: []
- MT_DOP: [0]
+ MT_DOP: []
- RESERVATION_REQUEST_TIMEOUT: [0]
+ RESERVATION_REQUEST_TIMEOUT: []
- SEQ_COMPRESSION_MODE: [0]
+ SEQ_COMPRESSION_MODE: []
- V_CPU_CORES: [0]
+ V_CPU_CORES: []
Obviously, the empty string is a valid value for a string-typed option, where
it will be impossible to tell the difference between "unset" and "set to empty
string." Today, there are two string-typed options: debug_string defaults to ""
and request_pool has no default. An alternative would have been to use
a special token like "_unset" or to introduce a new field in the beeswax
Thrift ConfigVariable struct. I think the empty string approach
is clearest.
The other users of this state, which I believe are HiveServer2's OpenSession()
call and HiveServer2's response to a "SET" query are affected. They
benefit from the same fix, and a new test has been added to test_hs2.py.
I did a mild refactoring in the HS2 tests to write a helper method
for the very common pattern of excecuting a query.
Testing:
* Manual testing with impala-shell
* Modified impala-shell tests to check this explicitly for one case.
* Modified HS2 test to check this as well as the SET k=v statement,
which looked otherwise untested.
Change-Id: I29f5d8ab874cb1338077f16019a9537766cac0c4
Reviewed-on: http://gerrit.cloudera.org:8080/8096
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
test_catalogd_timeout sets a Kudu operation timeout of 1ms and then
performs various Kudu operations which it expects to fail due to a
timeout.
Since the test was written, things have sped up - for example, Impala
used to create a new Kudu client for each operation, but that was
changed in IMPALA-5167, such that the operations now occasionally
complete quickly enough that they don't timeout.
There's not really any way to rewrite this test to ensure that it
won't be flaky, so the patch removes it.
Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Reviewed-on: http://gerrit.cloudera.org:8080/8154
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
No tests were added/dropped or modified. They are consolidated into
fewer .test files.
Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Reviewed-on: http://gerrit.cloudera.org:8080/8153
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.
This patch removes the counters 'SpilledRuns' and 'MergesPerformed'
since they will always be 0, and it renames the 'IntialRunsCreated'
counter to 'RunsCreated' since the 'Initial' refers to the fact that
in a regular sort those runs may be spilled or merged.
It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.
Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
SortType: Partial
ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)
Testing:
- Manually ran several sorting queries and inspected their profiles
- Updated a kudu_insert test that relied on the 'SpilledRuns' counter
to be 0 for a partial sort.
Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Reviewed-on: http://gerrit.cloudera.org:8080/8123
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds a new expression to represent the following boolean predicate:
<expr> IS [NOT] (TRUE | FALSE | UNKNOWN)
The expression is expanded in the parser to istrue/false for the checks
against true and false respectively and to isnull for the check against
unknown. Compared to the other approaches (rewrites, extended backend expr),
this change is the simplest. Main downside is that error messages are
in terms of the lowered expression.
Testing:
- fe: parser, tosql, analyze exprs
- e2e: query exprs
Change-Id: I9d5fba65ef6c87dfc55a25d2c45246f74eb48c40
Reviewed-on: http://gerrit.cloudera.org:8080/8122
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The internal representation of the most negative number
in two's complement requires 1 more bit to represent the
positive version. This means ABS() must promote integer
types to the next highest width.
Change-Id: I86cc880e78258d5f90471bd8af4caeb4305eed77
Reviewed-on: http://gerrit.cloudera.org:8080/8004
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Avoid running the problematic query with short delays. This combination
doesn't add coverage - the short delay was meant to test behaviour when
multiple batches were sent, but there are deliberately no batches sent
with this query.
Testing:
Ran a build against Isilon, which succeeded. Ran the test in a loop
locally overnight.
Change-Id: Ia75c42be2de600344de7af5a917d7843880ea6de
Reviewed-on: http://gerrit.cloudera.org:8080/8111
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
If a scan range is skipped at runtime the scan node skips reading
the range and never figures out the underlying compression codec used
to compress the files. In such a scenario we default the compression
codec to NONE which can be misleading. This change marks these files
as filtered in the scan node profile
e.g. - File Formats: TEXT/NONE:364 TEXT/NONE(Skipped):1460
Change-Id: I797916505f62e568f4159e07099481b8ff571da2
Reviewed-on: http://gerrit.cloudera.org:8080/7245
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Due to re-use of connections in the test infrastructure, this commit
is causing ASAN failures in the builds. That is being worked out
as part of IMPALA-5908, but, in the meanwhile, it's prudent
to revert.
This reverts commit 387bde0639.
Change-Id: I41bc8ab0f1df45bbd311030981a7c18989c2edc8
Reviewed-on: http://gerrit.cloudera.org:8080/8087
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The error path where delivery of "eos" fails now behaves
the same as if delivery of a row batch fails.
Testing:
Added a timeout test where the leaf fragments return 0 rows. Before
the change this reproduced the hang.
Change-Id: Ib370ebe44e3bb34d3f0fb9f05aa6386eb91c8645
Reviewed-on: http://gerrit.cloudera.org:8080/8005
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch fixes a bug that fails a precondition check when generating
runtime filter plans. The lhs and rhs or join predicate might have
different types when the eq predicate function accepts wildcard-typed
parameters. In this case in existing code the types of source and target
expr will be found mismatch and an exception will be thrown when
generating runtime filters. This patch tries to cast target expr to be
of the same type as source expr. A testcase is added to joins.test
Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6
Reviewed-on: http://gerrit.cloudera.org:8080/7949
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
When converting TQueryOptions to a map<string,string>, we now convert
unset options to the empty string. Within TQueryOptions we have optional
options (like mt_dop or compression_codec) with no default specified. In
this case, the user was seeing 0 for numeric types and the first enum
option for enumeration types (e.g., "NONE" in the compression case).
This was confusing as the implementation handles this "null" case
differently (e.g., using SNAPPY as the default codec in the case
reported in the JIRA).
When running "set" in impala-shell, the difference is as
follows:
- BUFFER_POOL_LIMIT: [0]
+ BUFFER_POOL_LIMIT: []
- COMPRESSION_CODEC: [NONE]
+ COMPRESSION_CODEC: []
- MT_DOP: [0]
+ MT_DOP: []
- RESERVATION_REQUEST_TIMEOUT: [0]
+ RESERVATION_REQUEST_TIMEOUT: []
- SEQ_COMPRESSION_MODE: [0]
+ SEQ_COMPRESSION_MODE: []
- V_CPU_CORES: [0]
+ V_CPU_CORES: []
Obviously, the empty string is a valid value for a string-typed option, where
it will be impossible to tell the difference between "unset" and "set to empty
string." Today, there are two string-typed options: debug_string defaults to ""
and request_pool has no default. An alternative would have been to use
a special token like "_unset" or to introduce a new field in the beeswax
Thrift ConfigVariable struct. I think the empty string approach
is clearest.
The other users of this state, which I believe are HiveServer2's OpenSession()
call and HiveServer2's response to a "SET" query are affected. They
benefit from the same fix, and a new test has been added to test_hs2.py.
I did a mild refactoring in the HS2 tests to write a helper method
for the very common pattern of excecuting a query.
Testing:
* Manual testing with impala-shell
* Modified impala-shell tests to check this explicitly for one case.
* Modified HS2 test to check this as well as the SET k=v statement,
which looked otherwise untested.
Change-Id: I86bc06a58d67b099da911293202dae9e844c439b
Reviewed-on: http://gerrit.cloudera.org:8080/7886
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This patch removes most code introduced in IMPALA-1711, which triggers
an error message and causes catalog inconsistency when a partitioned
table is moved to a different database.
IMPALA-1711 is a workaround for HIVE-9720, in which column stats is not
properly updated when a table is moved across databases. Hive-9720 has
been resolved in Hive 1.2.0 so the workaround is no longer needed.
A test case moving a partitioned table to a different database is added
to alter-table.test.
Change-Id: I0ca7063ca1aa9faceed9568d22740d91b6dc20d3
Reviewed-on: http://gerrit.cloudera.org:8080/7857
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This fixes the parquet scanner to free local allocations in
runtime filter contexts for every batch.
Testing:
Added a regression test that runs out of memory before this fix.
Ran core and ASAN builds.
Change-Id: Iecdda6af12d5ca578f7d2cb393e9cb9f49438f09
Reviewed-on: http://gerrit.cloudera.org:8080/7931
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
KuduPartitionExpr takes input rows and sends them to Kudu to determine
the partition the rows correspond to in a particular table's
partitioning scheme. This is then used to partition and sort rows
before sending them to Kudu when performing an INSERT.
If the input types are not the same as (but are compatible with) the
types of the columns in the table, we need to cast the input rows.
KuduPartitionExpr.analyze() actually already does this, but the casts
are dropped for the sort step during the INSERT in most cases.
As a result, attempting to insert a string value into a Kudu timestamp
column causes a crash.
Inserting a numeric value into a different but compatibly typed col
(eg. tinyint into an int col) will cause the sort during a Kudu INSERT
to operate on garbage values, potentially degrading performance and
causing INSERTs to fail due to Kudu timeouts (see IMPALA-3742).
Testing:
- Added an e2e test in kudu_insert.test
Change-Id: I44cf31e46a77f3e7c92cf6b9112653808a001705
Reviewed-on: http://gerrit.cloudera.org:8080/7922
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The calculation in the planner failed to account for the behaviour of
Suballocator, which needs to obtain at least one buffer to allocate any
memory.
Testing:
Added a regression test that caused a crash before the fix.
Updated planner tests.
Was able to run local stress test binary search to completion (it
previously crashed).
Change-Id: I870fbe2f1da01c6123d3716a1198376f9a454c3b
Reviewed-on: http://gerrit.cloudera.org:8080/7871
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
For a series of partitioned joins within the same fragment we must
cast the sender partition exprs of exchanges to compatible types.
Otherwise, the hashes generated for identical partition values may
differ among senders leading to wrong results.
The bug was that this casting process was only performed for
fragments that are hash-partitioned. However, a union produces a
fragment with RANDOM partition, but the union could still contain
partitioned joins whose senders need to be cast appropriately. The
fix is to add casts regardless of the fragment's data partition.
Testing:
- Core/hdfs run passed
- Added a new regresion test
Change-Id: I0aa801bcad8c2324d848349c7967d949224404e0
Reviewed-on: http://gerrit.cloudera.org:8080/7884
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Add a targeted test that confirms that setting the query option will
force spilling.
Testing:
Ran test_spilling locally.
Change-Id: Ida6b55b2dee0779b1739af5d75943518ec40d6ce
Reviewed-on: http://gerrit.cloudera.org:8080/7809
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
The error message returned when a query is rejected due to
insufficient buffer memory is misleading. It recommended a
mem_limit which would be high enough, but changing the
mem_limit may result in changing the plan, which may result
in further changes to the buffer memory requirement.
In particular, this can happen when the planner compares the
expected hash table size to the mem_limit, and decides to
choose a partitioned join over a broadcast join.
While we might consider other code changes to improve this,
for now lets just be clear in the error message.
Testing:
* Adds tests that verify the expected behavior with the new
error message.
Change-Id: I3dc3517195508d86078a8a4b537ae7d2f52fbcb7
Reviewed-on: http://gerrit.cloudera.org:8080/7834
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change the preaggregation was frequently disabled when
running under some memory pressure, e.g. if the aggregation is at the
end of a pipeline of joins and those joins eat up all the memory.
This can result in huge performance degradation since all rows must then
be sent over the network.
This change always reserves 16 * (buffer size + 64kb) bytes per
preaggregation so that it is always able to build some hash tables and
reduce the input somewhat.
This has two parts:
* Changing the frontend reservation calculation
* Removing dead code in the backend that handled the case when the
initial partitions and hash tables could not be allocated.
Testing:
Passes exhaustive tests.
Change-Id: I2b544f9ec1a906719e2bbb074743926b95a3a573
Reviewed-on: http://gerrit.cloudera.org:8080/7739
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
* Test for disable_unsafe_spills
* Test for buffer size > I/O size (--read_size)
Change-Id: I03de00394bb6bbcf381250f816e22a4b987f1135
Reviewed-on: http://gerrit.cloudera.org:8080/7787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-4672: Part 2 regressed NAAJ by tightening up the spilling
invariants (e.g. can't unpin with spilling disabled) but we
didn't have tests for spilling NAAJs that could detect the
regression. This patch adds those tests, fixes the regressions,
and improves NAAJ by reliably spilling the probe side and not
trying to bring the whole probe side into memory.
The changes are:
* All null-aware streams start off in memory and are only unpinned if
spilling is enabled.
* The null-aware build partition can be spilled in the same way as hash
partitions.
* Probe streams are unpinned whenever there is memory pressure - if
spilling is enabled and either a build partition is spilled or
appending to the probe stream fails.
* Spilled probe streams are not re-pinned in EvaluateNullProbe().
Instead we just iterate over the rows of the stream.
Testing:
Add query tests where the three different buckets of rows are large
enough to spill: the build and probe of the null-aware partition and the
null probe rows.
Test both spilling and in-memory (with spilling disabled) cases.
Change-Id: Ie2e60eb4dd32bd287a31479a6232400df65964c1
Reviewed-on: http://gerrit.cloudera.org:8080/7367
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Fixed a bug where impala crashes during execution of an aggregation
query using nondeterministic grouping expressions. This happens when
it tries to rebuild a spilled partition that can fit in memory and rows
get re-hashed to a partition other than the spilled one due to the use
of nondeterministic expressions.
Testing:
Added a query test to verify successful execution.
Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23
Reviewed-on: http://gerrit.cloudera.org:8080/7714
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for a "max_row_size" query option that instructs Impala
to reserve enough memory to process rows of the specified size. For
spilling operators, the planner reserves enough memory to process
rows of this size. The advantage of this compared to simply
specifying larger values for min_spillable_buffer_size and
default_spillable_buffer_size is that operators may be able to
handler larger rows without increasing the size of all their
buffers.
The default value is 512KB. I picked that number because it doesn't
increase minimum reservations *too* much even with smaller buffers
like 64kb but should be large enough for almost all reasonable
workloads.
This is implemented in the aggs and joins using the variable page size
support added to BufferedTupleStream in an earlier commit. The synopsis
is that each stream requires reservation for one default-sized page
per read and write iterator, and temporarily requires reservation
for a max-sized page when reading or writing larger pages. The
max-sized write reservation is released immediately after the row
is appended and the max-size read reservation is released after
advancing to the next row.
The sorter and analytic simply use max-sized buffers for all pages
in the stream.
Testing:
Updated existing planner tests to reflect default max_row_size. Added
new planner tests to test the effect of the query option.
Added "set" test to check validation of query option.
Added end-to-end tests exercising spilling operators with large rows
with and without spilling induced by SET_DENY_RESERVATION_PROBABILITY.
Change-Id: Ic70f6dddbcef124bb4b329ffa2e42a74a1826570
Reviewed-on: http://gerrit.cloudera.org:8080/7629
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backend's
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
There was a bug in the BE evaluation logic of the
TupleIsNullPredicate which could lead to wrong results
for certain plan shapes.
A TupleIsNullPredicate should evaluate to true only if
all specified tuples are NULL. This was always the intent
of the FE and is also documented in the BE as the required
behavior.
Testing:
- Added regression test
- Core tests passed
Change-Id: Id659f849a68d88cfe22c65dd1747dd6d6a916163
Reviewed-on: http://gerrit.cloudera.org:8080/7737
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
The original test case accessed the 'pos' field of nested
collections. The query results could vary when reloading
the data because the order of items within a nested
collection is not necessarily the same accross loads.
This patch reformulates the test to avoid 'pos'.
Change-Id: I32e47f0845da8b27652faaceae834e025ecff42a
Reviewed-on: http://gerrit.cloudera.org:8080/7708
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
When an in-memory blocking aggregation or join is in the GetNext()
phase where it is outputting accumulated rows then we expect
memory consumption to monotonically decrease because no more
rows will be accumulated in memory.
This change adds support to release unused reservation and makes
use of it for in-memory aggregations and sorts.
We don't release memory for operators with spilled data, since they
may need the reservation to bring it back into memory. We also
don't release memory in subplans, since it will probably be used
in a later iteration of the subplan.
Testing:
Updated spilling test that now requires less memory.
Ran stress test binary search on tpch_parquet. No changes, except
Q18 now requires 325MB instead of 450MB to execute without spilling.
Ran query with two sorts in the same pipeline and watched /memz to
confirm that the first node in the pipeline was incrementally releasing
memory. Added a regression test based on this experiment.
Added a backend test to directly test reservation decreasing.
Change-Id: I6f4d0ad127d5fcd14b9821a7c127eec11d98692f
Reviewed-on: http://gerrit.cloudera.org:8080/7619
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change, the per-host reservation size was computed
by the Planner. However, scheduling happens after planning,
so the Planner must assume that all fragments run on all
hosts, and the reservation size is likely much larger than
it needs to be.
This moves the computation of the per-host reservation size
to the BE where it can be computed more precisely. This also
includes a number of plan/profile changes.
Change-Id: Idbcd1e9b1be14edc4017b4907e83f9d56059fbac
Reviewed-on: http://gerrit.cloudera.org:8080/7630
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This patch makes it possible to create unpartitioned, managed Kudu
tables from Impala, by making the 'PARTITION BY' clause of 'CREATE
TABLE... STORED AS KUDU' optional:
CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type
[kudu_column_attribute ...]
[COMMENT 'col_comment']
[, ...]
[PRIMARY KEY (col_name[, ...])]
)
[PARTITION BY kudu_partition_clause]
[COMMENT 'table_comment']
STORED AS KUDU
[TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
Kudu represents this as a table that is range partitioned on no
columns.
Because unpartitioned Kudu tables are inefficient for large data
sizes, and because the syntax doesn't make it explicit that the table
will be unpartitioned, there is a warning issued to encourage users
to created partitioned tables.
This patch also converts the tpch_kudu.nation and tpch_kudu.region
tables to be unpartitioned, as they are very small.
Testing:
- Updated analysis tests.
- Added e2e test that creates unpartitioned table and inserts into it.
Change-Id: I281f173dbec1484eb13434d53ea581a0f245358a
Reviewed-on: http://gerrit.cloudera.org:8080/7446
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.
* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic
Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.
Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.
Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoid wasting buffers)
and improve the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, it will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
A recent change, IMPALA-5016, added an expr rewrite rule to simplfy
coalesce(). This rule eliminates the coalesce() when its first
parameter (that isn't constant null) is a SlotRef pointing to a
SlotDescriptor that is non-nullable (for example because it is from
a non-nullable Kudu column or because it is from an HDFS partition
column with no null partitions), under the assumption that the SlotRef
could never have a null value.
This assumption is violated when the SlotRef is the output of an
outer join, leading to incorrect results being returned. The problem
is that the nullability of a SlotDescriptor (which determines whether
there is a null indicator bit in the tuple for that slot) is a
slightly different property than the nullability of a SlotRef pointing
to that SlotDescriptor (since the SlotRef can still be NULL if the
entire tuple is NULL).
This patch removes the portion of the rewrite rule that considers
the nullability of the SlotDescriptor. This means that we're missing
out on some optimizations opportunities and we should revisit this in
a way that works with outer joins (IMPALA-5753)
Testing:
- Updated FE tests.
- Added regression tests to exprs.test
Change-Id: I1ca6df949f9d416ab207016236dbcb5886295337
Reviewed-on: http://gerrit.cloudera.org:8080/7567
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This is intended to be merged at the same time as Part 2 but is
separated out to make the change more reviewable. Part 2 assumes
that it does not need special logic to handle this mode (e.g.
because the old aggs and joins don't use reservation).
Disable the --enable_partitioned_{aggregation,hash_join} options
and remove all product and test code associated with them.
Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/7102
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
In order to conform to the ISO SQL specifications, this patch allows
the 'order by' clause to be optional for first_value() and last_value()
analytical functions.
Testing:
1. Added Analyzer tests
2. Added Planner tests which checks that a sort node is not used if both
'order by' and 'partition by' clauses are not used. Also, checks that
if only 'partition by' clause is used then the sorting is done only on
the partition column.
3. Added a query test that checks that the input is not reordered by
these analytic functions if 'order by' clause is not used
Change-Id: I5a3a56833ac062839629353ea240b361bc727d96
Reviewed-on: http://gerrit.cloudera.org:8080/7502
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins