The bug:
Evaluating !empty() predicates at non-scan nodes interacts
poorly with our BE projection of collection slots. For example,
rows could incorrectly be filtered if a !empty() predicate is
assigned to a plan node that comes after the unnest of the
collection that also performs the projection.
The fix:
This patch reworks the generation of !empty() predicates
introduced in IMPALA-2663 for correctness purposes.
The predicates are generated in cases where we can ensure that
they will be assigned only by the parent scan, and no other
plan node.
The conditions are as follows:
- collection table ref is relative and non-correlated
- collection table ref represents the rhs of an inner/cross/semi join
- collection table ref's parent tuple is not outer joined
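The three conditions above combine as a single conjunction. The following is a minimal Python sketch of that check, not the planner's actual code; the `CollectionRef` fields and the function name are hypothetical stand-ins for the properties listed in the message.

```python
from dataclasses import dataclass

# Join operators for which the collection ref may be the rhs (from the list above).
ALLOWED_JOIN_OPS = {"inner", "cross", "semi"}


@dataclass
class CollectionRef:
    # Hypothetical fields that mirror the three conditions in the commit message.
    is_relative: bool
    is_correlated: bool
    join_op: str                      # join for which this ref is the rhs
    parent_tuple_outer_joined: bool


def can_generate_not_empty_predicate(ref: CollectionRef) -> bool:
    """True only if all three conditions hold."""
    return (ref.is_relative
            and not ref.is_correlated
            and ref.join_op in ALLOWED_JOIN_OPS
            and not ref.parent_tuple_outer_joined)
```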
Change-Id: Ie975ce139a103285c4e9f93c59ce1f1d2aa71767
Reviewed-on: http://gerrit.cloudera.org:8080/2399
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Internal Jenkins
Add a custom cluster test that tests for delays in registering data
stream receivers. We add a stress option to artificially delay this
registration to ensure that it can be handled correctly.
Change-Id: Id5f5746b6023c301bacfa305c525846cdde822c9
Reviewed-on: http://gerrit.cloudera.org:8080/2306
Tested-by: Internal Jenkins
Reviewed-by: Silvius Rus <srus@cloudera.com>
Fix a bug in which Impala only reads the first stream
of a multi-stream bz2/gzip file.
Changes the bz2 decoder to read the file in a streaming
fashion rather than reading the entire file into memory
before it can be decompressed.
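To illustrate the multi-stream behavior, here is a small sketch using Python's standard `bz2` module rather than Impala's C++ decoder. It feeds the input in chunks and starts a new decompressor whenever one stream ends, which is the general technique; the function name and chunk size are made up for the example.

```python
import bz2


def decompress_multistream(chunks):
    """Yield decompressed bytes from an iterable of compressed chunks,
    continuing past the end of each bz2 stream instead of stopping there."""
    decomp = bz2.BZ2Decompressor()
    for chunk in chunks:
        data = chunk
        while data:
            out = decomp.decompress(data)
            if out:
                yield out
            if decomp.eof:
                # End of one stream: leftover bytes belong to the next stream.
                data = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
            else:
                # All input consumed; wait for the next chunk.
                data = b""
```

A single `BZ2Decompressor` stops at the end of the first stream, which corresponds to the symptom described above.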
Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8
(cherry picked from commit b6d0b4e059329633dc50f1f73ebe35b7ac317a8e)
Reviewed-on: http://gerrit.cloudera.org:8080/2219
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
We need to skip queries that select from tables with nested types when
running with the old aggs and joins. To achieve this, move the failing
test to a separate test and use the skip decorator.
Change-Id: Iaf1351c711b524be66a99084657926909425cbff
Reviewed-on: http://gerrit.cloudera.org:8080/2272
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
After this patch structs can be parsed/created with field names
that are regular identifiers or keywords, even if unquoted.
This fix is needed for parsing type strings stored in the
Hive Metastore which could contain unquoted identifiers that
correspond to Impala keywords.
The parser changes required an upgrade of Cup and its Maven plugin.
In the old version, the generated parser would not compile because
of a giant method that exceeded the JVM maximum allowed size for a
single method.
Change-Id: Ic989c7afd034216f6db4c8f9f3901c025cceb524
Reviewed-on: http://gerrit.cloudera.org:8080/2249
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit adds a new feature to persist hive/java udfs across
catalog restarts. IMPALA-1748 already added this for non-java
udfs by storing them in parameters map of the Db object and
reading them back at catalog startup. However we follow a
different approach for hive udfs by converting them to Hive's
function format and adding them as hive functions to the metastore.
This makes it possible to share udfs between hive and Impala as the
udfs added from one service are accessible to the other. This commit
takes care of format conversions between hive and impala, so the user
only needs to add a function once in either of the services.
Background: Hive and impala treat udfs differently. Hive resolves the
evaluate function in the udf class at runtime depending on the data
types of the input arguments. So user can add one function by name and
can pass any arguments to it as long as there is a compatible evaluate
function in the udf class. However Impala takes the input types of the
udf as a part of function definition (that maps to only one evaluate
function) and loads the function only for those set of input argument
types. If we have multiple 'evaluate' methods, we need to add multiple
functions one for each of them.
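The difference can be sketched as follows: one Hive function name fans out into one Impala signature per 'evaluate' overload. This is only an illustration in Python; the input format (a list of argument-type and return-type pairs) and the function name are hypothetical, not how the catalog represents UDFs.

```python
def impala_signatures(fn_name, evaluate_overloads):
    """Return one Impala-style signature string per 'evaluate' overload.

    evaluate_overloads is a hypothetical input: a list of
    (argument_types, return_type) pairs, one per 'evaluate' method.
    """
    return [
        "{}({}) -> {}".format(fn_name, ", ".join(arg_types), return_type)
        for arg_types, return_type in evaluate_overloads
    ]
```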
This commit adds new variants of CREATE | DROP FUNCTION to Impala which
let the user create and drop hive/java udfs without input argument
types or return types. Catalog takes care of loading/dropping the udf
signatures corresponding to each "evaluate" method in the udf symbol
class. The syntax is as follows,
CREATE FUNCTION [IF NOT EXISTS] <function name> <function_opts>
DROP FUNCTION [IF EXISTS] <function name>
Examples:
CREATE FUNCTION IF NOT EXISTS foo location '/path/to/jar' SYMBOL='TestUdf';
CREATE FUNCTION bar location '/path/to/jar' SYMBOL='TestUdf2';
DROP FUNCTION foo;
DROP FUNCTION IF EXISTS bar;
The older way of creating hive/java udfs with specific signature is still supported,
however they are *not* persisted across restarts. So a restart of catalog can
wipe them out. Additionally this commit also loads all the compatible java udfs
added outside of Impala and they needn't be separately loaded. One thing
to note here is that the functions added using the new CREATE FUNCTION
can only be dropped using the new DROP FUNCTION syntax (without
signature). The same rule applies for the java udfs added using the old
CREATE FUNCTION syntax (with signature).
Change-Id: If31ed3d5ac4192e3bc2d57610a9a0bbe1f62b42d
Reviewed-on: http://gerrit.cloudera.org:8080/2250
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
We do not have exceptions enabled for codegen'd code, so exceptions
thrown by functions called by codegen'd functions cannot be caught by
the codegen'd functions. TimestampValue::UnixTimeToPtime() has a
try/catch around boost::posix_time::ptime_from_tm(), but since it was
inlined into the TimestampFunctions::FromUnix() IR the try/catch
didn't work. This patch moves the UnixTimeToPtime() implementation to
the .cc file so it doesn't get included in the IR. It does the same
for TimestampParser::Parse() in case it gets inlined into IR code as
well.
Change-Id: Ic0af73629e1e3b6bf18cbf5d832973712b068527
Reviewed-on: http://gerrit.cloudera.org:8080/2210
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Hive allows udfs with primitive data types as return values (along
with Writables) and input arguments. This commit adds this support
to Impala.
Change-Id: I2ec24eab5a824772a8618d7fb97ae5c7ea2a0e39
Reviewed-on: http://gerrit.cloudera.org:8080/2207
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Test files in testdata/workloads/functional-query/queries/QueryTest
are parsed by test_file_parser.py, which used to ignore everything
before the first ==== line as a file header. This change fixes all
affected files.
This change also modifies the test file parser to forbid headers
starting with what looks like a subsection title ('----'), which
should prevent the reintroduction of similar errors in the future.
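A minimal sketch of the header check, assuming a simplified file layout. This is not the code in test_file_parser.py; the function name, the handling of files without a '====' line, and the error type are assumptions made for the example.

```python
def split_header(text):
    """Split test file text into (header, body) at the first '====' line.

    Raises ValueError if the header starts with what looks like a
    subsection title ('----'), since that usually means a test case
    was silently swallowed as a header.
    """
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.strip() == "====":
            header, body = lines[:i], lines[i + 1:]
            break
    else:
        # Assumption: with no '====' line there is no header at all.
        header, body = [], lines
    header_text = "\n".join(header)
    if header_text.startswith("----"):
        raise ValueError("Header must not start with a subsection title")
    return header_text, "\n".join(body)
```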
Change-Id: Iaa1bc5ffd02782e24289c7843dcb35401c334519
Reviewed-on: http://gerrit.cloudera.org:8080/2220
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
AllocBuffer() and CopyStringVal() are two helper functions used by
various UDAs to allocate buffers for StringVal during their Init()
and Update() functions. Previously, these functions assumed that
the buffer length is always greater than 0. That turned out to be
an invalid assumption. This change removes this assumption and
handles zero-length StringVal by initializing its 'ptr' to NULL and
'len' to 0. A new test is also added to exercise this case.
Change-Id: Ia1e4140376c65ca3c734c40ecc3cce15b8bf2d3f
Reviewed-on: http://gerrit.cloudera.org:8080/2211
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This test query is supposed to check the error path for when a
collection buffer cannot be allocated. However, it's flaky because the
collection allocations are not very big (< 2KB), so it's possible for
a different operator to trigger OOM.
I think the correct solution is to create a test file that contains
very large collections, so a large collection allocation will trigger
OOM, rather than many small collection allocations. For now though,
let's disable the specific collection allocation check to unblock the
build, even though we risk losing coverage.
Change-Id: Iab4c9b605186926c522cf692246a37882fbdfcdb
Reviewed-on: http://gerrit.cloudera.org:8080/2208
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
A regression test for IMPALA-2265, IMPALA-2559 expected a
query to fail with an OOM but the mem limit is now too high.
This reduces the mem limit of the test case to be as low as
it can be without failing to set up the operators.
Change-Id: I056c3ad4067e5466e3690c3b4d597b9815a7a234
Reviewed-on: http://gerrit.cloudera.org:8080/2186
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 45ba3109e752dfdeefdf5627a5d57079f73b24c9)
The test was recently reenabled in commit
71a0a7d998702781ae44270f8c742b10c34c0efc.
Continue running the test but loosen the memory limit and don't check
the runtime profile. The memory limits for this set of tests need
revisiting in any case.
Change-Id: I195e8ad3b67c8ff85d5d15c2646a13f5feb57553
Reviewed-on: http://gerrit.cloudera.org:8080/2183
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 51632f39a45ba9deac9b86bbdb14ff10cbee35ac)
A couple of tests were disabled because of IMPALA-1305. Now that the fix
is in, those tests can be reenabled. I ran them in a loop to make sure
that they weren't flaky.
Also fix the spelling mistake in the file name.
Change-Id: I1bfcc619911a92d93b871be3a14852aa11f78da9
Reviewed-on: http://gerrit.cloudera.org:8080/2150
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch implements codegen support for aggregations with decimal
input and intermediate type. For the following benchmark query:
SELECT l_discount, count(*) AS cnt
FROM biglineitem
GROUP BY l_discount
HAVING cnt > 9999999999999
Query time went from 8.85s to 3.74s (2.4x faster).
Change-Id: I25934fcd6324e5bf1fa6859496107bf2ec68b8d3
Reviewed-on: http://gerrit.cloudera.org:8080/2050
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
The bug: In SingleNodePlanner.createInlineViewPlan() we need to wrap some
exprs with TupleIsNullPredicates to preserve correctness if the inline view
is outer joined. The bug was that we used to perform this wrapping on
the rhs of the inline view's smap, and not the final output smap after
those rhs exprs have been resolved against the physical output of the
inline view's plan root. As a result, the TupleIsNullWrapping did not
work correctly for deeply nested inline views with exprs that require
wrapping at various nesting levels.
The fix: Resolve the exprs against the physical output of the inline view's
plan root before performing the TupleIsNullPredicate wrapping.
Change-Id: I183bba6a36bf5e19a88687ed8c82977ae769ddf4
Reviewed-on: http://gerrit.cloudera.org:8080/2092
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
test_alloc_fail_update() aims to stress memory allocation
failure in the Update(), Serialize() and/or Finalize() functions
of UDAs. However, this test included some UDFs which allocated
memory in their Init() functions and not during their Update()
functions. This change removes those UDFs from the test.
Change-Id: I1ecc7e838e34ebc9ea3c878fee8ea2497b5fa23e
Reviewed-on: http://gerrit.cloudera.org:8080/2005
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
IMPALA-2862 fixed parsing for regular expressions in the result
verifier. This change fixes a test that had a broken regular expression,
which was not caught by the exhaustive test suite.
I searched for tests with a similar issue but couldn't find any:
git grep "regex:[^,]\+'"
Change-Id: I3aaca6bdfdc1eaab715929aa5fc6b64e6c969656
Reviewed-on: http://gerrit.cloudera.org:8080/2089
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
When the query option OPTIMIZE_PARTITION_KEY_SCANS is true, we may
acquire the partition key values from the metadata and generate a
union node containing constant expressions only. There is a bug in
the planner when generating the union node as it skips evaluating
the constant expressions for unmaterialized slots but union node
expects an entry in the constant expression lists for each slot
in the tuple descriptor even if the slot is not materialized.
This change fixes the problem by inserting dummy null values
in the constant expression list for unmaterialized slots and lets
the union node filter them out. A test is also added to verify
the fix.
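The invariant can be sketched as follows: the constant expression list has one entry per slot, with a placeholder for unmaterialized slots that the union node later drops. This is an illustrative Python model with hypothetical names and data shapes, not the planner or backend code.

```python
def build_const_expr_list(slots, const_values):
    """slots: list of (name, is_materialized) pairs, in tuple descriptor order.
    const_values: dict from materialized slot name to its constant value.
    Returns one entry per slot; unmaterialized slots get a dummy None."""
    return [const_values[name] if materialized else None
            for name, materialized in slots]


def union_node_output(slots, const_exprs):
    """Model of the union node: expects one entry per slot and keeps
    only the entries that belong to materialized slots."""
    if len(const_exprs) != len(slots):
        raise ValueError("expected one const expr per slot")
    return [expr for (_, materialized), expr in zip(slots, const_exprs)
            if materialized]
```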
Change-Id: I9ed49dca0101b96bd9b20e6d1e5b1d56f654e911
Reviewed-on: http://gerrit.cloudera.org:8080/2067
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug was that we were not updating child_row_idx_
when the output batch was at capacity, leading us to
double count that last child_row_idx_, and incorrectly
returning extra rows.
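The class of bug can be modeled with a small loop. The message does not show the affected loop, so the sketch below is a generic Python model with hypothetical names: the index that persists across calls must be advanced before the capacity check ends the call, otherwise the last row of a full batch is emitted again on the next call.

```python
class RowEmitter:
    """Generic model of an operator that copies child rows into
    capacity-limited output batches across repeated get_next() calls."""

    def __init__(self, child_rows, capacity):
        self.child_rows = child_rows
        self.capacity = capacity
        self.child_row_idx = 0  # persists across calls, like child_row_idx_

    def get_next(self):
        out = []
        while self.child_row_idx < len(self.child_rows):
            out.append(self.child_rows[self.child_row_idx])
            # Advance before the capacity check so that the row just
            # emitted is not emitted again by the next call.
            self.child_row_idx += 1
            if len(out) == self.capacity:
                break
        return out
```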
Change-Id: I85b2f1c146861ec7756887b0d2c574365d90233e
Reviewed-on: http://gerrit.cloudera.org:8080/2044
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug: In order to preserve tuple nullability information through analytic sorts
we materialize the relevant expressions that contain TupleIsNullPredicates, along
with appropriate changes to the analytic sort's output smap. However, in some
edge cases, we incorrectly materialized an expr with a TupleIsNullPredicate
that could not be evaluated at that sort node because the tuple ids referenced
by the expr were not produced by the sort's input. For example, this scenario
was possible when a constant expr was wrapped in a TupleIsNullPredicate, and our
isBoundByTupleIds() check failed to filter out the expr from materialization
at the analytic sort.
The fix: Our existing code in the AnalyticPlanner already does the right thing.
We were simply missing the implementation of TupleIsNullPredicate.isBoundByTupleIds().
Change-Id: I72774f698545220922dd8ffbfa514aa87d26f97d
Reviewed-on: http://gerrit.cloudera.org:8080/2008
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where wrong results are returned if an EXISTS subquery
contains a HAVING clause and non-equality correlated binary predicates. This case does
not have a valid rewrite as the HAVING clause needs to be applied after the correlated
predicates have been evaluated. With this fix, we detect cases like this and throw an
AnalysisException.
Change-Id: I159f956e2b01f408601829b5d2afcf11d76bedcd
Reviewed-on: http://gerrit.cloudera.org:8080/1927
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Test results can be verified using regular expressions. The extraction
of the regular expression substring from the expected test results had a
bug where only the first character of an expression was considered. This
led to wrong but undetected test results.
Change-Id: Ia670da6e0758455a86dc44744b96b9465d890af3
Reviewed-on: http://gerrit.cloudera.org:8080/1818
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
This commit improves the performance of DDL statements on tables with
a large number of partitions. Previously, the catalog would force-reload
the entire table metadata during the execution of DDL and insert
statements, causing significant delays for tables with a large number of
partitions. With this commit the catalog reuses any cached table
entries to partially reload table metadata for only those partitions
that have been modified. With this change we've improved the performance
of some DDL and insert statements by at least 4-5X.
This commit also adds basic table-level locking to protect table
metadata from concurrent DDL operations.
Preliminary performance measurements
------------------------------------
Workload: insert into table partition () select ... limit 10
Iterations: 10
Num partitions   OLD (avg time sec)   NEW (avg time sec)
1K               1.15                 0.45
5K               3.65                 0.9
10K              5.75                 1.38
15K              10.1                 2.02
30K              25.4                 4.46
Workload: alter table partition() set location...
Iterations: 10
Num partitions   OLD (avg time sec)   NEW (avg time sec)
1K               0.8                  0.47
5K               4.3                  0.71
10K              7.1                  1.2
15K              13.2                 1.8
30K              26.8                 3.4
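For reference, the per-row speedups implied by these measurements can be computed as OLD / NEW. The snippet below is a small Python sketch that only copies the numbers listed above; the variable and function names are made up.

```python
# (partitions, old avg sec, new avg sec), copied from the measurements above.
INSERT_TIMES = [("1K", 1.15, 0.45), ("5K", 3.65, 0.9), ("10K", 5.75, 1.38),
                ("15K", 10.1, 2.02), ("30K", 25.4, 4.46)]
ALTER_TIMES = [("1K", 0.8, 0.47), ("5K", 4.3, 0.71), ("10K", 7.1, 1.2),
               ("15K", 13.2, 1.8), ("30K", 26.8, 3.4)]


def speedups(rows):
    """Map each partition count to its OLD / NEW time ratio."""
    return {label: old / new for label, old, new in rows}


if __name__ == "__main__":
    for name, rows in (("insert", INSERT_TIMES), ("alter", ALTER_TIMES)):
        for label, ratio in speedups(rows).items():
            print("{:6s} {:4s} {:.2f}x".format(name, label, ratio))
```

In both workloads the 30K-partition row has the largest ratio.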
Change-Id: I4da7fb6df0a71162b0cb60e6025a4019cb9572bf
Reviewed-on: http://gerrit.cloudera.org:8080/1706
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
We cannot run certain nested types queries with the legacy joins/aggs,
so to fix a build I just moved a recently added test into a different
.test file that already does not run with legacy joins/aggs.
Change-Id: I0ec0e61535ad01333129bd49beca4aa481f04d74
Reviewed-on: http://gerrit.cloudera.org:8080/1918
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes a DDL test that was failing because some newly added test cases were
using a database that had been dropped by another test case. The temporary fix is to use
fully qualified names for the specified tables.
Change-Id: I3bb022e2497283faeb84c85f922cda95beca2a32
Reviewed-on: http://gerrit.cloudera.org:8080/1909
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
MathFunctions::RandPrepare() allocates a 4-byte seed and
stores it in the FunctionContext's thread local state.
However, it was never freed. This change fixes the problem
by adding a close function for Rand() so it has a chance to
free the seed. A new test is also added to verify the fix.
Change-Id: Ibcc2e1ca0d052b86defe80aad471f9fdaac5a453
Reviewed-on: http://gerrit.cloudera.org:8080/1855
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
A failed test case inside a test file will leave the rest of
the test cases in the file unexecuted. Some test cases may
modify some query options such as memory limit and then
restore them in the subsequent test cases in the same file.
The failure of those test cases will leave the query options
modified, causing cascading failures to other test cases
which aren't expected to be run with the modified query
options (e.g. lowered memory limit). This problem may lead
to broken builds which are recorded in IMPALA-2724 and
IMPALA-2824.
This change fixes the problem above by checking if a test
case modifies any query option and, if so, restoring those
modified query options to their default values. This change
makes the assumption that a test should not modify an option
specified in its test vector so it's safe to restore the
modified query options to their default values.
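The restore step can be sketched like this. It is a simplified Python model, not the test framework's code: the dict-based session options, the test case layout with "set" and "run" keys, and the function name are all assumptions for illustration.

```python
def run_test_case(session_options, defaults, test_case):
    """Run one test case and always restore the options it modified.

    session_options: dict of current query options (mutated in place).
    defaults: dict of default values for every query option.
    test_case: hypothetical dict with an optional 'set' dict of option
               overrides and a 'run' callable taking the options.
    """
    modified = test_case.get("set", {})
    session_options.update(modified)
    try:
        test_case["run"](session_options)
    finally:
        # Restore only what this test case changed, even if it failed,
        # so later test cases do not inherit e.g. a lowered memory limit.
        for name in modified:
            session_options[name] = defaults[name]
```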
Change-Id: Ib88d1dcb6a65183e1afc8eef0c764179a9f6a8ce
Reviewed-on: http://gerrit.cloudera.org:8080/1774
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
When multiplying double and decimal values, we used to cast all doubles
to decimals before doing the multiplication. Due to the precision of two
decimals being added during multiplication, the effective value range of
the resulting decimal type could become very small and overflows could
happen.
This change switches the behavior to cast to double precision types when
at least one of the input operands is of type float or double. In such
cases we will not have exact results in general and we assume the user
would normally not expect exact results from an inherently inexact
datatype.
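A worked example of why the value range shrinks. The sketch below assumes that the result precision and scale of a decimal multiplication are the sums of the operand precisions and scales, each capped at the maximum decimal precision of 38; the capping rule and the operand types used here are assumptions for illustration, not taken from the patch.

```python
MAX_PRECISION = 38  # maximum DECIMAL precision


def multiply_result_type(p1, s1, p2, s2):
    """Result (precision, scale) of DECIMAL(p1,s1) * DECIMAL(p2,s2),
    assuming both sums are simply capped at MAX_PRECISION."""
    return min(p1 + p2, MAX_PRECISION), min(s1 + s2, MAX_PRECISION)


def integer_digits(precision, scale):
    """Digits available to the left of the decimal point."""
    return precision - scale
```

Under these assumptions, multiplying two hypothetical DECIMAL(38,17) operands yields DECIMAL(38,34), which leaves only 4 digits before the decimal point, so even moderately large products overflow.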
Change-Id: Idd28c5471506c68a860beb0778d98c8d25825f9f
Reviewed-on: http://gerrit.cloudera.org:8080/1820
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
SHOW CREATE TABLE already outputs information for views. As a
convenience, this patch adds SHOW CREATE VIEW as an alias for SHOW
CREATE TABLE.
Switch some SHOW CREATE TABLE tests to use SHOW CREATE VIEW and add
additional test for SHOW CREATE VIEW on a table so that expected
behaviour is tested.
Change-Id: I9925e0789573e9b097a2ef52b5023964dcf8f32c
Reviewed-on: http://gerrit.cloudera.org:8080/1661
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change implements support for PARTITIONED BY clauses in CTAS
statements. The syntax and semantics follow the PARTITION feature of
insert from select statements: inside the PARTITIONED BY (...) column
list the user must specify names of the columns to partition by. These
column names must appear in that particular order at the end of the
select statement. A remapping between columns of the source and
destination tables is not possible, because the destination table does
not yet exist. Specifying static values for the partition columns is
also not possible, as their type needs to be deduced from columns in the
select statement. Example:
CREATE TABLE t (a DOUBLE, b INT);
INSERT INTO t VALUES (1.5, 3);
CREATE TABLE p PARTITIONED BY (b) AS SELECT a, b FROM t;
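The ordering rule can be sketched as a simple check: the partition column names must match the tail of the select list, in order. This is an illustrative Python model with a hypothetical function name, not the analyzer's code.

```python
def check_ctas_partition_columns(select_columns, partition_columns):
    """Raise ValueError unless partition_columns are exactly the last
    columns of select_columns, in the same order."""
    start = len(select_columns) - len(partition_columns)
    if start < 0 or list(select_columns[start:]) != list(partition_columns):
        raise ValueError(
            "Partition columns {} must be the last columns of the select "
            "list {} in that order".format(list(partition_columns),
                                           list(select_columns)))
```

For the example above, `check_ctas_partition_columns(["a", "b"], ["b"])` passes, while a select list of `["b", "a"]` would be rejected.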
This change also contains a fix for setting the PYTHONPATH environment
variable correctly, so you can run single python tests from the command
line.
Change-Id: I5f61854d36d1ee30cfcd1c6b2b3eb971f6cf4b2f
Reviewed-on: http://gerrit.cloudera.org:8080/1740
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Impala could crash or return wrong results if it used the codegen'd
Avro decoding function to scan an Avro file whose schema differs
from the table schema. With the AVRO-1617 fix, we make sure
Impala doesn't use codegen if the table schema has fewer columns
than the file schema.
Change-Id: I268419e421404ad6b084482dee417634f17ecf60
Reviewed-on: http://gerrit.cloudera.org:8080/1696
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
Enforces that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; this is left as future
work.
Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
input_stream_ is now set to NULL on eos, but execution can
continue through the function to l753 which touches
input_stream_ without checking for NULL. This occurs
occasionally when the transfer of memory from the mem pool
to the output row batch occurs at eos.
Change-Id: I8ca88ef10d48e19cfde7f3c6de9512eefcae561e
Reviewed-on: http://gerrit.cloudera.org:8080/1757
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Fixes tests in show.test that execute 'show files' on the
'insert_string_partitioned' table in our functional db. The expected
output relied on modifications to 'insert_string_partitioned' made
in test_insert.py. Tests should not rely on the overall ordering
of test execution.
This patch also fixes 'show files' to produce a consistently ordered
output.
Change-Id: Ic736b94b70677b0e3f4f8a9838ffdfdde2ba17ab
Reviewed-on: http://gerrit.cloudera.org:8080/1748
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The original purpose of the escapechartesttable was to test
Impala's behavior on text tables that have the same character
as line terminator and escape character. Recent changes in
Hive have made creating such a table impossible because
1) Only newline is allowed as the line terminator
2) Newline is forbidden as the escape character
See HIVE-11785 for details on the Hive changes.
This commit removes escapechartesttable and all associated
tests, but does not add the same enforcement rules as Hive.
These enforcement rules should be added in a follow-on change.
Change-Id: I2bd9755f4c2cc3d7dfd8d67c3759885951550f08
Reviewed-on: http://gerrit.cloudera.org:8080/1690
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
This commit fixes a bug where catalog incorrectly converts all sentry
authorizables to lower case. Since hdfs URIs are case sensitive,
this bug can result in incorrect grants when the URIs have uppercase
letters.
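A minimal sketch of the intended behavior, assuming that non-URI authorizables are still lower-cased while URIs keep their case. The function name and the string used to tag the authorizable kind are hypothetical.

```python
def normalize_authorizable(kind, name):
    """Lower-case the authorizable name unless it is a URI, because
    HDFS URIs are case sensitive (assumption: everything else is not)."""
    return name if kind == "URI" else name.lower()
```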
Change-Id: I642c34d9046729dc904cc45871c7d7959ae828bc
Reviewed-on: http://gerrit.cloudera.org:8080/1675
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
FunctionContext::Allocate(), FunctionContextImpl::AllocateLocal()
and FunctionContext::Reallocate() allocate memory without taking
memory limits into account. The problem is that these functions
invoke FreePool::Allocate() which may call MemPool::Allocate()
that doesn't check against the memory limits. This patch fixes
the problem by making these FunctionContext functions check for
memory limits and set an error in the FunctionContext object if
memory limits are exceeded.
An alternative would be for these functions to call
MemPool::TryAllocate() instead and return NULL if memory limits
are exceeded. However, this may break some existing external
UDAs which don't check for allocation failures, leading to
unexpected crashes of Impala. Therefore, we stick with this
ad hoc approach until the UDF/UDA interfaces are updated in
the future releases.
Callers of these FunctionContext functions are also updated to
handle potential failed allocations instead of operating on
NULL pointers. The query status will be polled at various
locations and terminate the query.
This patch also fixes MemPool to handle the case in which malloc
may return NULL. It propagates the failure to the callers instead
of continuing to run with NULL pointers. In addition, errors during
aggregate functions' initialization are now properly propagated.
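The chosen approach can be modeled roughly as follows. This Python sketch assumes, as the message implies but does not state, that an allocation over the limit still returns a buffer and only records an error that the caller polls later; the class, its fields, and the error text are hypothetical and do not mirror the C++ interfaces.

```python
class FunctionContextModel:
    """Toy model of a context whose allocations are checked against a limit."""

    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.used = 0
        self.error = None

    def allocate(self, nbytes):
        self.used += nbytes
        if self.used > self.mem_limit and self.error is None:
            # Record the failure instead of returning None, so that callers
            # that never check for failed allocations do not crash.
            self.error = "memory limit exceeded"
        return bytearray(nbytes)


def process_rows(ctx, row_sizes):
    """Caller that polls the context's error state after each allocation
    and stops early once an error is set."""
    processed = 0
    for size in row_sizes:
        ctx.allocate(size)
        if ctx.error is not None:
            break
        processed += 1
    return processed
```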
Change-Id: Icefda795cd685e5d0d8a518cbadd37f02ea5e733
Reviewed-on: http://gerrit.cloudera.org:8080/1445
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
SHOW CREATE TABLE now supports views. It returns a CREATE VIEW statement
with column names and the original sql statement.
Authorization allows SHOW CREATE TABLE to be run on a view if the user has
VIEW_METADATA privilege on the view and SELECT privilege on all
underlying views and tables.
E.g. "SHOW CREATE TABLE some_view" returns output of form:
CREATE VIEW a_database.some_view (id, bool_col, tinyint_col) AS
SELECT id, bool_col, tinyint_col FROM functional.alltypes
Change-Id: Id633af2f5c1f5b0e01c13ed85c4bf9c045dc0666
Reviewed-on: http://gerrit.cloudera.org:8080/713
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
As part of this change, refactor catalog and frontend functions to return
TDatabase/Db objects instead of just the string names of databases -
this required a lot of method/variable renamings.
Add test for creating database with comment. Modify existing tests
that assumed only a single column in SHOW DATABASES results.
Change-Id: I400e99b0aa60df24e7f051040074e2ab184163bf
Reviewed-on: http://gerrit.cloudera.org:8080/620
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
When building hash tables for the build side in partitioned
hash join or aggregation, we will evaluate the build or probe
side expressions to compute the hash values for each TupleRow.
Evaluation of certain expressions (e.g. CastToChar) requires
"local" memory allocation. "Local" memory allocation is supposed
to be freed after processing each row batch.
However, the calls to free local allocations are missing in
PartitionedHashJoinNode::BuildHashTableInternal() and
PartitionedAggregationNode::ProcessStream(). This causes all
"local" memory allocation to accumulate potentially for the
entire duration of the query or until GetNext() is called.
This may lead to unnecessary memory allocation failure as
memory limit is exceeded.
This patch calls ExecNode::FreeLocalAllocations() at least once
per row-batch when building hash tables. It also adds the missing
checks for the query status in the loop building hash tables.
Please note that QueryMaintenance() isn't called due to its
overhead in memory limit checks.
Change-Id: Idbeab043a45b0aaf6b6a8c560882bd1474a1216d
Reviewed-on: http://gerrit.cloudera.org:8080/1448
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This patch reduces memory usage of scanners by adjusting how batch
capacity is checked and handled and by freeing unneeded memory.
Change RowBatch::AtCapacity(MemPool) so that batches with no rows
cannot hold onto an unbounded amount of memory. Instead, these
batches are passed up the operator tree so that the resources
can be freed.
The Parquet scanner also only checked capacity every 1024 rows.
With large rows (e.g. nested collections), it can overrun the
intended 8mb limit. It also didn't include the MemPool usage
in its checks. After the change the scanner will produce smaller
batches if rows contain large nested collections or strings.
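The capacity rule described above can be sketched as a check on both row count and memory. This is a Python model under stated assumptions: the way tuple data and MemPool usage are combined, the constant name, and the function signature are hypothetical; only the 8MB figure and the "zero rows can still be at capacity" behavior come from the message.

```python
AT_CAPACITY_MEM_USAGE = 8 * 1024 * 1024  # the intended 8MB limit


def at_capacity(num_rows, row_capacity, tuple_data_bytes, mem_pool_bytes):
    """A batch is at capacity if it is full of rows or if the memory
    attached to it reaches the limit, even when it holds no rows."""
    if num_rows >= row_capacity:
        return True
    return tuple_data_bytes + mem_pool_bytes >= AT_CAPACITY_MEM_USAGE
```

With this rule a scanner that filters out every row still hands the batch upward once enough memory has accumulated, rather than holding it until the next 1024-row check.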
I benchmarked this with a scan of the nested TPC-H customers
tables. The row batch size decreased from ~16MB to ~8MB. If the
nested collections were larger this would be more drastic.
Also pass at capacity up the tree if no rows passed the conjuncts in
the DataSourceScanNode and Parquet scanner so that resources can be
freed.
HdfsTableSink is modified to avoid the incorrect assumption that a batch
only has 0 rows at eos. It is also refactored to pass a related flag as
an argument to make the semantics clearer.
Two simple benchmarks (one column and many columns) show no change
in scanner performance:
> set num_scanner_threads=1;
> select count(l_orderkey) from biglineitem;
> select count(l_orderkey), count(l_partkey), count(l_suppkey),
count(l_returnflag), count(l_quantity), count(l_linenumber),
count(l_extendedprice), count(l_linestatus), count(l_shipdate),
count(l_commitdate) from biglineitem;
Change-Id: I3b79671ffd3af50a2dc20c643b06cc353ba13503
Reviewed-on: http://gerrit.cloudera.org:8080/1239
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This commit implements DESCRIBE DATABASE [FORMATTED|EXTENDED] <db_name>.
Without FORMATTED|EXTENDED this statement only prints database's location and
comment. With FORMATTED|EXTENDED it will output all the properties of the database
(e.g. OWNER and PARAMETERS). Currently we only retrieve privileges stored in
hive metastore.
This commit also implements DESCRIBE EXTENDED <table>, which is the same
as DESCRIBE FORMATTED <table>, for consistency.
Change-Id: I2a101ec0e3d27b344fcb521eb00e5bdbcbac8986
Reviewed-on: http://gerrit.cloudera.org:8080/804
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins