QueryExecState::FetchRowsInternal() doesn't check the query state after evaluating the
select statement expressions with GetRowValue(). This means that, e.g., UDFs that call
SetError() in the select list will not fail the query.
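A minimal sketch of the missing check, using hypothetical stand-in types (QueryStatus, Expr, FetchRow) rather than the real QueryExecState code: every select-list expression is evaluated for a row, and the status is consulted before the row is handed back.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the query state; not Impala's real class.
struct QueryStatus {
  bool ok = true;
  std::string msg;
};

// An expression returns a value and may record an error, the way a UDF
// would through SetError().
using Expr = std::function<int(QueryStatus*)>;

// Evaluates all select-list expressions for one row, then checks the status.
// Returns false if any expression reported an error.
bool FetchRow(const std::vector<Expr>& select_list, QueryStatus* status,
              std::vector<int>* row) {
  assert(status != nullptr && row != nullptr);
  row->clear();
  for (const Expr& e : select_list) row->push_back(e(status));
  // This is the check that was previously skipped.
  return status->ok;
}
```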
Change-Id: I120d7abbee2a3ed5c5c66ec0a3a9b6e9a6ab10bf
Reviewed-on: http://gerrit.cloudera.org:8080/815
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
There was a bug with projecting collection-typed slots in the UnnestNode by setting
them to NULL. The problem was that the same tuple/slot could be referenced by multiple
input rows. As a result, all unnests after the first that operate on the same collection
value would incorrectly return an empty row batch because the slot had been set to NULL
by the first unnesting.
The fix is to ignore the null bit when retrieving a collection-typed slot's value in
the UnnestNode. We still set the null bit after retrieving the value for projection.
This solution purposely ignores the conventional NULL semantics of slots. It is a
temporary hack which must be removed eventually.
We rely on the producer of collection-typed slot values (scan node) to write an empty
array value into such slots when they are NULL in addition to setting the null bit.
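The idea can be sketched as follows, with hypothetical types (CollectionValue, Tuple, ReadAndProject) that only illustrate the behavior described above and are not the actual UnnestNode code: the value is read without consulting the null bit, and the null bit is set afterwards for projection.

```cpp
#include <cassert>

// Hypothetical, simplified slot layout.
struct CollectionValue {
  const int* data;
  int num_items;
};

struct Tuple {
  bool coll_is_null;     // null bit of the collection-typed slot
  CollectionValue coll;  // producer writes an empty value when the slot is NULL
};

// Reads the collection value while deliberately ignoring the null bit, then
// sets the null bit so that downstream nodes do not copy the collection.
CollectionValue ReadAndProject(Tuple* t) {
  assert(t != nullptr);
  CollectionValue v = t->coll;  // null bit is not checked here on purpose
  t->coll_is_null = true;       // projection
  return v;
}
```

Because a truly NULL slot carries an empty value, ignoring the null bit still yields zero items for it, while a second unnest of the same tuple sees the original items.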
Change-Id: Ie6dc671b3d031f1dfe4d95090b1b6987c2c974da
Reviewed-on: http://gerrit.cloudera.org:8080/859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Fixes:
1. Change the planner to not invert null-aware anti join because there is
only a left version. Also, always use a hash join because the
nested-loop join does not support that join mode.
2. Fix PartitionedJoinNode::Reset() and related calls to make the join
usable in subplans with the left null-aware anti join mode.
Change-Id: I8da50747f6a0412c5858fd32b9498f58ed779712
Reviewed-on: http://gerrit.cloudera.org:8080/847
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
The NLJ node did not follow the expected protocol when need_to_return_
is set on a row batch, which means that memory referenced by a row batch
can be freed or reused the next time GetNext() is called on the child.
This patch changes the NLJ node to follow the protocol by deep copying
all build side row batches when the need_to_return_ flag is set on the
row batches. This prevents the row batches from referencing memory that
may be freed or reused.
Reenable test that was disabled because of IMPALA-2332 since this was
the root cause.
Change-Id: Idcbb8df12c292b9e2b243e1cef5bdfc1366898d1
Reviewed-on: http://gerrit.cloudera.org:8080/810
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The bug:
There was a MemPool in AnalyticEvalNode with a dual purpose:
(1) Allocate temporary tuples.
(2) Back the FunctionContexts of the aggregate function evaluators.
FunctionContexts use FreePools to do their own memory management using a
pointer-based structure that is stored in the memory blocks themselves.
When calling AnalyticEvalNode::Reset() we reset that mem pool backing
that pointer-based structure. Those pointers were then clobbered by
subsequent allocations (and writes) for temporary tuples, ultimately
resulting in the FreePool incorrectly reporting a double free
while doing a Finalize() of an aggregate function.
The fix:
While there are several other ways to address this issue, I chose to
use a different MemPool for the FunctionContexts because that seemed
to be the most sane and minimally invasive fix. That MemPool is not
reset during AnalyticEvalNode::Reset() because the memory is
ultimately managed by the FreePools of the FunctionContexts.
Change-Id: I42fd60785d3c6dec93436cd9ca64de58d1b15c7e
Reviewed-on: http://gerrit.cloudera.org:8080/857
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug was a simple oversight where we copied the array data, but forgot
to update the pointer of the corresponding ArrayValue.
Change-Id: Ib6ec0380f66194efc7ea3eb989535652eb8b526f
Reviewed-on: http://gerrit.cloudera.org:8080/855
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The dedup logic in row batch serialisation incorrectly assumed that two
distinct tuples must have two distinct memory addresses. This is not
true if one tuple has zero length.
Update the serialisation logic to check for this case and insert a
NULL.
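A rough sketch of address-based dedup with the zero-length case handled, using hypothetical names (TupleRef, kNullOffset, SerializeOffsets) and a simplified output format that is not the real row batch wire format:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical view of a tuple: start address and length in bytes.
struct TupleRef {
  const uint8_t* ptr;
  int byte_size;
};

constexpr int kNullOffset = -1;  // assumed marker for a NULL tuple

// Appends each distinct tuple to 'out' once and returns one offset per input
// tuple. Zero-length tuples may share an address with a different tuple, so
// they are emitted as NULL instead of going through the address lookup.
std::vector<int> SerializeOffsets(const std::vector<TupleRef>& tuples,
                                  std::vector<uint8_t>* out) {
  assert(out != nullptr);
  std::unordered_map<const uint8_t*, int> seen;
  std::vector<int> offsets;
  for (const TupleRef& t : tuples) {
    if (t.ptr == nullptr || t.byte_size == 0) {
      offsets.push_back(kNullOffset);
      continue;
    }
    auto it = seen.find(t.ptr);
    if (it != seen.end()) {
      offsets.push_back(it->second);  // duplicate: reuse the earlier offset
      continue;
    }
    int offset = static_cast<int>(out->size());
    out->insert(out->end(), t.ptr, t.ptr + t.byte_size);
    seen.emplace(t.ptr, offset);
    offsets.push_back(offset);
  }
  return offsets;
}
```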
Adds a unit test that exercises this bug prior to the fix and a query
test that also hit a DCHECK prior to the fix.
Change-Id: If163274b3a6c10f8ac6b6bc90eee9ec95830b7dd
Reviewed-on: http://gerrit.cloudera.org:8080/849
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
This DCHECK condition was overly strict - a non-nullable tuple pointer
can be NULL if the tuple is zero bytes long (since there is no memory
backing the tuple).
Adds a test query that hit the DCHECK.
Change-Id: I16e8bc0db747b83c239de931a4fc9677d5c85ae6
Reviewed-on: http://gerrit.cloudera.org:8080/836
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Tuples are set to NULL for representing non-matches of outer joins.
During planning, the FE identifies which tuples can be NULL at which
nodes in the plan. In the BE, codegen uses the tuple nullability
information to remove the runtime NULL checking of tuples.
The bug here was that the tuple nullability information was not preserved
through SubplanNodes, so subsequent nodes could get a SEGV in codegen'd
parts of the execution when dereferencing a NULL tuple because the NULL
check was incorrectly optimized out.
Change-Id: I4356537c0a7153ec1247cc74b6b7952ed9e3d884
Reviewed-on: http://gerrit.cloudera.org:8080/827
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch adds basic end-to-end functional tests for nested types:
1. For exercising the Reset() of exec nodes when inside a subplan.
2. For asserting correct behavior when row batches with collection-typed
slots flow through exec nodes.
Most cases are covered, but there are a few known issues that prevent
full coverage. The remaining tests will be added as part of the fixes
for those existing JIRAs.
Change-Id: I0140c1a32cb5edd189f283c68a24de8484b3f434
Reviewed-on: http://gerrit.cloudera.org:8080/823
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The hash join and tuple stream code did not correctly handle joins
whose right side had very high cardinality but whose tuples had zero
footprint. Any such join with more than 16M tuples on the right side
would crash. In particular, if the tuple footprint is zero, an infinite
number of rows fits in one block. But with the old way of iterating
over the rows of the stream, we would increment the idx by 1 to get the
next "row", eventually overflowing and hitting a DCHECK.
The second problem was the calculation of the size of the hash table
when the footprint of the tuples is zero. In that case, a hash table of
minimum size would suffice. Instead, we would try to create a very
large hash table to fit the large number of tuples, resulting in OOM
errors.
This patch fixes the two problems by special-casing the calculation of
the next idx in the stream, as well as the size of the hash table, when
the stream contains tuples with zero footprint.
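The hash table sizing part can be illustrated with a small sketch. The function name, the power-of-two growth, and the factor of two are assumptions made for the example and are not taken from the patch; only the zero-footprint special case reflects the description above.

```cpp
#include <cassert>
#include <cstdint>

// Returns a bucket count for 'num_rows' build rows. 'min_buckets' is assumed
// to be a positive power of two.
int64_t NumBucketsFor(int64_t num_rows, int tuple_byte_size, int64_t min_buckets) {
  assert(min_buckets > 0 && num_rows >= 0);
  // Zero-footprint tuples: a minimum-size table suffices regardless of row count.
  if (tuple_byte_size == 0) return min_buckets;
  // Otherwise grow to the next power of two that leaves the table half full
  // (assumed policy, for illustration only).
  int64_t target = num_rows * 2;
  int64_t buckets = min_buckets;
  while (buckets < target) buckets *= 2;
  return buckets;
}
```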
Change-Id: I12469b9c63581fcbc78c87200de7797eac3428c9
Reviewed-on: http://gerrit.cloudera.org:8080/811
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
This patch makes the ownership of the memory backing the tuple pointers of
a RowBatch dependent on whether the legacy joins and aggs are enabled:
By default, the memory is malloc'd and owned by the RowBatch:
If enable_partitioned_hash_join=true and enable_partitioned_aggregation=true
then the memory is owned by the RowBatch and is freed upon its destruction.
This mode is more performant especially with SubplanNodes in the ExecNode tree
because the tuple pointers are not transferred and do not have to be re-created
in every Reset().
Memory is allocated from MemPool:
Otherwise, the memory is allocated from the RowBatch's tuple pool. As a result,
the pointer memory is transferred just like tuple data, and must be re-created
in Reset(). This mode is required for the legacy join and agg which rely on the
tuple pointers being allocated from the RowBatch's tuple pool, so they can
acquire ownership of the tuple pointers.
Performance impact for nested types:
Initial cluster runs and profiling on nested TPCH identified excessive
malloc/frees as a major performance bottleneck. This change paves the way
for further optimizations which yielded a 2x improvement in response time
for most nested TPCH queries.
Change-Id: I4ac58b18058ce46b4db89fbe117b0bcad19e9ee7
Reviewed-on: http://gerrit.cloudera.org:8080/807
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This caused failures on the non-partitioned agg/join tests and the ASAN
test. Remove the query that failed to unbreak the build. This query can
be re-added when IMPALA-2207 is fixed (for the ASAN test) and the cause of
the other failure is diagnosed and fixed.
Change-Id: Idb1ca951e0de05aee3c1237392fff74ddd756ed7
In some cases in the NLJ node eos_ wasn't set even though the limit was
reached. This prevented the limit from being handled correctly before
returning rows to the caller of GetNext(). This could result in either
too many rows being returned, or a crash when the row batch size was
set to an invalid negative number.
The fix is to always check for whether the limit was reached before
returning from GetNext().
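As an illustration only, the pattern of always checking the limit on the way out of GetNext() could look like the sketch below. LimitTracker and Commit() are hypothetical names, not members of the NLJ node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that tracks how many rows a node has returned.
struct LimitTracker {
  int64_t limit;  // -1 means "no limit"
  int64_t num_rows_returned;

  explicit LimitTracker(int64_t l) : limit(l), num_rows_returned(0) {}

  // Called on every path right before returning from GetNext(). Returns the
  // number of rows to keep in the output batch and sets *eos once the limit
  // is reached, so the batch size can never become negative or exceed the limit.
  int Commit(int rows_in_batch, bool* eos) {
    assert(eos != nullptr && rows_in_batch >= 0);
    num_rows_returned += rows_in_batch;
    if (limit != -1 && num_rows_returned >= limit) {
      int64_t extra = num_rows_returned - limit;
      rows_in_batch -= static_cast<int>(extra);
      num_rows_returned = limit;
      *eos = true;
    }
    return rows_in_batch;
  }
};
```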
Change-Id: I660e774787870213ada9f2d3e6f10953d9937022
Reviewed-on: http://gerrit.cloudera.org:8080/797
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The first attempt in b2b9a10dda942c7e4f2af01be28e819f71de146f was wrong,
but the good news is that the validation checks caught the problem.
Problem with original approach:
In the SubplanNode we used to set the collection-typed slots of the
current row to NULL after the subplan invocation for the current row
was completed. The problem is that we may have already returned rows
from the SubplanNode which still have the non-NULL slots, and some exec
node consumers may have copied the data.
Fixed approach:
We now set a collection-slot to NULL in the UnnestNode that flattens it
immediately after evaluating the corresponding SlotRef, before returning
any rows from the UnnestNode. Setting the slot to NULL as early as possible
ensures that all rows returned by the containing SubplanNode will have the
slot set to NULL.
Change-Id: Ie942d9b2c835589ed9d41c68a831795bbbff2895
Reviewed-on: http://gerrit.cloudera.org:8080/803
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Collection-typed slots are expensive to copy, e.g., during data
exchanges or when writing into a buffered-tuple-stream. Even worse,
such slots could be duplicated many times after unnesting in a
subplan. To alleviate this problem, this patch implements a
poor man's projection where collection-typed slots are set to NULL
inside the SubplanNode that flattens them.
The FE guarantees that the contents of an array-typed slot are never
referenced outside of the single UnnestNode that accesses them, so when
returning eos in UnnestNode::GetNext() we also set the unnested array
slot to NULL to avoid those expensive copies in downstream exec nodes.
The FE provides that guarantee by creating a new slot in the parent
scan for every relative CollectionTableRef. For example, for a table
't' with a collection-typed column 'c' the following query would have
two separate slots in the tuple of 't', one for 'c1' and one for 'c2':
select * from t, t.c c1, t.c c2
Change-Id: I90e5b86463019c9ed810c299945c831c744ff563
Reviewed-on: http://gerrit.cloudera.org:8080/763
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
We will not provide full nested types support if any of these options
are set:
--enable_partitioned_aggregation=false
--enable_partitioned_hash_join=false
Change-Id: I0f8607914faf9691d5f7b1a4327609fefba22e56
Reviewed-on: http://gerrit.cloudera.org:8080/792
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
By not using THREAD_LOCAL for its state, btrim() invocations in
multi-threaded contexts (i.e. pushed to the scanner) would have threads
trampling over each other's bitset used to check for trimmed characters.
Testing:
See new test in expr.test:
select count(*) from functional.alltypes where btrim(string_col, string_col) != ""
.. should give 0 results, but would give > 0 with this bug.
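A simplified sketch of a trim function whose character bitset is thread-local, so concurrent callers cannot overwrite each other's state. BTrim here is a standalone illustration over std::string and not the builtin's real signature.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Removes any characters in 'chars' from both ends of 's'.
std::string BTrim(const std::string& s, const std::string& chars) {
  // One bitset per thread; it is cleared on every call before use.
  thread_local std::bitset<256> to_trim;
  to_trim.reset();
  for (unsigned char c : chars) to_trim.set(c);

  std::size_t begin = 0;
  std::size_t end = s.size();
  while (begin < end && to_trim.test(static_cast<unsigned char>(s[begin]))) ++begin;
  while (end > begin && to_trim.test(static_cast<unsigned char>(s[end - 1]))) --end;
  return s.substr(begin, end - begin);
}
```

Trimming a string by its own characters always yields the empty string, which is why the test query above expects a count of 0.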
Change-Id: I595e25b1d4fb7c76b846fce837b4ec140f47d43c
Reviewed-on: http://gerrit.cloudera.org:8080/748
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
Implement Tuple-to-Tuple DeepCopy for collections. Add query test
that uses the TOP-N node, which deep copies tuples in this way.
Confirmed that the query test failed before this fix.
Change-Id: I3fea860d8251038d7b5eb85c77973939abe9dbf8
Reviewed-on: http://gerrit.cloudera.org:8080/757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Invalid test file format caused tpch tests to fail.
Change-Id: Ibf523d071bb14db72689e39645fd1724897543c7
Reviewed-on: http://gerrit.cloudera.org:8080/766
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
HDFS acknowledges writes when the first replica is written.
As a result, the estimated memory requirements for an Impala
query may vary depending on how many replicas existed at the
time of table loading. This racy behavior caused a few tests
to sometimes fail due to different actual and expected memory
requirements.
The fix is to exclude the explain header from the expected results.
Change-Id: Ifb13de937a104a48960d35745df521de66596837
Reviewed-on: http://gerrit.cloudera.org:8080/762
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where BlockingJoinNode will incorrectly set
eos_ flag to true when the probe side is exhausted without considering
the join mode that is executed. This would cause the NestedLoopJoinNode to
sometimes return wrong results when a right-outer, right-anti or
full-outer join mode is used. This issue appeared in nested TPC-H Q22.
Change-Id: I01f2118d4db3d8739201d5c3f475f5b7e328555a
Reviewed-on: http://gerrit.cloudera.org:8080/753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
After this patch we get correct results for nested TPCH Q13.
The bug: Since we were not properly handling AtCapacity() of the output
batch in SubplanNode, we sometimes passed a row batch that was already
at capacity into GetNext() on the second child of the SubplanNode.
In this particular case, that batch was passed into the NestedLoopJoinNode
which may return incomplete results if the output batch is already
at capacity (e.g., ProcessUnmatchedBuildRows() was not called).
The fix is to return from SubplanNode::GetNext() if the output batch
is at capacity due to resources being transferred to it from the input
batch used to fetch from the first child.
Change-Id: Ib97821e8457867dc0d00fd37149a3f0a75872297
Reviewed-on: http://gerrit.cloudera.org:8080/742
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Example:
WITH t(c1, c2) AS (SELECT int_col, bool_col FROM functional.alltypes)
SELECT * FROM t
This will create a local view with the 'int_col' and 'bool_col' columns labeled as 'c1'
and 'c2'. If the number of labels is less than the number of columns, then the remaining
columns in the local view will be labeled as the corresponding columns in the query
statement. Therefore, this is also a valid query (only 'int_col' will be labeled as
'c1'):
WITH t(c1) AS (SELECT int_col, bool_col FROM functional.alltypes)
SELECT * FROM t
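The labeling rule can be sketched as a small function. LabelViewColumns is a hypothetical helper, and the assertion that there are no more labels than columns is an assumption of the sketch, since the text above does not say how that case is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the column labels of the local view: explicit labels first, then
// the query statement's own column labels for the remaining columns.
std::vector<std::string> LabelViewColumns(
    const std::vector<std::string>& explicit_labels,
    const std::vector<std::string>& query_columns) {
  // Assumption: supplying more labels than columns is rejected elsewhere.
  assert(explicit_labels.size() <= query_columns.size());
  std::vector<std::string> labels;
  for (std::size_t i = 0; i < query_columns.size(); ++i) {
    labels.push_back(i < explicit_labels.size() ? explicit_labels[i]
                                                : query_columns[i]);
  }
  return labels;
}
```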
Change-Id: Ie3a559ca9eaf95c6980c5695a49f02010c42899b
Reviewed-on: http://gerrit.cloudera.org:8080/717
Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com>
Tested-by: Internal Jenkins
This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.
This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).
I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.
Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
grant_revoke_no_insert.test
This commit updates the test file of grant/revoke statements running
against S3 to include column-level privileges.
Change-Id: Ia21595740fd37c88040d9a692444c6009591a188
Reviewed-on: http://gerrit.cloudera.org:8080/735
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Use select query instead of insert query to verify constant expression
on partition column.
Change-Id: I442111225e8df29bcc5fe89500d023559bb1c1fb
Reviewed-on: http://gerrit.cloudera.org:8080/707
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This commit adds partial support for column-level authorization in
Impala using the Sentry Service. The following changes are included:
* Added support for parsing and analyzing GRANT/REVOKE statements with column-level
privileges. The supporting syntax is:
- GRANT SELECT (<col_names>) ON TABLE <table_name>
TO [ROLE] <role_name> [WITH GRANT OPTION]
- REVOKE [GRANT OPTION FROM] SELECT (<col_names>) ON
TABLE <table_name> FROM [ROLE] <role_name>
* Added support for storing column-level privileges in the Catalog Service and updating
the Sentry Service when GRANT/REVOKE statements are executed.
* Modified the SHOW GRANT ROLE statement to include information about
column-level privileges.
Subsequent patches will add support for enforcing column-level
privileges in SQL queries and other statements.
Change-Id: I0fd9daa92cc5147cb6f4b25eb9651aab8bf3049f
Reviewed-on: http://gerrit.cloudera.org:8080/607
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
When the `numRows` parameter stored in the table properties is
erroneously set to 0 and a number of non-empty files are present,
the table statistics are considered to be corrupt.
To hint that there might be a problem, the explain statement will emit
an additional warning if it detects potentially corrupt table stats like
in the following example:
Estimated Per-Host Requirements: Memory=42.00MB VCores=1
WARNING: The following tables have potentially corrupt table and/or
column statistics.
compute_stats_db.corrupted
03:AGGREGATE [FINALIZE]
| output: count:merge(*)
|
02:EXCHANGE [UNPARTITIONED]
|
01:AGGREGATE
| output: count(*)
|
00:SCAN HDFS [compute_stats_db.corrupted]
partitions=1/2 files=1 size=24B
In addition, the small query optimization is disabled for such queries.
Change-Id: I0fa911f5132aa62195b854248663a94dcd8b14de
Reviewed-on: http://gerrit.cloudera.org:8080/689
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
The test file testdata/workloads/functional-query/queries/QueryTest/exprs.test had INSERT
statements in it, which are not supported on S3. This commit gets rid of those statements
and rewrites them with SELECT [...] FROM VALUES(...) so that the tests are compatible on
S3.
Change-Id: I25faacf9fae3780f627afee86dc8c1ede7f6e2a2
Reviewed-on: http://gerrit.cloudera.org:8080/670
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
The value of PARQUET_FILE_SIZE overflows when RoundUp() is called because this function
returns an int32. Even with this change, this value will still overflow when calling the
HDFS API since it is passed to hdfsOpenFile() as blocksize, which is an int32 parameter
(see HDFS-8949).
Changes:
- Return an error if PARQUET_FILE_SIZE is set to a value greater than or equal to 2GB.
- If PARQUET_FILE_SIZE is set in an Impala session to a value greater than or equal to
2GB, then every query will fail with an error message.
- If PARQUET_FILE_SIZE is changed to a value greater than or equal to 2GB as an impalad
argument, impalad will not start and log an error.
- Ceil(), RoundUp(), RoundDown() return int64.
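To illustrate why the return type matters, here is a sketch of 64-bit versions of these helpers together with a 2GB bound check. The function bodies and IsValidParquetFileSize() are illustrative assumptions, not the code from the patch, and they assume non-negative inputs.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit helpers: results above INT32_MAX are preserved instead of wrapping.
inline int64_t Ceil(int64_t value, int64_t divisor) {
  return value / divisor + (value % divisor != 0);
}

inline int64_t RoundUp(int64_t value, int64_t factor) {
  return (value + (factor - 1)) / factor * factor;
}

inline int64_t RoundDown(int64_t value, int64_t factor) {
  return (value / factor) * factor;
}

// Hypothetical validation: values of 2GB or more are rejected because the
// block size is later passed to an int32 parameter.
inline bool IsValidParquetFileSize(int64_t bytes) {
  const int64_t kTwoGb = 1LL << 31;
  return bytes < kTwoGb;
}
```

For example, rounding 2GB + 1 up to a multiple of 1024 gives 2GB + 1024, a value that cannot be represented in an int32.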
Change-Id: Ie4f2551b72954e2a57db5594e4789e3f7434d578
Reviewed-on: http://gerrit.cloudera.org:8080/678
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com>
Tested-by: Internal Jenkins
Addressed JIRAs: IMPALA-1947 and IMPALA-1813
New Feature:
Adds support for creating an Avro table without an explicit
Avro schema with the following syntax.
CREATE TABLE <table_name> column_defs STORED AS AVRO
Fixes and Improvements:
This patch fixes and unifies the logic for reconciling differences between
an Avro table's Avro Schema and its column definitions. This reconciliation
logic is executed during Impala's CREATE TABLE and when loading a table's
metadata. Impala generally performs the schema reconciliation during table
creation, but Hive does not. In many cases, Hive's CREATE TABLE stores the
original column definitions in the HMS (in the StorageDescriptor) instead
of the reconciled column definitions.
The reconciliation logic considers the field/column names and follows this
conflict resolution policy which is similar to Hive's:
Mismatched number of columns -> Prefer Avro columns.
Mismatched name/type -> Prefer Avro column, except:
A CHAR/VARCHAR column definition maps to an Avro STRING, and is preserved
as a CHAR/VARCHAR in the reconciled schema.
Behavior for TIMESTAMP:
A TIMESTAMP column definition maps to an Avro STRING and is presented as a STRING
in the reconciled schema, because Avro has no binary TIMESTAMP representation.
As a result, no Avro table may have a TIMESTAMP column (existing behavior).
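The conflict resolution policy above can be sketched as follows. The types and the Reconcile() function are hypothetical, and the sketch is written in C++ purely for illustration. Taking the Avro column's name while keeping the CHAR/VARCHAR type is one reading of the policy, not something the text spells out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Type { INT, STRING, CHAR, VARCHAR, TIMESTAMP };

struct Column {
  std::string name;
  Type type;
};

// Reconciles column definitions with the columns derived from the Avro schema.
std::vector<Column> Reconcile(const std::vector<Column>& col_defs,
                              const std::vector<Column>& avro_cols) {
  // Mismatched number of columns: prefer the Avro columns.
  if (col_defs.size() != avro_cols.size()) return avro_cols;
  std::vector<Column> result;
  for (std::size_t i = 0; i < avro_cols.size(); ++i) {
    const Column& def = col_defs[i];
    const Column& avro = avro_cols[i];
    bool def_is_char = def.type == Type::CHAR || def.type == Type::VARCHAR;
    if (def_is_char && avro.type == Type::STRING) {
      // Exception: CHAR/VARCHAR is preserved over the Avro STRING.
      result.push_back(Column{avro.name, def.type});
    } else {
      // Everything else, including TIMESTAMP vs. STRING: prefer Avro.
      result.push_back(avro);
    }
  }
  return result;
}
```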
Change-Id: I8457354568b6049b2dd2794b65fadc06e619d648
Reviewed-on: http://gerrit.cloudera.org:8080/550
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The bug: When enforcing slot equivalences at an aggregation node, we used to
incorrectly assume that equivalences among grouping slots must have already been
enforced below the aggregation (e.g., in a scan). This assumption is correct if the
grouping slots are produced by simple SlotRef grouping exprs, because then there is
certainly a value transfer between the grouping slot and another slot below the
aggregation. However, for grouping slots with complex grouping exprs this assumption
is not correct, and as a result, we would incorrectly remove eq predicates bound by
grouping slots with complex grouping exprs because we assumed they were redundant.
The fix is to enforce slot equivalences among grouping slots with complex grouping
exprs as usual, and not assume that they have already been enforced below the agg.
Change-Id: Idcd44acccb9326a35c9121025dc88c2c70c7c7c7
Reviewed-on: http://gerrit.cloudera.org:8080/656
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The current DESCRIBE prints the column type as a single string without
whitespace. As a result, the DESCRIBE output for tables with complex types
is basically unreadable/unusable, e.g., from the Impala shell.
This patch adds a prettyPrint() function to the FE Type and uses that
for generating a nicely formatted DESCRIBE output.
The output of DESCRIBE FORMATTED is intentionally not modified because
exact Hive-compatibility has been and presumably continues to be very
important to our users.
Change-Id: Ida810facdffd970948b837b83a60f9ddcd95f44d
Reviewed-on: http://gerrit.cloudera.org:8080/633
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Add support for creating a table based on a parquet file which contains arrays,
structs and/or maps.
Change-Id: I56259d53a3d9b82f318228e864c783b48a03f9ae
Reviewed-on: http://gerrit.cloudera.org:8080/582
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.
Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Previously the frontend rejected UDAs with different intermediate and
result types. The backend supports these, so this change enables support
in the frontend and adds tests.
This patch adds a test UDA function with a different intermediate type and
a simple end-to-end test that exercises it. It modifies an existing
unused test UDA that used a currently unsupported intermediate type -
BufferVal.
Change-Id: I5675ec7f275ea698c24ea8e92de7f469a950df83
Reviewed-on: http://gerrit.cloudera.org:8080/655
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both.
REGEXP_LIKE has an optional third parameter which, if used, selects a different
'prepare' function (RegexpLikePrepare in like-predicate.cc) so that the appropriate
options can be set in the RE2 library.
Added a patch for the RE2 library so that the 'dot matches all' option is exposed
via the RE2 class.
Fixed a bug where proper cleanup wasn't guaranteed in certain edge cases when the
function to be evaluated for the WHERE clause operates on constants.
Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b
Reviewed-on: http://gerrit.cloudera.org:8080/501
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch fixes an issue where incorrect results are produced by a CTAS or IAS
that is fed from a QueryStmt that has outer-joined inline views with constants or
conditionals in the select list. The regression was introduced in this commit:
b8f642710ea9d311a7aca32611eaa7cac6cd86df
Now that the final expression substitution with TupleIsNullPredicate() wrapping
is performed in planning, the InsertStmt's result expressions should be taken
from the feeding QueryStmt's result expressions, and not the QueryStmt's
(already substituted) base table result expressions.
Change-Id: Iae29683638df01f140d0f74976cca8ca9ba0852d
Reviewed-on: http://gerrit.cloudera.org:8080/637
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
It turns out there is a variety of cases where boost incorrectly adds
intervals if the interval is at (or beyond) an edge case value. This
change defines a max interval and returns NULL if the user supplies
an interval beyond the max.
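A minimal sketch of the guard, working on plain day numbers instead of boost dates. The bound kMaxIntervalDays is a placeholder invented for the example, not the value the patch defines, and a false return stands in for SQL NULL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bound, chosen only for illustration.
constexpr int64_t kMaxIntervalDays = 10000 * 366LL;

// Adds 'interval_days' to 'day_number'. Returns false (standing in for NULL)
// and leaves *result untouched when the interval is beyond the max.
bool AddDays(int64_t day_number, int64_t interval_days, int64_t* result) {
  assert(result != nullptr);
  if (interval_days > kMaxIntervalDays || interval_days < -kMaxIntervalDays) {
    return false;
  }
  *result = day_number + interval_days;
  return true;
}
```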
Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080
Reviewed-on: http://gerrit.cloudera.org:8080/492
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch resolves an issue where row count is not set to 0 when a partition spec is
used with 'compute incremental stats' on a partition that contains no data. The fix is
to populate the partition 'expected list' in the frontend with the partition spec; the
backend keeps track of which partitions had statistics generated. In the scenario where
no statistics are generated for a partition, the backend will fall back to the
'expected list' to zero out the statistics.
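The fallback can be sketched like this, with partitions identified by plain strings and a hypothetical FinalizeRowCounts() helper that is not part of the actual backend:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Combines the row counts that were actually computed with the list of
// partitions the frontend expected. Any expected partition without computed
// statistics (e.g. because it contains no data) gets a row count of 0.
std::map<std::string, int64_t> FinalizeRowCounts(
    const std::vector<std::string>& expected_partitions,
    const std::map<std::string, int64_t>& computed) {
  std::map<std::string, int64_t> result = computed;
  for (const std::string& p : expected_partitions) {
    // insert() leaves existing entries alone and adds 0 for missing ones.
    result.insert(std::make_pair(p, static_cast<int64_t>(0)));
  }
  return result;
}
```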
Change-Id: If4aac131dbe44e14a0477afa58e980da9e235d6b
Reviewed-on: http://gerrit.cloudera.org:8080/627
Reviewed-by: Christopher Channing <cchanning@cloudera.com>
Tested-by: Internal Jenkins