Commit Graph

523 Commits

Author SHA1 Message Date
Tim Armstrong
e151ebaa71 IMPALA-1001: Bit and byte manipulation functions
Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot,
countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright.
Interfaces and behavior follow Teradata documentation.

All bit* functions are compatible with DB2. Only bitand is compatible with Oracle.
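
As a rough illustration of what a few of these functions compute, here is a minimal Python sketch on fixed-width unsigned values. The 8-bit default width, the signatures, and the convention that bit position 0 is the least significant bit are assumptions made for this example; the actual builtins follow the Teradata documentation.

```python
def countset(value, width=8):
    """Count the 1 bits in the low `width` bits of value."""
    return bin(value & ((1 << width) - 1)).count("1")


def getbit(value, pos):
    """Return the bit at position pos (0 = least significant, an assumption)."""
    return (value >> pos) & 1


def setbit(value, pos, bit=1):
    """Return value with the bit at position pos set to `bit` (0 or 1)."""
    if bit:
        return value | (1 << pos)
    return value & ~(1 << pos)


def rotateleft(value, n, width=8):
    """Rotate the low `width` bits of value left by n positions."""
    mask = (1 << width) - 1
    n %= width
    value &= mask
    return ((value << n) | (value >> (width - n))) & mask


def rotateright(value, n, width=8):
    """Rotate right by n, expressed as a left rotation by width - n."""
    return rotateleft(value, width - (n % width), width)
```

For example, `rotateleft(0b10000001, 1)` yields `0b00000011` under the 8-bit assumption, and `rotateright(0b00000011, 1)` restores the original value.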

Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3
2015-07-28 08:11:01 -07:00
Sailesh Mukil
8a01527bad IMPALA-2141: UnionNode::GetNext() doesn't check for query errors
When a UDF with constant parameters in the select list calls SetError(), it does not fail
the query. This is because UnionNode::GetNext() does not check for errors after
UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not
report the error.

Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a
2015-07-27 22:09:19 -07:00
Alex Behm
3ac341287c IMPALA-2088: Fix planning of empty union operands with analytics.
The check for ignoring empty union operands was simply misplaced.
This misplacement resulted in empty union operands not being
dropped if the containing UnionStmt had analytic functions.

Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3
Reviewed-on: http://gerrit.cloudera.org:8080/554
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-27 15:46:41 -07:00
Tim Armstrong
822cb8f5e2 IMPALA-1660: Netezza compatibility - factorial
Implements suffix n! operator for factorial and factorial function.

Slightly refactor operators in fe to share code between unary operators.

Based partially on work by Arthur Peng <arthur.peng@intel.com>.

Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b
Reviewed-on: http://gerrit.cloudera.org:8080/531
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2015-07-27 19:03:48 +00:00
Sailesh Mukil
c21c080a46 IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception.
Changed the way the function context error message is returned. Also, changed the
exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException
in case of an exception in registerConjuncts().

This commit follows from:

d497ba6cef

This is a new commit since the previous one was closed before making these changes.

Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18
Reviewed-on: http://gerrit.cloudera.org:8080/560
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-24 19:04:38 +00:00
Tim Armstrong
5990b43fe2 IMPALA-1898: Explicit aliases + ordinals analysis bug
Analysis errors occurred with select queries that combined ordinals
in the group by/order by clauses with select list aliases that
had the same name as a column in one of the underlying tables.

The root cause was a double substitution: e.g. the ordinal 1 in
a GROUP BY clause was replaced with the corresponding select list expression,
then a reference to column 'x' in an underlying table was replaced erroneously
with the select list expression with alias 'x'.
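
The double substitution described above can be sketched as follows. This is a toy model that treats expressions as strings; the function names and the `fixed` flag are hypothetical and do not correspond to the analyzer's actual code.

```python
import re


def substitute_aliases(expr, aliases):
    """Replace each identifier in expr that matches an alias with its expression."""
    def repl(match):
        name = match.group(0)
        return "(" + aliases[name] + ")" if name in aliases else name
    return re.sub(r"[A-Za-z_]\w*", repl, expr)


def resolve_group_by(group_by, select_list, aliases, fixed=True):
    """Resolve ordinals and aliases in a GROUP BY list (toy model)."""
    resolved = []
    for item in group_by:
        if item.isdigit():
            # Ordinal: take the corresponding select list expression.
            expr = select_list[int(item) - 1]
            if not fixed:
                # Buggy path: the substituted expression is substituted again,
                # so column 'x' is mistaken for the alias 'x'.
                expr = substitute_aliases(expr, aliases)
            resolved.append(expr)
        else:
            resolved.append(substitute_aliases(item, aliases))
    return resolved
```

With `select_list = ["x + 1"]` and `aliases = {"x": "x + 1"}`, resolving `["1"]` gives `["x + 1"]`, while the `fixed=False` path gives `["(x + 1) + 1"]`.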

Change-Id: I0f298290c58f18239e1ff83f0388d037c311f5fb
Reviewed-on: http://gerrit.cloudera.org:8080/542
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-07-22 21:23:36 +00:00
Sailesh Mukil
6d7bb76e87 IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is
not done before throwing exception.

When a builtin has an error (in the constant case), it is checked for but the state
cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the
constant case), the error does not propagate back up the stack due to a lack of error
checking in ScalarFnCall::Open() after it calls GetConstVal().

Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1
Reviewed-on: http://gerrit.cloudera.org:8080/537
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-07-21 04:01:39 +00:00
Casey Ching
a6d534682b IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic
Boost handles a couple of edge cases differently than other databases
such as Postgres and MySQL when adding year/month intervals to
timestamps. This change makes Impala consistent with the other databases.
The performance difference was not noticeable (<5% if any).
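
The message does not list the edge cases. Assuming one of them is end-of-month handling, the following sketch shows month addition that only clamps the day to the length of the target month, which is the behavior one would expect from Postgres or MySQL. The function name is hypothetical and this is not Impala's implementation.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add months to d, clamping the day to the length of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    # The day is kept when it exists in the target month and is never
    # snapped forward to the end of the month.
    return date(year, month, min(d.day, last_day))
```

Under this rule, `add_months(date(2015, 1, 31), 1)` is `2015-02-28`, and `add_months(date(2015, 2, 28), 1)` is `2015-03-28` rather than `2015-03-31`.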

Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06
Reviewed-on: http://gerrit.cloudera.org:8080/489
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-07-20 10:16:54 +00:00
Shant Hovsepian
6d87fe090c Improve Hll estimate for small cardinalities.
Based on Google's HyperLogLog++ paper. Uses a bias correcting
interpolation as a sub algorithm for Hll estimates within a specific
range.

Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9
Reviewed-on: http://gerrit.cloudera.org:8080/415
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2015-07-16 19:38:17 +00:00
Ippokratis Pandis
7e9f8478e1 Removing duplicate query test
Change-Id: Ia8b33ca2a2eadae288acea4bd2111a1a974bc484
Reviewed-on: http://gerrit.cloudera.org:8080/526
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-15 03:28:36 +00:00
Ippokratis Pandis
e99c68fe52 IMPALA-2130: Wrong verification of Parquet file version
This patch corrects a mistake in the verification of the Parquet file magic
number and adds a test for it. Note that with this patch Impala may fail to read
Parquet files with a wrong magic number that it was able to read before.
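
For context, a Parquet file begins and ends with the 4-byte magic number "PAR1". A minimal sketch of such a check on an in-memory byte string could look like this; the function name is hypothetical and the check is not necessarily the one Impala performs.

```python
PARQUET_MAGIC = b"PAR1"


def has_valid_parquet_magic(data):
    """Return True if data starts and ends with the 4-byte Parquet magic number."""
    # Smallest plausible file: header magic + 4-byte footer length + footer magic.
    if len(data) < 12:
        return False
    return data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC
```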

Change-Id: Iff31accda1e1d541946ef1f750e38886ce4cb8d5
Reviewed-on: http://gerrit.cloudera.org:8080/515
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-14 02:52:02 +00:00
Martin Grund
51aa077448 IMPALA-2133: Properly unescape string value for HBase filters
This patch fixes a problem where the frontend simply passed the
escaped value to the backend as an HBase filter instead of the unescaped
one. Queries that include an escaped character now work as well.

Change-Id: I96e544973b523f3ef1abdec86ea1ec5596d9bee9
Reviewed-on: http://gerrit.cloudera.org:8080/520
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-07-13 18:38:39 +00:00
Ippokratis Pandis
4951f895e7 Nested Types: Reset() for partitioned hash join node
TODO: Need to modify Reset()'s functionality in case of NAAJs.

Change-Id: I7d0ea0dabd0b3404957e228bbaa51781c5fc34c0
Reviewed-on: http://gerrit.cloudera.org:8080/490
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-08 01:51:09 +00:00
Alex Behm
a274cfd787 Nested Types: Fix self-joining of collection table refs.
When referencing the same path in multiple CollectionTableRefs
(e.g., self-join on a nested collection), we used to register only a
single SlotDescriptor in the root tuple descriptor and share it among
those multiple CollectionTableRefs.
A collection-typed SlotDescriptor has a single item tuple descriptor,
set to the tuple descriptor of the corresponding CollectionTableRef.
Therefore, sharing a single collection-typed SlotDescriptor among
multiple CollectionTableRefs with the same path does not work
(the item tuple desc was arbitrarily set to the last CollectionTableRef's
tuple desc).

In order to maintain our assumed 1:1 relationship between a table ref
and a tuple descriptor, the simple fix for now is to give each
CollectionTableRef a new slot in the root tuple descriptor,
regardless of its path.

We could conceivably allow more intelligent sharing of tuple descriptors
for nested collections, but that change is too invasive for now.

Change-Id: I2135d026191f51d1daa741455a7e1b0f6905af1e
Reviewed-on: http://gerrit.cloudera.org:8080/495
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-07-01 06:56:28 +00:00
Ippokratis Pandis
f2c483802f Nested Types: Reset() for partitioned aggregation node
Change-Id: Ia5b4b9b3a7b8e9acb1b614c979cccca615fe2fbe
Reviewed-on: http://gerrit.cloudera.org:8080/480
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-07-01 00:43:55 +00:00
Ippokratis Pandis
c3a7916812 IMPALA-2065: Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()
If the build side of any partition of PHJ was very large we could end up
trying to Init() hash tables that are larger than 1GB. The result was
overflows (see IMPALA-1619) and eventually DCHECKS.
This patch returns false whenever we try to allocate memory in the
BufferedBlockMgr that is larger than 1GB.
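
A minimal sketch of the kind of guard described above, assuming a simple byte counter. The class name, the `limit` parameter and `consume_memory` are hypothetical stand-ins, not the BufferedBlockMgr interface.

```python
MAX_SINGLE_ALLOCATION = 1 << 30  # 1GB


class BlockMgrSketch:
    """Hypothetical stand-in for a block manager that tracks consumed bytes."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def consume_memory(self, nbytes):
        """Reserve nbytes and return True, or return False without reserving."""
        # Refuse any single request larger than 1GB instead of risking an
        # overflow further down the line.
        if nbytes > MAX_SINGLE_ALLOCATION:
            return False
        if self.used + nbytes > self.limit:
            return False
        self.used += nbytes
        return True
```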

Change-Id: Id4590ea434bef4dca7dc3f137cfe7b638ae3d916
Reviewed-on: http://gerrit.cloudera.org:8080/465
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-06-27 01:17:50 +00:00
Alex Behm
569e86a60b Nested Types: Change ExecNode::Reset() to only clear state and not tuple data.
This patch changes the ExecNode::Reset() to:

Status ExecNode::Reset(RuntimeState* state);

The new Reset() should only clear the internal state of an exec node
in preparation for another Open()/GetNext()*. Reset() should not clear
memory backing rows returned by a node in GetNext() because those rows
could still be in flight.
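
The contract described above can be illustrated with a toy node. This is a sketch only: the class and its methods are hypothetical, and Python lists stand in for row batches.

```python
class CountingNode:
    """Toy exec node that returns the integers 0..limit-1 in batches."""

    def __init__(self, limit, batch_size=2):
        self.limit = limit
        self.batch_size = batch_size
        self.next_row = 0
        self.opened = False

    def open(self):
        self.opened = True

    def get_next(self):
        """Return (batch, eos). The caller owns the returned batch."""
        assert self.opened
        end = min(self.next_row + self.batch_size, self.limit)
        batch = list(range(self.next_row, end))
        self.next_row = end
        return batch, self.next_row >= self.limit

    def reset(self):
        """Clear internal state only; batches already returned stay untouched."""
        self.next_row = 0
        self.opened = False
```

After `reset()`, a fresh `open()` and `get_next()` cycle starts from the first row again, while batches handed out earlier remain valid for their consumers.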

Subplan Memory Management:
To ensure that the memory backing rows produced by the subplan tree of
a SubplanExecNode remains valid for the lifetime of a row batch, we intend
to use our conventional transfer mechanism. That is, the ownership of
memory that is no longer used by an exec node is transferred to an output
row batch in GetNext() at a "convenient" point, typically at eos or when
the memory usage exceeds some threshold. Note that exec nodes may choose
not to transfer memory at eos to amortize the cost of memory allocation
over multiple Reset()/Open()/GetNext()* cycles.

To show the main ideas, this patch fixes transferring of tuple data ownership
in several places and implements Reset() for the following nodes:
- AnalyticEvalNode
- BlockingJoinNode
- CrossJoinNode
- SelectNode
- SortNode
- TopNNode
- UnionNode

To make the transfer of ownership work for SortNode a row batch can now also
own a list of BufferedBlockMgr::Block*.

Also included are basic query tests that are not meant to be exhaustive.
The tests are disabled for now because we cannot run them without several
other code changes. I have manually run the test queries on a branch
that has all necessary changes.

Change-Id: I3ac94b8dd7c7eb48f2e639ea297b447fbf443185
Reviewed-on: http://gerrit.cloudera.org:8080/454
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-06-23 07:43:22 +00:00
Dimitris Tsirogiannis
2c1f0a4942 IMPALA-1987: Fix TupleIsNullPredicate to return false if no tuples are
nullable.

This commit fixes the issue where an outer join returns wrong results if
the equi-join predicate contains a TupleIsNullPredicate expr.

Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4
Reviewed-on: http://gerrit.cloudera.org:8080/445
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-06-11 03:37:18 +00:00
ishaan
f327a53c70 Fix metadata/test_load.py to work with Isilon.
test_load was using /tmp as the staging directory, which was not cleaned up on Isilon,
leading to a build failure. This patch does the following:
  - use /test-warehouse as the staging directory.
  - replace calls to the hdfs commandline with calls to the in-house hdfs client.
  - cleanup the test file and remove duplicates.

Additionally, a new method is introduced in the hdfs client to simulate hdfs dfs -cp, i.e,
it does a get and a put to mimic the hdfs command line's semantics.

Change-Id: I0cc27ab00df5f5ec3138b995144ab45ad622605d
Reviewed-on: http://gerrit.cloudera.org:8080/431
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-06-05 00:52:14 +00:00
ishaan
dbc78aaa2c Enable isilon end to end tests for Impala.
This patch introduces changes to run tests against Isilon, combined with minor cleanup of
the test and client code.
For Isilon, it:
  - Populates the SkipIfIsilon class with appropriate pytest markers.
  - Introduces a new default for the hdfs client in order to connect to Isilon.
  - Cleans up a few test files to take the underlying filesystem into account.
  - Cleans up the interface for metadata/test_insert_behaviour, query_test/test_ddl

On the client side, we introduce a wrapper around a few pywebhdfs methods, specifically:
  - delete_file_dir does not throw an error if the file does not exist.
  - get_file_dir_status automatically strips the leading '/'

Change-Id: Ic630886e253e43b2daaf5adc8dedc0a271b0391f
Reviewed-on: http://gerrit.cloudera.org:8080/370
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-05-27 22:25:12 +00:00
Shant Hovsepian
69079411bf Improve distinctpc/sa for small cardinalities.
Improving the cardinality estimate for Flajolet and Martin's algorithm
used in distinctpc and distinctpcsa. The estimate for small cardinalities
is improved by providing a correction hinted to in the original paper.

We use the correction constant 1.75 proposed by Scheuermann et al
DialM-POMC '07 [Near-Optimal Compression of Probabilistic Counting
Sketches for Networking Applications]

Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f
Reviewed-on: http://gerrit.cloudera.org:8080/395
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-05-24 06:26:47 +00:00
Alex Behm
b3bb0ea525 Fix S3 build v2: Adjust expected SHOW TABLE STATS output.
Change-Id: Idc1f255a7d170e6083439220140c5eb895133b22
Reviewed-on: http://gerrit.cloudera.org:8080/382
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-05-16 02:47:01 +00:00
Casey Ching
ac0c075997 Parquet: Fix value def level when max def level is 0
When running with a release build, NULL would be returned when
reading values from required fields in parquet files (with a debug
build a DCHECK would be hit).

Previously when the max definition level for a field was 0 (which
happens if a field is required), the definition level for a value was
incorrectly set to 1. The max definition level is related to nested
data and is defined to be the number of nullable fields that will be
encountered when traversing a path to reach the desired end field.
For example, if a nested schema has a path a.b.c.d where b and d are
nullable then the max def level is 2. A def level is attached to each
value to indicate the number of optional values that are present (in
the previous example a def level of 2 means both b and d are not
null). So having a def level for a value that is greater than the max
def level for a field should never happen.
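
The a.b.c.d example above can be written out as a small sketch. The function names are hypothetical, and the definition used is the one given in this message (the number of nullable fields on the path).

```python
def max_def_level(path, nullable):
    """Number of nullable fields on the path from the root to the leaf field."""
    return sum(1 for field in path if field in nullable)


def is_valid_def_level(def_level, path, nullable):
    """A value's def level must never exceed the max def level of its field."""
    return 0 <= def_level <= max_def_level(path, nullable)
```

For the path `["a", "b", "c", "d"]` with `b` and `d` nullable, `max_def_level` returns 2. For a required top-level field the maximum is 0, so a def level of 1 is invalid, which is the situation this patch fixes.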

Change-Id: Ia91a97cf79e672c420d10416c6817f0930dcc920
(cherry picked from commit cdd67e4c7fd62d5b08adfaa303d7bb2382e6932c)
Reviewed-on: http://gerrit.cloudera.org:8080/386
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-05-15 06:41:02 +00:00
Skye Wanderman-Milne
7801aa499f Use codegen to inject runtime constants in exprs
This patch introduces the function GetConstant(), which is used by
expr compute functions and UDFs to access query constants. There is a
corresponding GetIrConstant() function that returns the IR versions of
the same constants. Currently the only implemented constants are the
expr's return type and argument types, but other constants can
easily be added to these functions. Interpreted expr functions run
normally, but cross-compiled functions can be passed to
InlineConstants(), which looks for calls to GetConstant() and replaces
them with the result of calling GetIrConstant().

I used this technique in the decimal functions that previously were
not switching on the type at all. The performance of LeastGreatest()
after this patch is the same as it was before it switched on the type.

Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41
Reviewed-on: http://gerrit.cloudera.org:8080/352
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-05-15 02:24:04 +00:00
Alex Behm
5f54b2c4d3 Fix S3 build: Adjust expected SHOW TABLE STATS output.
Change-Id: I3fb0c551dfbe53aecd9c0bced3bc29d5a5fa41e5
Reviewed-on: http://gerrit.cloudera.org:8080/375
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-05-14 07:55:36 +00:00
zuowang
304d985523 IMPALA-1139: Implement TRUNCATE TABLE statement
Synopsis:  TRUNCATE [TABLE] [database.]table

TRUNCATE quickly removes all rows from a set of tables.
TRUNCATE also drops all table and column stats, but preserves
HMS partitions and HDFS directories.
You must have the INSERT privilege on a table to truncate it.
It requires taking the metastoreDdlLock before truncating tables.

Examples:
TRUNCATE TABLE t1;
TRUNCATE t1;

Change-Id: I546e4ee0279083f437cdf0e7487faad47957dbf6
Reviewed-on: http://gerrit.cloudera.org:8080/241
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-05-14 07:50:34 +00:00
Juan Yu
d1c263402e IMPALA-1973: Fixing crash when uninitialized, empty row is added in HdfsTextScanner
This patch fixes an issue when an uninitialized, empty row is falsely
added to the rowbatch. The uninitialized data inside this row leads
later on to a crash when the null byte is checked together with the
offsets (that contain garbage).

The fix is to check not only the number of materialized columns, but
also the number of materialized partition key columns. Only if both are
empty and the parser has an unfinished tuple, add the empty row.

To accommodate the last row, check in FinishScanRange() if there is an
unfinished tuple with materialized slots or materialized partition key. Write
the fields if necessary.

Change-Id: I2808cc228e62d048d917d3a6352d869d117597ab
(cherry picked from commit c1795a8b40d10fbb32d9051a0e7de5ebffc8a6bd)
Reviewed-on: http://gerrit.cloudera.org:8080/364
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-05-05 00:19:12 +00:00
Ippokratis Pandis
4d428440d8 IMPALA-1919: Avoid calling ProcessBatch with out_batch->AtCapacity in right joins
PHJ::GetNext() of RIGHT_OUTER, RIGHT_ANTI and FULL_OUTER joins that
had repartitioned were not checking whether the output batch reached
capacity at the OutputUnmatchedBuild() call. In case of repartitioned
joins where the list of build_partitions was exhausted and the output
batch has already reached capacity, we would call ProcessProbeBatch()
with a full output batch, resulting in a DCHECK. This patch adds the
missing AtCapacity() check.

It also adds a new join test (tpch-out-joins) that uses the TPC-H
dataset and moves there some of the join tests that were using it.
Running join tests with the larger TPC-H dataset is needed, for
example, in order to trigger repartitions.

Change-Id: I4434ad0683e1b09f75a25b3eb870a817d4988370
Reviewed-on: http://gerrit.cloudera.org:8080/314
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-05-04 19:49:56 +00:00
Henry Robinson
f98a7bee46 IMPALA-1595: Fix exhaustive test failure
Change-Id: I49db59c936c105295b7159bb1c06558b127d4c4a
Reviewed-on: http://gerrit.cloudera.org:8080/353
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-04-23 19:46:31 +00:00
Dimitris Tsirogiannis
dd5ecb9deb IMPALA-1960: Illegal reference to non-materialized tuple when query has
an empty select-project-join block

This commit fixes an issue where an aggregation expr may reference a
non-materialized slot if the query contains an empty select-project-join
block. This fix ensures that all the exprs in an aggregation reference
materialized slots/tuples.

Change-Id: Ic2cc9818061b3f06ab1d1cebf4e604352c2df6d1
Reviewed-on: http://gerrit.cloudera.org:8080/348
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-21 23:29:14 +00:00
Henry Robinson
f22b8659fd IMPALA-1595: Add 'location' to SHOW [TABLE STATS|PARTITIONS] for HDFS tables
This patch adds a 'location' column to the output of SHOW TABLE STATS /
SHOW PARTITIONS. This helps users understand the effects of ALTER TABLE
SET LOCATION commands, particularly for partitions, and is easier to
identify than the output of DESCRIBE FORMATTED.

Some existing tests in alter-table.test have been updated to include
checking the location output before and after a SET LOCATION
command. The tests in show.test have also been updated to check for the
location; all other tests that use SHOW [TABLE STATS|PARTITIONS] use a
generic regex to avoid overly verbose tests.

Change-Id: I9d276f7b133c38c9319e0906397ca1c31cec95bb
Reviewed-on: http://gerrit.cloudera.org:8080/316
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-04-21 19:27:50 +00:00
Alex Behm
7067a5d94d IMPALA-1519: Fix wrapping of exprs via a TupleIsNullPredicate with analytics.
The bug:
Analytic functions introduced a few challenges in properly wrapping
exprs with TupleIsNullPredicates when substituting exprs from outer-joined
inline views.

1. The logical to physical tuple mapping during the plan generation of analytics
invalidated the tuple ids originally set in upstream TupleIsNullPredicates
introduced during analysis (e.g., in the result exprs).

2. TupleIsNullPredicates require specific tuple ids for evaluation.
Since sort nodes materializes a new tuple, it's impossible to evaluate
TupleIsNullPredicates referring to a sort's input after the sort.
Non-analytic sorts handle this case during analysis by materializing
the result of that select block. However, analytic sorts used to only materialize
the slots of materialized tuple ids of the input plan node.

The fixes:

1. Move the TupleIsNullPredicate wrapping from the inline-view analysis into
the inline-view planning. This avoids the original problem because all physical
output tuples are known during plan generation. This simple change has a few
subtle consequences: First, we must rely on the plan root's output smap for
substituting the final result exprs, and *not* use the top-level base table smap
generated during analysis. Second, during plan generation we must use an inline
view's smap (and *not* its base table smap) for generating the output smap of its
plan such that we can properly wrap the rhs exprs in TupleIsNullPredicates
at every level.
This change also fixes IMPALA-1946 by deferring the TupleIsNullWrapping to
planning time.

2. To preserve the information whether an input tuple was null or not at an
analytic sort, we materialize TupleIsNullPredicates, which are then substituted
by a SlotRef into the sort's tuple in ancestor nodes.

This patch also cleans up and consolidates the code used for wrapping exprs into
TupleIsNullPredicate itself.

Change-Id: I5c6d142bdf9c99ece2a564e557d4ffe22ac90865
Reviewed-on: http://gerrit.cloudera.org:8080/317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-04-14 23:33:20 +00:00
Dimitris Tsirogiannis
d8e5bbe2da IMPALA-1949: Analysis exception when a binary operator contains an IN
operator with values

This commit fixes an issue where a query is not successfully analyzed if an
IN operator with values appears in a binary predicate.

Change-Id: Ia3b83803a553b9a3b3489382fc53978a720c4b4f
Reviewed-on: http://gerrit.cloudera.org:8080/334
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-14 03:54:33 +00:00
Dimitris Tsirogiannis
4eceeacf16 IMPALA-1550: Invalid rewrite when EXISTS subqueries contain aggregate
functions

This commit fixes an issue where a [NOT] EXISTS subquery that contains
an aggregate function will sometimes be incorrectly rewritten into a
join, thereby returning incorrect results.

Change-Id: I18b211d76ee3de77d8061603ff5bb1fbceae2e60
Reviewed-on: http://gerrit.cloudera.org:8080/266
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-04-02 19:11:00 +00:00
Juan Yu
e121bc9b0a IMPALA-1476: Impala incorrectly handles text data missing a newline on the last line.
I did a local benchmark and there's minimal performance impact (<1%).

Change-Id: I8d84a145acad886c52587258b27d33cff96ea399
(cherry picked from commit 7e750ad5d90007cc85ebe493af4dce7a537ad7c0)
Reviewed-on: http://gerrit.cloudera.org:8080/189
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-03-20 19:58:50 -07:00
Skye Wanderman-Milne
9d6586cdb8 Addendum to IMPALA-1755 patch
This patch introduces SetLookup functionality for timestamp and
decimal types, as well as addressing remaining code review comments.

Change-Id: Ied40d2d55adbdea891ff2ab97b30f0d3986645f9
Reviewed-on: http://gerrit.cloudera.org:8080/245
Tested-by: Internal Jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2015-03-20 14:37:23 -07:00
Matthew Jacobs
e8527ddb8e IMPALA-1888: FIRST_VALUE may produce incorrect results with preceding windows
Fixes a bug where FIRST_VALUE may produce incorrect results (or a DCHECK
failure in debug) when there is a window like "ROWS X PRECEDING Y PRECEDING",
such that X < Y and X > the size of a partition.

For windows with an end boundary that is PRECEDING (i.e.
the entire window is before a row), there is some special handling between
partitions, and the logic was not correct in some corner cases for FIRST_VALUE.

Change-Id: Ied5d440684e99dcaf60b47489c90300891f09b91
Reviewed-on: http://gerrit.cloudera.org:8080/236
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-20 14:37:19 -07:00
Skye Wanderman-Milne
5118c55a0a IMPALA-1810: IN predicate was not comparing DecimalVals correctly
The IN predicate wasn't using the decimal type when comparing decimal
values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a
query with a huge decimal IN predicate) and there is a ~5% performance
degradation with codegen enabled (surprisingly, there appears to be a
slight performance gain with codegen disabled). We should be able to
remove this penalty when we add constant injection via codegen.

Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd
Reviewed-on: http://gerrit.cloudera.org:8080/230
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-03-20 14:37:18 -07:00
Alex Behm
745e64a096 IMPALA-1837: Handle truncation when implicitly casting a literal to a decimal.
Implicit casting to decimals allows truncating digits from the left of the
decimal point (see TypesUtil). A literal that is implicitly cast to a decimal
with truncation is wrapped into a CastExpr so the BE can evaluate it and report
a warning. This behavior is consistent with casting/overflow of non-constant
exprs that return decimal.
IMPALA-1837: Without the CastExpr wrapping, such literals can exceed the max
expected byte size sent to the BE in toThrift().

Change-Id: Icd7b8751b39b8031832eec04bd8eac7d7000ddf8
Reviewed-on: http://gerrit.cloudera.org:8080/195
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 19:58:58 -07:00
Ippokratis Pandis
e36c436fa6 Adding tests with right joins and duplicates
Those tests were added as part of the new hash table implementation, as we didn't have
tests with right joins and duplicates (and other conjuncts) as well as aggregation
distinct queries with group bys on multiple columns. Adding them as a separate
patch to improve test coverage in the 2.2 release branch.

Change-Id: Id1b4f27fa6e587b2031635974ac9d2d39a1b015a
Reviewed-on: http://gerrit.cloudera.org:8080/193
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:40 -07:00
Dan Hecht
25b54eac1e S3: Fix test_multiple_filesystems.py
The filesizes changed slightly, causing the S3 CI build to fail.
Let's regex the file sizes in the compute stats expected results.

Change-Id: Ie95bdf3a253a28aa2b6f3deb281948780ca2cc6a
Reviewed-on: http://gerrit.cloudera.org:8080/200
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Dan Hecht <dhecht@cloudera.com>
2015-03-11 16:39:39 -07:00
Dan Hecht
2916132283 S3: enable more tests for S3
As needed, fix up file paths and other misc things to get
more test cases running against S3.

Change-Id: If4eaf9200f2abd17074080a37cd0225d977200ad
Reviewed-on: http://gerrit.cloudera.org:8080/167
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:39 -07:00
Alex Behm
adb19deece Re-enable tests that had been temporarily removed to unblock the full data load.
The following commits disabled tests to unblock the full data load:
a00a9a5e53f7a8e7a1e3c931ea0e4b7db21c6f00
bf29d06f2e53bb924d250275d51f5ccd1213531d

This patch re-enables those tests and adds new tests to guard against
regressions to HIVE-6308.

Unfortunately, we cannot completely remove the analysis check for HIVE-6308
in our code, because there is still one case where COMPUTE STATS will fail on
a Hive-created Avro table: If there is a mismatch in column names between
the Avro schema and the column defs given to a CREATE TABLE in Hive.

Change-Id: I81ae6b526db02fdfc634e09eeb9d12036e2adfdd
Reviewed-on: http://gerrit.cloudera.org:8080/180
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:38 -07:00
Dan Hecht
5aa8195534 S3: add end-to-end test for multiple filesystems
Verify DDL and queries when a table spans multiple filesystems
and across tables that live on different filesystems.

Change-Id: I4258bebae4a5a2758666f5c2e283eb2d205c995e
Reviewed-on: http://gerrit.cloudera.org:8080/166
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:38 -07:00
Matthew Jacobs
7216d09fe7 IMPALA-1808: AnalyticEvalNode cannot handle partition/order by exprs with NaN
Analytic function evaluation was broken when partition or order by exprs
evaluated to NaN (i.e. 0/0). We were relying on the comparison of the
current row with the previous row to be equal (i.e. x == x), but x != x
if x is NaN, and in the case of the very first row in the stream, some
logic breaks if x != x. The fix is to handle the very first row specially.
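
The pitfall can be reproduced in a few lines. In this sketch (the function name is hypothetical), the first row always opens a partition and is never compared against a previous row; how Impala groups subsequent NaN rows is not addressed here.

```python
def partition_starts(keys):
    """Return the indexes at which a new partition begins."""
    starts = []
    prev = None
    for i, key in enumerate(keys):
        # The very first row is handled specially: it opens a partition
        # without relying on a comparison that would fail for NaN.
        if i == 0 or key != prev:
            starts.append(i)
        prev = key
    return starts
```

Since `float("nan") != float("nan")` evaluates to true, any logic that expects the first row to compare equal to itself breaks when the key is NaN. With the special case, `partition_starts([float("nan"), 1.0, 1.0, 2.0])` returns `[0, 1, 3]`.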

Change-Id: I1c33445d55a70c7f107f05eeadef272b7973ee11
Reviewed-on: http://gerrit.cloudera.org:8080/179
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-03-11 16:39:37 -07:00
Dimitris Tsirogiannis
c88d179413 IMPALA-1636: Generalize index-based partition pruning to allow constant
expressions

This commit enables fast partition pruning for cases where constant
expressions appear in binary or IN predicates. During partition pruning,
the constant expressions are evaluated in the BE and are replaced by the
computed results as LiteralExprs.
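
The idea can be sketched as follows, assuming a hypothetical index that maps partition-key values to partition ids and a tiny tuple-based expression tree. In Impala the constant expressions are evaluated in the BE; here they are folded locally for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constant(expr):
    """Evaluate a constant expression: a literal or a tuple (op, left, right)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](fold_constant(left), fold_constant(right))
    return expr


def prune(index, in_list):
    """Return ids of partitions whose key matches any expression in in_list."""
    matched = set()
    for expr in in_list:
        # The constant expression is evaluated once and then used like a literal.
        literal = fold_constant(expr)
        matched.update(index.get(literal, ()))
    return sorted(matched)
```

With `index = {2009: [1, 2], 2010: [3]}`, a predicate such as `year IN (2000 + 10)` corresponds to `prune(index, [("+", 2000, 10)])`, which returns `[3]`.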

Change-Id: Ie8a2accf260391117559dc6c0a565f907c516478
Reviewed-on: http://gerrit.cloudera.org:8080/144
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-07 09:51:27 +00:00
Henry Robinson
146fe64a26 IMPALA-1615: Don't drop row count during DROP INCREMENTAL STATS
Change-Id: I1ae23ca9d70eeb58a3c7c8c59fb633832edcff58
Reviewed-on: http://gerrit.cloudera.org:8080/148
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 20:22:49 +00:00
Dan Hecht
41e3b6b61e S3: fix grant_revoke test to run against S3
1) Fix up locations to take FILESYSTEM_PREFIX into account
   so we can run the test against non-default FS.
2) Fix up results and catch sections.
3) Since S3 doesn't support INSERT, split the test into
   another version that expects different results for the
   INSERT part.  The rest of the test is identical, and
   we can remove this new .test file once INSERT is supported.

Change-Id: I50d21048b846aa985d1eefc50fc33bda05ebe509
Reviewed-on: http://gerrit.cloudera.org:8080/146
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 18:16:45 +00:00
zuowang
16792b28be IMPALA-1437: Implement SHOW FILES IN <table>
Query: SHOW FILES IN db.table

Result:
+---------------------------------------------+------+---------------------+
| path                                        | size | partition           |
+---------------------------------------------+------+---------------------+
| hdfs://namenode/path/to/partition/file1.dat | 128B | year=2010, month=11 |
| hdfs://namenode/path/to/partition/file2.dat | 256B | year=2010, month=12 |
| hdfs://namenode/path/to/partition/file3.dat | 1.3G | year=2011, month=1  |
+---------------------------------------------+------+---------------------+

Query: SHOW FILES IN db.table PARTITION(year=2010, month=12)

Result:
+---------------------------------------------+------+---------------------+
| path                                        | size | partition           |
+---------------------------------------------+------+---------------------+
| hdfs://namenode/path/to/partition/file2.dat | 256B | year=2010, month=12 |
+---------------------------------------------+------+---------------------+

Only HDFS tables are supported. An exception is thrown for other kinds of tables.

The partition is optional. An exception is thrown if the specified partition cannot be found.

Change-Id: I6480ed87ab6cdfb02a60bffa72a8047a161f92ab
Reviewed-on: http://gerrit.cloudera.org:8080/19
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 05:13:50 +00:00
Matthew Jacobs
27209e4cb1 Fix exhaustive tests: move analytic fn tests using decimal_tbl to decimal.test
Change-Id: Iaaa5bd59b27d2db2736874e96d38cb823f6e4a56
Reviewed-on: http://gerrit.cloudera.org:8080/147
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-03-05 03:05:49 +00:00