447 Commits

Author SHA1 Message Date
Matthew Jacobs
e004307bbe IMPALA-1419, IMPALA-1542: Fix NullLiteral to reset its type in resetAnalysisState
Queries with arithmetic exprs containing a NullLiteral child failed (IMPALA-1419)
or crashed (IMPALA-1542) because re-analysis of these exprs was incorrect.

Change-Id: Ice3461aed53863123bcf8f38af123d89ad3b7d6a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5429
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-11-26 14:29:48 -08:00
Alex Behm
4ad15bb2be IMPALA-1524: Materialize all tuples produced by an EmptySetNode.
Change-Id: I3b151ace464c67634104f84f7223c948fed8909e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5406
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c2959485a066b5c0b40e8b0790d526726236d0c9)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5409
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-11-25 23:21:02 -08:00
Skye Wanderman-Milne
390e773a44 rand() is not a constant expr
Also fixes a bug in Expr::DebugString()

Change-Id: I32b53072755781d0858481187864d2319b9ae1cb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5400
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6de9fab17a5032dd7c9d1ef6b8071703c67d223f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5425
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-11-25 18:38:27 -08:00
Skye Wanderman-Milne
8ad6ba9f8c IMPALA-1528: TupleIsNullPredicate is never constant
We were treating it as constant before since it has no children and we
didn't override Expr::IsConstant(). However, it's not constant since
it depends on the input tuple, which caused it to blow up when we
tried to evaluate it as a constant expr.
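
The failure mode described above can be illustrated with a minimal Python sketch. It assumes a default rule of "an expr is constant if all of its children are constant", which makes every childless expr constant unless it overrides the check. The class and method names below are hypothetical stand-ins, not Impala's actual C++ code.

```python
class Expr:
    """Toy expression node; 'children' are sub-expressions."""

    def __init__(self, *children):
        self.children = list(children)

    def is_constant(self) -> bool:
        # Default rule (an assumption for this sketch): constant iff every
        # child is constant. all() over zero children is True, so any leaf
        # that does not override this method is treated as constant.
        return all(child.is_constant() for child in self.children)


class IntLiteral(Expr):
    """A genuinely constant leaf."""

    def __init__(self, value: int):
        super().__init__()
        self.value = value


class TupleIsNullPredicate(Expr):
    """Leaf whose value depends on the input tuple, so it is never constant."""

    def __init__(self, tuple_ids):
        super().__init__()
        self.tuple_ids = list(tuple_ids)

    def is_constant(self) -> bool:
        # The override: without it, the default rule would report True.
        return False
```

With the override in place, any parent expr that contains a `TupleIsNullPredicate` is also reported as non-constant, so it is no longer handed to constant evaluation.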

Change-Id: Ic2c3489ba605f03a7644e6ac9107d4310dd0aa7b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5399
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 10db8f1056e8887dc99b4a334283d4d37d5f757c)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5419
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-11-25 18:18:45 -08:00
Henry Robinson
44f57e5fb6 IMPALA-1122: Compute stats with partition granularity
This patch adds the ability to compute and drop column and table
statistics at partition granularity.

The following commands are added. Detail about the implementation
follows.

COMPUTE INCREMENTAL STATS <tbl_name> [PARTITION <partition_spec>]

This variant of COMPUTE STATS will, ultimately, do the same thing as the
traditional COMPUTE STATS statement, but does so by caching the
intermediate state of the computation for each partition in the Hive
MetaStore. If the PARTITION clause is added, the computation is
performed for only that partition. If the PARTITION clause is omitted,
incremental stats are updated only for those partitions with missing
incremental stats (e.g. one column does not have stats, or incremental
stats was never computed for this partition). In this patch, incremental
stats are only invalidated when a DROP STATS variant is executed. Future
patches can automatically invalidate the statistics after REFRESH or
INSERT queries, etc.

DROP INCREMENTAL STATS <tbl_name> PARTITION <part_spec>

This variant of DROP STATS removes the incremental statistics for the
given table. It does *not* recalculate the statistics for the whole
table, so this should be used only to invalidate the intermediate state
for a partition which will shortly be subject to COMPUTE INCREMENTAL
STATS. The point of this variant is to allow users to notify Impala when
they believe a partition has changed significantly enough to warrant
recomputation of its statistics. It is not necessary for new partitions;
Impala will detect that they do not have any valid statistics.

--------

This is achieved by adapting the existing HLL UDA via swapping its
finalize method for a new one which returns the intermediate HLL
buckets, rather than aggregating and then disposing of them. This
intermediate state is then returned to Impala's catalog-op-executor.cc,
which then passes the intermediate state back to the frontend to be
ultimately stored in the HMS.

This intermediate state is computed on a per-partition basis by grouping
the input to the UDA by partition. Thus, the incremental computation
produces one row for each partition selected (the set of which might be
quite small, if there are few partitions without valid incremental
stats: this is the point of the new commands).

At the same time, the query coordinator aggregates the output of the UDA
to produce table-level statistics. This computation incorporates any
existing (and not re-computed) intermediate partition state which is
passed to the coordinator by the frontend. The resulting statistics are
saved to the table as normal.
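
The table-level aggregation described above can lean on a standard property of HyperLogLog: sketches with the same register layout merge by taking the element-wise maximum. The following is a minimal Python sketch of that merge step only. The function name, the register lists, and the partition keys are hypothetical and do not reflect Impala's actual data structures.

```python
from typing import Dict, List


def merge_hll_registers(per_partition: Dict[str, List[int]]) -> List[int]:
    """Merge per-partition HLL registers into one table-level register array.

    Each value is the intermediate state for one partition. Merging is the
    element-wise maximum, so cached and freshly computed partitions can be
    combined in any order.
    """
    sketches = list(per_partition.values())
    if not sketches:
        return []
    width = len(sketches[0])
    if any(len(sketch) != width for sketch in sketches):
        raise ValueError("all sketches must use the same number of registers")
    return [max(column) for column in zip(*sketches)]


# Hypothetical state: two partitions cached earlier, one recomputed now.
cached = {"year=2013": [0, 3, 1, 0], "year=2014": [2, 1, 1, 0]}
recomputed = {"year=2015": [0, 0, 4, 1]}
table_level = merge_hll_registers({**cached, **recomputed})
```

Because the maximum is associative and commutative, only partitions without valid intermediate state need to be rescanned; the rest contribute their stored registers.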

Intermediate statistics are serialised to the HMS by writing a Thrift
structure's serialised form to the partition's 'parameters' map. There
is a schema-imposed limit of 4000 characters to the serialised string,
which is exacerbated by the fact that the Thrift representation must
first be base-64 encoded to avoid type errors in the HMS. The current
patch breaks the encoded structure into 4k chunks, and then recombines
them on read. The alltypes table (11 columns) takes about three of these
chunks. This may mean that incremental stats are not suitable for
particularly wide tables: these structures could be zipped before
encoding for some space savings. In the meantime, the NDV estimates are
run-length encoded (since they are generally sparse); this can result in
substantial space savings.
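
The chunking scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patch's code: the key prefix, the count key, and the plain dict standing in for the partition's 'parameters' map are all hypothetical.

```python
import base64
from typing import Dict

MAX_CHUNK = 4000  # schema-imposed limit on one serialised string
PREFIX = "example_stats_chunk"  # hypothetical key prefix


def write_chunks(payload: bytes, params: Dict[str, str]) -> None:
    """Base-64 encode the payload and store it as chunks of <= 4000 chars."""
    encoded = base64.b64encode(payload).decode("ascii")
    chunks = [encoded[i:i + MAX_CHUNK] for i in range(0, len(encoded), MAX_CHUNK)]
    params[f"{PREFIX}_count"] = str(len(chunks))
    for idx, chunk in enumerate(chunks):
        params[f"{PREFIX}_{idx}"] = chunk


def read_chunks(params: Dict[str, str]) -> bytes:
    """Recombine the chunks in index order and decode the original bytes."""
    count = int(params[f"{PREFIX}_count"])
    encoded = "".join(params[f"{PREFIX}_{idx}"] for idx in range(count))
    return base64.b64decode(encoded)
```

Base-64 expands the payload by roughly a third, which is why the encoding step makes the per-value limit tighter than the raw structure size would suggest.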

Change-Id: If82cf4753d19eb532265acb556f798b95fbb0f34
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4475
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5408
2014-11-25 09:13:37 -08:00
Dimitris Tsirogiannis
4b748ef5da IMPALA-1371: Predicate applied incorrectly when FULL OUTER JOIN is
present

This commit fixes the issue where predicates are not applied to the
correct query tree nodes when the query contains full outer joins. To
address this issue, we register information about the tuple ids that
are outer joined by full outer joins and use that information to guide
the assignment of predicates.

Change-Id: I854c05c159d86c0aaabfc12b7dd5c5982c5ece4b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5284
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-11-23 21:36:31 -08:00
Skye Wanderman-Milne
2bfb69523f IMPALA-1508: don't JIT TimestampFunctions::DateAddSub
For some reason, the try/catch added to fix IMPALA-1493 doesn't work
when we JIT the function. Fixing this in the JIT'd code will take some
time, so for now just don't JIT the function.

Change-Id: I7b2801027db0a9deb19b477c1a4ca0bdad77a825
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5383
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-11-23 21:36:03 -08:00
Dimitris Tsirogiannis
99ce6176be IMPALA-1387: On-clause conjuncts of anti joins must be evaluated by the
anti join.

This commit fixes the issue where conjuncts from the On-clause of an
anti join are not assigned to the anti join.

Change-Id: Id23f86b2979f996f46af90757b06a031855de0b8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5330
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5371
2014-11-21 12:08:21 -08:00
Alex Behm
c1d8c22862 IMPALA-1483: Substitute TupleIsNullPredicates to refer to physical analytic output.
The bug: TupleIsNullPredicates generated when substituting exprs against
outer-joined inline views containing analytic functions refer to the logical tuple
id of the analytics. These logical tuple ids are not materialized and should not be
referenced by any expr during BE evaluation, including TupleIsNullPredicates.

The fix: Substitute TupleIsNullPredicates referring to the logical analytic
output with TupleIsNullPredicates referring to the physical output.

Change-Id: I10bbd869279f01f15a83deeadc7675352c7daaf9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5317
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5362
2014-11-21 01:08:27 -08:00
Ippokratis Pandis
39e90bef8a Temp fix for IMPALA-1488: disable spilling hash tables with matches in right joins.
Right joins (right outer, right semi, right anti, and full outer) depend on the matched
flag for the build side, information stored in the hash tables. Thus, in right joins
if we spill a hash table that has matches, then we are going to lose this information
and return wrong results.

This patch adds a flag in the hash table which is set in case of right joins that had
at least one match. Then, the SpillPartition() algorithm won't spill partitions that
had hash table matches. If there are no partitions to spill the query will gracefully
fail with OOM.
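
The spill policy described above can be sketched as follows. This is a hedged Python illustration, not the actual SpillPartition() code: the `Partition` fields are hypothetical, and picking the largest eligible partition is an assumption made only to give the sketch a concrete choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    name: str
    num_bytes: int
    has_matches: bool = False  # set once a right join finds a match
    spilled: bool = False


def pick_partition_to_spill(partitions: List[Partition]) -> Partition:
    """Pick a partition to spill, never one whose hash table has matches."""
    candidates = [p for p in partitions if not p.spilled and not p.has_matches]
    if not candidates:
        # Nothing can be spilled safely, so fail rather than lose match flags.
        raise MemoryError("no spillable partition: remaining hash tables have matches")
    # Assumption for this sketch: spill the largest eligible partition.
    victim = max(candidates, key=lambda p: p.num_bytes)
    victim.spilled = True
    return victim
```

The trade-off is explicit: a query that could otherwise have spilled may now fail with an out-of-memory error, but it can no longer return wrong results by discarding matched flags.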

Change-Id: I736400768529019bb10c2541de552d958eb90044
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5306
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5335
2014-11-20 13:58:44 -08:00
Ippokratis Pandis
87502f829c IMPALA-1471: Bug in spilling of PHJ that was affecting left anti and outer joins.
In cases where we had to spill the probe side of PHJs, we were not only appending
the probe row to the tuple stream to be spilled, but we were also getting into the
regular processing loop with the iterator set to End(). In the case of left anti
and left outer joins, the result was to incorrectly output this row, since it did not
have a match.

This bug had a small perf impact for all spilling joins because we were doing an
unnecessary loop for each probe row we had to spill.

This patch solves the problem by immediately going to the next probe row if the
current row is spilled. Additionally, it fixes a bug in the block mgr where one
code path did not correctly count the number of pinned buffers.
It also adds tpch-q21 in the set of queries to run in the spilling test.
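
The control-flow change can be illustrated with a small Python sketch of a left outer join probe loop. It is a simplified model under stated assumptions: the dict-based hash table, the `is_spilled` callback, and the tuple output format are hypothetical, and the real operator works on row batches and tuple streams.

```python
from typing import Callable, Dict, List, Optional, Tuple


def probe_left_outer(
    probe_keys: List[int],
    hash_table: Dict[int, List[str]],
    is_spilled: Callable[[int], bool],
) -> Tuple[List[Tuple[int, Optional[str]]], List[int]]:
    """Return (output rows, probe rows deferred to the spilled stream)."""
    output: List[Tuple[int, Optional[str]]] = []
    deferred: List[int] = []
    for key in probe_keys:
        if is_spilled(key):
            deferred.append(key)
            # Move straight to the next probe row. Falling through here would
            # find no match and wrongly emit (key, None) for a row whose
            # partition has simply not been processed yet.
            continue
        matches = hash_table.get(key, [])
        if matches:
            output.extend((key, match) for match in matches)
        else:
            output.append((key, None))  # genuine non-match for a left outer join
    return output, deferred
```

Skipping the regular loop for spilled rows also avoids the extra per-row work mentioned above.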

Change-Id: I762f5c41fe468e4485a4b31dabe2e53f6b49ae24
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5313
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5334
2014-11-20 02:21:14 -08:00
Victor Bittorf
4339133887 Adding SEQUENCEFILE compressed record format
Currently we do not support per record compression for SEQUENCEFILE; we do support no
compression and block compression. Per record compression is typically very slow
(since the compressor is invoked per record in the table) and not widely used.

We chose to add support for per record compression as part of our effort to use Impala
for all of our testdata loading infrastructure. We have per record compressed tables
in testdata, so even though there is no customer demand for per record compression,
we need it to migrate our data loading off of Hive.

Change-Id: I6ea98ae0d31cceff8236b4b006c3a9fc00f64131
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5302
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f62a76f8d00b8dbc2846deb36ee5f65031ad846e)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5322
2014-11-19 17:21:36 -08:00
Nong Li
fa774bfb85 IMPALA-1392: Fix crash from UDFs that throw exceptions.
Change-Id: Ic8775d6344aba9655511f99c0a1760e8e148d0cf
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5243
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-11-17 15:03:14 -08:00
casey
24ce8cfada IMPALA-1456: Hive UDFs with String args would crash impalad
The wrong buffer was being used.

Change-Id: I18bf9040eaeda871d1d0baee2e276749a3a38615
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5185
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
2014-11-17 15:02:30 -08:00
casey
4915ea4ac9 IMPALA-1134: Use copyBytes() to get value from o.a.h.io.Text
This affects java UDFs. Previously it was possible that the length of
the string returned from a java udf didn't match the actual data. Per the
Text.getBytes() documentation "... only data up to getLength() is
valid.". Impala just needs to use copyBytes() which is a convenience
function for this situation. The same should be done for BytesWritable.

Before:

Query: select length(echo('12345678901234567890'))
+-------------------------------------------+
| length(java.echo('12345678901234567890')) |
+-------------------------------------------+
| 22                                        |
+-------------------------------------------+

After:

Query: select length(echo('12345678901234567890'))
+-------------------------------------------------+
| length(functional.echo('12345678901234567890')) |
+-------------------------------------------------+
| 20                                              |
+-------------------------------------------------+
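
The length mismatch can be reproduced with a toy Python model of a reusable byte buffer whose backing array may be longer than its valid length. The class below is a hypothetical stand-in for illustration only; it is not the Hadoop Text implementation, and its method names merely mirror the ones quoted above.

```python
class TextLike:
    """Toy reusable buffer: only the first get_length() bytes are valid."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._length = 0

    def set(self, data: bytes) -> None:
        if len(data) > len(self._buf):
            self._buf = bytearray(data)  # grow to fit the new value
        else:
            self._buf[:len(data)] = data  # reuse the buffer; stale tail remains
        self._length = len(data)

    def get_length(self) -> int:
        return self._length

    def get_bytes(self) -> bytes:
        # Returns the whole backing buffer, including any stale bytes.
        return bytes(self._buf)

    def copy_bytes(self) -> bytes:
        # Returns only the valid prefix.
        return bytes(self._buf[:self._length])


text = TextLike()
text.set(b"x" * 22)                   # an earlier, longer value
text.set(b"12345678901234567890")     # the 20-byte value we care about
```

Here `len(text.get_bytes())` is 22 while `len(text.copy_bytes())` is 20, which is the same kind of discrepancy as in the Before and After output.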

Change-Id: If9671278df8abf7529d3bc470c5f9d037ac3da1b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4897
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
2014-11-17 15:02:24 -08:00
Victor Bittorf
3f75bd6735 Reintroduce SEQUENCEFILE writer tests
The sequence writer test had an issue with zlib on certain cluster machines, making
this a flaky test. This has passed several times locally and in private builds. This
re-enables the test because the failures could not be produced in private builds.

Change-Id: I0aeea3a2d000e711e5a84427a7b40592e1eef75b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5077
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-11-17 11:19:16 -08:00
casey
516d7483dd IMPALA-1300: Allow subqueries in UNION operands
This enables the existing subquery rewrite rules to rewrite UNION
statements. UNION rewriting is easily done by simply calling the
rewriter for each operand in the UNION. At least one TPC-DS query
requires this functionality (IMPALA-1365).

The more difficult case of a UNION within a subquery is still not
supported.

Change-Id: I7f83eed0eb8ae81565e629f09f6918a4ba86ee13
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4859
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
2014-11-17 11:19:09 -08:00
Alex Behm
7b6ecbeea5 Fix exhaustive test run: Modify test to produce identical results on HBase.
Change-Id: I7187f9aca63f61ea1686820b3cbec277240da191
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4866
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
2014-11-17 11:19:01 -08:00
Dan Hecht
4bf6a21a9e S3: Qualify DataSource paths
Impala qualifies all paths stored in the metastore except for the
DataSource jar path.  Use a qualified path here as well, which will
allow datasources to live on the non-default FS.

In CreateDataSrcStmt, use the post-analyzed qualified path rather than
the user passed string.  Then, fix CreateTableDataSrcStmt so that it
doesn't strip out the scheme://authority portion of the URI, but instead
uses the qualified path string directly.

Note that the metastore may still contain unqualified paths in
DataSource tables' properties that were generated by previous versions.
That's okay though since the backend won't assume all paths are
qualified in case other components generate (or have in the past)
metadata with unqualified paths.

Change-Id: I905d8f6a7bf1793cfccf720b6ab5dc845d7dd5fa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5201
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 86c75be01d0f5654291acdbc1c68f5a76915028c)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5239
2014-11-13 12:42:32 -08:00
Skye Wanderman-Milne
c693fbc48c Misc. diagnostic/debugging improvements
- Add number of files in table to query plan
- Add number of remote scan ranges to runtime profile
- Clean up logging in ClientCache

Change-Id: I0580fe435ac0a52548aedb4e0ffa875ce9b9dede
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5166
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-11-06 22:04:11 -08:00
Nong Li
e2d7fb6402 Some test case cleanup.
Change-Id: Ic29b7c1f5fd714a1e2cc41bf0e55c0d11c782862
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4791
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5090
Reviewed-by: Nong Li <nong@cloudera.com>
2014-11-03 22:33:08 -08:00
Matthew Jacobs
164687ad81 IMPALA-1357: Analysis of WithClause pollutes global state
The analysis of a with clause should have its own global state so the
local view(s) can be analyzed without polluting the global state of the
parent QueryStmt. This might not always matter, but in a complex query
involving a with clause that contained a subquery, re-analysis of the
WithClause after the subquery rewrite resulted in an invalid Exists
conjunct being registered in the parent analyzer's global state. The
Exists conjunct was assigned to a scan node which then failed a
pre-condition check.

Change-Id: Ib020787b2e1ff202d96fe1b92bd9740897ab32a0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4825
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 629a8652c5a290054a8e582cc5cb5768a3ee67a8)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5038
2014-10-30 16:50:00 -07:00
Martin Grund
6e0c1c26c9 IMPALA-1424: abs() function retains input type
This patch modifies the abs() built-in function so that it
retains the type of the input argument for the return type
in the same way as Postgres does.

Change-Id: I1750237b85bedbc3ce9d52330ac4d458b0aada3a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4980
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 424b359ab0a4f621f2865844c3293f2c80e0867f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4996
2014-10-28 08:07:21 -07:00
Skye Wanderman-Milne
4a722980e5 IMPALA-1401: raise MAX_PAGE_HEADER_SIZE and use scanner context to
stitch together header buffer

Change-Id: I4f33b90e845e9bef1ac929bf4ebb8e98eaff985c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4961
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c3a90183b2f03434a9604f3aa2ef6dd08c9ba97c)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4981
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-10-27 16:30:56 -07:00
Matthew Jacobs
56611601a3 IMPALA-1395: Add test case back, but commented out
Change-Id: I157db82dd016afd54a55512225e8cd6025ec161d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4943
2014-10-24 10:31:48 -07:00
Matthew Jacobs
aedf8e5fb8 IMPALA-1395: Remove slow test for IMPALA-1312 that breaks exhaustive runs
Removing the test case for IMPALA-1312 to unblock exhaustive runs. This query was
previously hitting a DCHECK failure in the BufferedTupleStream where the number of
pinned blocks wasn't being updated properly. With codegen enabled, this query took
~70sec. Without codegen, it took so long that the exhaustive runs would fail; I
found it took ~35min on my local machine.

IMPALA-1414 tracks investigating why this query is so slow.

Change-Id: I2bf8a8c51fc7ded0026e334636f9b2cc859ffdb2
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4931
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f8b7320e035549da4e4a6a99b87da97bc18be0ad)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4941
2014-10-24 03:47:45 -07:00
Martin Grund
e866765213 IMPALA-181: ORDER BY with Ordinals
For certain queries, ORDER BY with ordinals would not work
properly. This is the case for all "select * " type of queries. Until
now, the ordinal substitution was based on the values from the select
list. However, these expressions are not expanded in case of "*";
rather, the list of result expressions and column labels is filled.

This patch simply changes the lookup of the expression from the select
list to the result list because only ordinals from the result can be
used as a sorting field.
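
The difference between the two lookups can be shown with a short Python sketch. It assumes expressions are plain strings and that "*" expands to the table's columns in order; the function names are hypothetical and this is not the frontend's actual code.

```python
from typing import List


def expand_select_list(select_list: List[str], table_columns: List[str]) -> List[str]:
    """Build the result expression list, expanding '*' into table columns."""
    result: List[str] = []
    for item in select_list:
        if item == "*":
            result.extend(table_columns)
        else:
            result.append(item)
    return result


def resolve_ordinal(ordinal: int, exprs: List[str]) -> str:
    """Map a 1-based ORDER BY ordinal to an expression in the given list."""
    if not 1 <= ordinal <= len(exprs):
        raise ValueError(f"ORDER BY ordinal {ordinal} is out of range 1..{len(exprs)}")
    return exprs[ordinal - 1]


select_list = ["*"]
result_exprs = expand_select_list(select_list, ["id", "name", "ts"])
```

For `select * ... order by 2`, resolving against `result_exprs` yields `name`, whereas resolving against the unexpanded `select_list` fails because it holds a single "*" entry.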

Change-Id: I21d3c3da837307cae04f8a4be02ca31bdcfbcbdb
(cherry picked from commit 1b62c08552c19f1b0c2220d1568804e2eba7efac)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4920
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2014-10-22 15:19:09 -07:00
Nong Li
86aebc7f8f IMPALA-1348: Fix NAAJ where the null partitions have streams with multiple blocks.
Change-Id: I892f3435814bd4fcddeb496017dbb60704f13419
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4728
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-10-14 12:01:53 -07:00
Henry Robinson
b6e91905ed IMPALA-1384: Fix show table stats test on exhaustive test run
Change-Id: I2f1033bc078906ce72a19099f214ab4e3cd9a936
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4824
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 0ead02755b6a65d408bed59df810114e26c0c397)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4830
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-10-11 22:46:05 -07:00
ishaan
23964c19af [CDH5] Fix bad merge in spilling.test
Change-Id: Ia6e30cf5916c737088d8cb969e0167b9d69a599e
2014-10-08 23:19:02 -07:00
Nong Li
5845a02b6e IMPALA-1351: Update NAAJ stream to use io sized buffers and better error handling.
Since we only make one NULL-aware stream per NAAJ (as opposed to one per partition),
we do not care about the memory footprint on this tuple stream. For simplicity,
this will always use io-sized buffers.
Also, improving error handling in PHJ::ProcessProbeBatch(), as status_ was not being
set properly.

Disabling the regression test for this bug, as it takes too long to run. Need to find
a simpler query.

Change-Id: I7572f607199f38b1bc30ae208ece2832522342a1
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4770
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins

Conflicts:

	be/src/exec/partitioned-hash-join-node.cc
2014-10-07 16:52:05 -07:00
Nong Li
a2e7b05bb1 IMPALA-1332: Fix memory leak for FULL OUTER/RIGHT OUTER joins.
This can happen if not all rows are returned.

Change-Id: I4d54641b71c44faa85a2138d16f9dda1052317b5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4737
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-10-06 19:49:56 -07:00
Matthew Jacobs
652d4b4699 IMPALA-1234: Fix bugs when producing EmptySetNode
Fixes two issues that can occur when generating the plan for a
stmt with an empty result set (e.g. due to limit 0 or constant
predicates that evaluate to false):
 1) Unions with an inline view that produces an empty result set
    do not create the EmptySetNode for the correct stmt.
 2) An EmptySetNode may contain non-materialized tuples which
    will fail a precondition check when generating the thrift
    plan.

Change-Id: I1511c755be3a59fdb8934624fd08250323266d27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4744
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-10-06 19:49:50 -07:00
Skye Wanderman-Milne
b6204dff59 IMPALA-1340: removing implicit casts during expr substitution is not always safe
Union statements were sometimes losing necessary casts during
expression substitution, causing the backend union node to receive
slot refs that did not have the same types as the result tuple. Add a
flag to Expr.Substitute() to preserve the root expr types, which adds
back the casts after substitution.
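
The effect of the flag can be sketched in Python with two tiny expression types. This is a simplified model under stated assumptions: the `SlotRef` and `Cast` classes, the type strings, and the `preserve_root_type` parameter name are hypothetical, and the real Expr.Substitute() handles arbitrary expression trees.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class SlotRef:
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    child: SlotRef
    type: str


Expr = Union[SlotRef, Cast]


def substitute(expr: Expr, smap: Dict[SlotRef, SlotRef],
               preserve_root_type: bool = False) -> Expr:
    """Substitute slot refs; optionally restore the root expr's type."""
    root_type = expr.type
    # Substitution strips the implicit cast and replaces the underlying slot.
    inner = expr.child if isinstance(expr, Cast) else expr
    replaced = smap.get(inner, inner)
    if preserve_root_type and replaced.type != root_type:
        # Add the cast back so the consumer sees the type it expects.
        return Cast(replaced, root_type)
    return replaced


a = SlotRef("a", "INT")
b = SlotRef("b", "INT")
union_operand = Cast(a, "BIGINT")
```

Without the flag, `substitute(union_operand, {a: b})` returns the INT slot `b`; with `preserve_root_type=True` it returns `Cast(b, "BIGINT")`, matching the result tuple's type.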

Currently only the union node sets this flag to true, but there may be
other places that are incorrect.

Change-Id: I1b4d9846860ef9694ff0c089f79654b1746d687d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4777
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-10-06 17:47:37 -07:00
Nong Li
de31fa8e21 Disable spilling tests that are too flaky.
Change-Id: I4ac877c3fa8297d873c67f219bb0c75f0001562d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4731
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-10-06 15:18:56 -07:00
Alex Behm
3e7de9f304 IMPALA-1318: Joins should not return semi-joined tuples.
Change-Id: I93f5ddb8317af7794b5977e145805f9ff498d722
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4633
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-10-06 15:17:22 -07:00
Henry Robinson
6af7c8fe4a IMPALA-1330: Fix column types for SHOW {table, partition} STATS
Because we add 'total' to the last row in SHOW PARTITIONS, we set the
partition key columns to be string. At least, that's what the comment
said, but we didn't do that in fact.

This patch also corrects the column type for max width, which should be INT.

Change-Id: I787ab17be27f45107340119017e528c58a3daad3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4678
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-10-06 15:16:56 -07:00
Victor Bittorf
7b244d34b6 IMPALA-1344: Fixed analytic aggregations with CHAR
The fix is to only register aggregates for STRING, not for CHAR or VARCHAR. The CHAR
and VARCHAR types are implicitly cast to STRING for aggregation.

Also, fixed aggregate fn builtins that should not ignore distinct.

Change-Id: If4c1a2c6127360c2c8127a5c02949df74fafc85a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4717
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:16:50 -07:00
Victor Bittorf
a62500ee28 Changed CHAR & VARCHAR max length to match Hive.
Also modified the text of the analysis exception for lengths that are too long or
short because John said they were unclear.

Change-Id: I9427d5c39298aa8207672e50e10fe527c5076599
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4698
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:16:45 -07:00
Victor Bittorf
c29ed3761e IMPALA-1339: NULLs incorrectly hashed in groupby
Problem: hash table assumed all raw values were at most 16 bytes. This maximum was
increased to support up to 128 bytes for CHARs.

Change-Id: I107c58b9a013d5db46ff5586bcdceee3961346e9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4701
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:16:36 -07:00
Nong Li
e08ffde009 PA/PHJ: Increase fanout to 32 and fix interaction with small buffers.
Small buffers introduced an issue that is exacerbated by the large fanout. A stream can
only be appended to forever once it has grabbed the initial io sized buffer. With small
buffers, we don't grab that at the beginning anymore and, before this patch, it is
grabbed when the stream first needs it. This means when one stream needs it, another
stream could have already grabbed it (meaning this stream is pinned with multiple
buffers).

This patch has all the streams grab an IO buffer as soon as the first stream needs an
io buffer. This guarantees that all streams get 1 before any get 2.

Change-Id: I1be1219fc5f1fa3ceedd4d5e76ae056c8bb8ff3d
2014-10-06 15:16:16 -07:00
Victor Bittorf
d5fd59e2ed IMPALA-1337: Aggregation failures for VARCHAR
The issue is that the aggregation node needed to use IsVarLen; previously
it assumed TYPE_STRING was the only variable length type.

Change-Id: I9545e8d405937a47b25c9042f97854851a448c6e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4690
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:14:51 -07:00
Victor Bittorf
f4626b03e6 IMPALA-1322: Fix related issue
There is an issue related to IMPALA-1322. The expression list when laying out memory
was being improperly indexed.

Change-Id: I2eef84a812b451d87ecb8afd304e765aff1f5a6b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4675
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:14:44 -07:00
Nong Li
3e632ef6ad Reduce min PA/PHJ mem requirement.
Update PA/PHJ to use small (< io sized) buffers initially. Without this we would
not be able to run at the QPS that we need just due to the buffering requirements
of these operators.

Change-Id: Ic8a777d147893567c9590fbab17f561eadb6ee19
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4623
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-10-06 15:14:10 -07:00
Victor Bittorf
794e70b0bd Fix CHAR/VARCHAR Aggregation
This fixes an issue where VARCHAR and CHAR could error in some aggregations.
The cause of the problem is that the BE currently does not support CHAR/VARCHAR as
arguments to aggregates; they require an implicit cast to STRING first.
The resolution is to have these operators return STRING instead of CHAR(*) or VARCHAR(*).
Note that the CHAR(*) comparisons still ignore spaces for min/max.

This takes advantage of the fact that STRING, VARCHAR(*), and CHAR(*) values are all
handled as a StringVal for exprs. The STRING aggregates are registered as CHAR(*) and
VARCHAR(*) aggregates and the front end converts the return type to a STRING in all cases.

Also includes a fix for a TODO about casting between CHAR and VARCHAR.

Change-Id: I1d3a9cc48e426286ce63677324a8c680e67b005a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4573
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:13:17 -07:00
Victor Bittorf
fa502f973a IMPALA-1319: Fixed CHAR padding for numeric casts
IMPALA-1322: Crash on VARCHAR/CHAR join

Fixed 3 issues:
  (1) Disabled codegen for CHAR in hash join equality
  (2) Fixed memory layout for CHAR
  (3) Fixed a regression where space padding could be dropped for numeric casts.

Change-Id: I6475fd527ca0d67c7d4d5ec7e561549e43fbc336
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4640
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:44 -07:00
Skye Wanderman-Milne
0db2181d97 IMPALA-1326: fix bug in BufferedTupleStream::GetTupleRow()
Change-Id: If133a2041e0bae0c327fe83b114e36b9320784bb
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4658
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-10-06 15:12:32 -07:00
Henry Robinson
080299730c IMPALA-1298: Add var_{pop,samp} as aliases for variance_{pop,samp}
Change-Id: I5880ad7ebf0775704ee7fa08685928224e316458
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4656
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:25 -07:00
Nong Li
a1b2de9c95 Update distinctpc/pcsa to return bigint.
Change-Id: Iac3414aa0151f52ba9ec028da152b09fc09af264
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4637
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:12 -07:00
Matthew Jacobs
8b1b8f5780 IMPALA-1302: Incorrect result of FIRST_VALUE query
FIRST_VALUE with row offsets preceding did not produce the correct
results. This fix changes the rewrite for FIRST_VALUE and adds
additional handling for NULLs in the backend.

Change-Id: I03d54c05f63f46e9adb467008fa876ab33812c7b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4648
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:03 -07:00