Commit Graph

330 Commits

Author SHA1 Message Date
Dimitris Tsirogiannis
2ab66c4ca2 Add support for uncorrelated EXISTS subqueries
This commit adds support for uncorrelated EXISTS subqueries in Impala.
Uncorrelated EXISTS subqueries are rewritten using a CROSS JOIN.
Uncorrelated NOT EXISTS subqueries are not supported.
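
A rough illustration of the rewrite, using SQLite from Python rather than Impala. The tables `t` and `s`, the inline-view alias `v`, and the `LIMIT 1` inside the view are assumptions made for this sketch; the commit message only states that a CROSS JOIN is used.

```python
import sqlite3


def run_both(s_rows):
    """Return (EXISTS result, CROSS JOIN result) for a given content of s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.execute("CREATE TABLE s (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO s VALUES (?)", [(r,) for r in s_rows])

    original = conn.execute(
        "SELECT t.id FROM t WHERE EXISTS (SELECT 1 FROM s) ORDER BY t.id"
    ).fetchall()

    # Assumed shape of the rewrite: the subquery becomes an inline view that
    # yields at most one row. The cross join then keeps every row of t exactly
    # once when s is non-empty and drops all of them when s is empty.
    rewritten = conn.execute(
        "SELECT t.id FROM t CROSS JOIN (SELECT 1 AS one FROM s LIMIT 1) v "
        "ORDER BY t.id"
    ).fetchall()
    conn.close()
    return original, rewritten
```

Both forms agree whether `s` is empty or not. A cross join cannot keep rows when the inline view is empty, which may be why the NOT EXISTS case is left unsupported here.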

Change-Id: I0003dcdc0fa5cc99931b9a9f4deddbcd42572490
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4140
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4186
2014-09-05 12:36:18 -07:00
Alex Behm
321b9b0804 IMPALA-1148: Do not generate a sort node if the sort tuple has no materialized slots.
Change-Id: If9d55b54a8305798ab68470a4a698d95ef92ce7a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4176
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4184
2014-09-05 10:12:55 -07:00
Matthew Jacobs
5bf1c1f223 Analytic Functions: Add rank() and dense_rank()
Adds the rank() and dense_rank() analytic functions and makes internal
changes to the AggFnEvaluator that are necessary to support calling
Finalize() repeatedly (as the AnalyticEvalNode does) on UDAs that destroy
state in Finalize().

Rank requires both the current rank and the count at that rank in order to
determine the next rank, so the intermediate state is a StringVal containing
a struct with these two fields.
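
A minimal Python sketch of that two-field state. The names `RankState`, `update`, `get_value` and `advance` are hypothetical, and the driver loop stands in for the AnalyticEvalNode; the real state lives in a StringVal in the backend.

```python
from dataclasses import dataclass


@dataclass
class RankState:
    rank: int = 1    # rank handed out to the current group of equal keys
    count: int = 0   # number of rows seen so far at that rank


def update(state):
    state.count += 1


def get_value(state):
    return state.rank


def advance(state):
    # The next rank skips over every row that shared the current rank.
    state.rank += state.count
    state.count = 0


def rank(sorted_keys):
    """rank() over one partition whose ORDER BY keys are already sorted."""
    state = RankState()
    out = []
    prev = None
    for i, key in enumerate(sorted_keys):
        if i > 0 and key != prev:
            advance(state)
        update(state)
        out.append(get_value(state))
        prev = key
    return out
```

For the keys `[10, 10, 20, 30, 30, 30, 40]` this yields `[1, 1, 3, 4, 4, 4, 7]`. A dense_rank() variant would only need `advance` to add 1 instead of `count`.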

Aggregate functions (internally only, for now) can expose a GetValue() method
which takes an intermediate value and returns a final value without destroying
the intermediate state. Finalize() is then used to clean up intermediate
state, if necessary.

This also adds a second optional, internal-only function for UDAs to allow
removing values from intermediate state: Remove(). This will be required for
implementing sliding windows later but is added here because the change is
nearly identical to that for adding GetValue().
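
The split between GetValue(), Finalize() and Remove() might be sketched as follows. `SumAgg` and `sliding_sum` are toy names invented for this illustration; only the method names follow the commit message, and the sliding window is shown just to motivate Remove().

```python
class SumAgg:
    """Toy aggregate; not Impala's UDA interface."""

    def init(self):
        return {"sum": 0}

    def update(self, state, value):
        state["sum"] += value

    def remove(self, state, value):
        # Undo an earlier update(), e.g. when a row leaves a sliding window.
        state["sum"] -= value

    def get_value(self, state):
        # Read the result without touching the state, so it can be repeated.
        return state["sum"]

    def finalize(self, state):
        # Return the result and release the state; it is unusable afterwards.
        result = state["sum"]
        state.clear()
        return result


def sliding_sum(values, width):
    """Sum over the last `width` rows, one result per input row."""
    agg = SumAgg()
    state = agg.init()
    out = []
    for i, v in enumerate(values):
        agg.update(state, v)
        if i >= width:
            agg.remove(state, values[i - width])
        out.append(agg.get_value(state))
    agg.finalize(state)  # clean up once, at the very end
    return out
```

`sliding_sum([1, 2, 3, 4, 5], 2)` returns `[1, 3, 5, 7, 9]`; the same state object serves every row because `get_value` never destroys it.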

Some cleanup in the AnalyticEvalNode: most notably, we avoid allocating tuples
to DeepCopy prev_input_row_ between input batches. Instead, we keep the last
two child row batches because the prev child row batch owns the resources for
prev_input_row_.

Change-Id: I5a30eb517a38d369fe63f7af91904a4b9786fadc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3962
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 137bb45d81ea57655aefbf5cde0cbeab0121b8f0)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4183
2014-09-05 02:15:42 -07:00
Marcel Kornacker
b68b6dedc1 Planning and grouping of multiple analytic exprs from a select block.
This patch adds support for:
- Planning of multiple analytic exprs from a select block
- Simple grouping of analytic exprs by partition/order/window
  to reduce data exchanges and sorts

Change-Id: Ie2162558b2bc2e6218c30e694393e85cbf3251ff
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4120
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4168
2014-09-04 13:44:33 -07:00
Matthew Jacobs
c322e9590d Analytic functions: Initial BE support for ROWS windows
Adds support in the AnalyticEvalNode for ROWS windows with the start
boundary UNBOUNDED PRECEDING, i.e. the end boundary can specify an
offset or CURRENT ROW.

To reduce complexity where we maintain windows and determine when output
results can be produced (ProcessInputBatch), the logic that depends on
the window is factored into several functions. The core functionality
remains the same: for every input row, produce output results if possible,
update the analytic functions, and add the row to the input_stream_ to be
returned later when enough results are available. The functions
TryFinalizePrevRow, TryFinalizeCurrentRow, and InitializeNewPartition
are now called and handle the various window types appropriately.
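
The per-row flow described above (produce results if possible, update the function, buffer the row) could look like this for a running SUM over a single partition. `running_sum_rows` and its `end_offset` parameter are hypothetical; the sketch only covers an end boundary of CURRENT ROW (`end_offset=0`) or N FOLLOWING, not N PRECEDING.

```python
from collections import deque


def running_sum_rows(values, end_offset=0):
    """SUM over ROWS BETWEEN UNBOUNDED PRECEDING AND end_offset FOLLOWING."""
    pending = deque()  # rows buffered until their result is known
    total = 0
    out = []
    for i, v in enumerate(values):
        total += v          # update the analytic function with the new row
        pending.append(v)   # buffer the row so it can be returned later
        if i >= end_offset:
            # The oldest buffered row now has its complete window available.
            out.append((pending.popleft(), total))
    # End of partition: the remaining rows see everything up to the last row.
    while pending:
        out.append((pending.popleft(), total))
    return out
```

With `end_offset=1`, the result for a row is only emitted after the following row has been consumed, which is why the rows have to be buffered.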

Change-Id: I36cf76bf11d9e8b48d2556169683abcb43c1db7a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4073
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 421a032035fcb13e03f8e7d34b4908f1221fd9f5)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4163
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-09-04 00:46:23 -07:00
Nong Li
8fbd5fe2c9 PHJ memory transfer fixes and misc bug fixes.
Row batches contain auxiliary memory that can reside in tuple pools, io buffers and
now tuple streams. Like the other resources, these need to be attached to row batches
and transferred up the operator tree to make sure the tuple ptrs are always valid.

Fixed bug in BufferedTupleStream to not delete blocks on read if it is pinned.

Fixed PHJ bug with row batch boundaries causing current_probe_row_ to be NULL.

Change-Id: I4c66d9961a117bfe3ed577de6170e875ea1feee7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3983
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4157
2014-09-03 20:12:24 -07:00
Dimitris Tsirogiannis
d9fa1a2e60 Fix issue where subqueries return wrong results in the presence of distinct

This commit fixes two subquery issues:
1. During the rewrite of aggregate subqueries with count, a new select
list is created for the outer select block to eliminate new visible
tuples. However, the new select list was not initialized correctly,
causing distinct clauses to not be preserved.
2. Pushing negation to operands during a query rewrite was causing a
StackOverflowError when it was encountering predicates for which a
negate function is not implemented. Consequently, it was using the
negate function from the parent class causing it to recurse infinitely.

Change-Id: I6f1b8090af40fa55b13661d637f9aaaa00dfcf5c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4115
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4141
2014-09-03 12:25:59 -07:00
Alex Leblang
2a59029c2c [CDH5] IMPALA-1147: Updated compatibility tests run with Hive .13
Change-Id: I3947d0d8eb9ad5a7cb0248c0e8b512cc6e059a4f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4114
Reviewed-by: Alex Leblang <alex.leblang@cloudera.com>
Tested-by: jenkins
2014-09-02 15:25:54 -07:00
Dimitris Tsirogiannis
c2abcd6f3d Query transformation of nested queries.
This commit implements nested queries with [NOT] IN, [NOT] EXISTS and
aggregate subquery predicates in Impala. The following cases are
supported:
1. Correlated and uncorrelated [NOT] IN.
2. Correlated [NOT] EXISTS.
3. Correlated and uncorrelated aggregate subqueries.

Change-Id: Ia3f4843c5f07d4e31ef3faedc58a15e623f91a5d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3754
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4109
2014-08-29 15:35:21 -07:00
Alex Behm
e89ba550c9 Support aggregate functions with different intermediate and output types.
As a proof-of-concept, this patch implements avg() with a STRING intermediate
type, and changes variance() to output a DOUBLE.
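
As a sketch of what a differing intermediate and output type means for avg(): the intermediate state below is an opaque byte string holding a sum and a count, while the final result is a float. The byte layout `"<dq"` and the function names are assumptions for illustration, not the layout used by the patch.

```python
import struct

_FMT = "<dq"  # assumed layout: 8-byte double sum, 8-byte integer count


def avg_init():
    return struct.pack(_FMT, 0.0, 0)


def avg_update(state, value):
    total, count = struct.unpack(_FMT, state)
    return struct.pack(_FMT, total + value, count + 1)


def avg_merge(a, b):
    # Combine two partial states, e.g. from two nodes of a distributed plan.
    ta, ca = struct.unpack(_FMT, a)
    tb, cb = struct.unpack(_FMT, b)
    return struct.pack(_FMT, ta + tb, ca + cb)


def avg_finalize(state):
    # Output type differs from the intermediate type: bytes in, float out.
    total, count = struct.unpack(_FMT, state)
    return total / count if count else None
```

Merging the partial states for `[1, 2]` and `[3, 4]` and finalizing gives `2.5`.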

I tested this change on single-node and distributed plans, with the
partitioned as well as the old aggregation node.

This patch leaves several things for follow-on changes:
- plumb through CHAR as an intermediate type
- modify other builtin aggregates to use appropriate output/intermediate types
- allow analytic functions to have different output/intermediate types

Change-Id: I8d3396201cb370f44660ab4f7fe10216129abd09
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4016
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4079
2014-08-28 21:33:34 -07:00
Matthew Jacobs
b6cfe1af41 Few small fixes for analytic functions
1) Fix memory use-after-free in AnalyticEvalNode:
   current_tuple_ cannot be allocated from the output_tuple_pool_ which
   occasionally transfers resources to the output row batch pool because
   the same tuple is reused.
2) Analysis should allow windows with UNBOUNDED PRECEDING to X PRECEDING
   and X FOLLOWING to UNBOUNDED FOLLOWING.
3) Fix a few bugs in the distributed planning.
4) Adds a few more tests and allows running the tests with the distributed
   plans.

Change-Id: I6bdc1e35b3d30b6e1e50ca85d78b75ef70469de5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4022
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
(cherry picked from commit 788b027439a03a1cc3378ff0191487577608e8b7)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4068
2014-08-27 18:03:22 -07:00
Victor Bittorf
2dce31f6c2 Adding VARCHAR front & backend.
VARCHAR is treated as StringVal in the backend. All UDAs and UDFs which accept STRING
will also accept VARCHAR(N).

TODO: Reverted Avro codegen to fix Jenkins; needs separate patch.

Change-Id: Ifc120b6f0fe1f996b11a48b134d339ad3719331e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/2527
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 3fcbf4f677b8e26c37eded4d8bb628e6fc53c1e9)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4058
2014-08-27 13:52:58 -07:00
Victor Bittorf
820e1c070b Support writing to Avro files
Introduces support for writing tables stored as Avro files. This supports writing all
data types except TIMESTAMP. Supports the following COMPRESSION_CODECs: NONE, DEFLATE,
SNAPPY.

Change-Id: Ica62063a4f172533c30dd1e8b0a11856da452467
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3863
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 15c6066d05d5077bee0d5123d26777b0715eb9c6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4056
2014-08-27 13:41:42 -07:00
Dan Hecht
bc124b460a IMPALA-883: COMPUTE STATS returns -1 for number of rows in empty partition.
The query used to generate the stats does a GROUP BY on the partition keys,
and so empty partitions will not get any results.  Detect the empty partition
case and set the number of rows to 0.
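
The effect can be sketched in a few lines of Python. `partition_row_counts` and the `part` key are hypothetical names; the `Counter` plays the role of the GROUP BY query, which never returns a row for an empty partition.

```python
from collections import Counter


def partition_row_counts(all_partitions, rows):
    # Like GROUP BY on the partition key: only non-empty partitions show up.
    grouped = Counter(row["part"] for row in rows)
    # Partitions missing from the grouped result are empty, so report 0.
    return {p: grouped.get(p, 0) for p in all_partitions}
```

For partitions `[2013, 2014, 2015]` with rows only in 2013 and 2015, the 2014 entry comes out as 0 instead of being left at an "unknown" marker.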

Change-Id: I1ccb7d2016f35026aa1b418155c4534024f3cee5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4029
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 128a02f508cdb280b53b8a8429e6b90491e43956)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4042
2014-08-26 13:48:07 -07:00
Matthew Jacobs
1515fd9db2 Analytic functions BE implementation
Evaluates analytic functions with a single pass over sorted input rows and using
a BufferedTupleStream to buffer output rows. It is assumed that the input has
already been sorted on all of the partition keys and then the order by exprs.
Analytic functions are implemented as aggregate functions.

Current implementation only supports partition clauses and order by clauses with
the default window (i.e. UNBOUNDED PRECEDING to CURRENT ROW).

Change-Id: I93f37a4e7fd8167261bf86c2a5b7c8569a1f7b11
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3939
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit af7703841d682c4b24fdc2f41b4b4655037475e6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4015
2014-08-23 17:12:19 -07:00
Ippokratis Pandis
e21987e338 Bug fix in PHJ, addresses also IMPALA-1160
In PHJ, we have to reset hash_tbl_iterator_ before probing a new batch.
Adds regression test for IMPALA-1160.

Change-Id: I608280815de2c5c1e334b7d2b4a50b12bf1d9096
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3968
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3995
2014-08-22 01:51:34 -07:00
Alex Behm
7f51449869 Rename ANTI JOIN to LEFT ANTI JOIN for consistency with LEFT SEMI JOIN.
Change-Id: I8171b2d44b45529fdbd040d5709aaeb9f13facfa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-08-17 12:46:10 -07:00
Alex Behm
bceeb834f3 IMPALA-677: Fix visibility of semi and anti-joined table references.
Semi or anti-joined table references are now only visible inside the
On-clause of the corresponding join.

Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-08-17 12:45:45 -07:00
Victor Bittorf
f2ef06bef1 SEQUENCEFILE: Add support for writing sequence files.
This supports both uncompressed and block compressed formats. Row compressed formats are
not supported. The type of compression is specified using a query parameter
COMPRESSION_CODEC with values NONE, GZIP, BZIP2, and SNAPPY.

Note: this patch only has basic testing. More extensive testing will be done when this
writer is used in data loading.

Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-08-17 12:45:10 -07:00
Skye Wanderman-Milne
559b83d3d0 Expr refactoring
This patch changes the interface for evaluating expressions, in order
to allow for thread-safe expression evaluations and easier
codegen. Thread safety is achieved via the ExprContext class, a
light-weight container for expression tree evaluation state. Codegen
is easier because more expressions can be cross-compiled to IR.

See expr.h and expr-context.h for an overview of the API
changes. See sort-exec-exprs.cc for a simple example of the new
interface and hdfs-scanner.cc for a more complicated example.

This patch has not been completely code reviewed and may need further
cleanup/stylistic work, as well as additional perf work.

Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:44 -07:00
Skye Wanderman-Milne
7a0cc27fd1 Convert math functions to the UDF interface.
Also adds FunctionContext::GetNumArgs() method to the public UDF API.

Change-Id: I76e21814e423f075a0a22b4e924c1d3ec26daba7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3410
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:32 -07:00
Nong Li
fd35cee887 Reorganize/reduce end to end test time.
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.

2) Reduce the test vectors for query_tests. We should have identical coverage in
the daily exhaustive runs but the normal runs should be much faster. In particular,
this deemphasizes scanner tests since that code is more stable now.

3) Misc test cleanup/consolidate python test files/etc.

Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-08-17 12:43:57 -07:00
anusha
901f3504cc Performance improvements: Alter table add partition and drop partition
This patch improves the performance of the DDL queries
    "Alter table add partition" and "Alter table drop partition"
    as the number of partitions is scaled up.

    The issue was that every time a partition was added or dropped,
    the entire block metadata for that table was reloaded. This
    operation was highly expensive especially as the number
    of partitions became larger.

    This patch handles this by adding/dropping only the added/dropped
    partition's metadata to the hdfsTable (adding/dropping it to/from
    the internal partition list), and incrementally updating the
    corresponding data structures instead of refreshing them from scratch.

    The following time improvements were observed:

    Existing partitions   Add/drop time (before)   Add/drop time (now)

                      1                    1.02s                 1.02s
                     10                    0.27s                 0.27s
                    100                    0.14s                 0.14s
                    500                    0.35s                 0.35s
                   1000                    0.91s                 0.51s
                  10000                   11.72s                 0.85s
                  20000                   21.92s                 0.87s

Out of this total time (for the worst case), around 0.50s is spent in
adding the partition to and dropping it from the Hive metastore, and the
rest of the time is spent in updating the catalog.

Change-Id: I359ab0af921543c0fdcb975c14b05f80f93fe803
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3291
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Tested-by: jenkins
2014-08-17 12:43:23 -07:00
Alex Behm
68592a82a3 IMPALA-1021: Fix loading of views with decimal and complex-typed columns.
Change-Id: I8b63c31be47dd64f1e13fb29be3105b0f7e245dc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3820
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-08-17 12:26:54 -07:00
Ippokratis Pandis
3ee273ae50 Adding support for {anti,left semi,left outer} joins to the partitioned hash join implementation.
Adding the "anti join" keyword in the frontend and the corresponding backend paths for the
partitioned hash join implementation.  Adding some basic testing for this new join (the
other types already have tests).

Also, fixing a bug in the tuple stream when it was handling strings.

Change-Id: Ied8cff96b2bca284a5f66f7d11df5c5b5ec789cc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3805
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-08-17 12:24:17 -07:00
Lenni Kuff
cd30246f17 Fix flaky group_concat() query tests
The ordering of results returned by the group_concat() tests was not deterministic. This
fixes the problem by switching the test cases to use a subquery with an order by.

Also fixed a similar problem with the limit and union tests.

Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-08-12 13:38:12 -07:00
Lenni Kuff
286e312460 [CDH5] Minor code changes for Hive .13 support
Changes include:
* Fix compile errors due to new column stats API and other stats related
  fixes.
* Temporarily disable JDBC tests due to new serialization format in Hive .13
* Disable view compatibility tests until we can get them to work in Hive .13
* Test fixes due to Hive's type checking for partition column values

Change-Id: I05cc6a95976e0e037be79d91bc330a06d2fdc46c
2014-08-11 09:53:02 -07:00
Alex Behm
3111827ae2 IMPALA-1101: Plan sub-trees with no results are implemented by an EmptySetNode.
Before: Constant conjuncts used to be registered in the analyzer together with
non-constant conjuncts. Since constant conjuncts are not bound by any slot or
tuple they were incorrectly placed into whatever plan node called init() first
and then were incorrectly marked as assigned. For handling queries with a
limit 0 we had special code in the BE.

After: Since constant conjuncts do not fit well into the existing slot/tuple
based assignment logic this patch treats them specially as follows. Constant
conjuncts that do not originate from the ON clause of an outer join are evaluated
directly. Depending on which clause the conjunct came from either the entire
query block is marked as returning an empty set (HAVING clause) or the block
is marked as having an empty select-project-join portion (ON and WHERE clause).
In the latter case, aggregations (if any) must still be performed.
The plan sub-trees that are guaranteed to return an empty result set are
implemented by an EmptySetNode. Constant conjuncts from the ON clause of an
outer join are assigned to the node implementing the join.

Similarly, query blocks with a limit 0 are marked as returning an empty result,
and planned as an EmptySetNode.

As a side effect, this patch also fixes:
IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0
consistent with Hive's. The target table is left empty after
such an operation.

Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800
2014-08-08 04:35:31 -07:00
Nong Li
f0c7947558 IMPALA-1121: Fix joins on decimal columns with different precision/scale.
Change-Id: Ibac69763e28ad33ef41d000b5dd74fc73c74b73a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3726
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3739
Reviewed-by: Nong Li <nong@cloudera.com>
2014-08-04 01:45:40 -07:00
Alex Behm
22858ba7e1 IMPALA-1123: Add casts to the partition exprs of hash-partitioning senders.
This patch ensures that all hash-partitioning senders to a hash-partitioned
fragment hash on exprs of identical types. Casts are added as necessary.
Otherwise, the hashes generated for identical partition values may differ
among senders if the partition-expr types are not identical.
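
To see why the types matter, consider two senders hashing the same value 42, one as a 4-byte and one as an 8-byte integer. The sketch below uses `zlib.crc32` over the raw bytes as a stand-in hash; `sender_hash` and the type tags are hypothetical and say nothing about the hash function Impala actually uses.

```python
import struct
import zlib

INT32 = "<i"  # 4-byte little-endian integer
INT64 = "<q"  # 8-byte little-endian integer


def sender_hash(value, expr_type, cast_to=None):
    """Hash the bytes of `value`, optionally casting to a common type first."""
    fmt = cast_to or expr_type
    return zlib.crc32(struct.pack(fmt, value))
```

Without a cast, `sender_hash(42, INT32)` and `sender_hash(42, INT64)` hash different byte strings and disagree, so equal partition values could be routed to different receivers. Casting the narrower sender with `cast_to=INT64` makes both sides hash the same bytes.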

The new logic is placed into PlanFragment.finalize() in order to avoid
repeated re-casting of senders during plan generation, since every time
a child fragment is absorbed into a partition-compatible parent we
potentially need to add casts to all senders of that fragment again.

Change-Id: Id9f581cc03127f64f0631d9b288fab4cd4dd8a82
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3689
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3708
2014-07-31 23:57:08 -07:00
Dan Hecht
285aeda16e IMPALA-1110: group_concat agg function does not work with optional separator.
Rather than omit the first separator in each intermediate result,
always include the separator, but also remember the length of the
first separator.  Then, during finalize, remove whichever separator
string ends up at the beginning of the final merged result.
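
A small Python sketch of that scheme. The function names and the dict-based state are illustrative only; the point is that every intermediate result starts with a separator, and only finalize strips the leading one.

```python
def gc_update(state, value, sep=", "):
    if state is None:
        # Remember how long the very first separator is.
        state = {"first_sep_len": len(sep), "buf": ""}
    state["buf"] += sep + value  # always prepend the separator
    return state


def gc_merge(dst, src):
    if src is None:
        return dst
    if dst is None:
        return dict(src)
    # src's leading separator ends up in the middle, exactly where it belongs.
    dst["buf"] += src["buf"]
    return dst


def gc_finalize(state):
    if state is None:
        return None
    # Drop whichever separator ended up at the front of the merged result.
    return state["buf"][state["first_sep_len"]:]
```

Merging the partial results for `["a", "b"]` and `["c"]` and finalizing gives `"a, b, c"`; no special case is needed for the first value of each partial result.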

Change-Id: I6de7d1cda1a43b8de7d03c6798ec9667ffa457f8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3669
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c0d7cedb79fe557e22912afc716303b24a9dad0d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3690
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
2014-07-31 18:15:16 -07:00
Dan Hecht
09bd8b7c27 Fix SetStmt.toSql().
It needs to handle the "SET" case.  Also, add some missing test cases
for "SET".  Also, clean up test_set/set.test.

Change-Id: I34f6005ef17e196d94366e5301251a2987746fbf
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3620
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 41890b5a13f9429f058fb12453c78323df11fc7d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3655
2014-07-30 11:37:11 -07:00
Matthew Jacobs
8258c53478 Disable histogram UDA test for decimal vals due to IMPALA-1111
Change-Id: I21391e671896c9ebe52fd45accc2d290267ee0ac
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3641
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5a7ede02ffda4bdbbde4bc56184639fdee0a9857)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3652
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-07-30 02:53:35 -07:00
Dan Hecht
1fee56cb26 IMPALA-1080: Implement "SET <query_option>" as SQL statement.
Also add support for "SET", which returns a table of query options and
their respective values.

The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.

Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
2014-07-25 10:25:09 -07:00
Matthew Jacobs
b83aa4984b Add compute histograms aggregate function
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.
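
One possible reading of this pipeline, sketched in Python. The sampling step uses a well-known weighted reservoir scheme (draw a key `u ** (1 / weight)` per value and keep the `k` largest keys); whether the UDA uses this exact scheme is an assumption, as are the function names and the choice of bucket upper bounds as the histogram output.

```python
import heapq
import random


def weighted_reservoir_sample(values, k, weight=lambda v: 1.0, seed=0):
    rng = random.Random(seed)
    heap = []  # min-heap of (key, value); the smallest key is evicted first
    for v in values:
        key = rng.random() ** (1.0 / weight(v))
        if len(heap) < k:
            heapq.heappush(heap, (key, v))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, v))
    return [v for _, v in heap]


def equi_depth_histogram(sample, num_buckets):
    """Upper bound of each bucket, with the same number of samples per bucket."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        return []
    num_buckets = min(num_buckets, n)
    return [ordered[(i + 1) * n // num_buckets - 1] for i in range(num_buckets)]
```

When the reservoir is larger than the input, the sample is the whole column, so the values 1 to 100 with four buckets give the bounds `[25, 50, 75, 100]`. With a smaller reservoir the bounds become estimates.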

TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
  histogram').
* Expose separate finalize fn to produce samples and histogram data for stats

Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
2014-07-25 00:21:10 -07:00
Paden Tomasello
67d23c2d4b Modified Case expression tests in exprs.test
Change-Id: I65cee2e14291db8bf14a428715b08dac475b863a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3485
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3601
2014-07-24 12:34:02 -07:00
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Lenni Kuff
7157f54bbe Support DROP STATS <table name>
Adds support for dropping all table and column stats from a table. Once incremental
stats are supported, this will provide the user a way to force a recompute of all
stats.

Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551
2014-07-21 10:28:16 -07:00
Paden Tomasello
3d173e65d2 Adding Codegen function and tests for CASE expressions.
Change-Id: Ib52b3e3f12b35e2c0a60ef94501c20ef83abdfe5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3187
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3498
2014-07-18 12:03:58 -07:00
Ippokratis Pandis
e1ae5fe95a IMPALA-1068: COMPUTE STATS should place -1 in #NULLs
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.

Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
2014-07-07 15:13:25 -07:00
Matthew Jacobs
65c1a6f21e Remove SOURCE keyword by parsing as an identifier and checking the value
Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier"

Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a
2014-06-30 16:47:47 -07:00
Alex Behm
96722da3fe Fix misplaced comment in testfile.
Change-Id: I55dc7d0e8e74a4f8c9a99e9601b2578ef6b0390d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3303
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3317
2014-06-30 10:17:26 -07:00
Skye Wanderman-Milne
3a6600c964 Fix UDF test
UDF invocations in udf.test should not specify a database. This is how
we switch between testing IR UDFs in the ir_function_test database and
native UDFs in the native_function_test database.

Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295
2014-06-26 22:17:56 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
bf85225911 IMPALA-881: Tests for joins with union inputs.
Change-Id: I4be6821ac3938345ca95c542d868c87512ff66da
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3229
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-06-23 15:38:06 -07:00
Nong Li
a7beb12540 [CDH5] Fix column stats for decimal.
Change-Id: I72b31f6431bf6259e759fd290200fd1a755f82c6
2014-06-20 23:03:06 -07:00
Victor Bittorf
2d7f2e19b2 IMPALA-938: Infer schema from Parquet file
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.

Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
2014-06-20 17:38:01 -07:00
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00
Ippokratis Pandis
6026f1ebe1 IMPALA-1055: Compute stats query statements don't quote DB and table names
The compute stats statement was not quoting the DB and table names. If those names
collided with keywords, then the compute stats would not execute due to a syntax
error.
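
The fix amounts to wrapping both names in backticks when the internal query text is generated. The helper names and the `SELECT COUNT(*)` query below are assumptions for illustration; the sketch also assumes the names themselves contain no backticks.

```python
def quote_ident(name):
    if "`" in name:
        raise ValueError("backticks in identifiers are not handled here")
    return "`" + name + "`"


def stats_query(db, table):
    # Hypothetical stand-in for one of the queries COMPUTE STATS issues.
    return "SELECT COUNT(*) FROM {}.{}".format(quote_ident(db), quote_ident(table))
```

A table named `select` then produces ``SELECT COUNT(*) FROM `default`.`select` `` rather than a statement the parser rejects.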

Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198
2014-06-20 09:32:52 -07:00
Alex Behm
70d7ff07af CDH-19856: Disable Hive's stats autogathering.
Change-Id: I04e91f91d29b7863848a750e362c9d94469df7f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3156
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3169
2014-06-19 16:48:34 -07:00