This commit adds support for uncorrelated EXISTS subqueries in Impala.
Uncorrelated EXISTS subqueries are rewritten using a CROSS JOIN.
Uncorrelated NOT EXISTS subqueries are not supported.
Change-Id: I0003dcdc0fa5cc99931b9a9f4deddbcd42572490
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4140
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4186
Adds the rank() and dense_rank() analytic functions and makes internal
changes to the AggFnEvaluator that are necessary to support calling
Finalize() repeatedly (as the AnalyticEvalNode does) on UDAs that destroy
state in Finalize().
Rank requires both the current rank and the count at that rank in order to
determine the next rank, so the intermediate state is a StringVal containing
a struct with these two fields.
Aggregate functions (internally only, for now) can expose a GetValue() method
which takes an intermediate value and returns a final value without destroying
the intermediate state. Finalize() is then used to clean up intermediate
state, if necessary.
This also adds a second optional, internal-only function for UDAs to allow
removing values from intermediate state: Remove(). This will be required for
implementing sliding windows later but is added here because the change is
nearly identical to that for adding GetValue().
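The rank() state and the GetValue()/Finalize() split described above can be sketched in Python (hypothetical names; the real implementation is a C++ AggFnEvaluator):

```python
# Hedged sketch: rank() keeps (current rank, count of rows at that rank) as
# its intermediate state. GetValue() reads the state without destroying it;
# Finalize() returns the result and destroys the state.

class RankState:
    def __init__(self):
        self.rank = 1    # rank of the current row group
        self.count = 0   # number of rows seen at that rank

def update(state, order_by_changed):
    # order_by_changed: True when the order-by value differs from the prev row
    if order_by_changed and state.count > 0:
        state.rank += state.count   # rank() skips ahead by the tie count
        state.count = 0
    state.count += 1
    return state

def get_value(state):
    # non-destructive read, analogous to the internal-only GetValue() hook
    return state.rank

def finalize(state):
    # destructive: cleans up the intermediate state, so it must not be
    # called repeatedly the way the AnalyticEvalNode calls GetValue()
    result = state.rank
    state.rank = state.count = None
    return result
```

dense_rank() differs only in advancing the rank by 1 on a change instead of by the tie count, which is why both share this state layout.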
Some cleanup in the AnalyticEvalNode, most notably we avoid allocating tuples
to DeepCopy prev_input_row_ between input batches. Instead, we keep the last
two child row batches because the prev child row batch owns the resources for
prev_input_row_.
Change-Id: I5a30eb517a38d369fe63f7af91904a4b9786fadc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3962
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 137bb45d81ea57655aefbf5cde0cbeab0121b8f0)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4183
This patch adds support for:
- Planning of multiple analytic exprs from a select block
- Simple grouping of analytic exprs by partition/order/window
to reduce data exchanges and sorts
Change-Id: Ie2162558b2bc2e6218c30e694393e85cbf3251ff
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4120
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4168
Adds support in the AnalyticEvalNode for ROWS windows with the start
boundary UNBOUNDED PRECEDING, i.e. the end boundary can specify an
offset or CURRENT ROW.
To reduce complexity where we maintain windows and determine when output
results can be produced (ProcessInputBatch), the logic that depends on
the window is factored into several functions. The core functionality
remains the same: for every input row, produce output results if possible,
update the analytic functions, and add the row to the input_stream_ to be
returned later when enough results are available. The functions
TryFinalizePrevRow, TryFinalizeCurrentRow, and InitializeNewPartition
are now called and handle the various window types appropriately.
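The core idea, that a row's result can be finalized once the row at its end boundary has been consumed, can be sketched in Python (illustrative names, not the AnalyticEvalNode's):

```python
# Hedged sketch of a ROWS window with UNBOUNDED PRECEDING as the start
# boundary and "N FOLLOWING" (N == 0 meaning CURRENT ROW) as the end
# boundary, for a running-sum analytic function.

def analytic_sum(rows, following=0):
    out = [None] * len(rows)
    running = 0
    for i, v in enumerate(rows):
        running += v          # update the aggregate with the new input row
        j = i - following     # the row whose window just became complete
        if j >= 0:
            out[j] = running  # analogous to finalizing that row's result
    # rows near the end of the partition see the full running total
    for j in range(max(len(rows) - following, 0), len(rows)):
        out[j] = running
    return out
```

With `following=0` this degenerates to prefix sums, matching the default UNBOUNDED PRECEDING to CURRENT ROW window.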
Change-Id: I36cf76bf11d9e8b48d2556169683abcb43c1db7a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4073
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 421a032035fcb13e03f8e7d34b4908f1221fd9f5)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4163
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Row batches contain auxiliary memory that can reside in tuple pools, io buffers and
now tuple streams. Like the other resources, these need to be attached to row batches
and transferred up the operator tree to make sure the tuple ptrs are always valid.
Fixed bug in BufferedTupleStream to not delete blocks on read if it is pinned.
Fixed PHJ bug with row batch boundaries causing current_probe_row_ to be NULL.
Change-Id: I4c66d9961a117bfe3ed577de6170e875ea1feee7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3983
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4157
This commit fixes two subquery issues:
1. During the rewrite of aggregate subqueries with count, a new select
list is created for the outer select block to eliminate new visible
tuples. However, the new select list was not initialized correctly,
causing distinct clauses to not be preserved.
2. Pushing negation to operands during a query rewrite was causing a
StackOverflowError when it was encountering predicates for which a
negate function is not implemented. Consequently, it was using the
negate function from the parent class causing it to recurse infinitely.
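The recursion in issue 2 can be reproduced with a minimal Python sketch (Impala's rewriter is Java; all class and function names here are hypothetical):

```python
# A predicate without its own negate() inherits the parent's, which merely
# wraps it in a NOT; a naive push-negation loop then never terminates.
# The guard below sketches the fix: stop when negate() is not overridden.

class Predicate:
    def negate(self):
        # parent fallback: wrap in a NOT rather than truly negating
        return NotPredicate(self)

class NotPredicate(Predicate):
    def __init__(self, child):
        self.child = child
    def negate(self):
        return self.child            # NOT NOT p == p

class LikePredicate(Predicate):
    pass                             # no negate() of its own

def push_negation(expr):
    while isinstance(expr, NotPredicate):
        child = expr.child
        if type(child).negate is Predicate.negate:
            return expr              # the fix: leave the NOT in place
        expr = child.negate()
    return expr
```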
Change-Id: I6f1b8090af40fa55b13661d637f9aaaa00dfcf5c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4115
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4141
This commit implements nested queries with [NOT] IN, [NOT] EXISTS and
aggregate subquery predicates in Impala. The following cases are
supported:
1. Correlated and uncorrelated [NOT] IN.
2. Correlated [NOT] EXISTS.
3. Correlated and uncorrelated aggregate subqueries.
Change-Id: Ia3f4843c5f07d4e31ef3faedc58a15e623f91a5d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3754
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4109
As a proof-of-concept, this patch implements avg() with a STRING intermediate
type, and changes variance() to output a DOUBLE.
I tested this change on single-node and distributed plans, with the
partitioned as well as the old aggregation node.
This patch leaves several things for follow-on changes:
- plumb through CHAR as an intermediate type
- modify other builtin aggregates to use appropriate output/intermediate types
- allow analytic functions to have different output/intermediate types
Change-Id: I8d3396201cb370f44660ab4f7fe10216129abd09
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4016
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4079
1) Fix memory use-after-free in AnalyticEvalNode:
current_tuple_ cannot be allocated from the output_tuple_pool_, which
occasionally transfers its resources to the output row batch, because
the same tuple is reused across batches.
2) Analysis should allow windows with UNBOUNDED PRECEDING to X PRECEDING
and X FOLLOWING to UNBOUNDED FOLLOWING.
3) Fix a few bugs in the distributed planning.
4) Adds a few more tests and allows running the tests with the distributed
plans.
Change-Id: I6bdc1e35b3d30b6e1e50ca85d78b75ef70469de5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4022
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
(cherry picked from commit 788b027439a03a1cc3378ff0191487577608e8b7)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4068
VARCHAR is treated as StringVal in the backend. All UDAs and UDFs which accept STRING
will also accept VARCHAR(N).
TODO: Reverted Avro codegen to fix Jenkins; needs separate patch.
Change-Id: Ifc120b6f0fe1f996b11a48b134d339ad3719331e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/2527
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 3fcbf4f677b8e26c37eded4d8bb628e6fc53c1e9)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4058
Introduces support for writing tables stored as Avro files. This supports writing all
data types except TIMESTAMP. Supports the following COMPRESSION_CODECs: NONE, DEFLATE,
SNAPPY.
Change-Id: Ica62063a4f172533c30dd1e8b0a11856da452467
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3863
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 15c6066d05d5077bee0d5123d26777b0715eb9c6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4056
The query used to generate the stats does a GROUP BY on the partition keys,
and so empty partitions will not get any results. Detect the empty partition
case and set the number of rows to 0.
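The fix amounts to defaulting partitions that are absent from the GROUP BY result to zero rows, roughly (illustrative Python, not the actual catalog code):

```python
# Partitions with no rows produce no group in the stats query's GROUP BY
# result, so any partition missing from the result is assigned 0 rows.

def merge_partition_rowcounts(all_partitions, groupby_results):
    # groupby_results: {partition_key: num_rows} from the stats query
    return {p: groupby_results.get(p, 0) for p in all_partitions}
```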
Change-Id: I1ccb7d2016f35026aa1b418155c4534024f3cee5
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4029
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 128a02f508cdb280b53b8a8429e6b90491e43956)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4042
Evaluates analytic functions with a single pass over sorted input rows, using
a BufferedTupleStream to buffer output rows. It is assumed that the input has
already been sorted on all of the partition keys and then the order by exprs.
Analytic functions are implemented as aggregate functions.
Current implementation only supports partition clauses and order by clauses with
the default window (i.e. UNBOUNDED PRECEDING to CURRENT ROW).
Change-Id: I93f37a4e7fd8167261bf86c2a5b7c8569a1f7b11
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3939
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit af7703841d682c4b24fdc2f41b4b4655037475e6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4015
Semi or anti-joined table references are now only visible inside the
On-clause of the corresponding join.
Change-Id: Id93e53ecdf2a74baf9736aa427fa7af15358ca27
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3789
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This supports both uncompressed and block compressed formats. Row compressed formats are
not supported. The type of compression is specified using a query parameter
COMPRESSION_CODEC with values NONE, GZIP, BZIP2, and SNAPPY.
Note: this patch only has basic testing. More extensive testing will be done when this
avro writer is used in data loading.
Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
This patch changes the interface for evaluating expressions, in order
to allow for thread-safe expression evaluations and easier
codegen. Thread safety is achieved via the ExprContext class, a
light-weight container for expression tree evaluation state. Codegen
is easier because more expressions can be cross-compiled to IR.
See expr.h and expr-context.h for an overview of the API
changes. See sort-exec-exprs.cc for a simple example of the new
interface and hdfs-scanner.cc for a more complicated example.
This patch has not been completely code reviewed and may need further
cleanup/stylistic work, as well as additional perf work.
Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.
2) Reduce the test vectors for query_tests. We should have identical coverage in
the daily exhaustive runs but the normal runs should be much faster. In particular,
deemphasizing scanner tests since that code is more stable now.
3) Misc test cleanup/consolidate python test files/etc.
Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
This patch improves the performance of the DDL queries
"Alter table add partition" and "Alter table drop partition"
as the number of partitions is scaled up.
The issue was that every time a partition was added or dropped,
the entire block metadata for that table was reloaded. This
operation was highly expensive especially as the number
of partitions became larger.
This patch handles this by adding/dropping only the added/dropped
partition's metadata to the hdfsTable (adding/dropping it to/from
the internal partition list), and incrementally updating the
corresponding data structures instead of refreshing them from scratch.
The following are the time improvements observed.
Partitions (existing)   Add/drop time (before)   Add/drop time (now)
1                       1.02s                    1.02s
10                      0.27s                    0.27s
100                     0.14s                    0.14s
500                     0.35s                    0.35s
1000                    0.91s                    0.51s
10000                   11.72s                   0.85s
20000                   21.92s                   0.87s
Out of this total time (for the worst case), around 0.50s is spent in
adding and dropping the partition to the hive meta store and rest of the
time is spent in updating the catalog.
Change-Id: I359ab0af921543c0fdcb975c14b05f80f93fe803
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3291
Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>
Tested-by: jenkins
Adding the "anti join" keyword in the frontend and the corresponding backend paths for the
partitioned hash join implementation. Adding some basic testing for this new join (the
other join types already have tests).
Also, fixing a bug in the tuple stream when it was handling strings.
Change-Id: Ied8cff96b2bca284a5f66f7d11df5c5b5ec789cc
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3805
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
The ordering of results returned by the group_concat() tests was not deterministic. This
fixes the problem by switching the test cases to use a subquery with an order by.
Also fixed a similar problem with the limit and union tests.
Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Changes include:
* Fix compile errors due to new column stats API and other stats related
fixes.
* Temporarily disable JDBC tests due to new serialization format in Hive 0.13
* Disable view compatibility tests until we can get them to work in Hive 0.13
* Test fixes due to Hive's type checking for partition column values
Change-Id: I05cc6a95976e0e037be79d91bc330a06d2fdc46c
Before: Constant conjuncts used to be registered in the analyzer together with
non-constant conjuncts. Since constant conjuncts are not bound by any slot or
tuple they were incorrectly placed into whatever plan node called init() first
and then were incorrectly marked as assigned. For handling queries with a
limit 0 we had special code in the BE.
After: Since constant conjuncts do not fit well into the existing slot/tuple
based assignment logic this patch treats them specially as follows. Constant
conjuncts that do not originate from the ON clause of an outer join are evaluated
directly. Depending on which clause the conjunct came from either the entire
query block is marked as returning an empty set (HAVING clause) or the block
is marked as having an empty select-project-join portion (ON or WHERE clause).
In the latter case, aggregations (if any) must still be performed.
The plan sub-trees that are guaranteed to return an empty result set are
implemented by an EmptySetNode. Constant conjuncts from the ON clause of an
outer join are assigned to the node implementing the join.
Similarly, query blocks with a limit 0 are marked as returning an empty result,
and planned as an EmptySetNode.
As a side effect, this patch also fixes:
IMPALA-89: Make our behavior of INSERT OVERWRITE ... LIMIT 0
consistent with Hive's. The target table is left empty after
such an operation.
Change-Id: Ia35679ac0b3a9d94edae7f310efc4d934c1bfb0d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3653
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3800
This patch ensures that all hash-partitioning senders to a hash-partitioned
fragment hash on exprs of identical types. Casts are added as necessary.
Otherwise, the hashes generated for identical partition values may differ
among senders if the partition-expr types are not identical.
The new logic is placed into PlanFragment.finalize() in order to avoid
repeated re-casting of senders during plan generation, since every time
a child fragment is absorbed into a partition-compatible parent we
potentially need to add casts to all senders of that fragment again.
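Why un-cast exprs break hash partitioning can be seen from a small sketch (Python stand-in for the C++ hash; names are illustrative):

```python
# The byte representation of the same logical value differs across types,
# so two senders hashing un-cast exprs of different widths can route the
# same partition value to different destinations. Adding casts makes every
# sender hash the same byte representation.
import struct
import zlib

def hash_partition(value_bytes, num_partitions):
    return zlib.crc32(value_bytes) % num_partitions

as_int32 = struct.pack('<i', 7)   # 4-byte encoding of 7
as_int64 = struct.pack('<q', 7)   # 8-byte encoding of the same value
# hash_partition(as_int32, 16) need not equal hash_partition(as_int64, 16)
```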
Change-Id: Id9f581cc03127f64f0631d9b288fab4cd4dd8a82
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3689
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3708
Rather than omit the first separator in each intermediate result,
always include the separator, but also remember the length of the
first separator. Then, during finalize, remove whichever separator
string ends up at the beginning of the final merged result.
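The scheme above can be sketched in Python (hypothetical names; the actual group_concat UDA is C++):

```python
# Every intermediate always starts with its separator; we remember the
# length of the first separator so finalize() can strip whichever one
# ends up at the front of the merged result.

def update(state, value, sep):
    buf, first_len = state
    if first_len is None:
        first_len = len(sep)       # remember the leading separator's length
    return (buf + sep + value, first_len)

def merge(a, b):
    abuf, alen = a
    bbuf, blen = b
    # simple concatenation works because both sides carry their separator
    return (abuf + bbuf, alen if alen is not None else blen)

def finalize(state):
    buf, first_len = state
    return buf[first_len:] if first_len else buf
```

Because no intermediate is ever "the first", merge order no longer matters, which is the point of the change.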
Change-Id: I6de7d1cda1a43b8de7d03c6798ec9667ffa457f8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3669
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c0d7cedb79fe557e22912afc716303b24a9dad0d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3690
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
It needs to handle the "SET" case. Also, add some missing test cases
for "SET". Also, cleanup test_set/set.test.
Change-Id: I34f6005ef17e196d94366e5301251a2987746fbf
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3620
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 41890b5a13f9429f058fb12453c78323df11fc7d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3655
Also add support for "SET", which returns a table of query options and
their respective values.
The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.
Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.
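The two stages can be sketched in Python (a minimal A-Res-style weighted reservoir sample plus a bucket-boundary pass; details of the actual UDA may differ):

```python
# Hedged sketch: keep the k items with the largest random key u**(1/w)
# (Efraimidis-Spirakis A-Res), then take equi-depth bucket boundaries
# from the sorted sample.
import heapq
import random

def weighted_sample(values, weights, k):
    heap = []
    for v, w in zip(values, weights):
        key = random.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, v))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, v))   # evict the smallest key
    return [v for _, v in heap]

def equi_depth_histogram(sample, num_buckets):
    s = sorted(sample)
    step = max(len(s) // num_buckets, 1)
    # bucket boundary at every step-th element of the sorted sample
    return s[step - 1::step][:num_buckets]
```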
TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
histogram').
* Expose separate finalize fn to produce samples and histogram data for stats
Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
Adds support for dropping all table and column stats from a table. Once incremental
stats are supported, this will provide the user a way to force a recompute of all
stats.
Change-Id: I27e03d5986b64eb91852bfc3417ffa971d432d6b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3533
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1f074f24bfdc77c4cef147fe9d26f27df80ab81)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3551
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.
Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
UDF invocations in udf.test should not specify a database. This is how
we switch between testing IR UDFs in the ir_function_test database and
native UDFs in the native_function_test database.
Change-Id: I09ede18f2b91440ef7a2a76b0daf41a007af2671
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3130
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4d6160c0b88285aea754f6353cdd02b5e4b15633)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3295
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.
Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'".
Supports all options that CREATE TABLE does. Currently only PARQUET is supported.
Run testdata/bin/create-load-data.sh after pulling this patch.
Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
the top-n.test file
Extra time required:
Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds
Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec
Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
The compute stats statement was not quoting the DB and table names. If those names
collided with keywords, the compute stats statement failed to execute due to a syntax
error.
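The fix boils down to always quoting the identifiers when building the statement, roughly (illustrative Python; the escaping of embedded backticks shown here is an assumption, not necessarily what the frontend does):

```python
# Quote DB and table identifiers so names that collide with keywords
# still parse.

def quote_ident(name):
    return '`' + name.replace('`', '``') + '`'

def compute_stats_stmt(db, table):
    return 'COMPUTE STATS {}.{}'.format(quote_ident(db), quote_ident(table))
```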
Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198