impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Alex Behm	f9d26fb896	IMPALA-2203: Set an InsertStmt's result exprs from the source statement's result exprs. This patch fixes an issue where incorrect results are produced by a CTAS or IAS that is fed from a QueryStmt that has outer-joined inline views with constants or conditionals in the select list. The regression was introduced in this commit: b8f642710ea9d311a7aca32611eaa7cac6cd86df Now that the final expression substitution with TupleIsNullPredicate() wrapping is performed in planning, the InsertStmt's result expressions should be taken from the feeding QueryStmt's result expressions, and not the QueryStmt's (already substituted) base table result expressions. Change-Id: Iae29683638df01f140d0f74976cca8ca9ba0852d Reviewed-on: http://gerrit.cloudera.org:8080/637 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-18 01:44:45 +00:00
Casey Ching	cf60967b7e	IMPALA-1675: Avoid overflow when adding large intervals to TIMESTAMPs It turns out there is a variety of cases where boost incorrectly adds intervals if the interval is at (or beyond) an edge case value. This change defines a max interval and returns NULL if the user supplies an interval beyond the max. Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080 Reviewed-on: http://gerrit.cloudera.org:8080/492 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-16 12:09:24 +00:00
Christopher Channing	9ea5caf0ef	IMPALA-2199: Row count not set for empty partition when spec is used with compute incremental stats This patch resolves an issue where row count is not set to 0 when a partition spec is used with 'compute incremental stats' on a partition that contains no data. The fix is to populate the partition 'expected list' in the frontend with the partition spec, the backend keeps track of which partitions had statistics generated. In the scenario where no statistics are generated for a partition, the backend will fall back to the 'expected list' to zero out the statistics. Change-Id: If4aac131dbe44e14a0477afa58e980da9e235d6b Reviewed-on: http://gerrit.cloudera.org:8080/627 Reviewed-by: Christopher Channing <cchanning@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 09:38:30 +00:00
Dimitris Tsirogiannis	47c5ae405a	Revert "IMPALA-2015: Add support for nested loop join" This reverts commit 6837cdec7f6a7e1c7e8157e323f3ab68277689aa. Change-Id: I2fd6424c553a701fcbfd425b4486af7280820b23 Reviewed-on: http://gerrit.cloudera.org:8080/636 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 02:20:07 +00:00
Sailesh Mukil	8f11fbdd5c	IMPALA-2081: Add PERCENT_RANK, NTILE, CUME_DIST analytic window functions These functions are implemented as rewrites in the analysis stage. They are rewritten as different arithmetic expressions and make use of the existing analytic functions such as 'rank', 'count' and 'row_number' to compute the final results. TODO: IMPALA-2171: NTILE() currently takes only constant expressions. We need to modify it to take non-constant expressions as well in a future patch. Change-Id: I8773df8ceefff27ab66a41169dc4ac0927465191 Reviewed-on: http://gerrit.cloudera.org:8080/584 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-08-07 04:57:37 +00:00
Skye Wanderman-Milne	f000758ca8	IMPALA-2015: Add support for nested loop join Implement nested-loop join in Impala with support for multiple join modes, including inner, outer, semi and anti joins. Null-aware left anti-join is not currently supported. Summary of changes: Introduced the NestedLoopJoinNode class in the FE that represents the nested loop join. Common functionality between NestedLoopJoinNode and HashJoinNode (e.g. cardinality estimation) was moved to the JoinNode class. In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop join execution strategy. Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e Reviewed-on: http://gerrit.cloudera.org:8080/457 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-08-07 02:47:32 +00:00
Alex Behm	480a56e3a0	IMPALA-1737: Substitute an InsertStmt's partition key exprs with the root node's smap. The bug was that we were not substituting the partition key exprs of an InsertStmt with the root plan node's output smap during single-node planning. Change-Id: I16eff4bab0b1d95c7f30fd89b14af2628d6f865f Reviewed-on: http://gerrit.cloudera.org:8080/580 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-03 19:31:51 +00:00
Alex Behm	c908ba1b7e	IMPALA-1136: Support loading Avro tables without an explicit Avro schema Hive allows creating Avro tables without an explicit Avro schema since 0.14.0. For such tables, the Avro schema is inferred from the column definitions, and not stored in the metadata at all (no Avro schema literal or Avro schema file). This patch adds support for loading the metadata of such tables, although Impala currently cannot create such tables (expect a follow-on patch). Change-Id: I9e66921ffbeff7ce6db9619bcfb30278b571cd95 Reviewed-on: http://gerrit.cloudera.org:8080/538 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-31 12:13:37 +00:00
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. bitand only is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Sailesh Mukil	8a01527bad	IMPALA-2141: UnionNode::GetNext() doesn't check for query errors When a UDF with constant parameters in the select list calls SetError(), it does not fail the query. This is because UnionNode::GetNext() does not check for errors after UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not report the error. Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a	2015-07-27 22:09:19 -07:00
Alex Behm	3ac341287c	IMPALA-2088: Fix planning of empty union operands with analytics. The check for ignoring empty union operands was simply misplaced. This misplacement resulted in empty union operands not being dropped if the containing UnionStmt had analytic functions. Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3 Reviewed-on: http://gerrit.cloudera.org:8080/554 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 15:46:41 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Sailesh Mukil	c21c080a46	IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception. Changed the way the function context error message is returned. Also, changed the exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException in case of an exception in registerConjuncts(). This commit follows from: `d497ba6cef` This is a new commit since the previous one was closed before making these changes. Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18 Reviewed-on: http://gerrit.cloudera.org:8080/560 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 19:04:38 +00:00
Tim Armstrong	5990b43fe2	IMPALA-1898: Explicit aliases + ordinals analysis bug Analysis errors occurred with select queries that combined ordinals in the group by/order by clauses with select list aliases that had the same name as a column in one of the underlying tables. The root cause was a double substitution: e.g. the ordinal 1 in a GROUP BY clause was replaced with the corresponding select list expression, then a reference to column 'x' in an underlying table was replaced erroneously with the select list expression with alias 'x' Change-Id: I0f298290c58f18239e1ff83f0388d037c311f5fb Reviewed-on: http://gerrit.cloudera.org:8080/542 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-07-22 21:23:36 +00:00
Sailesh Mukil	6d7bb76e87	IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception. When a builtin has an error (in the constant case), it is checked for but the state cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the constant case), the error does not propagate back up the stack due to a lack of error checking in ScalarFnCall::Open() after it calls GetConstVal(). Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1 Reviewed-on: http://gerrit.cloudera.org:8080/537 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 04:01:39 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Shant Hovsepian	6d87fe090c	Improve Hll estimate for small cardinalities. Based on Google's HyperLogLog++ paper. Uses a bias correcting interpolation as a sub algorithm for Hll estimates within a specific range. Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9 Reviewed-on: http://gerrit.cloudera.org:8080/415 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-07-16 19:38:17 +00:00
Ippokratis Pandis	7e9f8478e1	Removing duplicate query test Change-Id: Ia8b33ca2a2eadae288acea4bd2111a1a974bc484 Reviewed-on: http://gerrit.cloudera.org:8080/526 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-15 03:28:36 +00:00
Ippokratis Pandis	e99c68fe52	IMPALA-2130: Wrong verification of Parquet file version This patch corrects a mistake in the Parquet magic file number verification and adds a test about it. Note that with this patch Impala may fail to read Parquet files with wrong magic number that it used to read before. Change-Id: Iff31accda1e1d541946ef1f750e38886ce4cb8d5 Reviewed-on: http://gerrit.cloudera.org:8080/515 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-14 02:52:02 +00:00
Martin Grund	51aa077448	IMPALA-2133: Properly unescape string value for HBase filters This patch fixes the problem, that the Frontend would simply pass the escaped value to the backend as an HBase filter and not the unescaped one. Now queries including an escaped character will work as well. Change-Id: I96e544973b523f3ef1abdec86ea1ec5596d9bee9 Reviewed-on: http://gerrit.cloudera.org:8080/520 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-13 18:38:39 +00:00
Ippokratis Pandis	4951f895e7	Nested Types: Reset() for partitioned hash join node TODO: Need to modify Reset()'s functionality in case of NAAJs. Change-Id: I7d0ea0dabd0b3404957e228bbaa51781c5fc34c0 Reviewed-on: http://gerrit.cloudera.org:8080/490 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-08 01:51:09 +00:00
Alex Behm	a274cfd787	Nested Types: Fix self-joining of collection table refs. When referencing the same path in multiple CollectionTableRefs (e.g., self-join on a nested collection), we used to register only a single SlotDescriptor in the root tuple descriptor and share it among those multiple CollectionTableRefs. A collection-typed SlotDescriptor has a single item tuple descriptor, set to the tuple descriptor of the corresponding CollectionTableRef. Therefore, sharing a single collection-typed SlotDescriptor among multiple CollectionTableRefs with the same path does not work (the item tuple desc was arbitrarily set to the last CollectionTableRef's tuple desc). In order to maintain our assumed 1:1 relationship between a table ref and a tuple descriptor, the simple fix for now is to give each CollectionTableRef a new slot in the root tuple descriptor, regardless of its path. We could conceivably allow more intelligent sharing of tuple descriptors for nested collections, but that change is too invasive for now. Change-Id: I2135d026191f51d1daa741455a7e1b0f6905af1e Reviewed-on: http://gerrit.cloudera.org:8080/495 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 06:56:28 +00:00
Ippokratis Pandis	f2c483802f	Nested Types: Reset() for partitioned aggregation node Change-Id: Ia5b4b9b3a7b8e9acb1b614c979cccca615fe2fbe Reviewed-on: http://gerrit.cloudera.org:8080/480 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 00:43:55 +00:00
Ippokratis Pandis	c3a7916812	IMPALA-2065: Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory() If the build side of any partition of PHJ was very large we could end up trying to Init() hash tables that are larger than 1GB. The result was overflows (see IMPALA-1619) and eventually DCHECKS. This patch returns false whenever we try to allocate memory in the BufferedBlockMgr that is larger than 1GB. Change-Id: Id4590ea434bef4dca7dc3f137cfe7b638ae3d916 Reviewed-on: http://gerrit.cloudera.org:8080/465 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-06-27 01:17:50 +00:00
Alex Behm	569e86a60b	Nested Types: Change ExecNode::Reset() to only clear state and not tuple data. This patch changes the ExecNode::Reset() to: Status ExecNode::Reset(RuntimeState* state); The new Reset() should only clear the internal state of an exec node in preparation for another Open()/GetNext(). Reset() should not clear memory backing rows returned by a node in GetNext() because those rows could still be in flight. Subplan Memory Management: To ensure that the memory backing rows produced by the subplan tree of a SubplanExecNode remains valid for the lifetime of a row batch, we intend to use our conventional transfer mechanism. That is, the ownership of memory that is no longer used by an exec node is transferred to an output row batch in GetNext() at a "convenient" point, typically at eos or when the memory usage exceeds some threshold. Note that exec nodes may choose not to transfer memory at eos to amortize the cost of memory allocation over multiple Reset()/Open()/GetNext() cycles. To show the main ideas, this patch fixes transferring of tuple data ownership in several places and implements Reset() for the following nodes: - AnalyticEvalNode - BlockingJoinNode - CrossJoinNode - SelectNode - SortNode - TopNNode - UnionNode To make the transfer of ownership work for SortNode a row batch can now also own a list of BufferedBlockMgr::Block*. Also included are basic query tests that are not meant to be exhaustive. The tests are disabled for now because we cannot run them without several other code changes. I have manually run the test queries on a branch that has all necessary changes. Change-Id: I3ac94b8dd7c7eb48f2e639ea297b447fbf443185 Reviewed-on: http://gerrit.cloudera.org:8080/454 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-23 07:43:22 +00:00
Dimitris Tsirogiannis	2c1f0a4942	IMPALA-1987: Fix TupleIsNullPredicate to return false if no tuples are nullable. This commit fixes the issue where an outer join returns wrong results if the equi-join predicate contains a TupleIsNullPredicate expr. Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4 Reviewed-on: http://gerrit.cloudera.org:8080/445 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 03:37:18 +00:00
ishaan	f327a53c70	Fix metadata/test_load.py to work with Isilon. test_load was using /tmp as the staging directory, which was not cleaned up in Isilon, leading to a build failure. This patch does the following: - use /test-warehouse as the staging directory. - replace calls to the hdfs commandline with calls to the in-house hdfs client. - cleanup the test file and remove duplicates. Additionally, a new method is introduced in the hdfs client to simulate hdfs dfs -cp, i.e., it does a get and a put to mimic the hdfs command line's semantics. Change-Id: I0cc27ab00df5f5ec3138b995144ab45ad622605d Reviewed-on: http://gerrit.cloudera.org:8080/431 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-06-05 00:52:14 +00:00
ishaan	dbc78aaa2c	Enable isilon end to end tests for Impala. This patch introduces changes to run tests against Isilon, combined with minor cleanup of the test and client code. For Isilon, it: - Populates the SkipIfIsilon class with appropriate pytest markers. - Introduces a new default for the hdfs client in order to connect to Isilon. - Cleans up a few test files to take the underlying filesystem into account. - Cleans up the interface for metadata/test_insert_behaviour, query_test/test_ddl On the client side, we introduce a wrapper around a few pywebhdfs's methods, specifically: - delete_file_dir does not throw an error if the file does not exist. - get_file_dir_status automatically strips the leading '/' Change-Id: Ic630886e253e43b2daaf5adc8dedc0a271b0391f Reviewed-on: http://gerrit.cloudera.org:8080/370 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-05-27 22:25:12 +00:00
Shant Hovsepian	69079411bf	Improve distinctpc/sa for small cardinalities. Improving the cardinality estimate for Flajolet and Martin's algorithm used in distinctpc and distinctpcsa. The estimate for small cardinalities is improved by providing a correction hinted to in the original paper. We use the correction constant 1.75 proposed by Scheuermann et al DialM-POMC '07 [Near-Optimal Compression of Probabilistic Counting Sketches for Networking Applications] Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f Reviewed-on: http://gerrit.cloudera.org:8080/395 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-05-24 06:26:47 +00:00
Alex Behm	b3bb0ea525	Fix S3 build v2: Adjust expected SHOW TABLE STATS output. Change-Id: Idc1f255a7d170e6083439220140c5eb895133b22 Reviewed-on: http://gerrit.cloudera.org:8080/382 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-16 02:47:01 +00:00
Casey Ching	ac0c075997	Parquet: Fix value def level when max def level is 0 When running with a release build, NULL would be returned when reading values from required fields in parquet files (with a debug build a DCHECK would be hit). Previously when the max definition level for a field was 0 (which happens if a field is required), the definition level for a value was incorrectly set to 1. The max definition level is related to nested data and is defined to be the number of nullable fields that will be encountered when traversing a path to reach the desired end field. For example, if a nested schema has a path a.b.c.d where b and d are nullable then the max def level is 2. A def level is attached to each value to indicate the number of optional values that are present (in the previous example a def level of 2 means both b and d are not null). So having a def level for a value that is greater than the max def level for a field should never happen. Change-Id: Ia91a97cf79e672c420d10416c6817f0930dcc920 (cherry picked from commit cdd67e4c7fd62d5b08adfaa303d7bb2382e6932c) Reviewed-on: http://gerrit.cloudera.org:8080/386 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 06:41:02 +00:00
Skye Wanderman-Milne	7801aa499f	Use codegen to inject runtime constants in exprs This patch introduces the function GetConstant(), which is used by expr compute functions and UDFs to access query constants. There is a corresponding GetIrConstant() function that returns the IR versions of the same constants. Currently the only implemented constants are the expr's return type and argument types, but other constants can easily be added to these functions. Interpreted expr functions run normally, but cross-compiled functions can be passed to InlineConstants(), which looks for calls to GetConstant() and replaces them with the result of calling GetIrConstant(). I used this technique in the decimal functions that previously were not switching on the type at all. The performance of LeastGreatest() after this patch is the same as it was before it switched on the type. Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41 Reviewed-on: http://gerrit.cloudera.org:8080/352 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 02:24:04 +00:00
Alex Behm	5f54b2c4d3	Fix S3 build: Adjust expected SHOW TABLE STATS output. Change-Id: I3fb0c551dfbe53aecd9c0bced3bc29d5a5fa41e5 Reviewed-on: http://gerrit.cloudera.org:8080/375 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 07:55:36 +00:00
zuowang	304d985523	IMPALA-1139: Implement TRUNCATE TABLE statement Synopsis: TRUNCATE [TABLE] [database.]table TRUNCATE quickly removes all rows from a set of tables. TRUNCATE also drops all table and column stats, but preserves HMS partitions and HDFS directories. You must have the INSERT privilege on a table to truncate it. It requires taking the metastoreDdlLock before truncate tables. Examples: TRUNCATE TABLE t1; TRUNCATE t1; Change-Id: I546e4ee0279083f437cdf0e7487faad47957dbf6 Reviewed-on: http://gerrit.cloudera.org:8080/241 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 07:50:34 +00:00
Juan Yu	d1c263402e	IMPALA-1973: Fixing crash when uninitialized, empty row is added in HdfsTextScanner This patch fixes an issue when an uninitialized, empty row is falsely added to the rowbatch. The uninitialized data inside this row leads later on to a crash when the null byte is checked together with the offsets (that contains garbage). The fix is to not only check for the number of materialized columns, but as well for the number of materialized partition key columns. Only if both are empty and the parser has an unfinished tuple, add the empty row. To accommodate for the last row, check in FinishScanRange() if there is an unfinished tuple with materialized slots or materialized partition key. Write the fields if necessary. Change-Id: I2808cc228e62d048d917d3a6352d869d117597ab (cherry picked from commit c1795a8b40d10fbb32d9051a0e7de5ebffc8a6bd) Reviewed-on: http://gerrit.cloudera.org:8080/364 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-05-05 00:19:12 +00:00
Ippokratis Pandis	4d428440d8	IMPALA-1919: Avoid calling ProcessBatch with out_batch->AtCapacity in right joins PHJ::GetNext() of RIGHT_OUTER, RIGHT_ANTI and FULL_OUTER joins that had repartitioned were not checking whether the output batch reached capacity at the OutputUnmatchedBuild() call. In case of repartitioned joins where the list of build_partitions was exhausted and the output batch has already reached capacity, we would call ProcessProbeBatch() with a full output batch, resulting in a DCHECK. This patch adds the missing AtCapacity() check. It also adds a new join test (tpch-out-joins) that uses the TPC-H dataset and moves there some of the join tests that were using it. Running join tests with the larger TPC-H dataset is needed, for example, in order to trigger repartitions. Change-Id: I4434ad0683e1b09f75a25b3eb870a817d4988370 Reviewed-on: http://gerrit.cloudera.org:8080/314 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-05-04 19:49:56 +00:00
Henry Robinson	f98a7bee46	IMPALA-1595: Fix exhaustive test failure Change-Id: I49db59c936c105295b7159bb1c06558b127d4c4a Reviewed-on: http://gerrit.cloudera.org:8080/353 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-04-23 19:46:31 +00:00
Dimitris Tsirogiannis	dd5ecb9deb	IMPALA-1960: Illegal reference to non-materialized tuple when query has an empty select-project-join block This commit fixes an issue where an aggregation expr may reference a non-materialized slot if the query contains an empty select-project-join block. This fix ensures that all the exprs in an aggregation reference materialized slots/tuples. Change-Id: Ic2cc9818061b3f06ab1d1cebf4e604352c2df6d1 Reviewed-on: http://gerrit.cloudera.org:8080/348 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-21 23:29:14 +00:00
Henry Robinson	f22b8659fd	IMPALA-1595: Add 'location' to SHOW [TABLE STATS|PARTITIONS] for HDFS tables This patch adds a 'location' column to the output of SHOW TABLE STATS / SHOW PARTITIONS. This helps users understand the effects of ALTER TABLE SET LOCATION commands, particularly for partitions, and is easier to identify than the output of DESCRIBE FORMATTED. Some existing tests in alter-table.test have been updated to include checking the location output before and after a SET LOCATION command. The tests in show.test have also been updated to check for the location; all other tests that use SHOW [TABLE STATS|PARTITIONS] use a generic regex to avoid overly verbose tests. Change-Id: I9d276f7b133c38c9319e0906397ca1c31cec95bb Reviewed-on: http://gerrit.cloudera.org:8080/316 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-04-21 19:27:50 +00:00
Alex Behm	7067a5d94d	IMPALA-1519: Fix wrapping of exprs via a TupleIsNullPredicate with analytics. The bug: Analytic functions introduced a few challenges in properly wrapping exprs with TupleIsNullPredicates when substituting exprs from outer-joined inline views. 1. The logical to physical tuple mapping during the plan generation of analytics invalidated the tuple ids originally set in upstream TupleIsNullPredicates introduced during analysis (e.g., in the result exprs). 2. TupleIsNullPredicates require specific tuple ids for evaluation. Since sort nodes materialize a new tuple, it's impossible to evaluate TupleIsNullPredicates referring to a sort's input after the sort. Non-analytic sorts handle this case during analysis by materializing the result of that select block. However, analytic sorts used to only materialize the slots of materialized tuple ids of the input plan node. The fixes: 1. Move the TupleIsNullPredicate wrapping from the inline-view analysis into the inline-view planning. This avoids the original problem because all physical output tuples are known during plan generation. This simple change has a few subtle consequences: First, we must rely on the plan root's output smap for substituting the final result exprs, and not use the top-level base table smap generated during analysis. Second, during plan generation we must use an inline view's smap (and not its base table smap) for generating the output smap of its plan such that we can properly wrap the rhs exprs in TupleIsNullPredicates at every level. This change also fixes IMPALA-1946 by deferring the TupleIsNullWrapping to planning time. 2. To preserve the information whether an input tuple was null or not at an analytic sort, we materialize TupleIsNullPredicates, which are then substituted by a SlotRef into the sort's tuple in ancestor nodes. This patch also cleans up and consolidates the code used for wrapping exprs into TupleIsNullPredicate itself.
Change-Id: I5c6d142bdf9c99ece2a564e557d4ffe22ac90865 Reviewed-on: http://gerrit.cloudera.org:8080/317 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-04-14 23:33:20 +00:00
Dimitris Tsirogiannis	d8e5bbe2da	IMPALA-1949: Analysis exception when a binary operator contains an IN operator with values This commit fixes an issue where a query is not successfully analyzed if an IN operator with values appears in a binary predicate. Change-Id: Ia3b83803a553b9a3b3489382fc53978a720c4b4f Reviewed-on: http://gerrit.cloudera.org:8080/334 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-14 03:54:33 +00:00
Dimitris Tsirogiannis	4eceeacf16	IMPALA-1550: Invalid rewrite when EXISTS subqueries contain aggregate functions This commit fixes an issue where a [NOT] EXISTS subquery that contains an aggregate function will sometimes be incorrectly rewritten into a join, thereby returning incorrect results. Change-Id: I18b211d76ee3de77d8061603ff5bb1fbceae2e60 Reviewed-on: http://gerrit.cloudera.org:8080/266 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-04-02 19:11:00 +00:00
Juan Yu	e121bc9b0a	IMPALA-1476: Impala incorrectly handles text data missing a newline on the last line. I did a local benchmark and there's minimal performance impact(<1%) Change-Id: I8d84a145acad886c52587258b27d33cff96ea399 (cherry picked from commit 7e750ad5d90007cc85ebe493af4dce7a537ad7c0) Reviewed-on: http://gerrit.cloudera.org:8080/189 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 19:58:50 -07:00
Skye Wanderman-Milne	9d6586cdb8	Addendum to IMPALA-1755 patch This patch introduces SetLookup functionality for timestamp and decimal types, as well as addressing remaining code review comments. Change-Id: Ied40d2d55adbdea891ff2ab97b30f0d3986645f9 Reviewed-on: http://gerrit.cloudera.org:8080/245 Tested-by: Internal Jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2015-03-20 14:37:23 -07:00
Matthew Jacobs	e8527ddb8e	IMPALA-1888: FIRST_VALUE may produce incorrect results with preceding windows Fixes a bug where FIRST_VALUE may produce incorrect results (or a DCHECK failure in debug) when there is a window like "ROWS X PRECEDING Y PRECEDING", such that X < Y and X > the size of a partition. For windows with an end boundary that is PRECEDING (i.e. the entire window is before a row), there is some special handling between partitions, and the logic was not correct in some corner cases for FIRST_VALUE. Change-Id: Ied5d440684e99dcaf60b47489c90300891f09b91 Reviewed-on: http://gerrit.cloudera.org:8080/236 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 14:37:19 -07:00
Skye Wanderman-Milne	5118c55a0a	IMPALA-1810: IN predicate was not comparing DecimalVals correctly The IN predicate wasn't using the decimal type when comparing decimal values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a query with a huge decimal IN predicate) and there is a ~5% performance degradation with codegen enabled (surprisingly, there appears to be a slight performance gain with codegen disabled). We should be able to remove this penalty when we add constant injection via codegen. Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd Reviewed-on: http://gerrit.cloudera.org:8080/230 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-03-20 14:37:18 -07:00
Alex Behm	745e64a096	IMPALA-1837: Handle truncation when implicitly casting a literal to a decimal. Implicit casting to decimals allows truncating digits from the left of the decimal point (see TypesUtil). A literal that is implicitly cast to a decimal with truncation is wrapped into a CastExpr so the BE can evaluate it and report a warning. This behavior is consistent with casting/overflow of non-constant exprs that return decimal. IMPALA-1837: Without the CastExpr wrapping, such literals can exceed the max expected byte size sent to the BE in toThrift(). Change-Id: Icd7b8751b39b8031832eec04bd8eac7d7000ddf8 Reviewed-on: http://gerrit.cloudera.org:8080/195 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 19:58:58 -07:00
Ippokratis Pandis	e36c436fa6	Adding tests with right joins and duplicates Those tests were added as part of the new hash table implementation, as we didn't have tests with right joins and duplicates (and other conjuncts) as well as aggregation distinct queries with group bys on multiple columns. Adding them as a separate patch, to improve testing coverage in the 2.2 release branch. Change-Id: Id1b4f27fa6e587b2031635974ac9d2d39a1b015a Reviewed-on: http://gerrit.cloudera.org:8080/193 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 16:39:40 -07:00
Dan Hecht	25b54eac1e	S3: Fix test_multiple_filesystems.py The filesizes changed slightly, causing the S3 CI build to fail. Let's regex the file sizes in the compute stats expected results. Change-Id: Ie95bdf3a253a28aa2b6f3deb281948780ca2cc6a Reviewed-on: http://gerrit.cloudera.org:8080/200 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Dan Hecht <dhecht@cloudera.com>	2015-03-11 16:39:39 -07:00
Dan Hecht	2916132283	S3: enable more tests for S3 As needed, fix up file paths and other misc things to get more test cases running against S3. Change-Id: If4eaf9200f2abd17074080a37cd0225d977200ad Reviewed-on: http://gerrit.cloudera.org:8080/167 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 16:39:39 -07:00
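
The definition-level rule described in the Parquet commit above (ac0c075997) can be illustrated with a small sketch. This is not Impala's scanner code; it only restates the rule from the commit message, and the helper names `max_def_level` and `is_valid_def_level` as well as the (name, nullable) path representation are assumptions made for the example.

```python
from typing import List, Tuple

# A schema path is modelled as (field_name, is_nullable) pairs.
# This representation is hypothetical and chosen only for illustration.
Path = List[Tuple[str, bool]]


def max_def_level(path: Path) -> int:
    """Count the nullable fields met while traversing the path to its end field."""
    return sum(1 for _name, nullable in path if nullable)


def is_valid_def_level(def_level: int, path: Path) -> bool:
    """A value's def level may never exceed the max def level of its field."""
    return 0 <= def_level <= max_def_level(path)


# Path a.b.c.d where only b and d are nullable, as in the commit message.
nested = [("a", False), ("b", True), ("c", False), ("d", True)]

# A single required field: its max def level is 0.
required_only = [("x", False)]

if __name__ == "__main__":
    print(max_def_level(nested))
    print(max_def_level(required_only))
    # A def level of 1 on a required field is the invalid state the commit describes.
    print(is_valid_def_level(1, required_only))
```

Under this model, a def level of 1 attached to a value of a required field is rejected, which matches the commit's statement that a def level greater than the max def level should never happen.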

531 Commits