Ensures that the planner treats IS NOT DISTINCT FROM as eligible for
hash joins, but does not find the minimum spanning tree of
equivalences for use in optimizing query plans; that is left as future
work.
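For example, a null-safe equality join of the following shape (table
and column names are illustrative) can now be planned as a hash join:
  SELECT a.id, b.id
  FROM t1 a JOIN t2 b
    ON a.col IS NOT DISTINCT FROM b.col;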
Change-Id: I62c5300b1fbd764796116f95efe36573eed4c8d0
Reviewed-on: http://gerrit.cloudera.org:8080/710
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
input_stream_ is now set to NULL on eos, but execution can
continue through the function to line 753, which touches
input_stream_ without checking for NULL. This occurs
occasionally when the transfer of memory from the mem pool
to the output row batch happens at eos.
Change-Id: I8ca88ef10d48e19cfde7f3c6de9512eefcae561e
Reviewed-on: http://gerrit.cloudera.org:8080/1757
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Fixes tests in show.test that execute 'show files' on the
'insert_string_partitioned' table in our functional db. The expected
output relied on modifications to 'insert_string_partitioned' made
in test_insert.py. Tests should not rely on the overall ordering
of test execution.
This patch also fixes 'show files' to produce a consistently ordered
output.
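For reference, the statement exercised by the test is of the form:
  SHOW FILES IN functional.insert_string_partitioned;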
Change-Id: Ic736b94b70677b0e3f4f8a9838ffdfdde2ba17ab
Reviewed-on: http://gerrit.cloudera.org:8080/1748
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The problem was that a recently added planner test was assuming stats
were computed for functional_hbase.alltypes, but we purposely do not compute
stats for that table. The fix is to use the alltypessmall table instead.
The modified test still covers the same issue.
Change-Id: I043c485489c7868b4f320048eb383943627f620b
Reviewed-on: http://gerrit.cloudera.org:8080/1705
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
The original purpose of the escapechartesttable was to test
Impala's behavior on text tables that have the same character
as line terminator and escape character. Recent changes in
Hive have made creating such a table impossible because
1) Only newline is allowed as the line terminator
2) Newline is forbidden as the escape character
See HIVE-11785 for details on the Hive changes.
This commit removes escapechartesttable and all associated
tests, but does not add the same enforcement rules as Hive.
These enforcement rules should be added in a follow-on change.
Change-Id: I2bd9755f4c2cc3d7dfd8d67c3759885951550f08
Reviewed-on: http://gerrit.cloudera.org:8080/1690
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
This patch fixes the comment style in the queries to work properly with
the limitations of the run-workload.py script. This includes removing
quotes and '+' characters from comments that would otherwise be interpreted.
Change-Id: I791e7bd4145717aa0628c56b93582cd207195039
Reviewed-on: http://gerrit.cloudera.org:8080/1689
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes a bug where the catalog incorrectly converts all Sentry
authorizables to lower case. Since HDFS URIs are case sensitive,
this bug can result in incorrect grants when the URIs contain uppercase
letters.
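For example, a grant of the following form (role name and path are
illustrative) could previously be recorded against a lowercased URI and
therefore not match the actual HDFS path:
  GRANT ALL ON URI 'hdfs://localhost:20500/test-warehouse/MyDataDir'
  TO ROLE test_role;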
Change-Id: I642c34d9046729dc904cc45871c7d7959ae828bc
Reviewed-on: http://gerrit.cloudera.org:8080/1675
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
Tests were tuned to run on a 9-node cluster with 64GB RAM against a 300GB TPC-H database.
Change-Id: Ib421bcd463d370f795a235b755aeb24a6a70f705
Reviewed-on: http://gerrit.cloudera.org:8080/1394
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where queries with outer joins and case expressions
in predicates can fail with an AnalysisException. The cause is that the method
isTrueWithNullSlots() does not preserve the return types of sub-expressions when
substituting null literals, so the resulting predicate can fail to analyze
even though the original predicate analyzes successfully. The fix is to preserve
the type of each slot in the predicate that we substitute with the null literal.
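A predicate of roughly this shape on an outer-joined table illustrates
the case (functional schema used for illustration):
  SELECT t1.id
  FROM functional.alltypes t1
    LEFT OUTER JOIN functional.alltypes t2 ON t1.id = t2.id
  WHERE CASE WHEN t2.bool_col THEN t2.int_col ELSE 0 END > 10;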
Change-Id: I8cd827b460620355db6fd518464418e701a724f1
Reviewed-on: http://gerrit.cloudera.org:8080/1656
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
FunctionContext::Allocate(), FunctionContextImpl::AllocateLocal()
and FunctionContext::Reallocate() allocate memory without taking
memory limits into account. The problem is that these functions
invoke FreePool::Allocate() which may call MemPool::Allocate()
that doesn't check against the memory limits. This patch fixes
the problem by making these FunctionContext functions check for
memory limits and set an error in the FunctionContext object if
memory limits are exceeded.
An alternative would be for these functions to call
MemPool::TryAllocate() instead and return NULL if memory limits
are exceeded. However, this may break some existing external
UDAs which don't check for allocation failures, leading to
unexpected crashes of Impala. Therefore, we stick with this
ad hoc approach until the UDF/UDA interfaces are updated in
a future release.
Callers of these FunctionContext functions are also updated to
handle potential failed allocations instead of operating on
NULL pointers. The query status is polled at various
locations and the query is terminated if an allocation error has been set.
This patch also fixes MemPool to handle the case in which malloc
may return NULL. It propagates the failure to the callers instead
of continuing to run with NULL pointers. In addition, errors during
aggregate functions' initialization are now properly propagated.
Change-Id: Icefda795cd685e5d0d8a518cbadd37f02ea5e733
Reviewed-on: http://gerrit.cloudera.org:8080/1445
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
SHOW CREATE TABLE now supports views. It returns a CREATE VIEW statement
with column names and the original SQL statement.
Authorization allows SHOW CREATE TABLE to be run on a view if the user has
the VIEW_METADATA privilege on the view and the SELECT privilege on all
underlying views and tables.
E.g. "SHOW CREATE TABLE some_view" returns output of the form:
CREATE VIEW a_database.some_view (id, bool_col, tinyint_col) AS
SELECT id, bool_col, tinyint_col FROM functional.alltypes
Change-Id: Id633af2f5c1f5b0e01c13ed85c4bf9c045dc0666
Reviewed-on: http://gerrit.cloudera.org:8080/713
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Remove all use of deprecated HBase APIs and bring everything up to
the 1.0 API. This work is done to enable use of the Google Cloud Bigtable
HBase client, which does not support the deprecated APIs. However,
nothing in this change depends on Cloud Bigtable, and it should work
fine with HBase 1.x and greater.
This involves two major changes.
1) HBase is trying to move away from unmanaged connections. Thus,
catalogd and the backend are updated to create a Connection object
which is used to create the Table client objects. Since the
Connection object owns a threadpool, there is no need to create a
separate ExecutorService and pass it to the HTables on creation.
2) Instead of reading the size on disk of the different Region
servers, we use a single call to the HBase master to get the
ClusterStatus, which contains the storefile sizes for all the region
servers the master knows about. This enables the HBase table
implementation to work with Cloud Bigtable, which of course does not
keep its data in HDFS.
To use the Cloud Bigtable driver, simply update hbase-site.xml to set
the "hbase.client.connection.impl" property to the appropriate Cloud
Bigtable client implementation class. Further details can be found
here:
here:
https://cloud.google.com/bigtable/docs/connecting-hbase
Tested by running:
py.test tests/query_test/test_hbase_queries.py \
--exploration_strategy=exhaustive
Change-Id: I6c758502126884670bb6dd3153aea5aa5b41aab6
Reviewed-on: http://gerrit.cloudera.org:8080/775
Readability: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start with only a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that assume
multiple impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions, since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
As part of this change, refactor catalog and frontend functions to return
TDatabase/Db objects instead of just the string names of databases;
this required a lot of method/variable renamings.
Add test for creating database with comment. Modify existing tests
that assumed only a single column in SHOW DATABASES results.
Change-Id: I400e99b0aa60df24e7f051040074e2ab184163bf
Reviewed-on: http://gerrit.cloudera.org:8080/620
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
When building hash tables for the build side in partitioned
hash join or aggregation, we will evaluate the build or probe
side expressions to compute the hash values for each TupleRow.
Evaluation of certain expressions (e.g. CastToChar) requires
"local" memory allocation. "Local" memory allocation is supposed
to be freed after processing each row batch.
However, the calls to free local allocations are missing in
PartitionedHashJoinNode::BuildHashTableInternal() and
PartitionedAggregationNode::ProcessStream(). This causes all
"local" memory allocations to accumulate, potentially for the
entire duration of the query or until GetNext() is called.
This may lead to unnecessary memory allocation failures as
the memory limit is exceeded.
This patch calls ExecNode::FreeLocalAllocations() at least once
per row-batch when building hash tables. It also adds the missing
checks for the query status in the loop building hash tables.
Please note that QueryMaintenance() isn't called due to its
overhead in memory limit checks.
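For example, a grouping expression that needs local allocations, such
as the CAST below (functional schema used for illustration), could
previously accumulate a per-row allocation for the whole build phase:
  SELECT CAST(string_col AS CHAR(10)), COUNT(*)
  FROM functional.alltypes
  GROUP BY CAST(string_col AS CHAR(10));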
Change-Id: Idbeab043a45b0aaf6b6a8c560882bd1474a1216d
Reviewed-on: http://gerrit.cloudera.org:8080/1448
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
The bug: Our predicate assignment logic used to rely on a flag isWhereClauseConjunct
set in Exprs to determine whether assigning a predicate at a certain plan node
was correct or not. This flag was intended to distinguish between predicates from
the On-clause of a join and other predicates, and the bug was that we used
!isWhereClauseConjunct to imply that a predicate originated from an On-clause.
For example, predicates migrated into inline views apply to the post-join,
post-aggregation, post-analytic result of the inline view, but the existing
flag and logic were insufficient to correctly assign predicates coming
from the On-clause of an enclosing block.
This patch removes the isWhereClauseConjunct flag in favor of an isOnClauseConjunct
flag, which more directly captures the originally intended logic.
This patch also adds a test to cover the same issue reported in IMPALA-2665.
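For instance, in a query like the following (illustrative), the
On-clause conjunct 'v.c > 10' from the enclosing block applies to the
post-aggregation result of the inline view:
  SELECT t1.id
  FROM functional.alltypes t1
    LEFT OUTER JOIN (SELECT id, count(*) c
                     FROM functional.alltypes GROUP BY id) v
    ON t1.id = v.id AND v.c > 10;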
Change-Id: I9ff9086b8a4e0bd2090bf88319cb917245ae73d2
Reviewed-on: http://gerrit.cloudera.org:8080/1453
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch reduces memory usage of scanners by adjusting how batch
capacity is checked and handled and by freeing unneeded memory.
Change RowBatch::AtCapacity(MemPool) so that batches with no rows
cannot hold onto an unbounded amount of memory - instead such
batches are passed up the operator tree so that the resources
can be freed.
The Parquet scanner also only checked capacity every 1024 rows.
With large rows (e.g. nested collections), it could overrun the
intended 8MB limit. It also didn't include the MemPool usage
in its checks. After the change the scanner will produce smaller
batches if rows contain large nested collections or strings.
I benchmarked this with a scan of the nested TPC-H customer
table. The row batch sizes decreased from ~16MB to ~8MB. If the
nested collections were larger this would be more drastic.
Also pass the at-capacity signal up the tree if no rows passed the
conjuncts in the DataSourceScanNode and Parquet scanner so that
resources can be freed.
HdfsTableSink is modified to avoid the incorrect assumption that a batch
only has 0 rows at eos. It is also refactored to pass a related flag as
an argument to make the semantics clearer.
Two simple benchmarks (one column and many columns) show no change
in scanner performance:
> set num_scanner_threads=1;
> select count(l_orderkey) from biglineitem;
> select count(l_orderkey), count(l_partkey), count(l_suppkey),
count(l_returnflag), count(l_quantity), count(l_linenumber),
count(l_extendedprice), count(l_linestatus), count(l_shipdate),
count(l_commitdate) from biglineitem;
Change-Id: I3b79671ffd3af50a2dc20c643b06cc353ba13503
Reviewed-on: http://gerrit.cloudera.org:8080/1239
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This commit implements DESCRIBE DATABASE [FORMATTED|EXTENDED] <db_name>.
Without FORMATTED|EXTENDED this statement only prints the database's location and
comment. With FORMATTED|EXTENDED it will output all the properties of the database
(e.g. OWNER and PARAMETERS). Currently we only retrieve privileges stored in
the Hive Metastore.
This commit also implements DESCRIBE EXTENDED <table>, which is the same
as DESCRIBE FORMATTED <table>, for consistency.
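Example usage:
  DESCRIBE DATABASE functional;
  DESCRIBE DATABASE FORMATTED functional;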
Change-Id: I2a101ec0e3d27b344fcb521eb00e5bdbcbac8986
Reviewed-on: http://gerrit.cloudera.org:8080/804
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Status was ignored by the callers of CreateMerger(), which caused the
sorter to continue in a bad state, leading to a DCHECK failure.
Change-Id: I1aae11d142cdbf3923289768e296dbf340f049e9
Reviewed-on: http://gerrit.cloudera.org:8080/1367
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This change adds more analysis checks to verify that the location
of the table or partition specified in a "CREATE/ALTER TABLE
... CACHED IN ..." statement can actually be cached. Caching
is only supported for HDFS locations.
If table-wide caching is enabled for a table, adding a partition
at an uncacheable location will be disallowed for that table
unless the attribute "UNCACHED" is explicitly specified.
Enabling table-wide caching for a table at an uncacheable
location or a table with partitions at uncacheable locations
will also be disallowed. However, caching can still be enabled
for individual partitions whose underlying locations
support caching.
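A sketch of the resulting behavior (pool name and locations are
hypothetical):
  CREATE TABLE cached_tbl (i INT) PARTITIONED BY (p INT)
    CACHED IN 'testPool';                 -- allowed only if the table location is in HDFS
  ALTER TABLE cached_tbl ADD PARTITION (p=1)
    LOCATION 's3a://bucket/dir' UNCACHED; -- non-HDFS partition location requires UNCACHED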
Change-Id: I2299c9285126f4b035360f2ef902147188ccd5f1
Reviewed-on: http://gerrit.cloudera.org:8080/1373
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This commit adds the query end time to impalad's lineage log entries consumed
by Navigator. The lineage graph is constructed in the frontend and is then
passed to the backend as a serialized thrift object. When the query terminates
(including cancellations and aborts), the backend appends the query end time
("endTime") to the lineage graph and generates the lineage log entry in
JSON format.
Change-Id: I2236e98895ae9a159ad6e78b0e18e3622fdc3306
Reviewed-on: http://gerrit.cloudera.org:8080/934
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
There was an incorrect DCHECK in the parquet scanner. If abort_on_error
is false, the intended behaviour is to skip to the next row group, but
the DCHECK assumed that execution should have aborted if a parse error
was encountered.
This also:
- Fixes a DCHECK after an empty row group. InitColumns() would try to
create empty scan ranges for the column readers.
- Uses metadata_range_->file() instead of stream_->filename() in the
scanner. InitColumns() was using stream_->filename() in error
messages, which used to work, but stream_ is now set to NULL before
InitColumns() is called.
Change-Id: I8e29e4c0c268c119e1583f16bd6cf7cd59591701
Reviewed-on: http://gerrit.cloudera.org:8080/1257
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This only enforced the defaults in Hive. Users who manually choose to
change the schema in Hive may trigger these new analysis exceptions
unnecessarily. The Hive issue tracking the length
restrictions is
https://issues.apache.org/jira/browse/HIVE-9815
Change-Id: Ia30f286193fe63e51a10f0c19f12b848c4b02f34
Reviewed-on: http://gerrit.cloudera.org:8080/721
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This patch implements a new built-in function,
regexp_match_count. This function returns the number of
occurrences of the pattern in the input string.
The regexp_match_count() function has the following syntax:
int = regexp_match_count(string input, string pattern)
int = regexp_match_count(string input, string pattern,
int start_pos, string flags)
The input value specifies the string on which the regular
expression is processed.
The pattern value specifies the regular expression.
The start_pos value specifies the character position
at which to start the search for a match. It is set
to 1 by default if it's not specified.
The flags value (if specified) dictates the behavior of
the regular expression matcher:
m: Specifies that the input data might contain more than
one line so that the '^' and the '$' matches should take
that into account.
i: Specifies that the regex matcher is case insensitive.
c: Specifies that the regex matcher is case sensitive.
n: Specifies that the '.' character matches newlines.
By default, the flag value is set to 'c'. Note that the
flags are consistent with other existing built-in functions
(e.g. regexp_like), so certain flags in IBM Netezza such as
's' are not supported, to avoid confusion.
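Example usage (results follow from the semantics above):
  SELECT regexp_match_count('aaa', 'a');             -- returns 3
  SELECT regexp_match_count('abcABC', 'c', 1, 'i');  -- case-insensitive, returns 2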
Change-Id: Ib33ece0448f78e6a60bf215640f11b5049e47bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1248
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch extends the SHOW CREATE statement to also support
user-defined functions and user-defined aggregate functions.
The syntax of the new SHOW statements is as follows:
SHOW CREATE [AGGREGATE] FUNCTION [<db_name>.]<func_name>;
<db_name> and <func_name> are the names of the database
and udf/uda respectively.
Sample outputs of the new SHOW statements are as follows:
Query: show create function fn
+------------------------------------------------------------------+
| result |
+------------------------------------------------------------------+
| CREATE FUNCTION default.fn() |
| RETURNS INT |
| LOCATION 'hdfs://localhost:20500/test-warehouse/libTestUdfs.so' |
| SYMBOL='_Z2FnPN10impala_udf15FunctionContextE' |
| |
+------------------------------------------------------------------+
Query: show create aggregate function agg_fn
+------------------------------------------------------------------------------------------+
| result |
+------------------------------------------------------------------------------------------+
| CREATE AGGREGATE FUNCTION default.agg_fn(INT) |
| RETURNS BIGINT |
| LOCATION 'hdfs://localhost:20500/test-warehouse/libudasample.so' |
| UPDATE_FN='_Z11CountUpdatePN10impala_udf15FunctionContextERKNS_6IntValEPNS_9BigIntValE' |
| INIT_FN='_Z9CountInitPN10impala_udf15FunctionContextEPNS_9BigIntValE' |
| MERGE_FN='_Z10CountMergePN10impala_udf15FunctionContextERKNS_9BigIntValEPS2_' |
| FINALIZE_FN='_Z13CountFinalizePN10impala_udf15FunctionContextERKNS_9BigIntValE' |
| |
+------------------------------------------------------------------------------------------+
Please note that all the overloaded functions which match
the given function name and category will be printed.
This patch also extends the Python test infrastructure to
support expected results that include newline characters.
A new subsection comment called 'MULTI_LINE' has been added
for the 'RESULT' section. With this comment, a test can
include its multi-line output inside [ ], and the content
inside [ ] will be treated as a single line, including the
newline characters.
Change-Id: Idbe433eeaf5e24ed55c31d905fea2a6160c46011
Reviewed-on: http://gerrit.cloudera.org:8080/1271
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
When scanning tables with collection types, the limit applied to the scan
node is not sufficient to safely enable the small-query
optimization. This patch adds an additional check to the
MaxRowsProcessedVisitor that aborts checking the number of processed
rows once a scan node accesses a collection type.
Change-Id: Ic43baf3f97acfb8d7b53b0591c215046179d18b3
Reviewed-on: http://gerrit.cloudera.org:8080/1235
Reviewed-by: Silvius Rus <srus@cloudera.com>
Tested-by: Internal Jenkins
This patch fixes the option passed to the RE2 regex matcher
so that it will count the newline character '\n' as a valid
candidate for '.'. Previously, that option was set to false
by default, causing multi-line text to fail to match against
patterns with wildcard characters in it. This patch also adds
some tests to address these cases and fixes some typos in
like-predicate.h.
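A sketch of the intended behavior (literals are illustrative):
  SELECT 'line1\nline2' RLIKE 'line1.line2';   -- '.' now matches the embedded newline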
Change-Id: I25367623f587bf151e4c87cc7cb6aec3cd57e41a
Reviewed-on: http://gerrit.cloudera.org:8080/1172
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This fixes a regression introduced with:
IMPALA-1621,2241,2271,2330,2352: Lazy switch to IO buffers to reduce
min mem needed for PAGG/PHJ
Prior to that change, as soon as any partition's stream overflowed
its small buffers, all partitions' streams would be switched
immediately to IO-buffers, which would be satisfied by the initial
buffer "reservation".
After that change, individual streams are switched to IO-buffers on
demand as they overflow their small buffers. However, that change
also made it so that Partition::Spill() would eagerly switch that
partition's streams to IO-buffers, and fail the query if the buffer
is not available. The buffer may not be available because the
reserved buffers may be in use by other partition's streams.
We don't need to fail the query if the switch to IO-buffers in
Partition::Spill() fails. Instead, we should just let the streams
switch on demand as they fill up the small buffers. When that
happens, if the IO buffer is not available, then we already have a
mechanism to pick partitions to spill until we can get the IO-buffer
(in the worst case it means working our way back down to the initial
reservation). See AppendRowStreamFull() and BuildHashTables().
The symptom of this regression was that some queries would fail at a
lower memory limit than before.
Also revert the max_block_mgr_memory values back to their originals.
Additional testing: loop custom_cluster/spilling.py. We should also
remeasure minimum memory required by queries after this change.
Change-Id: I11add15540606d42cd64f2af99f4e96140ae8bb5
Reviewed-on: http://gerrit.cloudera.org:8080/1228
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The failure is due to an issue with NULL tuples (IMPALA-2375),
where NULL tuples come from the right side of a left
outer join and the right side comes from an inline view
that produces 0 slots (e.g. the view selects a constant).
The HJ doesn't handle them correctly because the planner
inserts an IsTupleNull expr. This isn't an issue for the
PHJ because the BufferedTupleStream returns non-NULL Tuple*
ptrs even for tuples with no slots.
Per IMPALA-2375, we're going to address this after 2.3, so
this test case is moved into joins-partitioned.test, which only
runs on the PHJ.
Change-Id: I64cb7e8ffd60f3379aa8860135db5af8e66d686f
Reviewed-on: http://gerrit.cloudera.org:8080/1231
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
The commit "IMPALA-2265: Sorter was not checking the returned Status of
PrepareRead"
added a new test case that sets a mem_limit. The mem_limit was
calibrated against functional_parquet tables, but during exhaustive runs
this test is run against other formats. Other scanners will use
different amounts of memory, causing this test to fail in exhaustive.
Fix the query to always run exactly as it was tuned.
Change-Id: I8140653825cb4f303ad569f087757148c756e42d
Reviewed-on: http://gerrit.cloudera.org:8080/1230
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
If an error occurs while processing an Expr, only that ExprContext
is closed and the others are still left open. This leads to a DCHECK
during teardown of the query because the destructor of ExprContext
expects it to be closed.
In this patch, we close all the ExprContexts if an error occurs in any
one.
Change-Id: Ic748bfd213511c314c59594d075048f6c6d82073
Reviewed-on: http://gerrit.cloudera.org:8080/1222
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This query was occasionally succeeding. This patch lowers the mem
limit so it is less likely to succeed. It also adds the specific
"Failed to allocate buffer" message to the expected error so we don't
accidentally lose coverage. This test could still become flaky in the
future (I'm not sure 4MB is guaranteed to work every time, plus it
could change in the future), but I can't think of a better solution
than to continue adjusting it or only rely on more targeted unit
tests.
Change-Id: Id585f0e3b2c77a0279efffe9dd8b8b4472225730
Reviewed-on: http://gerrit.cloudera.org:8080/1207
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Before, Expr::IsConstant() manually specified the constant Expr
classes, but TupleIsNullPredicate and AnalyticExpr overrode
IsConstant() to always return false (which Expr::IsConstant() didn't
specify). This meant that unless the TupleIsNullPredicate was the root
expr, TupleIsNullPredicate::IsConstant() would never be called and
Expr::IsConstant() would return true. This patch changes
Expr::IsConstant() to recurse on its children, rather than having it
contain the constant logic for all expr types.
Change-Id: I756eb945e04c791eff39c33305fe78d957ec29f4
Reviewed-on: http://gerrit.cloudera.org:8080/1214
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
The bug: While separating the parent table refs from the subplan refs, we immediately
changed the left table link of a chosen ref. However, we rely on the link structure
to correctly determine the required table ref ids, so in some cases we missed
a required table ref id due to a broken table ref chain.
The fix: Preserve the original chain of table refs until the end of
computeParentAndSubplanRefs().
TODO:
This fix has the unfortunate consequence of making the plan for nested TPCH Q21
worse. It should be possible to change the planning logic to be correct and
also get the original better plan for Q21, but this needs some more thought.
Change-Id: Ib8dc13c950f7783b62ce6ab7c8a6f534f9a9bb31
Reviewed-on: http://gerrit.cloudera.org:8080/1177
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
The sorter was dropping the returned Status of the PrepareRead()
calls on the floor. PrepareRead() tries to Pin() blocks. In some
queries with large sorts, those Pin() calls could fail with OOM,
but because the sorter was ignoring the returned Status it would
happily put the unpinned block in the vector of blocks and eventually
seg fault, because the buffer_desc_ of that block was NULL.
This patch fixes this problem and adds a test that we may eventually
want to move to the exhaustive build because it takes quite some time.
It also changes the comments of the sorter class to the Doxygen style.
Change-Id: Icad48bcfbb97a68f2d51b015a37a7345ebb5e479
Reviewed-on: http://gerrit.cloudera.org:8080/1156
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
With this commit we use regex when comparing the file sizes of table
'tpch_nested_parquet.region' in the PlannerTest.
Change-Id: I03fa177c9d36d60bcb5ce7eece8a5a7c98bb7985
Reviewed-on: http://gerrit.cloudera.org:8080/1216
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
In the case of right joins (right outer, right anti and full outer), if
a spilled partition was repartitioned we would try to access its
build rows stream, even though it was already set to NULL, leading
to a SEGV.
Change-Id: Ia570333c62a4da1152d8d47be9176ac024ba3f5f
Reviewed-on: http://gerrit.cloudera.org:8080/1209
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
For some reason, on each full data load, Hive seems to slightly change
the file size of the tpch_nested.customer files. The PlannerTest
result comparator already supports regex:, so use that to regex
away the file size to get the builds working.
Change-Id: If84ac71bc3a309407efa6c597be71f83993c5533
Reviewed-on: http://gerrit.cloudera.org:8080/1148
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
The test introduced as part of IMPALA-2457 was flaky as the order
of the results was not enforced and could change. This patch forces
ordering and fixes this bug.
Change-Id: Ib03d2d93818f33835b267347dcd3a94062aa7475
Reviewed-on: http://gerrit.cloudera.org:8080/1189
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Stream::ReadBytes() could fail for reasons other than
'stale metadata'. Add an error code check to make sure
Impala returns the proper error message.
This also fixes IMPALA-2488: metadata.test_stale_metadata
fails on non-HDFS filesystems.
Change-Id: I9a25df3fb49f721bf68d1b07f42a96ce170abbaa
Reviewed-on: http://gerrit.cloudera.org:8080/1166
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
The bug: When assigning bound collection conjuncts to scan, we
incorrectly marked the source of the bound conjunct as assigned
even though the source conjunct must also be evaluated by a join
(source conjunct is a where-clause conjunct bound by an
outer-joined tuple). The root issue was that a bound conjunct
retained the expr id of its source.
The fix: Unset the expr id of bound conjuncts to prevent callers
from inadvertently marking the source conjunct as assigned.
Change-Id: Ica775adfc551d9fc0457a2392c4988cb2eb7de72
Reviewed-on: http://gerrit.cloudera.org:8080/1149
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
The analytic function PERCENT_RANK() is implemented as a query
rewrite in the analysis stage. An edge case where only one row is
returned resulted in a divide by zero, which returned NaN
as the result. This patch fixes that by creating a conditional
expression.
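For example (table is illustrative), a window over a single row
previously produced NaN:
  SELECT percent_rank() OVER (ORDER BY id)
  FROM functional.alltypestiny WHERE id = 0;
With this fix the single-row case yields 0 rather than NaN.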
Change-Id: Ic8d826363e4108e0246b8e844355f3382a4a3193
Reviewed-on: http://gerrit.cloudera.org:8080/1131
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This commit fixes an issue where a predicate from the WHERE clause that
can be evaluated at a join node is incorrectly assigned to that node's
join conjuncts even if it is an outer join, thereby causing the join to
return wrong results.
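For example (tables are illustrative), the WHERE predicate below must
be applied after the outer join; treating it as one of the join's
conjuncts would incorrectly let unmatched rows from t1 through with
NULLs:
  SELECT t1.id
  FROM functional.alltypes t1
    LEFT OUTER JOIN functional.alltypessmall t2 ON t1.id = t2.id
  WHERE t2.int_col = 10;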
Change-Id: Ibf83e4e2c7b618532b3635b312a70a2fa12a0286
Reviewed-on: http://gerrit.cloudera.org:8080/1129
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
The bug: A query with a subplan containing a hash join with unnest nodes
on both the build and probe sides would not project the collection-typed
slots referenced in unnest nodes of the probe side. The reason is that
we used to first complete the hash join build before opening the probe
side. Since the build does a deep copy, the collection-typed slots
to be unnested on the probe side would not be projected.
Example query that exhibited the bug:
subplan
hash join
nested-loop join
singular row src
unnest t.c1
unnest t.c2
scan t
The tuple of 't' has two collection-typed slots, one for 't.c1', and
another for 't.c2'. If the hash join completes the build without
opening the probe side, then the 't.c2' slot would not be projected
and deep copied into the build-side hash table. That collection
would then be returned in GetNext() of the hash join.
The fix: For hash joins inside a subplan, open child(0) before doing
the build.
Change-Id: I569107b5ecafdbb75f3562707947ecc73951140c
Reviewed-on: http://gerrit.cloudera.org:8080/1128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The limit of 100 was determined empirically by generating deeply
nested Parquet and Avro files and then trying to run queries with
and without subplans over them (one absolute table ref vs. all relative
table refs for maximally nested subplans).
Based on those experiments we can handle up to 200 levels of nesting,
but the queries get very slow. At 300 levels, we exceed the stack space
due to the recursive implementation of the scan. Also, we decode the
rep/def levels of Parquet as uint8_t. I settled on 100 because it is
safe, future-proof and reasonably high for most practical cases.
Change-Id: Iebdfa96a6dd6060387e38eaedb8ddf0f9901ac24
Reviewed-on: http://gerrit.cloudera.org:8080/905
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
IMPALA-2349:
The bug was that we were not adding the parent tuple ids of a relative or
correlated table ref to the list of required tuple ids if that table ref
itself depended on a relative table ref (nested subplans).
This patch simplifies and fixes the planning with straight_join. The
required tuple ids are properly set, and the ordering requirement is
enforced by adding the last parent table ref's id to the list of
required table ref ids.
IMPALA-2412:
The bug was that we were relying on both the required materialized tuple
ids as well as the table ref ids to determine whether a table ref belongs
into a subplan at a certain level. However, as the existing comments in
the code already state, the subplan placement should be determined
only based on whether the required parent tuple ids are materialized.
The correct join/subplan ordering is independent, and is handled by
the required table ref ids.
Change-Id: I922fcbd0039242bf5940534d667926cdbdf72946
Reviewed-on: http://gerrit.cloudera.org:8080/907
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins