* AggFnEvaluator now uses the UDF mem pool (I'm planning to change
this to per-exec node pools in the expr refactoring)
* FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker
* Added FunctionContextImpl::Close() which sets warnings for leaked allocations
Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954
Tested-by: jenkins
Partition column expressions are analysed twice for INSERT statements -
once to infer the type and add a possible cast, and once to
compute stats on the resulting expr. However, this process resulted in
a partition column expr that was an IntLiteral getting the smallest type
that would contain its value, rather than retaining the
column-compatible type that had been assigned to it.
This patch does the minimal fix, which is to make IntLiteral.analyze()
idempotent. Doing the same to Expr and LiteralExpr unearths some
other bugs, which we will have to fix in a follow-on patch (see
IMPALA-884).
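A minimal sketch of the idempotency guard, with stand-in Java types (the real
change is in the frontend's IntLiteral):

    class IntLiteralSketch {
      private final long value_;
      private String type_;
      private boolean isAnalyzed_ = false;

      IntLiteralSketch(long value) { value_ = value; }

      // Idempotent: a repeated analysis pass must not clobber a
      // column-compatible type assigned after the first pass.
      void analyze() {
        if (isAnalyzed_) return;
        type_ = smallestFittingType(value_);
        isAnalyzed_ = true;
      }

      // Simulates the cast added for a partition column during INSERT analysis.
      void castTo(String colType) { type_ = colType; }

      private static String smallestFittingType(long v) {
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return "TINYINT";
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return "SMALLINT";
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) return "INT";
        return "BIGINT";
      }
    }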
Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947
The bug: It is generally incorrect to re-order joins across outer/semi joins.
For example, an inner join following an outer join may reduce the cardinality,
so placing the inner join before the outer join during join re-ordering
would be incorrect because the outer join is cardinality-preserving
(on one or both sides). The same argument holds for semi joins.
The fix: Place outer and semi joins at a fixed position in the plan based
on where they appeared in the original query. Inner joins to the left/right of
outer/semi joins are still re-ordered properly.
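A schematic of the fixed-position rule, using stand-in classes rather than the
planner's actual ones: only maximal runs of inner joins between outer/semi
joins are candidates for re-ordering.

    import java.util.ArrayList;
    import java.util.List;

    class JoinOrderSketch {
      enum JoinOp { INNER, OUTER, SEMI }

      static class TableRef {
        final String name;
        final JoinOp op;
        TableRef(String name, JoinOp op) { this.name = name; this.op = op; }
      }

      // Split the FROM clause into runs of inner-joined refs. Each run may be
      // permuted internally; every OUTER/SEMI ref keeps its original position.
      static List<List<TableRef>> reorderableRuns(List<TableRef> refs) {
        List<List<TableRef>> runs = new ArrayList<>();
        List<TableRef> run = new ArrayList<>();
        for (TableRef ref : refs) {
          if (ref.op == JoinOp.INNER) {
            run.add(ref);
          } else {
            runs.add(run);            // close the current inner-join run;
            run = new ArrayList<>();  // the outer/semi join itself stays put
          }
        }
        runs.add(run);
        return runs;
      }
    }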
Change-Id: Idae837097b9376473d7f8124eef69b51f612b210
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1909
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1922
count(x) with no DISTINCT and no group-by expressions returns NULL on empty input
if other distinct aggs (e.g. COUNT(DISTINCT x)) are present.
This happens because the COUNT is transformed into SUM(COUNT()),
with the inner COUNT being evaluated with a group-by expression (e.g. x).
SUM over empty input returns NULL, but COUNT should return 0.
This patch fixes this by replacing COUNT with zeroifnull(COUNT) before AggregateInfo
is generated if there are distinct aggs and no group-bys. The logic in AggregateInfo
itself has not been modified.
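A self-contained illustration of the semantics (plain Java, not Impala code):
the merge step sums the per-group counts, and SUM over zero groups is NULL
while COUNT over empty input must be 0.

    import java.util.List;

    class ZeroIfNullDemo {
      // SQL SUM semantics: NULL over empty input.
      static Long sum(List<Long> perGroupCounts) {
        if (perGroupCounts.isEmpty()) return null;
        long total = 0;
        for (long c : perGroupCounts) total += c;
        return total;
      }

      static long zeroifnull(Long v) { return v == null ? 0 : v; }

      public static void main(String[] args) {
        List<Long> noGroups = List.of();  // empty input => no distinct groups
        System.out.println(sum(noGroups));              // null (the bug)
        System.out.println(zeroifnull(sum(noGroups)));  // 0    (the fix)
      }
    }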
Change-Id: I902e3fdd95767135b2f3fe423e8802ef57366af1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1921
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Avro tables that were not created with a column-definition list do not have
their columns properly populated in the Metastore backend DB (HIVE-6308).
For such tables COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed.
This patch fails COMPUTE STATS in analysis for such broken Avro tables
and adds tests for Avro tables with a mismatched column-definition list
and Avro schema.
Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920
When updating partition metadata as part of COMPUTE STATS we would previously
attempt to update all partitions at once. This could lead to HMS socket timeouts
and could also run into issues when there were more than 32K partitions.
In this change we now update the partitions in batches, with a max size of 500
partitions per batch. We also compare whether the row count has changed and only
update partitions that have been modified.
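A minimal sketch of the batched update, assuming an already-connected
IMetaStoreClient and a pre-filtered list of partitions whose row counts
actually changed:

    import com.google.common.collect.Lists;
    import java.util.List;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Partition;

    class PartitionUpdateSketch {
      private static final int MAX_BATCH_SIZE = 500;

      static void updateInBatches(IMetaStoreClient client, String db, String tbl,
          List<Partition> modifiedParts) throws Exception {
        // One HMS RPC per batch instead of one huge RPC for all partitions.
        for (List<Partition> batch : Lists.partition(modifiedParts, MAX_BATCH_SIZE)) {
          client.alter_partitions(db, tbl, batch);
        }
      }
    }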
Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there is no documentation):
- A single ASCII or Unicode character (ex. '|')
- An escape character in octal format (ex. \001, stored in the metastore as the
Unicode character \u0001)
- A signed decimal integer in the range [-128:127], used to support delimiters
for ASCII character values between 128-255 (ex. -2 maps to ASCII 254)
Previously, we were not handling the "signed integer" case, so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.
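A sketch of validation under these rules (hypothetical helper in Java, not the
actual frontend code; returns null for an invalid spec):

    class DelimiterSketch {
      // Returns the delimiter as a single byte, or null if 'spec' is invalid.
      static Byte parseDelimiter(String spec) {
        if (spec.length() == 1) {                // single character, ex. '|'
          char c = spec.charAt(0);
          return c <= 0xFF ? (byte) c : null;    // must fit in one byte
        }
        try {
          if (spec.startsWith("\\")) {           // octal escape, ex. \001
            int v = Integer.parseInt(spec.substring(1), 8);
            return (v >= 0 && v <= 255) ? (byte) v : null;
          }
          int v = Integer.parseInt(spec);        // signed decimal, ex. -2
          return (v >= -128 && v <= 127) ? (byte) v : null;
        } catch (NumberFormatException e) {
          return null;
        }
      }
    }

For example, parseDelimiter("-2") yields the byte 0xFE, i.e. ASCII 254.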
To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.
Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
The bug: Slot materialization on distinct aggs inside an inline view did not work
if the only reference to the 2nd-phase agg-tuple slots was in a predicate from an
outer query block (e.g., the WHERE clause of the block containing the inline-view ref).
The reason was that bound predicates were fetched from the wrong tuple
(from the 1st phase agg).
The fix: Assign predicates to the top-most agg in the single-node plan that can
evaluate them, as follows: For non-distinct aggs place them in the 1st phase agg
node. For distinct aggs place them in the 2nd phase agg node.
Change-Id: I0f6ab53cf7bb0c6aed9524ad2e24a849d2dc0ec4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1843
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1881
Fixes a crash that occurs in some cases when I/O buffers are still in use and
child nodes are closed early. We close child nodes early when all rows have
been consumed and resources are transferred, but in some cases I/O buffers are
still in use when a scan node is closed. We avoid this problem by only
closing reader contexts when the entire fragment is closed.
Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859
This is because in HdfsTable we call "expr.castTo(colType)", but BooleanLiteral
(incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a
BooleanLiteral being returned, we got back a CastExpr, which cannot be cast to LiteralExpr.
While making this change it turned out that boolean partition columns are also broken in
Hive. I filed HIVE-6590 for these issues, and we decided to disable INSERT into a boolean
partition column for Impala due to this bug.
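A self-contained sketch of the mechanism with stand-in classes (the real fix
lives in the frontend's BooleanLiteral):

    class CastSketch {
      static abstract class Expr {
        final Expr castTo(String colType) { return uncheckedCastTo(colType); }
        // Default behavior: wrap the expr in a cast node.
        Expr uncheckedCastTo(String colType) { return new CastExpr(this); }
      }
      static class CastExpr extends Expr {
        final Expr child;
        CastExpr(Expr child) { this.child = child; }
      }
      static abstract class LiteralExpr extends Expr {}
      static class BooleanLiteral extends LiteralExpr {
        // The fix: BOOLEAN -> BOOLEAN needs no cast; returning the literal
        // itself keeps the (LiteralExpr) downcast in HdfsTable valid.
        @Override Expr uncheckedCastTo(String colType) { return this; }
      }
    }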
Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
The previous implementation did not properly handle replacing the is_null
return argument from expr calls.
Change-Id: I96cd0dfca8876b4f914b0cbc4eb459ea3dcdf230
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1795
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
The bug: The slots of multi-slot predicates bound to a single
outer-joined tuple were not marked as materialized.
In addition, such predicates were not picked up by nodes
under the join via getBoundPredicates() even if it
would be correct to do so.
The fix: Always mark slots of predicates that must be
evaluated by a join in SelectStmt.materializeRequiredSlots(),
regardless of whether the predicate can also be safely
evaluated below the join.
This patch also generalizes getBoundPredicates() to handle
multi-slot predicates and fixes some issues with redundant
predicate assignment. Still, the new approach has several
limitations which are documented in the predicate propagation
planner test to ease future improvements.
Change-Id: If5da0354a83c00a9766fc63b7780ed4d5a9c46e5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1717
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1819
The bug was that the number of materialized agg-tuple slots did not correspond to the number
of materialized agg functions, due to binding predicates against an AggNode causing slot
materialization after SelectStmt.materializeRequiredSlots().
This patch fixes the issue by taking binding predicates (bound to a slot in an agg tuple)
into consideration in SelectStmt.materializeRequiredSlots().
A new sanity check I added in AggregationNode.toThrift() surfaced another issue with slot
materialization that is also fixed in this patch. The ordering exprs must be marked before
the agg exprs in SelectStmt.materializeRequiredSlots() because the ordering exprs may contain
agg exprs that are only referenced inside the ORDER BY clause.
Change-Id: I1bdc0466f583907bed625ce6608938e59faee83f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1639
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1818
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
HAVING predicates need to be transferred to the 2nd phase merge agg
for distinct + non-distinct aggregates without group by.
For distinct + non-distinct aggregates with group by, it is correct
to evaluate the predicates at the 2nd phase (non-merge) agg.
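The placement rule as a tiny decision function (schematic only; the enum
values are stand-ins for plan nodes):

    class HavingPlacementSketch {
      enum AggPhase { SECOND_PHASE_MERGE_AGG, SECOND_PHASE_AGG }

      // For queries mixing distinct and non-distinct aggregates:
      static AggPhase havingTarget(boolean hasGroupBy) {
        return hasGroupBy ? AggPhase.SECOND_PHASE_AGG
                          : AggPhase.SECOND_PHASE_MERGE_AGG;
      }
    }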
Change-Id: I71d73c4ef92becbb81e142bc0cb5f54e790b1fb5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1743
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1817
This patch introduces the ability to specify a prepare and close
function for a UDF, as well as FunctionContext methods for maintaining
state across UDF invocations within a query. Many of the changes are
related to adding an Expr::Open() function which calls the UDF's
prepare function, if specified (it has to be called in Open() since
the LLVM module must be compiled first).
Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757
Our new build machines (e.g., beefy) have more cores than our other machines,
so scan nodes may have a different memory estimate, causing the explain tests
to fail. This patch fixes num_scanner_threads to 1 for explain tests
to ensure consistent estimates.
Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
We run wait-for-hbase-master.py after starting HBase to account for a race between
the master and the region server. This script has not been working for some time. It caused
no ill effects since the race was absent. However, the race has manifested itself
again, so the script needs to be fixed. Setting the correct classpath does so.
Change-Id: I783a7473cfd24a9cb66711f5428f7052ceb96282
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1756
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
With a recent upstream change, a core-site.xml was introduced in a YARN test jar pulled in
by thirdparty. This causes MiniLlama to ignore options set in
fe/src/test/resources/core-site.xml. The problem manifests itself with the MiniDfsCluster
starting on an arbitrary port, but it would have also caused a lot of tests to fail as none
of the compression codecs are pulled in. This change prepends the path to the internal
core-site.xml to the classpath used by MiniLlama.
Change-Id: Iee267fe12e02301baec059a1f7469288c038d6fa
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1739
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This updates how Impala fetches partition metadata from the Hive Metastore to fetch
partitions in batches, rather than all at once. This helps reduce the load on the
HMS and also lets Impala scale to above 32K partitions. The downside is that it
may require additional RPCs to get all the partitions.
This is done by first querying the metastore to get all the partition names that
exist, then splitting the list of names into separate batches to get the actual
partition metadata.
Impala uses a default size of 1000 partitions per batch, but it can be configured
by setting the 'hive.metastore.batch.retrieve.table.partition.max' parameter
in the hive-site.xml config file.
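A minimal sketch of the batched fetch, assuming a connected IMetaStoreClient
(batchSize would come from the config parameter above):

    import com.google.common.collect.Lists;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Partition;

    class PartitionFetchSketch {
      static List<Partition> fetchAllPartitions(IMetaStoreClient client,
          String db, String tbl, int batchSize) throws Exception {
        // (short) -1 means "no limit" on the number of names returned.
        List<String> names = client.listPartitionNames(db, tbl, (short) -1);
        List<Partition> result = new ArrayList<>();
        for (List<String> batch : Lists.partition(names, batchSize)) {
          result.addAll(client.getPartitionsByNames(db, tbl, batch));
        }
        return result;
      }
    }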
Change-Id: Ide0ec30ef8a9e00f79c26551aa8e5e7814c73034
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1662
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1698
The purpose of this patch is to avoid CDH-17414, which causes data files loaded
with Hive to incorrectly have a replication factor of 1. When using beeline,
this problem only appears to occur immediately after creating the first HBase table
since starting HiveServer2, i.e., subsequent loads seem to function correctly.
This patch adds a new script that creates an external HBase table in Hive to
'warm up' HiveServer2 immediately after it is started.
Subsequent loads should assign a correct replication factor.
Change-Id: Ic54c9401b67b748a8848d19f82b8e7df9535e845
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1640
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This patch includes several changes to predicate assignment and propagation.
First, we now only register as outer joined those tuples of TableRefs
directly participating in an outer join. In particular,
materialized tuples referenced inside an outer-joined InlineView are not
registered as outer joined - only the InlineView's tuple is registered.
The other major change is that we detect when it is correct to propagate
predicates to scan nodes participating (directly or indirectly) in an outer
join by testing whether a predicate can become true if a tuple
is NULL. If that is the case, then it is generally not safe to propagate
a predicate because it would change the final result of the outer join.
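A self-contained illustration of the safety test (plain Java stand-ins for
three-valued SQL evaluation, not the analyzer's actual code):

    class NullSafetyDemo {
      // "col IS NULL" evaluates to TRUE on a NULL tuple, so pushing it below
      // the outer join would wrongly filter the join input and change results.
      static boolean isNullPred(Long col) { return col == null; }

      // "col = 10" evaluates to NULL (not TRUE) on a NULL tuple, so it is
      // safe to propagate below the join.
      static Boolean eqTenPred(Long col) { return col == null ? null : (col == 10); }
    }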
Change-Id: Ia135ab15ec8c6ef756a908f797f96812d28c84c1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1567
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1606
Our code for eliminating redundant join predicates based on equivalence
classes is not quite right. I've commented out the relevant code
to ensure we don't incorrectly remove predicates. Left a TODO
to fix and re-enable this feature.
Change-Id: Ie76b365903dff6df271a378cbb4fd327ffa0631f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1569
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1572
This patch cleans up analysis and execution of scalar and aggregate functions
so that builtins and user functions are handled the same way. The only
difference is that the catalog is always populated with the builtins.
The BE always gets a TFunction object and just executes it (builtins have
an empty HDFS file location).
This removes the opcode registry; all of its functionality is subsumed by
the catalog, where most of it was already duplicated anyway.
This also introduces the concept of a system database: a database that the
user cannot modify and that is populated automatically on startup.
Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577
The overall goal of this change is to allow table metadata to be loaded in the background,
but also to allow prioritization of loading on an as-needed basis. As part of analysis,
any tables that are not loaded are tracked, and if analysis fails the Impalad will make
an RPC to the CatalogServer to request that the metadata loading of these tables be
prioritized, and analysis will be restarted.
To support this, the CatalogServer now has a deque of the tables to load. For
background loading, tables to load are added to the tail of the deque. However, a new
CatalogServer RPC was added that can prioritize the loading of one or more tables, in
which case they are added to the head of the deque. The next table to load is
always taken from the head. This helps prioritize loading but is admittedly not the
fairest approach. A sketch of the structure follows.
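A schematic of the loading deque (stand-in Java, not the CatalogServer's
actual structure):

    import java.util.ArrayDeque;
    import java.util.Deque;

    class TableLoadQueue {
      private final Deque<String> pending = new ArrayDeque<>();

      // Background loading appends to the tail...
      synchronized void enqueueBackground(String tableName) {
        pending.addLast(tableName);
      }

      // ...the prioritization RPC pushes to the head...
      synchronized void prioritize(String tableName) {
        pending.remove(tableName);  // avoid queueing the same table twice
        pending.addFirst(tableName);
      }

      // ...and loader threads always take from the head.
      synchronized String next() { return pending.pollFirst(); }
    }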
To support the prioritized loading, some changes had to be made on the Impalad side
during analysis:
- During analysis, any tables that are missing metadata are tracked.
- Analysis now runs in a loop. If it fails due to an AnalysisException AND at least one
table/view was missing metadata, the loading of these missing tables is requested by
calling the CatalogServer.
- The impalad will wait until the required tables are received (it is notified on each
call to updateCatalog()) and will not re-run analysis until all tables are
available. Once the tables are available, analysis will restart.
This change also introduces two new flags:
--load_catalog_in_background (bool). When this is true (the default), the catalog server
will run a periodic background thread to queue all unloaded tables for loading. This is
generally the desired behavior, but there may be some cases (very large metastores) where
this may need to be disabled.
--num_metadata_loading_threads (int32). The number of threads to use when loading catalog
metadata (degree of parallelism). The default is 16, but it can be increased to improve
performance at the cost of stressing the Hive metastore/HDFS.
Change-Id: Ib94dbbf66ffcffea8c490f50f5c04d19fb2078ad
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1476
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1538
We were previously only clearing the cache in the catalog service
update loop, so the impalad that the drop was issued to was not doing the
right thing.
Change-Id: I6bee228e8c0d565cea4ea61cbf64240d83a45a7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1511
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
The select exprs of an inline view may not always be materialised, yet
the output tuple itself may be. This patch fixes a crash in this
situation in the backend aggregation node which assumed its output tuple
would always have at least one materialised slot.
The cause was a couple of too-conservative DCHECKs that failed if the
tuple was NULL. In fact, the code was robust to this possibility without
the checks, so this bug didn't affect release builds of Impala.
Change-Id: If0b90809d30fcd196f55197953392452d1ac9c4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1431
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8c1c21b66c43e900760ace54d090305f32a85a1f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1471
Tested-by: Henry Robinson <henry@cloudera.com>
We weren't initializing the UDF mem pool, causing UDFs that return strings to crash when
used as part of a constant expression.
Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
With Hive 0.12, the default SerDe for the RC file format can be configured to be
ColumnarSerDe or LazyBinaryColumnarSerDe. Impala does not yet support
LazyBinaryColumnarSerDe. This change verifies it is properly disabled.
Change-Id: Ia84495868237ce2c89a9706ad75e0f7eb8499057
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1416
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1423
There was a race when the catalog was invalidated at the same time a table
was being loaded: an uninitialized Table could be returned
unexpectedly to the impalad due to the concurrent invalidate.
This fixes the problem by updating the CatalogObjectCache to load when
a catalog object is uninitialized, rather than load when null. New items can
now be added in an initialized or uninitialized state; uninitialized objects
are loaded on access.
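A schematic of the new cache behavior (stand-in Java, not the real
CatalogObjectCache):

    import java.util.concurrent.ConcurrentHashMap;

    class CatalogCacheSketch {
      static class CatalogObject {
        volatile boolean initialized;
        CatalogObject(boolean initialized) { this.initialized = initialized; }
        synchronized void load() { /* fetch metadata */ initialized = true; }
      }

      private final ConcurrentHashMap<String, CatalogObject> cache =
          new ConcurrentHashMap<>();

      // Entries may now be inserted initialized or uninitialized (e.g. after a
      // concurrent INVALIDATE METADATA).
      void put(String name, CatalogObject obj) { cache.put(name, obj); }

      // The fix: load on access when an entry exists but is uninitialized,
      // instead of loading only when the lookup misses entirely.
      CatalogObject get(String name) {
        CatalogObject obj = cache.get(name);
        if (obj != null && !obj.initialized) obj.load();
        return obj;
      }
    }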
Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh
In addition, it cleans up the locking in the Catalog to make it more
straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog,
and this lock is used to protect the catalogVersion_. Operations that need to
perform an atomic bulk catalog operation can use this lock (such as when the
CatalogServer needs to take a snapshot of the catalog to calculate what delta to send
to the statestore). Otherwise, the lock is not needed and objects are protected by the
synchronization at each level in the object hierarchy (Db->[Function/Table]). That is,
Dbs are synchronized by the Db cache, and each Db has a table cache that is synchronized
independently.
Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418