impala

mirror of https://github.com/apache/impala.git synced 2025-12-31 06:02:51 -05:00

Author	SHA1	Message	Date
Dan Hecht	1fee56cb26	IMPALA-1080: Implement "SET <query_option>" as SQL statement. Also add support for "SET", which returns a table of query options and their respective values. The front-end parses the option into a (key, value) pair and then the existing backend logic is used to set the option, or return the result sets. Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582 Reviewed-by: Daniel Hecht <dhecht@cloudera.com> Tested-by: jenkins (cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b) Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614	2014-07-25 10:25:09 -07:00
Nong Li	cfa58a4567	Run test_rows_availability serially. Change-Id: Id87a209a614f889209456f8c0d9aedd8ad0e513f Reviewed-on: http://gerrit.ent.cloudera.com:8080/3565 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3584	2014-07-22 14:35:46 -07:00
Nong Li	7dc57aaa9e	Change buffered block mgr to support multiple clients. This patch does a few things: 1. Moves the buffered block mgr from the sorter to the runtime state. It is now shared across the query fragment. The partitioned hash join and agg will use it as well. 2. Adds a Client interface to the block mgr. Each exec node is a different client and can reserve a minimum number of buffers. This avoids starvation. 3. Updates the BufferedBlockMgr interface for getting pinned blocks, collapsing two existing APIs. Change-Id: Ibb31fbe480f3726048457f26e24a9e33f7201d86 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3504 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/3574	2014-07-22 12:45:37 -07:00
Nong Li	a25400c94e	Increase timeout in test_rows_availability to make sure query state is what we expect. Change-Id: Id4feebcc7b7cecb07555009219e6420e48a0c82b Reviewed-on: http://gerrit.ent.cloudera.com:8080/3534 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/3579	2014-07-22 12:12:13 -07:00
Nong Li	202d656ddc	Stop setting query state to EXCEPTION for non-exception cases. We were setting the state to exception on Cancel() all the time. We use the cancellation path as the normal cleanup path so this gets called even when the query went fine (e.g. UnregisterQuery calls Cancel()). We had already plumbed through a 'cause' argument to differentiate. Change-Id: Icf1091c165dec36d3dad7ce308367bbbc9edee4f Reviewed-on: http://gerrit.ent.cloudera.com:8080/3524 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3575	2014-07-22 04:08:28 -07:00
Nong Li	188a0ea833	Rework structure of hash table. This patch does two things in preparation for external joins. The hash table used to contain a directory structure (buckets and nodes), both of which were contiguous. The nodes contained the tuple ptrs within them. This patch changes it so the nodes are not stored contiguously but allocated in pages (this structure is dense and does not require random lookups by index). The bucket structure is still contiguous since we rely on the doubling property and random lookup by index. The second change is that the nodes no longer store the tuple ptrs within them. This makes it easier to build the hash table on top of existing data. Here's a quick benchmark doing a self join on tpch lineitem. Both build and probe times decreased a bit. Before: HASH_JOIN_NODE (id=2):(Total: 1s139ms, non-child: 985.939ms, % non-child: 86.50%) - BuildBuckets: 2.10M (2097152) - BuildRows: 6.00M (6001215) - BuildTime: 527.991ms - LeftChildRows: 6.00M (6001215) - LeftChildTime: 451.964ms - LoadFactor: 0.50 - RowsReturned: 30.01M (30012985) - RowsReturnedRate: 26.33 M/sec After: HASH_JOIN_NODE (id=2):(Total: 1s019ms, non-child: 835.350ms, % non-child: 81.97%) - BuildBuckets: 2.10M (2097152) - BuildRows: 6.00M (6001215) - BuildTime: 423.175ms - LeftChildRows: 6.00M (6001215) - LeftChildTime: 406.67ms - LoadFactor: 0.50 - RowsReturned: 30.01M (30012985) - RowsReturnedRate: 29.45 M/sec Change-Id: I79e209a24c24fb4f2f99574bcf187746fddadc06 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3245 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-07-15 16:57:09 -07:00
Henry Robinson	9d0173c647	[CDH5] Disable ACL tests The tests pass every time locally (in a 60 minute run), but fail intermittently on our build machines. Change-Id: I62d5ea0df8c42728a538b29bd16006be3179bfd3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3489 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-07-14 15:38:11 -07:00
Henry Robinson	ff32821c6b	[CDH5] Test to confirm that ACLs are inherited correctly on INSERT Change-Id: I781a6b7203c2e12b484162954abae51a6443bead Reviewed-on: http://gerrit.ent.cloudera.com:8080/3076 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-07-09 19:04:55 -07:00
Matthew Jacobs	65c1a6f21e	Remove SOURCE keyword by parsing as an identifier and checking the value Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier" Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a	2014-06-30 16:47:47 -07:00
Alex Behm	7777fbff53	Clean up expr substitution and cloning. Before: The pre- and postconditions of expr substitution and cloning, in particular, their effect on the isAnalyzed_ flag were unclear and sometimes inconsistent, e.g., some literal exprs set isAnalyzed_ to true in their c'tor. As a result, several places required ad-hoc solutions like Expr.unsetIsAnalyzed() and Expr.reanalyze(). This patch cleans up expr substitution and cloning, summarized as follows: Expr analysis: All exprs start out with isAnalyzed_ = false. The flag is set to true iff analyze() has been called on the expr. Expr.clone(): Creates a deep copy of an expr including all its analysis state. Expr.equals(): Comparison of expr trees ignores implicit casts. This simplifies expr substitution because un/analyzed exprs can be easily compared/substituted. ExprSubstitutionMap: When adding a mapping, the rhs expr must be analyzed to allow substitution across query blocks. There is no requirement on the lhs expr. Expr substitution: Substitution returns an analyzed clone of the original expr with exprs substituted. While performing the substitution, implicit casts and analysis state are removed such that the returned result has minimal implicit casts and types. There are two versions of substitute functions: one that throws exceptions and one that does not, because the caller may have different expectations on whether a substitution must succeed or not. Numeric literals: This patch combines IntLiteral and DecimalLiteral into a NumericLiteral. Its main benefit is that analyze() always produces the same type, even if the literal was implicitly cast and/or isAnalyzed was unset because of expr substitution. This was not the case before because an implicit cast could permanently turn an IntLiteral into a DecimalLiteral. There is no more need for unsetIsAnalyzed() or reanalyze(). Change-Id: I646110e3714cff8ae8d5a378c25a107dd43334b6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3228 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3318	2014-06-30 10:18:26 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Alex Behm	881f3a8c33	Re-order union operands descending by their estimated per-host memory. Re-order union operands descending by their estimated per-host memory, s.t. parent nodes can gauge the peak memory consumption of a MergeNode after opening it during execution (a MergeNode opens its first operand in Open()). Scan nodes are always ordered last because they can dynamically scale down their memory usage, whereas many other nodes cannot (e.g., joins, aggregations). One goal is to decrease the likelihood of a SortNode parent claiming too much memory in its Open(), possibly causing the mem limit to be hit when subsequent union operands are executed. Change-Id: Ia51caaffd55305ea3dbd2146cd55acc7da67f382 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3146 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/3213 Tested-by: jenkins	2014-06-20 18:46:10 -07:00
Taras Bobrovytsky	7faaa65996	Added order by query tests - Added static order by tests to test_queries.py and QueryTest/sort.test - test_order_by.py also contains tests with static queries that are run with multiple memory limits. - Added stress, scratch disk and failpoints tests - Incorporated Srinath's change that copied all order by with limit tests into the top-n.test file Extra time required: Serial: scratch disk: 42 seconds test queries sort: 77 seconds test sort: 56 seconds sort stress: 142 seconds TOTAL: 5 min 17 seconds Parallel (8 threads): scratch disk: 40 seconds test queries sort: 42 seconds test sort: 49 seconds sort stress: 93 seconds TOTAL: 3 min 44 sec Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205	2014-06-20 13:35:10 -07:00
Dimitris Tsirogiannis	7dbd3a5860	IMPALA-1040: Reading a decimal partitioned column with invalid values This commit fixes IMPALA-1040, in which inserting an invalid value into a decimal partitioned column through Hive resulted in a non-informative error message and, in some cases, caused the associated table to disappear from Impala's catalog. With the fix, Impala always throws a more informative error message indicating that an invalid partition key value was inserted. Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200	2014-06-20 12:46:52 -07:00
Ippokratis Pandis	6026f1ebe1	IMPALA-1055: Compute stats query statements don't quote DB and table names The compute stats statement was not quoting the DB and table names. If those names collided with keywords, then compute stats would not execute due to a syntax error. Change-Id: Ie08421246bb54a63a44eaf19d0d835da780b7033 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3170 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3198	2014-06-20 09:32:52 -07:00
Alex Behm	aacd8bcf72	Change UnionNode to open its first child in UnionNode::Open(). This patch ensures that rows are available for clients to fetch after we advance the query to FINISHED if the coordinator fragment is rooted at a UnionNode. Change-Id: I9b4ad3f70b46c7e7720bdd5ca9ad85479c2cb7fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/3139 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3168	2014-06-19 16:44:43 -07:00
ishaan	dc3dc3dc1e	Enable tpch queries to run on text to unblock the full data load build. Some planner tests depend on data being populated in the tpch tmp tables (in text format). This change re-enables the tpch query tests to run on text so that they pass. Change-Id: I4ed09f55e05cb01978cb6f0808c6395552c0f129 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3176 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-06-19 16:19:13 -07:00
Lenni Kuff	0ac0527643	Reduce test execution time by limiting long running tests to exhaustive exec strategy I looked at the latest run from master and took the test suites that had long execution times. This cleans those test suites up to either completely disable them on 'core' or add constraints to limit the number of test vectors. It shouldn't impact nightly coverage since we still run the same tests exhaustively. Change-Id: I10c78c35155b00de0c36d9fc0923b2b1fc6b44de Reviewed-on: http://gerrit.ent.cloudera.com:8080/3119 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3125 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-06-18 16:18:17 -07:00
anusha	6b3689e8c7	IMPALA-973: Fix for invalidate metadata behaviour Change-Id: Ie0c4c458b0919978b03ebaba28bf37950dd34643 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3009 Tested-by: jenkins Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/3091	2014-06-17 12:18:50 -07:00
Dimitris Tsirogiannis	67eb5eb3a8	IMPALA-1028: Cardinality estimate is wrong for partitioned tables if we filter out all partitions This commit fixes IMPALA-1028 in which the cardinality estimate is not correct when all the partitions of a partitioned table are filtered out. To fix this issue we make sure that the estimated result cardinality of the scan node is zero when all the partitions are filtered out. Change-Id: I225949eb2e8f905a5d0f678d7f199fb95ba4aab0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3063 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3083	2014-06-16 20:36:13 -07:00
Srinath Shankar	0df773eed6	Check RuntimeState for cancellation in sorter. Currently, cancellation checking when a SortNode is executing only happens when a batch is being added to the sorter (SortNode::SortInput()) or when a batch is being retrieved from the sorter (SortNode::GetNext()). This fix passes a RuntimeState into the Sorter instance itself, which checks for cancellation at the following points: i) During an in-memory sort (in Partition() and SortHelper()). In Partition(), the cancellation check may be delayed if the input is completely sorted. ii) During an intermediate merge, before each batch of rows from a merge is copied into a run. Change-Id: I5c28c7244ee2e40627cf14542b99f872e3a8c343 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3007 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3059	2014-06-14 17:48:40 -07:00
anusha	ffc334a735	IMPALA-834: Fix for Create Table like Views Change-Id: Ied1f706c48a1106e1d6fc2aa73e57746f52ea333 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2939 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3014 Reviewed-by: Anusha Dasarakothapalli <anusha.dasarakothapalli@cloudera.com>	2014-06-12 22:13:30 -07:00
Skye Wanderman-Milne	1cc628d32d	IMPALA-950: Skip computing stats for decimal columns. This patch also adds a mechanism to return analysis warnings to the client, which is used to log skipped decimal columns. Change-Id: I30c246044a68ec8861cd5bed072bd54e65a079e6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2822 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit fc77422acef7e6f93fdeb5448309414b905f0725) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2984	2014-06-11 19:16:34 -07:00
Skye Wanderman-Milne	6ac9a8104b	IMPALA-1009: UDF/UDA leaks should not fail queries With this change, leaky UDFs built with the SDK will still fail when using the test harness, but leaky UDFs running in Impala will only trigger a warning. This change also updates the test infrastructure to always check for non-fatal errors/warnings. Change-Id: I5615349b9d691e4eddea3e03e152ef12e73835e7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2844 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 60ce5190d96add6104aba642d2354d87a26000fa) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2938	2014-06-10 21:46:47 -07:00
Nong Li	5e49150a22	Speed up views compat test. - Use a smaller table so hive runs faster - Don't invalidate the catalog, just the view created in hive - This lets us run it in parallel Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-10 20:53:23 -07:00
Nong Li	ad534429df	[CDH5] Disable flaky hdfs caching test. Change-Id: I19900ae029876d8f74169eda0f08f5be3509fbaf Reviewed-on: http://gerrit.ent.cloudera.com:8080/2946 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-10 18:24:42 -07:00
Lenni Kuff	b3ebfddadd	Allow tests to access query result column values by col alias or col position For example, you can now do something like: result_set = execute("select * from tbl") result_row = result_set[0] result_row['col_alias'] or result_row[4] to access column values. If the column alias/position does not exist an exception is thrown. Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922	2014-06-09 23:24:26 -07:00
Victor Bittorf	09aff77a6c	IMPALA-943: removed database udf_test from front-end tests Added CATCH section to test files. Change-Id: I28ba3a6e5ae4c53df5b86505573793d7b150863b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2782 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit 5b616715958f3ebfdc45b8dc0e4baa82bd55f1d2) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2912	2014-06-09 19:06:15 -07:00
Matthew Jacobs	89ec6b3d7a	IMPALA-1033: Remove SOURCE keyword; very common identifier The SOURCE keyword was introduced for DATA SOURCE ddl commands, but it is also a very common identifier. This removes the SOURCE and SOURCES keywords and instead uses DATASOURCE and DATASOURCES. Change-Id: Ic6c2897d1e23efa169aa8787752fe4aa2bb125d5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2895 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 267c13f9b46d249bfd1b8711fd3fadf6853dc1ef)	2014-06-09 17:17:14 -07:00
Srinath Shankar	5755b0bdee	Order by without limit for Impala Enable order-by without limit. Added BufferedBlockMgr to allocate buffers and spill to disk. Added Sorter for the external sort implementation. Added new SortNode execution node that completely sorts its input. Changes to enable writing in IoMgr went in a separate patch. Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins Conflicts: testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900 Reviewed-by: Srinath Shankar <sshankar@cloudera.com> Tested-by: jenkins	2014-06-09 16:58:08 -07:00
Matthew Jacobs	4f804f9a34	IMPALA-1029: DROP DATA SOURCE should remove jar from lib caches Change-Id: I2a3f0f4e54474aa4fd4e0515bfcdc272d1535544 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2846 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit f2ae745a09b5025c53919a4ce71e3943b034f87c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2899	2014-06-09 11:37:16 -07:00
Henry Robinson	60cbe1b0e1	IMPALA-741: Support partitions with non-existent HDFS locations If a partition had a location that did not exist in HDFS, Impala would refuse to load its metadata. This meant a typo could render a table unloadable. We fix this problem by removing the existence check from the frontend, and by inheriting access from the first extant parent of the partition directory. Fixing this exposed a second issue, where Impala wouldn't create directories for partitions in the right place after an INSERT if the partition location had been changed. To get this right we have to plumb the partition ID through to Coordinator::FinalizeSuccessfulInsert(), so that the coordinator can look up the partition's location from the query-wide descriptor table. As a by-product, this patch rationalises the per-partition, per-fragment statistics gathering a little bit by putting almost all the per-partition stats into TInsertPartitionStatus. Change-Id: I9ee0a1a1ef62cf28f55be3249e8142c362083163 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2851 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-06-08 18:44:45 -07:00
ishaan	db97981ab9	[CDH5] Switch the tpcds schemas to use decimal instead of float/double. This patch converts the tpcds schemas to use decimal instead of float/double. Currently, Impala can only r/w decimal in text, therefore, the tables are constrained to text. The schemas were obtained from the official tpc spec: http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-06-08 11:47:23 -07:00
Matthew Jacobs	2f9b2ae785	Fix SHOW DATA SOURCE test; must execute setup/cleanup serially The SHOW DATA SOURCE tests were run as part of the other SHOW * tests in test_show(), but the setup/cleanup for data sources can't be run in parallel. This change moves the SHOW DATA SOURCE tests into a separate test method and the setup/cleanup code is only run for this test (i.e. not using setup_method() and teardown_method()). The test is then only executed serially. Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870	2014-06-05 20:07:57 -07:00
Nong Li	5d80942d42	[CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads. Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-30 19:14:39 -07:00
Nong Li	84f851b5a5	IMPALA-959: Fix ASAN decimal crashes. Not quite sure what the underlying issue is but these fixes seem to work. Change-Id: I759804eb8338ba86969c0214a1e6e35588c94297 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2726 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-05-30 16:47:07 -07:00
Skye Wanderman-Milne	c8b2017093	Add decimal UDF/UDA support. Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739	2014-05-29 20:49:53 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored within the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exist early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence without going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user during this period, we wait for the cache requests to complete in the background, and once they have finished the table metadata is automatically refreshed. Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00
Matthew Jacobs	f9c9a7ca13	Add SHOW DATA SOURCES Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614	2014-05-19 17:52:27 -07:00
Skye Wanderman-Milne	edbbe6035e	Decimal: read from Avro Allows reading decimal columns with or without codegen. Includes tests based on a data file posted on HIVE-5823. Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6 (cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-05-16 22:26:11 -07:00
ishaan	0298e8b6ab	Fix the ASAN build by xfailing test_decimal when ASAN_OPTIONS is set. Adding decimal columns crashes an ASAN built impalad. This change skips the test. Change-Id: Ic94055a3f0d00f89354177de18bc27d2f4cecec2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2532 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2594	2014-05-16 18:14:30 -07:00
Matthew Jacobs	19f34d9187	test_data_source_tables should only run for 'text/none' Change-Id: I784fb4305f8cff92c2582b0a7008f836c7aa9fa4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2504 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 9f3621ec5d270c60258e93e8a2a596329c31f4e6) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2508	2014-05-09 19:32:18 -07:00
Matthew Jacobs	0c533bb152	External Data Source: Backend changes Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497 Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2014-05-09 02:24:41 -07:00
Henry Robinson	38befd2126	IMPALA-724: Support infinite / nan values in text files This patch allows the text scanner to read 'inf' or 'Infinity' from a row and correctly translate it into floating-point infinity. It also adds is_inf() and is_nan() builtins. Finally, we change the text table writer to write Infinity and NaN for compatibility with Hive. In the future, we might consider adding nan / inf literals to our grammar (postgres has this, see: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html). Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483	2014-05-08 12:28:53 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
Nong Li	5adbcbbce5	Update decimal tests to only run on text/none. Change-Id: I9a35f9e1687171fc3f06c17516bca2ea4b9af9e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2217 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2431 Reviewed-by: Nong Li <nong@cloudera.com>	2014-05-02 12:18:37 -07:00
Nong Li	bb3feb675e	Dynamically scale down mem usage in scanners and io mgr. This patch scales down the amount of buffering in the io mgr and the number of scanner threads if the query is close to mem limits. Change-Id: I68ef247a68642939b98ec7c429dfd393b23a20d2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1906 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2417	2014-05-01 15:04:07 -07:00
Skye Wanderman-Milne	60db4d4d82	CDH-18416: Don't inline ReadWriteUtil::ReadZLong() For wide Avro tables, ReadZLong() would get inlined many times into a single function body, causing LLVM to crash. Not inlining doesn't seem to have a performance impact on narrow tables, and helps with wide tables. This change also adds tests over wide (i.e. many-column) tables. The test tables are produced by specifying shell commands to generate test tables in functional_schema_template.sql, which are executed in generate-schema-statements.py. In the SQL templates, sections starting with a ` are treated as shell commands. The output of the shell command is then used as the section text. This is only a starting point; it isn't currently implemented for all sections, and may have to be tweaked if we use this mechanism for all tables. Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353 Tested-by: jenkins	2014-04-28 15:58:15 -07:00
Skye Wanderman-Milne	bd2fc2d1d4	IMPALA-934: Refresh cached UDF library when creating a new function This change adds the ability to refresh a local cache entry, causing the old cache entry to be dropped and the library to be reloaded from HDFS. This is used in ResolveSymbolLookup(), which is called by the frontend when creating a new function, and in ImpalaServer when receiving a "create function" heartbeat. This change also makes sure the FE calls into the backend for jars, so jars get refreshed as well. Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348	2014-04-24 19:39:16 -07:00
Lenni Kuff	bb09b5270f	IMPALA-839: Update tests to be more thorough when run exhaustively Some tests have constraints that were there only to help reduce runtime, which reduces coverage when running in exhaustive mode. The majority of the constraints are because it adds no value to run the test across additional dimensions (or it is invalid to run with those dimensions). Updates the tests that have legitimate constraints to use two new helper methods for constraining the table format dimension: create_uncompressed_text_dimension() create_parquet_dimension() These will create a dimension that will produce a single test vector, either uncompressed text or parquet respectively. Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins (cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290	2014-04-18 20:11:31 -07:00
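Commit b3ebfddadd describes test result rows whose column values can be read either by column alias (`result_row['col_alias']`) or by position (`result_row[4]`), with an exception raised when the alias or position does not exist. The following is a minimal Python sketch of that access pattern under stated assumptions; it is not the actual Impala test-infrastructure code, and the class names `ResultRow` and `ResultSet` are hypothetical.

```python
from typing import Any, Sequence


class ResultRow:
    """Hypothetical row type: values addressable by column alias or position."""

    def __init__(self, aliases: Sequence[str], values: Sequence[Any]):
        if len(aliases) != len(values):
            raise ValueError("aliases and values must have the same length")
        self._values = list(values)
        # Map each column alias to its position in the row (exact-match lookup).
        self._index = {alias: pos for pos, alias in enumerate(aliases)}

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access; a missing position raises IndexError.
            return self._values[key]
        try:
            return self._values[self._index[key]]
        except KeyError:
            # A missing alias raises KeyError with a descriptive message.
            raise KeyError(f"no column with alias {key!r}") from None

    def __len__(self):
        return len(self._values)


class ResultSet:
    """Hypothetical result set: a list of ResultRow objects sharing one schema."""

    def __init__(self, aliases: Sequence[str], rows: Sequence[Sequence[Any]]):
        self.rows = [ResultRow(aliases, row) for row in rows]

    def __getitem__(self, i: int) -> ResultRow:
        return self.rows[i]

    def __len__(self):
        return len(self.rows)
```

With `result_set = ResultSet(["id", "name"], [(1, "a"), (2, "b")])`, both `result_set[0]["name"]` and `result_set[0][1]` return the same value. Building the alias-to-position map once per row keeps alias lookups constant-time; a real implementation would more likely share one map across all rows of a result set.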


225 Commits