impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 03:01:02 -05:00

Author	SHA1	Message	Date
Alex Behm	7fcd7cd64e	Add list of tables missing stats to explain header and mem-limit exceeded error. Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880	2014-03-12 21:15:22 -07:00
Alex Behm	58950a52a3	IMPALA-798: Distributed execution of CTAS and explain CTAS. Change-Id: I32004a4b31c54cf5c185169fece143a61213d12d Reviewed-on: http://gerrit.ent.cloudera.com:8080/1850 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1867	2014-03-12 16:51:50 -07:00
Matthew Jacobs	8fa8a0f828	IMPALA-843: Do not close reader contexts until plan fragment close Fixes a crash that occurs in some cases when io buffers are still used and child nodes are closed early. We close child nodes early when all rows have been consumed and resources are transferred, but in some cases io buffers are still in use when a scan node is closed. We avoid this problem by only closing reader contexts when the entire fragment is closed. Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859	2014-03-12 14:44:18 -07:00
Alex Behm	748ea3f38b	Fix test_partitioning.py and expected results. Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-12 11:25:17 -07:00
Lenni Kuff	08417c875f	IMPALA-849: Impala does not work with boolean partition key columns This is because in HdfsTable we call "expr.castTo(colType)", but BooleanLiteral (incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr. As part of this change it turns out Boolean partition columns are also broken in Hive. I filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition column for Impala due to this bug. Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-03-11 18:42:08 -07:00
Alex Behm	47c52ade84	IMPALA-866: Make HdfsScanNode.computeStats() idempotent with respect to totalBytes_. Change-Id: I1c243b089db82c0544586a2a1428081aa2dbcd52 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1844 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1852	2014-03-11 18:20:15 -07:00
Nong Li	5022aa08fb	IMPALA-869: Fix result initialization for MIN(). Change-Id: I50eceb04c0eb1c9eedb9c963cb75d2fc5aeb4825 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1847 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-11 17:26:31 -07:00
Nong Li	f6de8d9e30	IMPALA-765: Fix subexpr elimination codegen optimization. The previous implementation did not properly handle replacing the is_null return argument from expr calls. Change-Id: I96cd0dfca8876b4f914b0cbc4eb459ea3dcdf230 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1795 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-10 15:20:53 -07:00
Alex Behm	a615ebc549	IMPALA-822,IMP-1271: Binding predicates on an aggregation now properly trigger slot materialization. The bug was that the number of materialized agg-tuple slots did not correspond to the number of materialized agg functions, due to binding predicates against an AggNode causing slot materialization after SelectStmt.materializeRequiredSlots(). This patch fixes the issue by taking binding predicates (bound to a slot in an agg tuple) into consideration in SelectStmt.materializeRequiredSlots(). A new sanity check that I added in AggregationNode.toThrift() surfaced another issue with slot materialization that is also fixed in this patch. The ordering exprs must be marked before the agg exprs in SelectStmt.materializeRequiredSlots() because the ordering exprs may contain agg exprs that are only referenced inside the ORDER BY clause. Change-Id: I1bdc0466f583907bed625ce6608938e59faee83f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1639 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1818 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-08 00:25:26 -08:00
Alex Behm	f7c2781afe	IMPALA-845: Transfer predicates to 2nd phase merge agg in some cases. Having predicates need to be transferred to the 2nd phase merge agg for distinct + non-distinct aggregates without group by. For distinct + non-distinct aggregates with group by, it is correct to evaluate the predicates at the 2nd phase (non-merge) agg. Change-Id: I71d73c4ef92becbb81e142bc0cb5f54e790b1fb5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1743 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1817	2014-03-07 21:45:16 -08:00
Alex Behm	66a6c1f312	Fix UDF query test files. Change-Id: Idea277ea2d20c47b2a81b0f2f06c48455de2ea45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1780 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-03-06 07:37:14 -08:00
Skye Wanderman-Milne	6ceed1e632	UDF API additions This patch introduces the ability to specify a prepare and close function for a UDF, as well as FunctionContext methods for maintaining state across UDF invocations within a query. Many of the changes are related to adding an Expr::Open() function which calls the UDF's prepare function, if specified (it has to be called in Open() since the LLVM module must be compiled first). Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757	2014-03-05 07:32:34 -08:00
Alex Behm	69a840d965	Consistent memory estimates for explain tests. Our new build machines (e.g., beefy) have more cores than our other machines, so scan nodes may have a different memory estimate causing the explain tests to fail. This patch fixes the num_scanner_threads to 1 for explain tests to ensure consistent estimates. Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-03-05 05:38:30 -08:00
Skye Wanderman-Milne	203fc66456	Add GetTypeDesc() method to FunctionContext. This is currently only implemented for NativeUdfExpr. Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720	2014-03-01 20:24:30 -08:00
Nong Li	80658d9eab	IMPALA-828: Fix avro codegen if conjuncts cannot be codegend. Change-Id: I9ff0214e541eb958132fbe5b7798883db91ef025 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1695 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-27 19:58:13 -08:00
Alex Behm	cb8150e8ee	IMPALA-817: Check equality of function name in Function.equals(). Change-Id: Ib9b4ee3a21f90fdb0d7ebccd89462dc67040bd1e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1594 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1611 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-02-19 17:13:51 -08:00
Nong Li	0d2919fe7f	Refactor scalar and aggregate function analysis and execution. This patch cleans up analysis and execution of scalar and aggregate functions so that there is no difference between how builtins and user functions are handled. The only difference is that the catalog is populated with the builtins all the time. The BE always gets a TFunction object and just executes it (builtins will have an empty hdfs file location). This removes the opcode registry and all of the functionality is subsumed by the catalog, most of which was already duplicated there anyway. This also introduces the concept of a system database; databases that the user cannot modify and is populated automatically on startup. Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577	2014-02-18 18:40:08 -08:00
Lenni Kuff	95404d4888	Support prioritized background table loading The overall goal of this change is to allow table metadata to be loaded in the background, while also allowing loading to be prioritized on an as-needed basis. As part of analysis, any tables that are not loaded are tracked and if analysis fails the Impalad will make an RPC to the CatalogServer to request that the metadata loading of these tables be prioritized and analysis will be restarted. To support this, the CatalogServer now has a deque of the tables to load. For background loading, tables to load are added to the tail of the deque. However, a new CatalogServer RPC was added that can prioritize the loading of one or more tables in which case they will get added to the head of the deque. The next table to load is always taken from the head. This helps prioritize loading but is admittedly not the most fair approach. To support the prioritized loading, some changes had to be made on the Impalad side during analysis: - During analysis, any tables that are missing metadata are tracked. - Analysis now runs in a loop. If it fails due to an AnalysisException AND at least 1 table/view was missing metadata, these tables missing metadata are requested to be loaded by calling the CatalogServer. - The impalad will wait until the required tables are received (by getting notified each time there is a call to updateCatalog()), and will wait to run analysis until all tables are available. Once the tables are available, analysis will restart. This change also introduces two new flags: --load_catalog_in_background (bool). When this is true (the default) the catalog server will run a periodic background thread to queue all unloaded tables for loading. This is generally the desired behavior, but there may be some cases (very large metastores) where this may need to be disabled. --num_metadata_loading_threads (int32). The number of threads to use when loading catalog metadata (degree of parallelism). The default is 16, but it can be increased to improve performance at the cost of stressing the Hive metastore/HDFS. Change-Id: Ib94dbbf66ffcffea8c490f50f5c04d19fb2078ad Reviewed-on: http://gerrit.ent.cloudera.com:8080/1476 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1538	2014-02-13 23:43:06 -08:00
Nong Li	d5d4b4785b	Fix broken udf test case. Should not specify DB. Change-Id: I5f6343cbef9f52d349130360e029b38b23d0187a Reviewed-on: http://gerrit.ent.cloudera.com:8080/1505 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-10 11:34:56 -08:00
Nong Li	1a55133f0a	IMPALA-735. Fix codegen bug affecting outer joins. Change-Id: I99ca45b558fb2ed694f261a22e7e91e59f1ad675 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1496 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-02-10 05:00:21 -08:00
Nong Li	7d578a9e54	Cleanup for IMPALA-774 fix. Change-Id: I47bce71c482b3576957e88980f764c30f45229a9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1454 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1470	2014-02-05 22:58:51 -08:00
Henry Robinson	16af29ea5f	IMPALA-770: Fix crash in aggregation node with zero-width tuple The select exprs of an inline view may not always be materialised, yet the output tuple itself may be. This patch fixes a crash in this situation in the backend aggregation node which assumed its output tuple would always have at least one materialised slot. The cause was a couple of too-conservative DCHECKs that failed if the tuple was NULL. In fact, the code was robust to this possibility without the checks, so this bug didn't affect release builds of Impala. Change-Id: If0b90809d30fcd196f55197953392452d1ac9c4f Reviewed-on: http://gerrit.ent.cloudera.com:8080/1431 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 8c1c21b66c43e900760ace54d090305f32a85a1f) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1471 Tested-by: Henry Robinson <henry@cloudera.com>	2014-02-05 22:01:35 -08:00
Nong Li	ccd8c0338f	IMPALA-774: Fix runtimestate setup when evaluating expr from FE. We weren't initializing the udf mem pool, causing UDFs that return strings to crash if used as part of a constant expression. Change-Id: Ic3a0e556aec8ce03a9e59f3ccf6980c682046b50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1447 Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-02-05 11:02:27 -08:00
Alex Behm	3f68be2caa	IMP-1227: Ignore columns of unsupported types in compute stats. Enclose identifiers that are Impala keywords in quotes. Change-Id: Ie7fa6da2869090428c9229c44b973ecccbb49e8e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1357 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1368	2014-01-28 17:18:17 -08:00
Lenni Kuff	e97a1b52e0	Remove flaky verification in ALTER/CREATE table tests This fixes the flaky ALTER/CREATE tests by removing a verification step that didn't add value and was non-deterministic. The verification step that was removed verified that CREATE/ALTER set the appropriate file format by changing the format to something that didn't match the underlying data files, then attempting to read the data. This is already covered by the positive test case where the file format is changed to match the underlying data. Change-Id: I66f485405234f472f3b83f3e776bf7f2c10de874 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1379 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1382 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-28 16:03:02 -08:00
Alex Behm	22ab2595d6	[CDH5] Fixes to expected explain plans due to new HBase version. Change-Id: I33f09283dcea278ca07f9d2d44e542e644def8ca	2014-01-15 15:12:24 -08:00
Nong Li	53d7bbb97a	[CDH5] Impala changes for updated thirdparty components. Changes include: - version changes in impala-config - version changes in various loading scripts - hbase jars are no longer in hive/lib - mini-llama script changes - updates due to sentry api changes - JDBC tests disabled - unsupported types tests disabled. Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee	2014-01-15 15:12:13 -08:00
Alex Behm	6799c93922	Simplified/enhanced explain plans with a total of four explain levels. There are now 4 explain levels summarized as follows: - Level 0: MINIMAL Non-fragmented parallel plan only showing plan nodes with minimal attributes - Level 1: STANDARD Non-fragmented parallel plan with some details in plan nodes - Level 2: EXTENDED Non-fragmented parallel plan with full details in plan nodes including the table/column stats, row size, #hosts, cardinality, and estimated per-host memory requirement - Level 3: VERBOSE Fragmented parallel plan with full details (like level 2) This patch also includes several bugfixes related to plan costing and/or testing of explain plans. Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-10 19:17:59 -08:00
Skye Wanderman-Milne	561da008c7	IMPALA-729: fix resource management in Parquet scanner for multiple row groups We weren't attaching resources to the row batch when starting a new row group, so it was possible for string data to be overwritten. This patch removes CloseStreams() and merges its functionality with AttachCompletedResources() so it's not possible to destroy streams without transferring the resources first. It also merges and removes ScannerContext::Close(). Also adds test cases for IMPALA-720. Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb	2014-01-08 10:56:26 -08:00
Alan Choi	57b961168d	IMP-1188 Fix HBase row key predicates issues This patch fixes a few row key issues: 1. We used to assert that the row key filter must be a string literal. However, it can also be a constant function. We need to eval the expr and then use the result as the start/stop key. 2. Cast(row_key as int) simply failed. This should not be transformed into start/stop key. 3. We used to assert that lower bound < upper bound. This query: select * from tbl where row_key > 'b' and row_key < 'a' would simply ASSERT. We should simply not return any rows. 4. Handle NULL predicate HBase row key can't be null. If either upper/lower bound is null, we simply don't need to return any rows. Change-Id: Ia03590a862888b377bf1f48bcb838b99193fa241 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1180 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:40 -08:00
Alan Choi	468ca0aa5d	IMPALA-723 Fix union with aggregate The problem is that with Union, AggregateInfo.materializeRequiredSlots() is being called more than once. Other "materializeSlots" related calls are idempotent, but this one is not. That's because materializedAggregateSlots_ is an array list and we keep adding the same duplicate value to the array list. We can fix it by making materializeRequiredSlots() idempotent. Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195 Tested-by: jenkins Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: Alan Choi <alan@cloudera.com>	2014-01-08 10:54:40 -08:00
Lenni Kuff	6afea60704	Update test logging to print executable SQL statements and log all actions executed This is the first step in cleaning up the test logging. It provides a common connection interface that provides tracing around all operations. When a test fails the output will be executable SQL. It also logs actions such as when a connection is opened, closed, or when an operation is cancelled. Currently only beeswax connections are supported, but I have a separate patch that adds support for executing using HS2 as well as Beeswax. Example of new logging: -- connecting to: localhost:21000 -- executing against localhost:21000 use functional; SET disable_codegen=False; SET abort_on_error=1; SET batch_size=0; SET num_nodes=0; -- executing against localhost:21000 select a.timestamp_col from alltypessmall a inner join alltypessmall b on (a.timestamp_col = b.timestamp_col) where a.year=2009 and a.month=1 and b.year=2009 and b.month=1; -- closing connection to: localhost:21000 Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:40 -08:00
Alan Choi	00e912b372	IMPALA-715 HBase scanner should use CallIntMethod for int return type function call The hbase-table-scanner used CallShortMethod to retrieve the size of the array. Short will overflow. Because the size of the array is an int, we should use CallIntMethod instead. Change-Id: I941981f7504ee04adf998398f8baf6beae76d000 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1171 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:39 -08:00
Matthew Jacobs	967346b0c4	IMPALA-630: Add fn to get the PID of the impalad to which the user is connected Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:37 -08:00
Chris Channing	7e98708d7d	IMPALA-114: Add support for custom date/time formats This change set adds support for dealing with custom date/time formats in Impala. The following date/time tokens are supported: y – Year M – Month d – Day H – Hour m – Minute s – second S – Fractional second The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows the use of repeating tokens to indicate zero padding for an output scenario (TS -> String) and a guide for reading data to a given length in a parsing scenario. Representing literal months is achieved by specifying three repeating tokens e.g. yyyy-MMM-dd -> 2013-Nov-21. Formatting character groups can appear in any order along with any separators e.g. yyyy/MM/dd dd-MMM-yy (dd)(MM)(yyyy) HH:mm:sss ..etc.. The following features are not supported with this patch: - Long literal months e.g. MMMM - Nested strings e.g. “Year: “ yyyy “Month: “ mm “Day: “ dd - Lazy formatting Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001 Reviewed-by: Christopher Channing <cchanning@cloudera.com> Tested-by: Christopher Channing <cchanning@cloudera.com>	2014-01-08 10:54:34 -08:00
Alex Behm	74164e8f99	IMPALA-688: Fix column stats computation for HBase row key. Use regex to fix flaky tests. Change-Id: I1d3fb915921bbc5366da0ee51608fd54aa237777 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1135 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:33 -08:00
Alex Behm	e4ad086dee	Added max/avg length for string columns in COMPUTE STATS. Change-Id: I6f61de2323ee12681642684ec633ed4bb7506de2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1079 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:30 -08:00
Alex Behm	dd0409e9d6	IMPALA-509: Minimal type promotion for arithmetic exprs. Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:30 -08:00
Matthew Jacobs	93368e20b1	Fix CROSS JOIN handling in join order optimization and add tests Cross joins should be handled like outer joins in the join order optimization in that the right table referenced by a cross join may not be reordered anywhere before tables referenced to the left of the cross join. If there are inner joins to the right of the cross join, those tables may be reordered before the cross join. E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A and B, but D may be reordered to come before C. Also adds test cases for join order optimization and predicate propagation. Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:29 -08:00
Skye Wanderman-Milne	9e17042185	Allow zero bit width dict/RLE decoders. This allows us to read single-value dictionary-encoded columns generated by parquet-mr. Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	de531e15bd	IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8 parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Matthew Jacobs	f327431a8e	IMPALA-171: Add CROSS JOIN Adds a CROSS JOIN (cartesian product). Common join code is moved to a new abstract base class BlockingJoinNode. We must keep all build RowBatches in memory in order to iterate over them for every row from the left child. The TupleRowList provides a convenient way to iterate over all of the rows. A future change will address codegen for the CrossJoinNode. Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/970 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne	acdc792355	IMPALA-695: Use the local path of Hive UDF jars in the FE. The FE was creating class loaders with the HDFS locations of Hive UDF libs, rather than the local locations created by the BE. Our tests still passed since we only used UDFs already on the classpath (e.g. Hive builtins). Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne	b54d16dabd	IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions. Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Alex Behm	24662b1941	Allow ALTER TABLE to set per-partition serde and table properties. The main motivation is to allow users to set the per-partition number of rows for manual incremental stats maintenance, as well as a means to 'drop' stats that may have caused undesirable plan changes. Change-Id: Iff38317a993e5d7952ea4df839947f5ec341e930 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1010 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Lenni Kuff	bfb16ff552	Disable SHOW STATS tests because results are unstable (IMPALA-688) Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:22 -08:00
Nong Li	e3fdef7839	Fix subexpr elimination IR rewriting. Change-Id: Iabdcc1686951e71136a603ed30f9d16fb1c1ec46 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1056 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Lenni Kuff	0bae3978c9	Update compute-stats.py to execute using Impala Updates our compute stats script to execute using Impala. This allows us to easily compute stats on all tables in a database or all tables in the metastore. The updated stats caused one of the TPCH plans to change so this also updates the TPCH planner test results. Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Lenni Kuff	76fa3b2ded	Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax This change updates our DDL syntax support to allow for using 'STORED AS PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax, but continue to support the old. I made the same change for 'AVROFILE', but since we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax. Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Nong Li	ab21dde002	Update compute stats test to use regex for parquet/hbase file size. The parquet file stores the application version that wrote it so is different between our c4 and c5 branches. HBase storage is also not guaranteed to be identical across versions. Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:17 -08:00
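
The load queue described in commit 95404d4888 (background tables appended at the tail of a deque, prioritized tables pushed to the head, next table always taken from the head) can be illustrated with a minimal Python sketch. This is not Impala's code: the class name `TableLoadQueue`, its method names, and the choice to move an already-queued table to the head rather than queue it twice are all assumptions made for illustration.

```python
from collections import deque


class TableLoadQueue:
    """Hypothetical model of the CatalogServer's table-loading deque."""

    def __init__(self):
        self._pending = deque()

    def add_background(self, table):
        # Background loading: enqueue at the tail.
        self._pending.append(table)

    def prioritize(self, tables):
        # Prioritized loading: push to the head, preserving request order.
        # Assumption: a table that is already queued is moved, not duplicated.
        for table in reversed(tables):
            if table in self._pending:
                self._pending.remove(table)
            self._pending.appendleft(table)

    def next_table(self):
        # The next table to load is always taken from the head.
        return self._pending.popleft() if self._pending else None


queue = TableLoadQueue()
for name in ("a", "b", "c"):
    queue.add_background(name)
queue.prioritize(["c", "x"])
order = [queue.next_table() for _ in range(4)]
print(order)
```

As the commit message itself notes, always serving the head favors prioritized requests over fairness: a steady stream of prioritized tables can delay background loads indefinitely.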

222 Commits