impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 18:00:57 -05:00

Author	SHA1	Message	Date
Alex Behm	22ab2595d6	[CDH5] Fixes to expected explain plans due to new HBase version. Change-Id: I33f09283dcea278ca07f9d2d44e542e644def8ca	2014-01-15 15:12:24 -08:00
Nong Li	53d7bbb97a	[CDH5] Impala changes for updated thirdparty components. Changes include: - version changes in impala-config - version changes in various loading scripts - hbase jars are no longer in hive/lib - mini-llama script changes - updates due to sentry api changes - JDBC tests disabled - unsupported types tests disabled. Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee	2014-01-15 15:12:13 -08:00
Alex Behm	6799c93922	Simplified/enhanced explain plans with a total of four explain levels. There are now 4 explain levels summarized as follows: - Level 0: MINIMAL Non-fragmented parallel plan only showing plan nodes with minimal attributes - Level 1: STANDARD Non-fragmented parallel plan with some details in plan nodes - Level 2: EXTENDED Non-fragmented parallel plan with full details in plan nodes including the table/column stats, row size, #hosts, cardinality, and estimated per-host memory requirement - Level 3: VERBOSE Fragmented parallel plan with full details (like level 2) This patch also includes several bugfixes related to plan costing and/or testing of explain plans. Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-10 19:17:59 -08:00
Skye Wanderman-Milne	561da008c7	IMPALA-729: fix resource management in Parquet scanner for multiple row groups We weren't attaching resources to the row batch when starting a new row group, so it was possible for string data to be overwritten. This patch removes CloseStreams() and merges its functionality with AttachCompletedResources() so it's not possible to destroy streams without transferring the resources first. It also merges and removes ScannerContext::Close(). Also adds test cases for IMPALA-720. Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb	2014-01-08 10:56:26 -08:00
Alan Choi	57b961168d	IMP-1188 Fix HBase row key predicates issues This patch fixes a few row key issues: 1. We used to assert that the row key filter must be a string literal. However, it can also be a constant function. We need to eval the expr and then use the result as the start/stop key. 2. Cast(row_key as int) simply failed. This should not be transformed into start/stop key. 3. We used to assert that lower bound < upper bound. This query: select * from tbl where row_key > 'b' and row_key < 'a' would simply ASSERT. We should simply not return any rows. 4. Handle NULL predicate HBase row key can't be null. If either upper/lower bound is null, we simply don't need to return any rows. Change-Id: Ia03590a862888b377bf1f48bcb838b99193fa241 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1180 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:40 -08:00
Alan Choi	468ca0aa5d	IMPALA-723 Fix union with aggregate The problem is that with Union, AggregateInfo.materializeRequiredSlots() is being called more than once. Other "materializeSlots" related calls are idempotent, but this one is not. That's because materializedAggregateSlots_ is an array list and we keep adding the same duplicate value to the array list. We can fix it by making materializeRequiredSlots() idempotent. Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195 Tested-by: jenkins Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: Alan Choi <alan@cloudera.com>	2014-01-08 10:54:40 -08:00
Lenni Kuff	6afea60704	Update test logging to print executable SQL statements and log all actions executed This is the first step in cleaning up the test logging. It provides a common connection interface that provides tracing around all operations. When a test fails the output will be executable SQL. It also logs actions such as when a connection is opened, close, or when an operation is cancelled. Currently only beeswax connections are supported, but I have a seperate patch that adds support for executing using HS2 as well as Beeswax. Example of new logging: -- connecting to: localhost:21000 -- executing against localhost:21000 use functional; SET disable_codegen=False; SET abort_on_error=1; SET batch_size=0; SET num_nodes=0; -- executing against localhost:21000 select a.timestamp_col from alltypessmall a inner join alltypessmall b on (a.timestamp_col = b.timestamp_col) where a.year=2009 and a.month=1 and b.year=2009 and b.month=1; -- closing connection to: localhost:21000 Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:40 -08:00
Alan Choi	00e912b372	IMPALA-715 HBase scanner should use CallIntMethod for int return type function call The hbase-table-scanner used CallShortMethod to retrieve the size of the array. Short will overflow. Because the size of the array is an int, we should use CallIntMethod instead. Change-Id: I941981f7504ee04adf998398f8baf6beae76d000 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1171 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:39 -08:00
Matthew Jacobs	967346b0c4	IMPALA-630: Add fn to get the PID of the impalad to which the user is connected Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:37 -08:00
Chris Channing	7e98708d7d	IMPALA-114: Add support for custom date/time formats This change set adds support for dealing with custom date/time formats in Impala. The following date/time tokens are supported: y – Year M – Month d – Day H – Hour m – Minute s – second S – Fractional second The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows the use of repeating tokens to indicate zero padding for an output scenario (TS -> String) and a guide for reading data to a given length in a parsing scenario. Representing literals months is achieved by specifying three repeating tokens e.g. yyyy-MMM-dd -> 2013-Nov-21. Formatting character groups can appear in any order along with any separators e.g. yyyy/MM/dd dd-MMM-yy (dd)(MM)(yyyy) HH:mm:sss ..etc.. The following features are not supported with this patch: - Long literal months e.g. MMMM - Nested strings e.g. “Year: “ yyyy “Month: “ mm “Day: “ dd - Lazy formatting Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001 Reviewed-by: Christopher Channing <cchanning@cloudera.com> Tested-by: Christopher Channing <cchanning@cloudera.com>	2014-01-08 10:54:34 -08:00
Alex Behm	74164e8f99	IMPALA-688: Fix column stats computation for HBase row key. Use regex to fix flaky tests. Change-Id: I1d3fb915921bbc5366da0ee51608fd54aa237777 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1135 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:33 -08:00
Alex Behm	e4ad086dee	Added max/avg length for string columns in COMPUTE STATS. Change-Id: I6f61de2323ee12681642684ec633ed4bb7506de2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1079 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:30 -08:00
Alex Behm	dd0409e9d6	IMPALA-509: Minimal type promotion for arithmetic exprs. Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:54:30 -08:00
Matthew Jacobs	93368e20b1	Fix CROSS JOIN handling in join order optimization and add tests Cross joins should be handled like outer joins in the join order optimization in that the right table referenced by a cross join may not be reordered anywhere before tables referenced to the left of the cross join. If there are inner joins to the right of the cross join, those tables may be reordered before the cross join. E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A and B, but D may be reordered to come before C. Also adds test cases for join order optimization and predicate propagation. Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:29 -08:00
Skye Wanderman-Milne	9e17042185	Allow zero bit width dict/RLE decoders. This allows us to read single-value dictionary-encoded columns generated by parquet-mr. Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	de531e15bd	IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8 parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Matthew Jacobs	f327431a8e	IMPALA-171: Add CROSS JOIN Adds a CROSS JOIN (cartesian product). Common join code is moved from to a new abstract base class BlockingJoinNode. We must keep all build RowBatches in memory in order to iterate over them for every row from the left child. The TupleRowList provides a convenient way to iterate over all of the rows. A future change will address codegen for the CrossJoinNode. Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/970 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne	acdc792355	IMPALA-695: Use the local path of Hive UDF jars in the FE. The FE was creating class loaders with the HDFS locations of Hive UDF libs, rather than the local locations created by the BE. Our tests still passed since we only used UDFs already on the classpath (e.g. Hive builtins). Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne	b54d16dabd	IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions. Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Alex Behm	24662b1941	Allow ALTER TABLE to set per-partition serde and table properties. The main motivation is to allow users to set the per-partition number of rows for manual incremental stats maintenance, as well as a means to 'drop' stats that may have caused undesirable plan changes. Change-Id: Iff38317a993e5d7952ea4df839947f5ec341e930 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1010 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Lenni Kuff	bfb16ff552	Disable SHOW STATS tests because results are unstable (IMPALA-688) Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:22 -08:00
Nong Li	e3fdef7839	Fix subexpr elimination IR rewriting. Change-Id: Iabdcc1686951e71136a603ed30f9d16fb1c1ec46 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1056 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Lenni Kuff	0bae3978c9	Update compute-stats.py to execute using Impala Updates our compute stats script to execute using Impala. This allows us to easily compute stats on all tables in a database or all tables in the metastore. The updated stats caused one of the TPCH plans to change so this also updates the TPCH planner test results. Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Lenni Kuff	76fa3b2ded	Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax This change updates our DDL syntax support to allow for using 'STORED AS PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax, but continue to support the old. I made the same change for 'AVROFILE', but since we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax. Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Nong Li	ab21dde002	Update compute stats test to use regex for parquet/hbase file size. The parquet file stores the application version that wrote it so is different between our c4 and c5 branches. HBase storage is also not guaranteed to be identical across versions. Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:17 -08:00
Alex Behm	93e5b262c2	Added COMPUTE STATS command for gathering table and column stats. A compute stats command computes the table and column stats for a given table and persists them in the metastore. The table stats consist of the per-partition and per-table row count. The column stats are computed on a per-table basis and consist of the number of distinct values and the number of NULLs per column. This patch introduces a new 'child query' concept that compute stats utilizes. Child queries are cancelled if the parent query is cancelled. A compute stats stmt is executed by the following query hirarchy: parent: compute stats query (DDL) - child: compute table stats query (QUERY) - child: compute column stats query (QUERY) The new child query concept is necessary to decouple child query fetches from parent query fetches, i.e., we could not execute a child query as part of the original compute stats query, because then a client could fetch the results we need for updating the Metastore statistics. The reason why our existing CTAS works without this decoupling is that its insert 'child query' is not fetchable. Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd Reviewed-on: http://gerrit.ent.cloudera.com:8080/873 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:14 -08:00
Skye Wanderman-Milne	49f4bd285a	Change test_metadata_query_statements.py::test_show to ignore Parquet file sizes since they may fluctuate slightly. Change-Id: I3ddb6ceebe6dcc86cc1c58b35b0cd96986ec43e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/988 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:14 -08:00
Nong Li	7f08146b88	Add ndv (distinct estimate) as a builtin aggregate function. This is implemented in the BE using HLL (but we could change this in the future). These estimates usually work better than the other algorithm we have and we've not implemented all the improvements from the google paper. Change-Id: Ied715ddd0e1a7cbe7f5f90469f1ed3d4b9c537c7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/956 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:03 -08:00
Matthew Jacobs	8a55982105	Add OFFSET to skip rows returned with a LIMIT Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive does not support OFFSET so creating a view with an OFFSET will not work in Hive. For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip 5 rows, then return the next 20. OFFSET requires an ORDER BY clause. Note this is not very efficient as we must actually keep (limit+offset) rows in memory in the topn-node, and all child sort nodes must as well. Users should be careful when using this feature. Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d Reviewed-on: http://gerrit.ent.cloudera.com:8080/900 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:02 -08:00
Skye Wanderman-Milne	9147cd7518	IMPALA-525: Adjust IO buffer size based on read length and other memory fixes We were previously wasting memory by always reading into 8MB IO buffers, even when the data read was much less than 8MB. With this patch, the IO manager picks a buffer size closer to the actual amount being read (we don't use the exact size so we can continue to recycle buffers). The minimum IO buffer size is determined via the --min_buffer_size flag, and the max IO buffer size via the --read_size flag. This technique also helps with IMPALA-652, since short columns will not use as much memory as before (we will not use considerably more memory than the size of the table). This patch also changes StringBuffer to use a doubling strategy so it doesn't end up allocating many large unused buffers, and has the scanner context use the requested length as the sync read size if it's larger than the size produced by read_past_size_cb(). These changes help prevent the boundary buffer in the scanner context from allocating excess memory. Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/938 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:01 -08:00
Lenni Kuff	6bba0c8ffe	Fix bug cleaning up removed Functions and fix test_ddl to create all test dbs When dropping functions, we neeed to remove the function from the list of Functions with that name AND remove the list from the Function map if the list is empty. The second part wasn't happening. Also fixes the test_ddl to properly create all test databases. Change-Id: Id85af7d5db74a31161f48bea3816bdf734063133 Reviewed-on: http://gerrit.ent.cloudera.com:8080/952 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:00 -08:00
Lenni Kuff	39f77b8b8f	Add support for cluster-synchronized catalog operations This change adds support for cluster-synchronized catalog operations. This provides the guaranteethat after a catalog op completes, all other subscribers to the catalog topic have also processed that update. This is useful when load balancing, because a common workflow is to target a different impalad for each statement executed. For example if each of the following were executed sequentially, but targeting a different node: 1) CREATE TABLE Foo 2) INSERT INTO Foo 3) SELECT * FROM Foo 4) INSERT INTO Foo .... Since both the INSERT and the CREATE update the catalog, it would not work as expected without this patch. The user might either get a "table not found" error or would be missing partition information from the INSERT. The downside is that this approach to DDL takes a bit longer because we need to wait until all subscribers have processed an update. If all nodes are healthy, this overhead should not be significantly longer than the current DDL time. However, a single bad node might slow down or completely block the completion of all DDL operations. By default this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1 To test this, the base test suite was updated to support selecting a random impalad to execute each query section in a query test file. This is currently only enabled for the insert and DDL tests, but could be leveraged by more tests in the future. TODO: Add additional failure tests around this functionality. TODO: Add an explicit "sync" statement so users do not need to run all their DDL in this mode (since it is slower). Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83 Reviewed-on: http://gerrit.ent.cloudera.com:8080/899 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:58 -08:00
Lenni Kuff	8bb0010415	IMPALA-597: Do not crash when multiple partitions have the same LOCATION This patch fixes an issue where Impala would crash if two partitions had the same HDFS location. This is now fixed in hdfs-scan-node. It also includes some cleanup and bug fixes to the FE partition related classes and adds tests. There is still a problem where partition location metadata is not sent to the BE for INSERT statements, but that will be resolved in a separate patch. Change-Id: I0f1c3113d654f7d2b410f00e793ff6b0cae1ae18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/876 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:57 -08:00
Matthew Jacobs	51bfc99c63	IMPALA-395: Impala "show create table" statement Adds support for "show create table", a DDL statement that outputs a DDL statement that creates the specified table. In general, the output DDL works in Impala, so a user can copy the output and execute it to create the same table. However, there are a few special cases that output Hive DDL because we do not support creating some tables in Impala: HBase tables and tables with LZO compressed text. When we do support creating these tables in Impala, users should be able to execute the DDL in Impala as well. Change-Id: I8c130297a657810dea5b994bf99d72b0e61b847b Reviewed-on: http://gerrit.ent.cloudera.com:8080/842 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:53 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
Matthew Jacobs	00bc971d34	IMPALA-531: Allow arithmetic expressions for LIMIT Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/829 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:49 -08:00
Matthew Jacobs	65353fd9fb	IMPALA-598: Order by behavior for NULLs should be revisited This change modifies that behavior of NULL ordering such that nulls always compare greater than other values, but "nulls first" or "nulls last" can be used to explicitly specify if nulls should be sorted first or last regardless of the asc/desc. Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/812 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:48 -08:00
Skye Wanderman-Milne	9d05d6d03a	Allow UDF tests to run in parallel. Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d Reviewed-on: http://gerrit.ent.cloudera.com:8080/727 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:47 -08:00
Skye Wanderman-Milne	7e8e184acf	Allow UDFs in conjunct expressions. This patch refactors HDFSScanNode to copy and prepare all conjunct exprs in Prepare(), rather than in the scanner threads. This is necessary so the UDF exprs get codegen'd. Prepare() also only codegens the functions for the necessary file formats now, rather than for all file formats regardless of what's actually be scanned. Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/710 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	97a6b12e37	Fix UDFs used in partition pruning exprs. Exprs used for partition pruning are prepared/evaluated with a separate RuntimeState. If these exprs use UDFs, the runtime state needs access to the process's ExecEnv so we can use the LibCache and the IR produced by the UDF exprs needs to be optimized and jit'd. Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625 Reviewed-on: http://gerrit.ent.cloudera.com:8080/695 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Nong Li	601f24a198	UDA execution loose ends. Unfortunately, the BE does not have the codegen path to execute UDAs. This puts some restrictions on the UDAs we can run. - No IR UDAs - No varargs - Must have 8 arguments or less. The code to do this is almost all there for UDFs but I'm not sure I'll get to it. Change-Id: I8a06e635a9138397c8474a5704c3e588bb92347b Reviewed-on: http://gerrit.ent.cloudera.com:8080/703 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:38 -08:00
Nong Li	a944a1fe52	'Invalidate metadata' no longer clears user functions. Change-Id: I36de18fefa1d515a7960c2bf8c116d5217c388d6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/726 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:36 -08:00
Alex Behm	b670b9f4f9	Fix switch to hive-exec from hive-builtins in UDF test file. Change-Id: Ibb75e129ea6c3da5ede9e8e399e537e3e561e814 Reviewed-on: http://gerrit.ent.cloudera.com:8080/723 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:35 -08:00
Lenni Kuff	01c8c43fec	Uniquify FUNCTION catalog topic entry keys by including parent database name Change-Id: I6aa49520f548ddfcd557e2f908a09be454765e8c Reviewed-on: http://gerrit.ent.cloudera.com:8080/698 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:29 -08:00
Alex Behm	b82880738c	IMPALA-617: Cast NULLs in INSERT statement due to incomplete permutation list to expected column type. Inserting NULLs with NULL_TYPE into Parquet tables cases a crash. Change-Id: I350c7ee2789c017cee5c4b6a1292c9fae36087f1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/696 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:29 -08:00
Skye Wanderman-Milne	b41ff0c8cd	Modify test-udfs.cc so there are no undefined symbols in shared library. AnalyzeDDLTest was failing because the fesupport binary couldn't resolve a function used in libTestUdfs.so (the function was defined in udf.cc, rather than udf.h). I couldn't figure out how to cleanly build udf.cc into the libTestUdfs.so, so instead I removed the use of the function in test-udfs.cc. Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110 Reviewed-on: http://gerrit.ent.cloudera.com:8080/694 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:27 -08:00
Nong Li	911cfc1bb9	Fix vararg UDFs. Change-Id: I0e202b984ece7de3d220b6ce89b0c0a4c9edcb45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/688 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:26 -08:00
Nong Li	4800995d44	Add execution for Hive UDFs. Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500 Reviewed-on: http://gerrit.ent.cloudera.com:8080/458 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:25 -08:00
Nong Li	904289d168	Add UDA execution. Change-Id: Ie5aab79742675fc62ed731c13abe83304df80991 Reviewed-on: http://gerrit.ent.cloudera.com:8080/642 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:24 -08:00
Alex Behm	3f54240fed	PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode. Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c Reviewed-on: http://gerrit.ent.cloudera.com:8080/678 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:23 -08:00

1 2 3 4 5 ...

297 Commits