impala

mirror of https://github.com/apache/impala.git synced 2026-01-02 03:00:32 -05:00

Author	SHA1	Message	Date
Lenni Kuff	76fa3b2ded	Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax This change updates our DDL syntax support to allow for using 'STORED AS PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax, but continue to support the old. I made the same change for 'AVROFILE', but since we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax. Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Nong Li	ab21dde002	Update compute stats test to use regex for parquet/hbase file size. The parquet file stores the version of the application that wrote it, so it differs between our c4 and c5 branches. HBase storage is also not guaranteed to be identical across versions. Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:17 -08:00
Alex Behm	93e5b262c2	Added COMPUTE STATS command for gathering table and column stats. A compute stats command computes the table and column stats for a given table and persists them in the metastore. The table stats consist of the per-partition and per-table row count. The column stats are computed on a per-table basis and consist of the number of distinct values and the number of NULLs per column. This patch introduces a new 'child query' concept that compute stats utilizes. Child queries are cancelled if the parent query is cancelled. A compute stats stmt is executed by the following query hierarchy: parent: compute stats query (DDL) - child: compute table stats query (QUERY) - child: compute column stats query (QUERY). The new child query concept is necessary to decouple child query fetches from parent query fetches, i.e., we could not execute a child query as part of the original compute stats query, because then a client could fetch the results we need for updating the Metastore statistics. The reason why our existing CTAS works without this decoupling is that its insert 'child query' is not fetchable. Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd Reviewed-on: http://gerrit.ent.cloudera.com:8080/873 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:14 -08:00
Skye Wanderman-Milne	49f4bd285a	Change test_metadata_query_statements.py::test_show to ignore Parquet file sizes since they may fluctuate slightly. Change-Id: I3ddb6ceebe6dcc86cc1c58b35b0cd96986ec43e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/988 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:14 -08:00
Nong Li	7f08146b88	Add ndv (distinct estimate) as a builtin aggregate function. This is implemented in the BE using HLL (HyperLogLog), but we could change this in the future. These estimates usually work better than the other algorithm we have, and we have not implemented all the improvements from the Google paper. Change-Id: Ied715ddd0e1a7cbe7f5f90469f1ed3d4b9c537c7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/956 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:03 -08:00
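To make the HLL idea concrete, here is a minimal C++ sketch of a HyperLogLog distinct-count estimator. It is not Impala's BE code: the precision (2^10 registers), the splitmix64-style mixer used as the hash, and the linear-counting small-range correction are assumptions chosen to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative HyperLogLog distinct-value estimator (not Impala's BE code).
// Assumptions: 2^precision registers and a splitmix64-style mixer as the hash.
class HyperLogLog {
 public:
  explicit HyperLogLog(int precision = 10)
      : p_(precision), m_(static_cast<size_t>(1) << precision), registers_(m_, 0) {
    assert(precision >= 4 && precision <= 16);
  }

  void Update(uint64_t value) {
    const uint64_t h = Mix(value);
    const size_t idx = static_cast<size_t>(h >> (64 - p_));  // top p bits choose a register
    uint64_t rest = h << p_;                                  // remaining 64 - p bits
    // Rank = 1-based position of the first 1-bit in the remaining bits.
    uint8_t rank = 1;
    while (rank <= 64 - p_ && (rest >> 63) == 0) {
      rest <<= 1;
      ++rank;
    }
    registers_[idx] = std::max(registers_[idx], rank);  // duplicates never change state
  }

  double Estimate() const {
    double sum = 0.0;
    size_t zeros = 0;
    for (uint8_t r : registers_) {
      sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^-r
      if (r == 0) ++zeros;
    }
    const double m = static_cast<double>(m_);
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;
    // Small-range correction: use linear counting while empty registers remain.
    if (raw <= 2.5 * m && zeros > 0) return m * std::log(m / static_cast<double>(zeros));
    return raw;
  }

 private:
  static uint64_t Mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
  }

  int p_;
  size_t m_;
  std::vector<uint8_t> registers_;
};
```

Because each register only keeps a maximum, feeding the same values again leaves the estimate unchanged, which is what makes the structure suitable for counting distinct values in a single pass.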
Matthew Jacobs	8a55982105	Add OFFSET to skip rows returned with a LIMIT Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive does not support OFFSET so creating a view with an OFFSET will not work in Hive. For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip 5 rows, then return the next 20. OFFSET requires an ORDER BY clause. Note this is not very efficient as we must actually keep (limit+offset) rows in memory in the topn-node, and all child sort nodes must as well. Users should be careful when using this feature. Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d Reviewed-on: http://gerrit.ent.cloudera.com:8080/900 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:02 -08:00
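The memory note above follows from how a top-n works with an offset: the rows that will later be skipped still have to be identified as the smallest ones, so limit + offset rows must be held. A minimal C++ sketch over plain ints (ascending order); the function name is hypothetical and this is not the topn-node implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Sketch of "ORDER BY x LIMIT limit OFFSET offset" over ints (ascending).
// Holds at most limit + offset rows at any time.
std::vector<int> TopNWithOffset(const std::vector<int>& input, size_t limit,
                                size_t offset) {
  const size_t keep = limit + offset;
  if (keep == 0) return {};
  std::priority_queue<int> heap;  // max-heap: top() is the largest row kept so far
  for (int v : input) {
    if (heap.size() < keep) {
      heap.push(v);
    } else if (v < heap.top()) {
      heap.pop();  // evict the current largest; v belongs in the top n
      heap.push(v);
    }
  }
  std::vector<int> sorted;
  sorted.reserve(heap.size());
  while (!heap.empty()) {
    sorted.push_back(heap.top());
    heap.pop();
  }
  std::reverse(sorted.begin(), sorted.end());  // ascending order
  if (offset >= sorted.size()) return {};
  // Skip the first `offset` rows; at most `limit` rows remain.
  return std::vector<int>(sorted.begin() + static_cast<std::ptrdiff_t>(offset),
                          sorted.end());
}
```

For the commit's example, `LIMIT 20 OFFSET 5` keeps 25 rows in the heap and returns rows 6 through 25 of the sorted order.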
Skye Wanderman-Milne	9147cd7518	IMPALA-525: Adjust IO buffer size based on read length and other memory fixes We were previously wasting memory by always reading into 8MB IO buffers, even when the data read was much less than 8MB. With this patch, the IO manager picks a buffer size closer to the actual amount being read (we don't use the exact size so we can continue to recycle buffers). The minimum IO buffer size is determined via the --min_buffer_size flag, and the max IO buffer size via the --read_size flag. This technique also helps with IMPALA-652, since short columns will not use as much memory as before (we will not use considerably more memory than the size of the table). This patch also changes StringBuffer to use a doubling strategy so it doesn't end up allocating many large unused buffers, and has the scanner context use the requested length as the sync read size if it's larger than the size produced by read_past_size_cb(). These changes help prevent the boundary buffer in the scanner context from allocating excess memory. Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/938 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:01 -08:00
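One way to pick "a buffer size closer to the actual amount being read" while still recycling buffers is to allow only a few discrete sizes. The sketch below assumes power-of-two multiples of the minimum, capped at the maximum; the mapping of `min_buffer_size`/`max_buffer_size` to the --min_buffer_size and --read_size flags, and the helper name, are assumptions rather than Impala's actual IO manager logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: choose an IO buffer size for a read of `read_len` bytes.
// Sizes are assumed to be min_buffer_size * 2^k, capped at max_buffer_size, so
// only a handful of distinct sizes exist and freed buffers can be recycled.
int64_t ChooseBufferSize(int64_t read_len, int64_t min_buffer_size,
                         int64_t max_buffer_size) {
  assert(min_buffer_size > 0 && min_buffer_size <= max_buffer_size);
  int64_t size = min_buffer_size;
  while (size < read_len && size < max_buffer_size) size *= 2;
  return std::min(size, max_buffer_size);
}
```

Reads longer than the maximum are assumed to be issued as several max-sized reads; a short column read then costs a small buffer instead of a full 8MB one.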
Lenni Kuff	6bba0c8ffe	Fix bug cleaning up removed Functions and fix test_ddl to create all test dbs When dropping functions, we need to remove the function from the list of Functions with that name AND remove the list from the Function map if the list is empty. The second part wasn't happening. Also fixes the test_ddl to properly create all test databases. Change-Id: Id85af7d5db74a31161f48bea3816bdf734063133 Reviewed-on: http://gerrit.ent.cloudera.com:8080/952 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:00 -08:00
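The two-step removal described above can be sketched as follows. The `Function` struct and `FunctionMap` layout (name mapped to a list of overloads) are hypothetical stand-ins, not Impala's catalog classes.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical function catalog keyed by function name; each name maps to its overloads.
struct Function {
  std::string name;
  std::vector<std::string> arg_types;
  bool operator==(const Function& o) const {
    return name == o.name && arg_types == o.arg_types;
  }
};

using FunctionMap = std::map<std::string, std::vector<Function>>;

// Removes `fn` from its name's overload list, then drops the name's entry
// entirely once no overloads remain (the second step the commit says was missing).
bool RemoveFunction(FunctionMap& functions, const Function& fn) {
  auto it = functions.find(fn.name);
  if (it == functions.end()) return false;
  std::vector<Function>& overloads = it->second;
  auto pos = std::find(overloads.begin(), overloads.end(), fn);
  if (pos == overloads.end()) return false;
  overloads.erase(pos);
  if (overloads.empty()) functions.erase(it);
  return true;
}
```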
Lenni Kuff	39f77b8b8f	Add support for cluster-synchronized catalog operations This change adds support for cluster-synchronized catalog operations. This provides the guarantee that after a catalog op completes, all other subscribers to the catalog topic have also processed that update. This is useful when load balancing, because a common workflow is to target a different impalad for each statement executed. For example if each of the following were executed sequentially, but targeting a different node: 1) CREATE TABLE Foo 2) INSERT INTO Foo 3) SELECT * FROM Foo 4) INSERT INTO Foo .... Since both the INSERT and the CREATE update the catalog, it would not work as expected without this patch. The user might either get a "table not found" error or would be missing partition information from the INSERT. The downside is that this approach to DDL takes a bit longer because we need to wait until all subscribers have processed an update. If all nodes are healthy, this overhead should not be significantly longer than the current DDL time. However, a single bad node might slow down or completely block the completion of all DDL operations. By default this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1 To test this, the base test suite was updated to support selecting a random impalad to execute each query section in a query test file. This is currently only enabled for the insert and DDL tests, but could be leveraged by more tests in the future. TODO: Add additional failure tests around this functionality. TODO: Add an explicit "sync" statement so users do not need to run all their DDL in this mode (since it is slower). Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83 Reviewed-on: http://gerrit.ent.cloudera.com:8080/899 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:58 -08:00
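The synchronization guarantee amounts to "block until every subscriber has applied catalog version V". A minimal C++ sketch under the assumption that subscribers report their applied version to a central tracker; `CatalogVersionTracker` is hypothetical, and the timeout is an addition reflecting the note that a single bad node could otherwise block DDL indefinitely.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical tracker: records the latest catalog version each subscriber has applied.
class CatalogVersionTracker {
 public:
  void RegisterSubscriber(const std::string& id) {
    std::lock_guard<std::mutex> lock(mu_);
    applied_.emplace(id, 0);
  }

  // Called when a subscriber reports that it has processed an update.
  void OnSubscriberApplied(const std::string& id, int64_t version) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      int64_t& applied = applied_[id];
      if (version > applied) applied = version;
    }
    cv_.notify_all();
  }

  // Blocks until every registered subscriber has applied `version`.
  // Returns false if the timeout expires first (e.g. one node is unhealthy).
  bool WaitForAll(int64_t version, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] {
      for (const auto& entry : applied_) {
        if (entry.second < version) return false;
      }
      return true;
    });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, int64_t> applied_;
};
```

The DDL path would call `WaitForAll` with the version returned for its change before acknowledging the statement, which is where the extra latency described above comes from.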
Lenni Kuff	8bb0010415	IMPALA-597: Do not crash when multiple partitions have the same LOCATION This patch fixes an issue where Impala would crash if two partitions had the same HDFS location. This is now fixed in hdfs-scan-node. It also includes some cleanup and bug fixes to the FE partition related classes and adds tests. There is still a problem where partition location metadata is not sent to the BE for INSERT statements, but that will be resolved in a separate patch. Change-Id: I0f1c3113d654f7d2b410f00e793ff6b0cae1ae18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/876 Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:57 -08:00
Matthew Jacobs	51bfc99c63	IMPALA-395: Impala "show create table" statement Adds support for "show create table", a DDL statement that outputs a DDL statement that creates the specified table. In general, the output DDL works in Impala, so a user can copy the output and execute it to create the same table. However, there are a few special cases that output Hive DDL because we do not support creating some tables in Impala: HBase tables and tables with LZO compressed text. When we do support creating these tables in Impala, users should be able to execute the DDL in Impala as well. Change-Id: I8c130297a657810dea5b994bf99d72b0e61b847b Reviewed-on: http://gerrit.ent.cloudera.com:8080/842 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:53 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
Matthew Jacobs	00bc971d34	IMPALA-531: Allow arithmetic expressions for LIMIT Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/829 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:49 -08:00
Matthew Jacobs	65353fd9fb	IMPALA-598: Order by behavior for NULLs should be revisited This change modifies the behavior of NULL ordering such that nulls always compare greater than other values, but "nulls first" or "nulls last" can be used to explicitly specify if nulls should be sorted first or last regardless of the asc/desc. Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/812 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:48 -08:00
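The resulting ordering rule can be written as a single comparator: with no explicit clause, NULL is the greatest value (last in ASC, first in DESC), and NULLS FIRST/LAST override that placement in either direction. A C++ sketch using `std::optional<int>` for a nullable column; the names are illustrative, not Impala's sort code.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

enum class NullsOrder { kDefault, kFirst, kLast };

// Returns true if a sorts before b. NULL (std::nullopt) compares greater than every
// non-NULL value unless NULLS FIRST / NULLS LAST explicitly sets its placement.
bool SortsBefore(const std::optional<int>& a, const std::optional<int>& b,
                 bool ascending, NullsOrder nulls) {
  if (a.has_value() && b.has_value()) return ascending ? *a < *b : *a > *b;
  if (!a.has_value() && !b.has_value()) return false;  // NULLs tie with each other
  const bool nulls_first =
      (nulls == NullsOrder::kDefault) ? !ascending  // "greater": last in ASC, first in DESC
                                      : (nulls == NullsOrder::kFirst);
  // Exactly one side is NULL here.
  return a.has_value() ? !nulls_first : nulls_first;
}

void SortRows(std::vector<std::optional<int>>* rows, bool ascending, NullsOrder nulls) {
  std::sort(rows->begin(), rows->end(),
            [&](const std::optional<int>& a, const std::optional<int>& b) {
              return SortsBefore(a, b, ascending, nulls);
            });
}
```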
Skye Wanderman-Milne	9d05d6d03a	Allow UDF tests to run in parallel. Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d Reviewed-on: http://gerrit.ent.cloudera.com:8080/727 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:47 -08:00
Skye Wanderman-Milne	7e8e184acf	Allow UDFs in conjunct expressions. This patch refactors HDFSScanNode to copy and prepare all conjunct exprs in Prepare(), rather than in the scanner threads. This is necessary so the UDF exprs get codegen'd. Prepare() also only codegens the functions for the necessary file formats now, rather than for all file formats regardless of what's actually being scanned. Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/710 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne	97a6b12e37	Fix UDFs used in partition pruning exprs. Exprs used for partition pruning are prepared/evaluated with a separate RuntimeState. If these exprs use UDFs, the runtime state needs access to the process's ExecEnv so we can use the LibCache and the IR produced by the UDF exprs needs to be optimized and jit'd. Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625 Reviewed-on: http://gerrit.ent.cloudera.com:8080/695 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:39 -08:00
Nong Li	601f24a198	UDA execution loose ends. Unfortunately, the BE does not have the codegen path to execute UDAs. This puts some restrictions on the UDAs we can run. - No IR UDAs - No varargs - Must have 8 arguments or less. The code to do this is almost all there for UDFs but I'm not sure I'll get to it. Change-Id: I8a06e635a9138397c8474a5704c3e588bb92347b Reviewed-on: http://gerrit.ent.cloudera.com:8080/703 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:38 -08:00
Nong Li	a944a1fe52	'Invalidate metadata' no longer clears user functions. Change-Id: I36de18fefa1d515a7960c2bf8c116d5217c388d6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/726 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:36 -08:00
Alex Behm	b670b9f4f9	Fix switch to hive-exec from hive-builtins in UDF test file. Change-Id: Ibb75e129ea6c3da5ede9e8e399e537e3e561e814 Reviewed-on: http://gerrit.ent.cloudera.com:8080/723 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:35 -08:00
Lenni Kuff	01c8c43fec	Uniquify FUNCTION catalog topic entry keys by including parent database name Change-Id: I6aa49520f548ddfcd557e2f908a09be454765e8c Reviewed-on: http://gerrit.ent.cloudera.com:8080/698 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:29 -08:00
Alex Behm	b82880738c	IMPALA-617: Cast NULLs in INSERT statement due to incomplete permutation list to expected column type. Inserting NULLs with NULL_TYPE into Parquet tables causes a crash. Change-Id: I350c7ee2789c017cee5c4b6a1292c9fae36087f1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/696 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:29 -08:00
Skye Wanderman-Milne	b41ff0c8cd	Modify test-udfs.cc so there are no undefined symbols in shared library. AnalyzeDDLTest was failing because the fesupport binary couldn't resolve a function used in libTestUdfs.so (the function was defined in udf.cc, rather than udf.h). I couldn't figure out how to cleanly build udf.cc into the libTestUdfs.so, so instead I removed the use of the function in test-udfs.cc. Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110 Reviewed-on: http://gerrit.ent.cloudera.com:8080/694 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:27 -08:00
Nong Li	911cfc1bb9	Fix vararg UDFs. Change-Id: I0e202b984ece7de3d220b6ce89b0c0a4c9edcb45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/688 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:26 -08:00
Nong Li	4800995d44	Add execution for Hive UDFs. Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500 Reviewed-on: http://gerrit.ent.cloudera.com:8080/458 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:25 -08:00
Nong Li	904289d168	Add UDA execution. Change-Id: Ie5aab79742675fc62ed731c13abe83304df80991 Reviewed-on: http://gerrit.ent.cloudera.com:8080/642 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:24 -08:00
Alex Behm	3f54240fed	PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode. Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c Reviewed-on: http://gerrit.ent.cloudera.com:8080/678 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:23 -08:00
Alex Behm	c5c2ccb56c	Fix build break due to machine-dependent explain output. Change-Id: I6b72e4e6cf2a7b38d4687c6f0f860e9744c9cedb Reviewed-on: http://gerrit.ent.cloudera.com:8080/675 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:53:22 -08:00
Alex Behm	4bb8b38cde	Added stats and cost estimates to explain output. Change-Id: I1273745a439fd25cefa4e08ecc075c98cc8bfc45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/602 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:22 -08:00
Skye Wanderman-Milne	8692e7df8d	Add timestamp support to CodegenAnyVal Change-Id: I2bbeae16660709c2c15d545e6d1c791912e880db Reviewed-on: http://gerrit.ent.cloudera.com:8080/655 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:21 -08:00
Nong Li	6b9a7de02e	Add symbol resolution during analysis for create function stmts. Before this, we had to specify the entire mangled symbol. This can be quite long and quite tedious (take a look at some of the create UDA test cases that specify all the symbols). This patch adds some code to convert from the user function signature to the mangled name. This means the user can specify the unmangled name and we can do the symbol lookup. The mangling rules are pretty convoluted but if it is messed up, the user can always specify the full symbol. Some other minor cleanup in: - JNI from FE to BE - UDFs/UDAs that are loaded as test data Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/624 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:20 -08:00
Nong Li	c031cd4e96	Update RLE encoding to pad literal groups to 8. Change-Id: I77cb2b80b888b569ff715c583f16aea4e39fe680 Reviewed-on: http://gerrit.ent.cloudera.com:8080/644 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:17 -08:00
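Assuming the encoding is the Parquet RLE/bit-packing hybrid, literal (bit-packed) runs are counted in groups of 8 values and the run header stores the group count with the low bit set. A minimal C++ sketch of padding a literal run to a multiple of 8; the helper name is hypothetical and decoders are assumed to ignore the padding values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pads a literal run with zeros so its length is a multiple of 8 and returns
// the run header: (number_of_8_value_groups << 1) | 1.
uint32_t PadLiteralRun(std::vector<uint32_t>* literals) {
  const size_t groups = (literals->size() + 7) / 8;
  literals->resize(groups * 8, 0);  // zero-fill the final partial group
  return static_cast<uint32_t>(groups << 1) | 1u;
}
```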
Nong Li	15db34e356	AggregationNode refactoring This patch redoes how the aggregation node is implemented. The functionality is now split between aggregation-node, agg-expr and aggregate-functions. This is a work in progress (there's still a lot of debug stuff I added that needs to be cleaned up) but it does pass the tests. Aggregation-node is now very simple and now only deals with the grouping part. Aggregate-expr serves as the glue between the agg node and the aggregate functions. The aggregation functions are implemented with the UDA interface. I've reimplemented our existing aggregate functions with this setup. For true UDAs, the binaries would be loaded in aggregate-expr. This also includes some preliminary changes in the FE. We now need to annotate each AggNode as executing the update vs. merge phase (root aggs execute update, others execute merge) and if it needs a finalize step (only the root does). This is more general than our builtins, which are too simple to need this structure. There is a big TODO here to allow the intermediate types between agg nodes to change. For example, in distinct estimate, the input type is the column type and the output type is a bigint. We'd like the intermediate type to be CHAR(256). This is different since currently, the intermediate type and output type have always been the same. We've hacked around this by having both the intermediate and output type be TYPE_STRING. I've left this for another patch (changing the BE to support this is trivial). For aggregates that result in strings, we used to store some additional stuff past the end of the tuple. The layout was: <tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc. The rationale for this is that we want to reuse the buffer for min/max and grow the buffer more quickly for group_concat. This breaks down the abstraction between agg-expr and agg-node and is not something UDAs can use in general. Rather than try to hack around this, I think the proper solution is for the intermediate type not to be StringValue but to contain the buffer length itself. This patch also resurrects the distinct estimate code. The distinct estimate functions exercise all of the code paths. Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346 Reviewed-on: http://gerrit.ent.cloudera.com:8080/564 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:13 -08:00
Lenni Kuff	a2cbd2820e	Add Catalog Service and support for automatic metadata refresh The Impala CatalogService manages the caching and dissemination of cluster-wide metadata. The CatalogService combines the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogService uses the StateStore to broadcast metadata updates across the cluster. The CatalogService also directly handles executing metadata update requests from impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to directly connect and execute their DDL operations. The CatalogService has two main components: a C++ server that implements StateStore integration, Thrift service implementation, and exporting of the debug webpage/metrics. The other main component is the Java Catalog that manages caching and updating of all the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast to the rest of the cluster. Some Notes On the Changes --- * The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views, Databases, UDFs) have thrift structs to represent them. These are sent with each statestore delta update. * The existing Catalog class has been separated into two separate subclasses: an ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more details. What is working: * New CatalogService created * Working with statestore delta updates and latest UDF changes * DDL performed on Node 1 is now visible on all other nodes without a "refresh". * Each DDL operation against the Catalog Service will return the catalog version that contains the change. An impalad will wait for the statestore heartbeat that contains this version before returning from the DDL command. * All table types (HBase, HDFS, Views) getting their metadata propagated properly * Block location information included in CS updates and used by Impalads * Column and table stats included in CS updates and used by Impalads * Query tests are all passing Still TODO: * Directly return catalog object metadata from DDL requests * Poll the Hive Metastore to detect new/dropped/modified tables * Reorganize the FE code for the Catalog Service. I don't think we want everything in the same JAR. Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda Reviewed-on: http://gerrit.ent.cloudera.com:8080/601 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:11 -08:00
Nong Li	1eb2b7a964	Add execution for vararg UDFs. Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995 Reviewed-on: http://gerrit.ent.cloudera.com:8080/572 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:09 -08:00
Nong Li	4bb1e8c854	Add varargs to UDF/UDA parser/analyzer. Change-Id: I4c3f2e74f6c29cee4b0b787c058b0455b16a11fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/548 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:05 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
Nong Li	8963d79f51	Fix build break from UdfContext rename. Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861 Reviewed-on: http://gerrit.ent.cloudera.com:8080/549 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:01 -08:00
Nong Li	e39de94316	Add parser/analysis to support UDAs. I looked around some and I think having create/drop/show [aggregate] function seems reasonable and extends nicely for UDTs. The create aggregate function can accept a lot of arguments. For the non-essential ones, I went with resolving them by name rather than position (i.e. argName="value"). I think this is better for the user than specifying it by position. The grammar is: CREATE AGGREGATE <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>] LOCATION '/path' UpdateFn='Fn' [comment='comment'] [SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol'] The optional args at the end can be in any order. If the other symbols are not specified, we derive them from the UpdateFn symbol that's required. The analyzer would try to figure it out and fail if we can't find the derived symbol in the binary. The simplest example would be: CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path' UpdateFn='CountUpdateFn'; In this case, we assume the intermediate type is the return type and the other functions are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' and 'CountFinalizeFn'. Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675 Reviewed-on: http://gerrit.ent.cloudera.com:8080/513 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:59 -08:00
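The derivation in the `count` example can be sketched as a string substitution: replace "Update" in the required UpdateFn symbol with the role name. That the rule is exactly this substitution is an assumption inferred from the example names, and the helper is hypothetical; the analyzer's check that the derived symbol exists in the binary is not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: derives a companion UDA symbol (e.g. "Init", "Merge")
// from the UpdateFn symbol by replacing its last "Update" substring.
// Returns std::nullopt when no derivation is possible.
std::optional<std::string> DeriveUdaSymbol(const std::string& update_fn,
                                           const std::string& role) {
  const std::string kUpdate = "Update";
  const size_t pos = update_fn.rfind(kUpdate);
  if (pos == std::string::npos) return std::nullopt;
  std::string result = update_fn;
  result.replace(pos, kUpdate.size(), role);
  return result;
}
```

When derivation fails, the user would have to name the symbol explicitly, for example with `InitFn='symbol'`.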
Alex Behm	39f9a067fa	IMPALA-444: Fixed accuracy of string to double conversion. Falling back to strtod for scientific notation. Change-Id: I9a5d948620907d34601ef041e58b1c9bb2172f71 Reviewed-on: http://gerrit.ent.cloudera.com:8080/507 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:56 -08:00
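A common shape for this kind of fix is a fast path for plain decimals plus a `strtod` fallback. The sketch below is not Impala's parser. It uses the classic exact fast path (a mantissa of at most 2^53 divided by an exactly representable power of ten gives a correctly rounded result), and defers to `strtod` for scientific notation or anything outside that range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Full-precision fallback: succeeds only if strtod consumes the whole string.
static bool ParseWithStrtod(const std::string& s, double* out) {
  char* end = nullptr;
  *out = std::strtod(s.c_str(), &end);
  return end != s.c_str() && *end == '\0';
}

// Sketch: correctly rounded fast path for plain decimals such as "-12.375";
// scientific notation ("1.5e3") and long inputs are handed to strtod.
bool StringToDouble(const std::string& s, double* out) {
  // Powers of ten up to 1e22 are exactly representable as doubles.
  static const double kPow10[] = {1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
                                  1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
                                  1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
  if (s.find_first_of("eE") != std::string::npos) return ParseWithStrtod(s, out);
  size_t i = 0;
  bool negative = false;
  if (i < s.size() && (s[i] == '-' || s[i] == '+')) negative = (s[i++] == '-');
  uint64_t mantissa = 0;
  int frac_digits = 0;
  bool seen_dot = false, any_digit = false;
  for (; i < s.size(); ++i) {
    const char c = s[i];
    if (c == '.' && !seen_dot) { seen_dot = true; continue; }
    if (c < '0' || c > '9') return false;
    mantissa = mantissa * 10 + static_cast<uint64_t>(c - '0');
    any_digit = true;
    if (seen_dot) ++frac_digits;
    // Beyond 2^53 or 1e22 one division is no longer exact; defer to strtod.
    if (mantissa > (1ULL << 53) || frac_digits > 22) return ParseWithStrtod(s, out);
  }
  if (!any_digit) return false;
  const double v = static_cast<double>(mantissa) / kPow10[frac_digits];
  *out = negative ? -v : v;
  return true;
}
```

Accumulating digits into an integer and dividing once avoids the rounding drift that repeated multiply-by-0.1 loops introduce.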
Alex Behm	6253b21834	IMPALA-505: Fixed conjunct evaluation against partition columns in hdfs scan node when there are no materialized slots. Change-Id: Ia003347bd7ee4986f5411c7175057192635a4c6c Reviewed-on: http://gerrit.ent.cloudera.com:8080/509 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:54 -08:00
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00
Nong Li	a0bf45a0b4	Add udf type. Change-Id: Ic5f52c127750cc9c847a3e34d3fdcfc78bee5a8a Reviewed-on: http://gerrit.ent.cloudera.com:8080/454 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:48 -08:00
Alex Behm	33000b8c15	Fixed codegen of floating-point modulo. Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf Reviewed-on: http://gerrit.ent.cloudera.com:8080/385 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:46 -08:00
Nong Li	308650f208	Fix create function ddl test setup issue. Change-Id: I30c9a4342efbdb17bd53fb14bdcee172506cdadb Reviewed-on: http://gerrit.ent.cloudera.com:8080/447 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:44 -08:00
Nong Li	8eb727b585	UDF ddl cleanup Change-Id: I381fed277b5809727d2d8bf430258c01d2d0ae1f Reviewed-on: http://gerrit.ent.cloudera.com:8080/436 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:43 -08:00
Nong Li	b22d1f41a7	Change all "Status Close()" to "void Close()" Doing it this way makes sure we don't bail early on the Close path which is rarely the right thing to do. This found a few places where we were not doing proper cleanup because of this. Change-Id: Ie663c68398c14589b5cbc1bd980644b0b10fd865 Reviewed-on: http://gerrit.ent.cloudera.com:8080/373 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:38 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Nong Li	af90c8a133	Fix memory usage tracking. Changes MemLimit to MemTracker: - the limit is optional - it also records a label and an optional parent - Consume() and Release() also update the ancestors and there's also a new AnyLimitExceeded(), which also checks the ancestors - the consumption counter is a HighwaterMarkCounter and can optionally be created as part of a profile Each fragment instance now has a MemTracker that is part of a 3-level hierarchy: process, query, fragment instance. Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07 Reviewed-on: http://gerrit.ent.cloudera.com:8080/339 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:36 -08:00
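A minimal C++ sketch of the hierarchy described above: an optional limit (-1 meaning none here), a label, an optional parent, consumption propagated to every ancestor, a peak value standing in for the high-water-mark counter, and `AnyLimitExceeded()` walking up the chain. This is an illustration, not Impala's `MemTracker` class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Illustrative hierarchical memory tracker (not Impala's class).
class MemTracker {
 public:
  MemTracker(std::string label, int64_t limit = -1, MemTracker* parent = nullptr)
      : label_(std::move(label)), limit_(limit), parent_(parent) {}

  // Adds `bytes` to this tracker and every ancestor, updating each peak.
  void Consume(int64_t bytes) {
    for (MemTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ += bytes;
      if (t->consumption_ > t->peak_) t->peak_ = t->consumption_;
    }
  }
  void Release(int64_t bytes) { Consume(-bytes); }

  bool LimitExceeded() const { return limit_ >= 0 && consumption_ > limit_; }

  // True if this tracker or any ancestor is over its limit.
  bool AnyLimitExceeded() const {
    for (const MemTracker* t = this; t != nullptr; t = t->parent_) {
      if (t->LimitExceeded()) return true;
    }
    return false;
  }

  int64_t consumption() const { return consumption_; }
  int64_t peak() const { return peak_; }
  const std::string& label() const { return label_; }

 private:
  std::string label_;
  int64_t limit_;
  MemTracker* parent_;
  int64_t consumption_ = 0;
  int64_t peak_ = 0;
};
```

With the three levels from the commit (process, query, fragment instance), an allocation in a fragment is visible in the query and process totals, and a fragment can detect that the process-wide limit has been hit even when its own limit has not.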
Nong Li	2394ae2e66	UDF parsing and analysis. Change-Id: If8058c1cb66bf5e9c7049d4b78f5882b46c03fc1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/318 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:32 -08:00


174 Commits