impala

Mirror of https://github.com/apache/impala.git, synced 2026-01-05 12:01:11 -05:00

Author	SHA1	Message	Date
Nong Li	a0bf45a0b4	Add udf type. Change-Id: Ic5f52c127750cc9c847a3e34d3fdcfc78bee5a8a Reviewed-on: http://gerrit.ent.cloudera.com:8080/454 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:48 -08:00
Alex Behm	33000b8c15	Fixed codegen of floating-point modulo. Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf Reviewed-on: http://gerrit.ent.cloudera.com:8080/385 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:46 -08:00
Nong Li	308650f208	Fix create function ddl test setup issue. Change-Id: I30c9a4342efbdb17bd53fb14bdcee172506cdadb Reviewed-on: http://gerrit.ent.cloudera.com:8080/447 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:44 -08:00
Nong Li	8eb727b585	UDF ddl cleanup Change-Id: I381fed277b5809727d2d8bf430258c01d2d0ae1f Reviewed-on: http://gerrit.ent.cloudera.com:8080/436 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:43 -08:00
Nong Li	b22d1f41a7	Change all "Status Close()" to "void Close()" Doing it this way makes sure we don't bail early on the Close path which is rarely the right thing to do. This found a few places where we were not doing proper cleanup because of this. Change-Id: Ie663c68398c14589b5cbc1bd980644b0b10fd865 Reviewed-on: http://gerrit.ent.cloudera.com:8080/373 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:38 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Nong Li	af90c8a133	Fix memory usage tracking. Changes MemLimit to MemTracker: - the limit is optional - it also records a label and an optional parent - Consume() and Release() also update the ancestors and there's also a new AnyLimitExceeded(), which also checks the ancestors - the consumption counter is a HighwaterMarkCounter and can optionally be created as part of a profile Each fragment instance now has a MemTracker that is part of a 3-level hierarchy: process, query, fragment instance. Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07 Reviewed-on: http://gerrit.ent.cloudera.com:8080/339 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:36 -08:00
Nong Li	2394ae2e66	UDF parsing and analysis. Change-Id: If8058c1cb66bf5e9c7049d4b78f5882b46c03fc1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/318 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:32 -08:00
Aaron Davidson	cafb7b72f8	External sorting This is an experimental implementation of external sorting. This patch includes the following additions: (1) creation and implementation of the Sorter interface, which can sort Impala Tuples. (2) normalization of Tuples to allow memcmp-able sorting. (3) a testing framework for the Sorter, (4) a benchmark to compare the current state of the Sorter with other sorts, (5) an implementation of a Vector which can store data whose size is only known at runtime, (6) a sorting algorithm (basically a dumbed down STL sort) which can operate over such a vector, (7) implementation of a simple in-memory Merger, and (8) logic to stream blocks of memory in and out of memory for the actual external merging. I have a local branch for experimental optimizations and benchmarking -- this should be considered a "basic", working sort. The following optimizations have been implemented: (i) Optionally extracting keys instead of writing them in place. (ii) Optionally opportunistically parallelize run building (sorting & prepare for output). (iii) Maximize disk IO and minimize buffer recycling by writing buffers out, but also keeping them in memory until right when they're needed. (iv) Prepare auxiliary data backwards so the buffers can be released as we go, and still go out in an order which preserves the first buffers of the run. (v) Always merge maximum number of runs at a time, taking from the next merge level if available. Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/126 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>	2014-01-08 10:52:27 -08:00
Aaron Davidson	00275ce3a9	(IMPALA-422) Add string concatenation function Implements a group_concat() function which concatenates all the values in a group together. The format is group_concat(str_col, [separator]). The default separator is ', '. NULLs are ignored. Change-Id: If152df6f528401117dba81d66ef691bfb548cc7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/117 Reviewed-by: Aaron Davidson <aaron.davidson@cloudera.com> Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>	2014-01-08 10:52:21 -08:00
Lenni Kuff	d66d3bfce3	IMPALA-161: Add Impala support for CREATE TABLE AS SELECT This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a regular CREATE TABLE statement includes, except it does not allow for specifying partition columns. Hive also has this limitation and it wouldn't be too hard to support in the future. Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74 Reviewed-on: http://gerrit.ent.cloudera.com:8080/157 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:17 -08:00
ishaan	e9e23bff5d	Fix build because of a change in parquetfile. This changes QueryTest/create.test to unblock the builds. Change-Id: If91ac43e349c2f81034ba7504c27890781f33260 Reviewed-on: http://gerrit.ent.cloudera.com:8080/255 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:16 -08:00
Nong Li	a3bc1ce133	Some parquet encoder/decoder refactoring. Added dictionary to other types. Split out the encoder/type for parquet reader/writer. I think this puts us in a better place to support future encodings. On the tpch lineitem table, the results are: Before: BytesWritten: 236.45 MB Per Column Sizes: l_comment: 75.71 MB l_commitdate: 8.64 MB l_discount: 11.19 MB l_extendedprice: 33.02 MB l_linenumber: 4.56 MB l_linestatus: 869.98 KB l_orderkey: 8.99 MB l_partkey: 27.02 MB l_quantity: 11.58 MB l_receiptdate: 8.65 MB l_returnflag: 1.40 MB l_shipdate: 8.65 MB l_shipinstruct: 1.45 MB l_shipmode: 2.17 MB l_suppkey: 21.91 MB l_tax: 10.68 MB After: BytesWritten: 198.63 MB (84%) Per Column Sizes: l_comment: 75.71 MB (100%) l_commitdate: 8.64 MB (100%) l_discount: 2.89 MB (25.8%) l_extendedprice: 33.13 MB (100.33%) l_linenumber: 1.50 MB (32.89%) l_linestatus: 870.26 KB (100.032%) l_orderkey: 9.18 MB (102.11%) l_partkey: 27.10 MB (100.29%) l_quantity: 4.32 MB (37.31%) l_receiptdate: 8.65 MB (100%) l_returnflag: 1.40 MB (100%) l_shipdate: 8.65 MB (100%) l_shipinstruct: 1.45 MB (100%) l_shipmode: 2.17 MB (100%) l_suppkey: 10.11 MB (46.14%) l_tax: 2.89 MB (27.06%) The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even more. The restructuring to use a virtual call doesn't seem to change things much and will go away when we codegen the scanner. Here's what they look like with this patch (note this is on the before data files, so only string cols are dictionary encoded). Before query times: Insert Time: 8.5 sec select : 2.3 sec select avg(l_orderkey): .33 sec After query times: Insert Time: 9.5 sec <-- Longer due to doing dictionary encoding select : 2.4 sec <-- kind of noisy, possibly a slight slow down select avg(l_orderkey): .33 sec Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/238 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:16 -08:00
Skye Wanderman-Milne	b9ea32e9b7	Fix IMPALA-129, IMPALA-534, and other scanner bugs. Change-Id: Idbd29af3fcc35b9e1173d08ac55b5780751c5938 Reviewed-on: http://gerrit.ent.cloudera.com:8080/196 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:14 -08:00
Alex Behm	9a201645cd	IMPALA-496: Fix escaping of field delimiter and escape character in inserts Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/163 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:09 -08:00
Alex Behm	f0e2d539fc	IMPALA-495: Views Sometimes Not Utilizing Partition Pruning. Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	c9965e5a5c	Fix build break due to views defined by a constant select. Change-Id: I5deeeb03469494f5ba6ed7a911354bbdd6c98195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/149 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	2b427208e5	IMPALA-507: Creating a VIEW that does not reference a table fails with IllegalStateException. Change-Id: I11470ba919bbfced76730adae2a46647c4ef110b Reviewed-on: http://gerrit.ent.cloudera.com:8080/146 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	52c9d26d16	IMPALA-475: Impala should avoid the use of c_# style autogenerated column aliases unless necessary. Change-Id: I959e35bcee1698ebc35534dc4f390c5c2c7dc919 Reviewed-on: http://gerrit.ent.cloudera.com:8080/141 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:03 -08:00
Alex Behm	9754f5bf52	IMPALA-504: Right and full outer joins do not return row with NULL value for rhs table. Change-Id: Ia3f8d474fb30189b36fb587b2920d7b9b224ea71 Reviewed-on: http://gerrit.ent.cloudera.com:8080/129 Tested-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:03 -08:00
Skye Wanderman-Milne	6e7406df8b	IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string) Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296 Reviewed-on: http://gerrit.ent.cloudera.com:8080/127 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:02 -08:00
Nong Li	fd53edbbe4	Fix parquet writer bug with not setting dictionary metadata. Change-Id: Ia5c0886497678d31b82cb5052e06df437bb201be Reviewed-on: http://gerrit.ent.cloudera.com:8080/114 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:02 -08:00
Lenni Kuff	faeb7f5fa3	Add scanner test case for scenario where data and table schema do not match Change-Id: I16f007ad1cb2caac47506914512c5665fc3d5f56 Reviewed-on: http://gerrit.ent.cloudera.com:8080/98 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:01 -08:00
Skye Wanderman-Milne	3fecdeb793	IMPALA-441: support default values for Avro tables	2014-01-08 10:51:39 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
Alex Behm	3bba336bbf	IMPALA-359: Return proper tuple id of inline view with distinct aggregation.	2014-01-08 10:51:26 -08:00
Alan Choi	254ee6ef89	IMPALA-434 Support binary hbase encoding	2014-01-08 10:51:18 -08:00
Skye Wanderman-Milne	e8344bb0d0	Dictionary encoding/decoding	2014-01-08 10:51:15 -08:00
Lenni Kuff	c2cfc7e2a3	IMPALA-373: Add support for 'LOAD DATA' statements This change adds Impala support for LOAD DATA statements. This allows the user to load one or more files into a table or partition from a given HDFS location. The load operation only moves files, it does not convert data to match the target table/partition's file format.	2014-01-08 10:51:02 -08:00
Alex Behm	045038e479	IMPALA-374: Added WITH clause without recursion.	2014-01-08 10:51:00 -08:00
Henry Robinson	79b36a5eb3	IMPALA-375: Add column permutation clause to INSERT statement	2014-01-08 10:50:59 -08:00
Nong Li	ce092065be	Fix bug with how exec sets if the conjuncts are thread safe.	2014-01-08 10:50:53 -08:00
Alan Choi	b1de018298	IMPALA-31 Support EXPLAIN <query> Hue is moving to HiveServer2 but HiveServer2 does not have an "explain" RPC call. To support "explain", I added it to the language. An "explain" statement will return a result set: one row per explain line.	2014-01-08 10:50:32 -08:00
Alex Behm	937a44f9f8	IMPALA-68: Support Values() statement.	2014-01-08 10:50:31 -08:00
Alex Behm	c7819f4db7	IMPALA-87: Support INSERT from SELECT without FROM.	2014-01-08 10:50:30 -08:00
Alex Behm	9ff09cd3f4	IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL	2014-01-08 10:50:28 -08:00
Lenni Kuff	627e74a068	Fix insert test failure by cleaning up table before executing query	2014-01-08 10:50:27 -08:00
Lenni Kuff	e0507e192b	Fix unstable alter table test	2014-01-08 10:50:26 -08:00
Nong Li	261119b91f	Forgot to update the test in previous commit.	2014-01-08 10:50:23 -08:00
Nong Li	8af35425e6	Fix unstable ordering with nans.	2014-01-08 10:50:22 -08:00
Nong Li	68e4c14527	Fix parquet incompatibilities.	2014-01-08 10:50:22 -08:00
Henry Robinson	ead69d377f	IMPALA-249, IMPALA-252: Fixes for static partition keys.	2014-01-08 10:50:14 -08:00
Alex Behm	861ba05989	IMPALA-197: Outer join on constant expressions returns incorrect results.	2014-01-08 10:50:09 -08:00
Alex Behm	c9040aee22	IMPALA-111: COUNT(DISTINCT col) returns wrong results -- does not ignore NULLs.	2014-01-08 10:50:09 -08:00
Alex Behm	14557c7bab	IMPALA-297: Remove distinction between value_expr and expr in parser.	2014-01-08 10:50:08 -08:00
Skye Wanderman-Milne	0c343913fa	IMPALA-266: Round() does not output the right precision	2014-01-08 10:50:02 -08:00
Henry Robinson	7d2c47ad72	IMPALA-258: Make partition key string encoding Hive-compatible	2014-01-08 10:49:54 -08:00
Alex Behm	abafcf81ff	IMPALA-287: Full outer join is missing results.	2014-01-08 10:49:54 -08:00
Alex Behm	4c45bc06c4	IMPALA-84: Predicates not evaluated if select exprs are constant.	2014-01-08 10:49:53 -08:00
Alex Behm	dbe3127383	IMPALA-285: Multiple outer joins with nesting crash impalad	2014-01-08 10:49:53 -08:00


132 Commits