impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 12:01:11 -05:00

Author	SHA1	Message	Date
Lenni Kuff	a2cbd2820e	Add Catalog Service and support for automatic metadata refresh The Impala CatalogService manages the caching and dissemination of cluster-wide metadata. The CatalogService combines the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogService uses the StateStore to broadcast metadata updates across the cluster. The CatalogService also directly handles executing metadata updates request from impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to directly connect execute their DDL operations. The CatalogService has two main components - a C++ server that implements StateStore integration, Thrift service implementiation, and exporting of the debug webpage/metrics. The other main component is the Java Catalog that manages caching and updating of of all the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast to the rest of the cluster. Some Notes On the Changes --- * The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views, Databases, UDFs) have thrift struct to represent them. These are sent with each statestore delta update. * The existing Catalog class has been seperated into two seperate sub-classes. An ImpladCatalog and a CatalogServiceCatalog. See the comments on those classes for more details. What is working: * New CatalogService created * Working with statestore delta updates and latest UDF changes * DDL performed on Node 1 is now visible on all other nodes without a "refresh". * Each DDL operation against the Catalog Service will return the catalog version that contains the change. An impalad will wait for the statestore heartbeat that contains this version before returning from the DDL comment. * All table types (Hbase, Hdfs, Views) getting their metadata propagated properly * Block location information included in CS updates and used by Impalads * Column and table stats included in CS updates and used by Impalads * Query tests are all passing Still TODO: * Directly return catalog object metadata from DDL requests * Poll the Hive Metastore to detect new/dropped/modified tables * Reorganize the FE code for the Catalog Service. I don't think we want everything in the same JAR. Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda Reviewed-on: http://gerrit.ent.cloudera.com:8080/601 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:11 -08:00
Nong Li	1eb2b7a964	Add execution for vararg UDFs. Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995 Reviewed-on: http://gerrit.ent.cloudera.com:8080/572 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:09 -08:00
Alex Behm	9065648d77	Improvements to cost estimation and explain output. Fixed cost estimation of union queries and exchange nodes. Fixed propagation of stats through cloning of exprs and plan nodes. Fixed propagation of expr stats to slots they are materialized into (e.g., grouping columns in multi-level aggs). Improved explain output for constant selects. Change-Id: I96d1652c00d48e4093b85ae7fc8bad28d74b8b81 Reviewed-on: http://gerrit.ent.cloudera.com:8080/547 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:08 -08:00
Nong Li	4bb1e8c854	Add varargs to UDF/UDA parser/analyzer. Change-Id: I4c3f2e74f6c29cee4b0b787c058b0455b16a11fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/548 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:05 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
Nong Li	e5ed8e4105	Move minicluster_xml_conf to HADOOP_CONF_DIR. The current location gets deleted if you rebuild, making you have to restart mini dfs. Change-Id: If71b144534255fa8df2bfa187c0814ffdf28463e Reviewed-on: http://gerrit.ent.cloudera.com:8080/550 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:03 -08:00
Nong Li	8963d79f51	Fix build break from UdfContext rename. Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861 Reviewed-on: http://gerrit.ent.cloudera.com:8080/549 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:01 -08:00
Nong Li	e39de94316	Add parser/analysis to support UDAs. I looked around some and I think having create/drop/show [aggregate] function seems reasonable and extends nicely for UDTs. The create aggregate function can accept a lot of arguments. The non-essential one, I went with resolving them by name rather than position (i.e. argName="value"). I think this is better for the user than specifying it by position. The grammar is: CREATE AGGREGATE <name>(<arg_types>) RETURNS <type> [INTERMEDIATE <type>] LOCATION '/path' UpdateFn='Fn' [comment='comment'] [SerializeFn='symbol'] [MergeFn='symbol'] [InitFn='symbol'] [FinalizeFn='symbol'] The optional args at the end can be in any order. If the other symbols are not specified, we derive them from the UpdateFn symbol that's required. The analyzer would try to figure it out and fail if we can't find the derived symbol in the binary. The simplest example would be: CREATE AGGREGATE FUNCTION count(float) RETURNS BIGINT LOCATION '/path' UpdateFn='CountUpdateFn'; In which case we assume the intermediate type is the return type and the other functions are called 'CountInitFn', 'CountSerializeFn', 'CountMergeFn' 'CountFinalizeFn'. Change-Id: Iefc5741293050f5b295df28e9d1a7d039ead8675 Reviewed-on: http://gerrit.ent.cloudera.com:8080/513 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:59 -08:00
Alex Behm	39f9a067fa	IMPALA-444: Fixed accuracy of string to double conversion. Falling back to strod for scientific notation. Change-Id: I9a5d948620907d34601ef041e58b1c9bb2172f71 Reviewed-on: http://gerrit.ent.cloudera.com:8080/507 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:56 -08:00
Lenni Kuff	79cdeac3d6	Consolidate test cluster under IMPALA_HOME/cluster_logs + store logs during data loading Change-Id: I8f6239e4ccb0515c85bf80193a475788fb18dedb Reviewed-on: http://gerrit.ent.cloudera.com:8080/518 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:56 -08:00
Alex Behm	6253b21834	IMPALA-505: Fixed conjunct evaluation against partition columns in hdfs scan node when there are no matarialized slots. Change-Id: Ia003347bd7ee4986f5411c7175057192635a4c6c Reviewed-on: http://gerrit.ent.cloudera.com:8080/509 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:54 -08:00
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00
Henry Robinson	a46276325c	IMPALA-415: Don't delete hidden files in the root directory for INSERT OVERWRITE INSERT OVERWRITE into an unpartitioned table is supposed to remove all data files from the root. This should not include hidden files or directories. This patch excludes hidden files from deletion, and adds a test case. Partition directories are still removed in their entirety: the cost of statting a large number of files and directories rather than issuing a single "rm -rf" outweighs the benefits of preserving hidden files for now. Hive does not preserve hidden files in either configuration. Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/418 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:50 -08:00
Nong Li	2b9105cd11	IMPALA-487: don't compact data from rhs of join if it is going through an exchange node. Change-Id: I442445e7370218352cd6d3137f2a454c9afb73ba Reviewed-on: http://gerrit.ent.cloudera.com:8080/476 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:50 -08:00
Lenni Kuff	a1f2f72f49	Add Impala DDL support for creation of AVRO tables + support for CREATE/ALTER SERDEPROPERTIES This change adds Impala DDL support for creation of AVRO tables. Additionally, it add Impala support for CREATE and ALTER SERDEPROPERTIES which are used when creating Avro backed tables. This syntax is not exactly the same as the Hive support since it introduces a new fileformat (AVROFILE) that implies the needed Serialization library, input format, and output format. Change-Id: I5047e419198a89599e9d014fdedfee1a20437a7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/464 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:48 -08:00
Alex Behm	77c0e54bb9	Set HDFS block size to 128MB because HDFS versions since 2.1.0-beta use 128MB as a default (HDFS-4053). Change-Id: If112d2eab242b44f05f64ee071ebea5b253c7927 Reviewed-on: http://gerrit.ent.cloudera.com:8080/470 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:48 -08:00
Nong Li	a0bf45a0b4	Add udf type. Change-Id: Ic5f52c127750cc9c847a3e34d3fdcfc78bee5a8a Reviewed-on: http://gerrit.ent.cloudera.com:8080/454 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:48 -08:00
Lenni Kuff	f5c9e4d075	Fix comment location in schema generation script Our data generation doesn't appear to fully support comments in the base table name section. This fixes the data generation, but we should follow on by improving our comment support in the framework. Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229 Reviewed-on: http://gerrit.ent.cloudera.com:8080/465 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:47 -08:00
Lenni Kuff	8e2a313673	IMPALA-590: Impala should more gracefully fail when loading HBase tables with complex types Change-Id: Ifc3338ee1339ff0544ed14066824f1aa2d9d7c25 Reviewed-on: http://gerrit.ent.cloudera.com:8080/457 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:47 -08:00
Alex Behm	33000b8c15	Fixed codegen of floating-point modulo. Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf Reviewed-on: http://gerrit.ent.cloudera.com:8080/385 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:46 -08:00
Greg Rahn	8492db3b7d	fix typo for tpcds-q3 Change-Id: Ia678957dcda6ddf261422b6c43a718f5779d3553 Reviewed-on: http://gerrit.ent.cloudera.com:8080/453 Reviewed-by: Greg Rahn <grahn@cloudera.com> Tested-by: Greg Rahn <grahn@cloudera.com>	2014-01-08 10:52:44 -08:00
Nong Li	308650f208	Fix create function ddl test setup issue. Change-Id: I30c9a4342efbdb17bd53fb14bdcee172506cdadb Reviewed-on: http://gerrit.ent.cloudera.com:8080/447 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:44 -08:00
Nong Li	8eb727b585	UDF ddl cleanup Change-Id: I381fed277b5809727d2d8bf430258c01d2d0ae1f Reviewed-on: http://gerrit.ent.cloudera.com:8080/436 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:43 -08:00
ishaan	d3a94aa4fe	Don't load tpcds.store_sales_unpartitioned into any file format except text. Change-Id: I398d26ca8e36a45cb3d0a076cdd604ff6eba793d Reviewed-on: http://gerrit.ent.cloudera.com:8080/444 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:43 -08:00
Lenni Kuff	d6d1557fe7	Capture cluster logs with each test run / don't use mvn for starting cluster services Change-Id: I708b547e49d035c5f029ea86119cc844ccbc5643 Reviewed-on: http://gerrit.ent.cloudera.com:8080/404 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:40 -08:00
Lenni Kuff	9f54242941	Add retry loop around split-hbase to fix build breaks Change-Id: I539407ce05d705b6b4e88d0791fc4ec236c79c80 Reviewed-on: http://gerrit.ent.cloudera.com:8080/399 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:39 -08:00
Nong Li	b22d1f41a7	Change all "Status Close()" to "void Close()" Doing it this way makes sure we don't bail early on the Close path which is rarely the right thing to do. This found a few places where we were not doing proper cleanup because of this. Change-Id: Ie663c68398c14589b5cbc1bd980644b0b10fd865 Reviewed-on: http://gerrit.ent.cloudera.com:8080/373 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:38 -08:00
ishaan	6735e3983f	Fix build failure because of hbase data loading. Change-Id: I796656332c3733a1ffdc338d206009efa6c451ac Reviewed-on: http://gerrit.ent.cloudera.com:8080/360 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:37 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Nong Li	af90c8a133	Fix memory usage tracking. Changes MemLimit to MemTracker: - the limit is optional - it also records a label and an optional parent - Consume() and Release() also update the ancestors and there's also a new AnyLimitExceeded(), which also checks the ancestors - the consumption counter is a HighwaterMarkCounter and can optionally be created as part of a profile Each fragment instance now has a MemTracker that is part of a 3-level hierarchy: process, query, fragment instance. Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07 Reviewed-on: http://gerrit.ent.cloudera.com:8080/339 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:36 -08:00
Nong Li	2394ae2e66	UDF parsing and analysis. Change-Id: If8058c1cb66bf5e9c7049d4b78f5882b46c03fc1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/318 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:32 -08:00
Aaron Davidson	cafb7b72f8	External sorting This is an experimental implementation of external sorting. This patch includes the following additions: (1) creation and implementation of the Sorter interface, which can sort Impala Tuples. (2) normalization of Tuples to allow memcmp-able sorting. (3) a testing framework for the Sorter, (4) a benchmark to compare the current state of the Sorter with other sorts, (5) an implementation of a Vector which can store data whose size is only known at runtime, (6) a sorting algorithm (basically a dumbed down STL sort) which can operate over such a vector, (7) implementation of a simple in-memory Merger, and (8) logic to stream blocks of memory in and out of memory for the actual external merging. I have a local branch for experimental optimizations and benchmarking -- this should be considered a "basic", working sort. The following optimizations have been implemented: (i) Optionally extracting keys instead of writing them in place. (ii) Optionally opportunistically parallelize run building (sorting & prepare for output). (iii) Maximize disk IO and minimize buffer recycling by writing buffers out, but also keeping them in memory until right when they're needed. (iv) Prepare auxililary data backwards so the buffers can be released as we go, and still go out in an order which preserves the first buffers of the run. (v) Always merge maximum number of runs at a time, taking from the next merge level if available. Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/126 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>	2014-01-08 10:52:27 -08:00
ishaan	13343fb5ec	Annotate tpcds count queries. Annotation helps in easily identifying queries and searching for them in the performance database. Change-Id: I89dcfe4c2885f1d5b3d5158c026aac922ff6559d Reviewed-on: http://gerrit.ent.cloudera.com:8080/299 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:26 -08:00
Aaron Davidson	00275ce3a9	(IMPALA-422) Add string concatenation function Implements a group_concat() function which concatenates all the values in a group together. The format is group_concat(str_col, [separator]). The default separator is ', '. NULLs are ignored. Change-Id: If152df6f528401117dba81d66ef691bfb548cc7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/117 Reviewed-by: Aaron Davidson <aaron.davidson@cloudera.com> Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>	2014-01-08 10:52:21 -08:00
Skye Wanderman-Milne	efac6f82fd	Print errors to shell in BaseSequenceScanner. Change-Id: I0d1b041695c0d61b8c4994833f0a703e3bfa9c6a Reviewed-on: http://gerrit.ent.cloudera.com:8080/278 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:20 -08:00
Lenni Kuff	d66d3bfce3	IMPALA-161: Add Impala support for CREATE TABLE AS SELECT This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a regular CREATE TABLE statement includes, except it does not allow for for specifying partition columns. Hive also has this limitation and it wouldn't be too hard to support in the future. Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74 Reviewed-on: http://gerrit.ent.cloudera.com:8080/157 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:17 -08:00
Lenni Kuff	f264db1647	Automatically force load partitioned tables to ensure valid partition metadata Change-Id: Ief91102f30d4669503d473299256a74a50d8fe3c Reviewed-on: http://gerrit.ent.cloudera.com:8080/261 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:17 -08:00
Nong Li	707a566b5d	Add test to tpcds queries to validate table row counts. I tried to investigate the jenkins issue where we weren't returning any rows. I setup the cluster on that box manually and noticed there weren't any results because the store_sales table was empty. Refresh did not fix. This looks like a data loading issue. Adding this test would make discovering this like this much easier. Change-Id: I8ccddd43892b279d506371b9de717629815c6a08 Reviewed-on: http://gerrit.ent.cloudera.com:8080/260 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:17 -08:00
Lenni Kuff	a3016cc4d4	Add partitioned tpcds insert workload and tests Change-Id: Iff45853153bf0830be3e423c994392998385a64f Reviewed-on: http://gerrit.ent.cloudera.com:8080/256 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:16 -08:00
ishaan	e9e23bff5d	Fix build because of a change in parquetfile. This changes QueryTest/create.test to unblock the builds. Change-Id: If91ac43e349c2f81034ba7504c27890781f33260 Reviewed-on: http://gerrit.ent.cloudera.com:8080/255 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:16 -08:00
Nong Li	a3bc1ce133	Some parquet encoder/decoder refactoring. Added dictionary to other types. Split out the encoder/type for parquet reader/writer. I think this puts us in a better place to support future encodings. On the tpch lineitem table, the results are: Before: BytesWritten: 236.45 MB Per Column Sizes: l_comment: 75.71 MB l_commitdate: 8.64 MB l_discount: 11.19 MB l_extendedprice: 33.02 MB l_linenumber: 4.56 MB l_linestatus: 869.98 KB l_orderkey: 8.99 MB l_partkey: 27.02 MB l_quantity: 11.58 MB l_receiptdate: 8.65 MB l_returnflag: 1.40 MB l_shipdate: 8.65 MB l_shipinstruct: 1.45 MB l_shipmode: 2.17 MB l_suppkey: 21.91 MB l_tax: 10.68 MB After: BytesWritten: 198.63 MB (84%) Per Column Sizes: l_comment: 75.71 MB (100%) l_commitdate: 8.64 MB (100%) l_discount: 2.89 MB (25.8%) l_extendedprice: 33.13 MB (100.33%) l_linenumber: 1.50 MB (32.89%) l_linestatus: 870.26 KB (100.032%) l_orderkey: 9.18 MB (102.11%) l_partkey: 27.10 MB (100.29%) l_quantity: 4.32 MB (37.31%) l_receiptdate: 8.65 MB (100%) l_returnflag: 1.40 MB (100%) l_shipdate: 8.65 MB (100%) l_shipinstruct: 1.45 MB (100%) l_shipmode: 2.17 MB (100%) l_suppkey: 10.11 MB (46.14%) l_tax: 2.89 MB (27.06%) The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even more. The restructuring to use a virtual call doesn't seem to change things much and will go away when we codegen the scanner. Here's what they look like with this patch (note this is on the before data files, so only string cols are dictionary encoded). Before query times: Insert Time: 8.5 sec select : 2.3 sec select avg(l_orderkey): .33 sec After query times: Insert Time: 9.5 sec <-- Longer due to doing dictionary encoding select : 2.4 sec <-- kind of noisy, possibly a slight slow down select avg(l_orderkey): .33 sec Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/238 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:16 -08:00
Lenni Kuff	be1d42c05a	IMPALA-538: Look for Avro schema in SERDEPROPERTIES as well as TBLPROPERTIES Change-Id: If5c0b36d5a3963176b07a0cb1ea680e3e36b2f96 Reviewed-on: http://gerrit.ent.cloudera.com:8080/248 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:15 -08:00
Skye Wanderman-Milne	b9ea32e9b7	Fix IMPALA-129, IMPALA-534, and other scanner bugs. Change-Id: Idbd29af3fcc35b9e1173d08ac55b5780751c5938 Reviewed-on: http://gerrit.ent.cloudera.com:8080/196 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:14 -08:00
Lenni Kuff	17ed6ea177	Partition TPC-DS dataset and add additional TPC-DS workload queries Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300 Reviewed-on: http://gerrit.ent.cloudera.com:8080/99 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:13 -08:00
Alex Behm	e52ed0800b	IMPALA-524: Fix computation of stats for ExchangeNode and merge AggregationNodes. The issue caused unnecessary repartitioning for static partition insert queries having grouped aggregation in the feeding query stmt. Change-Id: I5f4017e2c4d5a1bf88f51c4e0ff7ab28911e14f1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/202 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:11 -08:00
Alex Behm	9a201645cd	IMPALA-496: Fix escaping of field delimiter and escape character in inserts Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/163 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:09 -08:00
Alex Behm	48ee7ce891	IMPALA-508: Fix join-cardinality estimation and choice of join strategy when a join involves a table lacking table stats. Change-Id: I871273e1d9f048377ce638c201118fc21086db9a Reviewed-on: http://gerrit.ent.cloudera.com:8080/152 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:05 -08:00
Alex Behm	f0e2d539fc	IMPALA-495: Views Sometimes Not Utilizing Partition Pruning. Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	c9965e5a5c	Fix build break due to views defined by a constant select. Change-Id: I5deeeb03469494f5ba6ed7a911354bbdd6c98195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/149 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	2b427208e5	IMPALA-507: Creating a VIEW that does not reference a table fails with IllegalStateException. Change-Id: I11470ba919bbfced76730adae2a46647c4ef110b Reviewed-on: http://gerrit.ent.cloudera.com:8080/146 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00

... 4 5 6 7 8 ...

606 Commits