impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Tim Armstrong	e151ebaa71	IMPALA-1001: Bit and byte manipulation functions Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot, countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright. Interfaces and behavior follow Teradata documentation. All bit* functions are compatible with DB2. bitand only is compatible with Oracle. Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3	2015-07-28 08:11:01 -07:00
Sailesh Mukil	8a01527bad	IMPALA-2141: UnionNode::GetNext() doesn't check for query errors When a UDF with constant parameters in the select list calls SetError(), it does not fail the query. This is because UnionNode::GetNext() does not check for errors after UnionNode::EvalAndMaterializeExprs() evaluates the expression, which itself does not report the error. Change-Id: I8850cf1a603e320bb23f4a9a4d47600d14590f3a	2015-07-27 22:09:19 -07:00
Alex Behm	3ac341287c	IMPALA-2088: Fix planning of empty union operands with analytics. The check for ignoring empty union operands was simply misplaced. This misplacement resulted in empty union operands not being dropped if the containing UnionStmt had analytic functions. Change-Id: I3dad546c0c31a495e5f30d97c3e49465fcc2ebb3 Reviewed-on: http://gerrit.cloudera.org:8080/554 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 15:46:41 -07:00
Tim Armstrong	822cb8f5e2	IMPALA-1660: Netezza compatibility - factorial Implements suffix n! operator for factorial and factorial function. Slightly refactor operators in fe to share code between unary operators. Based partially on work by Arthur Peng <arthur.peng@intel.com>. Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b Reviewed-on: http://gerrit.cloudera.org:8080/531 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2015-07-27 19:03:48 +00:00
Sailesh Mukil	c21c080a46	IMPALA-1756: Constant expressions not checked for errors, no state cleanup on exception. Changed the way the function context error message is returned. Also, changed the exception thrown in SingleNodePlanner from IllegalStateException to AnalysisException in case of an exception in registerConjuncts(). This commit follows from: `d497ba6cef` This is a new commit since the previous one was closed before making these changes. Change-Id: Ifa9b7c0884d76b6d7911d8cd80355a8ba13c4c18 Reviewed-on: http://gerrit.cloudera.org:8080/560 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 19:04:38 +00:00
Ippokratis Pandis	00db434cd4	Mistake in schema_constraints by the IMPALA-2130 patch (7c7e69b) The patch that addressed IMPALA-2130 (7c7e69b) creates a new table intended only to be used in a test that uses the functional_parquet database. This patch has a mistake though in schema_constraints which essentially allows the creation of this table for all types and not only for parquet/none/none. Change-Id: I1d72b30557cb9d8f47fe27170808fec75af3bb1d Reviewed-on: http://gerrit.cloudera.org:8080/524 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-23 20:39:17 +00:00
Tim Armstrong	5990b43fe2	IMPALA-1898: Explicit aliases + ordinals analysis bug Analysis errors occurred with select queries that combined ordinals in the group by/order by clauses with select list aliases that had the same name as a column in one of the underlying tables. The root cause was a double substitution: e.g. the ordinal 1 in a GROUP BY clause was replaced with the corresponding select list expression, then a reference to column 'x' in an underlying table was replaced erroneously with the select list expression with alias 'x' Change-Id: I0f298290c58f18239e1ff83f0388d037c311f5fb Reviewed-on: http://gerrit.cloudera.org:8080/542 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2015-07-22 21:23:36 +00:00
Alex Behm	1b6f14ab16	Nested Types: Compute stats for the nested TPCH database. Change-Id: I7b2b77de1a9c25c2a5d9849b62437a58a18bdaae Reviewed-on: http://gerrit.cloudera.org:8080/506 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 18:48:17 +00:00
Sailesh Mukil	6d7bb76e87	IMPALA-1756: Constant filter expressions are not checked for errors and state cleanup is not done before throwing exception. When a builtin has an error (in the constant case), it is checked for but the state cleanup isn't taken care of which results in a DCHECK. When a UDF has an error (in the constant case), the error does not propagate back up the stack due to a lack of error checking in ScalarFnCall::Open() after it calls GetConstVal(). Change-Id: Ib500c84a41df574690369f124044991ed8c82cc1 Reviewed-on: http://gerrit.cloudera.org:8080/537 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 04:01:39 +00:00
Casey Ching	a6d534682b	IMPALA-2086, IMPALA-2090: Avoid boost year/month interval logic Boost handles a couple of edge cases differently than other databases such as Postgres and MySQL when adding year/month intervals to timestamps. This change makes Impala consistent with the other databases. The performance difference was not noticeable (<5% if any). Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06 Reviewed-on: http://gerrit.cloudera.org:8080/489 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-20 10:16:54 +00:00
Shant Hovsepian	6d87fe090c	Improve Hll estimate for small cardinalities. Based on Google's HyperLogLog++ paper. Uses a bias correcting interpolation as a sub algorithm for Hll estimates within a specific range. Change-Id: If4fe692b4308f6a57aea6167e9bc00db11eaaab9 Reviewed-on: http://gerrit.cloudera.org:8080/415 Tested-by: Internal Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2015-07-16 19:38:17 +00:00
Ippokratis Pandis	7e9f8478e1	Removing duplicate query test Change-Id: Ia8b33ca2a2eadae288acea4bd2111a1a974bc484 Reviewed-on: http://gerrit.cloudera.org:8080/526 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-15 03:28:36 +00:00
Ippokratis Pandis	e99c68fe52	IMPALA-2130: Wrong verification of Parquet file version This patch corrects a mistake in the Parquet magic file number verification and adds a test about it. Note that with this patch Impala may fail to read Parquet files with wrong magic number that it used to read before. Change-Id: Iff31accda1e1d541946ef1f750e38886ce4cb8d5 Reviewed-on: http://gerrit.cloudera.org:8080/515 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-14 02:52:02 +00:00
Martin Grund	51aa077448	IMPALA-2133: Properly unescape string value for HBase filters This patch fixes the problem, that the Frontend would simply pass the escaped value to the backend as an HBase filter and not the unescaped one. Now queries including an escaped character will work as well. Change-Id: I96e544973b523f3ef1abdec86ea1ec5596d9bee9 Reviewed-on: http://gerrit.cloudera.org:8080/520 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-07-13 18:38:39 +00:00
Taras Bobrovytsky	279806b708	Fixes to Nested TPCH workload - Changed the nested TPCH queries to use unqualified table names - Replaced dash with an underscore in the workload name Change-Id: Id1cbe5318fc9940ca7dc9dd4ff09d61593600a24 Reviewed-on: http://gerrit.cloudera.org:8080/502 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-08 02:54:20 +00:00
Ippokratis Pandis	4951f895e7	Nested Types: Reset() for partitioned hash join node TODO: Need to modify Reset()'s functionality in case of NAAJs. Change-Id: I7d0ea0dabd0b3404957e228bbaa51781c5fc34c0 Reviewed-on: http://gerrit.cloudera.org:8080/490 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-08 01:51:09 +00:00
Taras Bobrovytsky	704e3fa6bf	Add loading by partitions option to the loaded_nested script When loading a large nested table using the GROUP_CONCAT function, Impala runs out of memory. We prevent this from happening by adding an option to partition the table and load one partition at a time. Change-Id: I8d517f94ef97e98d36eb8ebc8180865023655114 Reviewed-on: http://gerrit.cloudera.org:8080/448 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-07-02 03:34:53 +00:00
Alex Behm	a274cfd787	Nested Types: Fix self-joining of collection table refs. When referencing the same path in multiple CollectionTableRefs (e.g., self-join on a nested collection), we used to register only a single SlotDescriptor in the root tuple descriptor and share it among those multiple CollectionTableRefs. A collection-typed SlotDescriptor has a single item tuple descriptor, set to the tuple descriptor of the corresponding CollectionTableRef. Therefore, sharing a single collection-typed SlotDescriptor among multiple CollectionTableRefs with the same path does not work (the item tuple desc was arbitrarily set to the last CollectionTableRef's tuple desc). In order to maintain our assumed 1:1 relationship between a table ref and a tuple descriptor, the simple fix for now is to give each CollectionTableRef a new slot in the root tuple descriptor, regardless of its path. We could conceivably allow more intelligent sharing of tuple descriptors for nested collections, but that change is too invasive for now. Change-Id: I2135d026191f51d1daa741455a7e1b0f6905af1e Reviewed-on: http://gerrit.cloudera.org:8080/495 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 06:56:28 +00:00
Ippokratis Pandis	f2c483802f	Nested Types: Reset() for partitioned aggregation node Change-Id: Ia5b4b9b3a7b8e9acb1b614c979cccca615fe2fbe Reviewed-on: http://gerrit.cloudera.org:8080/480 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-07-01 00:43:55 +00:00
Alex Behm	7e11e91356	Enable nested TPCH Q20. I found the bug that was preventing us from running Q20, and have integrated the fix into the subplan planning patch, since the bug was specific to subplans: http://gerrit.cloudera.org:8080/401 Change-Id: I2acb1f1212b43ddb0c705cfb07653f872ee3cbc2 Reviewed-on: http://gerrit.cloudera.org:8080/491 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-27 04:11:22 +00:00
Ippokratis Pandis	c3a7916812	IMPALA-2065: Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory() If the build side of any partition of PHJ was very large we could end up trying to Init() hash tables that are larger than 1GB. The result was overflows (see IMPALA-1619) and eventually DCHECKS. This patch returns false whenever we try to allocate memory in the BufferedBlockMgr that is larger than 1GB. Change-Id: Id4590ea434bef4dca7dc3f137cfe7b638ae3d916 Reviewed-on: http://gerrit.cloudera.org:8080/465 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-06-27 01:17:50 +00:00
Dimitris Tsirogiannis	fcba301b18	IMPALA-2018: Where clause does not propagate to joins inside nested views This commit fixes an issue where during predicate propagation a predicate from the where clause is not properly assigned at the join node that outer joins the generated predicate. Change-Id: Ifccc1b0e0a0579c3baa48f0fb3dedcbd44941b53 Reviewed-on: http://gerrit.cloudera.org:8080/476 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-26 23:35:25 +00:00
Alex Behm	406c2386e3	Add nested TPCH workload. The nesting collapses the 1:N relationships as follows: customer -> orders -> lineitems region -> nation supplier -> partsupp part Change-Id: I7459ffb4edb45a818f4a48717f22c7449732d5ae Reviewed-on: http://gerrit.cloudera.org:8080/320 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-25 21:47:15 +00:00
Alex Behm	569e86a60b	Nested Types: Change ExecNode::Reset() to only clear state and not tuple data. This patch changes the ExecNode::Reset() to: Status ExecNode::Reset(RuntimeState* state); The new Reset() should only clear the internal state of an exec node in preparation for another Open()/GetNext(). Reset() should not clear memory backing rows returned by a node in GetNext() because those rows could still be in flight. Subplan Memory Management: To ensure that the memory backing rows produced by the subplan tree of a SubplanExecNode remains valid for the lifetime of a row batch, we intend to use our conventional transfer mechanism. That is, the ownership of memory that is no longer used by an exec node is transferred to an output row batch in GetNext() at a "convenient" point, typically at eos or when the memory usage exceeds some threshold. Note that exec nodes may choose not to transfer memory at eos to amortize the cost of memory allocation over multiple Reset()/Open()/GetNext() cycles. To show the main ideas, this patch fixes transferring of tuple data ownership in several places and implements Reset() for the following nodes: - AnalyticEvalNode - BlockingJoinNode - CrossJoinNode - SelectNode - SortNode - TopNNode - UnionNode To make the transfer of ownership work for SortNode a row batch can now also own a list of BufferedBlockMgr::Block*. Also included are basic query tests that are not meant to be exhaustive. The tests are disabled for now because we cannot run them without several other code changes. I have manually run the test queries on a branch that has all necessary changes. Change-Id: I3ac94b8dd7c7eb48f2e639ea297b447fbf443185 Reviewed-on: http://gerrit.cloudera.org:8080/454 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-23 07:43:22 +00:00
Dimitris Tsirogiannis	2c1f0a4942	IMPALA-1987: Fix TupleIsNullPredicate to return false if no tuples are nullable. This commit fixes the issue where an outer join returns wrong results if the equi-join predicate contains a TupleIsNullPredicate expr. Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4 Reviewed-on: http://gerrit.cloudera.org:8080/445 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 03:37:18 +00:00
ishaan	377214c469	Use Isilon as the default file system when running Isilon tests. This patch enables running Impala tests against Isilon as the default file system. The intention is to run tests against a realistic deployment, i.e, Isilon replacing HDFS as the underlying filesystem. Specifically, it does the following: - Adds a new environment variable DEFAULT_FS, which points to HDFS by default. - Makes the fs.defaultFs property in core-site.xml use the DEFAULT_FS environment variable, such that all clients talk to Isilon implicitly. - Unset FILESYSTEM_PREFIX when the TARGET_FILESYSTEM is Isilon, since path prefixes are no longer needed. - Only starts the Hive Metastore and the Impala service stack when running tests against Isilon. We don't start KMS/HBase because they're not relevant to Isilon. We also don't start YARN, Hive and LLama because hive queries are disabled with Isilon. The scripts that start/stop Hive, YARN and Llama should be modified to point to a filesystem other than HDFS in the future. Change-Id: Id66bfb160fe57f66a64a089b465b536c6c514b63 Reviewed-on: http://gerrit.cloudera.org:8080/449 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 01:23:11 +00:00
Dan Hecht	4823889e14	IMPALA-1968: Part 2: Improve planner numNodes estimate See the previous commit for IMPALA-1968 for details. This commit addresses cases 2 & 3 by enabling the new estimate logic even when there are no remote scan ranges. Change-Id: I54bb26ee7d89ae9d74dcfcc3753ea73dae8315bc Reviewed-on: http://gerrit.cloudera.org:8080/426 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-06-10 21:10:31 +00:00
ishaan	f327a53c70	Fix metadata/test_load.py to work with Isilon. test_load was using /tmp as the staging directory, which was not cleaned up in Isilon, leading to a build failure. This patch does the following: - use /test-warehouse as the staging directory. - replace calls to the hdfs commandline with calls to the in-house hdfs client. - cleanup the test file and remove duplicates. Additionally, a new method is introduced in the hdfs client to simulate hdfs dfs -cp, i.e, it does a get and a put to mimic the hdfs command line's semantics. Change-Id: I0cc27ab00df5f5ec3138b995144ab45ad622605d Reviewed-on: http://gerrit.cloudera.org:8080/431 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-06-05 00:52:14 +00:00
Casey Ching	060f08ef69	Add tpch_nested_parquet database The database will be used for testing in the future. Change-Id: I60b54b36db9493a5bea308151b4027cd47d73047 Reviewed-on: http://gerrit.cloudera.org:8080/400 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-06-04 21:18:36 +00:00
Dan Hecht	d46de9bba1	IMPALA-1968: Part 1: Improve planner numNodes estimate for remote scans This commit will be backported to 5.4.x to improve plans when using Isilon and S3. The planner currently estimates the number of backends that an hdfs scan node will execute on as the number of datanodes holding block replica for the corresponding table. This can be a bad estimate for various reasons: 1) It's completely wrong when the scan is remote (e.g. S3 or Isilon). 2) It doesn't account for partition pruning. 3) The size of the set of hosts holding block replica may be larger than the number of scan ranges. Improve the estimate by examining the scan ranges and taking locality into account. While this new estimate will eventually be used in all cases, this change uses the new estimate only when there is a remote scan range so as not to change plans produced for local ranges (since this commit will be backported to 5.4.x). So, this commit purposely addresses only case 1. A follow on commit will enable the new logic for all cases. Also set up the S3PlannerTest so that we can enable it in the nightly jenkins S3 run. It was inadvertently never enabled there. Change-Id: I3fd3f7c5431a535fb044c98c326338c21b8a1898 Reviewed-on: http://gerrit.cloudera.org:8080/425 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-03 20:04:03 +00:00
ishaan	dbc78aaa2c	Enable isilon end to end tests for Impala. This patch introduces changes to run tests against Isilon, combined with minor cleanup of the test and client code. For Isilon, it: - Populates the SkipIfIsilon class with appropriate pytest markers. - Introduces a new default for the hdfs client in order to connect to Isilon. - Cleans up a few test files to take the underlying filesystem into account. - Cleans up the interface for metadata/test_insert_behaviour, query_test/test_ddl On the client side, we introduce a wrapper around a few pywebhdfs's methods, specifically: - delete_file_dir does not throw an error if the file does not exist. - get_file_dir_status automatically strips the leading '/' Change-Id: Ic630886e253e43b2daaf5adc8dedc0a271b0391f Reviewed-on: http://gerrit.cloudera.org:8080/370 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-05-27 22:25:12 +00:00
Shant Hovsepian	69079411bf	Improve distinctpc/sa for small cardinalities. Improving the cardinality estimate for Flajolet and Martin's algorithm used in distinctpc and distinctpcsa. The estimate for small cardinalities is improved by providing a correction hinted to in the original paper. We use the correction constant 1.75 proposed by Scheuermann et al DialM-POMC '07 [Near-Optimal Compression of Probabilistic Counting Sketches for Networking Applications] Change-Id: I90410328a1a01a72601e7e95ae719fb8caf1587f Reviewed-on: http://gerrit.cloudera.org:8080/395 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-05-24 06:26:47 +00:00
Juan Yu	934b28fe5e	IMPALA-1381: Expand set of supported timezones. The hardcoded timezone information is from Java version 1.7.0_76. Change-Id: I32c40d0036473079e5bfd4d0252a648cbb0e7c23 Reviewed-on: http://gerrit.cloudera.org:8080/393 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-05-22 01:32:54 +00:00
Matthew Jacobs	bc3a46daab	Change minicluster llama log level to INFO Change-Id: Ifa83cb437f807c5cbd9f2259a570c1af39340811 Reviewed-on: http://gerrit.cloudera.org:8080/402 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2015-05-20 21:11:49 +00:00
Alex Behm	26467d1f98	Upgrade a few important mvn plugins. Change-Id: I84cb4834744e3a8a3dfde82d20c9205a155b7a31 Reviewed-on: http://gerrit.cloudera.org:8080/399 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-20 03:12:57 +00:00
Matthew Jacobs	456e99b21b	Mini cluster configuration change for Yarn and log4j Update the yarn-site.xml to reduce the latency of resource acquisition. Also changes the log4j properties to reduce the very verbose logging for the hadoop daemons which was consuming huge amounts of space very quickly. Change-Id: I8532fb5125b604974e26ddad76aee93b9c4e64fb Reviewed-on: http://gerrit.cloudera.org:8080/381 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-05-19 23:05:44 +00:00
Matthew Jacobs	f37682a16f	Fix packaging build for Python 2.4 cgroups.py was using unsupported "except <Exception> as <var>" syntax. generate_metrics.py was using the json module which is not available in Python 2.4, but contains simplejson which provides the same functionality. Change-Id: If2c176c15a9573dd2a2acf5ee459ff24ce891ce3 Reviewed-on: http://gerrit.cloudera.org:8080/396 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2015-05-19 17:13:33 +00:00
Alex Behm	1bd3eca22f	Quietly resolve dependencies in Jenkins runs to avoid log spew. Change-Id: If38a683785f3c6c9d92f762a2dfd86f009ce9d84 Reviewed-on: http://gerrit.cloudera.org:8080/392 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-19 09:12:43 +00:00
Alex Behm	b558ea3f92	Nested Types: Refactoring of join nodes in preparation for more 'cross join' modes. This patch introduces a new superclass, JoinNode, as the parent of HashJoinNode and CrossJoinNode. It is a first step in supporting the semi/outer modes for non-equi joins via a nested-loops implementation (like our existing cross join). I have left a few TODOs that should be addressed when adding such support. This patch also includes a cosmetic improvement to explain plans: The distribution mode of CROSS JOINs is now only displayed for distributed plans, and not for single-node plans (which is important for Subplans). Change-Id: I93546871c459f4bc564f6dcb6bf4c35addbad4ec Reviewed-on: http://gerrit.cloudera.org:8080/388 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2015-05-19 05:31:40 +00:00
Matthew Jacobs	38f0c3d046	Envvar for Impala test cluster base cgroup hierarchy Allows the base cgroup hierarchy path used by the impala test cluster to be specified with the environment variable IMPALA_CGROUP_BASE_PATH. This is needed to support older kernels that do not use the proper default cgroup path and do not even support finding the hierarchy via mount. This will be used in jenkins test runs with RM enabled which run on Centos6 images. Change-Id: I30984a58fbcf990410f75f7feb5c1d549afa6ddd Reviewed-on: http://gerrit.cloudera.org:8080/397 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-05-19 02:12:03 +00:00
Alex Behm	013d6f968f	Clean up FE pom.xml to eliminate console spew. This patch makes the following changes in our pom to reduce the build time and significantly reduce console spew. 1. Remove jar-with-dependencies from package goal. We have no need for creating an uber jar that contains the FE as well as all its dependencies. Locally, we carefully construct our class path manually (relying on copy-dependencies), and in Impala deployments the FE jar is put together with the other dependencies, so the FE jar does not need to be self-contained. 2. Silence copy-dependencies. Changes the configuration of the maven-dependency-plugin to not log every copied file to the console. Change-Id: If351e4e800fd1ca1108f9a0f4d88f52a53fc211c Reviewed-on: http://gerrit.cloudera.org:8080/378 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-18 07:20:07 +00:00
Alex Behm	b3bb0ea525	Fix S3 build v2: Adjust expected SHOW TABLE STATS output. Change-Id: Idc1f255a7d170e6083439220140c5eb895133b22 Reviewed-on: http://gerrit.cloudera.org:8080/382 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-16 02:47:01 +00:00
Matthew Jacobs	76529cdd8c	Remove usage of context manager in cgroups.py Context managers are not supported before Python 2.7. Removes the use of the 'with' clause in cgroups.py because this code is executed on Centos 6 packaging boxes with an older version of Python. Change-Id: Ic6bcf161086f671ec2010df16f9bb23534c57697 Reviewed-on: http://gerrit.cloudera.org:8080/385 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2015-05-15 16:49:35 +00:00
Casey Ching	ac0c075997	Parquet: Fix value def level when max def level is 0 When running with a release build, NULL would be returned when reading values from required fields in parquet files (with a debug build a DCHECK would be hit). Previously when the max definition level for a field was 0 (which happens if a field is required), the definition level for value was incorrectly set to 1. The max definition level is related to nested data and is defined to be the number of nullable fields that will be encountered when traversing a path to reach the desired end field. For example, if a nested schema has a path a.b.c.d where b and d are nullable then the max def level is 2. A def level is attached to each value to indicate the number of optional values that are present (in the previous example a def level of 2 means both b and d are not null). So having a def level for a value that is greater than the max def level for a field should never happen. Change-Id: Ia91a97cf79e672c420d10416c6817f0930dcc920 (cherry picked from commit cdd67e4c7fd62d5b08adfaa303d7bb2382e6932c) Reviewed-on: http://gerrit.cloudera.org:8080/386 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 06:41:02 +00:00
Skye Wanderman-Milne	7801aa499f	Use codegen to inject runtime constants in exprs This patch introduces the function GetConstant(), which is used by expr compute functions and UDFs to access query constants. There is a corresponding GetIrConstant() function that returns the IR versions of the same constants. Currently the only implemented constants are the expr's return type and argument types, but other constants can easily be added to these functions. Interpreted expr functions run normally, but cross-compiled functions can be passed to InlineConstants(), which looks for calls to GetConstant() and replaces them with the result of calling GetIrConstant(). I used this technique in the decimal functions that previously were not switching on the type at all. The performance of LeastGreatest() after this patch is the same as it was before it switched on the type. Change-Id: I8b55744551830d894318a7bab6b6f045fb8bed41 Reviewed-on: http://gerrit.cloudera.org:8080/352 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Internal Jenkins	2015-05-15 02:24:04 +00:00
Matthew Jacobs	cf0b6bc595	Add flag to easily enable Yarn and Llama in mini cluster Adds a flag to start-impala-cluster.py (--enable_rm) to set up the mini Impala cluster using Yarn and Llama. This hides a number of flags that must be set on the impalads: -enable_rm -llama_addressess: set to the local llama service -fair_scheduler_allocation_path: set to the path of the fair-scheduler.xml in each node's hadoop conf directory -cgroup_hierarchy_path: set to a path in the CPU cgroup hierarchy which has the correct permissions for Impala to manage a child cgroup. The path comes from cgroups.py. The new module cgroups.py was added to contain cgroups-related utilities. Right now it provides paths to the CPU controller hierarchy root and a path within the hierarchy that can be used for impalads (i.e. have the proper permissions, one for each cluster node). Change-Id: Ic2181ec5613c180f240958c84f885c6b136a64d4 Reviewed-on: http://gerrit.cloudera.org:8080/369 Tested-by: Internal Jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com>	2015-05-14 21:15:24 +00:00
Juan Yu	78446e5f34	Fix FE test failure in PlannerTest#testAnalyticFns Change-Id: Ica3aa33686c3be6372b8c36ec66b367ce1d21a3b Reviewed-on: http://gerrit.cloudera.org:8080/379 Reviewed-by: Juan Yu <jyu@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 19:18:37 +00:00
Alex Behm	5f54b2c4d3	Fix S3 build: Adjust expected SHOW TABLE STATS output. Change-Id: I3fb0c551dfbe53aecd9c0bced3bc29d5a5fa41e5 Reviewed-on: http://gerrit.cloudera.org:8080/375 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 07:55:36 +00:00
zuowang	304d985523	IMPALA-1139: Implement TRUNCATE TABLE statement Synopsis: TRUNCATE [TABLE] [database.]table TRUNCATE quickly removes all rows from a set of tables. TRUNCATE also drops all table and column stats, but preserves HMS partitions and HDFS directories. You must have the INSERT privilege on a table to truncate it. It requires taking the metastoreDdlLock before truncate tables. Examples: TRUNCATE TABLE t1; TRUNCATE t1; Change-Id: I546e4ee0279083f437cdf0e7487faad47957dbf6 Reviewed-on: http://gerrit.cloudera.org:8080/241 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-14 07:50:34 +00:00
ishaan	058978dccb	Enable using isilon as the underlying filesystem. This patch enables the Impala test suite to run the end to end tests against an isilon namenode. There are a few caveats: - The fe test will currently not work. - Only loading data from both the test-warehouse snapshot and the metadata snapshot is supported. - The test suite cannot be run by multiple people (unless we have access to multiple isilon namenodes) Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237 Reviewed-on: http://gerrit.cloudera.org:8080/356 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-05-12 01:28:19 +00:00
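
Note on commit ac0c075997 (Parquet definition levels): the rule described in that message can be illustrated with a minimal sketch. This is not Impala's code; the path representation and the names max_def_level and def_level are hypothetical, chosen only to mirror the wording of the commit message.

```python
from typing import List, Tuple

# Hypothetical representation: a schema path is a list of
# (field_name, is_nullable) pairs, ordered from the root to the leaf.
Path = List[Tuple[str, bool]]


def max_def_level(path: Path) -> int:
    """Number of nullable fields encountered along the path."""
    return sum(1 for _, nullable in path if nullable)


def def_level(path: Path, present: List[bool]) -> int:
    """Number of nullable fields that are present for one value.

    present[i] says whether the i-th field on the path is non-null.
    Counting stops at the first null field; a required field may not be null.
    """
    level = 0
    for (name, nullable), is_present in zip(path, present):
        if not is_present:
            if not nullable:
                raise ValueError("required field %r cannot be null" % name)
            break
        if nullable:
            level += 1
    return level


# The a.b.c.d example from the commit message: b and d are nullable.
example_path = [("a", False), ("b", True), ("c", False), ("d", True)]
```

Under these assumptions a path made only of required fields has a max def level of 0, so every value on it must carry def level 0, and a value's def level can never exceed the max def level of its path.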

952 Commits