* One last NotifyThreadUsageChange() mismatched pair
* Don't set the resource in plan fragment params if there isn't a resource
available. This fixes a problem where, if no fragment with resources
was assigned to the same node as the coordinator, the coordinator
would have a dummy resource allocation that didn't work with
expansion.
* Substitute #ID in all impalad arguments to start-impala-cluster.py
with the 0-indexed ID of the impalad being started. This is required
to have different Impala processes use different cgroups.
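For illustration, a hedged example of how the substitution might be used (the
--cluster_size/--impalad_args option names and the cgroup flag shown are
assumptions for this sketch, not taken from this change):
    ./bin/start-impala-cluster.py --cluster_size=3 \
        --impalad_args="--some_cgroup_path_flag=/sys/fs/cgroup/cpu/impala-#ID"
With this, impalad 0 would see .../impala-0, impalad 1 .../impala-1, and so on.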
Change-Id: If8c8fd8bef0809bdaf16115a45a9695fc2bf3e1b
(cherry picked from commit c71ce45e97570b8c09900eb5ae2e26984d3306a4)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2060
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
Impala reserves resources from YARN via Llama and handles resource
preemption by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors the scheduler and coordinator
to move fragment-to-host assignment logic into the scheduler. The local test
setup uses MiniLlama.
Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
Enables JVM debugging by default for the catalogd and impalads
created via bin/start-impala-cluster.py.
Adds a -jvm_args command line option for passing additional JVM args to
the catalogd and impalads.
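For example (illustrative only; the exact invocation syntax is an assumption,
and the JVM flags shown are just samples):
    ./bin/start-impala-cluster.py --jvm_args="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError"
The extra arguments would be appended to the JVM args of both the catalogd and
each impalad.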
Change-Id: I68e901661bd1fd7eefa05ba84dbacf29dd124685
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1213
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
This helps speed up the restart time because we don't need to restart
the catalog server and reload the table metadata. This is useful if you
want to restart the impalad with different command line parameters
or if you are making changes only to the impalad binary.
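Assuming this is exposed as a flag on start-impala-cluster.py (the flag name
below is an assumption for illustration, not confirmed by this change), a
restart that reuses the running catalogd and statestore might look like:
    ./bin/start-impala-cluster.py --restart_impalad_only
leaving the catalog server, and the metadata it has already loaded, untouched.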
Change-Id: I0b714afaf7e508c450a353a53d67d95165de3486
Reviewed-on: http://gerrit.ent.cloudera.com:8080/897
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Test suites that derive from common.CustomClusterTestSuite get a brand
new cluster for every test case, which they can configure as they wish
with custom arguments using the @with_args() decorator (see the sketch below).
A future improvement is to optionally only have one cluster per test
suite, to allow multiple tests to run more quickly if they share
configuration options.
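A minimal sketch of what such a test might look like (the import path, the
@with_args() argument format, and the self.client helper are assumptions for
illustration; only CustomClusterTestSuite and @with_args() come from this change):

    from tests.common.custom_cluster_test_suite import CustomClusterTestSuite, with_args

    class TestWithCustomFlags(CustomClusterTestSuite):
        @with_args('--default_query_options=num_nodes=1')
        def test_query_with_custom_flags(self):
            # A fresh cluster is started with the arguments above just for this
            # test case, so flag changes cannot leak into other tests.
            result = self.client.execute('select 1')
            assert result.data == ['1']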
Change-Id: I6abd5740e644996d7ca2800edf4ff11b839d1bc4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/882
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This brings back online the process failure tests and adds a basic failure
test for the catalog service. The timeouts had to be adjusted to account for the
extra time it takes to load the catalog and for the additional statestore
subscriber. Note: the statestore 'live.backends' metric used in these
tests needs to be renamed; it really means 'live.subscribers'. However, that requires some
coordination with other teams to make the change.
Also updated start-impala-cluster to check the catalog.ready flag to ensure the impalad
catalog is ready to accept queries.
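A minimal sketch of the kind of readiness check this implies (the /jsonmetrics
endpoint, the default debug webserver port, and the polling details are
assumptions for illustration, not the actual implementation):

    import json, time, urllib

    def wait_for_catalog_ready(host='localhost', port=25000, timeout_secs=60):
        # Poll the impalad debug web server until catalog.ready reports true.
        deadline = time.time() + timeout_secs
        while time.time() < deadline:
            metrics = json.load(urllib.urlopen('http://%s:%d/jsonmetrics' % (host, port)))
            if metrics.get('catalog.ready'):
                return
            time.sleep(1)
        raise RuntimeError('impalad catalog did not become ready in time')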
Change-Id: If22e25dba7dc83aa40bec937b5f82b815bed4645
Reviewed-on: http://gerrit.ent.cloudera.com:8080/730
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Adds basic support for catalogd to our ImpalaCluster test library/object model.
This will allow us to write more programmatic tests targeting the catalogd process,
including process failure tests and metric check validators.
Change-Id: I8e5f7bc73f999f105437c6d3d52c6d436a354d2d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/617
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
There is a Hive Metastore concurrency bug (HIVE-5457) which causes concurrent
calls to getTable() to sometimes fail with DataNucleus exceptions. This
causes catalogd to fail to load ALL metadata for all tables. The fix is to
serialize our calls to getTable(). Additionally, tweaked the logging a bit and
improved start-impala-cluster to do a better job of reporting the status of catalog
initialization. It's too bad we have to serialize these calls, but we seem to be able
to run everything else in parallel with no problems (getting column stats, block metadata, etc.).
Also added a couple of changes to our hive-site to match the defaults for our cluster
metastore deployments.
Change-Id: Ic70e2a9b8190a56510e430d8da3942dca252eb4c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/609
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface that allows impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, the Thrift service implementation, and the export of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.
Some Notes On the Changes
---
* The metadata is all sent as Thrift structs. To do this, all catalog objects (tables/views,
databases, UDFs) have a Thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate subclasses, an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.
What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
contains the change. An impalad will wait for the statestore heartbeat that contains this
version before returning from the DDL command (see the sketch after this list).
* All table types (HBase, HDFS, views) get their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
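A minimal sketch (in Python-style pseudocode) of the version handshake described
in the DDL item above; the names are illustrative, not the actual impalad/catalogd
implementation:

    def execute_ddl(catalog_service_client, local_catalog, ddl_request):
        # The Catalog Service applies the DDL and reports the first catalog
        # version that contains the change.
        response = catalog_service_client.exec_ddl(ddl_request)
        # Block until a statestore heartbeat has delivered that version to this
        # impalad, so the change is visible locally before returning to the client.
        local_catalog.wait_for_catalog_version(response.catalog_version)
        return response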
Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
same JAR.
Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Previously, the user-specified command line parameter --log_level was not being
taken into account when starting the mini Impala cluster.
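With the fix, an invocation like the following (the flag name comes from this
change; the value is just an example) is now propagated to the started daemons:
    ./bin/start-impala-cluster.py --log_level=2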
Change-Id: I433412b6a7057585136d2ad887010881217d9676
Reviewed-on: http://gerrit.ent.cloudera.com:8080/520
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
This change cleans up start-impala-cluster to remove all the unneeded log4j setup code.
As part of this change, updated the start-impalad script to "exec" the impala binaries,
which removes the .sh wrapper script from the list of running processes.
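The pattern is the usual shell one, roughly (the variable name is illustrative,
not the actual script contents):
    # Replace this wrapper shell process with the impalad binary, so only the
    # daemon itself shows up in the process list.
    exec "$IMPALAD_BINARY" "$@"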
Change-Id: I5dee49b72ff51012bf43ab9d2a3a21fd2b841ff5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/270
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Impala detects the HDFS version by reading the Namenode web UI and runs
the corresponding check.
On 4.1, Impala tries to check the datanode (server side) config by reading
the datanode web UI.
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
by the merging ExchangeNode
* all limits with ORDER BY are now distributed: enforced both by the child plan fragment and
by the merging TopN node
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I have to modify the generated C++ file.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.
This includes a fix from Nong for a bug that caused wrong results for limits on
non-IO-manager formats.
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).
You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.
These new tests can be run using the script at tests/run-tests.sh
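For example (the exploration strategy flag is mentioned above; exactly how
run-tests.sh forwards it to py.test is an assumption for this illustration):
    ./tests/run-tests.sh --exploration_strategy=core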
Added a script that starts an impalad "cluster" (impalad + state store) with
each impalad running on a different port. Also updated QueryTest to enable
running against an external impalad. This enables running all the tests against
a remote cluster or a local cluster set up with the script I added.
By default we run with the in-process impalad - to enable running against a
remote impalad, use the flag:
mvn test -Duse_external_impalad=true
The same host/port flags work with this, for example:
mvn test -Duse_external_impalad=true -Dimpalad=hostName -Dfe_port=21000