impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Henry Robinson	d264ab90fe	Add support for client SSL to Python Beeswax client Change-Id: I0d9352471067bfe19e25221e0ecbbb08f945b962 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2810 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins (cherry picked from commit 545bd30d5cf3cae9a3581d7bc942a909a1a98806) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2850 Tested-by: Henry Robinson <henry@cloudera.com>	2014-06-05 10:48:23 -07:00
Henry Robinson	93a3d65492	Support for LDAP tests * Allow Beeswax connections to optionally use LDAP * Run custom cluster tests from the aux repo, if it exists Change-Id: I054af64e030ad0cd722ae8dd75afda9c58ea2913 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2547 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2640	2014-05-21 05:52:55 -07:00
casey	2351266d0e	Replace single process mini-dfs with multiple processes This should allow individual service components, such as a single nodemanager, to be shut down for failure testing. The mini-cluster bundled with Hadoop is a single process that does not expose the ability to control individual roles. Now each role can be controlled and configured independently of the others. Change-Id: Ic1d42e024226c6867e79916464d184fce886d783 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432 Tested-by: Casey Ching <casey@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-04-23 18:24:05 -07:00
Henry Robinson	99c37aac37	IMPALA-827: Add an option for directories created by INSERT to inherit their parent's permissions This patch adds --insert_inherit_permissions. If true, all new partition directories created by INSERT will inherit their permissions from their parent. When false, the directories are created with the default permissions. Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-04-04 10:25:20 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Alex Behm	9cabee4a71	Wait for the Metastore to come up before starting HiveServer2. Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-02-25 21:05:33 -08:00
Nong Li	b310935424	Minor workload runner logging improvements. Change-Id: I75d27593599e654f7fab1cd104dd9fe9fa88cfdb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1145 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Conflicts: tests/common/workload_runner.py	2014-01-08 10:54:38 -08:00
ishaan	6b9f01374b	Selectively mute fabric's logging while running remote commands. This change encloses fabric's task method with its 'hide' context manager. The current state of the running commands is muted (i.e., hosts connected to, which command is running, etc.). Error messages are NOT muted, and will still be displayed (connection error, command failure). Change-Id: Ibfbbb995ab6fe057faec9af8be90449654b21f8c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1155 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:37 -08:00
ishaan	c9aa31ac02	Suppress log spew while running plugins in the workload runner. The plugin runner uses fabric as the underlying mechanism for running remote commands on cluster hosts. fabric in turn uses paramiko, which generates a lot of log spew. This change sets paramiko's logging level to ERROR, eliminating excess logging. It also constrains fabric's logging. Change-Id: I6229d64f95f9c1512cc01842c4a661e96e421086 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1064 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:23 -08:00
Lenni Kuff	0bae3978c9	Update compute-stats.py to execute using Impala Updates our compute stats script to execute using Impala. This allows us to easily compute stats on all tables in a database or all tables in the metastore. The updated stats caused one of the TPCH plans to change so this also updates the TPCH planner test results. Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Lenni Kuff	9d71dd3d0c	IMP-1158: Dropping a database in Impala does not cleanup the db's HDFS directory We need to pass a flag to the metastore for the cleanup to happen. Previously we were passing 'false' when we need to pass 'true' to get the same behavior as Hive when dropping databases. Added a test case to validate the cleanup when dropping databases and tables. Change-Id: I500a3d3ac52c1b2031fae842403a670cfe43fa98 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1035 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Henry Robinson	f241782966	IMPALA-620: Fix re-registration starvation bug in statestore This patch fixes a slightly pathological state that occurs when the statestore is under heavy load. The result of the bug is that subscribers cannot successfully re-register because the statestore never marks them as failed. The exact sequence of events is as follows: 1. Subscriber registers with statestore. 2. Statestore does not send heartbeats to subscriber in a timely fashion. Subscriber times out. 3. Subscriber is restarted quickly. Statestore does not detect restart. 4. Subscriber's RegisterSubscriber() call fails, because statestore detects duplicate registration. 5. Subscriber restarts again. Since statestore is slow to send heartbeats, it has not detected the restart, so the subscriber receives a heartbeat message from the statestore and does not reject it. 6. Statestore continues to believe subscriber is alive, since the heartbeats are not being rejected. To fix this, we add a registration ID to each successfully registered subscriber that is known to both subscriber and statestore. If the subscriber restarts and re-registers, it receives a new registration ID. Whenever a heartbeat arrives, the subscriber compares its registration ID to the one sent by the statestore with the heartbeat, and rejects the heartbeat if they do not match. We also allow re-registration of existing subscribers (getting rid of the dreaded "Duplicate subscription" message). A new registration overwrites an old one. Change-Id: Ie32df3a586ccb375375ebfbcbec1aaeb930b6bfe Reviewed-on: http://gerrit.ent.cloudera.com:8080/778 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-01-08 10:53:53 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
Lenni Kuff	72e211ca4a	Use Hive Metastore Service instead of HiveServer 1 in test infrastructure Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be Reviewed-on: http://gerrit.ent.cloudera.com:8080/689 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:26 -08:00
ishaan	565d15579c	Add the ability to use a workload as the unit of execution in the Impala benchmark runner. At the moment, a query is the default unit of execution and parallelism in the Impala performance suite. With this change, we now have the ability to treat a workload as the unit of execution. A workload is defined as a unique combination of the dataset, scale factor, a subset (or all) of the queries in the dataset, and a table format (file format, compression codec and compression scheme). It introduces two new command line options in bin/run-workload.py: * --execution_scope: The default scope is 'query', and it maintains previous semantics. The new scope is 'workload', which toggles the unit of execution to a workload. * --shuffle_query_exec_order: Shuffles the order in which queries are executed (only applicable when execution_scope is 'workload'); defaults to False. Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45 Reviewed-on: http://gerrit.ent.cloudera.com:8080/503 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:07 -08:00
Henry Robinson	073abb803d	Use only Python 2.6-compatible interfaces from ElementTree Change-Id: I746c2bf36472c3e08a77bd46ae407e29f5c0596a Reviewed-on: http://gerrit.ent.cloudera.com:8080/494 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:51 -08:00
Henry Robinson	a46276325c	IMPALA-415: Don't delete hidden files in the root directory for INSERT OVERWRITE INSERT OVERWRITE into an unpartitioned table is supposed to remove all data files from the root. This should not include hidden files or directories. This patch excludes hidden files from deletion, and adds a test case. Partition directories are still removed in their entirety: the cost of statting a large number of files and directories rather than issuing a single "rm -rf" outweighs the benefits of preserving hidden files for now. Hive does not preserve hidden files in either configuration. Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/418 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:50 -08:00
ishaan	1780084753	Fail a performance job when a plugin fails. Change-Id: I81fd9ec742a9efbdb16466c8ee7387fabe3e16e4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/342 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:31 -08:00
Lenni Kuff	d66d3bfce3	IMPALA-161: Add Impala support for CREATE TABLE AS SELECT This adds support for CREATE TABLE AS SELECT to Impala. It supports all functionality a regular CREATE TABLE statement includes, except it does not allow for specifying partition columns. Hive also has this limitation and it wouldn't be too hard to support in the future. Change-Id: I4ca3c3b8f1576441b8bb5ed9dc521d7dfa96ab74 Reviewed-on: http://gerrit.ent.cloudera.com:8080/157 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:17 -08:00
Alex Leblang	a0ad16af41	Testing framework changes for VTune plugin This patch contains changes to the general test and plugin framework that were needed to make the VTune plugin run. These changes create a context dictionary that is passed to the plugin. Change-Id: I12ee2076fb0d777813c56bbb338e6d20426afaff Reviewed-on: http://gerrit.ent.cloudera.com:8080/111 Reviewed-by: Alex Leblang <alex.leblang@cloudera.com> Tested-by: Alex Leblang <alex.leblang@cloudera.com>	2014-01-08 10:52:12 -08:00
Alex Leblang	a4e441cc15	VTune Plugin This is a plugin that runs Intel's VTune from within the Impala test plugin runner framework. Change-Id: I673a2f6c4785bfcf565e8b002c14ada7a7c8f48a Reviewed-on: http://gerrit.ent.cloudera.com:8080/195 Reviewed-by: Alex Leblang <alex.leblang@cloudera.com> Tested-by: Alex Leblang <alex.leblang@cloudera.com>	2014-01-08 10:52:12 -08:00
ishaan	a6cb5f70a4	Introduce the notion of scope to the plugin framework. Change-Id: I2cf39c38e7e0a359950d9e05e2daed433fc0c38f Reviewed-on: http://gerrit.ent.cloudera.com:8080/144 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:05 -08:00
ishaan	6f9569ea6f	Add a plugin framework for run-workload	2014-01-08 10:51:39 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
ishaan	e7c6d57f9c	IMP-773: Add better logging/error detection to start-impala-cluster.py	2014-01-08 10:51:25 -08:00
Henry Robinson	397b82f197	Respect qualification of table names in RESET / RELOAD test sections	2014-01-08 10:50:55 -08:00
Lenni Kuff	9a4feb7391	Add impala local failure test framework and tests	2014-01-08 10:50:18 -08:00
Lenni Kuff	0d45a3a54b	Add --continue_on_error and --hive_cmd option to compute stats script	2014-01-08 10:49:50 -08:00
Lenni Kuff	36e9fe1c1a	Run compute table stats statements using Hive CLI This works around a problem with computing table stats via the Hive Metastore client API. When executing these statements via the MetaStoreClient, all tables were getting a num_rows=0 value returned from the ANALYZE TABLE query.	2014-01-08 10:49:19 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Lenni Kuff	e0a7b7cb55	Compute column stats on tables used by Planner tests	2014-01-08 10:48:48 -08:00
ishaan	5ed84d7f65	IMP-739 Results for show queries should check for subset, not equality.	2014-01-08 10:48:46 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
Lenni Kuff	dd9798c9f3	IMP-785: calculation_util.calculate_mean does not calculate mean (instead median)	2014-01-08 10:48:35 -08:00
Henry Robinson	e15a39143a	Fix definition of calculate_mean	2014-01-08 10:48:34 -08:00
Lenni Kuff	4cf7d2634e	Update benchmark runner to use mean of all results if num_clients > 1	2014-01-08 10:48:30 -08:00
Lenni Kuff	51908060b3	Ignore header "comments" in schema template files	2014-01-08 10:48:24 -08:00
Lenni Kuff	6e1f8d178a	Update utility script to compute column and table stats for given table(s)	2014-01-08 10:48:23 -08:00
Lenni Kuff	1d394cf77c	IMP-775: Fix updating test results to preserve comments, add test file parser unittests	2014-01-08 10:48:23 -08:00
ishaan	5138a720bb	IMP-768: Enable the python test framework to check for insert results.	2014-01-08 10:48:22 -08:00
Lenni Kuff	3ee82e7543	Add support for running Impala query tests against secure cluster Adds support for running all the Impala query tests against a secure cluster. This run mode can be selected by adding a --use_kerberos flag to run-tests.py and pointing to the correct (secure) Hive Metastore Service.	2014-01-08 10:48:21 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	99bb22dcac	Add db name filter to compute stats, run compute stats on functional/text tables	2014-01-08 10:48:08 -08:00
Lenni Kuff	b7c348edfa	Fix build break due to using Python 2.7 API	2014-01-08 10:46:54 -08:00
Lenni Kuff	837f35eab3	Updated results for more query tests to reflect proper ordering + improved result updating	2014-01-08 10:46:53 -08:00
Lenni Kuff	1b248d067b	Add TPC-DS dataset and workload	2014-01-08 10:46:52 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means that if you load the "core" dataset, you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Lenni Kuff	1e25c98fb4	Test data loading framework improvements This change includes a number of improvements for the test data loading framework: * Named sections for schema template definitions * Removal of unneeded sections from schema template definitions (ex. ANALYZE TABLE) * More granular data loading via table name filters * Improved robustness in detecting failed data loads * Table level constraints for specific file formats * Re-written compute stats script	2014-01-08 10:46:49 -08:00
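
The delimiter rules described in commit cc1c0c61fd (IMP-1291) can be illustrated with a short Python sketch. The helper name `parse_delimiter` and the order in which the formats are tried (signed integer first, then octal escape, then a single character) are assumptions made for illustration; this is not Impala's or Hive's actual code. The point it shows is that a signed byte in [-128, 127] is reinterpreted as unsigned, so -2 selects byte 254.

```python
import re

# Matches an octal escape such as "\001" (a literal backslash plus 3 octal digits).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def parse_delimiter(spec: str) -> int:
    """Return a field delimiter as an unsigned byte value in [0, 255].

    Hypothetical helper illustrating the formats listed in IMP-1291;
    not Impala's actual implementation.
    """
    # Format 1: a signed decimal integer in [-128, 127]. Negative values
    # select the "extended" range 128-255 via two's complement (-2 -> 254).
    try:
        value = int(spec)
    except ValueError:
        pass
    else:
        if not -128 <= value <= 127:
            raise ValueError(f"integer delimiter {spec!r} outside [-128, 127]")
        return value & 0xFF

    # Format 2: an octal escape such as "\001".
    match = _OCTAL_ESCAPE.fullmatch(spec)
    if match:
        value = int(match.group(1), 8)
        if value > 0xFF:
            raise ValueError(f"octal delimiter {spec!r} does not fit in one byte")
        return value

    # Format 3: a single character whose code point fits in one byte.
    if len(spec) == 1 and ord(spec) <= 0xFF:
        return ord(spec)

    raise ValueError(f"unsupported delimiter {spec!r}")
```

For example, `parse_delimiter("-2")` returns 254 and `parse_delimiter("\\001")` returns 1. Because this sketch tries the integer form first, a one-digit spec such as "1" is read as byte 1 rather than the character '1'; that precedence is a design choice of the sketch, not a documented rule.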

48 Commits
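
The registration-ID check introduced in commit f241782966 (IMPALA-620) can be modeled in a few lines of Python. The class and method names (`Statestore`, `Subscriber`, `on_heartbeat`, `heartbeat_registration_id`) are hypothetical, and a random UUID stands in for whatever ID scheme the real statestore uses. The sketch only shows why a restarted subscriber now rejects heartbeats that carry its previous registration ID, which lets the statestore mark it as failed.

```python
import uuid
from typing import Dict, Optional


class Statestore:
    """Toy statestore that tracks one registration ID per subscriber."""

    def __init__(self) -> None:
        self._registrations: Dict[str, str] = {}

    def register(self, subscriber_id: str) -> str:
        # Re-registration is allowed: a new registration overwrites the old one.
        registration_id = str(uuid.uuid4())
        self._registrations[subscriber_id] = registration_id
        return registration_id

    def heartbeat_registration_id(self, subscriber_id: str) -> str:
        # The registration ID the statestore sends along with each heartbeat.
        return self._registrations[subscriber_id]


class Subscriber:
    def __init__(self, subscriber_id: str) -> None:
        self.subscriber_id = subscriber_id
        self.registration_id: Optional[str] = None

    def register(self, statestore: Statestore) -> None:
        self.registration_id = statestore.register(self.subscriber_id)

    def restart(self) -> None:
        # A restarted process has forgotten its previous registration.
        self.registration_id = None

    def on_heartbeat(self, registration_id: str) -> bool:
        """Accept the heartbeat only if it carries our current registration ID."""
        return self.registration_id is not None and registration_id == self.registration_id


if __name__ == "__main__":
    store = Statestore()
    sub = Subscriber("impalad-1")
    sub.register(store)
    stale_id = store.heartbeat_registration_id("impalad-1")
    print(sub.on_heartbeat(stale_id))  # True: IDs match
    sub.restart()
    # The statestore has not noticed the restart and still sends the old ID.
    print(sub.on_heartbeat(stale_id))  # False: rejected, so the statestore can mark it failed
```

Because a new registration simply overwrites the old entry, the "duplicate registration" failure from step 4 of the commit message cannot occur in this model.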