329 Commits

Author SHA1 Message Date
Lenni Kuff
9953224f68 Cleanup IMPALA_HOME/bin directory
Deleted some old files and moved some files out of the bin directory into better locations
2014-01-08 10:46:55 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
61b1be359c Add support for reporting perf result as "official" 2014-01-08 10:46:52 -08:00
Lenni Kuff
97380676d8 Fixed case sensitivity issue with exec options in the test Impala Beeswax client code
Also added support for executing run-workload in a mode that continues after query errors
2014-01-08 10:46:50 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
0098b46907 Bump Impala version to .3 and fix Impala shell version generation 2014-01-08 10:46:49 -08:00
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (e.g. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
Henry Robinson
03b9b8acb6 IMP-532: Rename state-store-service to statestored 2014-01-08 10:46:46 -08:00
Lenni Kuff
df0617bc83 Bumping version to .2 2014-01-08 10:46:43 -08:00
Lenni Kuff
d4d3dde484 Bumping version to .2 2014-01-08 10:46:42 -08:00
Henry Robinson
35e7e2a7a9 Move thirdparty library versions to environment variables 2014-01-08 10:46:38 -08:00
Lenni Kuff
8e63db4faa Add mini-impala-cluster utility for starting multiple impala backends in-process 2014-01-08 10:46:35 -08:00
Michael Ubell
8a5297a526 Add HdfsLzoTextScanner 2014-01-08 10:46:35 -08:00
Henry Robinson
5f314a4d7e Move to Postgresql for metastore 2014-01-08 10:46:34 -08:00
Alan Choi
ed0a98952a Remove -default_num_node option from start-impala-cluster.py 2014-01-08 10:46:32 -08:00
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
ishaan
ccb020c4a0 Adding copyrights to remaining files. 2014-01-08 10:46:30 -08:00
ishaan
05c65789bb Change Copyrights from 2011 to 2012 2014-01-08 10:46:29 -08:00
ishaan
c5ddba4296 run-workload + Kerberos 2014-01-08 10:46:29 -08:00
Lenni Kuff
9f91081183 Modify TPCH tests to always insert into text table so workload can run on all file formats 2014-01-08 10:46:21 -08:00
Nong Li
bc08241ffb IR cross compile fixes for inlined string-value functions. 2014-01-08 10:46:19 -08:00
Lenni Kuff
8f88f5c00e Fix formatting of float in benchmark report due to increased precision of beeswax results 2014-01-08 10:46:18 -08:00
Michael Ubell
37aaf06f79 IMP-390 Get rid of test dependencies on InProcessQE and Runquery 2014-01-08 10:46:18 -08:00
Lenni Kuff
5e91fc8ff8 Fix $TABLE table suffix replacement for workloads that don't have tables in a database 2014-01-08 10:46:17 -08:00
Lenni Kuff
b3fce13b1d Initial Impala failure testing library + modularize run-workload
This adds initial changes for the Impala failure testing library. It also refactors
run-workload into its own module so it can be used in other tests.

The failure testing has two main components. The first is an object model on top
of Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long term goal is to let this run
on both a CM cluster and non-CM cluster as well as locally.

The other part of the failure injection change is the failure_injector module, which uses the
Impala service abstraction to select and inject failures into random Impala services.

This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.

Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
2014-01-08 10:46:16 -08:00
Lenni Kuff
25edaae9d7 Enable running specific query name(s) + log exec results before completion 2014-01-08 10:46:16 -08:00
Lenni Kuff
231b66f37f A few small fixes
Queries now return rows on both our small (query test) data set as well as the 10TB
data set. This change also fixes a problem with Python not being set properly and
adds support for reporting query results using the geometric mean.

Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886
2014-01-08 10:46:15 -08:00
ishaan
e8d83beb23 Fix PYTHONPATH in impala-config.sh so we can work with beeswax etc. 2014-01-08 10:46:14 -08:00
ishaan
3ec95e3226 Enable run-workload to run with beeswax. 2014-01-08 10:46:14 -08:00
Lenni Kuff
7660daf84e Remove old CLI based on sqlline and JDBC 2014-01-08 10:46:02 -08:00
Henry Robinson
d3876a3bff Add shebang to Python file to allow packaging to complete 2014-01-08 10:46:02 -08:00
Lenni Kuff
62f9a2d534 Add additional run info to benchmark reports, plus support for saving results to perf db
This enables a more detailed summary to be generated for the benchmark runs that
include info such as the impala version, cluster name, and lab run name.  It also adds
support for storing results in a mysql database along with the schema definition
and data access module that comes along with that.

Change-Id: I948d27d2e2d633acad4730b28fbb5739527cd311
2014-01-08 10:46:01 -08:00
Lenni Kuff
0c5e41c5d5 Fix git hash / version info generation for packaging builds
The packaging builds are done in a way that doesn't allow for us to query the
git hash at the time of build. The fix is to generate a version file before
build time and use that to populate the changes. This change adds a
save-version.sh script that is run before the package build starts. This change also consolidates the version info gathering between core impala and the CLI.
2014-01-08 10:45:59 -08:00
Nong Li
eb4f11c194 Add build time to version string.
Change-Id: I8860450a2839b11e313f5ddef6cd1eddffe4795f
2014-01-08 10:45:53 -08:00
ishaan
76a4719f09 Shell enhancements: case sensitivity, help commands, interactive mode. 2014-01-08 10:45:11 -08:00
Lenni Kuff
846b5c55be Disabling running of COMPUTE STATISTICS statements by default during data loading 2014-01-08 10:45:10 -08:00
Michael Ubell
2a4ab483eb Fix building of the sasl library. 2014-01-08 10:45:10 -08:00
Michael Ubell
ad46b98366 Add Kerberos authentication. 2014-01-08 10:45:10 -08:00
Lenni Kuff
5f72b34faa Additional changes to run-workload for flexible query execution, filtering of file formats
This change cleans up run-workload to push more query execution logic into query_executor.
It also adds a new feature to run-workload to support filtering of the file format / compression
to run on.
2014-01-08 10:45:08 -08:00
Henry Robinson
6bf2b3c74e Add tarball build-step for shell, also shell version number 2014-01-08 10:45:07 -08:00
Lenni Kuff
1b4f318bf2 Update run-workload to facilitate beeswax execution and support saving of partial results
This change updates run-workload to provide a more generic interface for query
execution. Now the query executor just takes an execution function and a new
QueryExecOptions object that defines the values to use for execution.
I also made a change to store partial result sets so we can salvage some work if
a run fails.
2014-01-08 10:45:06 -08:00
Nong Li
126971edbb Update Impala to use CDH4.1 rc3. 2014-01-08 10:45:04 -08:00
Lenni Kuff
7d595ba740 Update run-workload result reporting to make reference result comparison more flexible
Now we save Hive results into a separate file (previously everything was stored
in the same file). Also added the ability to do a run-benchmark that skips
Impala, which will help generate Hive reference results.

Updated the reporting script to reflect this change.
2014-01-08 10:44:50 -08:00
Henry Robinson
825c63ad3b IMP-298: Make standalone impalad start a scheduler for a single node 2014-01-08 10:44:49 -08:00
ishaan
4c84cdae51 Handle queries with '%'; Python does not parse it properly. 2014-01-08 10:44:44 -08:00
ishaan
e84cc0a9eb Enable code coverage on release builds. 2014-01-08 10:44:41 -08:00
Lenni Kuff
58001240d5 Improve performance reporting and add support for running multiple workloads with different scale factors
This improves the summary reporting for perf results, fixes a problem with how the short query names were being
stored, and also adds support for running multiple workloads of different scale factors.
2014-01-08 10:44:41 -08:00
Lenni Kuff
aa60a59188 Add support for executing multiple workload queries in parallel
This change adds a -num_clients flag that specifies the number of clients
(threads) to use when executing each query in a workload. This is used to
validate Impala concurrency/stress. The logging was getting messed up with
multiple threads so I also updated this to use the logger module.

Currently we only capture and save the results of the first thread that
executes. In the future we might want to update this to capture results from all
the threads.
2014-01-08 10:44:40 -08:00
ishaan
25aa8cba0d Re-factor run-benchmark and add annotation for .test query files. 2014-01-08 10:44:39 -08:00
Henry Robinson
fb681fba4e Simple Python shell for Impala 2014-01-08 10:44:37 -08:00