impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 21:00:54 -05:00

Author	SHA1	Message	Date
Nong Li	b0a7c4567f	Add a few directories to .gitignore. Change-Id: Ifd81c623c69629d58e7dca6aa63c3d7117f5999e (cherry picked from commit 235d94c4edf039c6ef84f140a4c70ddd1639ba63) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1346 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-22 16:08:13 -08:00
Nong Li	8ada9b4383	Add cluster_logs/ to gitignore. Change-Id: I2957f1939355455afbd01aaaf91074ffaf25be41 Reviewed-on: http://gerrit.ent.cloudera.com:8080/450 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:44 -08:00
Lenni Kuff	2f7198292a	Add support for auxiliary workloads, tests, and datasets This change adds support for auxiliary workloads, tests, and datasets. This is useful to augment the regular test runs with some additional tests that do not belong in the main Impala repo.	2014-01-08 10:50:32 -08:00
Alex Behm	861ba05989	IMPALA-197: Outer join on constant expressions returns incorrect results.	2014-01-08 10:50:09 -08:00
Nong Li	0df9476be1	Parquet data loading.	2014-01-08 10:48:48 -08:00
Nong Li	7001fb103e	Move Impala to CDH4.2 RC2	2014-01-08 10:47:50 -08:00
Nong Li	fbfef4e22e	Fix crash in TopN node with null tuples.	2014-01-08 10:46:54 -08:00
Lenni Kuff	b3fce13b1d	Initial Impala failure testing library + modularize run-workload This adds initial changes for the Impala failure testing library. It also refactors run workload into its own module so it can be used in other tests. The failure testing has two main components - the first is an object model on top of Impala services in a cluster. This allows for enumerating the services in the cluster and executing commands on remote machines. This initial cut is built on top of the CM service to help with starting/stopping services. The long term goal is to let this run on both a CM cluster and non-CM cluster as well as locally. The other part of the failure injection change is the failure_inctor module that uses the Impala service abstraction to select and inject failures into random impala services. This failure testing framework hasn't been completely validated because the product code is not yet ready, but it is important to get this checked in so all new changes to run-workload are based off this refactor. Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5	2014-01-08 10:46:16 -08:00
Lenni Kuff	231b66f37f	A few small fixes Queries now return rows on both our small (query test) data set as well as the 10TB data set. This change also fixes a problem with python not being set properly and adds support for reporting query results using the geometric mean Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886	2014-01-08 10:46:15 -08:00
Nong Li	c5edb8e3d4	Add version file to gitignore.	2014-01-08 10:46:01 -08:00
Henry Robinson	6bf2b3c74e	Add tarball build-step for shell, also shell version number	2014-01-08 10:45:07 -08:00
Henry Robinson	9ca5c88258	.gitignore for shell/gen-py/	2014-01-08 10:44:38 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00
Lenni Kuff	e293164b37	Added TPCH functional query tests and schema generation This adds most of the Hive TPCH queries into the functional Impala tests. This code review doesn't actually include the TPCH data. The data set is relatively large. Instead I updated scripts to copy the data from a data host. This change has a few parts: 1) Update the benchmark schema generation/test vector generation to be more generic. This way we can use the same schema creation/data loading steps for TPCH as we do for benchmark tests. 2) Add in schema template for the TPCH workload along with test vectors and dimensions which are used for schema generation. 3) Add in a new test file for each TPC-H query. The Hive TPCH work broke down the queries to generate some "temp" tables, then execute using joins/selects from these temp tables. Since creating the temp tables does some real work it is good to execute these via Impala. Each test a) Runs all the Insert statements to generate the temp tables b) runs the additional TPCH queries 4) Updated all the TPCH insert statements and queries to be parameterized on $TABLE name. This way we can run the tests across all combinations of file format/compression/etc. 5) Updated data loading Change-Id: I6891acc4c7464eaf1dc7dbbb532ddbeb6c259bab	2014-01-08 10:44:06 -08:00
Lenni Kuff	0da77037e3	Updated Impala performance schema and test vector generation This change updates the Impala performance schema and test vector generation techniques. It also migrates the existing benchmark scripts that were Ruby over to use Python. The change has a few parts: 1) Conversion of test vector generation and benchmark statement generation from Ruby to Python. A result of this was also to update the benchmark test vector and dimension files to be written in CSV format (python doesn't have built-in YAML support) 2) Standardize on the naming for benchmark tables to (somewhat) match query tests. In general the form is: * If file_format=text and compression=none, do not use a table suffix * Abbreviate sequence file as (seq) rc file as (rc) etc * If using BLOCK compression don't append anything to table name, if using 'record' append 'record' 3) Created a new way to add new schemas: the benchmark_schema_template.sql file. The generate_benchmark_statements.py script reads this in and breaks up the sections. The section format is: ==== Data Set Name --- BASE table name --- CREATE STATEMENT Template --- INSERT ... SELECT * format --- LOAD Base statement --- LOAD STATEMENT Format Where BASE Table is a table the other file formats/compression types can be generated from. This would generally be a local file. The thinking is that if the files already exist in HDFS then we can just load the file directly rather than issue an INSERT ... SELECT * statement. The generate_benchmark_statements.py script has been updated to use this new template as well as query HDFS for each table to determine how it should be created. It then outputs an ideal file called load-benchmark-*-generated.sql. Since this file is generated dynamically we can remove the old benchmark statement files. 4) This has been hooked into load-benchmark-data.sh and run_query has been updated to use the new format as well	2012-07-12 23:12:20 -07:00
Lenni Kuff	462465164d	Updated .gitignore to ignore benchmark result files I accidentally missed this file in the last checkin.	2012-06-29 10:15:50 -07:00
Nong Li	f9efe06649	Move IR cross compile output to a better folder for packaging.	2012-06-01 13:14:18 -07:00
Michael Ubell	7b14187bf1	Install snappy library add create-load-data.sh	2012-05-02 07:31:10 -07:00
Nong Li	88237350f0	Change the build to allow debug and release builds to coexist.	2012-02-17 18:14:04 -08:00
Nong Li	783480d6bf	- Cleaned up some TODOs. - Fix tuple template. Fixed strcmp - atoi/atof handle overflows. - added likely/unlikely compiler directive - Runquery now reports mean/stddev for profile runs - removed quoted char	2012-01-18 23:08:29 -08:00
Nong Li	c84fec38d3	- Move thrift out of FE src and into impala/common - Thrift files now build using cmake instead of mvn - Added cmake build to impala/ which drives the build process	2011-12-30 19:35:20 -08:00
Nong Li	2880f54d35	Perf Work: - Added perf counter utility - Added google perf tools - Added html data set - Added escape char test - Initial perf tuning	2011-12-30 00:26:27 -08:00
Marcel Kornacker	0827146a2b	adding outer joins plus new tests	2011-09-28 09:02:07 -07:00
Carl Steinbach	6e2c757c5c	IMP-23. Generate Cscope index file during build	2011-09-19 11:49:27 -07:00
carl	edc3a55184	IMP-8. Update build scripts, etc., to reflect thirdparty/hadoop and thirdparty/hive Update Checkpoint	2011-07-31 17:31:09 -07:00

25 Commits