Commit Graph

104 Commits

Author SHA1 Message Date
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Lenni Kuff
7584312540 IMPALA-167: Impala should gracefully handle unsupported Hive table types 2014-01-08 10:48:56 -08:00
Skye Wanderman-Milne
811d5dd00b Create Avro schema directory in test warehouse 2014-01-08 10:48:50 -08:00
Nong Li
0df9476be1 Parquet data loading. 2014-01-08 10:48:48 -08:00
Lenni Kuff
e0a7b7cb55 Compute column stats on tables used by Planner tests 2014-01-08 10:48:48 -08:00
Nong Li
edaadc5091 Fix old rcfile table 2014-01-08 10:48:47 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
Nong Li
18dfcbd361 Fix build break from bad merge in generate-schema-statements. 2014-01-08 10:48:45 -08:00
Nong Li
6e293090e6 Parquet writer.
Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec
2014-01-08 10:48:44 -08:00
Nong Li
0385d14d69 Fix pre-hive 9 rc file scanner. 2014-01-08 10:48:41 -08:00
Lenni Kuff
328ceed4e7 Add support for generating lzo compressed text files and running tests against lzo 2014-01-08 10:48:38 -08:00
Lenni Kuff
6e1f8d178a Update utility script to compute column and table stats for given table(s) 2014-01-08 10:48:23 -08:00
Skye Wanderman-Milne
f4784947ce buildall.sh improvements: move 'set -e' higher, kill processes that are accessing postgres before creating metastore 2014-01-08 10:48:18 -08:00
Nong Li
27bc33d6b4 Fix snapshot cleanup. 2014-01-08 10:48:18 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
99bb22dcac Add db name filter to compute stats, run compute stats on functional/text tables 2014-01-08 10:48:08 -08:00
Alan Choi
73c8ee3d96 IMPALA-18 Ignore hidden files prefixed with . or _ 2014-01-08 10:48:00 -08:00
Lenni Kuff
d2e4776731 Support passing snapshot file to buildall, add script to run all tests, remove old tests 2014-01-08 10:47:59 -08:00
Nong Li
7001fb103e Move Impala to CDH4.2 RC2 2014-01-08 10:47:50 -08:00
Lenni Kuff
d5177c3c30 Update run-workload to support specifying table format test vectors from command line 2014-01-08 10:47:20 -08:00
Lenni Kuff
deea7a86b9 Remove Trevni from mixed format table and fix data loading bug 2014-01-08 10:47:10 -08:00
Lenni Kuff
12d18631e3 Test enhancements: dynamic table format data loading, per-workload exploration strategies 2014-01-08 10:47:07 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.

This includes a fix from Nong for a bug that caused wrong results for limit on
non-IO-manager formats.
2014-01-08 10:46:57 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
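To illustrate the test-vector idea described in the commit above, here is a minimal, hypothetical py.test sketch; the vector values and the test body are placeholders, not the actual Impala test framework:

# Hypothetical sketch: expand table-format / exec-option "test vectors" with
# py.test parametrization so each combination reports as its own test case.
import itertools

import pytest

TABLE_FORMATS = ["text/none", "seq/snap", "rc/gzip"]     # placeholder vector values
EXEC_OPTIONS = [{"num_nodes": 0}, {"num_nodes": 1}]      # placeholder exec options

VECTORS = list(itertools.product(TABLE_FORMATS, EXEC_OPTIONS))


@pytest.mark.parametrize("table_format,exec_options", VECTORS)
def test_core_queries(table_format, exec_options):
    # In the real framework the .test file queries would be executed against an
    # impalad here; this stand-in only shows how the combinations fan out into
    # individually reported test cases.
    assert table_format and exec_options

Running py.test against a file like this reports one pass/fail result per table format and exec option combination, which is what makes failures easy to pin down.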
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (e.g. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
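As a rough sketch of the kind of selection logic these improvements describe, with hypothetical data structures and argument names rather than the framework's actual API:

# Illustrative only: apply a table-name filter and per-table file-format
# constraints when deciding which tables to load for a given file format.
def tables_to_load(tables, name_filter=None, file_format=None, constraints=None):
    constraints = constraints or {}       # e.g. {"alltypesmixedformat": ["text"]}
    selected = []
    for name in tables:
        if name_filter and name_filter not in name:
            continue                       # excluded by the table name filter
        allowed = constraints.get(name)
        if allowed and file_format not in allowed:
            continue                       # constrained to other file formats
        selected.append(name)
    return selected


# Example: a parquet load would skip a table constrained to text.
print(tables_to_load(["alltypes", "alltypesmixedformat"],
                     file_format="parquet",
                     constraints={"alltypesmixedformat": ["text"]}))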
Henry Robinson
997df15b69 IMP-581: HBase table loading error / IMP-401: Re-enable tests for structured columns 2014-01-08 10:46:48 -08:00
Michael Ubell
8a5297a526 Add HdfsLzoTextScanner 2014-01-08 10:46:35 -08:00
Michael Ubell
85807f6169 Start a single impalad to avoid data load race 2014-01-08 10:46:18 -08:00
Michael Ubell
325a2f01ad Add refresh to load script 2014-01-08 10:46:18 -08:00
Michael Ubell
37aaf06f79 IMP-390 Get rid of test dependencies on InProcessQE and Runquery 2014-01-08 10:46:18 -08:00
Michael Ubell
0c4f025a5e Fix loading of nulltable data, remove loading functional-planner data 2014-01-08 10:45:58 -08:00
Michael Ubell
bf57ae27a5 IMP-291 Read sequence file to next sync mark on ragged columns 2014-01-08 10:45:57 -08:00
Michael Ubell
0e714f5720 Add error recovery to sequence files. 2014-01-08 10:45:07 -08:00
Henry Robinson
afc30baf52 Impalad for Trevni loading shouldn't use a state-store 2014-01-08 10:44:51 -08:00
Henry Robinson
30653278b4 run-query should not start a state-store client by default 2014-01-08 10:44:50 -08:00
Nong Li
cbb06c191c Undo load dependent tables change. 2014-01-08 10:44:39 -08:00
Nong Li
81bba16dac Parallel scanners. 2014-01-08 10:44:38 -08:00
Lenni Kuff
6e07e0b8d8 Added support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements during data loading
Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading
workflow. This allows for capturing simple table stats such as number of rows, number of
partitions, and table size in bytes. These are stored in a new MySQL database with the same
name as the metastore except with a '_Stats' suffix. If using Derby, the results are stored
in a new Derby database.
2014-01-08 10:44:34 -08:00
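A minimal sketch of generating such statements; the helper and the example database/table names are hypothetical, and only the ANALYZE TABLE ... COMPUTE STATISTICS syntax comes from the commit message:

# Hypothetical generator for Hive ANALYZE TABLE ... COMPUTE STATISTICS statements
# emitted during data loading.
def analyze_statement(db, table, partition_clause=None):
    stmt = "ANALYZE TABLE {0}.{1}".format(db, table)
    if partition_clause:
        stmt += " PARTITION ({0})".format(partition_clause)
    return stmt + " COMPUTE STATISTICS;"


# Example output for a couple of placeholder tables.
for tbl in ["alltypes", "alltypessmall"]:
    print(analyze_statement("functional", tbl))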
Alan Choi
3d9808d7a6 Upgrade HDFS past the build which contains disk id api 2014-01-08 10:44:31 -08:00
Michael Ubell
1a05c4e776 Change to use impalad for data loading. 2014-01-08 10:44:29 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the Hive
benchmark queries.

Also added support for generating schema for different scale factors, as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"

This will create tables with names that are unique from the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00
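The commit message does not spell out the table naming scheme, but a purely hypothetical illustration of folding a scale factor into table names, so that loads at different scale factors do not collide, could look like:

# Hypothetical: derive a per-scale-factor table name from a base name.
def scaled_table_name(base_name, scale_factor=None):
    if not scale_factor:
        return base_name
    return "{0}_{1}".format(base_name, scale_factor.lower())


print(scaled_table_name("lineitem", "SF3"))   # lineitem_sf3
print(scaled_table_name("lineitem"))          # lineitem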
Michael Ubell
81d54e85e5 Change trevni data loading to avoid possible connection issues. 2014-01-08 10:44:21 -08:00
Michael Ubell
ede83a84b2 Make the trevni load script exit on error. 2014-01-08 10:44:20 -08:00
Michael Ubell
bc7eff1240 Kill off planserver after loading trevni files so jenkins is happy 2014-01-08 10:44:20 -08:00
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Lenni Kuff
84d91fca4f Fix sequence file data loading for the alltypesmixedformat table
Moved this out of the data loading framework because it is kind of a special
case. I will consider how we can update the framework to address mixed format
tables.
2014-01-08 10:44:18 -08:00
Lenni Kuff
cef688d0fd IMP-95: Fix/recognize intermittent data load failures on jenkins
Builds now fail on data loading problems. Also a simple test fix.
2014-01-08 10:44:18 -08:00
Lenni Kuff
bf27a31f98 Move functional data loading to new framework + initial changes for workload directory structure
This change moves (almost) all the functional data loading to the new data
loading framework. This removes the need for the create.sql, load.sql, and
load-raw-data.sql files. Instead we just have the single schema template file:
testdata/datasets/functional/functional_schema_template.sql

This template can be used to generate the schema for all file formats and
compression variations. It also should help make loading data easier. Now you
can run:

bin/load-impala-data.sh "query-test" "exhaustive"

And get all data needed for running the query tests.

This change also includes the initial changes for the new dataset/workload directory
structure. The new structure looks like:

testdata/workload  <- Will contain query files and test vectors/dimensions

testdata/datasets <- Will contain the data files and schema templates

Note: This is the first part of the change to this directory structure - it's
not yet complete.
2014-01-08 10:44:18 -08:00
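As a sketch of what generating "all file formats and compression variations" from one template could look like; the format/codec lists and the database naming convention below are assumptions for illustration only:

# Illustrative expansion of a single schema template across file format and
# compression combinations. The real template lives in
# testdata/datasets/functional/functional_schema_template.sql.
import itertools

FILE_FORMATS = ["text", "seq", "rc"]             # placeholder formats
COMPRESSION_CODECS = ["none", "gzip", "snap"]    # placeholder codecs

TEMPLATE = "CREATE TABLE {db}.alltypes ( /* columns from the template */ );"

for fmt, codec in itertools.product(FILE_FORMATS, COMPRESSION_CODECS):
    db = "functional_{0}_{1}".format(fmt, codec)   # hypothetical naming scheme
    print(TEMPLATE.format(db=db))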
Henry Robinson
69777066df IMP-163: Fix loading table with string partition key 2014-01-08 10:44:17 -08:00