Commit Graph

297 Commits

Author SHA1 Message Date
Lenni Kuff
5f9cd044ee Add scanner test suite that runs across all file format/compression permutations 2014-01-08 10:48:25 -08:00
ishaan
5138a720bb IMP-768: Enable the python test framework to check for insert results. 2014-01-08 10:48:22 -08:00
Henry Robinson
222d15c6ca IMPALA-72: String partition keys should be URL encoded 2014-01-08 10:48:20 -08:00
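
IMPALA-72 above is about percent-encoding partition key values so that characters such as '/' cannot be confused with path separators in the partition directory name. A minimal Python sketch of the idea, not Impala's actual code (the encode_partition_value helper is hypothetical):

    from urllib.parse import quote, unquote

    def encode_partition_value(value):
        # Hypothetical helper, for illustration only: percent-encode everything
        # except unreserved characters so '/' or '=' in the value cannot be
        # mistaken for path or key/value separators.
        return quote(value, safe='')

    print(encode_partition_value("2014/01/08"))  # 2014%2F01%2F08
    print(unquote("2014%2F01%2F08"))             # 2014/01/08
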
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
d2e4776731 Support passing snapshot file to buildall, add script to run all tests, remove old tests 2014-01-08 10:47:59 -08:00
Lenni Kuff
1896701399 IMPALA-44: Database names are case sensitive 2014-01-08 10:47:34 -08:00
Lenni Kuff
9d981984e7 Update expected results of the 'show table/database' test to remove trevni tables 2014-01-08 10:47:10 -08:00
Lenni Kuff
12d18631e3 Test enhancements: dynamic table format data loading, per-workload exploration strategies 2014-01-08 10:47:07 -08:00
Lenni Kuff
c806738af2 Add scan range length tests to Python test framework 2014-01-08 10:47:06 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests are now run as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.

This includes a fix from Nong for a bug that caused wrong results for LIMIT on
non-IO-manager formats.
2014-01-08 10:46:57 -08:00
Nong Li
fbfef4e22e Fix crash in TopN node with null tuples. 2014-01-08 10:46:54 -08:00
Lenni Kuff
837f35eab3 Updated results for more query tests to reflect proper ordering + improved result updating 2014-01-08 10:46:53 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset, you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
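
A rough sketch of the parametrization idea described above, using py.test. The table formats, exec options, and fixture-free test body here are illustrative placeholders, not the framework's actual API:

    import itertools
    import pytest

    TABLE_FORMATS = ["text/none", "seq/snap", "rc/gzip"]    # placeholder values
    EXEC_OPTIONS = [{"batch_size": 0}, {"batch_size": 1}]   # placeholder values

    # Each (table format, exec options) combination becomes its own test case,
    # which makes it easy to see exactly which combination failed.
    VECTORS = list(itertools.product(TABLE_FORMATS, EXEC_OPTIONS))

    @pytest.mark.parametrize("table_format,exec_options", VECTORS)
    def test_simple_scan(table_format, exec_options):
        # The real framework runs the queries from a .test file against Impala;
        # here we only show the shape of the parametrization.
        assert table_format and isinstance(exec_options, dict)
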
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (e.g. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
Nong Li
b4dc3eeb35 Fix IMP-575 2014-01-08 10:46:45 -08:00
Nong Li
34879a4ddc Fix IMP-297 2014-01-08 10:46:44 -08:00
Nong Li
b22b565a92 Fix codegen for min/max of bool col. 2014-01-08 10:46:43 -08:00
Alan Choi
a5a9ccf8c2 IMP-550 short-circuit queries with limit 0
The Impala server examines the plan. If the first fragment's top plan node has a "limit 0",
the query is set to EOS immediately.
2014-01-08 10:46:41 -08:00
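
The limit-0 short circuit described above can be summarized in a few lines; this is a conceptual Python sketch with stand-in classes, not the actual C++ server code:

    class PlanNode:
        def __init__(self, limit=None, children=()):
            self.limit = limit
            self.children = list(children)

    def should_short_circuit(first_fragment_root):
        # If the top plan node of the first fragment carries "limit 0", the
        # query can be marked EOS without starting any fragments.
        return first_fragment_root.limit == 0

    root = PlanNode(limit=0)
    if should_short_circuit(root):
        print("query reaches EOS immediately")
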
Alan Choi
dfe7690add IMP-522 Fix null pointer exception in HBase query
The ScanNode.keyRanges is an array list that can contain null. The existing HBase scan node
did not check for that.

keyRanges contains null if
1. the row key is a string type and is referenced in the query, and
2. there is no predicate on the row key.
2014-01-08 10:46:36 -08:00
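
The fix amounts to tolerating null entries in keyRanges when building HBase scans. A conceptual Python sketch with placeholder names, not how the real scan node is written:

    # A None entry means "no bound on the row key", i.e. no row-key predicate.
    def build_scans(key_ranges):
        scans = []
        for key_range in key_ranges:
            if key_range is None:
                scans.append(("", ""))  # empty start/stop row => full key space
            else:
                scans.append((key_range.start_key, key_range.stop_key))
        return scans

    print(build_scans([None]))  # [('', '')] instead of a null pointer exception
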
Marcel Kornacker
2fda5d9b99 IMP-491
Fixes bug in Planner.createHashJoinFragment(), which didn't set the left child of the
hj node to the output of the left child fragment.

Also: the row descriptor was set incorrectly (too wide; it included tuples that weren't
materialized) for the roots of plan trees of non-root fragments if those fragments
materialized an aggregate.
2014-01-08 10:46:33 -08:00
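
Schematically, the IMP-491 fix is about wiring the hash join's left input to whatever node is now the root (output) of the left child fragment rather than to the original child node. A toy Python sketch with hypothetical classes, not the planner's real Java code:

    class PlanFragment:
        def __init__(self, root):
            self.root = root  # the plan node producing this fragment's output

    def create_hash_join_fragment(hj_node, left_fragment, right_fragment):
        # The fix: the join's left child must be the output of the left child
        # fragment, not the plan node it originally pointed at.
        hj_node.children[0] = left_fragment.root
        hj_node.children[1] = right_fragment.root
        return PlanFragment(root=hj_node)
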
Michael Ubell
0750384b41 IMP-497 Insert with limit, remove extra files from test. 2014-01-08 10:46:33 -08:00
Michael Ubell
116241f1d1 IMP-497 Insert with limit. 2014-01-08 10:46:33 -08:00
Michael Ubell
7536510b69 IMP-258 Test writing nulls. 2014-01-08 10:46:31 -08:00
Alan Choi
595edaa9d1 Disable all string to numeric and boolean implicit cast 2014-01-08 10:46:24 -08:00
Marcel Kornacker
ea050a43ad Switching over backend runtime structures to new planner.
Added container-util.h
2014-01-08 10:46:20 -08:00
Michael Ubell
477422beda IMP-380 handle '\r' at end of row. 2014-01-08 10:46:14 -08:00
Henry Robinson
3519701529 Support backtick quoting for identifiers 2014-01-08 10:46:00 -08:00
Henry Robinson
91c3b979ca IMP-370: SHOW TABLES IN support and IMP-363: SHOW DATABASES
Change-Id: Ic41c4b0767a0480f0a18e1e985f25de3bc2ca947
2014-01-08 10:45:59 -08:00
Henry Robinson
540673763f Add session key handling to ThriftServer, and session support to the frontend 2014-01-08 10:45:59 -08:00
Marcel Kornacker
5984c0be52 First cut of partitioned plan generation:
- created new class PlanFragment, which encapsulates everything having to do with a single
  plan fragment, including its partition, output exprs, destination node, etc.
- created new class DataPartition
- explicit classes for fragment and plan node ids, to avoid getting them mixed up, which is easy to do with ints
- added an IdGenerator class.
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
  PlanFragment.getExplainString()
- Changed planner interface to return scan ranges with a complete list of server locations,
  instead of making a server assignment.

Also included: cleaned up AggregateInfo:
- the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation.
- moved analysis functionality into AggregateInfo

Removing broken test cases from workload functional-planner (they're being handled correctly in functional-newplanner).
2014-01-08 10:45:56 -08:00
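
The point of the explicit id classes above is that plan-node ids and fragment ids stop being interchangeable bare ints. A small Python illustration of the pattern (the real IdGenerator and id types are Java frontend classes):

    import itertools

    class PlanNodeId(int):
        pass

    class PlanFragmentId(int):
        pass

    class IdGenerator:
        def __init__(self, id_type):
            self._id_type = id_type
            self._counter = itertools.count()

        def next_id(self):
            return self._id_type(next(self._counter))

    node_ids = IdGenerator(PlanNodeId)
    fragment_ids = IdGenerator(PlanFragmentId)
    nid, fid = node_ids.next_id(), fragment_ids.next_id()
    # Code that expects a PlanNodeId can now assert on the type instead of
    # trusting that a bare int was generated by the right counter.
    assert isinstance(nid, PlanNodeId) and not isinstance(nid, PlanFragmentId)
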
Michael Ubell
5f951ffc4a Handle missing columns at the end of a row 2014-01-08 10:45:11 -08:00
Henry Robinson
e7348a209b IMP-232: Parallel INSERT OVERWRITE 2014-01-08 10:45:04 -08:00
Henry Robinson
e3e6ba984b Show / describe 2014-01-08 10:44:49 -08:00
Alan Choi
22765fc33a IMP-251: re-enable DataErrorTest
verify that the exception message contains the correct error;
verify that the expected exception is thrown;
verify that no exception is thrown when abort_on_error is set to false
2014-01-08 10:44:45 -08:00
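
Expressed as a py.test-style sketch of the three checks above (DataErrorTest itself is a JUnit test; the impala fixture, query text, and error message below are hypothetical placeholders):

    import pytest

    def test_bad_data_aborts_with_useful_message(impala):
        with pytest.raises(Exception) as exc_info:
            impala.run_query("select * from bad_data_table", abort_on_error=True)
        # The exception message must contain the correct error text.
        assert "Error converting column" in str(exc_info.value)

    def test_bad_data_is_skipped_when_not_aborting(impala):
        # With abort_on_error=false the query must complete without raising.
        impala.run_query("select * from bad_data_table", abort_on_error=False)
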
Marcel Kornacker
7725f25ff5 This combines changes related to periodic reporting of plan fragment exec profiles:
- the executor takes a report callback, passed in by ImpalaServer::FragmentExecState
- the PlanFragmentExecutor invokes the profile reporting callback in a background thread.
- RuntimeProfile is now thread-safe and has a RuntimeProfile::Update()

Also included:
- a number of bug fixes related to async cancellation of a query
  and propagation of errors through PlanFragmentExecutor/Coordinator/ImpalaServer.
- changing COUNTER_SCOPED_TIMER to SCOPED_TIMER
- derived counters: RuntimeProfile now lets you add counters that return a
  value via a function call, which is useful for reporting something like normalized
  ScanNode throughput; retrofitted to ScanNode and all subclasses
- changed coordinator to make cancellation atomic wrt recognition of an error status
  for the overall query.
- Removed InProcessQueryExecutor from data-stream-test.

Added aggregate throughput counters to coordinator:
- all throughput counters are grouped in a sub-profile "AggregateThroughput"
- each scan node gets its own counter
- the value is aggregated across all registered backends which contain that node in
  their plan fragments
2014-01-08 10:44:42 -08:00
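
A derived counter is simply a counter whose value is computed by a function each time it is read. A Python sketch of the idea (the real RuntimeProfile is C++ and also handles thread safety, which is omitted here; the counter names are illustrative):

    class RuntimeProfileSketch:
        def __init__(self):
            self._counters = {}   # name -> stored value
            self._derived = {}    # name -> zero-argument callable

        def add_counter(self, name, value=0):
            self._counters[name] = value

        def add_derived_counter(self, name, fn):
            # The counter's value is produced by fn() at read time.
            self._derived[name] = fn

        def value(self, name):
            if name in self._derived:
                return self._derived[name]()
            return self._counters[name]

    profile = RuntimeProfileSketch()
    profile.add_counter("BytesRead", 10 * 1024 * 1024)
    profile.add_counter("ScannerWallClockNanos", 2 * 10**9)
    # Normalized throughput (bytes/sec) reported as a derived counter.
    profile.add_derived_counter(
        "ReadThroughput",
        lambda: profile.value("BytesRead") / (profile.value("ScannerWallClockNanos") / 1e9))
    print(profile.value("ReadThroughput"))  # 5242880.0
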
Nong Li
4c9c82910a Text parser fix for columns off end. 2014-01-08 10:44:40 -08:00
Nong Li
4d0319d32b Fix null string parsing. 2014-01-08 10:44:40 -08:00
Alan Choi
dd1537d116 IMP-132: collect unique agg expr 2014-01-08 10:44:39 -08:00
Nong Li
81bba16dac Parallel scanners. 2014-01-08 10:44:38 -08:00
Alan Choi
9ac664f1f7 Fix IMP-239: text_converter_->WriteSlot returns true when it's ok
QueryTest and HBaseQueryTest set AbortOnError to false except for the expected error case
2014-01-08 10:44:37 -08:00
Henry Robinson
c472213eeb Parallel INSERT, sink-per-scan-node plan 2014-01-08 10:44:35 -08:00
Alexander Behm
ee705e3083 Added timestamp arithmetic expressions. 2014-01-08 10:44:31 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and beeswax api to submit query and fetch, including
insert query.

review issue: 260
2014-01-08 10:44:30 -08:00
Alan Choi
88101bc90e This patch implements the probabilistic counting algorithm as the aggregates
"distinctpc" and "distinctpcsa".

We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400MB, ~100 columns,
int/bigint/string types.

On Hive, it took roughly 64sec.
On this Impala implementation, it took 35sec. By marking the functions in hash-util.h
inline (which we currently do not), we can achieve 24-26sec.

Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3
2014-01-08 10:44:27 -08:00
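
For context, distinctpc follows the Flajolet-Martin probabilistic-counting family. A toy single-hash Python sketch of the core idea (no stochastic averaging as in the PCSA variant, and nothing like Impala's actual C++ implementation):

    import hashlib

    def _rho(x):
        # 1-based position of the least-significant set bit of x.
        return (x & -x).bit_length()

    def estimate_distinct(values):
        bitmap = 0
        for v in values:
            h = int.from_bytes(hashlib.md5(str(v).encode()).digest()[:8], "little")
            bitmap |= 1 << (_rho(h) - 1)
        r = 0                      # index of the lowest unset bit in the bitmap
        while bitmap & (1 << r):
            r += 1
        return 2 ** r / 0.77351    # standard Flajolet-Martin correction factor

    print(round(estimate_distinct(range(10000))))  # rough, high-variance estimate
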
Alan Choi
cbadb4eac4 When a scan range begins at the starting point of a tuple, we would miss that tuple. This patch fixes
this problem.

review: 162
2014-01-08 10:44:24 -08:00
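
The underlying convention for splitting delimited text is that a scan range skips forward to the first tuple boundary after its start and finishes the last tuple it begins, so a tuple starting exactly at the range start must not be skipped. A toy Python illustration for newline-delimited data (the real scanners are C++ and handle far more cases):

    def tuples_for_range(data, start, end):
        # Yield the tuples (lines) that the byte range [start, end) is responsible for.
        if start == 0 or data[start - 1:start] == b"\n":
            pos = start                      # range begins exactly at a tuple start
        else:
            nl = data.find(b"\n", start)     # otherwise skip the partial tuple
            pos = nl + 1 if nl != -1 else len(data)
        while pos < end and pos < len(data):
            nl = data.find(b"\n", pos)
            nxt = len(data) if nl == -1 else nl + 1
            yield data[pos:nxt].rstrip(b"\n")
            pos = nxt

    data = b"aa\nbb\ncc\n"
    print(list(tuples_for_range(data, 0, 3)))  # [b'aa']
    print(list(tuples_for_range(data, 3, 6)))  # [b'bb'] -- starts exactly at a tuple
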
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is 'hive-benchmark.test', which contains the Hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00