Commit Graph

422 Commits

Author SHA1 Message Date
Henry Robinson
e3e6ba984b Show / describe 2014-01-08 10:44:49 -08:00
Alan Choi
8dae344ceb Do not validate filename in DataErrorTest because it is not deterministic. 2014-01-08 10:44:45 -08:00
Alan Choi
22765fc33a IMP-251: re-enable DataErrorTest
verify that the exception message contains the correct error;
verify that the expected exception is thrown;
verify that no exception is thrown when abort_on_error is set to false
2014-01-08 10:44:45 -08:00
Marcel Kornacker
7725f25ff5 This combines changes related to periodic reporting of plan fragment exec profiles:
- the executor takes a report callback, passed in by ImpalaServer::FragmentExecState
- the PlanFragmentExecutor invokes the profile reporting callback in a background thread.
- RuntimeProfile is now thread-safe and has a RuntimeProfile::Update()

Also included:
- a number of bug fixes related to async cancellation of queries
  and propagation of errors through PlanFragmentExecutor/Coordinator/ImpalaServer.
- changing COUNTER_SCOPED_TIMER to SCOPED_TIMER
- derived counters: RuntimeProfile now lets you add counters that return a
  value via a function call, which is useful for reporting something like normalized
  ScanNode throughput; retrofitted to ScanNode and all subclasses (sketched after this entry)
- changed coordinator to make cancellation atomic wrt recognition of an error status
  for the overall query.
- Removed InProcessQueryExecutor from data-stream-test.

Added aggregate throughput counters to coordinator:
- all throughput counters are grouped in a sub-profile "AggregateThroughput"
- each scan node gets its own counter
- the value is aggregated across all registered backends which contain that node in
  their plan fragments
2014-01-08 10:44:42 -08:00
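A minimal sketch of the derived-counter idea from the entry above: a counter whose value is computed on demand by a caller-supplied function instead of being stored directly. The class names and the normalized-throughput example are illustrative, not Impala's actual RuntimeProfile API.

#include <atomic>
#include <cstdint>
#include <functional>
#include <iostream>
#include <utility>

// Illustrative stand-ins for profile counters: a plain counter stores a
// value; a derived counter computes its value via a callback on every read.
class Counter {
 public:
  void Update(int64_t delta) { value_ += delta; }
  int64_t value() const { return value_; }
 private:
  std::atomic<int64_t> value_{0};
};

class DerivedCounter {
 public:
  explicit DerivedCounter(std::function<int64_t()> fn) : fn_(std::move(fn)) {}
  int64_t value() const { return fn_(); }  // evaluated at read time
 private:
  std::function<int64_t()> fn_;
};

int main() {
  Counter bytes_read, total_time_ns;
  bytes_read.Update(64LL * 1024 * 1024);  // 64 MB scanned
  total_time_ns.Update(2000000000LL);     // over 2 seconds

  // Normalized throughput derived from two raw counters -- the kind of value
  // the commit describes adding for ScanNode and its subclasses.
  DerivedCounter throughput([&] {
    int64_t ns = total_time_ns.value();
    return ns == 0 ? int64_t{0} : bytes_read.value() * 1000000000LL / ns;
  });
  std::cout << "throughput: " << throughput.value() << " bytes/sec" << std::endl;
  return 0;
}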
Nong Li
4c9c82910a Text parser fix for columns off end. 2014-01-08 10:44:40 -08:00
Nong Li
4d0319d32b Fix null string parsing. 2014-01-08 10:44:40 -08:00
Alan Choi
dd1537d116 IMP-132: collect unique agg expr 2014-01-08 10:44:39 -08:00
Nong Li
81bba16dac Parallel scanners. 2014-01-08 10:44:38 -08:00
Alan Choi
9ac664f1f7 Fix IMP-239: text_converter_->WriteSlot returns true on success
QueryTest and HBaseQueryTest set AbortOnError to false, except for the expected error case
2014-01-08 10:44:37 -08:00
Henry Robinson
c472213eeb Parallel INSERT, sink-per-scan-node plan 2014-01-08 10:44:35 -08:00
Nong Li
4fd7bd9606 Updated tpch core workload to include seq/snappy and seq/gzip.
Change-Id: Ifb01ee95542fced2ae8cfa4928ffbc7e357df3a8
2014-01-08 10:44:34 -08:00
Alexander Behm
ee705e3083 Added timestamp arithmetic expressions. 2014-01-08 10:44:31 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and beeswax api to submit query and fetch, including
insert query.

review issue: 260
2014-01-08 10:44:30 -08:00
Lenni Kuff
87d0ed137f Temporarily disabled TPC-H planner tests that require data to be loaded in tmp tables
I am temporarily disabling the TPC-H planner tests that require data to be
pre-loaded in temp tables. This resolves a problem where the TPC-H query tests
need to be run before the TPC-H planner tests.  I have filed "IMP-171" to track
the work to re-enable these tests.
2014-01-08 10:44:30 -08:00
Marcel Kornacker
52bd3ad173 fixing PlannerTest 2014-01-08 10:44:28 -08:00
Marcel Kornacker
04d12f03fc cleaning up logging output 2014-01-08 10:44:28 -08:00
Alan Choi
88101bc90e This patch implements the probabilistic counting algorithm as the aggregates
"distinctpc" and "distinctpcsa".

We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400mb, ~100 columns,
int/bigint/string type.

On Hive, it took roughly 64 sec.
On this Impala implementation, it took 35 sec. Marking the functions in hash-util.h
inline (which we currently don't) would bring that down to 24-26 sec.

Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3
2014-01-08 10:44:27 -08:00
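For context, a minimal sketch of probabilistic counting with stochastic averaging (the Flajolet-Martin PCSA scheme that an aggregate like "distinctpcsa" refers to); the hash and constants are textbook choices, not Impala's implementation.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Flajolet-Martin PCSA: each value hashes into one of kNumBitmaps bitmaps and
// sets the bit whose index is the rank of the lowest 1-bit of the remaining
// hash bits. The lowest unset bit of each bitmap then tracks roughly
// log2(distinct values per bitmap).
constexpr int kNumBitmaps = 64;
constexpr double kPhi = 0.77351;  // Flajolet-Martin correction constant

struct PcsaEstimator {
  std::vector<uint32_t> bitmaps = std::vector<uint32_t>(kNumBitmaps, 0);

  void Add(const std::string& v) {
    uint64_t h = std::hash<std::string>{}(v);
    int bucket = static_cast<int>(h % kNumBitmaps);
    uint64_t rest = h / kNumBitmaps;
    // Rank of the lowest set bit (GCC/Clang builtin), capped to the 32-bit bitmap.
    int r = rest == 0 ? 31 : std::min(31, __builtin_ctzll(rest));
    bitmaps[bucket] |= 1u << r;
  }

  double Estimate() const {
    double sum_r = 0;
    for (uint32_t b : bitmaps) {
      int r = 0;
      while (r < 32 && (b & (1u << r))) ++r;  // index of the lowest unset bit
      sum_r += r;
    }
    return kNumBitmaps / kPhi * std::pow(2.0, sum_r / kNumBitmaps);
  }
};

int main() {
  PcsaEstimator est;
  for (int i = 0; i < 100000; ++i) est.Add("value-" + std::to_string(i % 5000));
  std::cout << "estimated distinct: " << est.Estimate() << std::endl;  // roughly 5000
  return 0;
}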
Alan Choi
cbadb4eac4 When a scan range begins at the starting point of a tuple, we miss that tuple. This patch
fixes that problem (see the sketch below).

review: 162
2014-01-08 10:44:24 -08:00
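A minimal sketch of the boundary convention behind that fix, assuming the usual text-scanning split where each range skips a leading partial tuple and the previous range finishes it; the helper is illustrative, not the actual scanner code.

#include <cstddef>
#include <iostream>
#include <string>

// A range owns every tuple that starts within [start, end). The subtle case:
// when the byte just before `start` is the row delimiter, a tuple begins
// exactly at `start` and belongs to this range -- unconditionally skipping to
// the next delimiter is what dropped it.
size_t FirstTupleOffset(const std::string& buf, size_t start, char delim = '\n') {
  if (start == 0) return 0;                   // first range parses from the top
  if (buf[start - 1] == delim) return start;  // tuple starts exactly at the boundary
  size_t next = buf.find(delim, start);       // otherwise skip the partial tuple;
  return next == std::string::npos ? buf.size() : next + 1;  // the previous range owns it
}

int main() {
  std::string data = "row1\nrow2\nrow3\n";
  // A range starting at offset 5 begins exactly at "row2" and must not skip it.
  std::cout << FirstTupleOffset(data, 5) << std::endl;  // prints 5, not 10
  return 0;
}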
Lenni Kuff
91f51a1b39 Fixed issue with data loading of workloads that have non-word characters in their names
Fixed a problem where we were not properly looking up the dataset associated
with a given workload if its name contained characters outside the word-character set
(a-z and _), e.g. the '-' in hive-benchmark (see the regex sketch below). Also cut down
on the execution time of the hive-benchmark workload under the "core" vector.
2014-01-08 10:44:23 -08:00
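A hypothetical illustration of that lookup pitfall: a pattern restricted to word characters (a-z and _) can never match a name like hive-benchmark. The patterns here are invented for the example, not taken from the actual script.

#include <iostream>
#include <regex>
#include <string>

int main() {
  const std::string workload = "hive-benchmark";

  // Restricted to word characters, this pattern cannot match the '-' in the
  // name, so the dataset lookup silently comes up empty.
  std::regex word_only("[a-z_]+");
  // Accepting any non-whitespace run matches the workload names that actually
  // exist in the workloads directory.
  std::regex any_name("\\S+");

  std::cout << std::regex_match(workload, word_only) << std::endl;  // 0: no match
  std::cout << std::regex_match(workload, any_name) << std::endl;   // 1: match
  return 0;
}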
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.

Also added support for generating schemas for different scale factors, as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"

This creates tables with names that are unique from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run the benchmark against this scale factor:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Lenni Kuff
bf27a31f98 Move functional data loading to new framework + initial changes for workload directory structure
This change moves (almost) all the functional data loading to the new data
loading framework. This removes the need for the create.sql, load.sql, and
load-raw-data.sql files. Instead we just have the single schema template file:
testdata/datasets/functional/functional_schema_template.sql

This template can be used to generate the schema for all file formats and
compression variations. It should also help make loading data easier. Now you
can run:

bin/load-impala-data.sh "query-test" "exhaustive"

And get all data needed for running the query tests.

This change also includes the initial changes for new dataset/workload directory
structure. The new structure looks like:

testdata/workload  <- Will contain query files and test vectors/dimensions

testdata/datasets <- Will contain the data files and schema templates

Note: This is the first part of the change to this directory structure - it's
not yet complete.
2014-01-08 10:44:18 -08:00