impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 03:02:48 -05:00

Author	SHA1	Message	Date
Michael Ho	f15589573b	IMPALA-5376: Loads all TPC-DS tables This change loads the missing tables in TPC-DS. In addition, it also fixes up the loading of the partitioned table store_sales so all partitions will be loaded. The existing TPC-DS queries are also updated to use the parameters for qualification runs as noted in the TPC-DS specification. Some hard-coded partition filters were also removed. They were there due to the lack of dynamic partitioning in the past. Some missing TPC-DS queries are also added to this change, including query28 which discovered the infamous IMPALA-5251. Having all tables in TPC-DS available paves the way for us to include all supported TPCDS queries in our functional testing. Due to the change in the data, planner tests and the E2E tests have different results than before. The results of E2E tests were compared against the run done with Netezza and Vertica. The divergences were all due to the truncation behavior of decimal types in DECIMAL_V1. Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Reviewed-on: http://gerrit.cloudera.org:8080/6877 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-27 05:19:53 +00:00
Dimitris Tsirogiannis	8a49ceaae5	IMPALA-3739: Enable stress tests on Kudu This commit modifies the stress test framework to run TPC-H and TPC-DS workloads against Kudu. The following changes are included in this commit: 1. Created template files with DDL and DML statements for loading TPC-H and TPC-DS data in Kudu 2. Created a script (load-tpc-kudu.py) to load data in Kudu. The script is invoked by the stress test runner to load test data in an existing Impala/Kudu cluster (both local and CM-managed clusters are supported). 3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL files with TPC-H queries for Kudu were added in a previous patch. 4. Modified the stress test runner to take additional parameters specific to Kudu (e.g. kudu master addr) The stress test runner for Kudu was tested on EC2 clusters for both TPC-H and TPC-DS workloads. Missing functionality: * No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu. * Not all supported TPC-DS queries are included. Currently, only the TPC-DS queries from the testdata/workloads/tpcds/queries directory were modified to run against Kudu. Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34 Reviewed-on: http://gerrit.cloudera.org:8080/4327 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 11:01:37 +00:00
Ippokratis Pandis	4d5ee2b3a2	IMPALA-2364: Wrong DCHECK in PHJ::ProcessProbeBatch There was a dcheck in PHJ::ProcessProbeBatch() that was expecting that the state of PHJ was PROCESSING_PROBE. It looks like we can hit the same dcheck when we are in REPARTITIONING phase. This patch fixes this dcheck. It also adds tpc-ds q53 in the test_mem_usage_scaling test (along with the needed refactoring in this test) because tpc-ds q53 hit this dcheck in an endurance test. Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28 Reviewed-on: http://gerrit.cloudera.org:8080/893 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-09-23 10:38:58 -07:00
Skye Wanderman-Milne	d2dcda5421	Nested types: BE changes for Parquet struct support These are the backend changes necessary for reading structs in Parquet files. I wrote this against Alex's preliminary frontend work, and ad-hoc tables containing structs work. We won't be able to add automated tests until the FE changes are in as well, but I'd like to get these changes in so we can at least get coverage of our existing workloads. The bulk of the changes are in the Parquet scanner. The rest is around changing the column index of a slot descriptor to a column path, in order to support nested columns. Change-Id: Ifbd865b52c2b4679d81643184b1f36bf539ffcfd Reviewed-on: http://gerrit.cloudera.org:8080/62 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: Internal Jenkins	2015-02-26 00:19:25 +00:00
Alex Behm	f696861c5c	Throw error on unrecognized test sections. Our .test file parser used to not abort tests when there is a malformed test/section. This patch changes that behavior to report an error and treat the test as failed. Quite a few tests were not well-formed, and were not executed as a result. This patch fixes those tests. Arguably, the test file parser should be more flexible in which places to accept comments, but this patch does not address that problem. Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-12-02 18:08:09 -08:00
Martin Grund	f853eeb2f0	IMPALA-1284: Allow implicit cross joins This patch adds support for executing implicit cross joins in Impala. An implicit cross join occurs when two tables are referenced in the FROM clause of a select statement without specifying the join type and in absence of applicable equi-join predicate. To convert an implicit join into a cross join, we manually create the cross join node during the join reordering. When two sub-plans are compared to each other a Hash Join plan is always preferred. As a side effect, explicit cross joins that have equi join conjuncts are now rewritten to hash joins. This patch enables us to run TPC-DS queries Q61 and Q88 which are added as planner and query tests. Change-Id: Ifd53a78e8eb38d553eb039bfeef0216e438790ba Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4695 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: jenkins (cherry picked from commit 77ff7f09350d028be033d772c5c456ceb8828013) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5319	2014-11-20 11:25:33 -08:00
ishaan	10303ed440	Add partition filters to tpcds-q89 and re-enable tpcds-q47 This patch adds partition filters to tpcds-q89 to account for the lack of dynamic partition pruning. Additionally, it re-enables running tpcds-q47, which was blocked by IMPALA-1238. Change-Id: Ied05d80565ebb29cd06b3c38d76bd31f0285028e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4453 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-10-08 16:50:16 -07:00
ishaan	7f576dc41e	Change the result verification for tpcds-q6 to account for the order by. Change-Id: I304788ed5d5b54dc81e23ba192e322967c028c6b Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4711 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-10-08 16:48:34 -07:00
ishaan	e126a3c8b5	Enable more tpcds queries that use correlated subqueries and analytic functions. This patch only operates on queries that use store_sales as the fact table. Change-Id: I763245ef5f68bb1519bcb4d4b26ede96913a1d57 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4312 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4106	2014-09-27 01:15:41 -07:00
Nong Li	d7a9627161	Partitioned aggregation. Change-Id: Ie3acfa04e359194b5b40011107910293f1e5609e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3788 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3948	2014-08-20 03:23:15 -07:00
Taras Bobrovytsky	e3cdbf5eb9	[CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit Queries were taken from https://github.com/cloudera/impala-tpcds-kit Change-Id: Ib86606eafb8383d480af6edc62a9ac3c13b849ff Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3782 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins	2014-08-08 02:20:40 -07:00
ishaan	2b5df0c6ff	[CDH5] Convert tpch schemas to decimal and change the queries where possible. I used the following document for reference: http://www.tpc.org/tpch/spec/tpch2.1.0.pdf Change-Id: Ic84db0628323c90e89552707f214bbb9fa2f2ae0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3132 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-07-08 14:51:43 -07:00
Nong Li	11b4d85bf1	Change precision/scale truncate in decimal divide analysis. Previously, we tried to maintain as much of the scale as possible but this leads to very easy overflow cases since it requires dropping all digits before the decimal point. This patch picks a midway point. I did a little bit of research; this is close to what SQL Server does (the reference is linked in the function I changed). Change-Id: I2100beead82559ef7b017c5f335acd532076c0d4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3150 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-19 17:16:29 -07:00
ishaan	db97981ab9	[CDH5] Switch the tpcds schemas to use decimal instead of float/double. This patch converts the tpcds schemas to use decimal instead of float/double. Currently, Impala can only r/w decimal in text, therefore, the tables are constrained to text. The schemas were obtained from the official tpc spec: http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-06-08 11:47:23 -07:00
ishaan	734e720297	Fix the tpcds count queries test. Because of a malformed .test file, TPCDS-COUNT-PROMOTION was never run because of a missing section delimiter. This patch fixes the .test file and adds the delimiter. Change-Id: Ifd0fa5db1c2bb84815fc66e981e6a989e6c217e4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2017 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2080	2014-03-25 22:26:42 -07:00
Matthew Jacobs	65353fd9fb	IMPALA-598: Order by behavior for NULLs should be revisited This change modifies the behavior of NULL ordering such that nulls always compare greater than other values, but "nulls first" or "nulls last" can be used to explicitly specify if nulls should be sorted first or last regardless of the asc/desc. Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/812 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:48 -08:00
Greg Rahn	8492db3b7d	fix typo for tpcds-q3 Change-Id: Ia678957dcda6ddf261422b6c43a718f5779d3553 Reviewed-on: http://gerrit.ent.cloudera.com:8080/453 Reviewed-by: Greg Rahn <grahn@cloudera.com> Tested-by: Greg Rahn <grahn@cloudera.com>	2014-01-08 10:52:44 -08:00
ishaan	13343fb5ec	Annotate tpcds count queries. Annotation helps in easily identifying queries and searching for them in the performance database. Change-Id: I89dcfe4c2885f1d5b3d5158c026aac922ff6559d Reviewed-on: http://gerrit.ent.cloudera.com:8080/299 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:26 -08:00
Nong Li	707a566b5d	Add test to tpcds queries to validate table row counts. I tried to investigate the jenkins issue where we weren't returning any rows. I set up the cluster on that box manually and noticed there weren't any results because the store_sales table was empty. Refresh did not fix it. This looks like a data loading issue. Adding this test would make discovering issues like this much easier. Change-Id: I8ccddd43892b279d506371b9de717629815c6a08 Reviewed-on: http://gerrit.ent.cloudera.com:8080/260 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:17 -08:00
Lenni Kuff	17ed6ea177	Partition TPC-DS dataset and add additional TPC-DS workload queries Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300 Reviewed-on: http://gerrit.ent.cloudera.com:8080/99 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:13 -08:00
Nong Li	58631d9ce0	Fix parquet insert .test files.	2014-01-08 10:49:46 -08:00
Skye Wanderman-Milne	a7e15b1417	Update Parquet scanner to only scan a file if assigned the first split. Also re-enable Parquet tests.	2014-01-08 10:49:25 -08:00
Nong Li	329763e5ab	Disable parquet tests.	2014-01-08 10:49:20 -08:00
Nong Li	0df9476be1	Parquet data loading.	2014-01-08 10:48:48 -08:00
Skye Wanderman-Milne	461a48df2b	Refactor testing framework to generate Avro tables.	2014-01-08 10:48:45 -08:00
Nong Li	6e293090e6	Parquet writer. Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec	2014-01-08 10:48:44 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	e10960b2c9	Disable test execution against Trevni and replace with seq/snap format	2014-01-08 10:47:10 -08:00
Lenni Kuff	1fcf094d67	Add support for comparing query test results by column type	2014-01-08 10:47:01 -08:00
Lenni Kuff	1b248d067b	Add TPC-DS dataset and workload	2014-01-08 10:46:52 -08:00

31 Commits