31 Commits

Author SHA1 Message Date
Michael Ho
f15589573b IMPALA-5376: Loads all TPC-DS tables
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.

Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against the run done with
Netezza and Vertica. The divergences were all due to the truncation behavior
of decimal types in DECIMAL_V1.

Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-27 05:19:53 +00:00
Dimitris Tsirogiannis
8a49ceaae5 IMPALA-3739: Enable stress tests on Kudu
This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
   TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
   script is invoked by the stress test runner to load test data in an
   existing Impala/Kudu cluster (both local and CM-managed clusters are
   supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
   files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
   specific to Kudu (e.g. kudu master addr)

The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.

Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
  TPC-DS queries from the testdata/workloads/tpcds/queries directory
  were modified to run against Kudu.

Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Reviewed-on: http://gerrit.cloudera.org:8080/4327
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2016-10-21 11:01:37 +00:00
Ippokratis Pandis
4d5ee2b3a2 IMPALA-2364: Wrong DCHECK in PHJ::ProcessProbeBatch
There was a dcheck in PHJ::ProcessProbeBatch() that was expecting that
the state of PHJ was PROCESSING_PROBE. It looks like we can hit the
same dcheck when we are in REPARTITIONING phase.
This patch fixes this dcheck. It also adds tpc-ds q53 in the
test_mem_usage_scaling test (along with the needed refactoring in this
test) because tpc-ds q53 hit this dcheck in an endurance test.

Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28
Reviewed-on: http://gerrit.cloudera.org:8080/893
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 10:38:58 -07:00
Skye Wanderman-Milne
d2dcda5421 Nested types: BE changes for Parquet struct support
These are the backend changes necessary for reading structs in Parquet
files. I wrote this against Alex's preliminary frontend work, and
ad-hoc tables containing structs work. We won't be able to add
automated tests until the FE changes are in as well, but I'd like to
get these changes in so we can at least get coverage of our existing
workloads.

The bulk of the changes are in the Parquet scanner. The rest is around
changing the column index of a slot descriptor to a column path, in
order to support nested columns.

Change-Id: Ifbd865b52c2b4679d81643184b1f36bf539ffcfd
Reviewed-on: http://gerrit.cloudera.org:8080/62
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
2015-02-26 00:19:25 +00:00
Alex Behm
f696861c5c Throw error on unrecognized test sections.
Our .test file parser used to not abort tests when there
is a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.
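
The stricter behavior can be illustrated with a minimal sketch. This is not the actual Impala test file parser: the function name parse_test_text, the VALID_SECTIONS set, and the simplified "====" / "---- NAME" layout are assumptions made only for illustration.

```python
# Hypothetical set of recognized section names (an assumption for this sketch).
VALID_SECTIONS = {"QUERY", "RESULTS", "TYPES"}


def parse_test_text(text, valid_sections=VALID_SECTIONS):
    """Parse test cases separated by '====' lines.

    Each test case contains sections introduced by '---- NAME' lines.
    An unrecognized section name, or text outside any section, raises
    ValueError instead of being skipped silently.
    """
    tests = []
    current = {}
    section = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("===="):
            # Test case delimiter: close the current test case, if any.
            if current:
                tests.append(current)
            current, section = {}, None
        elif line.startswith("---- "):
            section = line[5:].strip()
            if section not in valid_sections:
                raise ValueError(
                    f"line {lineno}: unrecognized section {section!r}")
            current[section] = []
        elif section is None:
            # Blank lines between test cases are tolerated, other text is not.
            if line.strip():
                raise ValueError(
                    f"line {lineno}: text outside of any section")
        else:
            current[section].append(line)
    if current:
        tests.append(current)
    return [{name: "\n".join(body) for name, body in t.items()}
            for t in tests]
```

With this sketch, a misspelled section header fails the parse immediately rather than causing the test case to be dropped without notice.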

Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.

Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.

Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-02 18:08:09 -08:00
Martin Grund
f853eeb2f0 IMPALA-1284: Allow implicit cross joins
This patch adds support for executing implicit cross joins in Impala. An
implicit cross join occurs when two tables are referenced in the FROM
clause of a select statement without specifying the join type and
in the absence of an applicable equi-join predicate.

To convert an implicit join into a cross join, we manually create the
cross join node during the join reordering. When two sub-plans are
compared to each other a Hash Join plan is always preferred.

As a side effect, explicit cross joins that have equi join conjuncts are
now rewritten to hash joins.

This patch enables us to run TPC-DS queries Q61 and Q88 which are added
as planner and query tests.

Change-Id: Ifd53a78e8eb38d553eb039bfeef0216e438790ba
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4695
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 77ff7f09350d028be033d772c5c456ceb8828013)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5319
2014-11-20 11:25:33 -08:00
ishaan
10303ed440 Add partition filters to tpcds-q89 and re-enable tpcds-q47
This patch adds partition filters to tpcds-q89 to account for the lack of dynamic
partition pruning. It also re-enables tpcds-q47, which was blocked
by IMPALA-1238.

Change-Id: Ied05d80565ebb29cd06b3c38d76bd31f0285028e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4453
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-10-08 16:50:16 -07:00
ishaan
7f576dc41e Change the result verification for tpcds-q6 to account for the order by.
Change-Id: I304788ed5d5b54dc81e23ba192e322967c028c6b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4711
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-10-08 16:48:34 -07:00
ishaan
e126a3c8b5 Enable more tpcds queries that use correlated subqueries and analytic functions.
This patch only operates on queries that use store_sales as the fact table.

Change-Id: I763245ef5f68bb1519bcb4d4b26ede96913a1d57
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4312
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4106
2014-09-27 01:15:41 -07:00
Nong Li
d7a9627161 Partitioned aggregation.
Change-Id: Ie3acfa04e359194b5b40011107910293f1e5609e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3788
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3948
2014-08-20 03:23:15 -07:00
Taras Bobrovytsky
e3cdbf5eb9 [CDH5] Modified TPCDS schema and queries to match Impala TPCDS kit
Queries were taken from https://github.com/cloudera/impala-tpcds-kit

Change-Id: Ib86606eafb8383d480af6edc62a9ac3c13b849ff
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3782
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
2014-08-08 02:20:40 -07:00
ishaan
2b5df0c6ff [CDH5] Convert tpch schemas to decimal and change the queries where possible.
I used the following document for reference: http://www.tpc.org/tpch/spec/tpch2.1.0.pdf

Change-Id: Ic84db0628323c90e89552707f214bbb9fa2f2ae0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3132
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-07-08 14:51:43 -07:00
Nong Li
11b4d85bf1 Change precision/scale truncate in decimal divide analysis.
Previously, we tried to maintain as much of the scale as possible but
this leads to very easy overflow cases since it requires dropping all
digits before the decimal point. This patch picks a midway point.

I did a little bit of research, and this is close to what SQL Server does
(the reference is linked in the function I changed).

Change-Id: I2100beead82559ef7b017c5f335acd532076c0d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3150
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-19 17:16:29 -07:00
ishaan
db97981ab9 [CDH5] Switch the tpcds schemas to use decimal instead of float/double.
This patch converts the tpcds schemas to use decimal instead of float/double. Currently,
Impala can only r/w decimal in text, therefore, the tables are constrained to text. The
schemas were obtained from the official tpc spec:
http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf

Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-06-08 11:47:23 -07:00
ishaan
734e720297 Fix the tpcds count queries test.
TPCDS-COUNT-PROMOTION was never run because the .test file was malformed: it was
missing a section delimiter. This patch fixes the .test file by adding the delimiter.

Change-Id: Ifd0fa5db1c2bb84815fc66e981e6a989e6c217e4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2017
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2080
2014-03-25 22:26:42 -07:00
Matthew Jacobs
65353fd9fb IMPALA-598: Order by behavior for NULLs should be revisited
This change modifies the behavior of NULL ordering such that nulls always
compare greater than other values, but "nulls first" or "nulls last" can be used
to explicitly specify if nulls should be sorted first or last regardless of the
asc/desc.
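
As a rough illustration of the ordering rule described above, here is a small Python model. It is a sketch, not Impala code: the function name order_by and its parameters are hypothetical, with None standing in for NULL.

```python
def order_by(values, descending=False, nulls_first=None):
    """Sort values the way the commit message describes.

    By default NULLs (None here) compare greater than every other value,
    so they come last for ascending order and first for descending order.
    Passing nulls_first=True or nulls_first=False overrides that placement
    regardless of the sort direction.
    """
    if nulls_first is None:
        # Default: NULL is the greatest value.
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None),
                      reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls
```

For example, order_by([3, None, 1]) places the None last, while order_by([3, None, 1], descending=True, nulls_first=False) keeps it last even though the sort is descending.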

Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/812
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:53:48 -08:00
Greg Rahn
8492db3b7d fix typo for tpcds-q3
Change-Id: Ia678957dcda6ddf261422b6c43a718f5779d3553
Reviewed-on: http://gerrit.ent.cloudera.com:8080/453
Reviewed-by: Greg Rahn <grahn@cloudera.com>
Tested-by: Greg Rahn <grahn@cloudera.com>
2014-01-08 10:52:44 -08:00
ishaan
13343fb5ec Annotate tpcds count queries.
Annotation helps in easily identifying queries and searching for them in the performance
database.

Change-Id: I89dcfe4c2885f1d5b3d5158c026aac922ff6559d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/299
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:26 -08:00
Nong Li
707a566b5d Add test to tpcds queries to validate table row counts.
I tried to investigate the jenkins issue where we weren't returning any rows.
I set up the cluster on that box manually and noticed there weren't any results
because the store_sales table was empty. Refresh did not fix it. This looks like
a data loading issue. Adding this test would make discovering issues like this
much easier.

Change-Id: I8ccddd43892b279d506371b9de717629815c6a08
Reviewed-on: http://gerrit.ent.cloudera.com:8080/260
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:17 -08:00
Lenni Kuff
17ed6ea177 Partition TPC-DS dataset and add additional TPC-DS workload queries
Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300
Reviewed-on: http://gerrit.ent.cloudera.com:8080/99
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:13 -08:00
Nong Li
58631d9ce0 Fix parquet insert .test files.
2014-01-08 10:49:46 -08:00
Skye Wanderman-Milne
a7e15b1417 Update Parquet scanner to only scan a file if assigned the first split.
Also re-enable Parquet tests.
2014-01-08 10:49:25 -08:00
Nong Li
329763e5ab Disable parquet tests.
2014-01-08 10:49:20 -08:00
Nong Li
0df9476be1 Parquet data loading.
2014-01-08 10:48:48 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables.
2014-01-08 10:48:45 -08:00
Nong Li
6e293090e6 Parquet writer.
Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec
2014-01-08 10:48:44 -08:00
Lenni Kuff
328ceed4e7 Add support for generating lzo compressed text files and running tests against lzo
2014-01-08 10:48:38 -08:00
ishaan
09d6d931f4 Change the way data is loaded
2014-01-08 10:48:09 -08:00
Lenni Kuff
e10960b2c9 Disable test execution against Trevni and replace with seq/snap format
2014-01-08 10:47:10 -08:00
Lenni Kuff
1fcf094d67 Add support for comparing query test results by column type
2014-01-08 10:47:01 -08:00
Lenni Kuff
1b248d067b Add TPC-DS dataset and workload
2014-01-08 10:46:52 -08:00