This change loads the missing tables in TPC-DS. It also fixes the
loading of the partitioned table store_sales so that all partitions
are loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed; they existed only because of the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added in this change,
including query28, which uncovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPC-DS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against runs done with
Netezza and Vertica. The divergences were all due to the truncation behavior
of decimal types in DECIMAL_V1.
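As a small illustration of why truncation produces divergent results, the
sketch below reduces a value to a smaller scale once by truncating and once
by rounding. It uses Python's decimal module; the sample values and the
assumption that a reference system rounds half up are illustrative only and
are not taken from the actual test results.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_scale(value, scale, truncate):
    """Reduce `value` to `scale` fractional digits by truncating or rounding."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives an exponent of -2
    # ROUND_DOWN drops the extra digits (truncation toward zero);
    # ROUND_HALF_UP is an assumed stand-in for a system that rounds.
    mode = ROUND_DOWN if truncate else ROUND_HALF_UP
    return value.quantize(quantum, rounding=mode)


value = Decimal("2.675")
truncated = to_scale(value, 2, truncate=True)
rounded = to_scale(value, 2, truncate=False)
print(truncated, rounded)
```

The two results differ only in the last digit, which is the kind of
difference seen when comparing a truncating engine with a rounding one.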
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This commit modifies the stress test framework to run TPC-H and TPC-DS
workloads against Kudu. The following changes are included in this
commit:
1. Created template files with DDL and DML statements for loading TPC-H and
TPC-DS data in Kudu
2. Created a script (load-tpc-kudu.py) to load data in Kudu. The
script is invoked by the stress test runner to load test data in an
existing Impala/Kudu cluster (both local and CM-managed clusters are
supported).
3. Created SQL files with TPC-DS queries to be executed in Kudu. SQL
files with TPC-H queries for Kudu were added in a previous patch.
4. Modified the stress test runner to take additional parameters
specific to Kudu (e.g. the Kudu master address)
The stress test runner for Kudu was tested on EC2 clusters for both TPC-H
and TPC-DS workloads.
Missing functionality:
* No CRUD operations in the existing TPC-H/TPC-DS workloads for Kudu.
* Not all supported TPC-DS queries are included. Currently, only the
TPC-DS queries from the testdata/workloads/tpcds/queries directory
were modified to run against Kudu.
Change-Id: I3c9fc3dae24b761f031ee8e014bd611a49029d34
Reviewed-on: http://gerrit.cloudera.org:8080/4327
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
There was a DCHECK in PHJ::ProcessProbeBatch() that expected the
state of PHJ to be PROCESSING_PROBE. It looks like we can hit the
same DCHECK when we are in the REPARTITIONING phase.
This patch fixes this DCHECK. It also adds TPC-DS q53 to the
test_mem_usage_scaling test (along with the needed refactoring in this
test) because TPC-DS q53 hit this DCHECK in an endurance test.
Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28
Reviewed-on: http://gerrit.cloudera.org:8080/893
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
These are the backend changes necessary for reading structs in Parquet
files. I wrote this against Alex's preliminary frontend work, and
ad-hoc tables containing structs work. We won't be able to add
automated tests until the FE changes are in as well, but I'd like to
get these changes in so we can at least get coverage of our existing
workloads.
The bulk of the changes are in the Parquet scanner. The rest is around
changing the column index of a slot descriptor to a column path, in
order to support nested columns.
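To illustrate the difference between a column index and a column path, here
is a minimal sketch. The schema representation (a list of (name, type) pairs,
where a struct type is itself such a list) and the helper resolve_path are
hypothetical and only meant to show the idea; they are not the backend's
actual data structures.

```python
def resolve_path(columns, path):
    """Walk a list of field ordinals through a nested schema.

    `columns` is a list of (name, type) pairs; a struct type is itself a
    list of (name, type) pairs. Returns the dotted name and the leaf type.
    """
    if not path:
        raise ValueError("column path must not be empty")
    fields = columns
    names = []
    col_type = None
    for ordinal in path:
        if fields is None:
            raise ValueError("path descends into a non-struct column")
        name, col_type = fields[ordinal]
        names.append(name)
        # Only struct types (lists) can be descended into further.
        fields = col_type if isinstance(col_type, list) else None
    return ".".join(names), col_type


table = [
    ("id", "INT"),
    ("address", [("street", "STRING"), ("zip", "INT")]),
]
print(resolve_path(table, [0]))     # a top-level column: a path of length 1
print(resolve_path(table, [1, 1]))  # a field nested inside a struct
```

A single index can only name a top-level column, while a path of ordinals
can also name a field inside a struct.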
Change-Id: Ifbd865b52c2b4679d81643184b1f36bf539ffcfd
Reviewed-on: http://gerrit.cloudera.org:8080/62
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
Our .test file parser did not abort tests when it encountered
a malformed test/section. This patch changes that behavior
to report an error and treat the test as failed.
Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.
Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.
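The strict behavior can be sketched roughly as follows. This is a simplified,
hypothetical parser: the section names, the MalformedTestError class and the
exact rules are assumptions for illustration, not the real parser's code.

```python
import re

# Assumed subset of section names, for illustration only.
VALID_SECTIONS = {"QUERY", "RESULTS", "TYPES"}


class MalformedTestError(Exception):
    """Raised when a test case cannot be parsed."""


def parse_test_file(text):
    """Parse test cases separated by '====' lines into section dicts."""
    tests = []
    for block in re.split(r"^====\s*$", text, flags=re.MULTILINE):
        if not block.strip():
            continue
        sections, current = {}, None
        for line in block.strip().splitlines():
            if line.startswith("---- "):
                current = line[5:].strip()
                if current not in VALID_SECTIONS:
                    raise MalformedTestError(f"unknown section: {current}")
                sections[current] = []
            elif current is None:
                # Text before the first section header, e.g. a stray comment.
                raise MalformedTestError(f"text outside any section: {line!r}")
            else:
                sections[current].append(line)
        if "QUERY" not in sections:
            raise MalformedTestError("test has no QUERY section")
        tests.append({k: "\n".join(v) for k, v in sections.items()})
    return tests


good = "====\n---- QUERY\nselect 1\n---- RESULTS\n1\n====\n"
print(parse_test_file(good))
```

In this sketch a comment placed before the first section header raises an
error instead of causing the test to be skipped silently.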
Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This patch adds support for executing implicit cross joins in Impala. An
implicit cross join occurs when two tables are referenced in the FROM
clause of a select statement without specifying the join type and
in the absence of an applicable equi-join predicate.
To convert an implicit join into a cross join, we manually create the
cross join node during join reordering. When two sub-plans are
compared to each other, a hash join plan is always preferred.
As a side effect, explicit cross joins that have equi-join conjuncts are
now rewritten to hash joins.
This patch enables us to run TPC-DS queries Q61 and Q88 which are added
as planner and query tests.
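The choice between the two join operators can be sketched as below. The
Conjunct class and choose_join function are hypothetical and greatly
simplified; they are not the planner's actual classes and only illustrate
the rule described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    """A binary predicate between columns of two tables (simplified)."""
    left_table: str
    op: str
    right_table: str


def choose_join(lhs, rhs, conjuncts):
    """Pick the join operator for joining table refs `lhs` and `rhs`."""
    for c in conjuncts:
        joins_both = {c.left_table, c.right_table} == {lhs, rhs}
        if c.op == "=" and joins_both:
            # An applicable equi-join predicate exists, so a hash join is
            # preferred, even if the query spelled out CROSS JOIN.
            return "HASH JOIN"
    # No applicable equi-join predicate: fall back to a cross join.
    return "CROSS JOIN"


print(choose_join("a", "b", [Conjunct("a", "=", "b")]))
print(choose_join("a", "b", [Conjunct("a", "<", "b")]))
```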
Change-Id: Ifd53a78e8eb38d553eb039bfeef0216e438790ba
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4695
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 77ff7f09350d028be033d772c5c456ceb8828013)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5319
This patch adds partition filters to tpcds-q89 to account for the lack of dynamic
partition pruning. It also re-enables running tpcds-q47, which was blocked
by IMPALA-1238.
Change-Id: Ied05d80565ebb29cd06b3c38d76bd31f0285028e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4453
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Previously, we tried to maintain as much of the scale as possible, but
this leads to overflow very easily since it requires dropping all
digits before the decimal point. This patch picks a midway point.
I did a little research, and this is close to what SQL Server does
(the reference is linked in the function I changed).
Change-Id: I2100beead82559ef7b017c5f335acd532076c0d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3150
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This patch converts the tpcds schemas to use decimal instead of float/double. Currently,
Impala can only read and write decimal in text; therefore, the tables are constrained to text. The
schemas were obtained from the official tpc spec:
http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf
Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This change modifies the behavior of NULL ordering such that nulls always
compare greater than other values, but "nulls first" or "nulls last" can be used
to explicitly specify whether nulls should be sorted first or last, regardless of
asc/desc.
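The ordering rule can be sketched in a few lines. The order_by helper below
is hypothetical and operates on plain Python lists, with None standing in
for NULL; it only illustrates the semantics described above.

```python
def order_by(values, descending=False, nulls_first=None):
    """Sort values, treating None as NULL.

    By default NULL compares greater than every other value, so NULLs end
    up last for ascending order and first for descending order. Passing
    nulls_first=True or False overrides that placement regardless of the
    sort direction.
    """
    if nulls_first is None:
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1]
print(order_by(data))                                      # asc
print(order_by(data, descending=True))                     # desc
print(order_by(data, descending=True, nulls_first=False))  # desc nulls last
```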
Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/812
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
I tried to investigate the jenkins issue where we weren't returning any rows.
I set up the cluster on that box manually and noticed there weren't any results
because the store_sales table was empty. A refresh did not fix it. This looks like
a data loading issue. Adding this test would make discovering issues like this
much easier.
Change-Id: I8ccddd43892b279d506371b9de717629815c6a08
Reviewed-on: http://gerrit.ent.cloudera.com:8080/260
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>