Commit Graph

22 Commits

Taras Bobrovytsky
50e3abdc3d IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
The slot descriptor vectors are not guaranteed to be sorted on the slot
index within a tuple. As a result, TupleDescriptor::LayoutEquals()
sometimes returned a wrong result.

In this patch, we sort the vectors of slot descriptors on the slot index
within the tuple before comparing the vectors.
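
A hedged illustration (table and column names hypothetical):
LayoutEquals() feeds, for example, the union passthrough decision
(IMPALA-3586 below), so the two operands in the following query should
compare as layout-equal regardless of how the slot descriptor vectors
happen to be ordered internally:

SELECT c1, c2 FROM t
UNION ALL
SELECT c1, c2 FROM t;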

Testing:
- ran EE tests locally.

Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Reviewed-on: http://gerrit.cloudera.org:8080/6610
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-04 02:04:03 +00:00
Taras Bobrovytsky
75553165ee IMPALA-4883: Union Codegen
For each non-passthrough child of the Union node, codegen the loop that
does per-row tuple materialization.

Testing:
Ran test_queries.py locally in exhaustive mode.

Benchmark:
Ran a benchmark on a local 10 GB TPC-DS dataset against an unpartitioned
store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  194.504us  194.504us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   17.284us   17.284us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s202ms    2s934ms        3           1  115.00 KB       10.00 MB
00:UNION               3   32s514ms   34s926ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  158.373ms  216.085ms   28.80M      28.80M  489.71 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3  167.002ms  171.738ms   28.80M      28.80M  489.74 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3  125.331ms  145.496ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3  148.478ms  194.311ms   28.80M      28.80M  489.69 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3  143.995ms  162.781ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3  169.731ms  250.201ms   28.80M      28.80M  489.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3  164.110ms  254.374ms   28.80M      28.80M  489.61 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3  135.631ms  162.117ms   28.80M      28.80M  489.63 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3  138.736ms  167.778ms   28.80M      28.80M  489.67 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3  202.015ms  248.728ms   28.80M      28.80M  489.68 MB        1.88 GB  tpcds_10_parquet.store_sales

After: 20s177ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  174.617us  174.617us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   16.693us   16.693us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s830ms    3s615ms        3           1  115.00 KB       10.00 MB
00:UNION               3    4s296ms    5s258ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3    1s212ms    1s340ms   28.80M      28.80M  488.82 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3    1s387ms    1s570ms   28.80M      28.80M  489.37 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3    1s224ms    1s347ms   28.80M      28.80M  487.22 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3    1s245ms    1s321ms   28.80M      28.80M  489.25 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3    1s232ms    1s505ms   28.80M      28.80M  484.21 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3    1s348ms    1s518ms   28.80M      28.80M  488.20 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3    1s231ms    1s335ms   28.80M      28.80M  483.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3    1s179ms    1s349ms   28.80M      28.80M  482.76 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3    1s121ms    1s154ms   28.80M      28.80M  486.59 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3    1s284ms    1s523ms   28.80M      28.80M  486.70 MB        1.88 GB  tpcds_10_parquet.store_sales

Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Reviewed-on: http://gerrit.cloudera.org:8080/6459
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-21 04:53:09 +00:00
Taras Bobrovytsky
a50c344077 IMPALA-3586: Implement union passthrough
The union node acts as a passthrough operator and forwards row batches
from its children without materializing them. This is done in the case
when a child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
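
As a hedged illustration (names hypothetical), the first operand below
can be forwarded without materialization, while the second applies an
expression to its rows and therefore cannot be passed through:

SELECT c1, c2 FROM t
UNION ALL
SELECT c1, c2 + 1 FROM t;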

Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned

After:
Total Time: 19s139ms

Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Reviewed-on: http://gerrit.cloudera.org:8080/5816
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-21 22:24:01 +00:00
Alex Behm
795c085fa3 IMPALA-4336: Cast exprs after unnesting union operands.
The bug was that we cast the result exprs of operands before
unnesting them. If we unnested an operand, casts were missing
on that unnested operand's result exprs.

The fix is to first unnest operands and then cast result exprs.
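
A hedged sketch of the affected shape (column types and names
hypothetical): the inner union below is unnested into the outer one, and
the common type of the three operands is INT, so casts must be added to
the result exprs only after unnesting:

SELECT tinyint_col FROM t1
UNION ALL
(SELECT smallint_col FROM t2
 UNION ALL
 SELECT int_col FROM t3);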

Also clarifies the use of resultExprs vs. baseTblResultExprs.

Change-Id: I5e3ab7349df7d67d0d9c2baf4a56342d3f04e76d
Reviewed-on: http://gerrit.cloudera.org:8080/4815
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-11-03 08:59:45 +00:00
Michael Ho
3d7a4477ee IMPALA-2948: Fix a bug in the planner when fast partition key scan is enabled
When the query option OPTIMIZE_PARTITION_KEY_SCANS is true, we may
acquire the partition key values from the metadata and generate a
union node containing only constant expressions. There is a bug in
the planner when generating the union node: it skips evaluating
the constant expressions for unmaterialized slots, but the union node
expects an entry in the constant expression lists for each slot
in the tuple descriptor, even if the slot is not materialized.
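
As a hedged example (table and partition column hypothetical), with the
option enabled a query like the following can be answered from table
metadata alone; the planner emits a union node whose operands are
constant rows, one per distinct partition key value:

SET OPTIMIZE_PARTITION_KEY_SCANS=1;
SELECT DISTINCT year FROM part_tbl;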

This change fixes the problem by inserting dummy null values
in the constant expression list for unmaterialized slots and letting
the union node filter them out. A test is also added to verify
the fix.

Change-Id: I9ed49dca0101b96bd9b20e6d1e5b1d56f654e911
Reviewed-on: http://gerrit.cloudera.org:8080/2067
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-02-06 05:28:28 +00:00
Skye Wanderman-Milne
b6204dff59 IMPALA-1340: removing implicit casts during expr substitution is not always safe
Union statements were sometimes losing necessary casts during
expression substitution, causing the backend union node to receive
slot refs that did not have the same types as the result tuple. Add a
flag to Expr.Substitute() to preserve the root expr types, which adds
back the casts after substitution.
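
A hedged illustration of the failure shape (names hypothetical): the
first operand's result expr below is implicitly wrapped in a cast to the
common type INT; if substitution strips that cast, the union node
receives a SMALLINT slot ref for an INT result tuple slot:

SELECT smallint_col FROM t
UNION ALL
SELECT int_col FROM t;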

Currently only the union node sets this flag to true, but there may be
other places that are incorrect.

Change-Id: I1b4d9846860ef9694ff0c089f79654b1746d687d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4777
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-10-06 17:47:37 -07:00
Nong Li
fd35cee887 Reorganize/reduce end to end test time.
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.

2) Reduce the test vectors for query_tests. We should have identical coverage in
the daily exhaustive runs, but the normal runs should be much faster. In particular,
this de-emphasizes scanner tests since that code is more stable now.

3) Misc test cleanup: consolidate Python test files, etc.

Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-08-17 12:43:57 -07:00
Lenni Kuff
cd30246f17 Fix flaky group_concat() query tests
The ordering of results returned by the group_concat() tests was not deterministic.
This fixes the problem by switching the test cases to use a subquery with an ORDER BY.

Also fixed a similar problem with the limit and union tests.

Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-08-12 13:38:12 -07:00
Alex Behm
677062be3d Rework planning of unions s.t. a UnionStmt produces a single MergeNode.
This patch changes the planning of a UnionStmt so that it always produces a single
fragment with a MergeNode connecting all child fragments as its root.
The data partition of the returned fragment and how the child fragments are merged
depend on the data partitions of the child fragments:
- All child fragments are unpartitioned or partitioned: The returned fragment
  has an UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs
  the plan trees of all child fragments.
- Mixed partitioned/unpartitioned child fragments: The returned fragment is
  RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed
  into the MergeNode. All unpartitioned child fragments are connected to the
  MergeNode via a RANDOM exchange, and otherwise remain unchanged.

Also adds support for random partitioned data exchanges.
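
A hedged example of the mixed case (names hypothetical): t_part below is
a partitioned table whose scan fragments are absorbed into the
MergeNode, while the constant operand forms an unpartitioned fragment
that reaches the MergeNode through a RANDOM exchange:

SELECT c FROM t_part
UNION ALL
SELECT 1;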

Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143
2014-06-19 00:56:58 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Floats/doubles are lossy, so using them as the default literal type
is problematic.
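
A hedged illustration: 0.1 has no exact binary floating-point
representation, so typing the literal below as DOUBLE loses precision,
whereas a decimal literal keeps it exact:

SELECT 0.1 + 0.2;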

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Alex Behm
121fab8fdf IMPALA-888: Drop union operands with constant conjuncts evaluating to false.
This patch simplifies the complex slot materialization logic for unions by
making the materialization independent of conjuncts assigned to MergeNodes.
When 'pushing down' predicates into union operands, we drop union operands
with constant predicates evaluating to false. Constant predicates that
evaluate to true are simply ignored.
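
A hedged example (names hypothetical): pushing v.k = 1 into the operands
below turns the second operand's conjunct into the constant 2 = 1, which
evaluates to false, so that operand is dropped from the plan:

SELECT c FROM
  (SELECT 1 AS k, c FROM t1
   UNION ALL
   SELECT 2 AS k, c FROM t2) v
WHERE v.k = 1;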

Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-23 18:25:14 -07:00
Matthew Jacobs
8fa8a0f828 IMPALA-843: Do not close reader contexts until plan fragment close
Fixes a crash that occurs in some cases when I/O buffers are still in use and
child nodes are closed early. We close child nodes early when all rows have
been consumed and resources are transferred, but in some cases I/O buffers are
still in use when a scan node is closed. We avoid this problem by only
closing reader contexts when the entire fragment is closed.

Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859
2014-03-12 14:44:18 -07:00
Alan Choi
468ca0aa5d IMPALA-723 Fix union with aggregate
The problem is that, with a union, AggregateInfo.materializeRequiredSlots() is called
more than once. Other materializeSlots-related calls are idempotent, but this one is
not, because materializedAggregateSlots_ is an array list and we keep adding the same
duplicate value to it. We fix this by making materializeRequiredSlots() idempotent.
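
A hedged, purely hypothetical sketch of a union-with-aggregate query of
the kind this fix concerns (names invented):

SELECT count(*) FROM t1
UNION ALL
SELECT count(*) FROM t2;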

Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195
Tested-by: jenkins
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
2014-01-08 10:54:40 -08:00
Matthew Jacobs
00bc971d34 IMPALA-531: Allow arithmetic expressions for LIMIT
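
A hedged one-liner showing the new capability (names hypothetical):

SELECT c FROM t LIMIT 10 * 10;
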
Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/829
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:53:49 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alex Behm
937a44f9f8 IMPALA-68: Support Values() statement. 2014-01-08 10:50:31 -08:00
Alex Behm
5db3f2cdf5 IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive. 2014-01-08 10:49:48 -08:00
Alex Behm
1b2e8280d4 Fix NULL issues. 2014-01-08 10:49:32 -08:00
Alex Behm
0821e2f826 IMPALA-66: Support for UNION with constant SELECT clauses. 2014-01-08 10:49:18 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.

This includes a fix from Nong for a bug that caused wrong results for limit on
non-IO-manager formats.
2014-01-08 10:46:57 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset, you
know you will be able to run the "core" query tests (specified by
--exploration_strategy when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. This will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors, as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are unique across scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00