impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 12:01:11 -05:00

Author	SHA1	Message	Date
Taras Bobrovytsky	50e3abdc3d	IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals() The slot descriptor vectors are not guaranteed to be sorted on the slot index within a tuple. As a result, TupleDescriptor::LayoutEquals() sometimes returned a wrong result. In this patch, we sort the vectors of slot descriptors on the slot index within the tuple before comparing the vectors. Testing: - ran EE tests locally. Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e Reviewed-on: http://gerrit.cloudera.org:8080/6610 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-04 02:04:03 +00:00
Taras Bobrovytsky	75553165ee	IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------------ 13:AGGREGATE 1 194.504us 194.504us 1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE 1 17.284us 17.284us 3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 3 2s202ms 2s934ms 3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M 3.08 MB 0 \|--02:SCAN HDFS 3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales \|--03:SCAN HDFS 3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales \|--04:SCAN HDFS 3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales \|--05:SCAN HDFS 3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales \|--06:SCAN HDFS 3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales \|--07:SCAN HDFS 3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales \|--08:SCAN HDFS 3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales \|--09:SCAN HDFS 3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales \|--10:SCAN HDFS 3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s177ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------------ 13:AGGREGATE 1 174.617us 174.617us 1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE 1 16.693us 16.693us 3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 3 2s830ms 3s615ms 3 1 115.00 KB 10.00 MB 00:UNION 3 4s296ms 5s258ms 288.01M 288.01M 3.08 MB 0 \|--02:SCAN HDFS 3 1s212ms 1s340ms 28.80M 28.80M 488.82 MB 1.88 GB tpcds_10_parquet.store_sales \|--03:SCAN HDFS 3 1s387ms 1s570ms 28.80M 28.80M 489.37 MB 1.88 GB tpcds_10_parquet.store_sales \|--04:SCAN HDFS 3 1s224ms 1s347ms 28.80M 28.80M 487.22 MB 1.88 GB tpcds_10_parquet.store_sales \|--05:SCAN HDFS 3 1s245ms 1s321ms 28.80M 28.80M 489.25 MB 1.88 GB tpcds_10_parquet.store_sales \|--06:SCAN HDFS 3 1s232ms 1s505ms 28.80M 28.80M 484.21 MB 1.88 GB tpcds_10_parquet.store_sales \|--07:SCAN HDFS 3 1s348ms 1s518ms 28.80M 28.80M 488.20 MB 1.88 GB tpcds_10_parquet.store_sales \|--08:SCAN HDFS 3 1s231ms 1s335ms 28.80M 28.80M 483.58 MB 1.88 GB tpcds_10_parquet.store_sales \|--09:SCAN HDFS 3 1s179ms 1s349ms 28.80M 28.80M 482.76 MB 1.88 GB tpcds_10_parquet.store_sales \|--10:SCAN HDFS 3 1s121ms 1s154ms 28.80M 28.80M 486.59 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 1s284ms 1s523ms 28.80M 28.80M 486.70 MB 1.88 GB tpcds_10_parquet.store_sales Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Reviewed-on: http://gerrit.cloudera.org:8080/6459 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-21 04:53:09 +00:00
Taras Bobrovytsky	a50c344077	IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------------ 13:AGGREGATE 1 224.721us 224.721us 1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE 1 24.578us 24.578us 3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 3 2s402ms 3s060ms 3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M 3.08 MB 0 \|--02:SCAN HDFS 3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned \|--03:SCAN HDFS 3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned \|--04:SCAN HDFS 3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned \|--05:SCAN HDFS 3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned \|--06:SCAN HDFS 3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned \|--07:SCAN HDFS 3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned \|--08:SCAN HDFS 3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned \|--09:SCAN HDFS 3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned \|--10:SCAN HDFS 3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------------ 13:AGGREGATE 1 166.241us 166.241us 1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE 1 71.695us 71.695us 3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 3 2s971ms 3s809ms 3 1 3.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0 0 \|--02:SCAN HDFS 3 1s533ms 1s535ms 28.80M 28.80M 532.28 MB 1.88 GB store_sales_unpartitioned \|--03:SCAN HDFS 3 1s554ms 1s669ms 28.80M 28.80M 525.73 MB 1.88 GB store_sales_unpartitioned \|--04:SCAN HDFS 3 1s568ms 1s716ms 28.80M 28.80M 525.03 MB 1.88 GB store_sales_unpartitioned \|--05:SCAN HDFS 3 1s503ms 1s617ms 28.80M 28.80M 527.43 MB 1.88 GB store_sales_unpartitioned \|--06:SCAN HDFS 3 1s560ms 1s634ms 28.80M 28.80M 528.52 MB 1.88 GB store_sales_unpartitioned \|--07:SCAN HDFS 3 1s489ms 1s643ms 28.80M 28.80M 534.81 MB 1.88 GB store_sales_unpartitioned \|--08:SCAN HDFS 3 1s534ms 1s581ms 28.80M 28.80M 528.10 MB 1.88 GB store_sales_unpartitioned \|--09:SCAN HDFS 3 1s558ms 1s674ms 28.80M 28.80M 526.77 MB 1.88 GB store_sales_unpartitioned \|--10:SCAN HDFS 3 1s504ms 1s692ms 28.80M 28.80M 527.83 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 1s682ms 1s911ms 28.80M 28.80M 526.14 MB 1.88 GB store_sales_unpartitioned Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Reviewed-on: http://gerrit.cloudera.org:8080/5816 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-21 22:24:01 +00:00
Alex Behm	795c085fa3	IMPALA-4336: Cast exprs after unnesting union operands. The bug was that we cast the result exprs of operands before unnesting them. If we unnested an operands, casts were missing on those unnested operands' result exprs. The fix is to first unnest operands and then cast result exprs. Also clarifies the use of resultExprs vs. baseTblResultExprs. Change-Id: I5e3ab7349df7d67d0d9c2baf4a56342d3f04e76d Reviewed-on: http://gerrit.cloudera.org:8080/4815 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-03 08:59:45 +00:00
Michael Ho	3d7a4477ee	IMPALA-2948: Fix a bug in the planner when fast partition key scan is enabled When the query option OPTIMIZE_PARTITION_KEY_SCANS is true, we may acquire the partition key values from the metadata and generate a union node containing constant expressions only. There is a bug in the planner when generating the union node as it skips evaluating the constant expressions for unmaterialized slots but union node expects an entry in the constant expression lists for each slot in the tuple descriptor even if the slot is not materialized. This change fixes the problem by inserting a dummy null values in the constant expression list for unmaterialized slots and lets the union node filter them out. A test is also added to verify the fix. Change-Id: I9ed49dca0101b96bd9b20e6d1e5b1d56f654e911 Reviewed-on: http://gerrit.cloudera.org:8080/2067 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-02-06 05:28:28 +00:00
Skye Wanderman-Milne	b6204dff59	IMPALA-1340: removing implicit casts during expr substitution is not always safe Union statements were sometimes losing necessary casts during expression substitution, causing the backend union node to receive slot refs that did not have the same types as the result tuple. Add a flag to Expr.Substitute() to preserve the root expr types, which adds back the casts after substitution. Currently only the union node sets this flag to true, but there may be other places that are incorrect. Change-Id: I1b4d9846860ef9694ff0c089f79654b1746d687d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4777 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-10-06 17:47:37 -07:00
Nong Li	fd35cee887	Reorganize/reduce end to end test time. This patch does a few things: 1) Move the metadata tests into their own folder under tests/. I think it's useful to loosely categorize them so it's easier to run a subset of the tests that are most useful for the changes you are making. 2) Reduce the test vectors for query_tests. We should have identical coverage in the daily exhaustive runs but the normal runs should be much better. In particular, deemphasizing scanner tests since that code is more stable now. 3) Misc test cleanup/consolidate python test files/etc. Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-08-17 12:43:57 -07:00
Lenni Kuff	cd30246f17	Fix flaky group_concat() query tests The ordering of results returned by the group_concat() tests were not deterministic. This fixes the problem by switching the test cases to use a subquery with an order by. Also fixed a similar problem with the limit and union tests. Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-08-12 13:38:12 -07:00
Alex Behm	677062be3d	Rework planning of unions s.t. a UnionStmt produces a single MergeNode. This patch changes the planning of a UnionStmt s.t. it always produces a single fragment with a MergeNode connecting all child fragments as its root. The data partition of the returned fragment and how the child fragments are merged depends on the data partitions of the child fragments: - All child fragments are unpartitioned or partitioned: The returned fragment is has a UNPARTITIONED or RANDOM data partition, respectively. The MergeNode absorbs the plan trees of all child fragments. - Mixed partitioned/unpartitioned child fragments: The returned fragment is RANDOM partitioned. The plan trees of all partitioned child fragments are absorbed into the MergeNode. All unpartitioned child fragments are connected to the MergeNode via a RANDOM exchange, and remain unchanged otherwise. Also adds support for random partitioned data exchanges. Change-Id: I82b2d12c104d98c4e7133234653ee1b67658ef7a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2876 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3143	2014-06-19 00:56:58 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Alex Behm	121fab8fdf	IMPALA-888: Drop union operands with constant conjuncts evaluating to false. This patch simplifies the complex slot materialization logic for unions by making the materialization independent of conjuncts assigned to MergeNodes. When 'pushing down' predicates into union operands, we drop union operands with constant predicates evaluating to false. Constant predicates that evaluate to true are simply ignored. Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-04-23 18:25:14 -07:00
Matthew Jacobs	8fa8a0f828	IMPALA-843: Do not close reader contexts until plan fragment close Fixes a crash that occurs in some cases when io buffers are still used and child nodes are closed early. We close child nodes early when all rows have been consumed and resources are transfered, but in some cases io buffers are still in use when a scan node is closed. We avoid this problem by only closing reader contexts when the entire fragment is closed. Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859	2014-03-12 14:44:18 -07:00
Alan Choi	468ca0aa5d	IMPALA-723 Fix union with aggregate The problem is that with Union, AggregateInfo.materializeRequiredSlots() is being called more than once. Other "materializeSlots" related calls are idempotent, but this one is not. That's because materializedAggregateSlots_ is an array list and we keep adding the same duplicate value to the array list. We can fix it by making materializeRequiredSlots() idempotent. Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195 Tested-by: jenkins Reviewed-by: Alan Choi <alan@cloudera.com> Tested-by: Alan Choi <alan@cloudera.com>	2014-01-08 10:54:40 -08:00
Matthew Jacobs	00bc971d34	IMPALA-531: Allow arithmetic expressions for LIMIT Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/829 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:49 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Alex Behm	937a44f9f8	IMPALA-68: Support Values() statement.	2014-01-08 10:50:31 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
Alex Behm	1b2e8280d4	Fix NULL issues.	2014-01-08 10:49:32 -08:00
Alex Behm	0821e2f826	IMPALA-66: Support for UNION with constant SELECT clauses.	2014-01-08 10:49:18 -08:00
Lenni Kuff	30dbf59ef2	Final changes to enable Python test infrastructure and tests With this change the Python tests will now be called as part of buildall and the corresponding Java tests have been disabled. The new tests can also be invoked calling ./tests/run-tests.sh directly. This includes a fix from Nong that caused wrong results for limit on non-io manager formats.	2014-01-08 10:46:57 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of option, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. this will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We lookup the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/* to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with a unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

22 Commits