Commit Graph

42 Commits

Author SHA1 Message Date
Matthew Jacobs
c7fa03286b IMPALA-3718: Support subset of functional-query for Kudu
Adds initial support for the functional-query test workload
for Kudu tables.

There are a few issues that make loading the functional
schema difficult on Kudu:
 1) Kudu tables must have one or more columns that together
    constitute a unique primary key.
   a) Primary key columns must currently be the first columns
      in the table definition (KUDU-1271).
   b) Primary key columns cannot be nullable (KUDU-1570).
 2) Kudu tables must be specified with distribution
    parameters.

(1) limits the tables that can be loaded without ugly
workarounds. This patch only includes important tables that
are used for relevant tests, most notably the alltypes*
family. In particular, alltypesagg is important but it does
not have a set of columns that are non-nullable and form a unique
primary key. As a result, that table is created in Kudu with
a different name and an additional BIGINT column for a PK
that is a unique index and is generated at data loading time
using the ROW_NUMBER analytic function. A view is then
wrapped around the underlying table that matches the
alltypesagg schema exactly. When KUDU-1570 is resolved, this
can be simplified.

(2) requires some additional considerations and custom
syntax. As a result, the DDL to create the tables is
explicitly specified in CREATE_KUDU sections in the
functional_schema_constraints.csv, and an additional
DEPENDENT_LOAD_KUDU section was added to specify custom data
loading DML that differs from the existing DEPENDENT_LOAD.

TODO: IMPALA-4005: generate_schema_statements.py needs refactoring

Tests that are not relevant or not yet supported have been
marked with xfail and a skip where appropriate.

TODO: Support remaining functional tables/tests when possible.

Change-Id: Iada88e078352e4462745d9a9a1b5111260d21acc
Reviewed-on: http://gerrit.cloudera.org:8080/4175
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-09-14 22:11:04 +00:00
Michael Ho
f129dfd202 IMPALA-3018: Don't return NULL on zero length allocations.
FunctionContext::Allocate() and FunctionContext::AllocateLocal()
used to return NULL for zero length allocations. This makes
it hard to distinguish between allocation failures and zero
length allocations. Such confusion may lead to DCHECK failure
in the macro RETURN_IF_NULL() in debug builds or access to NULL
pointers in non-debug builds.

This change fixes the problem above by returning NULL only if
there is allocation failure. Zero-length allocations will always
return a dummy non-NULL pointer.

Change-Id: Id8c3211f4d9417f44b8018ccc58ae182682693da
Reviewed-on: http://gerrit.cloudera.org:8080/3601
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-07-14 19:04:45 +00:00
Michael Ho
f9232c98b0 IMPALA-3018: Fix AllocBuffer() and CopyStringVal() to handle empty strings.
AllocBuffer() and CopyStringVal() are two helper functions used by
various UDAs to allocate buffers for StringVal during their Init()
and Update() functions. Previously, these functions assumed that
the buffer length is always greater than 0. That turned out to be
an invalid assumption. This change removes this assumption and
handles zero-length StringVal by initializing its 'ptr' to NULL and
'len' to 0. A new test is also added to exercise this case.

Change-Id: Ia1e4140376c65ca3c734c40ecc3cce15b8bf2d3f
Reviewed-on: http://gerrit.cloudera.org:8080/2211
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-02-18 01:25:10 -08:00
aacalfa
57dd4d1502 IMPALA-1309: Add support for distinct in group_concat function.
Change-Id: I2790f1d2a7bfd0ecc7ef66cc5d91dafe3414e111
Reviewed-on: http://gerrit.cloudera.org:8080/892
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-09-23 09:42:17 +00:00
Alex Behm
ae9fd52c51 IMPALA-2089: Retain eq predicates bound by grouping slots with complex grouping exprs.
The bug: When enforcing slot equivalences at an aggregation node, we used to
incorrectly assume that equivalences among grouping slots must have already been
enforced below the aggregation (e.g., in a scan). This assumption is correct if the
grouping slots are produced by simple SlotRef grouping exprs, because then there is
certainly a value transfer between the grouping slot and another slot below the
aggregation. However, for grouping slots with complex grouping exprs this assumption
is not correct, and as a result, we would incorrectly remove eq predicates bound by
gropuing slots with complex grouping exprs because we assumed they were redundant.

Ths fix is to enforce slot equivalences among grouping slots with complex grouping
exprs as usual, and not assume that they have already been enforced below the agg.

Change-Id: Idcd44acccb9326a35c9121025dc88c2c70c7c7c7
Reviewed-on: http://gerrit.cloudera.org:8080/656
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-08-23 04:43:37 +00:00
Tim Armstrong
5990b43fe2 IMPALA-1898: Explicit aliases + ordinals analysis bug
Analysis errors occurred with select queries that combined ordinals
in the group by/order by clauses with select list aliases that
had the same name as a column in one of the underlying tables.

The root cause was a double substitution: e.g. the ordinal 1 in
a GROUP BY clause was replaced with the corresponding select list expression,
then a reference to column 'x' in an underlying table was replaced erroneously
with the select list expression with alias 'x'

Change-Id: I0f298290c58f18239e1ff83f0388d037c311f5fb
Reviewed-on: http://gerrit.cloudera.org:8080/542
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2015-07-22 21:23:36 +00:00
Dimitris Tsirogiannis
57132bf021 IMPALA-1535: Partition pruning with NULL
This commit fixes the issue where partition pruning returns wrong
results when a binary predicate contains a NULL literal.

Change-Id: I24c647184dcef49d12d6ff422e28667777df7784
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5443
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5549
2014-12-10 17:33:11 -08:00
Matthew Jacobs
e004307bbe IMPALA-1419, IMPALA-1542: Fix NullLiteral to reset its type in resetAnalysisState
Queries with arithmetic exprs containing a NullLiteral child failed (IMPALA-1419)
or crashed (IMPALA-1542) because re-analysis of these exprs was incorrect.

Change-Id: Ice3461aed53863123bcf8f38af123d89ad3b7d6a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5429
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-11-26 14:29:48 -08:00
Henry Robinson
080299730c IMPALA-1298: Add var_{pop,samp} as aliases for variance_{pop,samp}
Change-Id: I5880ad7ebf0775704ee7fa08685928224e316458
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4656
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-10-06 15:12:25 -07:00
Matthew Jacobs
86b9f8282f Move aggregation tests on decimal tables to decimal.test
Fixes test failures in exhaustive mode when aggregation tests
are run on table formats that do not support decimal.

Change-Id: Ic5dfb398575770cf318ffcc0ce3a20737bb2f5cd
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4636
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-10-06 15:11:58 -07:00
Matthew Jacobs
928907905b Fix appx_median to return correct result type
Change-Id: Ifc54e1069e2f7a46242229d710943e921633a920
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4625
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
2014-10-06 15:11:27 -07:00
Ippokratis Pandis
5c4486a2b2 Proper handling of NULL tuples by buffered-tuple-stream.
Adding a bitstring at the head of each block in the TupleStream that indicates which
tuples of the appended rows in the block are NULLs. When reading the stream, through
GetNext() or GetTupleRow() calls, the NULL tuples are stitched back to their correct
position.

This fixes crashes in PHJ of bushy plans with NULLs on the build side(s) as well as
similar crashes in PAGG and the analytic node.
For example, it fixes IMPALA-1204, IMPALA-1223, and IMPALA-1249.
Also, adds regression tests for IMPALA-1175, IMPALA-1204, IMPALA-1223, IMPALA-1249
and IMPALA-1306.

Change-Id: I30ad0dbd4dfeabcda8fae444d1c6ec9291f38398
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4596
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-10-06 15:10:58 -07:00
Skye Wanderman-Milne
f2b01997df Allow UDA intermediates to use CHAR. Update stddev/var to use it.
Change-Id: I791c6389978f4994cba33f01273e94343a163916
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4368
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-25 19:37:02 -07:00
Skye Wanderman-Milne
7f87e7e5b5 IMPALA-1111: Fix alignment in ReservoirSample aggregate functions
Change-Id: Iac7aa96eb19079715a7e8152a5edfeafa0d50bc7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4478
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-09-25 19:37:02 -07:00
Alex Behm
e89ba550c9 Support aggregate functions with different intermediate and output types.
As a proof-of-concept, this patch implements avg() with a STRING intermediate
type, and changes variance() to output a DOUBLE.

I tested this change on single-node and distributed plans, with the
partitioned as well as the old aggregation node.

This patch leaves several things for follow-on changes:
- plumb through CHAR as an intermediate type
- modify other builtin aggregtes to use appropriate output/intermediate types
- allow analytic functions to have different output/intermediate types

Change-Id: I8d3396201cb370f44660ab4f7fe10216129abd09
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4016
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4079
2014-08-28 21:33:34 -07:00
Lenni Kuff
cd30246f17 Fix flaky group_concat() query tests
The ordering of results returned by the  group_concat() tests were not deterministic. This
fixes the problem by switching the test cases to use a subquery with an order by.

Also fixed a similar problem with the limit and union tests.

Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-08-12 13:38:12 -07:00
Dan Hecht
285aeda16e IMPALA-1110: group_concat agg function does not work with optional separator.
Rather than omit the first separator in each intermediate result,
always include the separator, but also remember the length of the
first separator.  Then, during finalize, remove whichever separator
string ends up at the beginning of the final merged result.

Change-Id: I6de7d1cda1a43b8de7d03c6798ec9667ffa457f8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3669
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c0d7cedb79fe557e22912afc716303b24a9dad0d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3690
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
2014-07-31 18:15:16 -07:00
Matthew Jacobs
8258c53478 Disable histogram UDA test for decimal vals due to IMPALA-1111
Change-Id: I21391e671896c9ebe52fd45accc2d290267ee0ac
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3641
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5a7ede02ffda4bdbbde4bc56184639fdee0a9857)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3652
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-07-30 02:53:35 -07:00
Matthew Jacobs
b83aa4984b Add compute histograms aggregate function
Adds an aggregate function to compute equi-depth histograms. The UDA
creates a sample of the column values using weighted reservoir sampling
and computes the histogram from the sorted sample.

TODO:
* Extract highly frequent values into separate buckets (i.e. 'compressed
  histogram').
* Expose separate finalize fn to produce samples and histogram data for stats

Change-Id: I314ce5fb8c73b935c4d61ea5bbd6816c59b3b41e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3552
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c5c475712f88244e15160befaf4e99d6e165a148)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3608
2014-07-25 00:21:10 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Float/Doubles are lossy so using those as the default literal type
is problematic.

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Victor Bittorf
6f31dc7f8a Adding STDDEV builtin.
Change-Id: I79e5aee1e9e879aa2d09078ab45bc149675e1d4a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2341
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a42c375d933c0b7ffe7c9b6702777679492d7ad6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2464
2014-05-06 13:06:26 -07:00
Nong Li
457055f8f4 IMPALA-892: Fix subexpr for IR generated from compound predicate.
Change-Id: I638533827e97f3486eb75a571b18f9e8d1cd4aed
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1973
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-18 16:49:34 -07:00
Nong Li
5022aa08fb IMPALA-869: Fix result initialization for MIN().
Change-Id: I50eceb04c0eb1c9eedb9c963cb75d2fc5aeb4825
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1847
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-11 17:26:31 -07:00
Nong Li
f6de8d9e30 IMPALA-765: Fix subexpr elimination codegen optimization.
The previous implementation did not properly handle replacing the is_null
return argument from expr calls.

Change-Id: I96cd0dfca8876b4f914b0cbc4eb459ea3dcdf230
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1795
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-10 15:20:53 -07:00
Alex Behm
a615ebc549 IMPALA-822,IMP-1271: Binding predicates on an aggregation now properly trigger slot materialization.
The bug was that the number of materialized agg-tuple slots did not correspond to the number
of materialized agg functions, due to binding predicates against an AggNode causing slot
materialization after SelectStmt.materializeRequiredSlots().
This patch fixes the issue by taking binding predicates (bound to a slot in an agg tuple)
into consideration in SelectStmt.materializeRequiredSlots().
I added a new sanity check in AggregationNode.toThrift() surfaced another issue with slot
materialization that is also fixed in this patch. The ordering exprs must be marked before
the agg exprs in SelectStmt.materializeRequiredSlots() because the odering exprs may contain
agg exprs that are only referenced inside the ORDER BY clause.

Change-Id: I1bdc0466f583907bed625ce6608938e59faee83f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1639
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1818
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-08 00:25:26 -08:00
Alex Behm
cb8150e8ee IMPALA-817: Check equality of function name in Function.equals().
Change-Id: Ib9b4ee3a21f90fdb0d7ebccd89462dc67040bd1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1594
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1611
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-02-19 17:13:51 -08:00
Nong Li
0d2919fe7f Refactor scalar and aggregate function analysis and execution.
This patch cleans up analysis and execution of scalar and aggregate functions
so that there is no difference between how builtins and user functions are
handled. The only difference is that the catalog is populated with the builtins
all the time.

The BE always gets a TFunction object and just executes it (builtins will have
an empty hdfs file location).

This removes the opcode registry and all of the functionality is subsumed by
the catalog, most of which was already duplicated there anyway.

This also introduces the concept of a system database; databases that the
user cannot modify and is populated automatically on startup.

Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577
2014-02-18 18:40:08 -08:00
Nong Li
15db34e356 AggregationNode refactoring
This patch redoes how the aggregation node is implemented. The functionality is
now split between aggregation-node, agg-expr and aggregate-functions. This is a working
progress (there's still a lot of debug stuff I added that needs to be cleaned up) but
it does pass the tests.

Aggregation-node is now very simple and now only deals with the grouping part.
Aggregate-expr serves as the glue between the agg node and the aggregate functions.
The aggregation functions are implemented with the UDA interface. I've reimplemented
our existing aggregate functions with this setup. For true UDAs, the binaries would be
loaded in aggregate-expr.

This also includes some preliminary changes in the FE. We now need to annotate each
AggNode as executing the update vs. merge phase (root aggs execute update, others
execute merge) and if it needs a finalize step (only the root does). This is more
general than our builtins which are too simple to need this structure.

There is a big TODO here to allow the intermediate types between agg nodes to change.
For example, in distinct estimate, the input type is the column type and the output type
is a bigint. We'd like the intermediate type to be CHAR(256). This is different since
currently, the intermediate type and output type have always been the same. We've hacked
around this by having both the intermediate and output type be TYPE_STRING. I've left
this for another patch (changing the BE to support this is trivial).
For aggregates that result in strings, we used to store some additional stuff past the
end of the tuple. The layout was:
<tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc

The rationale for this is that we want to reuse the buffer for min/max and grow the buffer
more quickly for group_concat. This breaks down the abstraction between agg-expr and
agg-node and is not something UDAs can use in general. Rather than try to hack around
this, I think the proper solution is to the intermediate type not be StringValue and
to contain the buffer length itself.

This patch also resurrects the distinct estimate code. The distinct estimate functions
exercise all of the code paths.

Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346
Reviewed-on: http://gerrit.ent.cloudera.com:8080/564
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
Aaron Davidson
00275ce3a9 (IMPALA-422) Add string concatenation function
Implements a group_concat() function which concatenates all the values in a group together.

The format is group_concat(str_col, [separator]). The default separator is ', '. NULLs
are ignored.

Change-Id: If152df6f528401117dba81d66ef691bfb548cc7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/117
Reviewed-by: Aaron Davidson <aaron.davidson@cloudera.com>
Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>
2014-01-08 10:52:21 -08:00
Alex Behm
c9040aee22 IMPALA-111: COUNT(DISTINCT col) returns wrong results -- does not ignore NULLs. 2014-01-08 10:50:09 -08:00
Alex Behm
1b2e8280d4 Fix NULL issues. 2014-01-08 10:49:32 -08:00
Henry Robinson
8d87972695 Improve parser coverage
This patch adds support for the following SQL constructs

  - Unary + operator
  - The ALL keyword, in SELECT ALL and SELECT aggregate_func(ALL *)
  - REAL and INTEGER as type synonyms for DOUBLE and INT respectively
  - The AS keyword after a table spec. e.g. SELECT * FROM tbl AS t0
2014-01-08 10:48:54 -08:00
Alexander Behm
39e443407b IMPALA-136: GROUP BY float/double. 2014-01-08 10:48:43 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked calling ./tests/run-tests.sh directly.

This includes a fix from Nong that caused wrong results for limit on non-io
manager formats.
2014-01-08 10:46:57 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
option, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. this will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Nong Li
b22b565a92 Fix codegen for min/max of bool col. 2014-01-08 10:46:43 -08:00
Marcel Kornacker
ea050a43ad Switching over backend runtime structures to new planner.
Added container-util.h
2014-01-08 10:46:20 -08:00
Henry Robinson
3519701529 Support backtick quoting for identifiers 2014-01-08 10:46:00 -08:00
Alan Choi
88101bc90e This patch implements the probabilistic counting algorithm as an aggregate
"distinctpc" and "distinctpcsa".

We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400mb, ~100 columns,
int/bigint/string type.

On Hive, it took roughly 64sec.
On this Impala implementation, it took 35sec. By adding inline to hash-util.h (which we don't),
 we can achieve 24~26sec.

Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3
2014-01-08 10:44:27 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We lookup the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/* to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF1". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with a unique names from the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.sh -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00