Commit Graph

20 Commits

Author SHA1 Message Date
Taras Bobrovytsky
7faaa65996 Added order by query tests
- Added static order by tests to test_queries.py and QueryTest/sort.test
- test_order_by.py also contains tests with static queries that are run with
  multiple memory limits.
- Added stress, scratch disk and failpoints tests
- Incorporated Srinath's change that copied all order by with limit tests into
  the top-n.test file

Extra time required:

Serial:
scratch disk: 42 seconds
test queries sort : 77 seconds
test sort: 56 seconds
sort stress: 142 seconds
TOTAL: 5 min 17 seconds

Parallel(8 threads):
scratch disk: 40 seconds
test queries sort: 42 seconds
test sort: 49 seconds
sort stress: 93 seconds
TOTAL: 3 min 44 sec

Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205
2014-06-20 13:35:10 -07:00
Matthew Jacobs
19f34d9187 test_data_source_tables should only run for 'text/none'
Change-Id: I784fb4305f8cff92c2582b0a7008f836c7aa9fa4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2504
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9f3621ec5d270c60258e93e8a2a596329c31f4e6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2508
2014-05-09 19:32:18 -07:00
Matthew Jacobs
0c533bb152 External Data Source: Backend changes
Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-09 02:24:41 -07:00
Skye Wanderman-Milne
e60bf29a96 IMPALA-13: Use SSE string functions that take an explicit length
This patch modifies DelimitedTextParser and StringValue to work with
data containing null characters by using SSE instructions that take a
length, rather than expecting null-terminated strings. It also adds
some other minor changes to correctly handle data with nulls and to
faciliate testing. I checked the execution time of a count(*) and a
select(*) limit 1 query locally, and saw no difference for either text
or sequence files.

Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181
2014-04-11 11:16:24 -07:00
Henry Robinson
415540d789 IMPALA-901: Fix grouping with NULLs when codegen is disabled
The standard implementation of HashTable::Equals() did not correctly
check the NULL bit when the argument row did not evaluate to NULL for a
given probe expr. In the rare circumstance that this gave rise to a
false positive (more on that below), two rows with different grouping
values would be considered equal, and one would be excluded from the
final aggregation output.

HashTable::EvalRow() fills an expression value buffer with the values of
either probe or build exprs evaluated for the argument row. These cached
values are used to determine row equality in Equals(). In order to avoid
a lot of false collisions, an 'unlikely' value is written to that buffer
for NULL values, chosen to be HashUtil::FNV_SEED. So without correct
NULL-bit checking in Equals(), two single-slot rows are considered to be
equal if one of them has NULL for its slot, and the other has a value
equal to HashUtil::FNV_SEED truncated to the size of the slot.

For tinyint columns, this value is -59. As it happens, our random
generator happened to create a table with one tinyint column and which
contained NULL and -59 as values. In order to trigger this bug, the rows
must also have been written to disk in order such that the scanners
returned -59 *first*, and then NULL to the aggregation node; the bug is
not symmetric and works in the opposite case.

Change-Id: I17d43eaeee62b2ac01b67dd599bc4346b012a074
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2130
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e8098254280a9d5ead0b607263ca6728a3222a7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2161
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-07 17:30:52 -07:00
Lenni Kuff
0ce83818a6 IMPALA-675: Add function to get current default database
This uses the same syntax as postgres: current_database()

Change-Id: Ic6ca8ce1fe8c10a496800c45c58b0df4d214b51c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1274
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-13 22:27:22 -08:00
Nong Li
15db34e356 AggregationNode refactoring
This patch redoes how the aggregation node is implemented. The functionality is
now split between aggregation-node, agg-expr and aggregate-functions. This is a working
progress (there's still a lot of debug stuff I added that needs to be cleaned up) but
it does pass the tests.

Aggregation-node is now very simple and now only deals with the grouping part.
Aggregate-expr serves as the glue between the agg node and the aggregate functions.
The aggregation functions are implemented with the UDA interface. I've reimplemented
our existing aggregate functions with this setup. For true UDAs, the binaries would be
loaded in aggregate-expr.

This also includes some preliminary changes in the FE. We now need to annotate each
AggNode as executing the update vs. merge phase (root aggs execute update, others
execute merge) and if it needs a finalize step (only the root does). This is more
general than our builtins which are too simple to need this structure.

There is a big TODO here to allow the intermediate types between agg nodes to change.
For example, in distinct estimate, the input type is the column type and the output type
is a bigint. We'd like the intermediate type to be CHAR(256). This is different since
currently, the intermediate type and output type have always been the same. We've hacked
around this by having both the intermediate and output type be TYPE_STRING. I've left
this for another patch (changing the BE to support this is trivial).
For aggregates that result in strings, we used to store some additional stuff past the
end of the tuple. The layout was:
<tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc

The rationale for this is that we want to reuse the buffer for min/max and grow the buffer
more quickly for group_concat. This breaks down the abstraction between agg-expr and
agg-node and is not something UDAs can use in general. Rather than try to hack around
this, I think the proper solution is to the intermediate type not be StringValue and
to contain the buffer length itself.

This patch also resurrects the distinct estimate code. The distinct estimate functions
exercise all of the code paths.

Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346
Reviewed-on: http://gerrit.ent.cloudera.com:8080/564
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Alex Behm
045038e479 IMPALA-374: Added WITH clause without recursion. 2014-01-08 10:51:00 -08:00
Alex Behm
937a44f9f8 IMPALA-68: Support Values() statement. 2014-01-08 10:50:31 -08:00
Lenni Kuff
c74b7e41dd Enable insert tests to run against parquet 2014-01-08 10:49:47 -08:00
Nong Li
1f6481382e Fix parquet test setup. 2014-01-08 10:49:41 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
Marcel Kornacker
c02d25baa8 IMPALA-20: Limit clause in inline view not handled correctly by planner
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
  by the merging ExchangeNode
- all limits w/ Order By are now distributed: enforced both by the child plan fragment and
  by the merging TopN node
2014-01-08 10:48:29 -08:00
Lenni Kuff
deea7a86b9 Remove Trevni from mixed format table and fix data loading bug 2014-01-08 10:47:10 -08:00
Lenni Kuff
12d18631e3 Test enhancements: dynamic table format data loading, per-workload exploration stategies 2014-01-08 10:47:07 -08:00
Lenni Kuff
e20720f5d3 Disable loading Trevni for load/select statements that are not union compatible 2014-01-08 10:46:58 -08:00
Lenni Kuff
30dbf59ef2 Final changes to enable Python test infrastructure and tests
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked calling ./tests/run-tests.sh directly.

This includes a fix from Nong that caused wrong results for limit on non-io
manager formats.
2014-01-08 10:46:57 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
option, I decided to go with a python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means if load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).

You will see that now each combination of table format + query exec options is
treated like an individual test case. this will make it much easier to debug
exactly where something failed.

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00