The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887); an example query is sketched below.
3. Modified all the tests that are affected by the change in alltypesagg.
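As a rough sketch of the kind of query the new tests cover (the commit does not
name the added partition key, so day below stands in for the new nullable
partition column):
-- Partitions whose key fails the predicate should be pruned; the partition
-- with a NULL key is dropped by the binary predicate but kept by IS NULL.
SELECT count(*) FROM alltypesagg WHERE day = 1;
SELECT count(*) FROM alltypesagg WHERE day IS NULL;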
Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
Floats/doubles are lossy, so using them as the default literal type
is problematic.
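For intuition (this example is not from the patch; it only assumes the numeric
literals are typed as DOUBLE):
-- 0.1 and 0.2 have no exact binary double representation, so the sum is not
-- exactly 0.3 and the comparison evaluates to false.
SELECT 0.1 + 0.2 = 0.3;
-- Printed as something like 0.30000000000000004 rather than 0.3.
SELECT 0.1 + 0.2;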
Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
This change set adds support for dealing with custom date/time formats in Impala. The following date/time tokens are supported:
y – Year
M – Month
d – Day
H – Hour
m – Minute
s – Second
S – Fractional second
The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows repeating tokens to indicate zero padding in an output scenario (TS -> String) and to act as a guide for reading data of a given length in a parsing scenario. Representing literal month names is achieved by specifying three repeating tokens, e.g. yyyy-MMM-dd -> 2013-Nov-21.
Formatting character groups can appear in any order along with any separators e.g.
yyyy/MM/dd
dd-MMM-yy
(dd)(MM)(yyyy) HH:mm:ss
..etc..
The following features are not supported with this patch:
- Long literal months e.g. MMMM
- Nested strings e.g. "Year: " yyyy "Month: " MM "Day: " dd
- Lazy formatting
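A usage sketch follows; the commit does not name the functions these format
strings feed into, so unix_timestamp() and from_unixtime() are shown here as
an assumption:
-- Parsing scenario (String -> TS): a literal month name via three repeated tokens.
SELECT unix_timestamp('2013-Nov-21', 'yyyy-MMM-dd');
-- Output scenario (TS -> String): repeated tokens give zero-padded output fields.
SELECT from_unixtime(1385035200, 'yyyy/MM/dd HH:mm:ss');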
Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001
Reviewed-by: Christopher Channing <cchanning@cloudera.com>
Tested-by: Christopher Channing <cchanning@cloudera.com>
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly
Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
did not catch that they were incorrect
Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
This patch adds support for the following SQL constructs (a combined example follows the list):
- Unary + operator
- The ALL keyword, in SELECT ALL and SELECT aggregate_func(ALL *)
- REAL and INTEGER as type synonyms for DOUBLE and INT respectively
- The AS keyword after a table spec. e.g. SELECT * FROM tbl AS t0
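In the sketch below, some_tbl and int_col are placeholders, and whether the
REAL/INTEGER synonyms are also accepted inside CAST is an assumption:
-- Unary +, SELECT ALL, type synonyms, and AS after a table spec.
SELECT ALL +t0.int_col,
           CAST(t0.int_col AS REAL),
           CAST(t0.int_col AS INTEGER)
FROM some_tbl AS t0;
-- The ALL keyword inside an aggregate.
SELECT count(ALL *) FROM some_tbl AS t0;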
This is the first set of changes required to start getting our functional test
infrastructure moved from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset you know you
will be able to run the "core" query tests (specified by --exploration_strategy
when running the tests).
You will see that now each combination of table format + query exec options is
treated as an individual test case. This will make it much easier to debug
exactly where something failed.
These new tests can be run using the script at tests/run-tests.sh
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:
./run-benchmark --workloads=hive-benchmark,tpch
We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.
To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.
Also added support for generating schemas for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:
./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are distinct from those of the other scale factors.
Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.py -w tpch -e core -s SF3
Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3
This changeset also includes a few other minor tweaks to some of the test
scripts.
Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6