
34 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
9d6586cdb8 Addendum to IMPALA-1755 patch
This patch introduces SetLookup functionality for timestamp and
decimal types, and also addresses the remaining code review comments.
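
A set-based lookup for an IN predicate over constant values can be sketched as follows. This is a hypothetical Python illustration of the idea, not Impala's C++ SetLookup: the class name InPredicate and its methods are assumptions. The constants are hashed once up front, and SQL NULL semantics are kept (a NULL input, or a miss against a list containing NULL, yields NULL).

```python
from datetime import datetime
from decimal import Decimal


class InPredicate:
    """Hypothetical sketch of `val IN (c1, c2, ...)` over constant values.

    With set lookup, the constants are hashed once up front, so each row
    costs one hash probe instead of a scan over the whole list.
    """

    def __init__(self, constants, use_set_lookup=True):
        self.has_null = any(c is None for c in constants)
        self.values = [c for c in constants if c is not None]
        self.use_set_lookup = use_set_lookup
        # Built once, at "prepare" time, not per evaluated row.
        self.value_set = frozenset(self.values) if use_set_lookup else None

    def eval(self, val):
        # SQL three-valued logic: NULL IN (...) is NULL.
        if val is None:
            return None
        if self.use_set_lookup:
            found = val in self.value_set
        else:
            found = any(val == c for c in self.values)
        if found:
            return True
        # No match, but a NULL in the list makes the result unknown.
        return None if self.has_null else False


# Decimal and timestamp values hash consistently with their equality,
# so Decimal("1.50") finds Decimal("1.5") in the set.
dec_pred = InPredicate([Decimal("1.5"), Decimal("2.25")])
ts_pred = InPredicate([datetime(2015, 3, 20, 14, 37), None])
```

The linear path (use_set_lookup=False) gives the same answers; the set only changes the per-row cost.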

Change-Id: Ied40d2d55adbdea891ff2ab97b30f0d3986645f9
Reviewed-on: http://gerrit.cloudera.org:8080/245
Tested-by: Internal Jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2015-03-20 14:37:23 -07:00
Skye Wanderman-Milne
5118c55a0a IMPALA-1810: IN predicate was not comparing DecimalVals correctly
The IN predicate wasn't using the decimal type when comparing decimal
values. I benchmarked this on a modified version of TPCDS-Q8 (i.e. a
query with a huge decimal IN predicate) and there is a ~5% performance
degradation with codegen enabled (surprisingly, there appears to be a
slight performance gain with codegen disabled). We should be able to
remove this penalty when we add constant injection via codegen.
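
Why the decimal type matters when comparing decimal values can be sketched with unscaled integers. The sketch assumes a representation where a decimal is an unscaled integer plus a (precision, scale) type. The names DecimalType, decimal_equals and decimal_in are hypothetical, and the exact failure fixed by this patch may differ from this one. The point is that the integer 150 means 1.50 at scale 2 but 15.0 at scale 1, so the raw integers alone cannot be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


def decimal_equals(a, a_type, b, b_type):
    """Compare two unscaled decimal integers using the scale from each type."""
    # Rescale both values to the larger scale before comparing.
    target = max(a_type.scale, b_type.scale)
    return a * 10 ** (target - a_type.scale) == b * 10 ** (target - b_type.scale)


def decimal_in(val, val_type, candidates):
    """`val IN (...)` where each candidate is an (unscaled, DecimalType) pair."""
    return any(decimal_equals(val, val_type, c, c_type) for c, c_type in candidates)


# 1.50 stored as DECIMAL(3,2) against 1.5 stored as DECIMAL(2,1) and 3.00.
candidates = [(15, DecimalType(2, 1)), (300, DecimalType(3, 2))]
```

A naive comparison of the unscaled integers (150 against 15) misses the match that decimal_in finds.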

Change-Id: Ie1296fd50c68d06a343701442da49fe8d3cd16dd
Reviewed-on: http://gerrit.cloudera.org:8080/230
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
2015-03-20 14:37:18 -07:00
casey
dbc504fad1 IMPALA-1579: UNIX_TIMESTAMP() should return BIGINTs instead of INTs
This should fix the last Y2K38 problem. Previously, calling
unix_timestamp() with an input of '2038-01-19 03:14:08' or later would
return a negative value due to 32-bit integer overflow. This patch
switches from 32-bit to 64-bit integers.
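
The overflow boundary can be checked with a short sketch. It treats the input as UTC (an assumption made here, with no time zone conversion) and emulates two's-complement truncation to 32 bits with the hypothetical helper to_int32. '2038-01-19 03:14:07' is exactly INT32_MAX seconds after the epoch, so one second later no longer fits.

```python
import calendar

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_int32(x):
    """Wrap an integer to a signed 32-bit value, as two's-complement truncation would."""
    return (x + 2**31) % 2**32 - 2**31


def unix_timestamp_utc(y, mo, d, h, mi, s):
    """Seconds since 1970-01-01 00:00:00 UTC (input assumed to be UTC)."""
    return calendar.timegm((y, mo, d, h, mi, s, 0, 0, 0))


last_ok = unix_timestamp_utc(2038, 1, 19, 3, 14, 7)
first_bad = unix_timestamp_utc(2038, 1, 19, 3, 14, 8)
```

Here first_bad is 2147483648, and to_int32(first_bad) wraps to -2147483648, the negative value described above; a 64-bit result holds it unchanged.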

Change-Id: Ic9180887d6c828f6ecd25435be86fd0bd52d3f0d
Reviewed-on: http://gerrit.cloudera.org:8080/61
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-02-16 00:59:34 +00:00
Alex Behm
f696861c5c Throw error on unrecognized test sections.
Previously, our .test file parser did not abort tests when a test or
section was malformed. This patch changes that behavior to report an
error and treat the test as failed.

Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.

Arguably, the test file parser should be more flexible about where it
accepts comments, but this patch does not address that problem.
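
A strict parser of this kind can be sketched as follows. The sketch assumes test cases are separated by "====" lines and that sections start with "---- NAME" headers; the accepted section names and the function parse_test_file are hypothetical. Any unrecognized section, or text outside a section, raises instead of being silently skipped.

```python
SECTION_PREFIX = "---- "
TEST_DELIMITER = "===="
# Hypothetical subset of section names; a real parser would accept more.
VALID_SECTIONS = {"QUERY", "RESULTS", "TYPES"}


def parse_test_file(text, valid_sections=VALID_SECTIONS):
    """Split a .test file into test cases, failing loudly on bad sections."""
    cases = []
    for case_num, chunk in enumerate(text.split(TEST_DELIMITER)):
        if not chunk.strip():
            continue
        sections, current = {}, None
        for line in chunk.strip("\n").splitlines():
            if line.startswith(SECTION_PREFIX):
                current = line[len(SECTION_PREFIX):].strip()
                if current not in valid_sections:
                    raise ValueError(
                        f"test case {case_num}: unrecognized section {current!r}")
                sections[current] = []
            elif current is None:
                # Non-blank text before the first section header is an error.
                if line.strip():
                    raise ValueError(
                        f"test case {case_num}: text outside any section: {line!r}")
            else:
                sections[current].append(line)
        cases.append({name: "\n".join(lines) for name, lines in sections.items()})
    return cases
```

Raising, rather than returning a partial test case, is what turns a typo such as "---- RESULT" into a visible test failure.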

Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-02 18:08:09 -08:00
Skye Wanderman-Milne
2bfb69523f IMPALA-1508: don't JIT TimestampFunctions::DateAddSub
For some reason, the try/catch added to fix IMPALA-1493 doesn't work
when we JIT the function. Fixing this in the JIT'd code will take some
time, so for now just don't JIT the function.

Change-Id: I7b2801027db0a9deb19b477c1a4ca0bdad77a825
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5383
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-11-23 21:36:03 -08:00
Nong Li
e2d7fb6402 Some test case cleanup.
Change-Id: Ic29b7c1f5fd714a1e2cc41bf0e55c0d11c782862
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4791
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5090
Reviewed-by: Nong Li <nong@cloudera.com>
2014-11-03 22:33:08 -08:00
Martin Grund
6e0c1c26c9 IMPALA-1424: abs() function retains input type
This patch modifies the abs() built-in function so that it
retains the type of the input argument for the return type
in the same way as Postgres does.

Change-Id: I1750237b85bedbc3ce9d52330ac4d458b0aada3a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4980
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 424b359ab0a4f621f2865844c3293f2c80e0867f)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4996
2014-10-28 08:07:21 -07:00
casey
9a72c28832 Add DECODE builtin
This adds DECODE functionality to the existing CaseExpr class. There
will be no separate backend implementation for DECODE; it will be sent to
the backend as a CASE expr so the existing codegen function can be used.

Because Oracle does cast checking during execution and Impala does cast
checking during analysis, some uses of DECODE that are valid in Oracle
are invalid in Impala.

Ex:

  SELECT DECODE(foo, bar, int_col, baz, string_col_containing_only_ints)
  FROM ...

  would run on Oracle. If string_col_containing_only_ints actually
  contained non-INTs, an error would be thrown during execution and no
  results would be returned. In Impala, an error is thrown during analysis.
  If a CAST were added to the STRING column, a cast failure would result in
  NULL.
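
The rewrite from DECODE to CASE can be sketched as a string transformation. This is a hypothetical Python illustration, not the CaseExpr code. It produces the simple form CASE expr WHEN search THEN result ... ELSE default END, treating a trailing unpaired argument as the default. Note that this naive form compares with = semantics, so unlike Oracle's DECODE it would not match a NULL expr against a NULL search value; how the actual patch handles that is not stated here.

```python
def decode_to_case(args):
    """Rewrite DECODE(expr, search1, result1, ..., [default]) as a CASE string.

    Arguments are SQL text fragments. A trailing unpaired argument is the
    default, which becomes the ELSE branch.
    """
    if len(args) < 3:
        raise ValueError("DECODE needs an expression and at least one search/result pair")
    expr, rest = args[0], list(args[1:])
    default = rest.pop() if len(rest) % 2 == 1 else None
    parts = ["CASE", expr]
    for search, result in zip(rest[0::2], rest[1::2]):
        parts += ["WHEN", search, "THEN", result]
    if default is not None:
        parts += ["ELSE", default]
    parts.append("END")
    return " ".join(parts)
```

For the example above, decode_to_case(["foo", "bar", "int_col", "baz", "string_col_containing_only_ints"]) yields a CASE with two WHEN branches and no ELSE. Because both result expressions land in one CASE, their types must be reconciled at analysis time.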

Change-Id: Ia08cc2389abb6f843bba117e7091c659ad25ff41
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4334
Tested-by: jenkins
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
2014-09-26 12:26:46 -07:00
Skye Wanderman-Milne
559b83d3d0 Expr refactoring
This patch changes the interface for evaluating expressions, in order
to allow for thread-safe expression evaluations and easier
codegen. Thread safety is achieved via the ExprContext class, a
light-weight container for expression tree evaluation state. Codegen
is easier because more expressions can be cross-compiled to IR.
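
The split between a shared expression tree and per-thread evaluation state can be sketched in Python. The classes SlotRef, Add and ExprContext here are hypothetical stand-ins that only mirror the idea; they are not Impala's C++ API (expr.h and expr-context.h define that). The tree is immutable after construction. All mutable state, here a simple evaluation counter, lives in the context, so each worker thread can evaluate the same tree through its own context.

```python
from concurrent.futures import ThreadPoolExecutor


class SlotRef:
    """Leaf expression: reads column `index` from a row. Immutable."""

    def __init__(self, index):
        self.index = index

    def eval(self, ctx, row):
        return row[self.index]


class Add:
    """Binary addition node; holds no per-evaluation state of its own."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, ctx, row):
        ctx.num_node_evals += 1  # mutable state lives in the context, not the tree
        return self.left.eval(ctx, row) + self.right.eval(ctx, row)


class ExprContext:
    """Hypothetical per-thread evaluation state for a shared expression tree."""

    def __init__(self, root):
        self.root = root
        self.num_node_evals = 0

    def evaluate(self, row):
        return self.root.eval(self, row)


# One tree, shared read-only by all threads: col0 + col1.
TREE = Add(SlotRef(0), SlotRef(1))


def eval_batch(rows):
    ctx = ExprContext(TREE)  # each worker gets its own context
    results = [ctx.evaluate(r) for r in rows]
    return results, ctx.num_node_evals


batches = [[(i, i) for i in range(n)] for n in (3, 5, 7)]
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(eval_batch, batches))
```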

See expr.h and expr-context.h for an overview of the API
changes. See sort-exec-exprs.cc for a simple example of the new
interface and hdfs-scanner.cc for a more complicated example.

This patch has not been completely code reviewed and may need further
cleanup/stylistic work, as well as additional perf work.

Change-Id: I3e3baf14ebffd2687533d0cc01a6fb8ac4def849
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3459
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:44 -07:00
Skye Wanderman-Milne
7a0cc27fd1 Convert math functions to the UDF interface.
Also adds FunctionContext::GetNumArgs() method to the public UDF API.

Change-Id: I76e21814e423f075a0a22b4e924c1d3ec26daba7
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3410
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-08-17 12:44:32 -07:00
Paden Tomasello
67d23c2d4b Modified Case expression tests in exprs.test
Change-Id: I65cee2e14291db8bf14a428715b08dac475b863a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3485
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3601
2014-07-24 12:34:02 -07:00
Paden Tomasello
3d173e65d2 Adding Codegen function and tests for CASE expressions.
Change-Id: Ib52b3e3f12b35e2c0a60ef94501c20ef83abdfe5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3187
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3498
2014-07-18 12:03:58 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.
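
Partition pruning with NULL partition keys, the case exercised by the new alltypesagg partition, can be sketched as follows. The partition list and the function prune are hypothetical, and only = and IS NULL are handled. Under SQL semantics a comparison against a NULL key is never true, so a NULL partition survives only an IS NULL predicate.

```python
# Hypothetical partitions keyed by (year, month); None stands for a NULL key.
PARTITIONS = [
    {"year": 2010, "month": 1},
    {"year": 2010, "month": 2},
    {"year": 2010, "month": None},
]


def prune(partitions, column, op, value=None):
    """Return the partitions whose key can satisfy `column <op> value`.

    Supports '=' and 'IS NULL'. A comparison with a NULL key is never true,
    so NULL partitions survive only 'IS NULL'.
    """
    if op == "IS NULL":
        return [p for p in partitions if p[column] is None]
    if op == "=":
        return [p for p in partitions if p[column] is not None and p[column] == value]
    raise ValueError(f"unsupported operator: {op}")
```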

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Floats/doubles are lossy, so using them as the default literal type
is problematic.
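
The lossiness is easy to demonstrate. In this sketch Python's float (an IEEE 754 double) stands in for a DOUBLE literal, and decimal.Decimal stands in for a DECIMAL literal.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly...
float_sum = 0.1 + 0.2
# ...while a decimal literal keeps exactly the digits that were written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
# The exact value that a double parsed from "0.1" actually holds.
exact_double = Decimal(0.1)
```

float_sum is not equal to 0.3, decimal_sum equals Decimal("0.3"), and exact_double expands to 0.1000000000000000055511151231257827...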

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Victor Bittorf
c13a1d080e IMPALA-938: Fix implicit casting in timestamp arithmetic exprs.
Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671
2014-05-23 14:11:35 -07:00
Victor Bittorf
0bb66ef327 Adding aliases ADD_MONTHS and SUB_MONTHS
This was requested for consistency with Oracle.

Change-Id: I463a66694a068cd773532d8f6f853a4b089b918a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2400
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1f0b643789596f96c54580b8c5262fada4dfc958)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2502
2014-05-09 17:35:29 -07:00
Victor Bittorf
46151dc7dd Adding EXTRACT builtin.
Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370
2014-04-25 15:38:51 -07:00
Matthew Jacobs
967346b0c4 IMPALA-630: Add fn to get the PID of the impalad to which the user is connected
Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:54:37 -08:00
Chris Channing
7e98708d7d IMPALA-114: Add support for custom date/time formats
This change set adds support for custom date/time formats in Impala. The following date/time tokens are supported:

y – Year
M – Month
d – Day
H – Hour
m – Minute
s – Second
S – Fractional second

The token names and usage have been modeled on Java's SimpleDateFormat class. Repeating a token indicates zero padding in an output scenario (TS -> String) and the number of characters to read in a parsing scenario. Abbreviated month names are represented by three repeating tokens, e.g. yyyy-MMM-dd -> 2013-Nov-21.

Formatting character groups can appear in any order, along with any separators, e.g.:

yyyy/MM/dd
dd-MMM-yy
(dd)(MM)(yyyy)  HH:mm:sss
..etc..

The following features are not supported with this patch:

    - Long literal months e.g. MMMM
    - Nested strings e.g. “Year: “ yyyy “Month: “ MM “Day: “ dd
    - Lazy formatting
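
The formatting direction of the token scheme can be sketched in Python. This is not Impala's implementation: the functions tokenize and format_timestamp are hypothetical, parsing is not covered, and Python's datetime only has microsecond resolution. Runs of the same letter form one token, the run length sets the zero padding, MMM selects the abbreviated month name, yy keeps the last two digits of the year, and any other character is copied through as a separator. MMMM is not handled specially.

```python
from datetime import datetime
from itertools import groupby

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Numeric fields keyed by pattern letter, following the token list above.
NUMERIC_FIELDS = {
    "M": lambda ts: ts.month,
    "d": lambda ts: ts.day,
    "H": lambda ts: ts.hour,
    "m": lambda ts: ts.minute,
    "s": lambda ts: ts.second,
}


def tokenize(fmt):
    """Group runs of the same character: 'yyyy-MMM-dd' -> ['yyyy', '-', 'MMM', '-', 'dd']."""
    return ["".join(run) for _, run in groupby(fmt)]


def format_timestamp(ts, fmt):
    out = []
    for tok in tokenize(fmt):
        letter, width = tok[0], len(tok)
        if letter == "y":
            # 'yy' keeps the last two digits; other runs zero-pad the full year.
            year = ts.year % 100 if width == 2 else ts.year
            out.append(str(year).zfill(width))
        elif letter == "M" and width == 3:
            out.append(MONTH_ABBR[ts.month - 1])  # abbreviated month name
        elif letter in NUMERIC_FIELDS:
            # The repeat count is the zero-padded width.
            out.append(str(NUMERIC_FIELDS[letter](ts)).zfill(width))
        elif letter == "S":
            # Fractional seconds: leading digits of the microseconds, zero-filled.
            out.append(f"{ts.microsecond:06d}".ljust(width, "0")[:width])
        else:
            out.append(tok)  # separators are copied through unchanged
    return "".join(out)
```

For datetime(2013, 11, 21), format_timestamp returns "2013-Nov-21" for yyyy-MMM-dd and "21-Nov-13" for dd-MMM-yy.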

Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001
Reviewed-by: Christopher Channing <cchanning@cloudera.com>
Tested-by: Christopher Channing <cchanning@cloudera.com>
2014-01-08 10:54:34 -08:00
Alex Behm
dd0409e9d6 IMPALA-509: Minimal type promotion for arithmetic exprs.
Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:30 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect
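
Regex matching of expected values can be sketched as follows. The "regex:" prefix and the functions value_matches and rows_match are hypothetical; the marker the test framework actually uses is not given here. An expected value carrying the prefix must fully match the actual value. Anything else is compared literally, which helps for values such as file sizes that vary between runs.

```python
import re

REGEX_PREFIX = "regex:"  # hypothetical marker for a pattern instead of a literal


def value_matches(expected, actual):
    """Exact comparison, or a full regex match when `expected` carries the prefix."""
    if expected.startswith(REGEX_PREFIX):
        return re.fullmatch(expected[len(REGEX_PREFIX):], actual) is not None
    return expected == actual


def rows_match(expected_rows, actual_rows):
    """Compare result rows column by column (rows are lists of strings)."""
    if len(expected_rows) != len(actual_rows):
        return False
    return all(
        len(exp) == len(act) and all(value_matches(e, a) for e, a in zip(exp, act))
        for exp, act in zip(expected_rows, actual_rows)
    )
```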

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
Alex Behm
33000b8c15 Fixed codegen of floating-point modulo.
Change-Id: Idd28c6a71a659471aa632a6e26d970557daeb3bf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/385
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:46 -08:00
Nong Li
ce092065be Fix bug with how exec sets whether the conjuncts are thread safe.
2014-01-08 10:50:53 -08:00
Skye Wanderman-Milne
0c343913fa IMPALA-266: Round() does not output the right precision
2014-01-08 10:50:02 -08:00
Skye Wanderman-Milne
04bee45af5 Update query test to use dayofyear()
2014-01-08 10:49:47 -08:00
Alex Behm
1b2e8280d4 Fix NULL issues.
2014-01-08 10:49:32 -08:00
Lenni Kuff
5f81becd84 Create tables used by insert tests in a supported insert format
2014-01-08 10:49:00 -08:00
Henry Robinson
8d87972695 Improve parser coverage
This patch adds support for the following SQL constructs:

  - Unary + operator
  - The ALL keyword, in SELECT ALL and SELECT aggregate_func(ALL *)
  - REAL and INTEGER as type synonyms for DOUBLE and INT respectively
  - The AS keyword after a table spec, e.g. SELECT * FROM tbl AS t0
2014-01-08 10:48:54 -08:00
ishaan
09d6d931f4 Change the way data is loaded
2014-01-08 10:48:09 -08:00
Lenni Kuff
ef48f65e76 Add test framework for running Impala query tests via Python
This is the first set of changes required to move our functional test
infrastructure from JUnit to Python. After investigating a number of
options, I decided to go with a Python test executor named py.test
(http://pytest.org/). It is very flexible, open source (MIT licensed), and will
enable us to do some cool things like parallel test execution.

As part of this change, we now use our "test vectors" for query test execution.
This is useful because it means that if you load the "core" dataset, you know
you will be able to run the "core" query tests (specified by
--exploration_strategy when running the tests).

You will see that each combination of table format + query exec options is now
treated as an individual test case. This will make it much easier to debug
exactly where something failed.
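
Treating each combination as its own test case amounts to a cross product of dimensions. In the sketch below the dimension values are hypothetical (batch_size and disable_codegen are Impala query options, but the table format strings and the per-strategy sets are illustrative), and build_test_vectors and vector_id are assumed names. With py.test, such a list could be passed to pytest.mark.parametrize together with ids=vector_id, so a failure reports the exact combination.

```python
from itertools import product

# Hypothetical dimension values per exploration strategy.
DIMENSIONS = {
    "core": {
        "table_format": ["text/none", "parquet/none"],
        "exec_option": [{"batch_size": 0, "disable_codegen": False}],
    },
    "exhaustive": {
        "table_format": ["text/none", "seq/snap", "rc/gzip", "parquet/none"],
        "exec_option": [
            {"batch_size": b, "disable_codegen": c}
            for b, c in product([0, 1], [False, True])
        ],
    },
}


def build_test_vectors(exploration_strategy):
    """Every (table format, exec options) combination is one test case."""
    dims = DIMENSIONS[exploration_strategy]
    return list(product(dims["table_format"], dims["exec_option"]))


def vector_id(vector):
    """Readable test id, so a failure names the exact combination."""
    table_format, opts = vector
    return table_format + "|" + ",".join(f"{k}={v}" for k, v in sorted(opts.items()))
```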

These new tests can be run using the script at tests/run-tests.sh
2014-01-08 10:46:50 -08:00
Marcel Kornacker
ea050a43ad Switching over backend runtime structures to new planner.
Added container-util.h
2014-01-08 10:46:20 -08:00
Nong Li
4d0319d32b Fix null string parsing.
2014-01-08 10:44:40 -08:00
Alexander Behm
ee705e3083 Added timestamp arithmetic expressions.
2014-01-08 10:44:31 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We lookup the workload in the workloads directory, then read the associated
query .test files and start executing them.
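
The workload lookup can be sketched as follows. The directory layout <workloads_dir>/<workload>/queries/*.test and the functions find_query_files and parse_args are assumptions for illustration; only the --workloads and --scale_factor flags come from the text above. An unknown workload fails up front instead of silently running nothing.

```python
import argparse
import glob
import os


def find_query_files(workloads_dir, workloads):
    """Map each workload name to its sorted .test files (assumed layout)."""
    files = {}
    for name in workloads:
        query_dir = os.path.join(workloads_dir, name, "queries")
        if not os.path.isdir(query_dir):
            raise FileNotFoundError(f"unknown workload: {name} (looked in {query_dir})")
        files[name] = sorted(glob.glob(os.path.join(query_dir, "*.test")))
    return files


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Run benchmark workloads (sketch)")
    # Comma-separated list, e.g. --workloads=hive-benchmark,tpch
    parser.add_argument("--workloads", required=True,
                        type=lambda s: [w for w in s.split(",") if w])
    parser.add_argument("--scale_factor", default="")
    return parser.parse_args(argv)
```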

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.

Also added support for generating schemas for different scale factors, as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are unique from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new Python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.sh -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00