297 Commits

Author SHA1 Message Date
Alex Behm
22ab2595d6 [CDH5] Fixes to expected explain plans due to new HBase version.
Change-Id: I33f09283dcea278ca07f9d2d44e542e644def8ca
2014-01-15 15:12:24 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
6799c93922 Simplified/enhanced explain plans with a total of four explain levels.
There are now 4 explain levels summarized as follows:
- Level 0: MINIMAL
  Non-fragmented parallel plan only showing plan nodes with minimal attributes
- Level 1: STANDARD
  Non-fragmented parallel plan with some details in plan nodes
- Level 2: EXTENDED
  Non-fragmented parallel plan with full details in plan nodes including
  the table/column stats, row size, #hosts, cardinality,
  and estimated per-host memory requirement
- Level 3: VERBOSE
  Fragmented parallel plan with full details (like level 2)

This patch also includes several bugfixes related to plan costing and/or
testing of explain plans.

Change-Id: I622310f01d1b3d53ea1031adaf3b3ffdd94eba30
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1211
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-10 19:17:59 -08:00
Skye Wanderman-Milne
561da008c7 IMPALA-729: fix resource management in Parquet scanner for multiple row groups
We weren't attaching resources to the row batch when starting a new
row group, so it was possible for string data to be overwritten. This
patch removes CloseStreams() and merges its functionality with
AttachCompletedResources() so it's not possible to destroy streams
without transferring the resources first. It also merges and removes
ScannerContext::Close().

Also adds test cases for IMPALA-720.

Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb
2014-01-08 10:56:26 -08:00
Alan Choi
57b961168d IMP-1188 Fix HBase row key predicates issues
This patch fixes a few row key issues:

1. We used to assert that the row key filter must be a string literal.
However, it can also be a constant function. We need to eval the expr
and then use the result as the start/stop key.

2. Cast(row_key as int) simply failed.
This should not be transformed into start/stop key.

3. We used to assert that lower bound < upper bound.
This query:
  select * from tbl where row_key > 'b' and row_key < 'a'
would simply ASSERT. We should simply not return any rows.

4. Handle NULL predicate
HBase row key can't be null. If either upper/lower bound is null, we simply
don't need to return any rows.
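
The bound handling in cases 3 and 4 can be sketched as follows. This is a hypothetical Python illustration, not the planner code: `row_key_scan_range` and `UNBOUNDED` are made-up names, both bounds are assumed to be exclusive, and any constant function (case 1) is assumed to have been evaluated before the call.

```python
from typing import Optional, Tuple

# Hypothetical sentinel meaning "no predicate on this side of the range".
UNBOUNDED = object()

def row_key_scan_range(lower, upper) -> Optional[Tuple[object, object]]:
    """Return (start, stop) for the scan, or None if no row can match.

    lower/upper are the already-evaluated constants from predicates of the
    form row_key > lower and row_key < upper; None models SQL NULL.
    """
    # Case 4: a NULL bound can never match, because row keys are never NULL.
    if lower is None or upper is None:
        return None
    # Case 3: an empty range yields no rows instead of tripping an assert.
    if lower is not UNBOUNDED and upper is not UNBOUNDED and lower >= upper:
        return None
    return (lower, upper)
```

For the query in case 3, `row_key_scan_range('b', 'a')` returns None, so the scan can be skipped entirely.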

Change-Id: Ia03590a862888b377bf1f48bcb838b99193fa241
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1180
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:40 -08:00
Alan Choi
468ca0aa5d IMPALA-723 Fix union with aggregate
The problem is that with Union, AggregateInfo.materializeRequiredSlots() is being called more than once.
Other "materializeSlots" related calls are idempotent, but this one is not.
That's because materializedAggregateSlots_ is an array list and we keep adding the same duplicate value
to the array list. We can fix it by making materializeRequiredSlots() idempotent.
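
A minimal sketch of the idea, in Python rather than the FE's Java. `AggregateInfoSketch` and its fields are hypothetical stand-ins, and rebuilding the list is only one possible way to make the call idempotent; the actual patch may differ.

```python
class AggregateInfoSketch:
    """Hypothetical stand-in for the aggregate info described above."""

    def __init__(self, required_slots):
        self.required_slots = list(required_slots)
        self.materialized_slots = []

    def materialize_required_slots(self):
        # Rebuild the list instead of appending to it, so a second call
        # (as happens with UNION) leaves the same state as the first.
        self.materialized_slots = list(self.required_slots)


info = AggregateInfoSketch([0, 2])
info.materialize_required_slots()
info.materialize_required_slots()  # no duplicates are added
```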

Change-Id: Ic18f89010c088fe9018b15f0281bc9340b8a2d14
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1195
Tested-by: jenkins
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
2014-01-08 10:54:40 -08:00
Lenni Kuff
6afea60704 Update test logging to print executable SQL statements and log all actions executed
This is the first step in cleaning up the test logging. It provides a common connection
interface that provides tracing around all operations. When a test fails the output will
be executable SQL. It also logs actions such as when a connection is opened, closed, or
when an operation is cancelled. Currently only beeswax connections are supported, but
I have a separate patch that adds support for executing using HS2 as well as Beeswax.

Example of new logging:
-- connecting to: localhost:21000
-- executing against localhost:21000
use functional;

SET disable_codegen=False;
SET abort_on_error=1;
SET batch_size=0;
SET num_nodes=0;

-- executing against localhost:21000
select a.timestamp_col from alltypessmall a inner join alltypessmall b on
(a.timestamp_col = b.timestamp_col)
where a.year=2009 and a.month=1 and b.year=2009 and b.month=1;
-- closing connection to: localhost:21000

Change-Id: Iedc7d4d3a84bfeff6cc1daae6ed1ca97613d7700
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1133
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:40 -08:00
Alan Choi
00e912b372 IMPALA-715 HBase scanner should use CallIntMethod for int return type function call
The hbase-table-scanner used CallShortMethod to retrieve the size of the array.
A short will overflow. Because the size of the array is an int, we should use
CallIntMethod instead.
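
The overflow can be illustrated with a short Python sketch. It does not use JNI; it only shows what happens when an int-sized value is squeezed into a 16-bit signed short, with 40000 chosen as an arbitrary example size.

```python
import ctypes

array_size = 40000  # fits in a 32-bit int, but not in a 16-bit short

# ctypes does no overflow checking, so the value is silently truncated.
as_short = ctypes.c_short(array_size).value
as_int = ctypes.c_int(array_size).value

print(as_short, as_int)
```

Any array longer than 32767 elements would be reported with a wrong (here negative) size when read as a short.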

Change-Id: I941981f7504ee04adf998398f8baf6beae76d000
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1171
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:39 -08:00
Matthew Jacobs
967346b0c4 IMPALA-630: Add fn to get the PID of the impalad to which the user is connected
Change-Id: I2d8b304bfb22883489bbbbe33e07478d164583b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1127
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:54:37 -08:00
Chris Channing
7e98708d7d IMPALA-114: Add support for custom date/time formats
This change set adds support for dealing with custom date/time formats in Impala.  The following date/time tokens are supported:

y – Year
M – Month
d – Day
H – Hour
m – Minute
s – second
S – Fractional second

The token names and usage have been modeled on the SimpleDateFormat class used in Java. This allows the use of repeating tokens to indicate zero padding for an output scenario (TS -> String) and a guide for reading data to a given length in a parsing scenario. Representing literal months is achieved by specifying three repeating tokens e.g. yyyy-MMM-dd -> 2013-Nov-21.

Formatting character groups can appear in any order along with any separators e.g.

yyyy/MM/dd
dd-MMM-yy
(dd)(MM)(yyyy)  HH:mm:sss
..etc..

The following features are not supported with this patch:

    - Long literal months e.g. MMMM
    - Nested strings e.g. “Year: “ yyyy “Month: “ mm “Day: “ dd
    - Lazy formatting
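
To make the token grouping concrete, here is a small Python sketch that translates a subset of these patterns into strptime directives. It is not how the BE implements parsing; `to_strptime` and the mapping table are hypothetical, fractional seconds (S) are left out, and separators are assumed not to contain token letters.

```python
import re
from datetime import datetime

# Hypothetical mapping from token groups to strftime/strptime directives.
_DIRECTIVES = {
    "yyyy": "%Y", "yy": "%y",
    "MMM": "%b", "MM": "%m",
    "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S",
}

def to_strptime(pattern: str) -> str:
    """Translate a SimpleDateFormat-style pattern; separators pass through."""
    def replace(match):
        group = match.group(0)
        if group not in _DIRECTIVES:
            raise ValueError(f"unsupported token group: {group}")
        return _DIRECTIVES[group]
    # Match runs of one repeated token letter, e.g. "yyyy" or "MMM".
    return re.sub(r"([yMdHmsS])\1*", replace, pattern)

# Groups may appear in any order with any separators.
parsed = datetime.strptime("21-11-13", to_strptime("dd-MM-yy"))
```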

Change-Id: Ibba2eaed366fd736b921b31b8d0d517ac1248bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1001
Reviewed-by: Christopher Channing <cchanning@cloudera.com>
Tested-by: Christopher Channing <cchanning@cloudera.com>
2014-01-08 10:54:34 -08:00
Alex Behm
74164e8f99 IMPALA-688: Fix column stats computation for HBase row key. Use regex to fix flaky tests.
Change-Id: I1d3fb915921bbc5366da0ee51608fd54aa237777
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1135
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:33 -08:00
Alex Behm
e4ad086dee Added max/avg length for string columns in COMPUTE STATS.
Change-Id: I6f61de2323ee12681642684ec633ed4bb7506de2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1079
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:30 -08:00
Alex Behm
dd0409e9d6 IMPALA-509: Minimal type promotion for arithmetic exprs.
Change-Id: I576fe9baf3bae7d46ee08e29ececc4adda97e9df
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1078
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:30 -08:00
Matthew Jacobs
93368e20b1 Fix CROSS JOIN handling in join order optimization and add tests
Cross joins should be handled like outer joins in the join order
optimization in that the right table referenced by a cross join may not
be reordered anywhere before tables referenced to the left of the cross
join. If there are inner joins to the right of the cross join, those
tables may be reordered before the cross join.

E.g., if we have A JOIN B CROSS JOIN C JOIN D, then C must come after A
and B, but D may be reordered to come before C.
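
The constraint can be expressed as a check on a candidate join order. The Python below is a hypothetical sketch, not the planner's algorithm: `is_valid_join_order` is a made-up name and it models only the cross join rule described above.

```python
def is_valid_join_order(query, order):
    """query: list of (join_type, table) in the order written in the FROM
    clause; the first entry has join_type None.
    order: candidate permutation of the table names."""
    position = {table: i for i, table in enumerate(order)}
    for idx, (join_type, table) in enumerate(query):
        if join_type == "CROSS":
            # Every table written to the left of the cross join must
            # still come before the cross join's right-hand table.
            for _, left_table in query[:idx]:
                if position[left_table] > position[table]:
                    return False
    return True


# A JOIN B CROSS JOIN C JOIN D
query = [(None, "A"), ("INNER", "B"), ("CROSS", "C"), ("INNER", "D")]
```

With this query, the order A, B, D, C is accepted, while A, C, B, D is rejected because C would come before B.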

Also adds test cases for join order optimization and predicate propagation.

Change-Id: I6b1022dd3e862efbff81e283b43284d846c8eca4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1096
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:29 -08:00
Skye Wanderman-Milne
9e17042185 Allow zero bit width dict/RLE decoders.
This allows us to read single-value dictionary-encoded columns
generated by parquet-mr.

Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne
de531e15bd IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8
parquet-mr had a bug where it didn't include the dictionary page's
header in the total column size. We now compensate for this by
detecting these files and padding the scan range length. This required
changing how the scanner detects when it's finished: it now counts the
number of rows rather than checking eosr (since the scan range may be
longer than the column).

Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Matthew Jacobs
f327431a8e IMPALA-171: Add CROSS JOIN
Adds a CROSS JOIN (cartesian product). Common join code is moved to
a new abstract base class BlockingJoinNode. We must keep all build RowBatches in
memory in order to iterate over them for every row from the left child. The
TupleRowList provides a convenient way to iterate over all of the rows.
A future change will address codegen for the CrossJoinNode.

Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/970
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne
acdc792355 IMPALA-695: Use the local path of Hive UDF jars in the FE.
The FE was creating class loaders with the HDFS locations of Hive UDF
libs, rather than the local locations created by the BE. Our tests
still passed since we only used UDFs already on the classpath
(e.g. Hive builtins).

Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne
b54d16dabd IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions.
Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Alex Behm
24662b1941 Allow ALTER TABLE to set per-partition serde and table properties.
The main motivation is to allow users to set the per-partition number
of rows for manual incremental stats maintenance, as well as a means
to 'drop' stats that may have caused undesirable plan changes.

Change-Id: Iff38317a993e5d7952ea4df839947f5ec341e930
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1010
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
bfb16ff552 Disable SHOW STATS tests because results are unstable (IMPALA-688)
Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:22 -08:00
Nong Li
e3fdef7839 Fix subexpr elimination IR rewriting.
Change-Id: Iabdcc1686951e71136a603ed30f9d16fb1c1ec46
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1056
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
0bae3978c9 Update compute-stats.py to execute using Impala
Updates our compute stats script to execute using Impala. This allows us
to easily compute stats on all tables in a database or all tables in the
metastore.
The updated stats caused one of the TPCH plans to change so this also
updates the TPCH planner test results.

Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Lenni Kuff
76fa3b2ded Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax
This change updates our DDL syntax support to allow for using 'STORED AS PARQUET'
as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax,
but continue to support the old.  I made the same change for 'AVROFILE', but since
we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax.

Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Nong Li
ab21dde002 Update compute stats test to use regex for parquet/hbase file size.
The parquet file stores the application version that wrote it so
is different between our c4 and c5 branches.

HBase storage is also not guaranteed to be identical across versions.

Change-Id: I02984a55e0678756e50c1fff6db22c43788d3916
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1028
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:17 -08:00
Alex Behm
93e5b262c2 Added COMPUTE STATS command for gathering table and column stats.
A compute stats command computes the table and column stats for a given
table and persists them in the metastore.
The table stats consist of the per-partition and per-table row count.
The column stats are computed on a per-table basis and consist of the
number of distinct values and the number of NULLs per column.

This patch introduces a new 'child query' concept that
compute stats utilizes. Child queries are cancelled
if the parent query is cancelled. A compute stats stmt is
executed by the following query hierarchy:
parent: compute stats query (DDL)
- child: compute table stats query (QUERY)
- child: compute column stats query (QUERY)

The new child query concept is necessary to decouple child query fetches
from parent query fetches, i.e., we could not execute a child query as
part of the original compute stats query, because then a client could
fetch the results we need for updating the Metastore statistics. The
reason why our existing CTAS works without this decoupling
is that its insert 'child query' is not fetchable.
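
The cancellation relationship between parent and child queries can be sketched as below. This is a hypothetical Python model, not the BE implementation; `QuerySketch` and its methods are invented for illustration.

```python
class QuerySketch:
    """Hypothetical model of a query that may own child queries."""

    def __init__(self, name):
        self.name = name
        self.cancelled = False
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def cancel(self):
        # Cancelling the parent cancels every child query as well.
        self.cancelled = True
        for child in self.children:
            child.cancel()


parent = QuerySketch("compute stats query (DDL)")
table_stats = parent.add_child(QuerySketch("compute table stats query"))
column_stats = parent.add_child(QuerySketch("compute column stats query"))
parent.cancel()
```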

Change-Id: I560533e3cb09bcbbdb3eea7fcf0b460bc6b36dcd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/873
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00
Skye Wanderman-Milne
49f4bd285a Change test_metadata_query_statements.py::test_show to ignore Parquet
file sizes since they may fluctuate slightly.

Change-Id: I3ddb6ceebe6dcc86cc1c58b35b0cd96986ec43e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/988
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:14 -08:00
Nong Li
7f08146b88 Add ndv (distinct estimate) as a builtin aggregate function.
This is implemented in the BE using HLL (but we could change this in the
future).

These estimates usually work better than the other algorithm we have and
we've not implemented all the improvements from the Google paper.

Change-Id: Ied715ddd0e1a7cbe7f5f90469f1ed3d4b9c537c7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/956
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:03 -08:00
Matthew Jacobs
8a55982105 Add OFFSET to skip rows returned with a LIMIT
Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive
does not support OFFSET so creating a view with an OFFSET will not work in Hive.

For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip
5 rows, then return the next 20. OFFSET requires an ORDER BY clause.

Note this is not very efficient as we must actually keep (limit+offset) rows in memory
in the topn-node, and all child sort nodes must as well. Users should be careful when
using this feature.
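
The memory cost follows from how a top-N with an offset has to work. A minimal Python sketch, assuming rows that fit in memory; `order_by_limit_offset` is a hypothetical helper, not the topn-node code.

```python
import heapq

def order_by_limit_offset(rows, key, limit, offset):
    # The top-N step must retain limit + offset rows, because the first
    # `offset` rows can only be discarded after sorting.
    retained = heapq.nsmallest(limit + offset, rows, key=key)
    return retained[offset:]


# Models "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" over ids 1..100.
ids = list(range(100, 0, -1))
result = order_by_limit_offset(ids, key=lambda r: r, limit=20, offset=5)
```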

Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/900
Tested-by: jenkins
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:54:02 -08:00
Skye Wanderman-Milne
9147cd7518 IMPALA-525: Adjust IO buffer size based on read length and other memory fixes
We were previously wasting memory by always reading into 8MB IO
buffers, even when the data read was much less than 8MB. With this
patch, the IO manager picks a buffer size closer to the actual amount
being read (we don't use the exact size so we can continue to recycle
buffers). The minimum IO buffer size is determined via the
--min_buffer_size flag, and the max IO buffer size via the --read_size
flag.
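
One way to pick a size that is close to the read length while still allowing buffers to be recycled is to round up to a power-of-two multiple of the minimum size. The Python sketch below assumes that policy; the commit does not state the exact rounding rule, and the 1 KB minimum used here is a made-up default.

```python
def pick_buffer_size(read_len,
                     min_buffer_size=1024,
                     max_buffer_size=8 * 1024 * 1024):
    """Return a buffer size between the min and max flags.

    Assumption: sizes are min_buffer_size * 2^k, so a small set of size
    classes can be recycled instead of allocating exact-length buffers.
    """
    size = min_buffer_size
    while size < read_len and size < max_buffer_size:
        size *= 2
    return min(size, max_buffer_size)
```

Under these assumptions a 5000-byte read gets an 8 KB buffer instead of an 8 MB one.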

This technique also helps with IMPALA-652, since short columns will
not use as much memory as before (we will not use considerably more
memory than the size of the table).

This patch also changes StringBuffer to use a doubling strategy so it
doesn't end up allocating many large unused buffers, and has the
scanner context use the requested length as the sync read size if it's
larger than the size produced by read_past_size_cb(). These changes
help prevent the boundary buffer in the scanner context from
allocating excess memory.

Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/938
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:01 -08:00
Lenni Kuff
6bba0c8ffe Fix bug cleaning up removed Functions and fix test_ddl to create all test dbs
When dropping functions, we need to remove the function from the list
of Functions with that name AND remove the list from the Function map if
the list is empty. The second part wasn't happening.
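
Both steps can be sketched in Python as follows. The dictionary of lists stands in for the Function map, and `remove_function` is a hypothetical helper rather than the catalog's actual method.

```python
def remove_function(functions_by_name, name, signature):
    """Remove one overload; return True if something was removed."""
    overloads = functions_by_name.get(name)
    if overloads is None or signature not in overloads:
        return False
    # Step 1: remove the function from the list for that name.
    overloads.remove(signature)
    # Step 2 (previously missing): drop the list once it is empty.
    if not overloads:
        del functions_by_name[name]
    return True


functions = {"f": ["f(INT)", "f(STRING)"]}
remove_function(functions, "f", "f(INT)")
remove_function(functions, "f", "f(STRING)")
```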

Also fixes the test_ddl to properly create all test databases.

Change-Id: Id85af7d5db74a31161f48bea3816bdf734063133
Reviewed-on: http://gerrit.ent.cloudera.com:8080/952
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:00 -08:00
Lenni Kuff
39f77b8b8f Add support for cluster-synchronized catalog operations
This change adds support for cluster-synchronized catalog operations. This provides the
guarantee that after a catalog op completes, all other subscribers to the catalog topic have
also processed that update. This is useful when load balancing, because a common workflow
is to target a different impalad for each statement executed.
For example if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....

Since both the INSERT and the CREATE update the catalog, it would not work as expected
without this patch. The user might either get a "table not found" error or would be
missing partition information from the INSERT.

The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By default
this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1

To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.

TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).

Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:58 -08:00
Lenni Kuff
8bb0010415 IMPALA-597: Do not crash when multiple partitions have the same LOCATION
This patch fixes an issue where Impala would crash if two partitions had
the same HDFS location. This is now fixed in hdfs-scan-node. It also includes some
cleanup and bug fixes to the FE partition related classes and adds tests.

There is still a problem where partition location metadata is not sent
to the BE for INSERT statements, but that will be resolved in a separate
patch.

Change-Id: I0f1c3113d654f7d2b410f00e793ff6b0cae1ae18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/876
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:57 -08:00
Matthew Jacobs
51bfc99c63 IMPALA-395: Impala "show create table" statement
Adds support for "show create table", a DDL statement that outputs a DDL statement that
creates the specified table.

In general, the output DDL works in Impala, so a user can copy the output and execute it
to create the same table. However, there are a few special cases that output Hive DDL
because we do not support creating some tables in Impala: HBase tables and tables with
LZO compressed text. When we do support creating these tables in Impala, users should
be able to execute the DDL in Impala as well.

Change-Id: I8c130297a657810dea5b994bf99d72b0e61b847b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/842
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:53:53 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
Matthew Jacobs
00bc971d34 IMPALA-531: Allow arithmetic expressions for LIMIT
Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/829
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:53:49 -08:00
Matthew Jacobs
65353fd9fb IMPALA-598: Order by behavior for NULLs should be revisited
This change modifies the behavior of NULL ordering such that nulls always
compare greater than other values, but "nulls first" or "nulls last" can be used
to explicitly specify if nulls should be sorted first or last regardless of the
asc/desc.
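
The resulting ordering rules can be modeled in a few lines of Python, with None standing in for NULL. `sort_rows` is a hypothetical helper used only to illustrate the semantics described above.

```python
def sort_rows(values, descending=False, nulls_first=None):
    """Sort values where None models SQL NULL."""
    # Default: NULL compares greater than every other value, so it sorts
    # last for ASC and first for DESC.
    if nulls_first is None:
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None),
                      reverse=descending)
    nulls = [v for v in values if v is None]
    # An explicit nulls_first/nulls_last choice wins regardless of asc/desc.
    return nulls + non_null if nulls_first else non_null + nulls
```

For example, `sort_rows([2, None, 1], descending=True, nulls_first=False)` models "ORDER BY x DESC NULLS LAST".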

Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/812
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2014-01-08 10:53:48 -08:00
Skye Wanderman-Milne
9d05d6d03a Allow UDF tests to run in parallel.
Change-Id: I9512d4a6920c4a71383d9374eb5feb303c3db85d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/727
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:47 -08:00
Skye Wanderman-Milne
7e8e184acf Allow UDFs in conjunct expressions.
This patch refactors HDFSScanNode to copy and prepare all conjunct
exprs in Prepare(), rather than in the scanner threads. This is
necessary so the UDF exprs get codegen'd. Prepare() also only codegens
the functions for the necessary file formats now, rather than for all
file formats regardless of what's actually being scanned.

Change-Id: Ic3220cbd0cba9a3baa138b1f50ecdc6889ed0cd1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/710
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:39 -08:00
Skye Wanderman-Milne
97a6b12e37 Fix UDFs used in partition pruning exprs.
Exprs used for partition pruning are prepared/evaluated with a
separate RuntimeState. If these exprs use UDFs, the runtime state
needs access to the process's ExecEnv so we can use the LibCache and
the IR produced by the UDF exprs needs to be optimized and jit'd.

Change-Id: If7c1d6ebc0015ef3c21a0421c1a36cad4be66625
Reviewed-on: http://gerrit.ent.cloudera.com:8080/695
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:39 -08:00
Nong Li
601f24a198 UDA execution loose ends.
Unfortunately, the BE does not have the codegen path to execute UDAs.
This puts some restrictions on the UDAs we can run.

- No IR UDAs
- No varargs
- Must have 8 arguments or fewer

The code to do this is almost all there for UDFs but I'm not sure I'll get to it.

Change-Id: I8a06e635a9138397c8474a5704c3e588bb92347b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:38 -08:00
Nong Li
a944a1fe52 'Invalidate metadata' no longer clears user functions.
Change-Id: I36de18fefa1d515a7960c2bf8c116d5217c388d6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/726
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:36 -08:00
Alex Behm
b670b9f4f9 Fix switch to hive-exec from hive-builtins in UDF test file.
Change-Id: Ibb75e129ea6c3da5ede9e8e399e537e3e561e814
Reviewed-on: http://gerrit.ent.cloudera.com:8080/723
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:35 -08:00
Lenni Kuff
01c8c43fec Uniquify FUNCTION catalog topic entry keys by including parent database name
Change-Id: I6aa49520f548ddfcd557e2f908a09be454765e8c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/698
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:29 -08:00
Alex Behm
b82880738c IMPALA-617: Cast NULLs in INSERT statement due to incomplete permutation list to expected column type.
Inserting NULLs with NULL_TYPE into Parquet tables causes a crash.

Change-Id: I350c7ee2789c017cee5c4b6a1292c9fae36087f1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/696
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:29 -08:00
Skye Wanderman-Milne
b41ff0c8cd Modify test-udfs.cc so there are no undefined symbols in shared library.
AnalyzeDDLTest was failing because the fesupport binary couldn't
resolve a function used in libTestUdfs.so (the function was defined in
udf.cc, rather than udf.h). I couldn't figure out how to cleanly build
udf.cc into the libTestUdfs.so, so instead I removed the use of the
function in test-udfs.cc.

Change-Id: I81243547584a5b49a5f9265d0d17e035e18d6110
Reviewed-on: http://gerrit.ent.cloudera.com:8080/694
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:27 -08:00
Nong Li
911cfc1bb9 Fix vararg UDFs.
Change-Id: I0e202b984ece7de3d220b6ce89b0c0a4c9edcb45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/688
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:26 -08:00
Nong Li
4800995d44 Add execution for Hive UDFs.
Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500
Reviewed-on: http://gerrit.ent.cloudera.com:8080/458
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:25 -08:00
Nong Li
904289d168 Add UDA execution.
Change-Id: Ie5aab79742675fc62ed731c13abe83304df80991
Reviewed-on: http://gerrit.ent.cloudera.com:8080/642
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:24 -08:00
Alex Behm
3f54240fed PlannerTest uses explain level 'normal'. Only add stats and costs to explain output in 'verbose' mode.
Change-Id: I827b4c7085b5aa2dc5521f8748d8973178f43f4c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/678
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:23 -08:00