Commit Graph

268 Commits

Author SHA1 Message Date
Nong Li
895d69c09f IMPALA-1026: Fix decimal partition cols.
Change-Id: I956b69a86528f1969febf356181dc3182f309909
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2841
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-06-06 09:26:56 -07:00
Matthew Jacobs
2f9b2ae785 Fix SHOW DATA SOURCE test; must execute setup/cleanup serially
The SHOW DATA SOURCE tests were run as part of the other SHOW * tests
in test_show(), but the setup/cleanup for data sources can't be run
in parallel. This change moves the SHOW DATA SOURCE tests into a separate
test method and the setup/cleanup code is only run for this test (i.e.
not using setup_method() and teardown_method()). The test is then
only executed serially.

Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870
2014-06-05 20:07:57 -07:00
Nong Li
b5c5c05bcb Fix bad test. Needs to be an overwrite to allow loading from snapshot.
Change-Id: I7abe2a105d72662c874debfb2b9ae98647b03a1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2853
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-06-05 08:36:46 -07:00
Nong Li
e6b7565eff Fix decimal literal casting and cast expr reanalyze().
BigDecimal doesn't think about scale the way we need it to.

Change-Id: I09612c31e30e80ce4806080f1d24c6615090785e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2794
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-02 23:34:20 -07:00
Ippokratis Pandis
e34ede292c IMPALA-1016: Return correct number of NULL values when projecting newly added column
This patch handles the case where a query projecting a newly added column caused
the parquet scanner to return infinite values.

Change-Id: Ie5f4d4a88d5868e8d9e5c39fa9440821776dde3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2725
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2761
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
2014-06-01 01:28:25 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Float/Doubles are lossy so using those as the default literal type
is problematic.

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Nong Li
6e691f9500 IMPALA-1010: Remove Close() of build side in blocking join node.
This optimization is generally not safe since the probe side is still streaming. The
join node could acquire all of the data from the child into its own pool but then
there's no real point in doing this (doesn't lead to lower memory footprint and just
makes the mem accounting harder to reason about).

This is exposed in bushy plans.

Change-Id: I37b0f6507dc67c79e5ebe8b9242ec86f28ddad41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2747
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 11:50:50 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Matthew Jacobs
12b72c4330 IMPALA-1011: Handle SHOW DATA SOURCES when no sources configured
Change-Id: I367b90c7603aea973d442f9186a6b32598a66a28
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4df5c6d741237e9c91e84e39fd6ea760ccb40cf5)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2723
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-28 20:38:41 -07:00
Lenni Kuff
745c091fcc [CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats
Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-28 08:54:54 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored within the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we wait for the cache
requests to complete in the background; once they have finished, the table metadata
is automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Dimitris Tsirogiannis
ca86e470de IMPALA-887: Improve partition pruning time
This commit is the first step in improving the performance of partition
pruning. Currently, Impala can prune approximately 10K partitions per
sec, thereby introducing significant overhead for huge tables with a
large number of partitions. With this commit we reduce that overhead by
3X by batching the partition pruning calls to the backend.

Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687
2014-05-26 13:10:12 -07:00
Victor Bittorf
c13a1d080e IMPALA-938: Fix implicit casting in timestamp arithmetic exprs.
Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671
2014-05-23 14:11:35 -07:00
Alex Behm
b252921363 IMPALA-994: Handle incorrect column metadata in views created by Hive.
Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-05-19 20:17:23 -07:00
Matthew Jacobs
f9c9a7ca13 Add SHOW DATA SOURCES
Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614
2014-05-19 17:52:27 -07:00
Matthew Jacobs
6ccd56bc1f Enforce slot equivalences at data source scan nodes
Change-Id: I2ed606ba398990ab05afa3301b6356c6a636e2bb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2521
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 55061f6953956f45d433fe227ded539a648e3f9c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2536
2014-05-19 14:37:44 -07:00
Dimitris Tsirogiannis
a7a9cde86f CDH-18969: Incorrect query result in Impala
This commit fixes issue CDH-18969 where Impala returns wrong results
when querying an HBase table. This issue is triggered when a column family
sorts lexicographically before ":key", which is the column family of the
row key, thereby causing the wrong column to be used as a row key by the
backend.

The following changes are included:
1. Modified the load function in HBaseTable.java to make sure the
catalog object of an HBase table always stores the row key column first.
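
The ordering rule described above can be illustrated with a small Python sketch.
This is not the actual HBaseTable.java code: the Column tuple and the
order_columns helper are hypothetical names, and the sketch assumes the remaining
columns stay sorted by column family and qualifier.

```python
from typing import List, NamedTuple

ROW_KEY_FAMILY = ":key"  # column family of the row key, per the commit message


class Column(NamedTuple):
    """Hypothetical stand-in for a column stored in the catalog object."""
    name: str
    family: str
    qualifier: str


def order_columns(columns: List[Column]) -> List[Column]:
    """Place the row key column first, then sort the rest lexicographically."""
    return sorted(
        columns,
        # False sorts before True, so the ":key" column always comes first.
        key=lambda c: (c.family != ROW_KEY_FAMILY, c.family, c.qualifier),
    )


cols = [
    Column("v", "0cf", "v"),   # "0cf" sorts before ":key" in a plain sort
    Column("id", ":key", ""),
]
assert sorted(c.family for c in cols)[0] == "0cf"   # the problematic case
assert order_columns(cols)[0].family == ":key"      # row key stays first
```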

Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602
2014-05-18 16:29:11 -07:00
Skye Wanderman-Milne
edbbe6035e Decimal: read from Avro
Allows reading decimal columns with or without codegen. Includes tests
based on a data file posted on HIVE-5823.

Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6
(cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-05-16 22:26:11 -07:00
Alex Behm
fcf4e43a3c IMPALA-962: Fully qualify table and view names in toSql().
Change-Id: I6bf757c4ffbaf82c136af7b59d2d415234545a86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2373
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2589
2014-05-16 01:26:38 -07:00
Dimitris Tsirogiannis
2d7a8b7c70 IMPALA-964: Full outer join on values() followed by group by hits a
preconditions check

This commit fixes IMPALA-964 where full outer join between two inline
views followed by a group by (e.g. select 1 FROM (VALUES(1 x, 1 y)) a
FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y) GROUP BY a.x;)
hits a preconditions check. This check verifies that the numNodes
(number of nodes for the purpose of resource estimation) variable
is greater than or equal to zero and is triggered when we try to compute
the resource estimates (number of distinct values) of a plan fragment.

The following changes are included in this commit:
1. Modified the getNumDistinctValues function in PlanFragment class to
consider the special case where the numNodes of a plan fragment is -1.
2. Added a test case in QueryTest/joins.test.

Change-Id: I2962ed5079e174d0e76ad990ab84e1fb1a4607ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2466
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2514
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-05-11 19:30:38 -07:00
Victor Bittorf
0bb66ef327 Adding aliases ADD_MONTHS and SUB_MONTHS
This is a request for consistency with Oracle.

Change-Id: I463a66694a068cd773532d8f6f853a4b089b918a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2400
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1f0b643789596f96c54580b8c5262fada4dfc958)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2502
2014-05-09 17:35:29 -07:00
Matthew Jacobs
0c533bb152 External Data Source: Backend changes
Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-09 02:24:41 -07:00
Dimitris Tsirogiannis
1a21bb9b9e IMPALA-642: Conjunctive predicates on HBase table not working...
This commit fixes IMPALA-642 issue where conjunctive predicates are
returning incorrect results from HBase in the presence of NULL values.

The following changes are included:
1. Modified the HBaseScanNode to re-apply the "pushed-down" predicates.
2. Added tests in QueryTest/hbase-filters.test
3. Added tests in PlannerTest/hbase.test

Change-Id: I598b325ad63b043b325fba74448698ed71a3cd78
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2414
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2489
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-05-08 13:59:00 -07:00
Henry Robinson
38befd2126 IMPALA-724: Support infinite / nan values in text files
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.

Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.

In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).
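
The read and write behaviour described above can be sketched in Python. This is
only an illustration, not the text scanner itself: parse_text_float and
write_text_float are hypothetical helper names, is_inf and is_nan mirror the
builtins named in this message, and the "-Infinity" spelling for negative
infinity is an assumption.

```python
import math


def parse_text_float(field: str) -> float:
    """Parse a text field, accepting spellings such as 'inf', 'Infinity', 'nan'."""
    # Python's float() accepts these spellings case-insensitively.
    return float(field.strip())


def is_inf(x: float) -> bool:
    return math.isinf(x)


def is_nan(x: float) -> bool:
    return math.isnan(x)


def write_text_float(x: float) -> str:
    """Write special values as 'Infinity' and 'NaN' for Hive compatibility."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        # Assumption: negative infinity is written with a leading minus sign.
        return "Infinity" if x > 0 else "-Infinity"
    return repr(x)
```

Values written by write_text_float parse back through parse_text_float, so the
two helpers are consistent with each other.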

Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
2014-05-08 12:28:53 -07:00
Victor Bittorf
6f31dc7f8a Adding STDDEV builtin.
Change-Id: I79e5aee1e9e879aa2d09078ab45bc149675e1d4a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2341
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a42c375d933c0b7ffe7c9b6702777679492d7ad6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2464
2014-05-06 13:06:26 -07:00
Victor Bittorf
46151dc7dd Adding EXTRACT builtin.
Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370
2014-04-25 15:38:51 -07:00
Alex Behm
121fab8fdf IMPALA-888: Drop union operands with constant conjuncts evaluating to false.
This patch simplifies the complex slot materialization logic for unions by
making the materialization independent of conjuncts assigned to MergeNodes.
When 'pushing down' predicates into union operands, we drop union operands
with constant predicates evaluating to false. Constant predicates that
evaluate to true are simply ignored.

Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-23 18:25:14 -07:00
Henry Robinson
2a69019525 IMPALA-945: Fix column reordering with SELECT expressions
Previously, to produce the correct output expressions for the root plan
fragment before a table sink, InsertStmt would reorder the result
expressions for the query statement at the plan root. This had stopped
working for SelectStmts (and test coverage didn't catch that).

Now InsertStmt produces its own output expressions that can substitute
for the originals from the query statement, and the planner uses those
instead.

All query tests for column reordering have been duplicated to use SELECT
expressions.

Change-Id: Ib909fe35d27416b33ba2e5ac797aa931e1fe43f9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2204
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit d526db7ac6274f35b6affcb7428327100026e14e)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2275
2014-04-18 00:12:12 -07:00
Nong Li
1cab95066d Add the return type as a column for SHOW FUNCTIONS.
Also includes some misc pattern matching cleanup.

Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271
2014-04-17 17:58:13 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00
Skye Wanderman-Milne
e60bf29a96 IMPALA-13: Use SSE string functions that take an explicit length
This patch modifies DelimitedTextParser and StringValue to work with
data containing null characters by using SSE instructions that take a
length, rather than expecting null-terminated strings. It also adds
some other minor changes to correctly handle data with nulls and to
facilitate testing. I checked the execution time of a count(*) and a
select(*) limit 1 query locally, and saw no difference for either text
or sequence files.

Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181
2014-04-11 11:16:24 -07:00
Henry Robinson
37236845b1 Mark test_non_codegen_tinyint_grouping as execute_serially
The test contains an INSERT and some DDL, which is racy if performed in parallel.

Change-Id: I2b88533f45756fcf6372d6ee4eb7edd474087048
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2167
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 8b103c029cc341bacea4746c369bb58e6af5ed29)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2182
Tested-by: jenkins
2014-04-10 15:17:25 -07:00
Lenni Kuff
9e2dd7e049 Add support for SHOW PARTITIONS <table name>
This statement returns info on all partitions for the given table. It is implemented as
an alias for SHOW TABLE STATS, with some extended analysis checks (such as throwing if
the statement targets an unpartitioned table).

Change-Id: I19154a9d90314de18f86ba355aa5dbed808f147f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2145
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2179
Tested-by: jenkins
2014-04-10 12:15:39 -07:00
Henry Robinson
415540d789 IMPALA-901: Fix grouping with NULLs when codegen is disabled
The standard implementation of HashTable::Equals() did not correctly
check the NULL bit when the argument row did not evaluate to NULL for a
given probe expr. In the rare circumstance that this gave rise to a
false positive (more on that below), two rows with different grouping
values would be considered equal, and one would be excluded from the
final aggregation output.

HashTable::EvalRow() fills an expression value buffer with the values of
either probe or build exprs evaluated for the argument row. These cached
values are used to determine row equality in Equals(). In order to avoid
a lot of false collisions, an 'unlikely' value is written to that buffer
for NULL values, chosen to be HashUtil::FNV_SEED. So without correct
NULL-bit checking in Equals(), two single-slot rows are considered to be
equal if one of them has NULL for its slot, and the other has a value
equal to HashUtil::FNV_SEED truncated to the size of the slot.

For tinyint columns, this value is -59. As it happens, our random
generator happened to create a table with one tinyint column and which
contained NULL and -59 as values. In order to trigger this bug, the rows
must also have been written to disk in order such that the scanners
returned -59 *first*, and then NULL to the aggregation node; the bug is
not symmetric, and grouping works correctly in the opposite case.
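
The false positive can be reproduced with a small Python model. This is a
simplified sketch, not the HashTable code: eval_row, equals_buggy and
equals_fixed are hypothetical names, the split between a stored row and the
cached values of the incoming row is one reading of the description above, and
the FNV_SEED value is assumed to be the standard 32-bit FNV offset basis (which
is consistent with the -59 quoted for tinyint).

```python
from typing import Optional, Tuple

# Assumption: standard 32-bit FNV offset basis. The commit message only
# states that its tinyint truncation is -59.
FNV_SEED = 0x811C9DC5


def truncate_to_tinyint(value: int) -> int:
    """Keep the low byte and reinterpret it as a signed 8-bit integer."""
    low = value & 0xFF
    return low - 256 if low >= 128 else low


def eval_row(slot: Optional[int]) -> Tuple[int, bool]:
    """Cache (value, is_null) for a row; NULL is cached as the 'unlikely' value."""
    if slot is None:
        return truncate_to_tinyint(FNV_SEED), True
    return slot, False


def equals_buggy(stored: Optional[int], cached: Tuple[int, bool]) -> bool:
    cached_value, cached_is_null = cached
    if stored is None:
        return cached_is_null
    return stored == cached_value  # NULL bit of the cached value is ignored


def equals_fixed(stored: Optional[int], cached: Tuple[int, bool]) -> bool:
    cached_value, cached_is_null = cached
    if stored is None:
        return cached_is_null
    if cached_is_null:
        return False  # a non-NULL stored value never equals a cached NULL
    return stored == cached_value


assert truncate_to_tinyint(FNV_SEED) == -59
assert equals_buggy(-59, eval_row(None))       # -59 first, then NULL: collision
assert not equals_buggy(None, eval_row(-59))   # opposite order is unaffected
assert not equals_fixed(-59, eval_row(None))   # NULL-bit check removes the bug
```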

Change-Id: I17d43eaeee62b2ac01b67dd599bc4346b012a074
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2130
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e8098254280a9d5ead0b607263ca6728a3222a7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2161
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-07 17:30:52 -07:00
Alex Behm
a85dacafe8 IMPALA-904: Make TupleIsNullPredicate work on non-nullable tuples.
We wrap certain exprs substituted from an outer-joined inline view in an expr that
evaluates to NULL if the underlying tuple(s) are NULL. We do this for exprs that evaluate
to non-NULL values if their slots are NULL, i.e., we must then distinguish tuples that are
NULL from slots that are NULL (otherwise evaluating an expr against a tuple that is NULL
due to the outer join may incorrectly return a non-NULL value.)

The bug: Exprs referring to an outer-joined inline view may appear in various places
in the outer query block. For example, they could appear in an On-clause or be
placed into scans/aggregates due to predicate propagation. In such cases, the underlying
tuples may not be nullable yet because they only become nullable after the outer join.
We had a DCHECK in tuple-is-null-predicate.cc requiring the tuples to be nullable.
The fix: Remove the DCHECK. The fix is not elegant but practical. It would be rather
difficult to fix the inline view expr substitution such that a TupleIsNullPredicate
never references a non-nullable tuple, esp. due to predicate propagation.

Change-Id: I180f75f14173f356abfeec751e6b2d419378a9a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2157
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-07 14:18:49 -07:00
Nong Li
b0de4bbe40 IMPALA-812: Fix select node to properly transfer memory ownership.
Change-Id: I83b6d085362726aa080077845d3bef71b184621c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2076
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-25 18:38:55 -07:00
Skye Wanderman-Milne
3e728f3180 Symbol mangling for UDF prepare/close functions
Change-Id: If8f1386073f467e66ada74e606fc98f3344f0733
(cherry picked from commit 32df8b3f963a2b46ec33aad86a151d4c7ecda39c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-03-19 02:15:07 -07:00
Nong Li
457055f8f4 IMPALA-892: Fix subexpr for IR generated from compound predicate.
Change-Id: I638533827e97f3486eb75a571b18f9e8d1cd4aed
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1973
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-18 16:49:34 -07:00
Skye Wanderman-Milne
44125729dc UDF/UDA memory management improvements
* AggFnEvaluator now uses the UDF mem pool (I'm planning to change
  this to per-exec node pools in the expr refactoring)
* FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker
* Added FunctionContextImpl::Close() which sets warnings for leaked allocations

Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954
Tested-by: jenkins
2014-03-17 20:38:25 -07:00
Henry Robinson
635dd7d289 IMPALA-875: Respect isAnalyzed_ in IntLiteral expressions
Partition column expressions are analysed twice for INSERT statements -
once to infer the type and so to add a possible cast, and once to
compute stats on the resulting expr. However, this process resulted in
a partition column expr that was an IntLiteral getting the smallest type
that would contain its value, rather than retaining the
column-compatible type that had been assigned to it.

This patch does the minimum thing, which is to make IntLiteral.analyze()
idempotent. Doing the same thing to Expr and LiteralExpr unearths some
other bugs, which we will have to fix in a follow-on patch (see
IMPALA-884).
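
A minimal Python sketch of the idempotency idea follows. It is an illustration
only, not the frontend Java code: the class below is a hypothetical stand-in for
IntLiteral, and smallest_int_type and cast_to are assumed helper names.

```python
from typing import Optional


def smallest_int_type(value: int) -> str:
    """Return the smallest integer type name that can hold the value."""
    for name, bits in (("TINYINT", 8), ("SMALLINT", 16), ("INT", 32), ("BIGINT", 64)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return name
    raise ValueError("value does not fit in BIGINT")


class IntLiteral:
    """Hypothetical stand-in for the frontend IntLiteral expression."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.type: Optional[str] = None
        self.is_analyzed = False

    def analyze(self) -> None:
        if self.is_analyzed:
            return  # idempotent: keep the type assigned after the first pass
        self.type = smallest_int_type(self.value)
        self.is_analyzed = True

    def cast_to(self, target_type: str) -> None:
        """Assign the column-compatible type chosen during the first analysis."""
        self.type = target_type


lit = IntLiteral(1)
lit.analyze()        # first pass infers TINYINT
lit.cast_to("INT")   # cast added to match the partition column
lit.analyze()        # second pass must not reset the type
assert lit.type == "INT"
```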

Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947
2014-03-17 17:35:12 -07:00
Lenni Kuff
dd20958e5d Minor test cleanup
* Prefer 'refresh <table name>' over 'invalidate metadata'
* Remove the 'RELOAD' test setup option that was used by only 1 test.
* Delete a .py test file that seems to be a duplicate

Change-Id: I890546635840bb8f4d55789a89f8c8f33e40d001
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1933
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1946
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-17 17:30:15 -07:00
Srinath Shankar
74a975c45b IMPALA-862: count(x) may return null when a similar count(distinct x) is also used
count(x) with no distinct and no group-by expressions returns NULL on empty input
if other distinct aggs (e.g. COUNT(distinct x)) are present.
This happens because the COUNT is transformed to SUM(COUNT()),
with the inner COUNT being evaluated WITH a group-by expression (e.g. x).
SUM over empty input returns NULL, but COUNT should return 0.

This patch fixes this by replacing COUNT with zeroifnull(COUNT) before AggregateInfo
is generated if there are distinct aggs and no group-bys. The logic in AggregateInfo
itself has not been modified.
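
The effect of the rewrite can be modelled in Python. This is a sketch of the
semantics only, not of AggregateInfo: sql_sum, inner_counts and zeroifnull are
hypothetical helpers, and None stands for SQL NULL.

```python
from typing import Dict, Iterable, List, Optional


def sql_sum(values: Iterable[int]) -> Optional[int]:
    """SQL SUM: returns NULL (None) on empty input."""
    vals = list(values)
    return sum(vals) if vals else None


def inner_counts(rows: Iterable[Optional[int]]) -> List[int]:
    """COUNT(x) evaluated with GROUP BY x: one count per group."""
    groups: Dict[Optional[int], int] = {}
    for x in rows:
        groups.setdefault(x, 0)
        if x is not None:
            groups[x] += 1  # COUNT(x) skips NULLs
    return list(groups.values())


def zeroifnull(value: Optional[int]) -> int:
    return 0 if value is None else value


# On empty input there are no groups, so SUM(COUNT()) yields NULL.
assert sql_sum(inner_counts([])) is None
# Wrapping the result in zeroifnull restores COUNT semantics.
assert zeroifnull(sql_sum(inner_counts([]))) == 0
assert zeroifnull(sql_sum(inner_counts([1, 1, None, 2]))) == 3
```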

Change-Id: I902e3fdd95767135b2f3fe423e8802ef57366af1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1921
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-03-14 23:35:55 -07:00
Alex Behm
ce40134ad0 IMPALA-867: Fail COMPUTE STATS in analysis for Avro tables affected by HIVE-6308.
Avro tables that were not created with a column-definition list do not have
their columns properly populated in the Metastore backend DB (HIVE-6308).
For such tables COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed.
This patch fails COMPUTE STATS in analysis for such broken Avro tables
and adds tests for Avro tables with a mismatched column-definition list
and Avro schema.

Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920
2014-03-14 23:24:55 -07:00
Lenni Kuff
aa0b7a35f5 IMPALA-880: COMPUTE STATS should update partitions in batches
When updating partition metadata as part of COMPUTE STATS we would previously
attempt to update all partitions at once. This could lead to HMS socket timeouts
and also could run into issues if there were > 32K partitions.

In this change we now update the partitions in batches, with a max size of 500
partitions per batch. We also compare whether the row count has changed and only
update partitions that have been modified.
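
The batching scheme can be sketched in Python as follows. This is illustrative
only and does not call the Hive Metastore: changed_partitions and batches are
hypothetical helpers, and partitions are represented simply by name and row
count.

```python
from typing import Dict, Iterator, List

MAX_BATCH_SIZE = 500  # maximum partitions per batch, per the commit message


def changed_partitions(old: Dict[str, int], new: Dict[str, int]) -> List[str]:
    """Return the partitions whose row count differs from the stored one."""
    return [name for name, count in new.items() if old.get(name) != count]


def batches(items: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


old_counts = {"p%d" % i: 0 for i in range(1200)}
new_counts = dict(old_counts)
for i in range(1100):
    new_counts["p%d" % i] = 1  # only these partitions changed

to_update = changed_partitions(old_counts, new_counts)
assert [len(b) for b in batches(to_update)] == [500, 500, 100]
```

Each yielded batch would correspond to one metastore update call, which keeps
the size of any single request bounded.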

Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918
2014-03-14 19:20:12 -07:00
Alex Behm
15e05082c0 IMPALA-831: Distributed aggregation and top-n over unions.
Change-Id: I056e8271421008378db93e8b2393861cc9dd4b90
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1886
2014-03-13 15:42:31 -07:00
Lenni Kuff
cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
  unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
  for ASCII character values between 128-255 (-2 maps to ASCII 254).

Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.
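
The three accepted formats can be sketched in Python. This is an illustration
of the rules listed above rather than the actual validation code:
parse_delimiter is a hypothetical helper, and treating any one-character input
as a literal character (so '1' means the character 1, not the integer 1) is an
assumption.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]+)")
_SIGNED_INT = re.compile(r"-?[0-9]+")


def parse_delimiter(spec: str) -> int:
    """Return the delimiter as an unsigned byte value (0..255)."""
    if len(spec) == 1:
        code = ord(spec)  # a single ASCII or unicode character, e.g. '|'
    else:
        octal = _OCTAL_ESCAPE.fullmatch(spec)
        if octal:
            code = int(octal.group(1), 8)  # octal escape, e.g. \001 -> 1
        elif _SIGNED_INT.fullmatch(spec):
            number = int(spec)
            if not -128 <= number <= 127:
                raise ValueError("integer delimiter out of range [-128:127]")
            code = number & 0xFF  # two's complement: -2 -> 254
        else:
            raise ValueError("unrecognized delimiter format: %r" % spec)
    if code > 255:
        raise ValueError("delimiter must fit in a single byte")
    return code


assert parse_delimiter("|") == 124
assert parse_delimiter("\\001") == 1
assert parse_delimiter("-2") == 254
```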

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
2014-03-13 13:00:15 -07:00
Alex Behm
7fcd7cd64e Add list of tables missing stats to explain header and mem-limit exceeded error.
Change-Id: Ibe8f329d5513ae84a8134b9ddb3645fa174d8a66
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1501
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1880
2014-03-12 21:15:22 -07:00
Alex Behm
58950a52a3 IMPALA-798: Distributed execution of CTAS and explain CTAS.
Change-Id: I32004a4b31c54cf5c185169fece143a61213d12d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1850
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1867
2014-03-12 16:51:50 -07:00
Matthew Jacobs
8fa8a0f828 IMPALA-843: Do not close reader contexts until plan fragment close
Fixes a crash that occurs in some cases when io buffers are still used and
child nodes are closed early. We close child nodes early when all rows have
been consumed and resources are transferred, but in some cases io buffers are
still in use when a scan node is closed. We avoid this problem by only
closing reader contexts when the entire fragment is closed.

Change-Id: Ie62cdecdcd530bdc61dd4e83cd9ecfc7d2c93ef6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1806
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 66f14a47b953b7b7153c73f4e018d03461dcd5ef)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1859
2014-03-12 14:44:18 -07:00
Alex Behm
748ea3f38b Fix test_partitioning.py and expected results.
Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-12 11:25:17 -07:00