Commit Graph

192 Commits

Author SHA1 Message Date
Matthew Jacobs
2f9b2ae785 Fix SHOW DATA SOURCE test; must execute setup/cleanup serially
The SHOW DATA SOURCE tests were run as part of the other SHOW * tests
in test_show(), but the setup/cleanup for data sources can't be run
in parallel. This change moves the SHOW DATA SOURCE tests into a separate
test method and the setup/cleanup code is only run for this test (i.e.
not using setup_method() and teardown_method()). The test is then
only executed serially.

Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870
2014-06-05 20:07:57 -07:00
Nong Li
5d80942d42 [CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads.
Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 19:14:39 -07:00
Nong Li
84f851b5a5 IMPALA-959: Fix ASAN decimal crashes.
Not quite sure what the underlying issue is but these fixes seem to work.

Change-Id: I759804eb8338ba86969c0214a1e6e35588c94297
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2726
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-30 16:47:07 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored with in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exists early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from access the time during this period we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Matthew Jacobs
f9c9a7ca13 Add SHOW DATA SOURCES
Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614
2014-05-19 17:52:27 -07:00
Skye Wanderman-Milne
edbbe6035e Decimal: read from Avro
Allows reading decimal columns with or without codegen. Includes tests
based on a data file posted on HIVE-5823.

Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6
(cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-05-16 22:26:11 -07:00
ishaan
0298e8b6ab Fix the ASAN build by xfailing test_decimal when ASAN_OPTIONS is set.
Adding decimal columns crashes an ASAN built impalad. This change skips the test.

Change-Id: Ic94055a3f0d00f89354177de18bc27d2f4cecec2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2532
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2594
2014-05-16 18:14:30 -07:00
Matthew Jacobs
19f34d9187 test_data_source_tables should only run for 'text/none'
Change-Id: I784fb4305f8cff92c2582b0a7008f836c7aa9fa4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2504
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9f3621ec5d270c60258e93e8a2a596329c31f4e6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2508
2014-05-09 19:32:18 -07:00
Matthew Jacobs
0c533bb152 External Data Source: Backend changes
Change-Id: Ifa62b4ea231da47facb31c3f8d43e5e3ac73591f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2284
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f1e5db2853135c4346788192e2dbc632d4fe1dfb)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2497
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-09 02:24:41 -07:00
Henry Robinson
38befd2126 IMPALA-724: Support infinite / nan values in text files
This patch allows the text scanner to read 'inf' or 'Infinity' from a
row and correctly translate it into floating-point infinity. It also
adds is_inf() and is_nan() builtins.

Finally, we change the text table writer to write Infinity and NaN for
compatibility with Hive.

In the future, we might consider adding nan / inf literals to our
grammar (postgres has this, see:
http://www.postgresql.org/docs/9.3/static/datatype-numeric.html).

Change-Id: I796f2852b3c6c3b72e9aae9dd5ad228d188a6ea3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2393
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 58091355142cadd2b74874d9aa7c8ab6bf3efe2f)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2483
2014-05-08 12:28:53 -07:00
Nong Li
03e5665e56 Decimal: Read/Write to parquet.
This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and
encoding for decimals.

Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432
2014-05-02 16:38:35 -07:00
Nong Li
5adbcbbce5 Update decimal tests to only run on text/none.
Change-Id: I9a35f9e1687171fc3f06c17516bca2ea4b9af9e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2217
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2431
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-02 12:18:37 -07:00
Nong Li
bb3feb675e Dynamically scale down mem usage in scanners and io mgr.
This patch scales down the amount of buffering in the io mgr and the number
of scanner threads if the query is close to mem limits.

Change-Id: I68ef247a68642939b98ec7c429dfd393b23a20d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1906
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2417
2014-05-01 15:04:07 -07:00
Skye Wanderman-Milne
60db4d4d82 CDH-18416: Don't inline ReadWriteUtil::ReadZLong()
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.

This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.

Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
2014-04-28 15:58:15 -07:00
Skye Wanderman-Milne
bd2fc2d1d4 IMPALA-934: Refresh cached UDF library when creating a new function
This change adds the ability to refresh a local cache entry, causing
the old cache entry to be dropped and the library to be reloaded from
HDFS. This is used in ResolveSymbolLookup(), which is called by the
frontend when creating a new a function, and in ImpalaServer when
receiving a "create function" heartbeat. This change also makes sure
the FE calls into the backend for jars, so jars get refreshed as well.

Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348
2014-04-24 19:39:16 -07:00
Lenni Kuff
bb09b5270f IMPALA-839: Update tests to be more thorough when run exhaustively
Some tests have constraints that were there only to help reduce runtime which
reduces coverage when running in exhaustive mode. The majority of the constraints
are because it adds no value to run the test across additional dimensions (or
it is invalid to run with those dimensions). Updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()

These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.

Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
2014-04-18 20:11:31 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00
ishaan
5803e6883e Cleanup and re-enable some tests in TestPartitionMetadata
Partition metadata tests were marked as xfail because of IMPALA-624. Additionally, we had
to invoke hive to insert into two partitions pointing to the same location (this
limitation is now removed). This patch changes the test to use Impala exclusively,
removes the xfail tag and adds a teardown method to the test class.

Change-Id: I15fa97bef4f8714d0873a9c713627a198f3388ad
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2086
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2215
2014-04-13 17:55:43 -07:00
ishaan
0e0c480262 Re-enable some tests in test_describe_formatted
A few tests which dealt with running queries via hs2 and impala were marked as xfail as
hiveserver2 would occasionally not come up. Given that we now have a script that checks
whether hiveserver2 is up before continuining the build, it should be safe to remove the
xfail.

Change-Id: I2b5063e7259c01fc0ef8ffda86d85514c9cf959c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2082
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2214
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-13 17:51:45 -07:00
ishaan
6f416dd2c2 Close all queries in test_cancellation
The queries in test_cancellation are currently cancelled but not closed, causing some test
queries to eventually time out because the admission controller limits are passed. This
patch ensures that all queries issued in test_cancellation are closed.

Change-Id: I65b26672155e31889bb6f43d3ac87be0f7b4eb72
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2187
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2213
2014-04-13 17:45:51 -07:00
Nong Li
1a3caca8c4 [CDH5] Update execution engine to take advantage of DN caching.
This finishes up the support to use HDFS caching. The scheduler will
prefer replicas that are cached and the scan node plumbs the metadata
to the io mgr.

This is a bit hard to test without a cluster and some perf benchmarking.
I've added a basic test to make sure the path is being exercised.

Change-Id: I8762ca9ef2f88c3637113d3c5ee82f4c0ea7f1be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2212
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-13 17:11:21 -07:00
Skye Wanderman-Milne
e60bf29a96 IMPALA-13: Use SSE string functions that take an explicit length
This patch modifies DelimitedTextParser and StringValue to work with
data containing null characters by using SSE instructions that take a
length, rather than expecting null-terminated strings. It also adds
some other minor changes to correctly handle data with nulls and to
faciliate testing. I checked the execution time of a count(*) and a
select(*) limit 1 query locally, and saw no difference for either text
or sequence files.

Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181
2014-04-11 11:16:24 -07:00
Alex Behm
2fff51d9e9 IMP-1329,IMPALA-924: Make ExchangeNode::Open() block until rows are available.
The bug: Coordinator::Wait() is supposed to block until rows become available for
consumption by the client. We rely on Wait() to determine when to advance the query
status to a 'ready' state and signal to the client that rows can be fetched.
Long fetch times can trigger client timeouts at various levels (socket, app, etc.).
Coordinator::Wait() simply opens the coordinator fragment's plan tree.
For most plan nodes, Open() does work to prepare the plan tree, s.t., GetNext()
returns quickly. However, for ExchangeNodes Open() used to not wait
until rows are obtained form the underlying stream receiver.
The fix: Make ExchangeNode::Open() block until rows are available.

Change-Id: I7b197eea11d21fd732414d96c899a17b2d99631c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2185
2014-04-10 23:49:38 -07:00
Henry Robinson
37236845b1 Mark test_non_codegen_tinyint_grouping as execute_serially
The test contains an INSERT and some DDL, which is racy if performed in parallel.

Change-Id: I2b88533f45756fcf6372d6ee4eb7edd474087048
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2167
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 8b103c029cc341bacea4746c369bb58e6af5ed29)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2182
Tested-by: jenkins
2014-04-10 15:17:25 -07:00
Henry Robinson
415540d789 IMPALA-901: Fix grouping with NULLs when codegen is disabled
The standard implementation of HashTable::Equals() did not correctly
check the NULL bit when the argument row did not evaluate to NULL for a
given probe expr. In the rare circumstance that this gave rise to a
false positive (more on that below), two rows with different grouping
values would be considered equal, and one would be excluded from the
final aggregation output.

HashTable::EvalRow() fills an expression value buffer with the values of
either probe or build exprs evaluated for the argument row. These cached
values are used to determine row equality in Equals(). In order to avoid
a lot of false collisions, an 'unlikely' value is written to that buffer
for NULL values, chosen to be HashUtil::FNV_SEED. So without correct
NULL-bit checking in Equals(), two single-slot rows are considered to be
equal if one of them has NULL for its slot, and the other has a value
equal to HashUtil::FNV_SEED truncated to the size of the slot.

For tinyint columns, this value is -59. As it happens, our random
generator happened to create a table with one tinyint column and which
contained NULL and -59 as values. In order to trigger this bug, the rows
must also have been written to disk in order such that the scanners
returned -59 *first*, and then NULL to the aggregation node; the bug is
not symmetric and works in the opposite case.

Change-Id: I17d43eaeee62b2ac01b67dd599bc4346b012a074
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2130
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e8098254280a9d5ead0b607263ca6728a3222a7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2161
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-07 17:30:52 -07:00
Henry Robinson
99c37aac37 IMPALA-827: Add an option for directories created by INSERT to inherit
their parent's permissions

This patch adds --insert_inherit_permissions. If true, all
new partition directories created by INSERT will inherit their
permissions from their parent. When false, the directories are created
with the default permissions.

Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-04 10:25:20 -07:00
Skye Wanderman-Milne
8e9776b824 Mark TestUdfs.test_mem_limits to run serially
This was causing other tests to fail with process mem limit exceeded.

Change-Id: I1407b0896052aece691c681827994961b09d8103
(cherry picked from commit 2bcc46117f504f50ded724fddf74f24bd829c6c6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2003
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-03-19 14:18:11 -07:00
Skye Wanderman-Milne
3e728f3180 Symbol mangling for UDF prepare/close functions
Change-Id: If8f1386073f467e66ada74e606fc98f3344f0733
(cherry picked from commit 32df8b3f963a2b46ec33aad86a151d4c7ecda39c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1993
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-03-19 02:15:07 -07:00
Lenni Kuff
9c3b318112 Fix test_compressed_formats to properly pull in tbl created in Hive
Change-Id: I4e143826e5900ebfa6f77023ae4cf0d2c71db190
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1960
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1967
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-18 13:24:10 -07:00
Lenni Kuff
b7432cd68a Constrain test_explain to run only on text/none table format
The tests expect to be run against text/none tables which causes failures
on exhaustive test runs.
I don't think it adds any extra coverage to run these tests against lzo
format so added a constraint.

Change-Id: Ib0878e2ba84107c9df4499def304fe45ba4fe4b4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1884
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1964
Tested-by: jenkins
2014-03-18 11:51:19 -07:00
Skye Wanderman-Milne
44125729dc UDF/UDA memory management improvements
* AggFnEvaluator now uses the UDF mem pool (I'm planning to change
  this to per-exec node pools in the expr refactoring)
* FunctionContext::TrackAllocation()/Free() actually use the UDF's mem tracker
* Added FunctionContextImpl::Close() which sets warnings for leaked allocations

Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954
Tested-by: jenkins
2014-03-17 20:38:25 -07:00
Lenni Kuff
d7c06486e1 Disable flaky explain tests due to inconsistent per-host mem requirements
Change-Id: Ie372696c4986dc7f7c8f7fc074c41b89bd65f456
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1939
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit ed4cb660b7a60d9b9248df525c477bab4d218c4b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1953
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-17 17:42:21 -07:00
Henry Robinson
635dd7d289 IMPALA-875: Respect isAnalyzed_ in IntLiteral expressions
Partition column expressions are analysed twice for INSERT statements -
once to infer the type and so to add a possible cast, and once to
compute stats on the resulting expr. However, this process resulted in
an partition column expr that was a IntLiteral getting the smallest type
that would contains its value, rather than retaining the
column-compatible type that had been assigned to it.

This patch does the minimum thing, which is make IntLiteral.analyze()
idempotent. Doing the same thing to Expr and LiteralExpr unearths some
other bugs, which we will have to fix in a follow-on patch (see
IMPALA-884).

Change-Id: Ie22fc5d3f4832c735a1ebc0ef78f50d736f597fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1931
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1912d65ea21a5025d385948642f0d4aadad91abf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1947
2014-03-17 17:35:12 -07:00
Lenni Kuff
dd20958e5d Minor test cleanup
* Prefer 'refresh <table name>' over 'invalidate metadata'
* Remove the 'RELOAD' test setup option that was used by only 1 test.
* Delete a .py test file that seems to be a duplicate

Change-Id: I890546635840bb8f4d55789a89f8c8f33e40d001
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1933
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1946
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-17 17:30:15 -07:00
Skye Wanderman-Milne
be18bd8f76 IMPALA-752: Improve INSERT error message for unsupported file formats
Change-Id: Ib16817d6e49d3df30643563eb9ec5573a920bba7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1911
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9e93c237fde1877eb0d140e73b090f2b891f3474)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1941
2014-03-17 14:54:46 -07:00
Nong Li
88a54dc532 Restrict parquet many cols test to one test dimension.
Change-Id: Ib7e1a63a9981fc646899b627748b523119d9a5d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1928
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-03-16 15:37:37 -07:00
ishaan
2caf687d8d Temporarily disable explain tests for explain levels 2 and 3
The explain tests that verify we detect missing stats properly is failing for avro. This
change disables the test to unblock the full data load build.

Change-Id: I0a7f54dbf1e8a3ebb557250287e7e0491aaa27f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1925
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-03-15 20:13:20 -07:00
Lenni Kuff
aa0b7a35f5 IMPALA-880: COMPUTE STATS should update partitions in batches
When updating partition metadata as part of COMPUTE STATS we would previously
attempt to update all partitions at once. This could lead to HMS socket timeouts
and also could run into issues if there were > 32K partitions.

In this change we now update the partitions in batches, with a max size of 500
partitions per batch. We also compare whether the row count has changed and only
update partitions that have been modified.

Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918
2014-03-14 19:20:12 -07:00
Nong Li
6629c32d6a IMPALA-742: Fix unhandled metadata reading code path in parquet scanner.
Change-Id: I1dd6364d148fed881020c045ece635b1601f86bb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1836
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-14 16:08:45 -07:00
Lenni Kuff
cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
  unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
  for ASCII character values between 128-255 (-2 maps to ASCII 254).

Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
2014-03-13 13:00:15 -07:00
Alex Behm
748ea3f38b Fix test_partitioning.py and expected results.
Change-Id: I21148f3a10abbda4f9e587f83cbabdd2a79c6147
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1861
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1866
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-12 11:25:17 -07:00
Lenni Kuff
08417c875f IMPALA-849: Impala does not work with boolean partition key columns
This is because in HdfsTable we call call "expr.castTo(colType)", but BooleanLiteral
(incorrectly) didn't implement "uncheckedCastTo()". This meant that instead of a
BooleanLiteral being returned we got back a CastExpr, which cannot be cast to LiteralExpr.

As part of this change it turns out Boolean partition columns are also broken in Hive. I
filed HIVE-6590 for these issues and we decided to disable INSERT into a boolean partition
column for Impala due to this bug.

Change-Id: I3e295bb96aadc08d64faf551f6393a7128a7ef27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1755
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-03-11 18:42:08 -07:00
Henry Robinson
05c8e4da93 IMPALA-624: Inserts should respect changes in partition location
Impala would ignore changes in a partition's location (by ALTER TABLE
... SET LOCATION ...).

Change-Id: I9fdc1f09f9d848aa1a4ade3d4f35f8de9cbd18a5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1647
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1824
2014-03-08 13:21:06 -08:00
Lenni Kuff
23c619f794 Limit test_udfs to always run with a single exec_option test vector
Change-Id: If3ff1f5f17a95cce88282f9dc165fe5ce85200b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1781
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1811
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-07 18:44:11 -08:00
Skye Wanderman-Milne
6ceed1e632 UDF API additions
This patch introduces the ability to specify a prepare and close
function for a UDF, as well as FunctionContext methods for maintaining
state across UDF invocations within a query. Many of the changes are
related to adding an Expr::Open() function which calls the UDF's
prepare function, if specified (it has to be called in Open() since
the LLVM module must be compiled first).

Change-Id: I581d90d03dff71f7ff5d4a6bef839ba6bc46b443
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1693
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8e2ed7fb9051d98f89327715fdebd6f5ed22d6ee)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1757
2014-03-05 07:32:34 -08:00
Alex Behm
69a840d965 Consistent memory estimates for explain tests.
Our new build machines (e.g., beefy) have more cores than our other machines,
so scan nodes may have a different memory estimate causing the explain tests
to fail. This patch fixes the num_scanner_threads to 1 for explain tests
to ensure consisteny estimates.

Change-Id: Ie6194f3c3b17d04aa141d04fcddb7ac948e92fcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1735
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-03-05 05:38:30 -08:00
Skye Wanderman-Milne
68f6d57809 More robust initialization of ScannerContext
This commit is in conjunction with the "Fix missing status check in
lzo scanner." commit in the LZO repo. It provides a test case for the
LZO fix, and changes the ScannerContext initialization so it will fail
more gracefully instead of crashing.

Change-Id: Idcafeb3679a8fa54322d1ec31c6f1aba860e4e4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1680
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9b84e3514c618bb3e171b5b3bb2ff862af4d35cc)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1752
2014-03-04 21:39:47 -08:00
Skye Wanderman-Milne
203fc66456 Add GetTypeDesc() method to FunctionContext.
This is currently only implemented for NativeUdfExpr.

Change-Id: I81b442c5668dff43d0486d1cfc445bca2af66606
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1664
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e1087c3a78e6e12938b583c302907bd32c59f524)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1720
2014-03-01 20:24:30 -08:00
Lenni Kuff
d6cbd3dc44 Ensure db/table names are always case insensitive in catalog topic entry keys
This fixes a bug that can happen with 'invalidate metadata <table name>' if the following
sequences of events happens:

1) Table is created in Impala (table names are always treated as lower case)
2) Table is dropped and re-created in Hive, using the same name but different casing
3) invalidate metadata <table name> is run in Impala, which will update the existing
   table with the version from the Hive metastore.

When building the next statestore update, the catalog server will send an update out
thinking that the table from 1) was dropped and the table from 3) was added because
the topic entry key is case sensitive. This may incorrectly remove the table from
an impalad's catalog. The fix is to always treat db/table names as case insensitive.

Change-Id: Ib59edc403989781bf12e0405c0ccd37b8e41ee41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1634
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1637
2014-02-23 00:20:16 -08:00