2266 Commits

Author SHA1 Message Date
Victor Bittorf
808f9a661a IMPALA-939: Regex should match anywhere in string.
Change-Id: I8dcd337c3b06b632017270670a4f199ec7ada648
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2296
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c97f82eaaf0efe9bd4c3da3d005464f425696a62)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2371
2014-04-25 16:16:15 -07:00
Victor Bittorf
46151dc7dd Adding EXTRACT builtin.
Change-Id: I6de20f336ecdfa3acd8d3a9166cff4a062baaacc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2247
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f233955020ffbd1023f2d6adbbfb22e267986305)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2370
2014-04-25 15:38:51 -07:00
Alex Behm
91e1eb0789 CDH-18563: Speed up the computation of transitive value transfers.
The issue: Computing the full transitive closure for all slots can be very
expensive (10s of seconds for >2k slots, minutes for >4k slots).
Queries with many views and/or unions were affected most because each
union/view adds a new tuple with slots, increasing the total number of slots.

The fix: The new algorithm exploits the sparse structure of the value transfer
graph for a significant speedup (>100x). The high-level steps are:
1. Identify complete subgraphs based on bi-directional value transfers, and
   coalesce the slots of each complete subgraph into a single slot.
2. Map the remaining uni-directional value transfers into the new slot domain.
3. Identify the connected components of the uni-directional value transfers.
   This step partitions the value transfers into disjoint sets.
4. Compute the transitive closure of each partition from (3) in the new slot
   domain separately. Hopefully, the partitions are small enough to afford
   the O(N^3) complexity of the brute-force transitive closure computation.
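
The four steps above can be sketched as follows. This is only an illustration in Python, not the planner's actual code: the function name, the integer slot ids and the edge-set representation are all assumptions made for the example.

```python
from collections import defaultdict


def transitive_value_transfers(num_slots, edges):
    """Return all (src, dst) slot pairs connected by a chain of value transfers.

    edges is an iterable of directed (src, dst) pairs over slots 0..num_slots-1.
    """
    edges = set(edges)

    def make_find(parent):
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        return find

    # Step 1: coalesce slots linked by bi-directional transfers.
    group = list(range(num_slots))
    rep = make_find(group)
    for a, b in edges:
        if (b, a) in edges:
            group[rep(a)] = rep(b)

    # Step 2: map the remaining transfers into the coalesced slot domain.
    mapped = {(rep(a), rep(b)) for a, b in edges if rep(a) != rep(b)}

    # Step 3: partition the mapped transfers into connected components.
    comp = list(range(num_slots))
    comp_of = make_find(comp)
    for a, b in mapped:
        comp[comp_of(a)] = comp_of(b)
    partitions = defaultdict(set)
    for a, b in mapped:
        partitions[comp_of(a)].update((a, b))

    # Step 4: brute-force (Warshall) closure, one partition at a time.
    closure = set(mapped)
    for nodes in partitions.values():
        for k in nodes:
            for i in nodes:
                if (i, k) in closure:
                    for j in nodes:
                        if (k, j) in closure:
                            closure.add((i, j))

    # Expand the result back to the original slots.
    return {
        (s, t)
        for s in range(num_slots)
        for t in range(num_slots)
        if s != t and (rep(s) == rep(t) or (rep(s), rep(t)) in closure)
    }
```

The cubic loop only ever runs over the slots of one partition, which is where the speedup on sparse graphs would come from.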

Change-Id: I35b57295d8f04b92f00ac48c04d1ef1be4daf41b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2360
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-24 23:53:28 -07:00
Henry Robinson
0117a92268 CDH-18751: Impala should use a unique Kerberos credentials cache location
Kerberos, by default, uses a location for its credentials cache (where
tickets are saved to avoid having to re-authenticate on each connection)
that is unique only to the user. This means that if several programs
running as the same user try and authenticate concurrently, they may
race to access the cache, and an error might occur.

This patch causes Impala's daemons to use a path that is unique to
process name and process id (e.g. /tmp/krb5_impalad_4324). The new flag
--krb_credential_cache can be used to set the location explicitly.

Any Kerberos connections initiated from Java will still use the system
default cache. The catalog service is the only such service. It's
important that the cache locations for Java and C++ are different (to
avoid the same bug being exposed in a single process, which is actually
its most likely cause). The Java location can be changed by setting the
KRB5CCNAME environment variable.
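
A minimal sketch of how such a per-process cache path could be derived, in Python for illustration. The path layout follows the example in this message (/tmp/krb5_impalad_4324); the function names and the way the flag value is passed in are assumptions, not the actual implementation.

```python
import os


def default_credential_cache(process_name, pid=None, tmp_dir="/tmp"):
    """Build a cache path unique to the process name and process id."""
    pid = os.getpid() if pid is None else pid
    return f"{tmp_dir}/krb5_{process_name}_{pid}"


def resolve_credential_cache(flag_value, process_name, pid=None):
    """Prefer an explicitly configured location, else use the default."""
    if flag_value:
        return flag_value
    return default_credential_cache(process_name, pid)
```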

Change-Id: I4d975a3a7cf5ff6ee4c107833f5292115cd33b03
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2308
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 36327bd74c76dc69207d148cc401190fe5a9a50c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2351
2014-04-24 19:42:50 -07:00
Skye Wanderman-Milne
bd2fc2d1d4 IMPALA-934: Refresh cached UDF library when creating a new function
This change adds the ability to refresh a local cache entry, causing
the old cache entry to be dropped and the library to be reloaded from
HDFS. This is used in ResolveSymbolLookup(), which is called by the
frontend when creating a new function, and in ImpalaServer when
receiving a "create function" heartbeat. This change also makes sure
the FE calls into the backend for jars, so jars get refreshed as well.

Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348
2014-04-24 19:39:16 -07:00
Henry Robinson
a3b8215956 Fix two unused variable warnings
Change-Id: Ib970d8300ba289455fcbc6843001c8ec6844e009
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2340
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 7584189b8c9d33a109d01047e9c86978a8b9b78d)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2349
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-24 19:33:40 -07:00
ishaan
405a6fbba3 [CDH5] Change the hdfs-site template to work for CDH5
The hdfs-site template in CDH5 is different from the one we find in CDH5. Specifically:
  - It has entries that enable hdfs caching.
  - It uses the correct parameter name for hdfs block locations timeout.

Change-Id: I0ca6bd84b074ccbb8f42243d37c5082b305f9bcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2338
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-04-24 11:36:56 -07:00
Alex Behm
121fab8fdf IMPALA-888: Drop union operands with constant conjuncts evaluating to false.
This patch simplifies the complex slot materialization logic for unions by
making the materialization independent of conjuncts assigned to MergeNodes.
When 'pushing down' predicates into union operands, we drop union operands
with constant predicates evaluating to false. Constant predicates that
evaluate to true are simply ignored.
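
The pruning rule can be illustrated with a small Python sketch. The data model is an assumption made for the example: each operand carries a list of conjuncts, where True and False stand for constant predicates and any other value stands for a non-constant predicate.

```python
def prune_union_operands(operands):
    """Drop operands with a constant-false conjunct; ignore constant-true ones.

    operands is a list of (name, conjuncts) pairs. Returns the surviving
    operands together with their remaining non-constant conjuncts.
    """
    kept = []
    for name, conjuncts in operands:
        if any(c is False for c in conjuncts):
            continue  # this operand can never produce rows
        remaining = [c for c in conjuncts if c is not True]
        kept.append((name, remaining))
    return kept
```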

Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-23 18:25:14 -07:00
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shutdown for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
Lenni Kuff
45a734f6cd IMPALA-951: Throw parser error if no partition spec specified in ALTER TABLE ADD/DROP PARTITION
Change-Id: I876423e39d858d602ed0fbe8369a6714c82639d8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2295
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2320
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-23 11:20:35 -07:00
Matthew Jacobs
25c0ebf58c External Data Source: Public API
Adds the thrift structures for the public external data source API
and a new maven project containing the Java ExternalDataSource
interface and the generated Java thrift classes.

The ExternalDataSource.thrift structures can evolve in a backward
compatible way. The ExternalDataSource Java interface will always
contain a version number in the namespace (e.g.
com.cloudera.impala.extdatasource.v1 for V1) so we can potentially
make breaking changes to the interface in the future but still
support older versions.

A trivial implementation of the ExternalDataSource API is also
added for testing purposes.
TODO: Make the sample data source implementation realistic.

Change-Id: I827d6420a87ed7a2bce34e050362ca98ddc5dbcc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2241
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f29814e9ede9d4c889f2648606fcf511feeb47ae)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2313
2014-04-22 18:34:48 -07:00
Matthew Jacobs
b1c331fd81 IMPALA-956: RequestPoolService should use short username of principal
We should be using the short name of a Kerberos principal (e.g.
user/fully.qualified.domain@realm.com) or LDAP username (e.g. user@domain)
when checking group membership in RequestPoolService. Right now we call
UserGroupInformation.createRemoteUser() with the full user name and it
will throw an exception.
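
For illustration, a simplified Python sketch of extracting the short name from the two formats mentioned above. The function name is hypothetical, and real deployments may apply configurable principal-to-user mapping rules instead of plain string splitting.

```python
def short_username(full_name):
    """Strip the realm or domain after '@', then the host part after '/'."""
    return full_name.split("@", 1)[0].split("/", 1)[0]
```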

Change-Id: I39d849627cb49760807504d66109c05b7a399482
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2288
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 0005da9cb71f5a4a4ed6bb1dfcd74f8526cd8316)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2305
2014-04-22 13:55:39 -07:00
Victor Bittorf
c414c91931 Adding TRUNC builtin.
Includes additions to builtin UDF registration to support prepare/close.

Change-Id: I22668fa7ee033b3fa37050b7bccee935571ac453
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2243
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-04-22 13:17:12 -07:00
Alex Behm
689870ca3a IMPALA-914: Map null type to boolean in JDBC to be compatible with Hive.
Change-Id: I5831ae7d5dcb03aecea4138d0b13487898951068
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2279
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2282
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-04-21 15:00:32 -07:00
Lenni Kuff
bb09b5270f IMPALA-839: Update tests to be more thorough when run exhaustively
Some tests have constraints that were there only to help reduce runtime, which
reduces coverage when running in exhaustive mode. The majority of the constraints
are because it adds no value to run the test across additional dimensions (or
it is invalid to run with those dimensions). Updates the tests that have
legitimate constraints to use two new helper methods for constraining the table format
dimension:
create_uncompressed_text_dimension()
create_parquet_dimension()

These will create a dimension that will produce a single test vector, either
uncompressed text or parquet respectively.

Change-Id: Id85387c1efd5d192f8059ef89934933389bfe247
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2149
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e02acbd469bc48c684b2089405b4a20552802481)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2290
2014-04-18 20:11:31 -07:00
Alex Behm
c8e928119d IMPALA-912: Enforce slot equivalences at the lowest possible plan node.
The reported issue is that we can have redundant hash expressions in exchanges.
The underlying cause is that we fail to remove redundant join predicates.
This patch enforces slot equivalences based on our computed equivalence classes
at the lowest possible plan node by generating new equality predicates.
Each plan subtree now has a minimal set of equality predicates that express
all known equivalences between slots belonging to tuples materialized at that
plan node.
As a result, eliminating redundant join predicates becomes trivial: It is
sufficient to pick a single representative predicate of each relevant equivalence
class. All predicates beyond that are redundant.
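
One way to picture the "minimal set of equality predicates" is the Python sketch below. It is an illustration under assumptions: slots are plain strings, equivalence classes are sets, and the helper name is made up. For a class with n slots available at a plan node, a chain of n-1 equalities is enough to express every equivalence.

```python
def minimal_equality_predicates(equivalence_classes, available_slots):
    """Return (lhs, rhs) equality pairs covering each class with n-1 predicates."""
    predicates = []
    for eq_class in equivalence_classes:
        # Only slots materialized at this plan node can be compared here.
        slots = sorted(s for s in eq_class if s in available_slots)
        predicates.extend(zip(slots, slots[1:]))
    return predicates
```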

Change-Id: I7998fe8d7bdf84cc8eb129d32c86269bedeab68e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2177
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2278
2014-04-18 13:28:49 -07:00
Lenni Kuff
15327e8136 Migrate DataErrors tests to Python test framework, re-enable subset of tests
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.

This also lets us remove a lot of old code that was there only to support these disabled
tests.

Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
2014-04-18 02:25:11 -07:00
Nong Li
ac230c7021 Fix active time reporting in runtime profiles.
- A few places didn't have total timer at the beginning.
- Async build thread for blocking join nodes really messed things up (sum of
  children was more than the time in the join node).

Change-Id: I9176ce37cf22f2bcebea21b117e45cce066dbc1d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2276
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-18 02:24:28 -07:00
Henry Robinson
2a69019525 IMPALA-945: Fix column reordering with SELECT expressions
Previously, to produce the correct output expressions for the root plan
fragment before a table sink, InsertStmt would reorder the result
expressions for the query statement at the plan root. This had stopped
working for SelectStmts (and test coverage didn't catch that).

Now InsertStmt produces its own output expressions that can substitute
for the originals from the query statement, and the planner uses those
instead.

All query tests for column reordering have been duplicated to use SELECT
expressions.

Change-Id: Ib909fe35d27416b33ba2e5ac797aa931e1fe43f9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2204
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit d526db7ac6274f35b6affcb7428327100026e14e)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2275
2014-04-18 00:12:12 -07:00
Nong Li
1cab95066d Add the return type as a column for SHOW FUNCTIONS.
Also includes some misc pattern matching cleanup.

Change-Id: I6c9ec78b094a73864b4d669afbd75a48c9bf9585
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2199
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2271
2014-04-17 17:58:13 -07:00
Nong Li
831c0bbdc1 IMPALA-949: Fix scan range initial queue capacity.
Change-Id: I289c61587da75b318ba5a543d31010920a9cffe9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2268
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-17 15:13:54 -07:00
Nong Li
85be9a5050 Update bin/make* -notests to include other artifacts for packages.
Change-Id: I95e95f0a2e2131875b95d6676620bec7117b7f8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2250
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-16 00:37:00 -07:00
Nong Li
44fd279f18 Decimal: switch out the boost 128_t int with the c++ standard one.
The c++ standard int128_t is exactly what we want. It is 16 bytes, stored as 2's
complement little endian (the exact extension of the native int types). It
outperforms the boost library we were using (see benchmark) and looking at the assembly
for some of the operators, I doubt we can do better. This also seems like the kind
of thing hardware might be able to do natively in the future if we stuck with the
standard implementation.
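
The layout described above (16 bytes, two's complement, little endian) can be illustrated in Python. This only demonstrates the byte format; the helper names are made up and it says nothing about the C++ type itself.

```python
def int128_to_bytes(value):
    """Encode an integer as 16 bytes, two's complement, little endian."""
    return value.to_bytes(16, byteorder="little", signed=True)


def int128_from_bytes(raw):
    """Decode 16 little-endian two's complement bytes back to an integer."""
    return int.from_bytes(raw, byteorder="little", signed=True)
```

For values that fit in 64 bits, the low 8 bytes match the 64-bit encoding and the high 8 bytes are sign extension.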

This requires minimal changes to the rest of our code so the multi int library is
abstracted away.

The standard only added int128 and not 96 or any others. We still will need to use
the boost library for some cases but nothing in the hot path. We might want to revisit
implementing an int96 in the future that is of the same format to get some space
and efficiency savings but I think we can live with just int128 for a while.

Change-Id: I137ef7be812675036dd9b6e5b48dfc5c7aa9ab37
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2200
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2249
2014-04-15 23:24:46 -07:00
Matthew Jacobs
d0c353a9b4 IMPALA-922: Return helpful errors with Yarn group rules
When the -fair_scheduler_allocation_path is configured with a policy that uses
the "primaryGroup" Yarn queue allocation rule, Yarn throws an error if the user
is not on the local OS. Currently the user will get an error message that says:
"java.io.IOException: No groups found for user <username>". We now return a more
helpful error message.

Change-Id: I014ac15ef607e473957752f23af94d0cc4efec0f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2078
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 3cf37dc4e91afe887ada988f256b7008983580d2)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2244
2014-04-15 15:32:05 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00
Srinath Shankar
b1f46c8029 IMPALA-913: Revisit the use of FNV Hash in exchange
FNV hash has the property that the least significant bit of the hashed value
is just the XOR of the LSBs of its input bytes. This results in poor
distribution of rows when the partition keys are duplicated -- for example,
if the partition key is (l_orderkey, l_orderkey). A recommended technique
to mitigate this is to generate a larger hash and use XOR-folding to reduce
it to the desired length.

In this patch FnvHash has been modified to generate a 64-bit hash and
fold the result down to 32-bits. It has been renamed FnvHash64to32 to make
this explicit.
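
A minimal Python sketch of the technique, for illustration only. It assumes the FNV-1a variant with the standard 64-bit offset basis and prime; the function names are hypothetical and the actual FnvHash64to32 may differ in detail.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv_hash64(data, seed=FNV64_OFFSET_BASIS):
    """64-bit FNV-1a over a bytes object."""
    h = seed
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def fnv_hash64_to_32(data, seed=FNV64_OFFSET_BASIS):
    """XOR-fold the 64-bit hash down to 32 bits."""
    h = fnv_hash64(data, seed)
    return (h >> 32) ^ (h & 0xFFFFFFFF)
```

Because the prime is odd, the lowest bit of the unfolded hash depends only on the seed and the lowest bits of the input bytes. With a duplicated key the input bits cancel out, so that bit is the same for every row. Folding mixes the upper half of the hash into the lower half.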

Change-Id: Ie12ad3f863fca15092803d3e4d616a654cb8d244
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2220
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-04-14 12:03:53 -07:00
Nong Li
3bbe002d19 [CDH5] Break up locking in DiskIoMgr::ScanRange.
Currently, the entire object is protected by one lock. Unfortunately this
lock is taken during calls into libhdfs. This means it is impossible for the
scan node to pull off ready buffers while the disk thread is reading from
this scan range. Whoops. The lock is taken while in libhdfs to facilitate cleanup
so it's very easy to split the big lock up.

Change-Id: Idbf34cdba0cf860a90f9cad016d1ec133f923d85
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2143
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2202
2014-04-13 19:50:25 -07:00
ishaan
5803e6883e Cleanup and re-enable some tests in TestPartitionMetadata
Partition metadata tests were marked as xfail because of IMPALA-624. Additionally, we had
to invoke hive to insert into two partitions pointing to the same location (this
limitation is now removed). This patch changes the test to use Impala exclusively,
removes the xfail tag and adds a teardown method to the test class.

Change-Id: I15fa97bef4f8714d0873a9c713627a198f3388ad
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2086
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2215
2014-04-13 17:55:43 -07:00
ishaan
0e0c480262 Re-enable some tests in test_describe_formatted
A few tests which dealt with running queries via hs2 and impala were marked as xfail as
hiveserver2 would occasionally not come up. Given that we now have a script that checks
whether hiveserver2 is up before continuing the build, it should be safe to remove the
xfail.

Change-Id: I2b5063e7259c01fc0ef8ffda86d85514c9cf959c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2082
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2214
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-13 17:51:45 -07:00
ishaan
6f416dd2c2 Close all queries in test_cancellation
The queries in test_cancellation are currently cancelled but not closed, causing some test
queries to eventually time out because the admission controller limits are passed. This
patch ensures that all queries issued in test_cancellation are closed.

Change-Id: I65b26672155e31889bb6f43d3ac87be0f7b4eb72
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2187
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2213
2014-04-13 17:45:51 -07:00
Nong Li
f9dd32724c Cleanup build scripts.
Consolidated our build scripts and added the -notests option, which skips
building the BE tests.

Change-Id: Ida6aa064b7fe47e535c142b9af92b7c158e83c32
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2043
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2201
2014-04-13 17:11:39 -07:00
Nong Li
1a3caca8c4 [CDH5] Update execution engine to take advantage of DN caching.
This finishes up the support to use HDFS caching. The scheduler will
prefer replicas that are cached and the scan node plumbs the metadata
to the io mgr.

This is a bit hard to test without a cluster and some perf benchmarking.
I've added a basic test to make sure the path is being exercised.

Change-Id: I8762ca9ef2f88c3637113d3c5ee82f4c0ea7f1be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2212
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-13 17:11:21 -07:00
Nong Li
826a57d246 IMP-1339: Fix crash in rcfile scanner from mem-pool tracking bug.
In the case where the MemPool fails in FindChunk, we were not properly
updating the MemPool's state.

Change-Id: I3ed9bd7ee9505cfaf4c7812304c1da85ae06f72f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2160
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2203
2014-04-13 16:02:59 -07:00
Lenni Kuff
d101ef86e2 [CDH5] Bump version to 1.4.0-cdh5-INTERNAL
Change-Id: I0a0334084e444c948f1718133afb2d7246dde414
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2193
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-11 16:03:09 -07:00
Skye Wanderman-Milne
c85d88714f Fix buffer overflow bug in StringCompare()
Includes benchmark for comparing different StringCompare() implementations.

Change-Id: Ib4623b3ae6c99977af332ce5161da66af3cae9e5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2190
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-04-11 11:16:50 -07:00
Skye Wanderman-Milne
e60bf29a96 IMPALA-13: Use SSE string functions that take an explicit length
This patch modifies DelimitedTextParser and StringValue to work with
data containing null characters by using SSE instructions that take a
length, rather than expecting null-terminated strings. It also adds
some other minor changes to correctly handle data with nulls and to
facilitate testing. I checked the execution time of a count(*) and a
select(*) limit 1 query locally, and saw no difference for either text
or sequence files.

Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181
2014-04-11 11:16:24 -07:00
Alex Behm
2fff51d9e9 IMP-1329,IMPALA-924: Make ExchangeNode::Open() block until rows are available.
The bug: Coordinator::Wait() is supposed to block until rows become available for
consumption by the client. We rely on Wait() to determine when to advance the query
status to a 'ready' state and signal to the client that rows can be fetched.
Long fetch times can trigger client timeouts at various levels (socket, app, etc.).
Coordinator::Wait() simply opens the coordinator fragment's plan tree.
For most plan nodes, Open() does work to prepare the plan tree, s.t., GetNext()
returns quickly. However, for ExchangeNodes, Open() previously did not wait
until rows were obtained from the underlying stream receiver.
The fix: Make ExchangeNode::Open() block until rows are available.

Change-Id: I7b197eea11d21fd732414d96c899a17b2d99631c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2185
2014-04-10 23:49:38 -07:00
Skye Wanderman-Milne
ba89e60a81 IMPALA-932: evaluate concat/concat_ws children once
Change-Id: Id22a6c1dfb57cf659a1e24af4de6e5a2336cafa4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2152
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit cf6960017d4f7d75c1c685cf362bd3d9cd9b63c7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2183
2014-04-10 20:20:36 -07:00
Alex Behm
91db96d903 IMPALA-762: Add the query status to Beeswax::get_log() and pick it up in the Impala shell.
COMPUTE STATS is an async DDL command. When COMPUTE STATS fails it will set the
query status of the QueryExecState properly, but the original Beeswax::query() RPC
won't throw. The Impala shell sometimes did not pick up and display the
query status because no RPC actually threw. To fix this, I modified
Beeswax::get_log() to include the query status if it is not ok. The shell looks
for a special prefix to distinguish the query status from the runtime state error log.

Change-Id: I0d9dbf0801629a37de22ea4ebb6d2e5d53b836ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1899
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2063
2014-04-10 15:47:06 -07:00
Henry Robinson
37236845b1 Mark test_non_codegen_tinyint_grouping as execute_serially
The test contains an INSERT and some DDL, which is racy if performed in parallel.

Change-Id: I2b88533f45756fcf6372d6ee4eb7edd474087048
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2167
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit 8b103c029cc341bacea4746c369bb58e6af5ed29)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2182
Tested-by: jenkins
2014-04-10 15:17:25 -07:00
Lenni Kuff
342ff28ae2 IMP-1332: Remove unused 'nss-pam-ldapd' openldap contrib module from /thirdparty
Change-Id: I478d9238864052981377a03cd90d37f60129c70e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2081
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2180
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-10 12:20:33 -07:00
Lenni Kuff
9e2dd7e049 Add support for SHOW PARTITIONS <table name>
This statement returns info on all partitions for the given table. It is implemented as
an alias for SHOW TABLE STATS, with some extended analysis checks (such as throwing if
the statement targets an unpartitioned table).

Change-Id: I19154a9d90314de18f86ba355aa5dbed808f147f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2145
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2179
Tested-by: jenkins
2014-04-10 12:15:39 -07:00
Lenni Kuff
f1f4e99c85 [CDH5] IMP-1326: Impala assumes BlockLocation#getCachedHosts returns IP addresses
Impala determines the location (network address) of all block replicas using
the HDFS API BlockLocation.getNames(), which returns results in IP:port format.
To find where cached replicas are located we call BlockLocation.getCachedHosts(),
which returns results as hostnames. This caused an issue where we would compare
an IP address to a hostname to determine if a replica was cached.

The fix is to resolve cached hosts by comparing against BlockLocation.getHosts(),
which returns the block replica locations by hostname. getHosts() will always return
results in the same order as getNames() and getHosts() and may contain duplicate
entries (multiple data nodes on the same host), which is what we want. This allows
the same array index to be used to convert between the two location formats.
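
The index-based matching can be sketched in Python as follows. The three lists stand in for the return values of BlockLocation.getNames(), getHosts() and getCachedHosts(); the function name and the sample addresses are made up for the example.

```python
def cached_replica_flags(names, hosts, cached_hosts):
    """Pair each IP:port replica location with whether its host holds a cached copy.

    names and hosts are parallel lists: names[i] and hosts[i] describe the
    same replica, so the shared index converts between the two formats.
    """
    cached = set(cached_hosts)
    return [(name, host in cached) for name, host in zip(names, hosts)]
```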

Change-Id: I74fdc20b1dc5200d7e0e90856b8b2088f050e215
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2156
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-08 22:06:14 -07:00
Alex Behm
0585dfb546 IMPALA-888: Materialize union slots referenced by constant predicates.
To keep the predicate assignment/propagation logic simple, we assign conjuncts
whose underlying base table exprs are constant in at least one union operand
to the evaluating MergeNode, and not in the operand(s) whose corresponding base
table exprs are constant.
The JIRA describes two different bugs:
The first bug was that the slots required for evaluating such predicates in the
MergeNode were not marked as materialized. The second bug was that predicates
'pushed' into union operands did not get re-analyzed after substituting the
predicate's exprs with the result exprs of that union operand. Missing casts
lead to a crash. The new test covers both bugs.

Change-Id: I0f5b8a366b32f7d4b2587e13793b6103cdf7e8b3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2162
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-07 18:32:29 -07:00
Henry Robinson
415540d789 IMPALA-901: Fix grouping with NULLs when codegen is disabled
The standard implementation of HashTable::Equals() did not correctly
check the NULL bit when the argument row did not evaluate to NULL for a
given probe expr. In the rare circumstance that this gave rise to a
false positive (more on that below), two rows with different grouping
values would be considered equal, and one would be excluded from the
final aggregation output.

HashTable::EvalRow() fills an expression value buffer with the values of
either probe or build exprs evaluated for the argument row. These cached
values are used to determine row equality in Equals(). In order to avoid
a lot of false collisions, an 'unlikely' value is written to that buffer
for NULL values, chosen to be HashUtil::FNV_SEED. So without correct
NULL-bit checking in Equals(), two single-slot rows are considered to be
equal if one of them has NULL for its slot, and the other has a value
equal to HashUtil::FNV_SEED truncated to the size of the slot.

For tinyint columns, this value is -59. As it happens, our random
generator happened to create a table with one tinyint column and which
contained NULL and -59 as values. In order to trigger this bug, the rows
must also have been written to disk in order such that the scanners
returned -59 *first*, and then NULL to the aggregation node; the bug is
not symmetric, and grouping works correctly in the opposite order.
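
The collision can be reproduced with a simplified Python model. It assumes that HashUtil::FNV_SEED is the standard 32-bit FNV offset basis (0x811C9DC5), which is consistent with the -59 quoted above. The helper names are hypothetical, and unlike the real bug this model is symmetric.

```python
FNV_SEED = 0x811C9DC5  # assumption: standard 32-bit FNV offset basis


def truncate_to_tinyint(value):
    """Keep the low byte and reinterpret it as a signed 8-bit integer."""
    return ((value & 0xFF) ^ 0x80) - 0x80


NULL_SENTINEL = truncate_to_tinyint(FNV_SEED)


def cache_value(value):
    """Model of the expression value buffer: (stored value, NULL bit)."""
    return (NULL_SENTINEL, True) if value is None else (value, False)


def equals_without_null_check(a, b):
    """Compares only the stored values, so NULL collides with the sentinel."""
    return a[0] == b[0]


def equals_with_null_check(a, b):
    """Compares the NULL bits first, then the values of non-NULL entries."""
    return a[1] == b[1] and (a[1] or a[0] == b[0])
```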

Change-Id: I17d43eaeee62b2ac01b67dd599bc4346b012a074
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2130
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e8098254280a9d5ead0b607263ca6728a3222a7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2161
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-07 17:30:52 -07:00
Alex Behm
8b319f8959 IMPALA-935: Make PlanFragment.getDestFragment() return null if no destination is set.
Change-Id: I269a7f552d7ff67ff4d65e86e8c6df9c41d0fca1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2159
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-07 16:21:24 -07:00
Alex Behm
a85dacafe8 IMPALA-904: Make TupleIsNullPredicate work on non-nullable tuples.
We wrap certain exprs substituted from outer-joined inline view in an expr that
evaluates to NULL if the underlying tuple(s) are NULL. We do this for exprs that evaluate
to non-NULL values if their slots are NULL, i.e., we must then distinguish tuples that are
NULL from slots that are NULL (otherwise evaluating an expr against a tuple that is NULL
due to the outer join may incorrectly return a non-NULL value.)

The bug: Exprs referring to an outer-joined inline view may appear in various places
in the outer query block. For example, they could appear in an On-clause or be
placed into scans/aggregates due to predicate propagation. In such cases, the underlying
tuples may not be nullable yet because they only become nullable after the outer join.
We had a DCHECK in tuple-is-null-predicate.cc requiring the tuples to be nullable.
The fix: Remove the DCHECK. The fix is not elegant but practical. It would be rather
difficult to fix the inline view expr substitution such that a TupleIsNullPredicate
never references a non-nullable tuple, esp. due to predicate propagation.

Change-Id: I180f75f14173f356abfeec751e6b2d419378a9a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2157
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-07 14:18:49 -07:00
Henry Robinson
99c37aac37 IMPALA-827: Add an option for directories created by INSERT to inherit
their parent's permissions

This patch adds --insert_inherit_permissions. If true, all
new partition directories created by INSERT will inherit their
permissions from their parent. When false, the directories are created
with the default permissions.

Change-Id: Ib2b4c251e51ea5048387169678e8dde34ecfe5f6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1917
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-04-04 10:25:20 -07:00
Lenni Kuff
c798b23fd9 IMPALA-925: JDBC driver returns no results from getTables()/Columns() with null name pattern
Our HS2 Metadata Op implementation would not return any results if null was passed as the
table name or column name. Instead a null value should be treated the same as '%' (match
everything).
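
The intended behavior can be sketched in Python. This is an illustration, not the HS2 metadata code: the function names are made up, and the pattern handling covers only the '%' and '_' wildcards without escape characters.

```python
import re


def like_pattern_to_regex(pattern):
    """Translate a JDBC-style name pattern to a regex; None means '%'."""
    if pattern is None:
        pattern = "%"
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_names(names, pattern):
    """Return the names that match the pattern in full."""
    regex = like_pattern_to_regex(pattern)
    return [n for n in names if regex.fullmatch(n)]
```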

Change-Id: Ibad41e94724cd1f9c1caf40831e30a98132247d9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2137
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 7020c62545397872877c03a5e101e71edf8101bf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2142
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-03 17:12:25 -07:00
Matthew Jacobs
4d9aad8b9c Admission controller: Change default values for the "default pool"
The admission controller is configurable via Yarn fair scheduler allocation
and Llama configurations, but a "default pool" is used when these files are
not provided. When a pool is defined in a fair-scheduler.xml but no limits
are specified, the following Yarn/Llama default values are used: the max
number of concurrent requests is 20, the max queue size is 50, and the mem
limit is unlimited.

This changes the default values of the "default pool" limits so that the
limits are consistent with the defaults from Yarn/Llama.

Change-Id: Ic76ff550c18cc49353c72926591af46dcbe26ac7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2006
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 1619d83e452e5b868d12e3934e9704fc5f16cac7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2118
2014-03-31 15:53:26 -07:00