734 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
a618d34f17 More decimal builtins.
Change-Id: Ie5b89ad7d1fc80fa646f7cf5f520db13b25b9565
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2764
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e994ce7712047000d3a12b5eb677b5470687370)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2830
2014-06-06 19:42:45 -07:00
Nong Li
895d69c09f IMPALA-1026: Fix decimal partition cols.
Change-Id: I956b69a86528f1969febf356181dc3182f309909
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2841
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-06-06 09:26:56 -07:00
Lenni Kuff
12a08ec07c IMPALA-788: Loading an HBase table when HBase/Zookeeper are unavailable takes a long time to fail
This is because the default HBase/Zookeeper client timeouts are meant to be very long:

* If both Zookeeper and HBase are down, it will wait:
  hbase.client.retries.number (default 10) x hbase.rpc.timeout (default 60s)
* If ZK is up but HBase is down, but it has run at least once, then it waits:
  hbase.client.retries.number * some exponential backoff time (default sleep time is 1 second,
  the backoff table looks like this: 1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64).

In my experiments, it takes ~20-25 minutes before the table loading fails if HBase is down. If there
are many HBase tables, this can block all loading threads.

The fix is to change the default timeout values to fail faster. These values were suggested
by someone from the HBase team. With these values we will fail in ~1 minute. I am working with
the CM team to get the defaults changed there as well.
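
The two waits described above can be worked through numerically. This is only a sketch of the arithmetic as stated in the message; the function names are hypothetical, and the assumption that retry i sleeps for the i-th table entry times the base sleep is mine, not something the commit spells out. It bounds a single blocked call and does not try to reproduce the ~20-25 minutes observed end to end.

```python
# Backoff multipliers quoted in the commit message.
BACKOFF_TABLE = [1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64]


def wait_when_both_down(retries=10, rpc_timeout_s=60):
    """ZooKeeper and HBase both down: every retry waits out the RPC timeout."""
    return retries * rpc_timeout_s


def wait_with_backoff(retries=10, base_sleep_s=1):
    """ZooKeeper up, HBase down: sum the backoff sleeps over all retries.

    Assumption: retry i sleeps base_sleep_s * BACKOFF_TABLE[i], and the last
    table entry is reused if there are more retries than entries.
    """
    last = len(BACKOFF_TABLE) - 1
    return sum(base_sleep_s * BACKOFF_TABLE[min(i, last)] for i in range(retries))


print(wait_when_both_down())  # 600 (seconds, i.e. 10 minutes)
print(wait_with_backoff())    # 71 (seconds)
```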

Change-Id: I625e35af57374c72d50d03372d177624ce67694a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1903
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit dcbd4db64a0d764f5caf06ba87c9b90ab643f0d7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2884
2014-06-06 09:08:21 -07:00
Dimitris Tsirogiannis
0348a36b49 IMPALA-887: Improve partition pruning time (final)
This commit contains the final set of changes for improving the
performance of partition pruning. For each HdfsTable, we materialize a
set of partition value metadata that allows the efficient evaluation of
simple predicates on partition attributes without invoking the BE. These
changes result in three orders of magnitude performance improvement
during partition pruning.
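
As a rough illustration of evaluating simple predicates from materialized partition values, the sketch below keeps a per-column map from value to partition ids. The class and method names are hypothetical and the structure is an assumption; the commit does not describe the actual layout of the metadata kept per HdfsTable.

```python
from collections import defaultdict


class PartitionValueIndex:
    """Hypothetical index: partition column -> value -> set of partition ids."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def add_partition(self, partition_id, values):
        """Register a partition; `values` maps partition column -> value."""
        for column, value in values.items():
            self._index[column][value].add(partition_id)

    def prune_eq(self, column, value):
        """Partitions satisfying `column = value`, found by direct lookup."""
        return set(self._index.get(column, {}).get(value, set()))

    def prune_in(self, column, values):
        """Partitions satisfying `column IN (values)`."""
        matches = set()
        for value in values:
            matches |= self.prune_eq(column, value)
        return matches
```

With an index like this, an equality or IN predicate on a partition column becomes a dictionary lookup rather than a per-partition expression evaluation.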

Change-Id: I5b405f0f45a470f2ba7b2191e0d46632c354d5ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2700
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2823
2014-06-03 23:17:44 -07:00
Nong Li
89e115436f SHOW FUNCTIONS should return the functions sorted by signature.
Change-Id: Ia843331ff22bd482e716ed12e09b6778fc53dac2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2818
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-03 20:50:18 -07:00
Lenni Kuff
f34a0507bf [CDH5] Add support for Sentry Service to Impala
This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role based and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support to authorize against metadata read from the Sentry Policy Service; it does
not add support for GRANT/REVOKE statements in Impala.

The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config flag must be
set to a valid sentry-site.xml config file.

On the impalad side, we continue to support authorization based on a file-based provider.
To enable file-based authorization, set the --authorization_policy_file flag to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).
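
The model described above (groups hold roles, roles hold privileges, and the policy source depends on --authorization_policy_file) can be sketched as follows. All class and function names are hypothetical. The sketch ignores actions such as SELECT or INSERT and assumes that a server or database privilege also covers the tables beneath it, which the commit message does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    scope: str  # "server", "uri", "database", or "table"
    name: str   # object name, e.g. "db1" or "db1.t1"


@dataclass
class PolicyCache:
    """Hypothetical cache of policy metadata: group -> roles -> privileges."""
    group_to_roles: dict = field(default_factory=dict)
    role_to_privileges: dict = field(default_factory=dict)

    def can_access_table(self, groups, db, table):
        # Assumption: broader scopes imply access to the objects they contain.
        wanted = {Privilege("database", db), Privilege("table", f"{db}.{table}")}
        for group in groups:
            for role in self.group_to_roles.get(group, ()):
                for privilege in self.role_to_privileges.get(role, ()):
                    if privilege.scope == "server" or privilege in wanted:
                        return True
        return False


def policy_source(authorization_policy_file):
    """File-based provider if the flag is non-empty, else catalog metadata."""
    return "file" if authorization_policy_file else "catalog"
```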

TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-03 07:19:52 -07:00
Nong Li
e6b7565eff Fix decimal literal casting and cast expr reanalyze().
BigDecimal doesn't think about scale the way we need it to.

Change-Id: I09612c31e30e80ce4806080f1d24c6615090785e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2794
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-02 23:34:20 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Float/Doubles are lossy so using those as the default literal type
is problematic.

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Nong Li
26ca559f38 Add decimal builtins: abs/round/ceil/floor/truncate.
Change-Id: I4fe0ee69475ff56d3dc0cd69ea21f677714ae8bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2748
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 11:53:06 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Matthew Jacobs
12b72c4330 IMPALA-1011: Handle SHOW DATA SOURCES when no sources configured
Change-Id: I367b90c7603aea973d442f9186a6b32598a66a28
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4df5c6d741237e9c91e84e39fd6ea760ccb40cf5)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2723
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-28 20:38:41 -07:00
Lenni Kuff
745c091fcc [CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats
Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-28 08:54:54 -07:00
Lenni Kuff
d529c6446f Cleanup builtins db initialization
Instead of doing the initialization in Catalog.java, there is now a special
BuiltinsDb that handles this initialization. This makes it clearer which file
should be modified when adding a new builtin and also cuts down the code in the Catalog.

Change-Id: I4512abff6e8c7f4924701aeffe10e656104a0b86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2567
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 761a8728de309a20c077913aa154c6259d29d1e8)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2644
2014-05-28 08:50:44 -07:00
Ippokratis Pandis
30357f2351 IMPALA-1003: Speed up compute stats by avoiding null counting for all types and max/avg for fixed-size types.
This removes the NULL counting, as well as the MAX and AVG calculations,
for types of known size.

Change-Id: If218b41fcde0b18218732675585d69342f34e544
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2629
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2681
2014-05-27 19:12:40 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored within the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.
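
The drop-time cleanup described above amounts to collecting every stored cache request ID before the table goes away. A minimal sketch, assuming partitions carry the same 'cache_directive_id' property as the table (the message only states this for table properties) and using a hypothetical function name:

```python
CACHE_DIRECTIVE_ID_KEY = "cache_directive_id"


def directive_ids_to_drop(table_properties, partition_properties):
    """Collect the cache request IDs that must be removed when dropping a table.

    table_properties: dict of table properties.
    partition_properties: list of dicts, one per partition (assumed layout).
    """
    ids = []
    for properties in [table_properties, *partition_properties]:
        if CACHE_DIRECTIVE_ID_KEY in properties:
            ids.append(int(properties[CACHE_DIRECTIVE_ID_KEY]))
    return ids
```

Partitions without the property are simply skipped, so the same routine covers tables in which only some partitions are cached.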

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Nong Li
53a6ead074 Disable hive UDFs that use decimal in analysis.
Change-Id: Ifd3e134d0e8ba6ad054e5769da57c9a830ad74d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2676
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2694
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-27 15:54:21 -07:00
Dimitris Tsirogiannis
ca86e470de IMPALA-887: Improve partition pruning time
This commit is the first step in improving the performance of partition
pruning. Currently, Impala can prune approximately 10K partitions per
sec, thereby introducing significant overhead for huge tables with a
large number of partitions. With this commit we reduce that overhead by
3X by batching the partition pruning calls to the backend.
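
Batching of this kind can be sketched as below. The names `prune_partitions` and `evaluate_batch` and the batch size are hypothetical; `evaluate_batch` stands in for one call to the backend that returns a keep/drop flag per partition.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def prune_partitions(partitions, evaluate_batch, batch_size=1024):
    """Keep the partitions for which the backend reports True.

    One call to evaluate_batch per batch replaces one call per partition.
    """
    kept = []
    for batch in batches(partitions, batch_size):
        results = evaluate_batch(batch)
        kept.extend(p for p, keep in zip(batch, results) if keep)
    return kept
```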

Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687
2014-05-26 13:10:12 -07:00
Dimitris Tsirogiannis
2dea6708b2 CDH-19292: Impala query on an HBase table takes almost same time as Hive
This commit fixes issue CDH-19292, where querying an HBase table takes
a significant amount of time if HBase has a large (>1K) number of
tables. The performance bottleneck is the call to HBase to retrieve the
table's metadata (column families) during the computation of row stats.
To resolve this issue we cache the column families in the catalog object
associated with an HBase table, so the expensive call to HBase only
happens the first time the table is queried. Subsequent calls use the
cached data.
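
The caching approach amounts to memoizing the expensive metadata call on the table's catalog object. A minimal sketch, with a hypothetical class name and a caller-supplied `fetch_column_families` function standing in for the real HBase call:

```python
class CachedHBaseTable:
    """Hypothetical catalog object that fetches column families only once."""

    def __init__(self, name, fetch_column_families):
        self._name = name
        self._fetch = fetch_column_families  # expensive call to HBase
        self._column_families = None

    def column_families(self):
        # The first query pays for the HBase round trip; later ones reuse it.
        if self._column_families is None:
            self._column_families = self._fetch(self._name)
        return self._column_families
```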

Change-Id: I0203fee3e73d2a4304530fe0a1ba2cf163f39350
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2672
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2679
2014-05-23 21:54:02 -07:00
Victor Bittorf
c13a1d080e IMPALA-938: Fix implicit casting in timestamp arithmetic exprs.
Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671
2014-05-23 14:11:35 -07:00
Lenni Kuff
e34a3d4872 IMPALA-825: Synchronize Hive Metastore client connection creation
The creation of hive metastore clients is now synchronised to avoid the possibility of
race conditions accessing local Kerberos state. In experiments, this does not fully
resolve the issue but significantly reduces the chances it will occur.

Also adds in a new debug config option to optionally sleep for a specified number of
milliseconds between creating connections. The default is zero, but can be configured
by setting impala.catalog.metastore.cnxn.creation.delay.ms in the hive-site.xml.
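
The pattern described above, one lock around client creation plus an optional delay, can be sketched as follows. The class name and the `connect` callback are hypothetical; only the idea of serializing creation and sleeping for a configurable number of milliseconds comes from the message.

```python
import threading
import time


class MetaStoreClientFactory:
    """Hypothetical factory that serializes metastore client creation."""

    # One lock shared by all factories, so no two creations ever overlap.
    _create_lock = threading.Lock()

    def __init__(self, connect, creation_delay_ms=0):
        self._connect = connect
        self._delay_s = creation_delay_ms / 1000.0

    def create(self):
        with self._create_lock:
            if self._delay_s > 0:
                time.sleep(self._delay_s)  # debug-only spacing between creations
            return self._connect()
```

As the message notes, serializing creation narrows the window for the race but is not claimed to eliminate it.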

Change-Id: I83e863760470bdc2d9b27c6669f35604111d69d7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2661
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b0b486028ce46be26967aa202a4b1acea22d9311)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2665
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-05-23 00:22:22 -07:00
Nong Li
723f583b4d Allow adding predicates after processing build table.
Change-Id: I4c845d9f08f0be29e548eceac3912871acd0270f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2658
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-22 13:09:51 -07:00
Nong Li
5729024fe9 IMPALA-984: Fix missing reanalyze in InlineViewRef and NULL handling.
Change-Id: Ia80035c5456630aeef7a24288a998fe08546a282
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2652
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-21 18:18:29 -07:00
Henry Robinson
e87c0eb22a [CDH5] Detect pseudo-distributed Llama cluster
Since we're no longer using the MiniLlama, we need to explicitly set
whether or not the cluster is pseudo-distributed. Impala needs this
information to correctly translate datanode addresses to a format that
Llama understands.

This change (adapted from one made by Casey) adds a method to the
frontend (callable via JNI) to get a configuration value from the Hadoop
configuration. We'll set that configuration value for local RM testing.

Change-Id: Ifd51db98a993ac0270dac2b832babbc394483c1a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2549
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-05-20 21:24:33 -07:00
Alex Behm
1b9a8020bf IMPALA-996: Exclude non-materialized slots from a tuple's avgSerializedSize.
Change-Id: Ic7936c6b5c5e6d4c162d91105128cda2b1b7284c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2617
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2626
2014-05-20 16:21:59 -07:00
Alex Behm
b252921363 IMPALA-994: Handle incorrect column metadata in views created by Hive.
Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-05-19 20:17:23 -07:00
Matthew Jacobs
f9c9a7ca13 Add SHOW DATA SOURCES
Change-Id: Ieeb0df107f45a58b8a99f717e96453da93ee7270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2529
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b2392c5bfe9fc928ad19af6ff6737e6dc6324e63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2614
2014-05-19 17:52:27 -07:00
Matthew Jacobs
6ccd56bc1f Enforce slot equivalences at data source scan nodes
Change-Id: I2ed606ba398990ab05afa3301b6356c6a636e2bb
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2521
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 55061f6953956f45d433fe227ded539a648e3f9c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2536
2014-05-19 14:37:44 -07:00
Dimitris Tsirogiannis
a7a9cde86f CDH-18969: Incorrect query result in Impala
This commit fixes issue CDH-18969 where Impala returns wrong results
when querying an HBase table. This issue is triggered when a column family
sorts lexicographically before ":key", which is the column family of the
row key, thereby causing the wrong column to be used as a row key by the
backend.

The following changes are included:
1. Modified the load function in HBaseTable.java to make sure the
catalog object of an HBase table always stores the row key column first.
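
The ordering fix can be illustrated with a sort key that always places the row key column first. The function name and the (family, qualifier) tuple layout are assumptions made for the sketch; only the ":key" family comes from the message.

```python
ROW_KEY_FAMILY = ":key"


def order_columns(columns):
    """Sort (family, qualifier) pairs with the row key column always first."""
    return sorted(columns, key=lambda c: (c[0] != ROW_KEY_FAMILY, c[0], c[1]))


columns = [("d", "x"), ("0cf", "a"), (":key", "")]
print(sorted(columns)[0])         # ('0cf', 'a'): plain ordering picks the wrong column
print(order_columns(columns)[0])  # (':key', ''): row key column comes first
```

A family such as "0cf" sorts before ":key" because digits precede ':' in ASCII, which is the situation the commit describes.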

Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602
2014-05-18 16:29:11 -07:00
Alex Behm
fcf4e43a3c IMPALA-962: Fully qualify table and view names in toSql().
Change-Id: I6bf757c4ffbaf82c136af7b59d2d415234545a86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2373
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2589
2014-05-16 01:26:38 -07:00
Nong Li
4b883ac7eb Fix decimal bugs.
Fix overflow handling in a few cases and add decimal as hs2 type.

Change-Id: Ifde1988365f6be961e7eb7404ed37d7bbaab875c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2531
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2564
2014-05-16 00:17:38 -07:00
Lenni Kuff
61cbdd4f49 [CDH5] Add Sentry Service to local test environment
Adds the ability to start/stop the Sentry Service to our local test environment and
load the sentry-site.xml configs. Since the existing Sentry startup scripts don't work,
I wrote a simple wrapper to handle service startup.

Change-Id: I1b77a2e50e51e6e6eae58cfed4d5d7c403dbc0b4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2540
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-05-14 12:02:02 -07:00
Lenni Kuff
f1d9c0f58b [CDH5] Update Impala's Sentry dependency to Sentry v1.3 (from v1.2)
This updates Impala to use Sentry v1.3 instead of Sentry v1.2. No major functionality
changed between Sentry versions, but some Sentry classes were moved and APIs changed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38ebd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2319
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-13 02:57:07 -07:00
Lenni Kuff
a02577f9f3 Remove dependency on Hive for Impala Avro schema parsing
This change removes Impala's dependency on Hive for parsing the Avro schema. To remove the
dependency, I use the Avro library to parse the schema and then reuse some of Hive's AvroSerDe
code to perform the Avro -> Hive type mapping.

It also adds support for parsing DECIMAL type information from an Avro schema.

Change-Id: I4359210ce50ddc3c0d03fe9eb30d35cc8e45a797
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2460
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2522
2014-05-12 17:14:08 -07:00
Dimitris Tsirogiannis
2d7a8b7c70 IMPALA-964: Full outer join on values() followed by group by hits a
preconditions check

This commit fixes IMPALA-964 where full outer join between two inline
views followed by a group by (e.g. select 1 FROM (VALUES(1 x, 1 y)) a
FULL OUTER JOIN (VALUES(1 x, 1 y)) b ON (a.x = b.y) GROUP BY a.x;)
hits a preconditions check. This check evaluates if the numNodes
(number of nodes for the purpose of resource estimation) variable
is greater than or equal to zero and is triggered when we try to compute
the resource estimates (number of distinct values) of a plan fragment.

The following changes are included in this commit:
1. Modified the getNumDistinctValues function in PlanFragment class to
consider the special case where the numNodes of a plan fragment is -1.
2. Added a test case in QueryTest/joins.test.

Change-Id: I2962ed5079e174d0e76ad990ab84e1fb1a4607ef
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2466
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2514
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-05-11 19:30:38 -07:00
Nong Li
7eb4f7f226 Fix FE bug from missing casts for functions with wildcard decimal args.
Before, if none of the child args were decimals, the casts would be added.
This is because getResolvedWildcardDecimalType could return a non-null, non-decimal
type.

Change-Id: I652c3afc00ce4c2047660dc4d226b299a11069a6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2507
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2517
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-11 15:11:12 -07:00
Matthew Jacobs
fb49706ec8 Add additional types to TColumnValue and fix field names
Adds 8 and 16 byte integer values and a binary value to TColumnValue
and fixes the field names.

Change-Id: Ie318fe7dad43b0cc0032b65b6b04c3fe173ae9b8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2418
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 68c476822402d27d985ed78fa5d14a843b681082)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2493
2014-05-08 17:38:54 -07:00
Matthew Jacobs
ebc6c5894e External Data Source: Frontend and catalog changes
Initial frontend and catalog changes for external data sources.

Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485
2014-05-08 14:56:19 -07:00
Dimitris Tsirogiannis
1a21bb9b9e IMPALA-642: Conjunctive predicates on HBase table not working...
This commit fixes IMPALA-642 issue where conjunctive predicates are
returning incorrect results from HBase in the presence of NULL values.

The following changes are included:
1. Modified the HBaseScanNode to re-apply the "pushed-down" predicates.
2. Added tests in QueryTest/hbase-filters.test
3. Added tests in PlannerTest/hbase.test

Change-Id: I598b325ad63b043b325fba74448698ed71a3cd78
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2414
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2489
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-05-08 13:59:00 -07:00
Victor Bittorf
6f31dc7f8a Adding STDDEV builtin.
Change-Id: I79e5aee1e9e879aa2d09078ab45bc149675e1d4a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2341
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a42c375d933c0b7ffe7c9b6702777679492d7ad6)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2464
2014-05-06 13:06:26 -07:00
Henry Robinson
f968bb6087 IMPALA-923: Boolean slotrefs not marked as assigned in inline views
A boolean slotref predicate that could be pushed into an inline view
would not be correctly marked as assigned, leading to an extra select
node being introduced to evaluate it. This was because the id of the
expression after substitution would change (see createInlineViewPlan()),
but only the post-substitution conjunct IDs were marked as assigned.

This bug only affected standalone slotrefs; other exprs (like casts, or
explicit predicates referencing a slotref) would not change their ID
under substitution.

Change-Id: I4127528b4aec25c966a4d186ddc98a68502b90c1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2430
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b49bfdf57769615d43d86fcfce2269531640788a)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2435
2014-05-02 18:45:21 -07:00
Alex Behm
d9c436912e Clarify prevention of value transfers into outer-joined inline views with a limit.
Change-Id: I1cd239314bb844f5eeaaf808610dda99a9206e35
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2377
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-25 19:35:58 -07:00
Alex Behm
91e1eb0789 CDH-18563: Speed up the computation of transitive value transfers.
The issue: Computing the full transitive closure for all slots can be very
expensive (10s of seconds for >2k slots, minutes for >4k slots).
Queries with many views and/or unions were affected most because each
union/view adds a new tuple with slots, increasing the total number of slots.

The fix: The new algorithm exploits the sparse structure of the value transfer
graph for a significant speedup (>100x). The high-level steps are:
1. Identify complete subgraphs based on bi-directional value transfers, and
   coalesce the slots of each complete subgraph into a single slot.
2. Map the remaining uni-directional value transfers into the new slot domain.
3. Identify the connected components of the uni-directional value transfers.
   This step partitions the value transfers into disjoint sets.
4. Compute the transitive closure of each partition from (3) in the new slot
   domain separately. Hopefully, the partitions are small enough to afford
   the O(N^3) complexity of the brute-force transitive closure computation.
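
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: slots are integers, transfers are (src, dst) pairs, and the helper names are hypothetical. Coalescing uses union-find over bi-directional pairs, and the per-partition closure is a plain Warshall loop.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def value_transfer_closure(num_slots, transfers):
    """Return reachable(src, dst) over directed value transfers between slots."""
    edges = set(transfers)
    # Step 1: coalesce slots linked by bi-directional transfers.
    merged = UnionFind(num_slots)
    for a, b in edges:
        if (b, a) in edges:
            merged.union(a, b)
    # Step 2: map the remaining transfers into the coalesced slot domain.
    coalesced = {(merged.find(a), merged.find(b)) for a, b in edges}
    coalesced = {(a, b) for a, b in coalesced if a != b}
    # Step 3: partition the coalesced slots into connected components.
    components = UnionFind(num_slots)
    for a, b in coalesced:
        components.union(a, b)
    members = defaultdict(list)
    for slot in {s for edge in coalesced for s in edge}:
        members[components.find(slot)].append(slot)
    # Step 4: brute-force closure (Warshall) within each partition only.
    reach = defaultdict(set)
    for a, b in coalesced:
        reach[a].add(b)
    for nodes in members.values():
        for k in nodes:
            for i in nodes:
                if k in reach[i]:
                    reach[i] |= reach[k]

    def reachable(src, dst):
        s, d = merged.find(src), merged.find(dst)
        return s == d or d in reach[s]

    return reachable
```

The cubic loop only ever runs over the slots of one partition, which is where the speedup on sparse graphs would come from.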

Change-Id: I35b57295d8f04b92f00ac48c04d1ef1be4daf41b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2360
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-24 23:53:28 -07:00
Skye Wanderman-Milne
bd2fc2d1d4 IMPALA-934: Refresh cached UDF library when creating a new function
This change adds the ability to refresh a local cache entry, causing
the old cache entry to be dropped and the library to be reloaded from
HDFS. This is used in ResolveSymbolLookup(), which is called by the
frontend when creating a new function, and in ImpalaServer when
receiving a "create function" heartbeat. This change also makes sure
the FE calls into the backend for jars, so jars get refreshed as well.

Change-Id: I5fd61c1bc2e04838449335d5a68b61af8b101b01
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2286
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit e8587794b3b82438190c91b2ebe9d1e12db73981)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2348
2014-04-24 19:39:16 -07:00
Alex Behm
121fab8fdf IMPALA-888: Drop union operands with constant conjuncts evaluating to false.
This patch simplifies the complex slot materialization logic for unions by
making the materialization independent of conjuncts assigned to MergeNodes.
When 'pushing down' predicates into union operands, we drop union operands
with constant predicates evaluating to false. Constant predicates that
evaluate to true are simply ignored.
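
The rule for constant predicates can be sketched in a few lines. The representation is an assumption made for illustration: each operand is a (name, conjuncts) pair, constant conjuncts have already been evaluated to Python booleans, and non-constant conjuncts are left as strings.

```python
def simplify_union(operands):
    """Drop operands with a constant-false conjunct; ignore constant-true ones."""
    kept = []
    for name, conjuncts in operands:
        if any(c is False for c in conjuncts):
            continue  # this operand can never produce rows
        remaining = [c for c in conjuncts if c is not True]
        kept.append((name, remaining))
    return kept
```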

Change-Id: I0e7ccfb206bed29db2b5d667e2bb61310980e80a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2327
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-04-23 18:25:14 -07:00
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shut down for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
Lenni Kuff
45a734f6cd IMPALA-951: Throw parser error if no partition spec specified in ALTER TABLE ADD/DROP PARTITION
Change-Id: I876423e39d858d602ed0fbe8369a6714c82639d8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2295
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2320
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-23 11:20:35 -07:00
Matthew Jacobs
b1c331fd81 IMPALA-956: RequestPoolService should use short username of principal
We should be using the short name of a Kerberos principal (e.g.
user/fully.qualified.domain@realm.com) or LDAP username (e.g. user@domain)
when checking group membership in RequestPoolService. Right now we call
UserGroupInformation.createRemoteUser() with the full user name and it
will throw an exception.
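
For the two name shapes given above, the short name is the part before the first '/' or '@'. The helper below is a simplified, hypothetical stand-in; Hadoop itself derives short names through configurable mapping rules, which this sketch does not model.

```python
def short_user_name(full_name):
    """'user/host@REALM' or 'user@domain' -> 'user' (simplified)."""
    without_realm = full_name.split("@", 1)[0]
    return without_realm.split("/", 1)[0]


print(short_user_name("user/fully.qualified.domain@realm.com"))  # user
print(short_user_name("user@domain"))                            # user
```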

Change-Id: I39d849627cb49760807504d66109c05b7a399482
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2288
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 0005da9cb71f5a4a4ed6bb1dfcd74f8526cd8316)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2305
2014-04-22 13:55:39 -07:00
Victor Bittorf
c414c91931 Adding TRUNC builtin.
Includes additions to builtin UDF registration to support prepare/close.

Change-Id: I22668fa7ee033b3fa37050b7bccee935571ac453
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2243
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-04-22 13:17:12 -07:00
Alex Behm
689870ca3a IMPALA-914: Map null type to boolean in JDBC to be compatible with Hive.
Change-Id: I5831ae7d5dcb03aecea4138d0b13487898951068
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2279
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2282
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-04-21 15:00:32 -07:00
Alex Behm
c8e928119d IMPALA-912: Enforce slot equivalences at the lowest possible plan node.
The reported issue is that we can have redundant hash expressions in exchanges.
The underlying cause is that we fail to remove redundant join predicates.
This patch enforces slot equivalences based on our computed equivalence classes
at the lowest possible plan node by generating new equality predicates.
Each plan subtree now has a minimal set of equality predicates that express
all known equivalences between slots belonging to tuples materialized at that
plan node.
As a result, eliminating redundant join predicates becomes trivial: It is
sufficient to pick a single representative predicate of each relevant equivalence
class. All predicates beyond that are redundant.
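
Picking one representative predicate per equivalence class can be sketched as below. The function name and input shapes are hypothetical: join predicates are (lhs, rhs) slot pairs, and `equivalence_class_of` maps each slot to a class id. The sketch assumes equivalences among slots on the same side of the join have already been enforced at a lower plan node, as the message describes.

```python
def dedup_join_predicates(predicates, equivalence_class_of):
    """Keep the first equality predicate seen for each equivalence class."""
    seen = set()
    kept = []
    for lhs, rhs in predicates:
        # Both sides of an equality predicate belong to the same class.
        cls = equivalence_class_of[lhs]
        if cls in seen:
            continue  # redundant: this class already has a representative
        seen.add(cls)
        kept.append((lhs, rhs))
    return kept
```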

Change-Id: I7998fe8d7bdf84cc8eb129d32c86269bedeab68e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2177
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2278
2014-04-18 13:28:49 -07:00