2376 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
a618d34f17 More decimal builtins.
Change-Id: Ie5b89ad7d1fc80fa646f7cf5f520db13b25b9565
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2764
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 6e994ce7712047000d3a12b5eb677b5470687370)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2830
2014-06-06 19:42:45 -07:00
Henry Robinson
8f26285801 IMPALA-1027 - Fix overflow in row count summary returned by CTAS
Change-Id: I1a538d4deec92db6a95166a081106f07e0787c1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2849
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-06 10:59:53 -07:00
Nong Li
895d69c09f IMPALA-1026: Fix decimal partition cols.
Change-Id: I956b69a86528f1969febf356181dc3182f309909
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2841
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-06-06 09:26:56 -07:00
Lenni Kuff
12a08ec07c IMPALA-788: Loading an HBase table when HBase/Zookeeper are unavailable takes a long time to fail
This is because the default HBase/Zookeeper client timeouts are meant to be very long:

* If both Zookeeper and HBase are down, it will wait:
  hbase.client.retries.number (default 10) x hbase.rpc.timeout (default 60s)
* If ZK is up but HBase is down, and it has run at least once, then it waits:
  hbase.client.retries.number x some exponential backoff time (default sleep time is 1 second;
  the backoff table looks like this: 1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64).

In my experiments, it takes ~20-25 minutes before the table loading fails if HBase is down. If there
are many HBase tables this can block all loading threads.
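
The two waiting formulas above can be sketched as plain arithmetic. This is only an illustration using the defaults quoted in this message; the function names are hypothetical, and the backoff figure counts only the sleep between attempts, not the time each attempt itself spends before failing.

```python
# Illustrative arithmetic only; defaults are the ones quoted in the commit message.
BACKOFF_TABLE = [1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64]


def wait_both_down(retries=10, rpc_timeout_s=60):
    """Worst-case wait (seconds) when both Zookeeper and HBase are down."""
    return retries * rpc_timeout_s


def wait_backoff(retries=10, sleep_s=1, table=BACKOFF_TABLE):
    """Total sleep (seconds) across retries, clamping to the last multiplier."""
    last = len(table) - 1
    return sum(sleep_s * table[min(i, last)] for i in range(retries))


print(wait_both_down())  # 600
print(wait_backoff())    # 71
```

Lowering either the retry count or the per-attempt timeout shrinks the first product directly, which is why changing the defaults makes the load fail faster.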

The fix is to change the default timeout values to fail faster. These values were suggested
by someone from the HBase team. With these values we will fail in ~1 minute. I am working with the CM team to get the defaults changed there as well.

Change-Id: I625e35af57374c72d50d03372d177624ce67694a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1903
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit dcbd4db64a0d764f5caf06ba87c9b90ab643f0d7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2884
2014-06-06 09:08:21 -07:00
Srinath Shankar
60276f7e8c IOMgr changes for order-by without limit
Conflicts:

	be/src/runtime/disk-io-mgr-scan-range.cc
	be/src/runtime/disk-io-mgr.cc
	be/src/runtime/disk-io-mgr.h

Change-Id: I361d1c6d0f588a726f2add3f96bd1fc724ed83ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2666
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2877
2014-06-06 06:22:36 -07:00
Matthew Jacobs
5faa603461 IMPALA-978: Admission control error messages should have more actionable details
Change-Id: Ie3d36438e6f534a56de24ea3d188cdffacff4e7c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2778
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 817f677d58025a0ac5cefc5cbfcd85bdaf85b9e9)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2871
2014-06-05 22:02:06 -07:00
Matthew Jacobs
2f9b2ae785 Fix SHOW DATA SOURCE test; must execute setup/cleanup serially
The SHOW DATA SOURCE tests were run as part of the other SHOW * tests
in test_show(), but the setup/cleanup for data sources can't be run
in parallel. This change moves the SHOW DATA SOURCE tests into a separate
test method and the setup/cleanup code is only run for this test (i.e.
not using setup_method() and teardown_method()). The test is then
only executed serially.

Change-Id: I221145f49cfe7290e132c6a87a5295b747c1fcc7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2864
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5bcd769eae3a694d7f6f42d093f9197e8a4e8b77)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2870
2014-06-05 20:07:57 -07:00
Henry Robinson
d264ab90fe Add support for client SSL to Python Beeswax client
Change-Id: I0d9352471067bfe19e25221e0ecbbb08f945b962
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2810
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 545bd30d5cf3cae9a3581d7bc942a909a1a98806)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2850
Tested-by: Henry Robinson <henry@cloudera.com>
2014-06-05 10:48:23 -07:00
Nong Li
b5c5c05bcb Fix bad test. Needs to be overwrite to allow loading from snapshot.
Change-Id: I7abe2a105d72662c874debfb2b9ae98647b03a1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2853
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-06-05 08:36:46 -07:00
Dimitris Tsirogiannis
0348a36b49 IMPALA-887: Improve partition pruning time (final)
This commit contains the final set of changes for improving the
performance of partition pruning. For each HdfsTable, we materialize a
set of partition value metadata that allows the efficient evaluation of
simple predicates on partition attributes without invoking the BE. These
changes result in three orders of magnitude performance improvement
during partition pruning.
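
A minimal sketch of the idea, assuming the materialized metadata is a per-column map from partition value to partition ids. The class and method names are hypothetical and are not taken from the Impala frontend.

```python
from collections import defaultdict


class PartitionValueIndex:
    """Maps each value of one partition column to the ids of partitions holding it."""

    def __init__(self):
        self._ids_by_value = defaultdict(set)

    def add(self, partition_id, value):
        self._ids_by_value[value].add(partition_id)

    def eq(self, value):
        """Partitions matching `col = value`, found with one dictionary lookup."""
        return set(self._ids_by_value.get(value, ()))

    def between(self, low, high):
        """Partitions matching `col BETWEEN low AND high`."""
        matched = set()
        for value, ids in self._ids_by_value.items():
            if low <= value <= high:
                matched |= ids
        return matched


index = PartitionValueIndex()
for pid, year in enumerate([2012, 2013, 2013, 2014]):
    index.add(pid, year)

print(sorted(index.eq(2013)))             # [1, 2]
print(sorted(index.between(2013, 2014)))  # [1, 2, 3]
```

The work is proportional to the number of distinct values rather than the number of partitions, and no call leaves the process.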

Change-Id: I5b405f0f45a470f2ba7b2191e0d46632c354d5ae
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2700
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2823
2014-06-03 23:17:44 -07:00
Nong Li
89e115436f SHOW FUNCTIONS should return the functions sorted by signature.
Change-Id: Ia843331ff22bd482e716ed12e09b6778fc53dac2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2818
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-03 20:50:18 -07:00
Nong Li
1b88bf928a IMPALA-974: Return NULL on overflow when casting decimals.
Change-Id: Ie4d7723570ea731f6cbaff16e43d8ff86b6d33c3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2817
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-03 20:43:47 -07:00
Henry Robinson
3e7e7ed0dc Fix impala-config.sh when JAVA_HOME not set
Change-Id: Iaefda2039de1a5aafc782bca582d3007abcf6eff
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2803
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 48db5de6825cba8b6a1c1c658ff79a9641341dca)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2814
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-06-03 19:48:57 -07:00
Alan Choi
03e8b3bb31 CDH-19348 LDAP + SSL
TSSLSocket should not be "opened" if it's used by the server. See TSSLSocket::open()

Therefore, in TSaslTransport::open(), it should not open the underlying transport if it's a server.

I've tested it manually with LDAP and LDAP+SSL, but we don't have a functional test for LDAP yet.

Change-Id: Ifee4957c6a874df47760d33ab50aa90eb7eda617
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2718
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 47831dbe40da8db7503f42cbde1426a498ac68fd)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2813
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-06-03 18:00:55 -07:00
Henry Robinson
3f6f570e51 Track allocated resources via metrics
Change-Id: Ib0e303038717a8614ac571e5735ba7f80aa312dd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2556
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-03 17:17:58 -07:00
Lenni Kuff
f34a0507bf [CDH5] Add support for Sentry Service to Impala
This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role based and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support to authorize against metadata read from the Sentry Policy Service; it does
not add support for GRANT/REVOKE statements in Impala.

The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config must be
set to a valid sentry-site.xml config file.

On the impalad side, we continue to support authorization based on a file-based provider.
To enable file based authorization set the --authorization_policy_file to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).
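
The model described above (roles granted to groups, privileges attached to roles) can be sketched as follows. This is a simplified illustration, not the Sentry or Impala data model: the class names, the string-based targets, and the exact-match check are assumptions, and scope hierarchy (for example a server-level privilege implying table access) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    scope: str   # "server", "uri", "database", or "table"
    target: str  # hypothetical naming, e.g. "sales.orders"
    action: str  # e.g. "SELECT"


@dataclass
class Policy:
    roles_by_group: dict = field(default_factory=dict)
    privileges_by_role: dict = field(default_factory=dict)

    def is_authorized(self, user_groups, scope, target, action):
        """True if any role granted to any of the user's groups holds the privilege."""
        wanted = Privilege(scope, target, action)
        for group in user_groups:
            for role in self.roles_by_group.get(group, ()):
                if wanted in self.privileges_by_role.get(role, ()):
                    return True
        return False


def policy_source(authorization_policy_file):
    """File-based provider if the flag is non-empty, else cached catalog metadata."""
    return "file" if authorization_policy_file else "catalog"


policy = Policy(
    roles_by_group={"analysts": ["reader"]},
    privileges_by_role={"reader": {Privilege("table", "sales.orders", "SELECT")}},
)
print(policy.is_authorized(["analysts"], "table", "sales.orders", "SELECT"))  # True
print(policy.is_authorized(["interns"], "table", "sales.orders", "SELECT"))   # False
```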

TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-03 07:19:52 -07:00
Nong Li
e6b7565eff Fix decimal literal casting and cast expr reanalyze().
BigDecimal doesn't think about scale the way we need it to.

Change-Id: I09612c31e30e80ce4806080f1d24c6615090785e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2794
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-02 23:34:20 -07:00
Alex Behm
84686f6b95 [CDH5] Upgraded Hive SNAPSHOT in thirdparty.
Change-Id: Iab313d5376770620fb67988d740acc0b1ae7c7e4
2014-06-02 23:07:18 -07:00
Alex Behm
cdc1002060 [CDH5] Remove SNAPSHOT Hive from thirdparty.
Change-Id: I1a9290afd8f550a5bcbb4d297e23d8c502769233
2014-06-02 23:07:05 -07:00
Nong Li
418739813f IMPALA-895: Don't log cancelled to error log.
Change-Id: I1d06c4dc94f59b413678589d663eb77724482d5e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2777
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-06-02 11:54:42 -07:00
Skye Wanderman-Milne
0f23ddd5c1 Codegen NDV computation.
The HLL update and merge functions are cross-compiled and called in
the codegen'd UpdateSlot() function. (The UdfContext functions are
also cross-compiled so they can be inlined.) This speeds up NDV
calculation 2-3x.
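
For readers unfamiliar with HLL, here is a small sketch of what "update" and "merge" mean for a HyperLogLog sketch. It is a textbook-style illustration, not the cross-compiled Impala functions: the class name, the SHA-1 hash, and the precision of 10 bits are assumptions, and the small-range correction is omitted.

```python
import hashlib


class HyperLogLog:
    def __init__(self, precision=10):
        self.p = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m

    def update(self, value):
        """Fold one input value into the sketch."""
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash
        idx = h & (self.m - 1)                  # low p bits pick a register
        rest = h >> self.p                      # remaining 64 - p bits
        width = 64 - self.p
        rank = width - rest.bit_length() + 1    # leading zeros in rest, plus one
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Combine two sketches; the result equals a sketch of the union."""
        if self.m != other.m:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        """Raw HLL estimate of the number of distinct values."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
```

Both operations are short, branch-light loops over registers, which is the kind of code that tends to benefit from inlining into a generated per-row function.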

Change-Id: Ia0de5e231e4520097ee1a4df8a3dfda5b1843738
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2732
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 9f8113403d70a053b088a014976e513765f374a7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2776
2014-06-02 01:11:11 -07:00
Ippokratis Pandis
e34ede292c IMPALA-1016: Return correct number of NULL values when projecting newly added column
This patch handles the case where, when a query projected a newly added column,
the Parquet scanner returned infinite values.

Change-Id: Ie5f4d4a88d5868e8d9e5c39fa9440821776dde3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2725
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2761
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
2014-06-01 01:28:25 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Float/Doubles are lossy so using those as the default literal type
is problematic.
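
The lossiness is easy to reproduce in any language with binary floating point. The snippet below uses Python's standard `decimal` module purely as an illustration; it says nothing about how DecimalLiteral is implemented.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
float_sum_exact = (0.1 + 0.2 == 0.3)

# Decimal arithmetic on the same literals is exact.
decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(float_sum_exact)    # False
print(decimal_sum_exact)  # True
```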

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Alex Behm
7b7ca065c9 Support Llama HA by cycling through a list of configured Llama hostports.
Change-Id: I471388f468254598347fdf605669b0c0af0f7a15
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2707
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-05-30 23:46:56 -07:00
Nong Li
5d80942d42 [CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads.
Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 19:14:39 -07:00
Nong Li
84f851b5a5 IMPALA-959: Fix ASAN decimal crashes.
Not quite sure what the underlying issue is but these fixes seem to work.

Change-Id: I759804eb8338ba86969c0214a1e6e35588c94297
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2726
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-30 16:47:07 -07:00
Nong Li
26ca559f38 Add decimal builtins: abs/round/ceil/floor/truncate.
Change-Id: I4fe0ee69475ff56d3dc0cd69ea21f677714ae8bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2748
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 11:53:06 -07:00
Nong Li
6e691f9500 IMPALA-1010: Remove Close() of build side in blocking join node.
This optimization is generally not safe since the probe side is still streaming. The
join node could acquire all of the data from the child into its own pool but then
there's no real point in doing this (doesn't lead to lower memory footprint and just
makes the mem accounting harder to reason about).

This is exposed in busy plans.

Change-Id: I37b0f6507dc67c79e5ebe8b9242ec86f28ddad41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2747
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 11:50:50 -07:00
Skye Wanderman-Milne
c8b2017093 Add decimal UDF/UDA support.
Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739
2014-05-29 20:49:53 -07:00
Alex Behm
91175de7c7 Fix recovering from Llama restart. Refactor connect/rpc/retry logic.
Change-Id: Idfbbfc3141cb9774d30ed1b1da4fe2cd9e511889
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2685
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-05-29 14:54:46 -07:00
Matthew Jacobs
12b72c4330 IMPALA-1011: Handle SHOW DATA SOURCES when no sources configured
Change-Id: I367b90c7603aea973d442f9186a6b32598a66a28
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4df5c6d741237e9c91e84e39fd6ea760ccb40cf5)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2723
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
2014-05-28 20:38:41 -07:00
Lenni Kuff
745c091fcc [CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats
Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-28 08:54:54 -07:00
Lenni Kuff
d529c6446f Cleanup builtins db initialization
Instead of doing the initialization in Catalog.java, there is now a special
BuiltinsDb that handles this initialization. This makes it clearer which file
should be modified when adding a new builtin and also cuts down the code in the Catalog.

Change-Id: I4512abff6e8c7f4924701aeffe10e656104a0b86
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2567
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 761a8728de309a20c077913aa154c6259d29d1e8)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2644
2014-05-28 08:50:44 -07:00
ishaan
c5c58c6bce The workload runner should abort execution if a query fails in a multi-user run.
Currently, we coalesce the results and do not properly catch a failure if one of the
threads has a failed query and exit_on_error is set to True. This patch ensures that we
exit before the next query is run.
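
The intended behavior can be sketched as below. This is not the workload runner's code: `run_workload`, `QueryFailed`, and the convention that `run_query` returns True on success are assumptions made for the illustration; only `exit_on_error` comes from the message above.

```python
from concurrent.futures import ThreadPoolExecutor


class QueryFailed(Exception):
    pass


def run_workload(queries, run_query, num_users=2, exit_on_error=True):
    """Run each query once per simulated user; stop before the next query on failure."""
    completed = []
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        for query in queries:
            # Check every thread's result instead of only the coalesced one.
            results = list(pool.map(lambda _: run_query(query), range(num_users)))
            if exit_on_error and not all(results):
                raise QueryFailed(f"query failed: {query}")
            completed.append(query)
    return completed
```

The key point is that the per-thread results are inspected before the loop advances, so a single failed thread is enough to stop the run.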

Change-Id: Ie650e0f547874386c79c78982ea9916f33e18cda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2654
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-05-27 20:46:21 -07:00
Ippokratis Pandis
30357f2351 IMPALA-1003: Speed up compute stats by avoiding null counting for all types and max/avg for fixed-size types.
Removes the NULL counting as well as the MAX and AVG calculations
for the types of known size.

Change-Id: If218b41fcde0b18218732675585d69342f34e544
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2629
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2681
2014-05-27 19:12:40 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored within the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.
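
The drop behavior described above can be sketched as collecting every stored directive ID before the table goes away. The property key 'cache_directive_id' is from this message; the helper name and the assumption that partitions carry the same key in their own parameter maps are hypothetical.

```python
CACHE_DIRECTIVE_PROP = "cache_directive_id"


def directives_to_drop(table_props, partition_props_list):
    """Collect every cache directive ID to remove when a table is dropped."""
    ids = []
    for props in [table_props, *partition_props_list]:
        directive = props.get(CACHE_DIRECTIVE_PROP)
        if directive is not None:
            ids.append(int(directive))
    return ids


table = {CACHE_DIRECTIVE_PROP: "7"}
partitions = [{CACHE_DIRECTIVE_PROP: "8"}, {}, {CACHE_DIRECTIVE_PROP: "11"}]
print(directives_to_drop(table, partitions))  # [7, 8, 11]
```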

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Nong Li
53a6ead074 Disable hive UDFs that use decimal in analysis.
Change-Id: Ifd3e134d0e8ba6ad054e5769da57c9a830ad74d2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2676
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2694
Reviewed-by: Nong Li <nong@cloudera.com>
2014-05-27 15:54:21 -07:00
ishaan
10952da6e0 Change the slf4j version to harmonize with the rest of CDH.
All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version
causes a lot of benign warnings. This patch changes Impala's version to be the same
as the rest of the stack.

Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-05-27 13:46:17 -07:00
Skye Wanderman-Milne
00e956f7cf Move CodegenAnyVal to its own file.
This is needed by upcoming commits so we can #include anyval-util.h in
cross-compiled functions without pulling in all of LLVM in our IR
module (this causes build problems, plus our module will be larger than
necessary).

Change-Id: I756d5a95e5c254403d9ad5684fe27cf96f3aec1e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2677
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit ebc328e0225d7e6204d5bc8d0c0eaa2f3c6456cf)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2688
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-05-26 13:28:12 -07:00
Dimitris Tsirogiannis
ca86e470de IMPALA-887: Improve partition pruning time
This commit is the first step in improving the performance of partition
pruning. Currently, Impala can prune approximately 10K partitions per
sec, thereby introducing significant overhead for huge tables with a
large number of partitions. With this commit we reduce that overhead by
3X by batching the partition pruning calls to the backend.
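
Batching here means paying the per-call overhead once per group of partitions instead of once per partition. A minimal sketch, assuming a hypothetical `evaluate_batch` callable that stands in for the backend call and returns one keep/drop flag per partition:

```python
def prune_in_batches(partitions, evaluate_batch, batch_size=1024):
    """Evaluate partitions in groups so each backend round trip covers many of them."""
    kept = []
    for start in range(0, len(partitions), batch_size):
        batch = partitions[start:start + batch_size]
        flags = evaluate_batch(batch)  # one round trip per batch
        kept.extend(p for p, keep in zip(batch, flags) if keep)
    return kept
```

With 10,000 partitions and a batch size of 1,024, this makes 10 calls rather than 10,000.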

Change-Id: I3303bfc7fb6fe014790f58a5263adeea94d0fe7d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2608
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2687
2014-05-26 13:10:12 -07:00
Dimitris Tsirogiannis
2dea6708b2 CDH-19292: Impala query on an HBase table takes almost same time as Hive
This commit fixes issue CDH-19292, where querying an HBase table takes
a significant amount of time if HBase has a large (>1K) number of
tables. The performance bottleneck is the call to HBase to retrieve the
table's metadata (column families) during the computation of row stats.
To resolve this issue we cache the column families in the catalog object
associated with an HBase table, so the expensive call to HBase only
happens the first time the table is queried. Subsequent calls use the
cached data.
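
The caching pattern is a simple fetch-once memoization on the table object. The sketch below is illustrative only: `HBaseTableEntry` and `fetch_column_families` are hypothetical stand-ins for the catalog object and the HBase metadata call, and it is not thread-safe as written.

```python
class HBaseTableEntry:
    """Catalog-side entry that remembers a table's column families after first use."""

    def __init__(self, name, fetch_column_families):
        self.name = name
        self._fetch = fetch_column_families
        self._column_families = None

    def column_families(self):
        if self._column_families is None:
            # Expensive call to HBase happens only on the first query.
            self._column_families = self._fetch(self.name)
        return self._column_families
```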

Change-Id: I0203fee3e73d2a4304530fe0a1ba2cf163f39350
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2672
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2679
2014-05-23 21:54:02 -07:00
Nong Li
39dda02fd6 Fix comment in BitWriter::PutAligned from bits to bytes.
Change-Id: I22d029bc219ef49b94610d3e0ce9d9f2f4927bd6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2678
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-23 21:46:00 -07:00
Victor Bittorf
c13a1d080e IMPALA-938: Fix implicit casting in timestamp arithmetic exprs.
Change-Id: I7e875ec2251e9782c98b60195ecbc92258b63b5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2657
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 8822401dbb65d9b4d996d5bb78ac3aca1aa2dbac)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2671
2014-05-23 14:11:35 -07:00
Lenni Kuff
e34a3d4872 IMPALA-825: Synchronize Hive Metastore client connection creation
The creation of hive metastore clients is now synchronised to avoid the possibility of
race conditions accessing local Kerberos state. In experiments, this does not fully
resolve the issue but significantly reduces the chances it will occur.

Also adds in a new debug config option to optionally sleep for a specified number of
milliseconds between creating connections. The default is zero, but can be configured
by setting impala.catalog.metastore.cnxn.creation.delay.ms in the hive-site.xml.
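
In outline, the change serializes client creation behind a single lock and optionally sleeps before the next creation can start. The Python sketch below only illustrates that pattern; `create_client` and `factory` are hypothetical names, and the real change lives in the Java catalog code.

```python
import threading
import time

_creation_lock = threading.Lock()


def create_client(factory, delay_ms=0):
    """Create one client at a time, optionally pausing before the next creation."""
    with _creation_lock:
        client = factory()
        if delay_ms > 0:
            # Holding the lock while sleeping spaces out successive creations.
            time.sleep(delay_ms / 1000.0)
        return client
```

A delay of zero, the default, keeps the serialization but adds no extra wait.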

Change-Id: I83e863760470bdc2d9b27c6669f35604111d69d7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2661
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b0b486028ce46be26967aa202a4b1acea22d9311)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2665
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-05-23 00:22:22 -07:00
Skye Wanderman-Milne
1dff1686aa Add option to build UDF test libs in copy-udfs-udas.sh
The option is off by default, but useful for running this script
without building the world.

Change-Id: I82d8251cf9bb2763ce69094da1995a4d6ceff167
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2647
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit a7f77643820dcbfbab231a9260c94450564bd2df)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2659
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-05-22 18:01:55 -07:00
Nong Li
723f583b4d Allow adding predicates after processing build table.
Change-Id: I4c845d9f08f0be29e548eceac3912871acd0270f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2658
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-22 13:09:51 -07:00
Nong Li
5729024fe9 IMPALA-984: Fix missing reanalyze in InlineViewRef and NULL handling.
Change-Id: Ia80035c5456630aeef7a24288a998fe08546a282
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2652
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-21 18:18:29 -07:00
Skye Wanderman-Milne
794aaad8ad Address CR comments on HdfsAvroScanner.
Change-Id: Ia92b07e0b0cd16297c0d84f0df5e5eff76788c3e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2610
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
(cherry picked from commit f4f418b8ad99e88010bd78579848d34a72a27280)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2648
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-05-21 16:06:27 -07:00
Lenni Kuff
83e239723f Add TRole/TPrivilege structs to Thrift CatalogObjects
These are used as our internal representation of the authorization policy metadata
(as opposed to directly using the Sentry Thrift structs). Versioned/managed in the
same way as other TCatalogObjects.

Change-Id: Ia1ed9bd4e25e9072849edebcae7c2d3a7aed660d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2545
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c89431775fcca19cdbeddba635b83fd121d39b04)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2646
2014-05-21 15:51:24 -07:00
Taras Bobrovytsky
46aba6149d CDH-18512: Modification to allow spaces around the = sign in SET in impala-shell
Change-Id: I3c149e9a27962ed1130b1ddbb02952f4254bd4c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2609
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2645
2014-05-21 15:34:24 -07:00