1681 Commits

Author SHA1 Message Date
Nong Li
c031cd4e96 Update RLE encoding to pad literal groups to 8.
Change-Id: I77cb2b80b888b569ff715c583f16aea4e39fe680
Reviewed-on: http://gerrit.ent.cloudera.com:8080/644
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:17 -08:00
ishaan
8a43426879 Sleep after starting the hiveserver2 service to guard against it not starting on time.
Change-Id: I9a0de1cc63089cba2f9b59942ee45abc44b8662e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/643
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:17 -08:00
Lenni Kuff
13605ad834 Support catalogd in ImpalaCluster test library
Adds basic support for catalogd to our ImpalaCluster test library/object model.
This will allow us to write more programmatic tests targeting the catalogd process
including process failure tests and metric check validators.

Change-Id: I8e5f7bc73f999f105437c6d3d52c6d436a354d2d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/617
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:16 -08:00
Nong Li
1621d27053 LibCache improvements.
- Add fine grain locking
- Allow caching of module files copied from HDFS.
Change-Id: Ib7409c1fea715199f2be5ed65bb3b0cba90c9d9a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/632
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:16 -08:00
Lenni Kuff
dd6736b74d Change error message format to fix Analysis test failure
Change-Id: Ib9a84f3a5ff4431e45f9a6477dac5686fd1066db
Reviewed-on: http://gerrit.ent.cloudera.com:8080/636
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:16 -08:00
Lenni Kuff
9425ad1e14 IMPALA-461: Log table loading exceptions as ERROR instead of INFO
Change-Id: Ie993f6b8765b73ab9b6dbc12b4b2739203076023
Reviewed-on: http://gerrit.ent.cloudera.com:8080/630
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:15 -08:00
Lenni Kuff
b07a3ccfd6 Use an external Hive Metastore Service for local test runs
Using an external Hive Metastore Service for local test runs has a number of benefits.
Some of the benefits are that it helps separate the metastore logs from the impala
logs, and that it is more representative of what is on real cluster environments.
It also may help with some of the concurrency issues that we have been seeing when
running directly against the backend database since we no longer spin up an in-process
metastore server for each client connection.

The metastore is started by running "run-hive-server.sh" which is invoked as part of
"run-all.sh".

Change-Id: If60fa97aa38e4ad5cf578b9b409eeea1e0e29375
Reviewed-on: http://gerrit.ent.cloudera.com:8080/628
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:15 -08:00
Nong Li
6981f33b11 Compile avro with -fPIC (so the FE can pick it up).
Change-Id: Iab8f377663ae332e08d42fea95b2d968e879b12c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/623
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:15 -08:00
Lenni Kuff
92829b8400 IMPALA-587: Support implicit hbase column mapping keys
The Hive HBase spec specifies that the key column mapping can either be
defined explicitly (using the :key syntax) or left out completely in
which case a mapping to the first table column is implied. This change
updates Impala to support implicit key mappings and also adds some
checks in our ALTER TABLE DDL to ensure we cannot get into this state by
dropping a column from an HBase table (a similar restriction that Hive
puts in place)
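
A minimal sketch of the implicit-key rule described above, assuming the mapping arrives as a comma-separated string in the style of Hive's hbase.columns.mapping property. The function name resolve_column_mapping is hypothetical and this is not Impala's actual code.

```python
def resolve_column_mapping(table_columns, mapping_spec):
    """Pair each table column with an HBase mapping entry.

    If the spec has no explicit ":key" entry, the row key is implicitly
    mapped to the first table column.
    """
    entries = [e.strip() for e in mapping_spec.split(",") if e.strip()]
    if ":key" not in entries:
        # Implicit key mapping: prepend the row key for the first column.
        entries.insert(0, ":key")
    if len(entries) != len(table_columns):
        raise ValueError(
            "expected %d mapping entries, got %d"
            % (len(table_columns), len(entries)))
    return dict(zip(table_columns, entries))
```

With an explicit key, the ":key" entry stays wherever the spec puts it; with an implicit key, the first column always receives it, which is why dropping that column would leave the table without a key mapping.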

Change-Id: I920d642261659ee3e881da2553ffe83300923af8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/554
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:14 -08:00
Nong Li
c868350fbd Add OS info, which now just contains the os version.
Change-Id: Ifdaf80702301ff6beb3fd34abe814fd2fa904607
Reviewed-on: http://gerrit.ent.cloudera.com:8080/619
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:14 -08:00
Skye Wanderman-Milne
49c07abce3 Fix gen_opcodes.py for Python 2.4
Change-Id: I6e373c370f8081c1e549cbe4d1bc2a0a254ad357
Reviewed-on: http://gerrit.ent.cloudera.com:8080/622
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:14 -08:00
Nong Li
15db34e356 AggregationNode refactoring
This patch redoes how the aggregation node is implemented. The functionality is
now split between aggregation-node, agg-expr and aggregate-functions. This is a work in
progress (there's still a lot of debug stuff I added that needs to be cleaned up) but
it does pass the tests.

Aggregation-node is now very simple and now only deals with the grouping part.
Aggregate-expr serves as the glue between the agg node and the aggregate functions.
The aggregation functions are implemented with the UDA interface. I've reimplemented
our existing aggregate functions with this setup. For true UDAs, the binaries would be
loaded in aggregate-expr.

This also includes some preliminary changes in the FE. We now need to annotate each
AggNode as executing the update vs. merge phase (root aggs execute update, others
execute merge) and if it needs a finalize step (only the root does). This is more
general than our builtins which are too simple to need this structure.

There is a big TODO here to allow the intermediate types between agg nodes to change.
For example, in distinct estimate, the input type is the column type and the output type
is a bigint. We'd like the intermediate type to be CHAR(256). This is different since
currently, the intermediate type and output type have always been the same. We've hacked
around this by having both the intermediate and output type be TYPE_STRING. I've left
this for another patch (changing the BE to support this is trivial).
For aggregates that result in strings, we used to store some additional stuff past the
end of the tuple. The layout was:
<tuple> <length of 1st string buffer>,<length of 2nd string buffer>, etc

The rationale for this is that we want to reuse the buffer for min/max and grow the buffer
more quickly for group_concat. This breaks down the abstraction between agg-expr and
agg-node and is not something UDAs can use in general. Rather than try to hack around
this, I think the proper solution is for the intermediate type not to be StringValue and
to contain the buffer length itself.

This patch also resurrects the distinct estimate code. The distinct estimate functions
exercise all of the code paths.
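
As a rough illustration of the update/merge/finalize split described above, here is a sketch of an average aggregate whose intermediate state (a sum and a count) differs from its output type. The class and function names are hypothetical and this is Python, not the C++ UDA interface itself.

```python
class AvgUda:
    """Average as a UDA: intermediate state is (sum, count), output is a float."""

    @staticmethod
    def init():
        return (0.0, 0)

    @staticmethod
    def update(state, value):
        # Update phase: fold one input row into the intermediate state.
        total, count = state
        return (total + value, count + 1)

    @staticmethod
    def merge(state, other):
        # Merge phase: combine two intermediate states.
        return (state[0] + other[0], state[1] + other[1])

    @staticmethod
    def finalize(state):
        # Finalize: convert the intermediate state into the output value.
        total, count = state
        return total / count if count else None


def run_two_phase(uda, partitions):
    # One aggregation step per partition runs the update phase.
    partials = []
    for rows in partitions:
        state = uda.init()
        for value in rows:
            state = uda.update(state, value)
        partials.append(state)
    # A later step merges the partial states and finalizes exactly once.
    merged = uda.init()
    for partial in partials:
        merged = uda.merge(merged, partial)
    return uda.finalize(merged)
```

For example, run_two_phase(AvgUda, [[1, 2], [3]]) averages three values spread over two partitions. The point of the sketch is that the value passed between steps need not have the output type.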

Change-Id: Ic152a2cd03bc1713967673681e1e6204dcd80346
Reviewed-on: http://gerrit.ent.cloudera.com:8080/564
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
Skye Wanderman-Milne
0b2bebdfd1 Improve codegen optimizations, take two
Change-Id: Id9b48e1979bb9999c58e7fd89553ee9a7d8996d0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/606
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
Skye Wanderman-Milne
656ae8b1c8 Cross-compiled UDF builtins.
When codegen is enabled, UDF builtins will be loaded from the IR
module rather than using the native functions. Since we cannot run
UDFs without codegen yet this means UDF builtins can only be run this
way, but once we add support for running UDFs without codegen this
will allow us to switch back to the native functions for
development/debugging.

Change-Id: I948b113c61603801b84f80982384bbc07596f119
Reviewed-on: http://gerrit.ent.cloudera.com:8080/605
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:13 -08:00
Lenni Kuff
bf139d1eba Update catalogd to forward log4j log messages to glog
Change-Id: I4620b77ba731e134a3e48883e8ae7ee3820ed584
Reviewed-on: http://gerrit.ent.cloudera.com:8080/612
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:12 -08:00
ishaan
aa530ce11d Change the order of fields stored in the benchmark results to fix performance comparisons.
Change-Id: I7b7ebd711adfe9a44cba92b55d35ef8dd97eba60
Reviewed-on: http://gerrit.ent.cloudera.com:8080/584
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:12 -08:00
Lenni Kuff
6e8741aafd Add metric to determine if impalad catalog is 'ready'
Change-Id: I8e94d9beff05f2370902c887a5ae6a4fffad9dfe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/611
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:12 -08:00
Lenni Kuff
5a97258c1a Update table metadata loading to workaround Hive MetaStore bug HIVE-5457
There is a Hive Metastore concurrency bug (HIVE-5457) which causes concurrent
calls to getTable() to sometimes fail with DataNucleus exceptions. This
causes catalogd to fail to load ALL metadata for all tables. The fix is to
serialize our calls to getTable(). Additionally, tweaked the logging a bit and
improved start-impala-cluster to do a better job of reporting the status of catalog
initialization. It's too bad we have to serialize these calls, but we seem to be able
to run everything else in parallel with no problems (get col stats, block md, etc).

Also added a couple of changes in our hive-site to match the defaults for our cluster
metastore deployments.
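
The serialization described above can be sketched as follows: only the getTable() call is guarded by a lock, while the rest of the per-table loading still runs in parallel. The client object and its get_table / get_column_stats methods are hypothetical stand-ins, not the Hive Metastore client API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single lock shared by all loader threads.
_get_table_lock = threading.Lock()


def load_table(client, name):
    # Only the getTable() call is serialized...
    with _get_table_lock:
        table = client.get_table(name)
    # ...other metadata calls can still overlap across tables.
    stats = client.get_column_stats(name)
    return table, stats


def load_all(client, names, workers=4):
    # Tables are loaded by a pool of threads; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: load_table(client, n), names))
```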

Change-Id: Ic70e2a9b8190a56510e430d8da3942dca252eb4c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/609
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, Thrift service implementation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes: an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.
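
The "wait for the catalog version" behavior listed under "What is working" could look roughly like the sketch below. CatalogVersionTracker and its method names are hypothetical; the real impalad logic is C++/Java and may differ.

```python
import threading


class CatalogVersionTracker:
    """Tracks the newest catalog version applied from statestore updates."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0

    def on_heartbeat(self, version):
        # Called after a statestore update carrying `version` is applied.
        with self._cond:
            if version > self._version:
                self._version = version
                self._cond.notify_all()

    def wait_for_version(self, min_version, timeout=None):
        # Block until the local catalog reaches `min_version`.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version >= min_version, timeout)
```

A DDL handler would call wait_for_version() with the version returned by the catalog service before replying to the client, so that the change is visible locally by the time the statement returns.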

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
Nong Li
f3b5f9d5d8 Patch tcmalloc's stack walking to be more strict and less likely to crash.
The issue from looking at the core dump is that tcmalloc crashes trying to
walk the call stack. This is why it is happening on the heap checker build
where it stores the stack of many calls. I can see from GDB that there is
something goofy with the stack frames. Tcmalloc is able to identify 4 frames
(same values as GDB) and then crashes on the next where GDB shows ???. GDB
can continue to show the rest of the stack so I don't think this is stack
corruption. The frames are in libstdc++ and I believe the issue is from a
compiler optimization to omit stack frame pointers. Tcmalloc's stack walking
is not tolerant of this. Debuggers have lots of logic to look around where the
stack should have started and recover.

We have a few options to handle this. Here's one proposed solution. I think we
can also consider setting NO_TCMALLOC_SAMPLE which will cause it to never collect
stacks. We can also try different options for different build types.

Change-Id: I98dfb5bccd5fe485ac50b56c6f0fe3f3ded9ff76
Reviewed-on: http://gerrit.ent.cloudera.com:8080/600
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:53:10 -08:00
Nong Li
78539ee531 Allow insert cancellation test to fail due to IMPALA-551
Change-Id: I5d98be1cc503cc51206051a7c6a493bf884ab5b3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/594
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:10 -08:00
Skye Wanderman-Milne
e5ea524448 Revert "Improve codegen optimizations."
This reverts commit b375f3d7f4961def4ef930273420d447f99d093f.

Reverting for now to fix build.

Change-Id: Icaa790d44ab47825f855d8a123cad3130948934a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/586
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:09 -08:00
Nong Li
1eb2b7a964 Add execution for vararg UDFs.
Change-Id: I46e5670c09ac0b8e62f39dfc832fe880dd1dc995
Reviewed-on: http://gerrit.ent.cloudera.com:8080/572
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:09 -08:00
ishaan
ee42aa8d36 Fix incorrect argument in the Impala test suite call to execute_using_jdbc
execute_using_jdbc used to expect a query string. Its interface was recently changed to
accept a query object. Additionally, change the interface of the Query() class to enable
it to accept raw (qualified) query strings.

Change-Id: I44693cd2cccf1041cab32a9821fb76b12d148375
Reviewed-on: http://gerrit.ent.cloudera.com:8080/577
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:09 -08:00
Skye Wanderman-Milne
3a388eb461 Improve codegen optimizations.
Change-Id: I0698cbcb417b8e9981ffac43361cf1cafbb17348
Reviewed-on: http://gerrit.ent.cloudera.com:8080/576
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:08 -08:00
Alex Behm
9065648d77 Improvements to cost estimation and explain output.
Fixed cost estimation of union queries and exchange nodes.
Fixed propagation of stats through cloning of exprs and plan nodes.
Fixed propagation of expr stats to slots they are materialized into (e.g., grouping columns in multi-level aggs).
Improved explain output for constant selects.

Change-Id: I96d1652c00d48e4093b85ae7fc8bad28d74b8b81
Reviewed-on: http://gerrit.ent.cloudera.com:8080/547
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:08 -08:00
ishaan
a33c795de3 Fix build failure because of a function signature change in the test file parser.
Change-Id: I329eca710459910a743d682c21a625672096aec0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/573
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:08 -08:00
Chris Channing
ba7c764279 IMP-651: Adding support for the greatest function.
Change-Id: Ia8c53db5504e28d8669e6013545da6b1164bcb23
Reviewed-on: http://gerrit.ent.cloudera.com:8080/570
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:07 -08:00
ishaan
565d15579c Add the ability to use a workload as the unit of execution in the Impala benchmark runner.
At the moment, a query is the default unit of execution and parallelism in the Impala
performance suite. With this change, we now have the ability to treat a workload as the
unit of execution. A workload is defined as a unique combination of the dataset, scale
factor, a subset (or all) of the queries in the dataset, and a table format (file format,
compression codec and compression scheme).

It introduces two new command line options in bin/run-workload.py:
  * --execution_scope
    The default scope is 'query', and it maintains previous semantics. The
    new scope is 'workload', which toggles the unit of execution to a workload.
  * --shuffle_query_exec_order.
    Shuffles the order in which queries are executed (only applicable when the
    execution_scope is workload), defaults to False.
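
A hedged sketch of how the two options might shape the units of execution. The option names come from this commit message; the use of argparse, the store_true flag style, and the plan_units helper are assumptions for illustration and are not taken from bin/run-workload.py.

```python
import argparse
import random


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--execution_scope",
                        choices=["query", "workload"], default="query")
    parser.add_argument("--shuffle_query_exec_order",
                        action="store_true", default=False)
    return parser.parse_args(argv)


def plan_units(workloads, options, rng=random):
    """Group queries into units of execution.

    workloads: dict mapping a workload name to its list of queries.
    """
    if options.execution_scope == "query":
        # Previous semantics: every query is its own unit.
        return [[q] for name in workloads for q in workloads[name]]
    units = []
    for name in workloads:
        queries = list(workloads[name])
        if options.shuffle_query_exec_order:
            # Shuffling only applies within a workload-scoped unit.
            rng.shuffle(queries)
        units.append(queries)
    return units
```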

Change-Id: I790d75f0896210cda8eb999015b0be04246e4c45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/503
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:07 -08:00
Chris Channing
dc055c57ff IMP-651: Adding support for the least function.
Change-Id: I51c12bdd2ed614e2885403b4f857abe7d8e5777c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/552
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:07 -08:00
Nong Li
e4786f08fe Workaround GCC bug to fix build break in OpcodeRegistry.
Change-Id: I26eaaa4e87099d79507511203700352bb6df3922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/569
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:06 -08:00
Alex Behm
4b29b54d76 Fixed cost estimation overflow.
Temporarily switched a few precondition checks related to cost estimation
to warnings until cost estimation is more robust.

Change-Id: I82538b5325a17921e6caab2be997f65cf57f5438
Reviewed-on: http://gerrit.ent.cloudera.com:8080/568
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:06 -08:00
Nong Li
e959e49b7c Update opcode registry to support UDF-interface builtins.
There's a bigger change to migrate the rest of them but I think this is how
the builtins, when not running as cross compiled, should be run. This mode
is still useful when developing the builtin.

When run as cross compiled IR, we wouldn't do anything to distinguish between
a builtin and an external UDF.

Change-Id: I6aa336b22aa19b00507bad33c9df3978baa576cc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/542
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:06 -08:00
Nong Li
4bb1e8c854 Add varargs to UDF/UDA parser/analyzer.
Change-Id: I4c3f2e74f6c29cee4b0b787c058b0455b16a11fd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/548
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:05 -08:00
ishaan
8e553a8a2e IMPALA-454 Tab completion in the shell should not depend on case.
This change adds support for upper case and mixed case tab completion for commands in
the shell.
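
One way to get case-insensitive command completion with Python's standard cmd module is sketched below. The class name and the two example commands are hypothetical, and the snippet is written for Python 3 rather than being the shell's actual implementation.

```python
import cmd


class CaseInsensitiveShell(cmd.Cmd):
    def do_select(self, line):
        pass

    def do_show(self, line):
        pass

    def completenames(self, text, *ignored):
        # Match against the lower-case command names regardless of input case.
        matches = super().completenames(text.lower(), *ignored)
        # Echo upper case back if that is what the user typed.
        if text.isupper():
            return [m.upper() for m in matches]
        return matches
```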

Change-Id: I5b7083ec71463c9fd60b0a8b788423e2fe8d0ce5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/563
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:05 -08:00
Nong Li
b93b15f10f Integrate function context with mempool.
Change-Id: I55edb6cb89b67eb2c8031ac3a4f119df92a0896f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/565
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:05 -08:00
Lenni Kuff
43eed3365c Separate log4j/glog forwarding code from fe-support/FeSupport
This change splits the log4j/glog forwarding code out from libfesupport
into its own shared library - libloggingsupport.so. This allows the log
forwarding to be used in other places than the impalad FE, such as the
CatalogService.

Change-Id: I669e5b913b913488b4b7d5b7ed4b8be271850c6e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/559
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:04 -08:00
Nong Li
68e8470ae3 Add CHAR(N) type in BE.
It's going to be pretty suboptimal to implement UDAs without having CHAR(N)
support so I implemented the bare minimum to support it. We need to change
all uses of PrimitiveType in the future.

This is a bit hard to test now since we don't expose this in the language except
for UDAs currently. I can combine this with the UDA patch but that patch is
pretty big.

Change-Id: I799dd2c905b41194e92cc01728727546294b0a02
Reviewed-on: http://gerrit.ent.cloudera.com:8080/562
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:04 -08:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases
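
The library cache idea from the list above can be sketched as a small thread-safe map from path to handle. LibraryCache is a hypothetical name, and the loader is passed in (for example ctypes.CDLL, which uses dlopen on POSIX systems) so the sketch does not depend on any particular shared library.

```python
import threading


class LibraryCache:
    """Reuses one handle per library path instead of reopening it each time."""

    def __init__(self, loader):
        self._loader = loader
        self._handles = {}
        self._lock = threading.Lock()

    def get(self, path):
        with self._lock:
            handle = self._handles.get(path)
            if handle is None:
                # First request for this path: load it and remember the handle.
                handle = self._loader(path)
                self._handles[path] = handle
            return handle
```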

Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Nong Li
e5ed8e4105 Move minicluster_xml_conf to HADOOP_CONF_DIR.
The current location gets deleted if you rebuild, making you have to restart mini dfs.

Change-Id: If71b144534255fa8df2bfa187c0814ffdf28463e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/550
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:03 -08:00
Alex Behm
e4a24c8c1d Fixed the process failure test that was failing due to a race in
reading/writing a query's profile web page.

Change-Id: Ibf4a27aa17eb6439630d1616c2c719fc1ee2ba4e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/553
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:53:03 -08:00
Lenni Kuff
beea7d3d10 Disabled thrift-server-test due to IMPALA-606
Change-Id: I4d080535cc4778ddad90fca22dbffdfc5f303b15
Reviewed-on: http://gerrit.ent.cloudera.com:8080/556
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:02 -08:00
ishaan
3dfdbd88d9 IMPALA-547 The Impala Shell should have better handling when the history file does not
exist or is uneditable.

Currently, the shell warns the user that it's unable to load the command history if the
command history file (~/.impalahistory) is not found. Moreover, if the file is not
editable, then an error is thrown after the execution of each command. This change
disables readline if the history file is not editable instead of throwing repeated
errors, and removes the warning if the history file does not exist.
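
A minimal sketch of the decision described above, assuming the check is done with os.access. The function name history_mode and its return values are hypothetical.

```python
import os


def history_mode(path):
    """Decide how the shell should treat its history file.

    Returns "new" if the file does not exist (no warning needed),
    "disabled" if it exists but cannot be read and written,
    and "load" otherwise.
    """
    if not os.path.exists(path):
        return "new"
    if not os.access(path, os.R_OK | os.W_OK):
        # Not editable: turn history off once instead of failing per command.
        return "disabled"
    return "load"
```

A caller would expand the user's home directory first, for example history_mode(os.path.expanduser("~/.impalahistory")).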

Change-Id: Ie4c94629431f2407b0679a7721a6bdf28907437f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/532
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:02 -08:00
ishaan
c0129a1683 Improve the Impala shell's behavior when attempting to connect to a kerberized impalad.
This change has the following additions:
- If the user's connecting to a kerberized impalad, the Impala shell will check
  whether a valid ticket exists by running 'klist -s'. If a valid ticket is not found,
  then the shell will exit with an appropriate error message on the commandline.
- If the user's connecting to a kerberized impalad without the '-k' option, the Impala
  Shell will issue a 'klist -s' to check if there are valid kerberos tickets in the
  credentials cache. If a valid ticket is found, it will retry the connection with
  kerberos enabled.
- The Impala shell encodes strings entered on the commandline as unicode. The sasl
  module expects ascii strings as arguments. Explicitly encode any string sent to the
  sasl module to ascii.
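
The ticket check above can be sketched with the standard subprocess module, relying on 'klist -s' exiting with status 0 when a valid ticket is cached. The helper names are hypothetical, and the runner is injectable so the logic can be exercised without Kerberos installed.

```python
import subprocess


def has_valid_kerberos_ticket(run=subprocess.call):
    """True if `klist -s` reports a valid ticket in the credentials cache."""
    try:
        return run(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH.
        return False


def should_retry_with_kerberos(used_k_flag, ticket_ok):
    # Retry with kerberos only when '-k' was not given and a ticket exists.
    return (not used_k_flag) and ticket_ok


def to_ascii(value):
    # Strings handed to the sasl module are encoded to ascii explicitly.
    return value.encode("ascii")
```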

Change-Id: I1799b1e7988a19fa513b683afe1e3b66b68c1ffc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/535
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:02 -08:00
Nong Li
8963d79f51 Fix build break from UdfContext rename.
Change-Id: Ia3df23fcba7d3812ae90565daab89916cbb50861
Reviewed-on: http://gerrit.ent.cloudera.com:8080/549
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:01 -08:00
Nong Li
93bece32ae Rename UdfContext to FunctionContext.
Change-Id: I45da3f51a66c3e2cc4580c26733269f30ab9be83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/546
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:01 -08:00
Henry Robinson
8cee9fa138 Fix failing test_shell_commandline
Change-Id: Iea170885f740ceeb08e21e64ef88ab44584fa270
Reviewed-on: http://gerrit.ent.cloudera.com:8080/545
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:01 -08:00
Henry Robinson
41c88219ab Fix PYTHONPATH for Thrift on non-Debian systems
Python modules on Redhat systems might be in lib or in lib64, unlike Debian systems which
symlink one to the other

Change-Id: Ia1e2d362e3d7e13b87c70e7578644827a5234a91
Reviewed-on: http://gerrit.ent.cloudera.com:8080/544
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:00 -08:00
Henry Robinson
b9bc9a9e89 Add SSL support for client connections to Impala
This patch allows Impala to start either Beeswax or HS2 on an
SSL-secured port. SSL is a certificate-based authentication scheme,
where the server provides a certificate to the client as part of the
handshake process. The client verifies that certificate, either by
contacting a trusted third-party certificate authority (CA), or by
accepting a 'self-signed' certificate from the server that is also
provided to the client out-of-band; the client simply compares the two
certificate copies.

Once the certificate is verified, the client and server negotiate an
encryption key for the session, using a public key provided by the
server to encrypt that negotiation. Therefore the server has to have
access to a private key in order to decrypt the encryption key.

Both certificate and key are stored in industry standard .PEM
format. Impala uses the same certificate and key for both Beeswax and
HS2, and the files containing the certificate and key are provided via
--ssl_server_certificate and --ssl_private_key. If either is non-blank,
SSL is enabled for Beeswax and HS2.

The Python shell supports SSL as of this patch via new --ssl and
--ca_cert flags.

Finally, this patch also adds support for Impala's ThriftClients to use
SSL, paving the way for having the backend service use encryption on the
wire as well (although such a configuration is not used by this
patch). The client SSL support is only currently used for the new test
case.

This patch does not enable 'mutual' authentication, where clients
provide certificates to the server in order to authenticate
themselves. Impala has other authentication mechanisms for that purpose.
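
For illustration, the client-side verification described above can be sketched with Python's standard ssl module. This shows only the certificate check during the handshake, not the Thrift transport, and the function names are hypothetical rather than the shell's actual code.

```python
import socket
import ssl


def make_client_context(ca_cert=None):
    """Build a client context that requires a verifiable server certificate.

    With ca_cert=None the system's trusted CAs are used. Passing the path
    of a PEM file (for example a copy of a self-signed server certificate
    obtained out-of-band) trusts only what that file contains.
    """
    return ssl.create_default_context(cafile=ca_cert)


def open_verified_connection(host, port, ca_cert=None):
    # The handshake fails if the server certificate cannot be verified.
    context = make_client_context(ca_cert)
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)
```

Note that the default context also checks that the certificate matches the host name, so a self-signed certificate must be issued for the name the client connects to.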

Change-Id: I3942aa0d21b34b7cda748292f04a9523f35ee6d4
Reviewed-on: http://gerrit.ent.cloudera.com:8080/514
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:00 -08:00
Henry Robinson
f3e4df14ac Move Thrift backend code into separate rpc library
We have ~60 files in Util which is a bit unwieldy. The Thrift / RPC code
is some of the easiest to move out, and doesn't really belong in a
'Util' library.

Change-Id: I7a188ab69459b019a643b192d51879bc8ead88a7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/528
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:53:00 -08:00