jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-25 18:01:04 -05:00

Commit Graph

Author	SHA1	Message	Date

Mike Yoder

75a97d3d7e

[CDH5] Kerberize mini-cluster and Impala daemons

This is the first iteration of a kerberized development environment.
All the daemons start and use Kerberos, with the sole exception of the
Hive metastore.  This is sufficient to test Impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known not to work at this time.  The
root causes are that Beeline -> HiveServer2 -> MapReduce throws
errors and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the Impala daemons authenticate using Kerberos both with
clients (such as the Impala shell) and among each other.  This means
that if you can get data into the mini-cluster, you can query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  run 'run-all.sh -format', re-source impala-config.sh, and start the
  Impala daemons as usual.  You must reformat the cluster because
  kerberizing it changes the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins

2014-09-05 12:36:21 -07:00
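
The two usage paths described in commit 75a97d3d7e can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the commit message does not say which directories the scripts live in, so the bare `./` paths are placeholders, and the `DRY_RUN` switch and the function names are hypothetical helpers that are not part of the patch. The command for starting the Impala daemons "as usual" is not named in the commit message, so it is left as a comment.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the two kerberization paths from the commit message.
# Assumption: script paths are placeholders; adjust them to your checkout.
set -euo pipefail

# Hypothetical switch: 1 (default) only prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Path 1: let buildall.sh kerberize the environment. Per the commit
# message, it stops before loading data or running tests.
kerberize_via_buildall() {
  run ./buildall.sh -kerberize
}

# Path 2: the manual sequence.
kerberize_manually() {
  run ./create-test-configuration.sh -kerberize
  # Reformatting is required because kerberizing changes the ownership
  # of all files in HDFS.
  run ./run-all.sh -format
  run source ./impala-config.sh
  # Finally, start the Impala daemons as usual (command not specified
  # in the commit message, so it is not spelled out here).
}

case "${1:-}" in
  buildall) kerberize_via_buildall ;;
  manual)   kerberize_manually ;;
  *)        echo "usage: $0 {buildall|manual}" ;;
esac
```

With the default `DRY_RUN=1`, running the script with `manual` only prints the three commands in order, which makes it easy to review the sequence before executing anything against a real mini-cluster.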

Alex Behm

8223e1e44b

Avoid Hive replication bug (CDH-17414) by 'warming up' HiveServer2 after it starts.

The purpose of this patch is to avoid CDH-17414, which causes data files loaded
with Hive to incorrectly have a replication factor of 1. When using Beeline,
this problem appears to occur only immediately after creating the first HBase
table since HiveServer2 started, i.e., subsequent loads seem to function correctly.
This patch adds a new script that creates an external HBase table in Hive to
'warm up' HiveServer2 immediately after it is started.
Subsequent loads should then be assigned a correct replication factor.

Change-Id: Ic54c9401b67b748a8848d19f82b8e7df9535e845
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1640
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins

2014-02-25 17:33:53 -08:00
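
The warm-up idea in commit 8223e1e44b might look roughly like the following dry-run sketch. It is not the script added by the patch, whose contents are not shown here. The JDBC URL, the table name, the column mapping, and all function names are assumptions chosen for illustration. Hive's HBase storage handler also expects the underlying HBase table to exist for an external table, which this sketch does not create.

```shell
#!/usr/bin/env bash
# Dry-run sketch: warm up HiveServer2 by creating an external HBase table.
# All names and values below are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print commands
HS2_URL="${HS2_URL:-jdbc:hive2://localhost:10000}"   # assumed HiveServer2 URL
WARMUP_TABLE="${WARMUP_TABLE:-hs2_warmup}"           # assumed table name

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Emit DDL for an external Hive table backed by HBase.
# Assumes an HBase table of the same name with column family 'cf' exists.
warmup_sql() {
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS ${WARMUP_TABLE} (key STRING, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")
TBLPROPERTIES ("hbase.table.name" = "${WARMUP_TABLE}");
EOF
}

# Write the DDL to a temporary file and submit it through Beeline.
warm_up_hs2() {
  local sql_file
  sql_file="$(mktemp)"
  warmup_sql > "$sql_file"
  run beeline -u "$HS2_URL" -f "$sql_file"
  rm -f "$sql_file"
}

# Report the HDFS replication factor of a file, to spot-check later loads.
replication_of() {
  run hdfs dfs -stat %r "$1"
}
```

Calling `warm_up_hs2` right after HiveServer2 starts, and then `replication_of` on a file written by a later load, is one way to check whether the replication factor still comes out as 1.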

2 Commits