If the PID files of the mini cluster's processes get deleted for some
reason, it should still be possible to kill the processes because each
one is marked with "-DIBelongToTheMiniCluster". It turns out that the
KMS process was not being marked. This patch fixes that.
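A minimal sketch of that fallback, assuming standard Linux tools
(pgrep/xargs; not part of this patch):
# Find every process whose command line carries the mini-cluster
# marker and kill it, regardless of PID files.
pgrep -f IBelongToTheMiniCluster | xargs -r kill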
Change-Id: I0398dec94be3ae91548d11a79c1d5eec0ad3dadb
Reviewed-on: http://gerrit.cloudera.org:8080/3354
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Through empirical analysis, it was determined that setting the
maximum number of connections to S3 to 1500 was optimal for
functionality and performance. The Hadoop default of 15 connections
could lead to deadlocks, because our Parquet scanner requires
multiple concurrent open connections, proportional to the number of
columns being scanned.
Setting it to this high a value does not seem to have any negative
implications.
This has also been found to fix the "Error(255): Unknown" errors.
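For reference, a hedged sketch of the corresponding setting, assuming
the s3a connector and its standard hadoop-aws property name (the
actual change may differ):
# core-site.xml fragment raising the S3 connection pool limit.
cat > /tmp/s3a-connections.xml <<'EOF'
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>1500</value>
</property>
EOF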
Change-Id: Ide6f1326d5155b2e5f4da3a3f23df3f3d40c5a8d
Reviewed-on: http://gerrit.cloudera.org:8080/3114
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
This patch implements a new feature to read the auth_to_local
configs from the HDFS configuration files, using the parameter
hadoop.security.auth_to_local. This is done by modifying the
User#getShortName() method to use its HDFS equivalent.
This patch includes an end-to-end authorization test using Sentry,
where we add a specific auth_to_local setting for a certain user and
test that Sentry authorization passes for this user after applying
these rules. Given that we don't have tests that run on a kerberized
mini-cluster, this patch adds a hack to load this configuration even
on non-kerberized test runs.
However, this feature is disabled by default to preserve the
existing behavior. To enable it:
1. Use Kerberos as the authentication mechanism (by setting
--principal), and
2. Add "--load_auth_to_local_rules=true" to the cluster startup args.
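As an illustration only, a minimal core-site.xml fragment with a
hypothetical mapping rule (the principal, realm, and local user are
made up; the rule syntax is standard Hadoop):
# Maps testuser@EXAMPLE.COM to the local user "mappeduser".
cat > /tmp/auth-to-local.xml <<'EOF'
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](testuser@EXAMPLE.COM)s/.*/mappeduser/
    DEFAULT
  </value>
</property>
EOF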
Change-Id: I76485b83c14ba26f6fce66e5f83e8014667829e0
Reviewed-on: http://gerrit.cloudera.org:8080/2800
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
/tmp isn't necessarily on the same filesystem as the Kudu data
directory. Fix the check so that it examines the actual Kudu directory.
Change-Id: Ic6aa27569a0650db7dcf5759952cd50c8e47f8c9
Reviewed-on: http://gerrit.cloudera.org:8080/2967
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This change whitelists the supported filesystems that can be set as
the default FS for Impala to run on.
This patch configures Impala to use S3 as the default filesystem,
rather than as a secondary filesystem as before.
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Previously, when a service failed, the user had to find the log
file and open it. Now the end of the log is dumped to stdout.
2) Add start, stop, and restart commands to the "admin" script. For
example now you can run
testdata/cluster/admin restart kudu
3) Wait up to 120 seconds for services to shut down. The timeout is
the same as for the Impala processes. If the services fail to stop,
an error will be raised.
Change-Id: I537ea5656df2081d4f1f27a9f3fcef4547fdc2fe
Reviewed-on: http://gerrit.cloudera.org:8080/2751
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
By default, Kudu requires the underlying filesystem to support hole
punching; if support isn't there, Kudu will fail to start. People
using such a filesystem can instead start Kudu with
-block_manager=file.
Before starting Kudu in the local mini-cluster, the "fallocate"
command will be used to automatically determine whether the special
flag is needed.
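A sketch of what that detection can look like, assuming util-linux
fallocate and a hypothetical $KUDU_DATA_DIR (not necessarily the
exact script change):
probe="$KUDU_DATA_DIR/.hole_punch_probe"
truncate -s 4096 "$probe"   # small scratch file on the Kudu volume
KUDU_EXTRA_FLAGS=""
# -n keeps the file size, -p punches a hole; failure implies the
# filesystem doesn't support hole punching.
if ! fallocate -n -p -o 0 -l 4096 "$probe" 2>/dev/null; then
  KUDU_EXTRA_FLAGS="-block_manager=file"
fi
rm -f "$probe"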
Note, users who need this must run bin/create-test-configuration.sh
after pulling in this commit.
This also fixes a bug in delete_kudu_data() in the cluster admin
script: a directory name was incorrect.
Change-Id: I1ca7fedb367444c41e462b72b0b76091ee94e27c
Reviewed-on: http://gerrit.cloudera.org:8080/2750
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The directory structure of the newer Kudu toolchain artifacts has
changed. Now the root directory is split into /release and /debug. A few
little updates are needed to the build and service scripts.
Since the toolchain no longer provides stubs for platforms that Kudu
doesn't support, the stubs need to be generated. This will be done as
part of the toolchain bootstrapping.
Also this upgrades Kudu to 0.8 RC1.
Developers will need to run bin/create-test-configuration.sh after
pulling in this change. Otherwise the Kudu service will fail to start.
Change-Id: I625903bd92afece0ad819a96fc275d5812b5eb2a
Reviewed-on: http://gerrit.cloudera.org:8080/2720
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The Hive server does not start for S3 builds because HDFS is marked
as an unsupported service in testdata/cluster/admin. As a result,
HDFS is not started at all, which in turn prevents the Hive server
from starting. Due to this, all our S3 builds fail.
Currently, our S3 builds need HDFS to run correctly.
(This has to be reverted once IMPALA-1850 goes in, because then S3
can run as the default FS without HDFS.)
Change-Id: Ibda9dc3ef895c2aa4d39eb5694ac5f2dbd83bee4
Reviewed-on: http://gerrit.cloudera.org:8080/2741
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The Kudu team recommended disabling fsync for testing purposes. This
should help with timeouts on cloud machines (EC2/GCE). Disabling
fsyncs could lead to data loss if the system crashed before the OS
had a chance to write the data to disk, but our test setups don't
need that level of reliability.
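For illustration, a hedged sketch of starting a Kudu master this way;
--never_fsync is an unsafe gflag, so unsafe flags must be unlocked
(the directory paths here are hypothetical):
kudu-master \
  --unlock_unsafe_flags \
  --never_fsync \
  --fs_wal_dir=/tmp/kudu-master/wal \
  --fs_data_dirs=/tmp/kudu-master/data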
Change-Id: I72fd85ce5c4bc71f071b854ea6a9ebe60fc1305f
Reviewed-on: http://gerrit.cloudera.org:8080/2734
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Previously, Kudu would only be started when the test configuration
was the standard mini-cluster. That led to failures during data
loading when testing without the mini-cluster (e.g., on the local
filesystem). Kudu doesn't require any other services, so now it will
be started for all test environments.
Change-Id: I92643ca6ef1acdbf4d4cd2fa5faf9ac97a3f0865
Reviewed-on: http://gerrit.cloudera.org:8080/2690
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The stubs in Impala broke during the merge commit. This commit
removes the stubs in hopes of improving the robustness of the build.
The original problem (Kudu clients are only available for some OSs)
is now addressed by moving the stubbing into a dummy Kudu client. The
dummy client only allows linking to succeed; if any client method is
called, Impala will crash. Before calling any such method, Kudu
availability must be checked.
Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712
Reviewed-on: http://gerrit.cloudera.org:8080/2585
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.
The new structure is as follows:
$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala
$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading
$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests
$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests
$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests
$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
I tested this change with a full data load, which was successful.
Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This is for review purposes only. This patch will be merged with David's
big merge patch.
Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
set KUDU_CLIENT_DIR to the path Kudu was installed in.
Example:
git clone https://github.com/cloudera/kudu.git
...build 3rd party etc...
mkdir -p $KUDU_BUILD_DIR
cd $KUDU_BUILD_DIR
cmake <path to Kudu source dir>
make
DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
actually a parcel that has been renamed (the contents were not
modified in any way), which means the Kudu service binaries are
there. Those binaries are now used to run the Kudu service.
Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183
This patch is not a pure merge patch, in the sense that it goes beyond
conflict resolution to also address reviews of the 'feature/kudu'
branch as a whole.
The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/
Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
The major changes are:
1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
remote or local cluster. This also moves and consolidates some
Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla. That stuff was getting
messy.
Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Some of the tests rely on the HDFS trash mechanism being enabled and
poll the paths in the trash directory during test runs. These tests
were failing intermittently due to a race with the HDFS trash
checkpointing mechanism, which moves all trash contents to another
directory. This checkpointing runs every fs.trash.checkpoint.interval
minutes, which defaults to fs.trash.interval (when set to 0).
Currently there seems to be no way to disable this checkpointing.
This patch increases fs.trash.interval from the current value of 30
minutes to 24 hours so that test runs never hit this race condition.
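The corresponding core-site.xml fragment would look roughly like this
(fs.trash.interval is expressed in minutes, so 24 hours is 1440):
cat > /tmp/trash-interval.xml <<'EOF'
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
EOF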
Change-Id: I42fcaee70a461712f1df6bac23c71f915718b015
Reviewed-on: http://gerrit.cloudera.org:8080/1703
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
The change to the start script for OSX used "find" with the "-perm
+0111" option as an "executables only" filter, but that doesn't work
with newer versions of "find": "-perm +" has been deprecated or
removed (depending on the version) on Linux. I couldn't find an
OSX+Linux compatible filter.
The variable IS_OSX was added and used to choose the appropriate
filter.
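A sketch of the resulting switch, assuming BSD find on OSX and GNU
find on Linux (the actual script may differ):
if [[ "$(uname)" == "Darwin" ]]; then
  IS_OSX=true
else
  IS_OSX=false
fi
if $IS_OSX; then
  find . -type f -perm +0111   # BSD find: any execute bit set
else
  find . -type f -perm /0111   # GNU find equivalent of "-perm +"
fi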
Change-Id: I0c49f78e816147c820ec539cfc398fb77b83307a
Reviewed-on: http://gerrit.cloudera.org:8080/1630
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Until now, our YARN configuration was broken in a way that prevented
us from running local MapReduce jobs. The jobs would fail with a
class-not-found exception for the LZO codec. This patch fixes the
issue and corrects the classpath.
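A quick, illustrative way to confirm the LZO codec jar is actually
visible on the Hadoop classpath (jar names vary by distribution):
hadoop classpath | tr ':' '\n' | grep -i lzo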
Change-Id: I689cca7a079dbd269d4bd96f1b4e3d91147d527c
Reviewed-on: http://gerrit.cloudera.org:8080/1667
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Consistently use "set -euo pipefail".
2) When an error happens, print the file and line.
3) Consolidated some of the kill scripts.
4) Added better error messages to the load data script.
5) Changed use of #!/bin/sh to bash.
Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This is for compatibility with Docker containers. Before this patch,
when the scripts were run on the Docker host, they would try to kill
the mini-cluster processes inside the Docker containers and fail
because they didn't have the permissions to do so (the user is
different). Now the scripts will only try to kill mini-cluster
processes that were started by the current user.
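A sketch of the idea, assuming standard Linux pkill (the scripts use
psutil; this is only an equivalent illustration):
# Match only processes owned by the current user that carry the
# mini-cluster marker on their command line.
pkill -U "$USER" -f IBelongToTheMiniCluster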
Also, some psutil availability checks were removed because psutil is
now provided by the Python virtualenv.
Change-Id: Ida371797bbaffd0a3bd84ab353cb9f466ca510fd
Reviewed-on: http://gerrit.cloudera.org:8080/1541
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
admin was using the -executable flag of find, which is not available
on Mac. This patch replaces it with "-perm +0111 -type f", which has
similar semantics. In addition, there seem to be differences in which
shell builtins are available, so some changes have been made to fix
that issue.
Change-Id: I9b2ecbd5bf6a9b1610e7ca9f15b1a4d1407b94c1
Reviewed-on: http://gerrit.cloudera.org:8080/1612
Reviewed-by: Casey Ching <casey@cloudera.com>
Readability: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This commit adds a PURGE option to the DROP TABLE/ALTER TABLE DROP
PARTITION statements. The usage is as follows:
1. DROP TABLE <tablename> takes an optional argument PURGE. Adding
PURGE purges the table data by skipping the trash, if configured.
DROP TABLE [<database>.]<tablename> [IF EXISTS] [PURGE]
2. PURGE is also supported with the ALTER TABLE DROP PARTITION query,
with the following syntax. If specified, Impala purges the partition
data by skipping the trash.
ALTER TABLE [<database>.]<tablename> DROP PARTITION [IF EXISTS] [PURGE]
This patch also helps the use case where the trash and data
directories are in different encryption zones, in which case we
cannot move the data during ALTER/DROP. The PURGE option can then be
used to skip the trash and make sure the data is actually deleted.
Change-Id: I64bf71d660b719896c32e0f3a7ab768f30ec7b3b
(cherry picked from commit 585d4f8d9e809f3bf194018dd161a22d3f144270)
Reviewed-on: http://gerrit.cloudera.org:8080/1244
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
Previously, we were using the default data directory to store the
data used for Impala tests without ever formatting it. This is
contrary to how the other Impala data sources behave, i.e., when
"--format" is passed to build-all.sh, only Kudu wouldn't be
formatted.
This also moves Kudu's data directory inside the Impala directory
structure, where it's easier to account for.
Change-Id: Iae2870df0e625de07a761687e75999ef30f2be06
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/7055
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch enables running Impala tests against Isilon as the default
filesystem. The intention is to run tests against a realistic
deployment, i.e., with Isilon replacing HDFS as the underlying
filesystem.
Specifically, it does the following:
- Adds a new environment variable, DEFAULT_FS, which points to HDFS
by default.
- Makes the fs.defaultFS property in core-site.xml use the DEFAULT_FS
environment variable, so that all clients talk to Isilon implicitly
(see the sketch after this list).
- Unsets FILESYSTEM_PREFIX when the TARGET_FILESYSTEM is Isilon,
since path prefixes are no longer needed.
- Only starts the Hive Metastore and the Impala service stack when running
tests against Isilon.
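A hedged sketch of the mechanism; the HDFS URI below is hypothetical,
and the quoted heredoc keeps ${DEFAULT_FS} literal since it is a
template placeholder substituted at config-generation time:
# DEFAULT_FS falls back to a hypothetical local HDFS URI.
export DEFAULT_FS="${DEFAULT_FS:-hdfs://localhost:20500}"
# Write the core-site.xml fragment that consumes the variable.
cat > /tmp/default-fs.xml <<'EOF'
<property>
  <name>fs.defaultFS</name>
  <value>${DEFAULT_FS}</value>
</property>
EOF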
We don't start KMS/HBase because they're not relevant to Isilon. We
also don't start YARN, Hive, and Llama because Hive queries are
disabled with Isilon. The scripts that start/stop Hive, YARN, and
Llama should be modified to point to a filesystem other than HDFS in
the future.
Change-Id: Id66bfb160fe57f66a64a089b465b536c6c514b63
Reviewed-on: http://gerrit.cloudera.org:8080/449
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Update yarn-site.xml to reduce the latency of resource acquisition.
This also changes the log4j properties to reduce the very verbose
logging from the Hadoop daemons, which was consuming huge amounts of
space very quickly.
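Illustratively, the verbosity reduction could look like the following
log4j.properties additions (the logger names are stock Hadoop ones
and $HADOOP_CONF_DIR is an assumption; treat this as a sketch):
cat >> "$HADOOP_CONF_DIR/log4j.properties" <<'EOF'
hadoop.root.logger=WARN,RFA
log4j.logger.org.apache.hadoop=WARN
EOF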
Change-Id: I8532fb5125b604974e26ddad76aee93b9c4e64fb
Reviewed-on: http://gerrit.cloudera.org:8080/381
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This patch enables the Impala test suite to run the end-to-end tests
against an Isilon namenode. There are a few caveats:
- The fe tests will currently not work.
- Only loading data from both the test-warehouse snapshot and the
metadata snapshot is supported.
- The test suite cannot be run by multiple people (unless we have
access to multiple Isilon namenodes).
Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237
Reviewed-on: http://gerrit.cloudera.org:8080/356
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
The templates for starting the services of the cluster had a bad
shebang declaration that made it impossible to start KMS when using a
non-bash default shell.
Change-Id: I6b105b328dc61e71095c2d5e5d6859f65ca56a18
Reviewed-on: http://gerrit.cloudera.org:8080/293
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
First change for IMPALA-1209 to address Impala limitations when
using HDFS encryption. This adds a KMS process to the testdata
cluster. This was tested manually by creating a key and an
encryption zone.
Change-Id: I499154506386f04e71c5371b128c10868b1e1318
Reviewed-on: http://gerrit.cloudera.org:8080/41
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This patch enables loading data to S3 instead of HDFS. It is
preliminary in nature; as such, there are a few caveats:
- The fe tests do not work.
- Only loading from a test-warehouse snapshot and a metastore
snapshot is enabled.
- Until Hive works with S3, only a subset of all the tests will work.
Change-Id: Ia66a5f836b4245e3b022a49de805eec337a51324
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5851
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This will allow you to create tables around data that already exists
on S3 (though INSERT and LOAD DATA don't support S3 yet). This will
also make it easier to create some test tables that are not on HDFS.
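For example (the bucket, schema, and URI scheme below are made up):
# Point an external table at data already sitting in S3.
impala-shell -q "
  CREATE EXTERNAL TABLE s3_events (id INT, msg STRING)
  LOCATION 's3a://example-bucket/events/'"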
Also, work around HDFS-7031 (which is a "won't fix"), where
non-defaultFS paths can be qualified with the wrong authority. This
is needed now that Impala can take non-HDFS paths as input.
Change-Id: Ie513d50b26dfe5a71be284ad31a8c8151d0e30d3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5417
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
Prior to this work, the impalad could authenticate with either
Kerberos or LDAP, but not both; this fixes that so that both can
coexist in the same daemon. Prior code had both a
KerberosAuthProvider and an LdapAuthProvider; this is refactored into
a single SaslAuthProvider that potentially contains both LDAP and
Kerberos.
The terminology of "client facing" and "server facing" has been
replaced with "external" and "internal". External is for clients like
the impala shell, odbc, jdbc, etc. Internal is for daemon <-> daemon
communication.
The notion of the "auxprop" plugin is removed, as that was dead code.
The Thrift code is enhanced to pass the Realm information from the
SaslAuthProvider down to the underlying SASL library.
Change-Id: I0a0b968a107c0b25610ca37295c3fee345ecdd6d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4051
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
This is the first iteration of a kerberized development environment.
All the daemons start and use Kerberos, with the sole exception of
the Hive metastore. This is sufficient to test Impala authentication.
When buildall.sh is run with '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known not to work at this time; the
root causes are that Beeline -> HiveServer2 -> MapReduce throws
errors and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
Our testdata/run-all.sh can be brittle depending on the state of your
HDFS. In particular, YARN depends on the NN not being in safe mode,
but it may take some time for the NN to exit safe mode immediately
after HDFS starts.
This patch makes the NN startup script complete only after the NN has
exited safe mode.
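For reference, the stock HDFS client can already block until the NN
leaves safe mode, so a startup script can simply wait on it:
# Blocks until the NameNode has left safe mode.
hdfs dfsadmin -safemode wait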
Change-Id: I8b30cd07128dc48d79d91726eafed4174fb91a6d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3005
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3021
Since we're no longer using the MiniLlama, we need to explicitly set
whether or not the cluster is pseudo-distributed. Impala needs this
information to correctly translate datanode addresses to a format that
Llama understands.
This change (adapted from one made by Casey) adds a method to the
frontend (callable via JNI) to get a configuration value from the Hadoop
configuration. We'll set that configuration value for local RM testing.
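This is not the JNI path the change adds, but the same kind of lookup
can be illustrated from the shell with stock HDFS tooling (the key
below is just an example):
# Print a single value from the effective Hadoop configuration.
hdfs getconf -confKey dfs.namenode.http-address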
Change-Id: Ifd51db98a993ac0270dac2b832babbc394483c1a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2549
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
The hdfs-site template in CDH5 is different from the one we were
using previously. Specifically:
- It has entries that enable HDFS caching.
- It uses the correct parameter name for the HDFS block-locations
timeout.
Change-Id: I0ca6bd84b074ccbb8f42243d37c5082b305f9bcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2338
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This should allow individual service components, such as a single
nodemanager, to be shut down for failure testing. The mini-cluster
bundled with Hadoop is a single process that does not expose the
ability to control individual roles. Now each role can be controlled
and configured independently of the others.
Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>