
20 Commits

Author SHA1 Message Date
Casey Ching
f288867833 Stress test: Various changes
The major changes are:

1) Collect backtrace and fatal log on crash.
2) Poll memory usage. For now, the data is only displayed.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
   random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
   remote or local cluster. This also moves and consolidates some
   Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla. That stuff was getting
   messy.

Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-20 23:00:25 +00:00
Bharath Vissapragada
26429aee4d IMPALA-2624: Increase fs.trash.interval to 24 hours for test suite
Some of the tests rely on the HDFS trash mechanism being enabled and poll
the paths in the trash directory during test runs. These tests are
failing intermittently due to a race with the HDFS trash checkpointing
mechanism, which moves all the trash contents to another directory.
This checkpointing runs every fs.trash.checkpoint.interval minutes;
when that value is 0, it defaults to fs.trash.interval. Currently there
seems to be no way to disable this checkpointing. This patch increases
fs.trash.interval from its current value of 30 minutes to 24 hours
so that test runs never hit this race condition.
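A minimal sketch of how the effective checkpoint interval follows from the two settings, modeling only the rule stated above (a checkpoint interval of 0 falls back to fs.trash.interval, both in minutes). The function name is hypothetical and this is not Hadoop's code.

```python
def effective_checkpoint_interval(trash_interval_min: int,
                                  checkpoint_interval_min: int = 0) -> int:
    """Return how often, in minutes, HDFS trash checkpointing runs.

    Models only the rule described in the commit message: a checkpoint
    interval of 0 falls back to fs.trash.interval.
    """
    if checkpoint_interval_min == 0:
        return trash_interval_min
    return checkpoint_interval_min


# Old setting: checkpointing could fire every 30 minutes during a test run.
print(effective_checkpoint_interval(30))       # prints 30
# New setting: 24 hours, expressed in minutes.
print(effective_checkpoint_interval(24 * 60))  # prints 1440
```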

Change-Id: I42fcaee70a461712f1df6bac23c71f915718b015
Reviewed-on: http://gerrit.cloudera.org:8080/1703
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Internal Jenkins
2016-01-06 12:01:18 +00:00
Martin Grund
44f4d4250b Fix YARN configuration to pickup LZO
Until now, our YARN configuration was broken, so we weren't able to
run local MapReduce jobs. The jobs would fail with a class-not-found
exception for the LZO codec. This patch fixes this issue by correcting
the classpath.

Change-Id: I689cca7a079dbd269d4bd96f1b4e3d91147d527c
Reviewed-on: http://gerrit.cloudera.org:8080/1667
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-12-18 07:01:07 +00:00
Bharath Vissapragada
4ed0742f3e IMPALA-2310: Add PURGE option to DROP TABLE/ALTER TBL DROP PART
This commit adds a PURGE option to the DROP TABLE and ALTER TABLE DROP
PARTITION statements. Usage is as follows:

1. DROP TABLE takes an optional PURGE argument. If specified, the table
data is purged by skipping the trash, if trash is configured.

  DROP TABLE [IF EXISTS] [<database>.]<tablename> [PURGE]

2. PURGE is also supported by ALTER TABLE DROP PARTITION, with the
following syntax. If specified, Impala purges the partition data by
skipping the trash.

  ALTER TABLE [<database>.]<tablename> DROP [IF EXISTS] PARTITION (<partition_spec>) [PURGE]

This patch also helps the use case where the trash and the data directories
are in different encryption zones, in which case the data cannot be moved
to the trash during ALTER/DROP. The PURGE option can then be used to skip
the trash and make sure the data is actually deleted.
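A minimal sketch of building these statements programmatically, e.g. from a test harness. The helper names are hypothetical; the clause order follows Impala's documented syntax, where IF EXISTS precedes the table name (DROP TABLE) or the PARTITION clause (ALTER TABLE), and PURGE comes last. Identifiers are not quoted or escaped, so this is for illustration only.

```python
from typing import Optional


def drop_table_sql(table: str, database: Optional[str] = None,
                   if_exists: bool = False, purge: bool = False) -> str:
    """Build a DROP TABLE statement (hypothetical helper, no escaping)."""
    name = f"{database}.{table}" if database else table
    parts = ["DROP TABLE"]
    if if_exists:
        parts.append("IF EXISTS")
    parts.append(name)
    if purge:
        # PURGE skips the trash, so the data files are deleted immediately.
        parts.append("PURGE")
    return " ".join(parts)


def drop_partition_sql(table: str, partition_spec: str,
                       if_exists: bool = False, purge: bool = False) -> str:
    """Build an ALTER TABLE ... DROP PARTITION statement (hypothetical helper)."""
    parts = [f"ALTER TABLE {table} DROP"]
    if if_exists:
        parts.append("IF EXISTS")
    parts.append(f"PARTITION ({partition_spec})")
    if purge:
        parts.append("PURGE")
    return " ".join(parts)
```

For example, drop_table_sql("t1", "db", if_exists=True, purge=True) yields "DROP TABLE IF EXISTS db.t1 PURGE", which would also sidestep the encryption-zone problem above because nothing is moved to the trash.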

Change-Id: I64bf71d660b719896c32e0f3a7ab768f30ec7b3b
(cherry picked from commit 585d4f8d9e809f3bf194018dd161a22d3f144270)
Reviewed-on: http://gerrit.cloudera.org:8080/1244
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
2015-10-14 17:51:37 -07:00
ishaan
377214c469 Use Isilon as the default file system when running Isilon tests.
This patch enables running Impala tests against Isilon as the default file system. The
intention is to run tests against a realistic deployment, i.e., Isilon replacing HDFS as
the underlying filesystem.

Specifically, it does the following:
  - Adds a new environment variable DEFAULT_FS, which points to HDFS by default.
  - Makes the fs.defaultFS property in core-site.xml use the DEFAULT_FS environment
    variable, such that all clients talk to Isilon implicitly.
  - Unsets FILESYSTEM_PREFIX when the TARGET_FILESYSTEM is Isilon, since path prefixes
    are no longer needed.
  - Only starts the Hive Metastore and the Impala service stack when running
    tests against Isilon.

We don't start KMS/HBase because they're not relevant to Isilon. We also don't
start YARN, Hive, or Llama because Hive queries are disabled with Isilon.

The scripts that start/stop Hive, YARN and Llama should be modified to point to a
filesystem other than HDFS in the future.
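A minimal sketch of the environment-variable logic described above, written in Python for illustration (the real change lives in the shell configuration scripts). Assumptions: the HDFS default URI is a hypothetical placeholder, TARGET_FILESYSTEM is compared against the lowercase string "isilon", and the function name is invented.

```python
from typing import Mapping, Tuple

# Hypothetical placeholder for the minicluster NameNode URI; the real
# default is defined by the Impala test configuration scripts.
HDFS_DEFAULT = "hdfs://localhost:20500"


def resolve_filesystem(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (value for fs.defaultFS, FILESYSTEM_PREFIX) for a test run.

    DEFAULT_FS falls back to HDFS. When the target filesystem is Isilon,
    the prefix is cleared because Isilon is already the default FS, so
    unqualified paths resolve there implicitly.
    """
    default_fs = env.get("DEFAULT_FS", HDFS_DEFAULT)
    prefix = env.get("FILESYSTEM_PREFIX", "")
    if env.get("TARGET_FILESYSTEM") == "isilon":  # assumed spelling
        prefix = ""
    return default_fs, prefix
```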

Change-Id: Id66bfb160fe57f66a64a089b465b536c6c514b63
Reviewed-on: http://gerrit.cloudera.org:8080/449
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-06-11 01:23:11 +00:00
Matthew Jacobs
bc3a46daab Change minicluster llama log level to INFO
Change-Id: Ifa83cb437f807c5cbd9f2259a570c1af39340811
Reviewed-on: http://gerrit.cloudera.org:8080/402
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Matthew Jacobs <mj@cloudera.com>
2015-05-20 21:11:49 +00:00
Matthew Jacobs
456e99b21b Mini cluster configuration change for Yarn and log4j
Update the yarn-site.xml to reduce the latency of
resource acquisition.

Also changes the log4j properties to reduce the very
verbose logging for the hadoop daemons which was consuming
huge amounts of space very quickly.

Change-Id: I8532fb5125b604974e26ddad76aee93b9c4e64fb
Reviewed-on: http://gerrit.cloudera.org:8080/381
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-05-19 23:05:44 +00:00
ishaan
058978dccb Enable using isilon as the underlying filesystem.
This patch enables the Impala test suite to run the end-to-end tests
against an Isilon namenode. There are a few caveats:
  - The fe tests do not currently work.
  - Only loading data from both the test-warehouse snapshot and the metadata snapshot is
    supported.
  - The test suite cannot be run by multiple people (unless we have access to multiple
    Isilon namenodes).

Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237
Reviewed-on: http://gerrit.cloudera.org:8080/356
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-05-12 01:28:19 +00:00
Martin Grund
86d4516c44 Fix bad invocation of cluster startup scripts
The templates for starting the services of the cluster had a bad
declaration of the shebang that made it impossible to start KMS when
using a non-bash default shell.
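The commit does not spell out the exact defect, but the general rule is that a kernel honors "#!" only as the first two bytes of a file; otherwise execve() fails with ENOEXEC and the invoking shell decides how to interpret the script, so a bash-specific template can break under a different default shell. A minimal sketch of a check a template test could perform (the function is hypothetical):

```python
from typing import Optional


def shebang_interpreter(script_text: str) -> Optional[str]:
    """Return the interpreter named by a script's shebang line, or None.

    "#!" only counts on the very first line, starting at the first byte;
    a leading blank line or space means there is no shebang at all.
    """
    lines = script_text.splitlines()
    if not lines or not lines[0].startswith("#!"):
        return None
    words = lines[0][2:].split()
    if not words:
        return None
    # "#!/usr/bin/env bash" names the real interpreter as env's argument.
    if words[0].endswith("/env") and len(words) > 1:
        return words[1]
    return words[0]
```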

Change-Id: I6b105b328dc61e71095c2d5e5d6859f65ca56a18
Reviewed-on: http://gerrit.cloudera.org:8080/293
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-03-27 02:30:30 +00:00
Matthew Jacobs
835d6dbef4 IMPALA-1209: Add KMS service to testdata cluster (pt1)
First change for IMPALA-1209 to address Impala limitations when
using HDFS encryption. This adds a KMS process to the testdata
cluster. This was tested manually by creating a key and an
encryption zone.

Change-Id: I499154506386f04e71c5371b128c10868b1e1318
Reviewed-on: http://gerrit.cloudera.org:8080/41
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-02-13 20:46:14 +00:00
ishaan
2386fb84a8 Enable the data loading infrastructure to switch the underlying file system.
This patch enables loading data to S3 instead of HDFS. It is preliminary in nature;
as such, there are a few caveats:
 - The fe tests do not work.
 - Only loading from a test-warehouse snapshot and metastore snapshot is enabled.
 - Until Hive works with S3, only a subset of all the tests will work.

Change-Id: Ia66a5f836b4245e3b022a49de805eec337a51324
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5851
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2015-02-03 01:02:42 -08:00
Dan Hecht
aaac1b36ad S3: Allow DDL statements to reference non-HDFS file-systems
This will allow you to create tables around data that already exists on
S3 (though INSERT and LOAD DATA don't support S3 yet). This will also
make it easier to create some test tables that are not on HDFS.

Also, work around HDFS-7031 (which is a "won't fix"), where non-defaultFS
paths can be qualified with the wrong authority. Impala needs this
workaround now that it can take non-HDFS paths as input.
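A minimal sketch of the invariant such a workaround has to preserve: a path that already names its own scheme (for example an S3 location) keeps its scheme and authority, and only scheme-less paths inherit the defaultFS URI. This illustrates the idea under that assumption; it is not Impala's actual code, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def qualify_path(path: str, default_fs: str) -> str:
    """Fully qualify a path without grafting the wrong authority onto it.

    Paths that already carry a scheme (s3a://, hdfs://, ...) are returned
    unchanged; absolute scheme-less paths are resolved against default_fs.
    """
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path: {path!r}")
    return default_fs.rstrip("/") + path
```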

Change-Id: Ie513d50b26dfe5a71be284ad31a8c8151d0e30d3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5417
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
2014-12-02 00:54:38 -08:00
Mike Yoder
d1e83f8280 Support for simultaneous LDAP and Kerberos authentication.
Prior to this work, the impalad could authenticate with either
Kerberos or LDAP, but not both. This change allows both to
co-exist in the same daemon. The prior code had both a
KerberosAuthProvider and an LdapAuthProvider; these are refactored into
a single SaslAuthProvider that can contain both LDAP and
Kerberos.

The terminology of "client facing" and "server facing" has been
replaced with "external" and "internal". External is for clients like
the Impala shell, ODBC, JDBC, etc. Internal is for daemon <-> daemon
communication.

The notion of the "auxprop" plugin is removed, as that was dead code.

The Thrift code is enhanced to pass the Realm information from the
SaslAuthProvider down to the underlying SASL library.
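A minimal sketch of the "one provider, two mechanisms" idea. GSSAPI (Kerberos) and PLAIN (username/password) are standard SASL mechanism names; mapping LDAP to PLAIN and restricting internal daemon-to-daemon traffic to Kerberos are assumptions of this sketch, and the class only models the configuration, not Impala's C++ SaslAuthProvider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SaslAuthProvider:
    """Hypothetical model of a provider holding both Kerberos and LDAP."""
    kerberos_principal: Optional[str] = None
    ldap_enabled: bool = False

    def mechanisms(self, external: bool) -> List[str]:
        """Return the SASL mechanisms offered on one side of the daemon."""
        mechs: List[str] = []
        if self.kerberos_principal:
            mechs.append("GSSAPI")  # Kerberos, for both external and internal
        if external and self.ldap_enabled:
            # Assumption: LDAP credentials arrive as username/password via
            # SASL PLAIN and are only accepted from external clients.
            mechs.append("PLAIN")
        return mechs
```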

Change-Id: I0a0b968a107c0b25610ca37295c3fee345ecdd6d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4051
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-18 12:54:45 -07:00
Mike Yoder
75a97d3d7e [CDH5] Kerberize mini-cluster and Impala daemons
This is the first iteration of a kerberized development environment.
All the daemons start and use Kerberos, with the sole exception of the
Hive metastore. This is sufficient to test Impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known not to work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the Impala daemons will happily authenticate using Kerberos
both from clients (like the Impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  'run-all.sh -format', re-source impala-config.sh, and then start
  Impala daemons as usual. You must reformat the cluster because
  kerberizing it changes the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-05 12:36:21 -07:00
Henry Robinson
ff32821c6b [CDH5] Test to confirm that ACLs are inherited correctly on INSERT
Change-Id: I781a6b7203c2e12b484162954abae51a6443bead
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3076
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-07-09 19:04:55 -07:00
Alex Behm
c503e1aa20 Wait for the NN to exit safe mode before starting services that depend on it.
Our testdata/run-all.sh can be brittle depending on the state of HDFS.
In particular, YARN depends on the NN not being in safe mode, but
immediately after HDFS starts, the NN may take some time to exit safe mode.
This patch makes the NN startup script complete only after the NN has exited
safe mode.
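A minimal sketch of the wait-until-out-of-safe-mode step, written in Python rather than the startup shell script. The safe-mode check is injected as a callable; in practice it could wrap `hdfs dfsadmin -safemode get`, and HDFS also offers `hdfs dfsadmin -safemode wait`, which blocks on its own. The timeout and poll interval are illustrative assumptions.

```python
import time
from typing import Callable


def wait_for_safe_mode_exit(in_safe_mode: Callable[[], bool],
                            timeout_s: float = 120.0,
                            poll_s: float = 1.0,
                            clock: Callable[[], float] = time.monotonic,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the NameNode leaves safe mode or the timeout expires.

    Returns True once in_safe_mode() reports False, or False on timeout,
    so the caller can refuse to start YARN and other dependent services.
    """
    deadline = clock() + timeout_s
    while True:
        if not in_safe_mode():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```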

Change-Id: I8b30cd07128dc48d79d91726eafed4174fb91a6d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3005
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3021
2014-06-13 01:36:34 -07:00
Henry Robinson
e87c0eb22a [CDH5] Detect pseudo-distributed Llama cluster
Since we're no longer using the MiniLlama, we need to explicitly set
whether or not the cluster is pseudo-distributed. Impala needs this
information to correctly translate datanode addresses to a format that
Llama understands.

This change (adapted from one made by Casey) adds a method to the
frontend (callable via JNI) to get a configuration value from the Hadoop
configuration. We'll set that configuration value for local RM testing.
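The actual change adds a Java frontend method reachable over JNI that reads the Hadoop Configuration. As a rough Python illustration of the lookup it performs, the sketch below reads a value from a Hadoop-style XML file (<configuration>, <property>, <name>, <value>) with a fallback default. It ignores Hadoop features such as resource layering, final properties, and variable expansion; the key name used in the example is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_hadoop_conf_value(xml_text: str, key: str,
                          default: Optional[str] = None) -> Optional[str]:
    """Return the <value> for <name>key</name>, or default if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value", default)
    return default


conf = ("<configuration><property>"
        "<name>example.pseudo-distributed</name><value>true</value>"
        "</property></configuration>")
print(get_hadoop_conf_value(conf, "example.pseudo-distributed"))  # prints true
```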

Change-Id: Ifd51db98a993ac0270dac2b832babbc394483c1a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2549
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-05-20 21:24:33 -07:00
ishaan
0fa87cba54 Reduce mini dfs logging verbosity.
Currently, the default log level is set to DEBUG. This produces approximately 10-20 GB of
logs per build, which is unacceptable.

Change-Id: Ibbb48876fc72faa23d76f32166f31f0257a7a3a0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2386
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2387
2014-04-28 23:42:48 -07:00
ishaan
405a6fbba3 [CDH5] Change the hdfs-site template to work for CDH5
The hdfs-site template in CDH5 is different from the one we find in CDH4. Specifically:
  - It has entries that enable hdfs caching.
  - It uses the correct parameter name for hdfs block locations timeout.

Change-Id: I0ca6bd84b074ccbb8f42243d37c5082b305f9bcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2338
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-04-24 11:36:56 -07:00
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shut down for failure testing. The mini-cluster bundled with Hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00