Commit Graph

44 Commits

Author SHA1 Message Date
Tim Armstrong
6ec6aaae8e IMPALA-3695: Remove KUDU_IS_SUPPORTED
Testing:
Ran exhaustive tests.

Change-Id: I059d7a42798c38b570f25283663c284f2fcee517
Reviewed-on: http://gerrit.cloudera.org:8080/16085
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-18 01:11:18 +00:00
Joe McDonnell
f241fd08ac IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support
Impala 4 moved to using CDP versions for components, which involves
adopting Hive 3. This removes the old code supporting CDH components
and Hive 2. Specifically, it does the following:
1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true.
   USE_CDP_HIVE now has no effect on the Impala environment. This also
   means that bin/jenkins/build-all-flag-combinations.sh no longer
   include USE_CDP_HIVE=false as a configuration.
2. Remove USE_CDH_KUDU and default to getting Impala from the
   native toolchain.
3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including
   the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml.

There is a fair amount of code that still references the Hive major
version. Upstream Hive is now working on Hive 4, so there is a high
likelihood that we'll need some code to deal with that transition.
This leaves some code (such as maven profiles) and test logic in
place.

Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608
Reviewed-on: http://gerrit.cloudera.org:8080/15869
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-07 22:14:39 +00:00
Tim Armstrong
5989900ae8 IMPALA-9618: fix some usability issues with dev env
Automatically assume IMPALA_HOME is the source directory
in a couple of places.

Delete the cache_tables.py script and MINI_DFS_BASE_DATA_DIR
config var which had both bit-rotted and were unused.

Allow setting IMPALA_CLUSTER_NODES_DIR to put the minicluster
nodes, most important the data, in a different location, e.g.
on a different filesystem.

Testing:
I set up a dev environment using this code and was able to
load data and run some tests.

Change-Id: Ibd8b42a6d045d73e3ea29015aa6ccbbde278eec7
Reviewed-on: http://gerrit.cloudera.org:8080/15687
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-09 08:01:24 +00:00
Grant Henke
208d9d6896 IMPALA-9577: [test] Use system_unsync time for Kudu test clusters
Recently Kudu made enhancements to time source configuration and
adjusted the time source for local clusters/tests to `system_unsync`.

This patch mirrors that behavior in Impala test clusters given there is no
need to require NTP-synchronized clock for a test where all the
participating Kudu masters and tablet servers are run at the same node
using the same local wallclock.

See the Kudu commit here for details:
eb2b70d4b9

While making this change, I removed all ntp related packages and special
handling as they should not be needed in a development environment
any more. I also added curl and gawk which were missing in my
Docker ubuntu environment and broke my testing.

Testing:
I tested with the steps below using Docker for Mac:

  docker rm impala-dev
  docker volume rm impala
  docker run --privileged --interactive --tty --name impala-dev -v impala:/home -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash

  apt-get update
  apt-get install sudo
  adduser --disabled-password --gecos '' impdev
  echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
  su - impdev
  cd ~

  sudo apt-get --yes install git
  git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala
  cd ~/Impala
  export IMPALA_HOME=`pwd`
  git remote add fork https://github.com/granthenke/impala.git
  git fetch fork
  git checkout kudu-system-time

  $IMPALA_HOME/bin/bootstrap_development.sh

  source $IMPALA_HOME/bin/impala-config.sh
  (pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest)
  (pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest)

  $IMPALA_HOME/bin/start-impala-cluster.py
  ./tests/run-tests.py query_test/test_kudu.py

Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Reviewed-on: http://gerrit.cloudera.org:8080/15568
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-31 19:38:10 +00:00
Tim Armstrong
6f150d383c IMPALA-9361: manually configured kerberized minicluster
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.

After setting it you must run ./bin/create-test-configuration.sh
then restart minicluster.

This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).

Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.

Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-08 05:16:12 +00:00
Joe McDonnell
0163a10332 IMPALA-9068: Use different directories for external vs managed warehouse
Hive 3 changed the typical storage model for tables to split them
between two directories:
 - hive.metastore.warehouse.dir stores managed tables (which is now
   defined to be only transactional tables)
 - hive.metastore.warehouse.external.dir stores external tables
   (everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.

Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.

The Hive 2 configuration doesn't change as it does not have this concept.

Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will uses a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.

Testing:
 - Ran exhaustive tests with USE_CDP_HIVE=false
 - Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
 - Verified that dataload succeeds and tests are able to run with a newer
   Hive version.

Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-24 17:29:15 +00:00
Joe McDonnell
d873922e71 Fix condition for starting YARN on USE_CDP_HIVE=true
Dataload on Hive 3 uses YARN, but YARN is not needed for Hive 2
dataload. This fixes the condition so that YARN does not start
for USE_CDP_HIVE=false (Hive 2). This fixes a similar condition
for manipulating the classpath in testdata/bin/run-hive-server.sh.

Change-Id: If2b0a529eadd5a436f0318229600180f72a27207
Reviewed-on: http://gerrit.cloudera.org:8080/13343
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-16 00:23:02 +00:00
Todd Lipcon
17daa6efb9 IMPALA-8369 (part 2): Hive 3: switch to Tez-on-YARN execution
This switches away from Tez local mode to tez-on-YARN. After spending a
couple of days trying to debug issues with Tez local mode, it seemed
like it was just going to be too much of a lift.

This patch switches on the starting of a Yarn RM and NM when
USE_CDP_HIVE is enabled. It also switches to a new yarn-site.xml with a
minimized set of configurations, generated by the new python templating.

In order for everything to work properly I also had to update the Hadoop
dependency to come from CDP instead of CDH when using CDP Hive.
Otherwise, the classpath of the launched Tez containers had conflicting
versions of various Hadoop classes which caused tasks to fail.

I verified that this fixes concurrent query execution by running queries
in parallel in two beeline sessions. With local mode, these queries
would periodically fail due to various races (HIVE-21682). I'm also able
to get farther along in data loading.

Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Reviewed-on: http://gerrit.cloudera.org:8080/13224
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
2019-05-10 13:42:55 +00:00
Hector Acosta
7cdeab7f75 IMPALA-8307 Use test -x to check for ntp-wait
Running ntp-wait --help can return 0 or 1 depending on the
output of ntpq. For example:

$ sudo ntp-wait --help 2> /dev/null; echo $?
0
$ sudo killall ntpd
$ sudo ntp-wait --help 2> /dev/null; echo $?
1

This commit instead tests whether ntp-wait exists and is executable to
determine if it's installed.

Change-Id: I53c63dfa651ac242050171da70540d24c4caf32c
Reviewed-on: http://gerrit.cloudera.org:8080/12726
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-27 15:49:25 +00:00
Michael Brown
72b845279e IMPALA-8091 addendum: use absolute path for ntp-wait
While ntp-wait is in the default $PATH on Ubuntu, it's not on CentOS.
Use the absolute path of the binary (same in both flavors).

Testing:
- Ran privately on CentOS both with and without ntp-wait installed; GVO
  will cover Ubuntu.

Change-Id: I5d709d121a71e62b8ee4027db81b53108f389fdd
Reviewed-on: http://gerrit.cloudera.org:8080/12369
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 22:04:44 +00:00
Michael Brown
6c0ec348a7 IMPALA-8091: incremental improvements to NTP sync
- Warn when ntp-wait is not available.

- Install ntp-perl for Redhat-based systems, which provides ntp-wait.
  Debian-based systems already have ntp-wait as provided by ntp.

Change-Id: Ifc8cacf81765d132dd574993f8149231bdbb7bc6
Reviewed-on: http://gerrit.cloudera.org:8080/12253
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-23 20:47:07 +00:00
Tim Armstrong
ff628d2b13 IMPALA-7986,IMPALA-7987: run daemons in docker containers
This refactors start-impala-cluster.py to allow multiple implementations
of the minicluster operations like start and stop. There are now
two classes implementing the same set of operations -
MiniClusterOperations and DockerMiniClusterOperations. The docker
versions start and stop the containers added in IMPALA-7948.

With some configuration (see instructions below), the containers can
connect back to services (HDFS, HMS, Kudu, Sentry, etc) running on the
host. Config generation was modified so that services optionally
communicate via the docker bridge network rather than loopback
(the host's loopback interface is not accessible to the containers).

Notes:
* I improved the container build to regenerate containers when cluster
  configs are regenerated (previously the containers could have stale
  configs).
* Switch from CMD to ENTRYPOINT to allow passing in arguments to "docker
  run" without clobbering default args.
* Python 2.6 is not supported for this code path. This only affects
  CentOS 6, which has limited support for docker anyway.
* I deferred implementing wait_for_cluster(), since the existing
  code requires surgery to abstract out assumptions about locating
  processes and web UI ports - see IMPALA-7988.

How to use:
==========
Create a docker network to use for internal cluster communication,
e.g.:
  docker network create -d bridge --gateway=172.17.0.1 \
      --subnet=172.17.0.1/16 impala-cluster

Add the gateway address of the docker network you created to
impala-config-local.sh, e.g.:

  export INTERNAL_LISTEN_HOST=172.17.0.1
  export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500

Regenerate configs and docker images:

  . bin/impala-config.sh
  ./bin/create-test-configuration.sh
  ninja -j $IMPALA_BUILD_THREADS docker_images

Restart the minicluster and Impala services to pick up the config:

  ./testdata/bin/run-all.sh
  start-impala-cluster.py --docker_network impala-cluster

You can connect with impala-shell and run some queries. You will
likely run into issues, particularly if running against an existing
data load, since "localhost" or "127.0.0.1" get baked into HMS
table definitions.

Testing:
Ran exhaustive tests (not using Docker) to make sure I didn't break
anything.

Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Reviewed-on: http://gerrit.cloudera.org:8080/12095
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-18 04:56:49 +00:00
David Knupp
6e5ec22b12 IMPALA-7399: Emit a junit xml report when trapping errors
This patch will cause a junitxml file to be emitted in the case of
errors in build scripts. Instead of simply echoing a message to the
console, we set up a trap function that also writes out to a
junit xml report that can be consumed by jenkins.impala.io.

Main things to pay attention to:

- New file that gets sourced by all bash scripts when trapping
  within bash scripts:

  https://gerrit.cloudera.org/c/11257/1/bin/report_build_error.sh

- Installation of the python lib into impala-python venv for use
  from within python files:

  https://gerrit.cloudera.org/c/11257/1/bin/impala-python-common.sh

- Change to the generate_junitxml.py file itself, for ease of
  https://gerrit.cloudera.org/c/11257/1/lib/python/impala_py_lib/jenkins/generate_junitxml.py

Most of the other changes are to source the new report_build_error.sh
script to set up the trap function.

Change-Id: Idd62045bb43357abc2b89a78afff499149d3c3fc
Reviewed-on: http://gerrit.cloudera.org:8080/11257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 18:33:58 +00:00
Tim Armstrong
8f7a3f9c6c IMPALA-7124: delete keystore when formatting cluster
This avoids leftover keystore files from previous instances
of the cluster, e.g. testdata/cluster/cdh6/node-1/data/kms.keystore.
I checked on my local system and that is the only file outside of
the nn and dn subdirectories.

Change-Id: I9e6c1afa0df95b26446ccafa0c942e28ba848dae
Reviewed-on: http://gerrit.cloudera.org:8080/10613
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
2018-06-06 17:16:15 +00:00
Taras Bobrovytsky
c05696dd6a IMPALA-6949: Add the option to start the minicluster with EC enabled
In this patch we add the "ERASURE_CODING" enviornment variable. If we
enable it, a cluster with 5 data nodes will be created during data
loading and HDFS will be started with erasure coding enabled.

Testing:
I ran the core build, and verified that erasure coding gets enabled in
HDFS. Many of our EE tests failed however.

Cherry-picks: not for 2.x

Change-Id: I397aed491354be21b0a8441ca671232dca25146c
Reviewed-on: http://gerrit.cloudera.org:8080/10275
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-05 01:20:59 +00:00
Philip Zeyliger
6dc13d933b Remove Yarn from minicluster by default. (2nd try)
Remove Yarn from minicluster by default.

Turns out that we start Yarn as part of the minicluster, but we never use it.
(HiveServer2 is configured to run MR jobs "locally" in process.) Likely, this
Yarn integration is a vestige of Yarn/Llama integration.  We can save memory by
not starting it by default.

There are some less-common tooks like tests/comparison/cluster.py which use
Yarn (and Hadoop Streaming). In deference to those tools, I've left a mechanism
to start Yarn rather than excising it altogether. After running
buildall the regular way, add Yarn to the cluster by running:
  testdata/cluster/admin -y start_cluster

I tested by running core tests. I did not test the kerberized minicluster.

[Due to a git mishap, a version of this was previously checked in and reverted.]

Change-Id: I97053a44bbe32048e6c35cc28680d1c7696af13f
Reviewed-on: http://gerrit.cloudera.org:8080/9970
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-10 20:15:30 +00:00
Philip Zeyliger
01b6995abf Revert "Remove Yarn from minicluster by default."
This reverts commit c05df104570fa2cb7067599bbe3b87740ca9f09e.

Change-Id: I00151795581d22a9852cceaca1d21325d68dbe59
Reviewed-on: http://gerrit.cloudera.org:8080/9969
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Philip Zeyliger <philip@cloudera.com>
2018-04-10 16:21:09 +00:00
Philip Zeyliger
942781d80f Remove Yarn from minicluster by default.
Turns out that we start Yarn as part of the minicluster, but we never use it.
(HiveServer2 is configured to run MR jobs "locally" in process.) Likely, this
Yarn integration is a vestige of Yarn/Llama integration.  We can save memory by
not starting it by default.

There are some less-common tooks like tests/comparison/cluster.py which use
Yarn (and Hadoop Streaming). In deference to those tools, I've left a mechanism
to start Yarn rather than excising it altogether. After running
buildall the regular way, add Yarn to the cluster by running:
  testdata/cluster/admin -y start_cluster

I tested by running core tests. I did not test the kerberized minicluster.

Change-Id: I5504cc40b89e3c6d53fac0b7aa4b395fa63e8d79
2018-04-10 09:17:28 -07:00
Laszlo Gaal
9e7fb830fd IMPALA-4088: Assign fix values to the minicluster server ports
The minicluster setup logic assigned fixed port numbers to several
but not all listening sockets of the data nodes. This change
assigns similar port ranges to all the listening ports that were
so far allowed to pick their own port numbers, interfering with
other components, e.g. HBase.

Change-Id: Iecf312873b7026c52b0ac0e71adbecab181925a0
Reviewed-on: http://gerrit.cloudera.org:8080/6531
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-07 22:57:16 +00:00
Lars Volker
85d7f5eb2b IMPALA-4733: Change HBase ports to non-ephemeral
We've seen repeated test failures because HBase tries to bind to ports
in the ephemeral port range, which sometimes would already be occupied
by outgoing connections of other proccesses.

This change changes the ports to the new default HBase ports
(HBASE-10123):

HBase Master Port: 60000 -> 16000
HBase Master Web UI Port: 60010 -> 16010
HBase ReqionServer Port: 60020 -> 16020
HBase ReqionServer Web UI Port: 60030 -> 16030
HBase Status Multicast Port: 60100 -> 16100

This made it necessary to change the default KMS port, too
(HADOOP-12811):

KMS HTTP port: 16000 -> 9600

Change-Id: I6f8af325e34b6e352afd75ce5ddd2446ce73d857
Reviewed-on: http://gerrit.cloudera.org:8080/6524
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-04-04 00:28:29 +00:00
Jim Apple
84ee40428d IMPALA-4588: Enable starting the minicluster when offline
IMPALA-4553 made ntp-wait succeed before kudu would start, assuming
ntp-wait was installed, in order to prevent a litany of errors on ec2
about unsynchronized clocks. This patch disables that waiting if no
internet connection is detected in order to make it possible to start
the minicluster when offline.

Change-Id: Ifbb5babebb0ca6d2553be1b001e20e2270e052b6
Reviewed-on: http://gerrit.cloudera.org:8080/5412
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2016-12-09 03:04:07 +00:00
Jim Apple
90bf40d213 IMPALA-4553: ntpd must be synchronized for kudu to start.
When ntpd is not synchronized, kudu initialization fails on the master
node:

    F1129 16:37:28.969956 15230 master_main.cc:68] Check failed:
    _s.ok() Bad status: Service unavailable: Cannot initialize clock:
    Error reading clock. Clock considered unsynchronized

Change-Id: I371e01e21246a8c0ece98ca7d4bf6761615127b4
Reviewed-on: http://gerrit.cloudera.org:8080/5258
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2016-11-30 00:28:02 +00:00
Lars Volker
ef4c9958d0 IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo
This change removes some of the occurrences of the strings 'CDH'/'cdh'
from the Impala repository. References to Cloudera-internal Jiras have
been replaced with upstream Jira issues on issues.cloudera.org.

For several categories of occurrences (e.g. pom.xml files,
DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to
remove the occurrences left after this change.

Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8
Reviewed-on: http://gerrit.cloudera.org:8080/4187
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-13 00:40:41 +00:00
Henry Robinson
1b9d9ea7c1 IMPALA-4160: Remove some leftover Llama references
Change-Id: I62e12363ab3ecca42bf7a82be3c2df01bc47cdca
Reviewed-on: http://gerrit.cloudera.org:8080/4493
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-09-22 02:10:32 +00:00
Henry Robinson
19de09ab7d IMPALA-4160: Remove Llama support.
Alas, poor Llama! I knew him, Impala: a system
of infinite jest, of most excellent fancy: we hath
borne him on our back a thousand times; and now, how
abhorred in my imagination it is!

Done:

* Removed QueryResourceMgr, ResourceBroker, CGroupsMgr
* Removed untested 'offline' mode and NM failure detection from
  ImpalaServer
* Removed all Llama-related Thrift files
* Removed RM-related arguments to MemTracker constructors
* Deprecated all RM-related flags, printing a warning if enable_rm is
  set
* Removed expansion logic from MemTracker
* Removed VCore logic from QuerySchedule
* Removed all reservation-related logic from Scheduler
* Removed RM metric descriptions
* Various misc. small class changes

Not done:

* Remove RM flags (--enable_rm etc.)
* Remove RM query options
* Changes to RequestPoolService (see IMPALA-4159)
* Remove estimates of VCores / memory from plan

Change-Id: Icfb14209e31f6608bb7b8a33789e00411a6447ef
Reviewed-on: http://gerrit.cloudera.org:8080/4445
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-09-20 23:50:43 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Michael Brown
08e8de73b2 IMPALA-3806: remove a few modern shell idioms to improve RHEL5 support
Both `find -executable` and the Bash "&>>" operator are too new to be
supported on RHEL5. Both have reasonable workarounds, so prefer them.
Note that this may not be the exhaustive list of such "modern"
conventions, but RHEL5 isn't working end-to-end, so we can't identify
all of them in a single commit yet.

Testing:

Before, the RHEL5 build would fail quite early here. Now, data load
succeeds and most of the backend tests successfully run.

Change-Id: I7438bed908d8026327923607238808122212d2d8
Reviewed-on: http://gerrit.cloudera.org:8080/3531
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Internal Jenkins
2016-07-05 13:37:26 -07:00
Matthew Jacobs
40b79aecbc IMPALA-3417: run-all.sh fails when no services should start
Change-Id: I268c7ad66c82f2b04b832d520e21662e631572cf
Reviewed-on: http://gerrit.cloudera.org:8080/3250
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-06-02 09:32:54 -07:00
Anuj Phadke
a915293109 IMPALA-1850: Allow fs.defaultFS to be set to a non-HDFS filesystem
This change whitelists the supported filesystems which can be set
as Default FS for Impala to run on.
This patch configures Impala to use S3 as the default filesystem, rather
than a secondary filesystem as before.

Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:40 -07:00
casey
a27946e696 Improve mini-cluster usability (testdata/cluster/admin)
Changes:
1) Previously when a service would fail, the user would have to find the
   the log file and open it. Now the end of the log is dumped to stdout.
2) Add start, stop, and restart commands to the "admin" script. For
   example now you can run
     testdata/cluster/admin restart kudu
3) Wait up to 120 seconds for services to shutdown. The timeout is the
   same as for the Impala processes. If the services fail to stop an
   error will be raised.

Change-Id: I537ea5656df2081d4f1f27a9f3fcef4547fdc2fe
Reviewed-on: http://gerrit.cloudera.org:8080/2751
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:37 -07:00
casey
6f4a5e6bb0 Kudu: Use -block_manager=file if "hole punching" isn't supported
By default Kudu requires the underlying file system to support hole
punching. If support isn't there Kudu will fail to start. People using
such a file system can instead start Kudu with -block_manager=file.
Before starting Kudu in the local mini-cluster, the "fallocate"
command will be used to automatically determine if the special flag is
needed.

Note, users who need this must run bin/create-test-configuration.sh
after pulling in this commit.

This also fixes a bug in the delete_kudu_data() in the cluster admin
script. A directory name was incorrect.

Change-Id: I1ca7fedb367444c41e462b72b0b76091ee94e27c
Reviewed-on: http://gerrit.cloudera.org:8080/2750
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:36 -07:00
casey
cef87e39dc Updates for new Kudu toolchain layout and upgrade Kudu
The directory structure of the newer Kudu toolchain artifacts has
changed. Now the root directory is split into /release and /debug. A few
little updates are needed to the build and service scripts.

Since the toolchain no longer provides stubs for platforms that Kudu
doesn't support the stubs need to be generated. This will be done as
part of the toolchain bootstrapping.

Also this upgrades Kudu to 0.8 RC1.

Developers will need to run bin/create-test-configuration.sh after
pulling in this change. Otherwise the Kudu service will fail to start.

Change-Id: I625903bd92afece0ad819a96fc275d5812b5eb2a
Reviewed-on: http://gerrit.cloudera.org:8080/2720
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:35 -07:00
Sailesh Mukil
86fd262dc9 IMPALA-3324: Hive server does not start for S3 builds.
The hive server does not start for S3 builds because HDFS is marked
as an unsupported service in testdata/cluster/admin; and so HDFS is
not started at all, and so the Hive server is unable to start as well.
Due to this, all our S3 builds fail.
Currently our S3 builds need HDFS to run correctly.

(This has to be reverted once IMPALA-1850 goes in, because then S3 can
run as a default FS without HDFS)

Change-Id: Ibda9dc3ef895c2aa4d39eb5694ac5f2dbd83bee4
Reviewed-on: http://gerrit.cloudera.org:8080/2741
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-04-12 14:03:43 -07:00
Casey Ching
9d43aac6ce IMPALA-3274: Always start Kudu for testing
Previously Kudu would only be started when the test configuration was
the standard mini-cluster. That led to failures during data loading when
testing without the mini-cluster (ex: local file system). Kudu doesn't
require any other services so now it'll be started for all test
environments.

Change-Id: I92643ca6ef1acdbf4d4cd2fa5faf9ac97a3f0865
Reviewed-on: http://gerrit.cloudera.org:8080/2690
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-04-12 14:02:35 -07:00
Casey Ching
39a28185e8 Re-enable Kudu in build using client stubs when needed
The stubs in Impala broke during the merge commit. This commit removes
the stubs in hopes of improving robustness of the build. The original
problem (Kudu clients are only available for some OSs) is now addressed
by moving the stubbing into a dummy Kudu client. The dummy client only
allows linking to succeed, if any client method is called, Impala will
crash. Before calling any such method, Kudu availability must be
checked.

Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712
Reviewed-on: http://gerrit.cloudera.org:8080/2585
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-03-29 23:57:54 +00:00
Alex Behm
7e76e92bef Consolidate test and cluster logs under a single directory.
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.

The new structure is as follows:

$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala

$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading

$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests

$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests

$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests

$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests

I tested this change with a full data load which
was successful.

Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-03-28 19:23:22 +00:00
casey
804cfbdd64 Get and use Kudu from the toolchain by default
This is for review purposes only. This patch will be merged with David's
big merge patch.

Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
   Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
   build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
   set KUDU_CLIENT_DIR to the path KUDU was installed in.
   Example:
     git clone https://github.com/cloudera/kudu.git
     ...build 3rd party etc...
     mkdir -p $KUDU_BUILD_DIR
     cd $KUDU_BUILD_DIR
     cmake <path to Kudu source dir>
     make
     DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
   actually a parcel that has been renamed (the contents were not
   modified in any way), that mean the Kudu service binaries are there.
   Those binaries are now used to run the Kudu service.

Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
2016-03-11 11:38:05 -08:00
casey
3a3497e819 Fix cluster start script for RHEL 7
The change to the start script for OSX used "find" with the "-perm
+0111" option as an "executables only" filter but that doesn't work
with newer versions of "find". "-perm +" has been deprecated or removed
(depending on the version) in Linux. I couldn't find a OSX+Linux
compatible filter.

The variable IS_OSX was added and used to choose the appopriate filter.

Change-Id: I0c49f78e816147c820ec539cfc398fb77b83307a
Reviewed-on: http://gerrit.cloudera.org:8080/1630
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-29 23:25:02 +00:00
Casey Ching
e2bfb6ae2f Misc improvements to shell scripts about error reporting
Changes:
  1) Consistently use "set -euo pipefail".
  2) When an error happens, print the file and line.
  3) Consolidated some of the kill scripts.
  4) Added better error messages to the load data script.
  5) Changed use of #!/bin/sh to bash.

Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-17 18:25:27 +00:00
casey
c56ba5149c Infra scripts: Only attempt to kill processes owned by the current user
This is for compatibility with docker containers. Before this patch,
when the scripts were run on the docker host, the scripts  would try
to kill the mini-cluster in the docker containers and fail because they
didn't have permissions (the user is different). Now the scripts will
only try to kill mini-cluster processes that were started by the current
user.

Also some psutil availability checks were removed because psutil is now
provided by the python virtualenv.

Change-Id: Ida371797bbaffd0a3bd84ab353cb9f466ca510fd
Reviewed-on: http://gerrit.cloudera.org:8080/1541
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-17 12:08:33 +00:00
Martin Grund
577e335122 OS X: cluster admin script compatibility
admin was using the -executable flag of find that is not available on
Mac. This patch replaces it with "-perm +0111 -type f" which is similar
semantics. In addition, there seem to be differences in which shell
builtins are available so some changes have been made to fix that issue.

Change-Id: I9b2ecbd5bf6a9b1610e7ca9f15b1a4d1407b94c1
Reviewed-on: http://gerrit.cloudera.org:8080/1612
Reviewed-by: Casey Ching <casey@cloudera.com>
Readability: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-12-11 22:09:45 +00:00
Matthew Jacobs
835d6dbef4 IMPALA-1209: Add KMS service to testdata cluster (pt1)
First change for IMPALA-1209 to address Impala limitations when
using HDFS encryption. This adds a KMS process to the testdata
cluster. This was tested manually by creating a key and an
encryption zone.

Change-Id: I499154506386f04e71c5371b128c10868b1e1318
Reviewed-on: http://gerrit.cloudera.org:8080/41
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-02-13 20:46:14 +00:00
Mike Yoder
75a97d3d7e [CDH5] Kerberize mini-cluster and Impala daemons
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore.  This is sufficient to test impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  'run-all.sh -format', re-source impala-config.sh, and then start
  impala daemons as usual.  You must reformat the cluster because
  kerberizing it will change all the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-05 12:36:21 -07:00
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shutdown for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00