Commit Graph

312 Commits

Author SHA1 Message Date
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shut down for failure testing. The mini-cluster bundled with Hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
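The per-role control described in the commit above can be sketched in Python. This is a minimal illustration, not the actual cluster scripts. The `MiniCluster` class and the role commands are hypothetical, and each role is assumed to be an ordinary OS process that can be stopped with SIGTERM.

```python
import subprocess
import sys


class MiniCluster:
    """Hypothetical sketch: one OS process per role, so each role can be
    started or stopped independently (e.g. kill a single nodemanager)."""

    def __init__(self, role_commands):
        self._commands = dict(role_commands)  # role name -> argv list
        self._procs = {}                      # role name -> running Popen

    def start(self, role):
        self._procs[role] = subprocess.Popen(self._commands[role])

    def stop(self, role):
        # Send SIGTERM to just this role and wait for it to exit.
        proc = self._procs.pop(role)
        proc.terminate()
        proc.wait()

    def running_roles(self):
        return sorted(r for r, p in self._procs.items() if p.poll() is None)

    def stop_all(self):
        for role in list(self._procs):
            self.stop(role)


if __name__ == "__main__":
    # Placeholder command standing in for a real daemon (assumption).
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    cluster = MiniCluster({"namenode": sleeper, "datanode": sleeper, "nodemanager": sleeper})
    for role in ("namenode", "datanode", "nodemanager"):
        cluster.start(role)
    cluster.stop("nodemanager")  # failure test: only the nodemanager goes away
    print(cluster.running_roles())
    cluster.stop_all()
```

Running one process per role trades a little startup overhead for the ability to inject a failure into exactly one component.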
Nong Li
85be9a5050 Update bin/make* -notests to include other artifacts for packages.
Change-Id: I95e95f0a2e2131875b95d6676620bec7117b7f8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2250
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-16 00:37:00 -07:00
Nong Li
f9dd32724c Cleanup build scripts.
Consolidated our build scripts and added the -notests option, which skips
building the BE tests.

Change-Id: Ida6aa064b7fe47e535c142b9af92b7c158e83c32
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2043
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2201
2014-04-13 17:11:39 -07:00
Lenni Kuff
d101ef86e2 [CDH5] Bump version to 1.4.0-cdh5-INTERNAL
Change-Id: I0a0334084e444c948f1718133afb2d7246dde414
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2193
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-11 16:03:09 -07:00
Henry Robinson
8e5848eaf8 RM fixes to get tests passing
* Fix one last mismatched NotifyThreadUsageChange() pair
* Don't set resource in plan fragment params if there isn't a resource
  available. This fixes the problem where if no fragment with resources
  was assigned to the same node as the coordinator, the coordinator
  would have a dummy resource allocation which didn't work with
  expansion.
* Substitute #ID in all impalad arguments to start-impala-cluster.py
  with the 0-indexed ID of the impalad being started. This is required
  to have different Impala processes use different cgroups.

Change-Id: If8c8fd8bef0809bdaf16115a45a9695fc2bf3e1b
(cherry picked from commit c71ce45e97570b8c09900eb5ae2e26984d3306a4)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2060
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-03-24 15:07:45 -07:00
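The `#ID` substitution in start-impala-cluster.py described above can be sketched as a single string replacement per argument. The function name `substitute_impalad_id` and the flag in the usage example are illustrative assumptions, not the script's actual code.

```python
def substitute_impalad_id(args, impalad_id):
    """Replace every '#ID' token in each argument with the 0-indexed impalad ID."""
    return [arg.replace("#ID", str(impalad_id)) for arg in args]


if __name__ == "__main__":
    # Hypothetical flag: gives each impalad its own cgroup path.
    user_args = ["-example_cgroup_path=/cgroups/impalad#ID", "-v=1"]
    for i in range(3):
        print(substitute_impalad_id(user_args, i))
```

Because the replacement happens per process at launch time, a single command line can give every impalad a distinct cgroup.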
Lenni Kuff
3d82c9a5d6 Bump version from 1.3.0-INTERNAL to cdh5-1.3.0
Change-Id: Ib7a37b190091a3f9eb6d6f0f560dd40aed23e231
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2031
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-20 17:22:11 -07:00
Lenni Kuff
86f69fb96f IMP-1306: Fix build scripts to properly generate Impala version info for packaging builds
The problem was that we were deleting the version.info file because the default
of gen_build_version.py recently changed from --noclean to --clean.

Also fixed a bug in the shell version generation and made debugging a bit easier
by dumping the contents of version.info whenever it is generated.

Change-Id: I764d01c9e46eed1bd39de79bf076c15afa599486
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1901
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit fa673b4d3342fc825ee7fa942bd254234d222906)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1910
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-14 08:45:16 -07:00
Matthew Jacobs
a283d72cdd [cdh5] Add latest cdh5 hadoop, hbase, and hive snapshots to thirdparty
Change-Id: I60c93b259a26e86aca60f2b3b5b6226eabc0b5eb
2014-03-05 01:06:09 -08:00
Lenni Kuff
8a16709265 Perform prioritized load requests for missing tables in HS2 metadata ops
The HS2 metadata operations do not go through analysis() so the prioritized
loading will not happen for them. Most of the HS2 metadata ops work purely
on table/db names, but GetColumns() requires loading the table metadata. This
patch updates MetadataOp to collect a set of missing tables and request these
tables be loaded from the catalog server. The operation will wait until the tables
are loaded in the local catalog before proceeding.

Change-Id: I070f2a0d9194d3317f09431971be9a8dffbc7386
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1542
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1557
2014-03-01 17:16:50 -08:00
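The collect, request, and wait flow from the HS2 metadata commit above can be sketched in Python with a condition variable. This is a minimal sketch under assumptions. `LocalCatalog`, `ensure_tables_loaded`, and the `request_load` callback are hypothetical stand-ins for MetadataOp and the catalog server RPC, not the real frontend code.

```python
import threading


class LocalCatalog:
    """Hypothetical stand-in for an impalad's local catalog cache."""

    def __init__(self):
        self._loaded = set()
        self._cond = threading.Condition()

    def is_loaded(self, table):
        with self._cond:
            return table in self._loaded

    def mark_loaded(self, table):
        # Called when a catalog update delivers the table's metadata.
        with self._cond:
            self._loaded.add(table)
            self._cond.notify_all()

    def wait_for_tables(self, tables, timeout=None):
        """Block until all `tables` are loaded; returns False if `timeout` expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: set(tables) <= self._loaded, timeout)


def ensure_tables_loaded(catalog, tables, request_load, timeout=None):
    """Request prioritized loading of any missing tables, then wait for them."""
    missing = {t for t in tables if not catalog.is_loaded(t)}
    if missing:
        request_load(missing)  # e.g. an RPC to the catalog server (hypothetical)
        if not catalog.wait_for_tables(missing, timeout):
            raise TimeoutError(f"tables not loaded in time: {sorted(missing)}")
    return missing


if __name__ == "__main__":
    catalog = LocalCatalog()
    catalog.mark_loaded("functional.alltypes")

    def fake_request_load(tables):
        # Simulate the catalog server delivering metadata asynchronously.
        for t in tables:
            threading.Timer(0.1, catalog.mark_loaded, args=(t,)).start()

    print(ensure_tables_loaded(catalog, ["functional.alltypes", "functional.testtbl"],
                               fake_request_load))
```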
Alex Behm
3d764619f7 Run Hive data loading through beeline instead of the Hive shell.
Fixes our log configuration to put the Hive logs in cluster_logs/hive.

Change-Id: I5d98581e35325f2173e4b3170e36bec42d33f8f3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1497
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1615
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-02-20 15:43:31 -08:00
Alex Behm
62338694e4 Skip generation of version and impala-ir .cc files in buildall.sh if -noclean is specified.
Before this patch the -noclean option had almost no effect on the BE build time because
some source files were re-generated with .py scripts regardless.
This change allows ./buildall -skiptests -noclean to do a true incremental rebuild.

Change-Id: Ib3af85db05bdc96a2279a22c1d49d735f2cabd4e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1394
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1415
2014-01-31 13:57:13 -08:00
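The -noclean behavior described above can be sketched in Python rather than in the shell of buildall.sh. The function name `maybe_generate`, the `version.cc` file name, and the rule that a missing file is still regenerated under -noclean are all assumptions made for illustration.

```python
import os
import tempfile


def maybe_generate(output_path, generate, clean):
    """Run `generate(output_path)` unless this is an incremental (-noclean) build
    and the generated file already exists. Returns True if generation ran."""
    if clean or not os.path.exists(output_path):
        generate(output_path)
        return True
    return False


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "version.cc")

    def write_stub(path):
        with open(path, "w") as f:
            f.write("// generated\n")

    print(maybe_generate(out, write_stub, clean=False))  # first build: generates
    print(maybe_generate(out, write_stub, clean=False))  # -noclean: skipped
```

Leaving an existing generated file untouched keeps its timestamp unchanged, so the build system does not recompile everything that depends on it.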
ishaan
01ef3ef4c1 load-data.py should exit if a bash command returns a non-zero error code.
Change-Id: I2f732a276a42d2697fa55bce0f18ac89e9a6f0a1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1397
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1408
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-30 15:47:13 -08:00
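The fail-fast behavior from the load-data.py commit above can be sketched as follows. The helper name `exec_bash_cmd` is hypothetical. The sketch assumes commands run through `bash -c` and that the script should exit with the failing command's status.

```python
import subprocess
import sys


def exec_bash_cmd(cmd):
    """Run `cmd` through bash and exit the whole script if it fails."""
    result = subprocess.run(["bash", "-c", cmd])
    if result.returncode != 0:
        print(f"Command failed with exit code {result.returncode}: {cmd}", file=sys.stderr)
        sys.exit(result.returncode)


if __name__ == "__main__":
    exec_bash_cmd("echo loading data")
    exec_bash_cmd("exit 3")        # the script stops here with status 3
    print("not reached")
```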
Henry Robinson
5535a8a128 [CDH5] Set CDH major version to 5
Change-Id: Ibc36ed435dd36d3489d27a977bf1726bbf2927a1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1306
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:34:01 -08:00
Henry Robinson
241270044b Add CDH_MAJOR_VERSION environment variable
CDH_MAJOR_VERSION controls where HDFS data is written. In the future, we
can use its value to parameterise Jenkins jobs so that the right code is
run and the right data is generated.

Change-Id: Id2957df6d708bc6c50faf7a8a609aff5f9571662
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1293
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1305
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:33:18 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
e6299b684b Setting proper cgroup CPU shares based on the reserved resource.
Change-Id: I58992b11e71ed7ad7ea7639050d74fd3eaa4d1d1
2014-01-15 15:12:07 -08:00
Alex Behm
5cfbec9139 Impala creates and manages its own CGroups instead of using the Yarn-NM provided ones.
Change-Id: Id09ba2641ad33fbc109eea2dd6fe80b1863b5cac
2014-01-15 15:12:06 -08:00
Alex Behm
760750af27 Enforcing reserved memory resources via mem limits.
Fixed codepath with rm disabled. Set enable_rm to false by default.

Change-Id: I3bf2d0525d91243ec3c0ea048b0c03680befcda2

Conflicts:
	be/src/runtime/runtime-state.cc
2014-01-15 15:12:05 -08:00
Alex Behm
dc7b398bd3 Impala reserves resources from YARN via Llama.
Impala reserves resources from YARN via Llama and handles resource
preemptions by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors the scheduler and coordinator
to move fragment-to-host assignment logic into the scheduler. The local test
setup uses MiniLlama.

Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
2014-01-15 15:12:04 -08:00
Alex Behm
fc6ecd39e5 [CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting.
Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0
2014-01-15 15:11:32 -08:00
Alex Behm
60003ad211 [CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes.
Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106
2014-01-15 15:11:23 -08:00
Nong Li
752b8e3ee4 [CDH5] Added CDH5 beta2 versions of Hadoop, Hive, HBase and Llama to thirdparty.
Change-Id: Id033c0246c0ffdffd0c7703eaff9600086912380
2014-01-15 15:11:13 -08:00
Lenni Kuff
8571920753 Bump version to v1.3.0-INTERNAL
Change-Id: I32bae4daf093794b09f4ca85b9abdc686791aee8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1281
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-14 21:33:22 -08:00
ishaan
4e9913b52f Fix race in data loading by creating text tables first.
While loading Parquet data, a few table creation queries use the 'like'
keyword; this opens a small race window when all the table formats are created
concurrently. With this change, we create the text tables first, before
parallelizing the rest of the data loading.

Change-Id: Ib84cf0e5120b3588d3f0503d7119ca055e08e53f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1241
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-10 15:01:59 -08:00
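The ordering fix described above (text tables first, then everything else in parallel) can be sketched in Python with a thread pool. The names `load_all_formats` and `create_tables` and the format labels are assumptions for illustration, not the data-loading script's real API.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all_formats(create_tables, formats, max_workers=4):
    """Create the text tables first, then the remaining formats concurrently.

    Some tables are created with CREATE TABLE ... LIKE <text table>, so the
    text tables must exist before any of those statements run.
    """
    create_tables("text")
    others = [f for f in formats if f != "text"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so any exception from a worker is re-raised here.
        list(pool.map(create_tables, others))


if __name__ == "__main__":
    load_all_formats(lambda fmt: print("creating", fmt, "tables"),
                     ["text", "seq", "rc", "parquet"])
```

Serializing only the text step removes the race while keeping most of the loading parallel.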
Nong Li
056c7d94d6 Remove compute stats option from bin/load-data.py
This option is not implemented in this script, and nothing made it
obvious that it has no effect.

Change-Id: I1a1eff38460fd181c486cfca2840108a58e21603
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1059
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-10 14:01:35 -08:00
Henry Robinson
9a0dc18700 Remove a couple of unused files
* upload_codereview.py is no longer used since Rietveld is long gone.
* runplanservice is deprecated as there is no longer a separate
  PlanService.
* README only mentions a single internal wiki page.

Change-Id: Iba61a3d62381deb882c4168f142574f2492e0969
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1249
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-09 09:56:05 -08:00
Alex Behm
6483f53581 Additional options for JVM debugging in impala startup scripts.
Enables JVM debugging by default for the catalogd and impalads
created via bin/start-impala-cluster.py.
Adds a -jvm_args command line option for passing additional JVM args to
the catalogd and impalads.

Change-Id: I68e901661bd1fd7eefa05ba84dbacf29dd124685
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1213
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:40 -08:00
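One way the -jvm_args option and default JVM debugging from the commit above could combine is sketched here. This is an assumption-laden sketch. The base debug port of 30000 and the helper `build_jvm_options` are hypothetical. Passing the result through the JVM's standard `JAVA_TOOL_OPTIONS` environment variable is also an assumption about how the scripts reach the embedded JVM. The JDWP agent string itself is the standard Java remote-debugging syntax.

```python
import os

# Standard JDWP agent options: listen on a socket, don't suspend at startup.
JDWP_TEMPLATE = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address={port}"


def build_jvm_options(process_index, extra_jvm_args="", base_debug_port=30000):
    """Return JVM options for one catalogd/impalad, giving each a distinct debug port."""
    opts = [JDWP_TEMPLATE.format(port=base_debug_port + process_index)]
    if extra_jvm_args:
        opts.append(extra_jvm_args)  # user-supplied -jvm_args, appended verbatim
    return " ".join(opts)


if __name__ == "__main__":
    for i in range(3):
        env = dict(os.environ, JAVA_TOOL_OPTIONS=build_jvm_options(i, "-Xmx2g"))
        print(env["JAVA_TOOL_OPTIONS"])
```

Each process needs its own port because only one JVM can listen on a given debug address.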
ishaan
0ed1781323 Invalidate metadata before loading parquet data through Impala.
During a full data load, we load all the data (except parquet) via hive, and then load the
parquet data via Impala. The catalog service does not update the metadata of tables
changed outside Impala, so we need to explicitly invalidate the metadata before loading
parquet data.

Change-Id: Iec39db9ea46e4a11b17589881732629a56444120
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1207
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:39 -08:00
Lenni Kuff
baf79f8185 Call 'invalidate metadata' after loading test data instead of before
Instead of calling 'invalidate metadata' before loading each workload
we should call it once, after loading all test data. This will allow
us to pick up data inserted by Hive. The only reason this worked before
is that we restart Impala before running the tests. This will also
be a bit faster when loading multiple workloads.

Change-Id: I28d42bbf5d7a24b5fde687d67a4b41472ec4b897
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1153
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:37 -08:00
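The reordering described above (load every workload, then invalidate metadata once) can be sketched in Python. `INVALIDATE METADATA` is a real Impala statement. The function name `load_test_data` and its two callbacks are hypothetical stand-ins for the loader and an Impala client.

```python
def load_test_data(workloads, load_workload, execute_impala_query):
    """Load every workload, then invalidate Impala's metadata exactly once."""
    for workload in workloads:
        load_workload(workload)  # may write tables and data through Hive
    # A single INVALIDATE METADATA after all loads lets Impala pick up
    # everything Hive wrote, without repeating the call per workload.
    execute_impala_query("INVALIDATE METADATA")


if __name__ == "__main__":
    load_test_data(["tpch", "tpcds"],
                   lambda w: print("loading", w),
                   lambda q: print("impala>", q))
```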
Henry Robinson
177b9ba3b1 Remove nonblocking server (and dependencies) from build
Goodnight, sweet non-blocking prince. We didn't support, or test, this
configuration, and it doesn't work with security or sessions and brings
in some annoying dependencies that are a pain to build.

We have other RPC-stack options to investigate; we may wind up re-adding
the non-blocking server but only in a way that supports all required
features more regularly.

Change-Id: Ifbcabc5014441f6d31c342c4e288dd7fc6201443
2014-01-08 10:54:35 -08:00
ishaan
7e520f8f23 Make workload runner logging more concise and readable.
This patch makes the workload runner's logging more concise and informative. Specifically,
it:
 - logs the time taken for each iteration of a query.
 - changes the default log level to INFO.
 - makes the output less verbose.

Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:54:35 -08:00
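The per-iteration timing at INFO level described above can be sketched with the standard `logging` module. The logger name and `run_query_iterations` are hypothetical. The sketch only illustrates the one-line-per-iteration style, not the workload runner's actual code.

```python
import logging
import time

LOG = logging.getLogger("workload_runner")


def run_query_iterations(query_name, execute, iterations):
    """Run `execute` `iterations` times, logging one concise line per iteration."""
    times = []
    for i in range(1, iterations + 1):
        start = time.perf_counter()
        execute()
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        LOG.info("%s: iteration %d/%d took %.3fs", query_name, i, iterations, elapsed)
    return times


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    run_query_iterations("q1", lambda: time.sleep(0.01), 3)
```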
Henry Robinson
0440f26f3e Add -gdb flag to start-impalad.sh to start Impala under gdb
Change-Id: I19f027680cfbf6a7cbc4b311e07f244d67ff683d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1125
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:33 -08:00
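The -gdb flag for start-impalad.sh described above can be sketched in Python (the real script is shell). The helper `build_impalad_command` is hypothetical. It relies on gdb's real `--args` option, which starts gdb with the program's command-line arguments already set.

```python
def build_impalad_command(binary, args, use_gdb=False):
    """Return the argv used to launch impalad, optionally wrapped in gdb."""
    cmd = [binary] + list(args)
    if use_gdb:
        # `gdb --args prog arg1 arg2 ...` loads prog with its arguments preset,
        # so typing `run` inside gdb starts impalad with the same flags.
        cmd = ["gdb", "--args"] + cmd
    return cmd


if __name__ == "__main__":
    print(build_impalad_command("impalad", ["-v=1"], use_gdb=True))
```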
Nong Li
1c2e767b89 Bump version to 1.2.3-INTERNAL.
Change-Id: I2baf2aa41587ccf24331da7cba399cedb296a2e0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1132
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:32 -08:00
Nong Li
2489e211f0 Update version to 1.2.2.
Change-Id: Id70f4af930050075a41b1953fc4c5c935bb5b671
2014-01-08 10:54:30 -08:00
Henry Robinson
6d9a7e290d Build Openldap as a thirdparty package
Change-Id: Ifbb0f468a23186f4160fceb462953bc321469c27
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1049
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:54:20 -08:00
Henry Robinson
cb965d259a Build changes to use cyrus-sasl-2.1.23
Change-Id: Ie87e35945b6a415b0383cb75ffcae2fe35755623
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1047
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:54:19 -08:00
Nong Li
b225477ae9 Bump version to 1.2.2-INTERNAL.
Change-Id: I256ef47b6e957a2723422e606d1b87f4e800bbf9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1032
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:54:17 -08:00
Lenni Kuff
01660374c6 Additional fe and testdata pom.xml cleanup
This change cleans up our FE pom.xml file by removing unneeded
dependencies and system dependencies (system dependencies are now pulled in
from the Maven release repository).

The upside is that our pom is cleaner and it will also help reduce the likelihood of
broken dependencies since Maven will pull in the right versions.  The downside
is that we now pull in quite a few more JARs.

Note: I was unable to find release artifacts for Sentry and Parquet, so I am leaving
those as "system" for now.

Change-Id: I0b917b09a02243d78d89747591ab6bccacf7cf38

Saving changes

Change-Id: I3697a7b44884c40e077b3e354fef76625e1b881d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1011
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:17 -08:00
Lenni Kuff
e86ca62ec7 Do not append any JARs from thirdparty/ to the classpath
Change-Id: Id68c1bc118a1b8efebb6d035ca94a41cf1c4ded1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1005
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:16 -08:00
Henry Robinson
ce2781c48d Remove bad quotes from thrift configure script
Change-Id: Id671f5366813378ead9362f67b082b7af705b005
Reviewed-on: http://gerrit.ent.cloudera.com:8080/994
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-08 10:54:14 -08:00
Sean Mackrory
2b313a9782 IMP-1147. Impala build fails: PIC_LIB_PATH: unbound variable
Change-Id: Ifb173b553b9a52392b5d7caf3630032b89e89c2d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/992
Reviewed-by: Sean Mackrory <sean@cloudera.com>
Tested-by: Sean Mackrory <sean@cloudera.com>
2014-01-08 10:54:14 -08:00
Sean Mackrory
bb39e33101 IMP-1106. Allow libevent location to be overridden in Thrift dependency build
Change-Id: Ia4d92bb4bdfcb7ba29a36904afdb9fd5e398307d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/968
Reviewed-by: Henry Robinson <henry@cloudera.com>
Reviewed-by: Sean Mackrory <sean@cloudera.com>
Tested-by: Sean Mackrory <sean@cloudera.com>
2014-01-08 10:54:14 -08:00
ishaan
287953e87c Better error logging while loading data.
Change-Id: I67cbd9fd1d915ea043a731b7951f29fec25fc446
Reviewed-on: http://gerrit.ent.cloudera.com:8080/982
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:13 -08:00
Lenni Kuff
6e09b90ea3 Properly set timeout in start-impala-cluster
Change-Id: I8cedf484d0ce9d2752e3970883f419ab51a82c3b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/980
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:13 -08:00
Lenni Kuff
e2b9b4a735 Bump version to v1.2.1
Change-Id: I8f1c9ae1fd0ad195fa7817d324d192c2386eac09
Reviewed-on: http://gerrit.ent.cloudera.com:8080/974
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:12 -08:00
ishaan
81b80c702c Upgrade thirdparty to use CDH4.5 bits.
The following changes have been made:
  -- Update hbase
  -- Update hive
  -- Update hadoop
  -- Update the parquet version to 1.2.5

Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb
2014-01-08 10:54:09 -08:00
Lenni Kuff
6282d364a8 IMP-1134: DoAsUser and impersonator are reversed in audit logs
The audit logs currently have the "impersonator" field set to what we call the doAsUser
and the "user" field set to the connected user. They should be reversed.

Added basic tests to validate the correct event gets audited.

Change-Id: Idfa0aaa6c88debedc4993bd0489dbd3f696fcf17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/958
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:03 -08:00
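The corrected field assignment from IMP-1134 above can be sketched as follows. The `AuditEvent` type and `make_audit_event` are hypothetical. Leaving `impersonator` empty when no doAsUser is supplied is an assumption, not something the commit states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditEvent:
    user: str                    # effective user the statement runs as
    impersonator: Optional[str]  # connected user acting on that user's behalf, if any


def make_audit_event(connected_user, do_as_user=None):
    """With delegation, the doAsUser is the audited user and the connected
    user is the impersonator (the reverse of the buggy behavior)."""
    if do_as_user:
        return AuditEvent(user=do_as_user, impersonator=connected_user)
    return AuditEvent(user=connected_user, impersonator=None)


if __name__ == "__main__":
    print(make_audit_event("hue", do_as_user="alice"))
    print(make_audit_event("bob"))
```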
ishaan
bf5359be8d Cleanup Impala connections after data is loaded.
Change-Id: I152b09808740d5344462bcbaf4df4b71d88504cc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/953
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:02 -08:00
Lenni Kuff
6c25e78715 Add option to start-impala-cluster to only restart impalad
This speeds up restarts because we don't need to restart
the catalog server and reload the table metadata. This is useful if you
want to restart the impalad with a different command line parameter
or if you are only changing the impalad binary.

Change-Id: I0b714afaf7e508c450a353a53d67d95165de3486
Reviewed-on: http://gerrit.ent.cloudera.com:8080/897
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:59 -08:00
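The impalad-only restart option described above can be sketched with `argparse`. The flag name `--restart_impalad_only` and the helper `roles_to_restart` are hypothetical. The sketch assumes a full restart covers the statestored, catalogd, and impalad roles.

```python
import argparse


def roles_to_restart(restart_impalad_only):
    """Return the cluster roles to restart, in start-up order."""
    roles = ["impalad"]
    if not restart_impalad_only:
        # A full restart also brings up the statestore and catalog server,
        # which forces the table metadata to be reloaded.
        roles = ["statestored", "catalogd"] + roles
    return roles


parser = argparse.ArgumentParser(description="Start a local Impala test cluster (sketch).")
parser.add_argument("--restart_impalad_only", action="store_true",
                    help="Restart only the impalad processes; keep the catalog server running.")

if __name__ == "__main__":
    opts = parser.parse_args()
    print(roles_to_restart(opts.restart_impalad_only))
```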
Lenni Kuff
f579ee8b25 Fix logging in load-data to print the query being executed
Change-Id: I4332e8d3a340f11e1bbb1f6c5126b0b9b4a2ad8e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/949
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:53:58 -08:00