Commit Graph

329 Commits

Author SHA1 Message Date
ishaan
3bed0be1df Refactor the performance framework and change its execution strategy.
This patch introduces new abstractions and changes the way queries are run via the
workload runner. A new class 'Workload' is introduced, which represents the notion of a
workload in the performance framework (i.e., a set of query names mapped to query
strings).

The new workflow is:
 - run-workload acts as a driver. It accepts user parameters for which queries to
   run and their execution strategy. It generates workload objects and passes them to the
   workload-runner.
 - The workload runner takes a workload, its execution parameters and generates a set of
   test vectors over which the workload is run iteratively.
 - A workload is executed by initializing a QueryExecutor for each query being run in a
   test vector. The workload executor is then responsible for execution and gathering
   results.
 - The execution details of every query being executed are stored and returned to the
   driver (run-workload).

Change-Id: Ia16360140d65e6733e534e823bc5d5614622ab5f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3616
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: jenkins
2014-07-25 18:17:11 -07:00
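The workflow described in commit 3bed0be1df can be pictured with a minimal sketch. Only the names Workload and QueryExecutor come from the commit message; the fields, the test-vector dimensions, and the run_fn callback are assumptions made for illustration and are not taken from the actual framework code.

```python
from dataclasses import dataclass
from itertools import product
import time


@dataclass
class Workload:
    """A named set of queries: query name -> query string."""
    name: str
    queries: dict


def generate_test_vectors(file_formats, compression_codecs):
    """Build every combination of the (hypothetical) execution dimensions."""
    return [{"file_format": f, "compression": c}
            for f, c in product(file_formats, compression_codecs)]


class QueryExecutor:
    """Runs one query under one test vector and records what happened."""

    def __init__(self, name, query, vector, run_fn):
        self.name = name
        self.query = query
        self.vector = vector
        self.run_fn = run_fn  # hypothetical callback that actually runs the query

    def execute(self):
        start = time.monotonic()
        rows = self.run_fn(self.query, self.vector)
        return {"query": self.name, "vector": self.vector,
                "rows": rows, "seconds": time.monotonic() - start}


def run_workload(workload, vectors, run_fn, iterations=1):
    """Execute every query once per test vector per iteration; return all details."""
    results = []
    for vector in vectors:
        for _ in range(iterations):
            for name, query in workload.queries.items():
                results.append(QueryExecutor(name, query, vector, run_fn).execute())
    return results
```

In this sketch the driver would build Workload objects from user parameters, call run_workload, and receive the per-query execution details as a list of dictionaries.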
ishaan
0d0614765d Only use nproc to determine functional test concurrency when it's available in the OS.
Some operating systems don't ship with nproc, which causes impala-config.sh to fail. This
change alleviates the problem by checking if nproc exists, and setting a reasonable
default if it does not.

Change-Id: Ic6e4d0fbce57eedc82163cfa17f71bdccbc38b51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3208
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-20 12:52:08 -07:00
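The check described in commit 0d0614765d lives in a shell script (impala-config.sh). The same idea is sketched here in Python for illustration only. The fallback value of 4 and the function name are assumptions; the commit message says only that "a reasonable default" is used.

```python
import shutil
import subprocess

DEFAULT_CONCURRENCY = 4  # assumed fallback, not the value used by impala-config.sh


def functional_test_concurrency(tool="nproc", default=DEFAULT_CONCURRENCY):
    """Return the processor count reported by `tool`, or `default` if unavailable."""
    if shutil.which(tool) is None:
        # The binary is not on PATH, so do not try to run it.
        return default
    try:
        out = subprocess.run([tool], capture_output=True, text=True, check=True)
        return int(out.stdout.strip())
    except (subprocess.CalledProcessError, ValueError, OSError):
        # The tool exists but failed or printed something unexpected.
        return default
```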
ishaan
f92c9a9335 Run local tests at lower concurrency.
Currently, we launch #nproc processes to run tests locally. This patch changes the default
to #nproc/2, to not overload the system.

Change-Id: I8bca23eb7462a0c497df93f82a60d85835bedbe9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2972
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-06-19 12:48:29 -07:00
Paden Tomasello
0326f17bb3 Adding Lz4 Codec.
Change-Id: I037d4e0de3b2cd2b8582caea058c8e1f2f880ff3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3027
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
2014-06-16 14:20:34 -07:00
Lenni Kuff
d5a9ada976 [CDH5] Bump version to v1.5.0-cdh5
Change-Id: I80bf635d37a9c98d51acf6dc35527a21c6b88d76
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2983
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-11 22:14:00 -07:00
Lenni Kuff
fa98766ceb IMPALA-1038: Abort test run if any test fails
This behavior regressed recently; this change fixes the regression.

Change-Id: I80939131953fc1838da0690c3e7e7bf455bd6180
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2968
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
(cherry picked from commit b6f8b7f4679c82ca2fb443224fcd88402c3a4136)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2975
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-11 14:54:06 -07:00
Srinath Shankar
5755b0bdee Order by without limit for Impala
Enable order-by without limit
Added BufferedBlockMgr to allocate buffers and spill to disk.
Added Sorter for the external sort implementation
Added new SortNode execution node that completely sorts its input
Changes to enable writing in IoMgr went in a separate patch.

Reviewed-on: http://gerrit.ent.cloudera.com:8080/1539
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins

Conflicts:

	testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test

Change-Id: I3ece32affe5b006f53bbdfcc03ded01471e818ac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2900
Reviewed-by: Srinath Shankar <sshankar@cloudera.com>
Tested-by: jenkins
2014-06-09 16:58:08 -07:00
Henry Robinson
3e7e7ed0dc Fix impala-config.sh when JAVA_HOME not set
Change-Id: Iaefda2039de1a5aafc782bca582d3007abcf6eff
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2803
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 48db5de6825cba8b6a1c1c658ff79a9641341dca)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2814
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-06-03 19:48:57 -07:00
Lenni Kuff
f34a0507bf [CDH5] Add support for Sentry Service to Impala
This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role based and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support to authorize against metadata read from the Sentry Policy Service, it does
not add support for GRANT/REVOKE statements in Impala.

The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config flag must be
set to a valid sentry-site.xml config file.

On the impalad side, we continue to support authorization based on a file-based provider.
To enable file-based authorization, set the --authorization_policy_file flag to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).

TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-06-03 07:19:52 -07:00
Nong Li
5d80942d42 [CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads.
Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-30 19:14:39 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored within the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
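The bookkeeping described in commit c45e9a70d9 can be illustrated with a small sketch. Only the 'cache_directive_id' property name comes from the commit message. FakeHdfsCache is a hypothetical stand-in and does not model any real HDFS client API, and the dictionary layout for tables and partitions is an assumption.

```python
import itertools


class FakeHdfsCache:
    """Hypothetical stand-in for HDFS cache request management."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.directives = {}  # request id -> (path, pool)

    def add_directive(self, path, pool):
        directive_id = next(self._ids)
        self.directives[directive_id] = (path, pool)
        return directive_id

    def remove_directive(self, directive_id):
        self.directives.pop(directive_id, None)


def mark_cached(hdfs, properties, path, pool):
    """Submit a cache request and record its ID in the table/partition properties."""
    properties["cache_directive_id"] = str(hdfs.add_directive(path, pool))


def drop_table(hdfs, table):
    """Drop the cache request of the table and of every cached partition."""
    all_props = [table["properties"]] + [p["properties"] for p in table["partitions"]]
    for props in all_props:
        directive_id = props.pop("cache_directive_id", None)
        if directive_id is not None:
            hdfs.remove_directive(int(directive_id))
```

Storing the request ID next to the table metadata is what makes the drop path possible: without it there would be no way to find which cache requests belong to the table being dropped.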
Lenni Kuff
79d43e1e41 Handle cases where environment variables are not defined in impala-config.sh
Change-Id: Iee2800cb02299a9ed26da6fd079e3a72fe2a2482
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2537
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2539
2014-05-13 08:22:42 -07:00
Lenni Kuff
f1d9c0f58b [CDH5] Update Impala's Sentry dependency to Sentry v1.3 (from v1.2)
This updates Impala to use Sentry v1.3 instead of Sentry v1.2. No major functionality
changed between Sentry versions, but some Sentry classes were moved and APIs changed.

Change-Id: I3765748d2cdbe00f59eefa3c971558efede38ebd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2319
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-13 02:57:07 -07:00
Lenni Kuff
13c794db91 [CDH5] Update dependency versions to CDH5.1.0
This just updates the versions, it doesn't touch anything in /thirdparty.
Change parquet version to append SNAPSHOT
Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS

Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4
2014-05-07 15:10:40 -07:00
casey
192d52c258 Testing: Generate queries and compare results against other databases
This is the initial commit and is a work in progress. See the README for a
list of possible improvements.

As an overview of how the files are related:

  model.py: This is the base upon which the other files are built. It
      contains something like a grammar for queries.

  query_generator.py: Generates random permutations of the model.

  model_translator.py: Produces SQL based on the model

  discrepancy_searcher.py: Uses the above to generate, run, and compare
      query results.

Change-Id: Iaca6277766f5a86568eaa3f05b99c832942ab38b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1648
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
2014-05-01 14:20:35 -07:00
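The way the four files in commit 192d52c258 relate to each other can be sketched in miniature. None of the names below come from the actual files. The model here is a deliberately tiny assumption (one column, one filter), and two in-memory SQLite databases stand in for the databases being compared.

```python
import random
import sqlite3
from dataclasses import dataclass


@dataclass
class SelectModel:
    """Tiny stand-in for a query model: one column with one comparison filter."""
    column: str
    op: str
    value: int


def generate_query(rng):
    """Pick a random instance of the model (the role of a query generator)."""
    return SelectModel(rng.choice(["a", "b"]), rng.choice(["<", ">", "="]),
                       rng.randint(0, 5))


def to_sql(model):
    """Translate the model into SQL (the role of a model translator)."""
    return (f"SELECT {model.column} FROM t "
            f"WHERE {model.column} {model.op} {model.value} ORDER BY 1")


def find_discrepancies(db1, db2, count, seed=0):
    """Generate, run, and compare queries; return the SQL whose results differ."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(count):
        sql = to_sql(generate_query(rng))
        if db1.execute(sql).fetchall() != db2.execute(sql).fetchall():
            mismatches.append(sql)
    return mismatches
```

Keeping the model separate from the SQL translation is what would let the same generated query be rendered in different dialects for different databases.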
ishaan
88ec1e0a83 Increase default_pool_max_requests in run-all-tests.
Temporarily increase the cap on max requested queries while running tests to unblock
builds. Currently, the exhaustive runs always fail, and there are some intermittent
failures in the core runs.

Change-Id: I26b9ce343d72bab7687e49f7dbd7bf3bf655a294
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2323
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2395
2014-04-28 23:47:24 -07:00
Skye Wanderman-Milne
60db4d4d82 CDH-18416: Don't inline ReadWriteUtil::ReadZLong()
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.

This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.

Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
2014-04-28 15:58:15 -07:00
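The template mechanism described in commit 60db4d4d82 (a section starting with a backtick is run as a shell command and replaced by its output) can be sketched as follows. The function name is hypothetical, and this is not the code in generate-schema-statements.py.

```python
import subprocess


def resolve_section(text):
    """Return the section text, running it as a shell command if it starts with `."""
    stripped = text.lstrip()
    if not stripped.startswith("`"):
        # Ordinary section: use the text as written.
        return text
    command = stripped[1:]
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    # The command's standard output becomes the section text.
    return result.stdout
```

For example, a section containing a backtick followed by a script invocation would be replaced by whatever that script prints, which is how a many-column table definition could be generated instead of written by hand.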
casey
2351266d0e Replace single process mini-dfs with multiple processes
This should allow individual service components, such as a single nodemanager,
to be shut down for failure testing. The mini-cluster bundled with hadoop is a
single process that does not expose the ability to control individual roles.
Now each role can be controlled and configured independently of the others.

Change-Id: Ic1d42e024226c6867e79916464d184fce886d783
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432
Tested-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-04-23 18:24:05 -07:00
Nong Li
85be9a5050 Update bin/make* -notests to include other artifacts for packages.
Change-Id: I95e95f0a2e2131875b95d6676620bec7117b7f8a
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2250
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-04-16 00:37:00 -07:00
Nong Li
f9dd32724c Cleanup build scripts.
Consolidated our build scripts and added the -notests option which skips
building the BE tests.

Change-Id: Ida6aa064b7fe47e535c142b9af92b7c158e83c32
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2043
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2201
2014-04-13 17:11:39 -07:00
Lenni Kuff
d101ef86e2 [CDH5] Bump version to 1.4.0-cdh5-INTERNAL
Change-Id: I0a0334084e444c948f1718133afb2d7246dde414
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2193
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-04-11 16:03:09 -07:00
Henry Robinson
8e5848eaf8 RM fixes to get tests passing
* One last NotifyThreadUsageChange() mismatched pair
* Don't set resource in plan fragment params if there isn't a resource
  available. This fixes the problem where if no fragment with resources
  was assigned to the same node as the coordinator, the coordinator
  would have a dummy resource allocation which didn't work with
  expansion.
* Substitute #ID in all impalad arguments to start-impala-cluster.py
  with the 0-indexed ID of the impalad being started. This is required
  to have different Impala processes use different cgroups.

Change-Id: If8c8fd8bef0809bdaf16115a45a9695fc2bf3e1b
(cherry picked from commit c71ce45e97570b8c09900eb5ae2e26984d3306a4)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2060
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2014-03-24 15:07:45 -07:00
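The #ID substitution described in commit 8e5848eaf8 amounts to a string replacement per process. The sketch below assumes a hypothetical function name and a made-up flag in the usage line; it is not the code in start-impala-cluster.py.

```python
def build_impalad_args(template_args, cluster_size):
    """Return one argument string per impalad, with #ID replaced by its 0-indexed ID."""
    return [template_args.replace("#ID", str(i)) for i in range(cluster_size)]


# Usage with a made-up flag: each process gets its own value.
args = build_impalad_args("--example_cgroup=impala_#ID", 3)
```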
Lenni Kuff
3d82c9a5d6 Bump version from 1.3.0-INTERNAL to cdh5-1.3.0
Change-Id: Ib7a37b190091a3f9eb6d6f0f560dd40aed23e231
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2031
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-20 17:22:11 -07:00
Lenni Kuff
86f69fb96f IMP-1306: Fix build scripts to properly generate Impala version info for packaging builds
The problem was that we were deleting the version.info file because the default
of gen_build_version.py recently changed from --noclean to --clean.

Also fixed a bug in the shell version generation and made debugging a bit easier
by dumping the contents of version.info whenever it is generated.

Change-Id: I764d01c9e46eed1bd39de79bf076c15afa599486
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1901
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
(cherry picked from commit fa673b4d3342fc825ee7fa942bd254234d222906)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1910
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-03-14 08:45:16 -07:00
Matthew Jacobs
a283d72cdd [cdh5] Add latest cdh5 hadoop, hbase, and hive snapshots to thirdparty
Change-Id: I60c93b259a26e86aca60f2b3b5b6226eabc0b5eb
2014-03-05 01:06:09 -08:00
Lenni Kuff
8a16709265 Perform prioritized load requests for missing tables in HS2 metadata ops
The HS2 metadata operations do not go through analysis() so the prioritized
loading will not happen for them. Most of the HS2 metadata ops work purely
on table/db names, but GetColumns() requires loading the table metadata. This
patch updates MetadataOp to collect a set of missing tables and request these
tables be loaded from the catalog server. The operation will wait until the tables
are loaded in the local catalog before proceeding.

Change-Id: I070f2a0d9194d3317f09431971be9a8dffbc7386
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1542
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1557
2014-03-01 17:16:50 -08:00
Alex Behm
3d764619f7 Run Hive data loading through beeline instead of the Hive shell.
Fixes our log configuration to put the Hive logs in cluster_logs/hive.

Change-Id: I5d98581e35325f2173e4b3170e36bec42d33f8f3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1497
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1615
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-02-20 15:43:31 -08:00
Alex Behm
62338694e4 Skip generation of version and impala-ir .cc files in buildall.sh if -noclean is specified.
Before this patch the -noclean option had almost no effect on the BE build time because
some source files were re-generated with .py scripts regardless.
This change allows ./buildall -skiptests -noclean to do a true incremental rebuild.

Change-Id: Ib3af85db05bdc96a2279a22c1d49d735f2cabd4e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1394
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1415
2014-01-31 13:57:13 -08:00
ishaan
01ef3ef4c1 load-data.py should exit if a bash command returns a non-zero error code.
Change-Id: I2f732a276a42d2697fa55bce0f18ac89e9a6f0a1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1397
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1408
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-30 15:47:13 -08:00
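The behavior named in commit 01ef3ef4c1 can be sketched as a small wrapper. The function name and the error message are assumptions, and the sketch runs the command through the default shell rather than specifically through bash.

```python
import subprocess
import sys


def exec_shell_cmd(cmd):
    """Run cmd through the shell and exit the script if it returns non-zero."""
    return_code = subprocess.call(cmd, shell=True)
    if return_code != 0:
        print(f"Command failed with exit code {return_code}: {cmd}", file=sys.stderr)
        sys.exit(return_code)
```

Exiting immediately keeps a failed step from being hidden by later steps that happen to succeed.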
Henry Robinson
5535a8a128 [CDH5] Set CDH major version to 5
Change-Id: Ibc36ed435dd36d3489d27a977bf1726bbf2927a1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1306
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:34:01 -08:00
Henry Robinson
241270044b Add CDH_MAJOR_VERSION environment variable
CDH_MAJOR_VERSION controls where HDFS data is written. In the future, we
can use its value to parameterise Jenkins jobs so that the right code is
run / data is generated.

Change-Id: Id2957df6d708bc6c50faf7a8a609aff5f9571662
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1293
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1305
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:33:18 -08:00
Nong Li
53d7bbb97a [CDH5] Impala changes for updated thirdparty components.
Changes include:
  - version changes in impala-config
  - version changes in various loading scripts
  - hbase jars are no longer in hive/lib
  - mini-llama script changes
  - updates due to sentry api changes
  - JDBC tests disabled
  - unsupported types tests disabled.

Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
2014-01-15 15:12:13 -08:00
Alex Behm
e6299b684b Setting proper cgroup CPU shares based on the reserved resource.
Change-Id: I58992b11e71ed7ad7ea7639050d74fd3eaa4d1d1
2014-01-15 15:12:07 -08:00
Alex Behm
5cfbec9139 Impala creates and manages its own CGroups instead of using the Yarn-NM provided ones.
Change-Id: Id09ba2641ad33fbc109eea2dd6fe80b1863b5cac
2014-01-15 15:12:06 -08:00
Alex Behm
760750af27 Enforcing reserved memory resources via mem limits.
Fixed codepath with rm disabled. Set enable_rm to false by default.

Change-Id: I3bf2d0525d91243ec3c0ea048b0c03680befcda2

Conflicts:
	be/src/runtime/runtime-state.cc
2014-01-15 15:12:05 -08:00
Alex Behm
dc7b398bd3 Impala reserves resources from YARN via Llama.
Impala reserves resources from YARN via Llama and handles resources
resource preemptions by cancelling affected queries. Adds the Impala Resource
Broker for interacting with Llama. Refactors scheduler and coordinator
to move fragment-to-host assignment logic into scheduler. Local test
setup uses MiniLLama.

Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1
2014-01-15 15:12:04 -08:00
Alex Behm
fc6ecd39e5 [CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting.
Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0
2014-01-15 15:11:32 -08:00
Alex Behm
60003ad211 [CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes.
Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106
2014-01-15 15:11:23 -08:00
Nong Li
752b8e3ee4 [CDH5] Added CDH5 beta2 versions of Hadoop, Hive, HBase and Llama to thirdparty.
Change-Id: Id033c0246c0ffdffd0c7703eaff9600086912380
2014-01-15 15:11:13 -08:00
Lenni Kuff
8571920753 Bump version to v1.3.0-INTERNAL
Change-Id: I32bae4daf093794b09f4ca85b9abdc686791aee8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1281
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-14 21:33:22 -08:00
ishaan
4e9913b52f Fix race in data loading by creating text tables first.
While loading parquet, there are a few table creation queries that use the 'like'
keyword; this ends up opening a small race window when all the table formats are created
concurrently. With this change, we create the text tables first before attempting to
parallelize the rest of the data loading.

Change-Id: Ib84cf0e5120b3588d3f0503d7119ca055e08e53f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1241
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-10 15:01:59 -08:00
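The ordering fix in commit 4e9913b52f can be sketched as two phases: create the text tables serially, then run the remaining formats concurrently. The function name, the (table, format) tuple layout, and the create_fn callback are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tables(table_formats, create_fn, max_workers=4):
    """Create text tables first, then create all other formats in parallel.

    table_formats is a list of (table_name, file_format) tuples.
    """
    text = [t for t in table_formats if t[1] == "text"]
    others = [t for t in table_formats if t[1] != "text"]

    # Phase 1: text tables exist before anything can reference them via LIKE.
    for table in text:
        create_fn(table)

    # Phase 2: the remaining formats no longer race against text table creation.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(create_fn, others))
```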
Nong Li
056c7d94d6 Remove compute stats option from bin/load-data.py
This option is not implemented in this script, and nothing makes it
obvious that it has no effect.

Change-Id: I1a1eff38460fd181c486cfca2840108a58e21603
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1059
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-10 14:01:35 -08:00
Henry Robinson
9a0dc18700 Remove a couple of unused files
* upload_codereview.py is no longer used since Rietveld is long gone
* runplanservice is deprecated as there is no longer a separate
  PlanService
* README only mentions a single internal wiki page.

Change-Id: Iba61a3d62381deb882c4168f142574f2492e0969
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1249
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-09 09:56:05 -08:00
Alex Behm
6483f53581 Additional options for JVM debugging in impala startup scripts.
Enables JVM debugging by default for the catalogd and impalads
created via bin/start-impala-cluster.py.
Adds a -jvm_args command line option for passing additional JVM args to
the catalogd and impalads.

Change-Id: I68e901661bd1fd7eefa05ba84dbacf29dd124685
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1213
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:40 -08:00
ishaan
0ed1781323 Invalidate metadata before loading parquet data through Impala.
During a full data load, we load all the data (except parquet) via hive, and then load the
parquet data via Impala. The catalog service does not update the metadata of tables
changed outside Impala, so we need to explicitly invalidate the metadata before loading
parquet data.

Change-Id: Iec39db9ea46e4a11b17589881732629a56444120
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1207
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:39 -08:00
Lenni Kuff
baf79f8185 Call 'invalidate metadata' after loading test data instead of before
Instead of calling 'invalidate metadata' before loading each workload
we should call it once, after loading all test data. This will allow
us to pickup data inserted by Hive. The only reason this worked before
is because we restart Impala before running the tests. This will also
be a bit faster if loading multiple workloads.

Change-Id: I28d42bbf5d7a24b5fde687d67a4b41472ec4b897
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1153
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:37 -08:00
Henry Robinson
177b9ba3b1 Remove nonblocking server (and dependencies) from build
Goodnight, sweet non-blocking prince. We didn't support, or test, this
configuration, and it doesn't work with security or sessions and brings
in some annoying dependencies that are a pain to build.

We have other RPC-stack options to investigate; we may wind up re-adding
the non-blocking server but only in a way that supports all required
features more regularly.

Change-Id: Ifbcabc5014441f6d31c342c4e288dd7fc6201443
2014-01-08 10:54:35 -08:00
ishaan
7e520f8f23 Make workload runner logging more concise and readable.
This patch makes the workload runner's logging more concise and informative. Specifically,
it
 - logs the time taken for each iteration of a query.
 - changes the default log level to INFO.
 - makes the output less verbose.

Change-Id: I5f964cf76269fd64ce127b9e4c51fe1deafd1d1b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1076
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:54:35 -08:00
Henry Robinson
0440f26f3e Add -gdb flag to start-impalad.sh to start Impala under gdb
Change-Id: I19f027680cfbf6a7cbc4b311e07f244d67ff683d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1125
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:33 -08:00
Nong Li
1c2e767b89 Bump version to 1.2.3-INTERNAL.
Change-Id: I2baf2aa41587ccf24331da7cba399cedb296a2e0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1132
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:32 -08:00