The Kudu query tests were failing on a remote cluster because the Kudu
master was always set to '127.0.0.1', with no way to override it.
This patch corrects the issue with a number of changes:
- Add a pytest command line option to specify an arbitrary Kudu master
- Consolidate the place where the default Kudu master is derived. It
had been stored both in the env and in tests/common/__init__.py,
with different files looking to different places. For now, just look
to the env, and remove the value from __init__.py.
- The kudu_client test fixture in conftest.py was using the connect()
method from impala.dbapi (part of the Impyla library), without
specifying the host param. In the absence of that, the default value
is 'localhost', so add the host param to the connect() call.
- Define the various defaults for pytest config as constants at the top
of conftest.py.
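As a minimal sketch of the first and last bullets: the default is derived from the env in one place, defined as a constant, and exposed through pytest's pytest_addoption hook. The env variable name KUDU_MASTER_HOSTS, the option name --kudu_master_hosts, and the fallback value are assumptions for illustration; this message does not name them.

```python
import os

# Assumed names for illustration; the real env variable and option may differ.
KUDU_MASTER_ENV_VAR = "KUDU_MASTER_HOSTS"
FALLBACK_KUDU_MASTER = "127.0.0.1"

# Single place where the default Kudu master is derived: the environment.
DEFAULT_KUDU_MASTER_HOSTS = os.environ.get(KUDU_MASTER_ENV_VAR, FALLBACK_KUDU_MASTER)


def pytest_addoption(parser):
    """pytest hook: register an option that overrides the default Kudu master."""
    parser.addoption(
        "--kudu_master_hosts",
        default=DEFAULT_KUDU_MASTER_HOSTS,
        help="Kudu master to connect to (overrides the env-derived default).",
    )
```

A fixture could then read the value with request.config.getoption("--kudu_master_hosts") and pass it as the host param when it connects.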
Change-Id: I9df71480a165f4ce21ae3edab6ce7227fbf76f77
Reviewed-on: http://gerrit.cloudera.org:8080/5877
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.
Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU
Changes:
1) Remove the requirement to specify table properties such as key
columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
time of creation in Impala.
4) Disallow table properties that could conflict with an existing
table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
addresses. The flag is used as the default value for the table
property kudu_master_addresses, but it can still be overridden
using TBLPROPERTIES.
7) Fix a post-merge issue (IMPALA-3178) where DROP DATABASE CASCADE
wasn't implemented for Kudu tables and was silently ignored, so the
Kudu tables weren't removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
Kudu), and the existence of the other delegate and the use of delegates
in general have led to confusion. The Kudu delegate only exists to
provide functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
standard. When used at the column level, only one column can be
marked as a key. When used at the table level, multiple columns can
be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
The old "kudu.key_columns" table property is no longer accepted,
though it is still used internally. "PRIMARY" is now a keyword.
The ident-style declaration is used for "KEY" because it is also used
for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
The table property "kudu.table_name" is optional for managed tables
and required for external tables. If no Kudu table name is provided
for a managed table, one is generated based on the HMS database and
table name.
11) Use Kudu master as the source of truth for table metadata instead
of HMS when a table is loaded or refreshed. Table/column metadata
are cached in the catalog and are stored in HMS in order to be
able to use table and column statistics.
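The property handling described in items 4, 6 and 10 can be sketched as follows. This is an illustration in Python, not the actual frontend code: the function name is hypothetical, and the generated "db.table" name format is an assumption, since the text only says the name is based on the HMS database and table name.

```python
def resolve_kudu_properties(db, table, tblproperties, is_external,
                            default_master_addresses):
    """Return the effective Kudu table properties for a CREATE TABLE."""
    props = dict(tblproperties)

    # Item 4: properties that could conflict with the Kudu schema are rejected.
    if "kudu.key_columns" in props:
        raise ValueError("kudu.key_columns is not accepted; use PRIMARY KEY")

    # Item 10: the Kudu table name is required for external tables and
    # inferred for managed tables.
    if "kudu.table_name" not in props:
        if is_external:
            raise ValueError("kudu.table_name is required for external tables")
        # Assumed format; the real generated name may differ.
        props["kudu.table_name"] = "%s.%s" % (db, table)

    # Item 6: the startup flag value is only a default; TBLPROPERTIES wins.
    props.setdefault("kudu_master_addresses", default_master_addresses)
    return props
```

For example, a managed table created without any properties picks up both the inferred name and the flag default, while an explicit kudu_master_addresses entry is left untouched.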
Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Reviewed-on: http://gerrit.cloudera.org:8080/4414
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and
effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The
difference between this patch and its original is that I fixed the
changes introduced in infra/python/bootstrap_virtualenv.py to be
python2.4-compatible:
- removed the use of str.format(), preferring a str.join() pattern
- removed the call of the exit() builtin to prefer sys.exit()
The only testing I did for this patch was to ensure
CDH Impala-packaging-on-demand works.
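The two python2.4-compatible substitutions listed above look roughly like this. The function names and arguments are hypothetical and are not taken from bootstrap_virtualenv.py.

```python
import sys


def requirement_spec(name, version):
    """Build a 'name==version' string without str.format()."""
    # str.format() only exists in Python 2.6 and later:
    #     "{0}=={1}".format(name, version)
    # Joining a list of strings also works on Python 2.4.
    return "==".join([name, version])


def fail(message):
    """Report an error and terminate with a non-zero status."""
    sys.stderr.write(message + "\n")
    # The callable exit() builtin is injected by the site module and is not
    # available on Python 2.4, so call sys.exit() directly.
    sys.exit(1)
```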
Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325
Reviewed-on: http://gerrit.cloudera.org:8080/3160
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Add the python Kudu module to the virtualenv. Building the virtualenv
is much slower now because Cython and numpy are required. To help with
the rebuild time, --no-cache was removed. That option had been added to
help when using the dev version of impyla, where the version number
stayed the same while the module contents changed, so the cache served
the old module contents.
2) Add some py.test fixtures to help create Kudu and Impala connections.
Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7
Reviewed-on: http://gerrit.cloudera.org:8080/2855
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This adds initial changes for the Impala failure testing library. It also refactors
run-workload into its own module so it can be used in other tests.
The failure testing has two main components. The first is an object model on top
of Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long term goal is to let this run
on both a CM cluster and non-CM cluster as well as locally.
The other part of the failure injection change is the failure_injector module, which uses the
Impala service abstraction to select and inject failures into random Impala services.
This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.
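The select-and-inject idea can be sketched as below. FakeService is a hypothetical stand-in for the service object model and inject_failure is an illustrative name; neither is the actual class or function in this change, and a real service would stop a remote process rather than flip a flag.

```python
import random


class FakeService(object):
    """Hypothetical stand-in for one Impala service in the cluster model."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def stop(self):
        # A real implementation would stop the service on its remote host.
        self.running = False


def inject_failure(services, rng=None):
    """Stop one randomly chosen running service; return it, or None."""
    rng = rng or random
    candidates = [s for s in services if s.running]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.stop()
    return victim
```

Passing a seeded random.Random instance as rng makes a failure sequence reproducible, which helps when replaying a failed test run.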
Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5