This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the python namespace,
and adds code to apply the same patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift).
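A minimal sketch of what patching a thrift file to add a python namespace could look like. The helper name add_python_namespace and the rule for where the line goes are assumptions for illustration, not the patch's actual code. Thrift accepts namespace declarations among the other headers, so the line is placed before the first existing namespace, or prepended if there is none. A file that already declares a py namespace is left untouched.

```python
import re

# Package prefix described above; module names passed in are illustrative.
PACKAGE = "impala_thrift_gen"


def add_python_namespace(thrift_src, module_name):
    """Return thrift_src with a 'namespace py <PACKAGE>.<module_name>' line.

    Hypothetical helper: if a 'namespace py' line already exists, the source
    is returned unchanged, so applying the patch twice is harmless.
    """
    py_ns = "namespace py {}.{}".format(PACKAGE, module_name)
    if re.search(r"^\s*namespace\s+py\s", thrift_src, re.MULTILINE):
        return thrift_src
    lines = thrift_src.splitlines(True)
    for i, line in enumerate(lines):
        # Keep all namespace declarations together.
        if line.lstrip().startswith("namespace "):
            lines.insert(i, py_ns + "\n")
            return "".join(lines)
    return py_ns + "\n" + thrift_src
```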
Putting all the generated python code into a package makes it easier
to understand where imports get their code from. When the
subsequent change rearranges the shell code, the thrift-generated
code can stay in a separate directory.
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
There are many custom cluster tests that require creating temporary
directories. A temporary directory typically lives within the scope of a
test method and is cleaned up afterwards. However, some tests create
temporary directories directly and forget to clean them up afterwards,
leaving junk dirs under /tmp/ or $LOG_DIR.
This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg in
CustomClusterTestSuite.with_args() that lists the tmp dirs to create.
'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept
formatting patterns that are replaced by a temporary dir path defined
through 'tmp_dir_placeholders'.
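A minimal sketch of the placeholder mechanism, assuming the formatting patterns are str.format-style '{name}' fields. The helpers format_with_tmp_dirs and cleanup_tmp_dirs are hypothetical and are not the actual CustomClusterTestSuite internals. One directory is created per placeholder name, substituted into each arg string, and removed by a cleanup step that the test teardown would call.

```python
import os
import shutil
import tempfile


def format_with_tmp_dirs(args, placeholders, base_dir):
    """Create one temp dir per placeholder and substitute it into args.

    'args' maps an arg name (e.g. 'impalad_args') to a pattern string such
    as '--scratch_dirs={scratch}'. Returns (formatted_args, created_dirs).
    Note: str.format also interprets any other '{...}' in the strings.
    """
    created = {}
    for name in placeholders:
        created[name] = tempfile.mkdtemp(prefix=name + "_", dir=base_dir)
    formatted = {key: value.format(**created) for key, value in args.items()}
    return formatted, created


def cleanup_tmp_dirs(created):
    # Remove every directory created by format_with_tmp_dirs().
    for path in created.values():
        shutil.rmtree(path, ignore_errors=True)
```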
There are a few occurrences where mkdtemp is called that cannot be replaced
by this mechanism, such as tests/comparison/cluster.py. In those cases, this
patch changes them to supply the prefix arg so that developers know the
directory comes from an Impala test script.
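For the remaining direct mkdtemp calls, passing a prefix is enough to make stray directories traceable. The prefix value below is a hypothetical example, not the one the patch uses.

```python
import os
import tempfile

# Hypothetical prefix: the point is that the dir name reveals its origin.
TMP_PREFIX = "impala_test_"


def make_tmp_dir():
    # Creates e.g. /tmp/impala_test_abc123; the caller must remove it.
    return tempfile.mkdtemp(prefix=TMP_PREFIX)
```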
This patch also addresses several flake8 errors in modified files.
Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary dirs
are created and removed under logs/custom_cluster_tests/ as the tests
run.
Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
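The header added to each file, followed by a small function showing the division change it emulates. This is an illustrative sketch; the average function is hypothetical.

```python
# Header line described above; it must come before any other statement.
from __future__ import absolute_import, division, print_function


def average(total, count):
    # With 'division' imported, '/' is true division on Python 2 as well,
    # so 7 / 2 evaluates to 3.5 instead of 3.
    return total / count


print(average(7, 2))  # prints 3.5
```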
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
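A sketch of the kind of conversion described above, using hypothetical helpers. Where an integer is required, such as a list index or a record count, '/' becomes '//'. For ints, '//' floors exactly like Python 2's old '/', so behavior is preserved.

```python
from __future__ import absolute_import, division, print_function


def middle_element(items):
    # Under true division, len(items) / 2 can be a float such as 2.5, and
    # indexing a list with a float raises TypeError. '//' keeps it an int.
    return items[len(items) // 2]


def num_batches(num_records, batch_size):
    # Integer ceiling division without going through floats.
    return (num_records + batch_size - 1) // batch_size
```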
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.
Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
"ranger" requires extra parameters to be set. This changes the
default value of authorization_provider to "", which translates
internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
- authorization_policy_provider_class
- sentry_catalog_polling_frequency_s
- sentry_config
3. The authorization_factory_class may be obsolete now that
there is only one authorization policy, but this leaves it
in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
that is removed. There are still Maven dependencies coming
from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
is not removed and it is still called from testdata/bin/kill-all.sh.
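A hypothetical Python model of items 1 and 2 above. Impala's real flag handling lives in its C++/Java code, and this is not it. Two behaviors here are assumptions: rejecting any value other than "" or "ranger", and how retired flags are reported. The sketch only illustrates the mapping of the new default "" to the noop policy.

```python
# Retired Sentry-specific flags listed in item 2.
RETIRED_SENTRY_FLAGS = frozenset([
    "authorization_policy_provider_class",
    "sentry_catalog_polling_frequency_s",
    "sentry_config",
])


def resolve_authorization_provider(value):
    """Map the authorization_provider flag value to a policy name."""
    if value == "":
        return "noop"  # new default: no authorization is performed
    if value == "ranger":
        return "ranger"  # requires extra Ranger parameters to be set
    # Assumption: other values (including "sentry") are rejected.
    raise ValueError("Unsupported authorization_provider: %r" % value)


def retired_flags_in_use(flags):
    # Return the retired Sentry flags present in 'flags' (sorted), e.g. so
    # a caller could warn that they no longer have any effect.
    return sorted(name for name in flags if name in RETIRED_SENTRY_FLAGS)
```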
Testing:
- Core job passes
Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds an environment variable DISABLE_SENTRY to allow Impala
to run tests without Sentry. Specifically, we start up Sentry only when
$DISABLE_SENTRY does not evaluate to true. The corresponding Sentry FE
and E2E tests will also be skipped if $DISABLE_SENTRY is true.
Moreover, this patch sets DISABLE_SENTRY to true if $USE_CDP_HIVE
evaluates to true, so that only Impala's authorization with Ranger is
tested once support for Sentry is dropped after the switch to CDP Hive.
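The startup decision above, sketched in Python. The real logic lives in Impala's shell scripts. One assumption is loudly made here: "evaluates to true" is modeled as the literal string "true", compared case-insensitively.

```python
import os


def env_is_true(name, environ=os.environ):
    # Assumption: only the string "true" (any case) counts as true.
    return environ.get(name, "false").strip().lower() == "true"


def sentry_disabled(environ=os.environ):
    # USE_CDP_HIVE=true implies DISABLE_SENTRY=true.
    return (env_is_true("DISABLE_SENTRY", environ)
            or env_is_true("USE_CDP_HIVE", environ))


def should_start_sentry(environ=os.environ):
    # Sentry is started only when DISABLE_SENTRY does not evaluate to true.
    return not sentry_disabled(environ)
```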
Note that this patch also changes the way we generate hive-site.xml
when $DISABLE_SENTRY is true. More precisely, when generating
hive-site.xml, we do not add the Sentry server as a metastore event
listener if $DISABLE_SENTRY is true. Recall that both CDH Hive and CDP
Hive make an RPC to the registered listeners every time
create_database_core() in HiveMetaStore.java is called. This happens
when Hive, rather than Impala, is used to create a database, e.g., when
some databases in the TPC-DS data set are created during the execution
of create-load-data.sh. Thus, removing Sentry as an event listener is
necessary when $DISABLE_SENTRY is true. Otherwise, the HiveMetaStore
keeps trying to connect to a Sentry server that is not online, which
could make create-load-data.sh time out.
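A sketch of conditionally registering the listener when building hive-site.xml properties. It assumes the Sentry listener is registered through Hive's hive.metastore.event.listeners property. The listener class value is a deliberately fake placeholder, because the commit message does not name it, and hive_site_properties is a hypothetical helper.

```python
# Placeholder only: the real Sentry listener class name is not given here.
SENTRY_LISTENER_CLASS = "<sentry-hms-listener-class>"


def hive_site_properties(disable_sentry, other_listeners=()):
    """Build hive-site.xml properties; skip the Sentry listener if disabled."""
    listeners = list(other_listeners)
    if not disable_sentry:
        listeners.append(SENTRY_LISTENER_CLASS)
    props = {}
    if listeners:
        # Comma-separated list of MetaStoreEventListener classes.
        props["hive.metastore.event.listeners"] = ",".join(listeners)
    return props
```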
Testing:
Except for two currently known issues, IMPALA-9513 and IMPALA-9451,
verified this patch passes the exhaustive tests in the DEBUG build
- when $USE_CDP_HIVE is false, and
- when $USE_CDP_HIVE is true.
Change-Id: Ifa3f1840a77a7b32310a5c8b78a2c26300ccb41e
Reviewed-on: http://gerrit.cloudera.org:8080/15505
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In IMPALA-9047, we disabled some Ranger-related FE and EE tests due to
changes in Ranger's behavior after upgrading Ranger from 1.2 to 2.0.
This patch re-enables the disabled EE tests in
tests/authorization/test_authorized_proxy.py and
tests/authorization/test_ranger.py to increase Impala's test coverage of
authorization via Ranger.
The Ranger-related tests in test_authorized_proxy.py test Impala's
delegation for clients. Impala supports two types of delegation: a user
can delegate the execution of a query to either 1) another user, or 2) a
group of users. In the former case, Ranger checks whether the delegated
user specified in the option 'authorized_proxy_user_config' possesses
sufficient privileges to access the resources. In the latter case, before
checking whether the delegated group is granted sufficient privileges,
Ranger checks, with the help of Impala, whether the delegated user
specified in 'authorized_proxy_user_config' belongs to the delegated
group specified in 'authorized_proxy_group_config' in the underlying OS.
This type of delegation requires Impala to retrieve the groups the
delegated user belongs to from the underlying OS. Thus, if the delegated
user does not exist in the underlying OS, Impala informs Ranger that the
delegated user does not belong to any group. This in turn fails the
authorization, even if the policies on the Ranger server place the
delegated user in the delegated group and grant that group sufficient
privileges.
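The two delegation modes described above can be modeled with two small functions. This is a simplified, hypothetical sketch and not Impala's or Ranger's code. The callables os_groups_of and has_privilege stand in for the OS group lookup and Ranger's privilege check, and the group name in the usage check is illustrative.

```python
def authorize_user_delegation(delegated_user, allowed_users, has_privilege):
    # Mode 1: delegate to another user. Only the delegated user's own
    # privileges are checked.
    return delegated_user in allowed_users and has_privilege(delegated_user)


def authorize_group_delegation(delegated_user, allowed_groups, os_groups_of,
                               has_privilege):
    # Mode 2: delegate via a group. The delegated user must first belong to
    # an allowed group according to the OS; only then is that group's
    # privilege checked. A user unknown to the OS has no groups, so this
    # fails even if Ranger's policies put the user in the group.
    for group in os_groups_of(delegated_user):
        if group in allowed_groups and has_privilege(group):
            return True
    return False
```

The usage check shows the failure mode from the text. 'non_owner' has no OS groups, so group delegation fails even though the group holds the privilege.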
The re-enabled Ranger tests in test_authorized_proxy.py involve queries
in which the delegated user, i.e., 'non_owner', does not exist in the
underlying OS. We use 'non_owner' as the delegated user instead of
getuser() so that we have to explicitly grant 'non_owner' sufficient
privileges to access the resources. To avoid having to create an actual
delegated user and its corresponding delegated groups in the underlying
OS when running the EE tests, we added an additional option to
'impalad_args', 'use_customized_user_groups_mapper_for_ranger'. When set
to true, it allows Impala to use a customized user-to-groups mapping when
performing authorization via Ranger. On the other hand, we set the
delegated user to getuser() when running the respective Sentry-related
tests to avoid having to provide Sentry with a customized user-to-groups
mapping.
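A sketch of switching between a customized user-to-groups mapping and an OS lookup, which is the idea behind 'use_customized_user_groups_mapper_for_ranger'. The function make_groups_mapper and the mapping contents are hypothetical. The OS path uses the Unix-only pwd and grp modules and returns no groups for a user that does not exist.

```python
def make_groups_mapper(use_customized_mapper, custom_mapping=None):
    """Return a function mapping a user name to a sorted list of groups."""
    if use_customized_mapper:
        mapping = dict(custom_mapping or {})
        return lambda user: sorted(mapping.get(user, []))

    def os_groups(user):
        import grp  # Unix-only standard library modules
        import pwd
        try:
            pw = pwd.getpwnam(user)
        except KeyError:
            return []  # the user does not exist in the OS: no groups
        names = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
        try:
            names.append(grp.getgrgid(pw.pw_gid).gr_name)  # primary group
        except KeyError:
            pass
        return sorted(set(names))

    return os_groups
```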
To re-enable test_legacy_catalog_ownership() in test_ranger.py, we
removed from _test_ownership() a test query that was expected to fail
authorization in Ranger 1.2 but passes authorization in Ranger 2.0.
This is because in Ranger 2.0, a user does not have to be explicitly
granted the privileges to access a resource as long as the user is the
owner of the resource.
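A toy model of the behavior change that made the removed query pass. The owner_implies_access switch stands in for the Ranger 1.2 versus 2.0 difference. It is a hypothetical simplification and not Ranger's policy engine.

```python
def is_access_allowed(user, resource_owner, explicit_grants,
                      owner_implies_access):
    """explicit_grants: set of users explicitly granted access."""
    if user in explicit_grants:
        return True
    # Ranger 2.0 behavior as described above: ownership alone suffices.
    return owner_implies_access and user == resource_owner
```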
Testing:
- Passed FE tests.
- Passed the tests in test_authorized_proxy.py.
- Passed the tests in test_ranger.py.
Change-Id: I17420d7ff9beacd1b4d2ad72b68b8b54983e60cb
Reviewed-on: http://gerrit.cloudera.org:8080/15088
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch bumps CDP_BUILD_NUMBER to 1471450. The new GBN upgrades
Ranger from 1.2 to 2.0, which includes the change to the default Ranger
policies described in https://issues.apache.org/jira/browse/RANGER-2536.
Some of the Ranger tests fail because they assume the older behavior.
To address this issue, this patch temporarily disables those affected
Ranger tests. Specifically, the affected tests in the following test
files are disabled for now.
1. test_authorized_proxy.py
2. test_ranger.py
3. AuthorizationStmtTest.java
4. RangerAuditLogTest.java
IMPALA-8842 part 2: (Hive3) Use 'engine' field in HMS stat API
The new CDP GBN includes the fix for HIVE-22046. HIVE-22046 added an
'engine' column to the TAB_COL_STATS and PART_COL_STATS HMS tables. The new
column is used to differentiate among column stats computed by
different engines. The related HMS API calls were changed accordingly.
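A toy in-memory model of what the 'engine' column enables. Column stats are keyed by (table, column, engine), so stats written by one engine do not overwrite or leak into another's. The class ColumnStatsStore and the engine names are illustrative, not the HMS schema or API.

```python
from collections import defaultdict


class ColumnStatsStore(object):
    """Column stats keyed by (table, column) and then by engine."""

    def __init__(self):
        self._stats = defaultdict(dict)

    def put(self, table, column, engine, stats):
        self._stats[(table, column)][engine] = stats

    def get(self, table, column, engine):
        # Stats computed by a different engine are not returned.
        return self._stats[(table, column)].get(engine)
```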
Part of this patch is Step 4 in a series of steps to coordinate the
introduction of HMS API changes to Hive3 and Impala. For more
information see IMPALA-8842 part 1. Step 4 replaces *V2 calls with *.
The *V2 names were introduced temporarily and will be removed from the
HMS API in the near future.
Testing:
- This patch passes the affected Ranger tests listed above on a local
machine.
- E2E tests were added to make sure that column statistics are
differentiated by engine for partitioned and non-partitioned tables.
The tests are executed for transactional and non-transactional tables.
Change-Id: I962423cf202ad632b5817669500b3e3479f1a454
Reviewed-on: http://gerrit.cloudera.org:8080/14576
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds test coverage for authorized proxy users/groups with
Ranger. This patch also moves the authorized proxy tests into a separate
file, test_authorized_proxy.py, and refactors the tests to be more
readable and reusable.
Testing:
- Added a new test_authorized_proxy.py
- Ran all E2E authorization tests
Change-Id: If6f797600720e6432b85cac8f13afe8fa5624596
Reviewed-on: http://gerrit.cloudera.org:8080/13679
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>