Bug:
When Sentry-based authorization is enabled, a user that isn't authorized
to EXPLAIN a statement that uses a view can still access unauthorized
information, such as view's definition, by running the statement and
asking for the query profile or the execution summary.
Fix:
During query compilation, determine if the user can access the the runtime
profile or the execution summary. Upon request for a runtime profile or
execution summary from a user, determine based on that information and
the user that is asking for the profile if the runtime profile
(or execution summary) will be returned or an authorization error.
The authorization rule enforced is the following:
- User A runs statement S, A asks for profile, A has profile access:
Runtime profile is returned
- User A runs statement S, A asks for profile, A doesn't have profile access:
Authorization error
- User A runs statement S, user B asks for profile:
Authorization error.
This patch doesn't enforce access to the runtime profile or execution summary
through the Web UI.
Change-Id: I2255d587367c2d328590ae8534a5406c4b0c9b15
Reviewed-on: http://gerrit.cloudera.org:8080/7064
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This test would occasionally fail when an attempt to read a query
profile would occasionally return the empty string. The root cause was
that the Impalad that the query was submitted to, and the one used to
retrieve the profile from, could be different processes.
This patch changes the test to use the same process for HS2 and
webserver connections, by making the HS2 port available through the
ImpaladService object that was already used to access the webserver.
Previously the test would fail within ~30 minutes; with this patch it
has run for ~60 minutes without failure.
This patch also adds '--abort_on_failed_audit_event=false' to
test_impersonation to aid debugging.
Change-Id: Ib385438db96ee575d2f135c8b6459506b865e2d3
Reviewed-on: http://gerrit.cloudera.org:8080/1267
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set though
they were not intended to be run a standalone script. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
This patch encapsulates pytests's skipif markers in classes. It leads to the following
benefits:
- Provide context and grouping for tests being skipped.
- As we improve test reporting, annotations will give us a better idea of coverage.
Change-Id: Ib0557fb78c873047c214bb62bb6b045ceabaf0c9
Reviewed-on: http://gerrit.cloudera.org:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/343
As needed, fix up file paths and other misc things to get
more test cases running against S3.
Change-Id: If4eaf9200f2abd17074080a37cd0225d977200ad
Reviewed-on: http://gerrit.cloudera.org:8080/167
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch also adds a small utility file for for supporting different filesystems.
Change-Id: I28b1217b0cb901360e28e8d0ba269c9144117d2e
Reviewed-on: http://gerrit.cloudera.org:8080/124
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces a new pytest marker that skip tests that currently don't work when
s3 is used as the underlying file system. The set of blacklisted tests is a superset of
tests that cannot be run with s3. Follow up patches will remove some of the test files
from the blacklist.
Change-Id: I39a58223d3435f0bd6496ffd00a2d483b751693d
Reviewed-on: http://gerrit.cloudera.org:8080/82
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch adds support for V6 of the HS2 protocol, which notably
includes columnar organisation of result sets. Clients that set their
protocol version to < V6 will receive result sets in the traditional row
orientation.
The performance of fetches over HS2 goes up significantly as a result,
since the V1 protocol had some pathologies in its deserialisation
performance.
Beeswax
Row materialisation: 455ms, client processing time: 523ms
HS2 V6:
Row materialisation: 444ms, client processing time: 1.8s
HS2 V1:
Row materialisation: 585ms, client processing time: 15.9s (!)
TODO: Add support for the CHAR datatype
The following patch is also included:
Fix wait-for-hiveserver2.py when Impala moves to HS2 V6
Due to HIVE-6050, older versions of Hive are not compatible with newer
clients (even those that try to use old protocol
versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2
service, but picks up the newer version from V6, and fails.
This patch temporarily re-adds cli_service.thrift (renaming the Thrift
service as LegacyTCLIService) only for wait-for-hiveserver2.py to
use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid
of this change.
Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This change allows Impala to support specifying a custom AuthorizationProvider class
(rather than the default HadoopGroupAuthorizationProvider). One use case is using the
LocalGroupAuthorizationProvider which makes it easier to test different users without
them actually existing and also allows Sentry be enabled without creating the groups
on all nodes the cluster. This is something we used to support but deprecated in
v1.4-cdh5 release. Added back in the changes and extended tests.
Change-Id: I09612e4e331689402f8f05d2666b3de61d881529
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3868
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3938
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
This is to help ensure naming is consistent across the platform and
also avoid confusion with HS2 "impersonation" which is something very
different.
Change-Id: I48c1b76dff75b92b11ddc7aab0eb9a3a5d20e489
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3315
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 931f6a66c0d8dff25b746d127dc1f36e96b12f98)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3326
Currently, we always display the 'User' as the connected user in the debug webpage and
runtime profiles. This is confusing when impersonation + authorization is enabled because
there is not an easy way to find the impersonated user other than looking at the audit
log records. This change does the following:
* Updates the "User" field in the runtime profile to show the "effective user".
The effective user is the connected user if there is no impersonated user,
otherwise it is the impersonated user. This should help CM display the correct user
as well.
* Add two new fields in the runtime profile "Connected User" & "Impersonated User"
to make it easier to tell which user is which.
* Update the /queries debug webpage to show the effective user rather than the
connected user.
Change-Id: I639de6738242d2c378e785271a72257301a53ade
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2863
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit d4ad768780dfdfe0874f2b3e9c59074f1c3685d7)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2935
The audit logs currently have the "impersonator" field set to what we call the doAsUser
and the "user" field set as the connected user. They should be reversed.
Added basic tests to validate the correct event gets audited.
Change-Id: Idfa0aaa6c88debedc4993bd0489dbd3f696fcf17
Reviewed-on: http://gerrit.ent.cloudera.com:8080/958
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This change adds support for user impersonation for HS2 authorization
requests. It adds a new flag (--authorized_proxy_user_config) that if
set, allows users (ex. hue) to impersonate as another user. The user they
wish to impersonate as is passed using the HS2 configuration property,
'impala.doas.user'.
The configuration allows for specifying the list of users a proxy user
can impersonate as well, or '*' to allow the proxy user to impersonate
any user. For example: hue=user1,user2,admin=*
Change-Id: I2a13e31e5bde2e6df47134458c803168415d0437
Reviewed-on: http://gerrit.ent.cloudera.com:8080/574
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>