This change adds support for authorizing based on policy metadata read from the Sentry
Service. Authorization is role-based, and roles are granted to user groups. Each role
can have zero or more privileges associated with it, granting fine-grained access to
specific catalog objects at server, URI, database, or table scope. This patch only
adds support for authorizing against metadata read from the Sentry Policy Service; it does
not add support for GRANT/REVOKE statements in Impala.
The authorization metadata is read by the catalog server from the Sentry Service and
propagated to all nodes in the cluster in the "catalog-update" statestore topic. To
enable the Catalog Server to read policy metadata, the --sentry_config flag must be
set to a valid sentry-site.xml config file.
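For example, the Catalog Server could be started as follows (the config path is only an
illustration):
catalogd --sentry_config=/etc/impala/conf/sentry-site.xml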
On the impalad side, we continue to support authorization based on a file-based provider.
To enable file-based authorization, set the --authorization_policy_file flag to a
non-empty value. If --authorization_policy_file is not set, authorization will be done
based on cached policy metadata received from the Catalog Server (via the statestore).
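To illustrate the two impalad modes (flag values are examples only):
impalad --server_name=server1 --authorization_policy_file=/user/impala/policy.ini
uses the file-based provider, while omitting the policy file flag, e.g.
impalad --server_name=server1
leaves the impalad authorizing against the policy metadata cached from the Catalog Server.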
TODO: There are still some issues with the Sentry Service that require disabling some of
the authorization tests and adding some workarounds. I have added comments in the code
where these workarounds are needed.
Change-Id: I3765748d2cdbe00f59eefa3c971558efede38eb1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2552
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version
causes a lot of benign warnings. This patch changes Impala's version to be the same
as the rest of the stack.
Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This just updates the versions; it doesn't touch anything in /thirdparty.
Change parquet version to append SNAPSHOT
Added the hadoop-hbase-compat jar to AUX_CLASSPATH and mapreduce/*.jar to HDFS
Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4
This re-enables a subset of the stable data errors tests and updates them to
work in our test framework. This includes support for updating results via --update_results.
This also lets us remove a lot of old code that was there only to support these disabled
tests.
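To regenerate expected results, the update flag is passed through to the test run, e.g.
(the wrapper script invocation is an assumed example; only the --update_results flag
comes from this change):
./tests/run-tests.sh --update_results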
Change-Id: I4c40c3976d00dfc710d59f3f96c99c1ed33e7e9b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1952
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2277
Changes include:
- version changes in impala-config
- version changes in various loading scripts
- hbase jars are no longer in hive/lib
- mini-llama script changes
- updates due to Sentry API changes
- JDBC tests disabled
- unsupported types tests disabled
Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee
This change cleans up our FE pom.xml file by removing unneeded
dependencies and system dependencies (system dependencies are now pulled in
from the Maven release repository).
The upside is that our pom is cleaner and it will also help reduce the likelihood of
broken dependencies since Maven will pull in the right versions. The downside
is that we now pull in quite a few more JARs.
Note: I was unable to find release artifacts for Sentry and Parquet, so I am leaving
those as "system" for now.
Change-Id: I0b917b09a02243d78d89747591ab6bccacf7cf38
Saving changes
Change-Id: I3697a7b44884c40e077b3e354fef76625e1b881d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1011
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
The following changes have been made:
-- Update HBase
-- Update Hive
-- Update Hadoop
-- Update the Parquet version to 1.2.5
Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb
This change adds support for SQL statement authorization in Impala. The authorization
works by updating the Catalog API to require a User + Privilege when getting Table/Db
objects (and in the future can be extended to cover columns as well).
If the user doesn't have permission to access the object, an AuthorizationException is
thrown. The authorization checks are done during analysis as new Catalog objects are
encountered.
These changes build on top of the Hive Access code, which handles the actual
processing of authorization requests. The authorization is currently based
on a "policy file" which will be stored in HDFS. This policy file is read once
on startup and then reloaded every 5 minutes. It can also be reloaded on a
specific impalad by executing a "refresh" command.
Authorization is enabled by setting:
--server_name='server1'
and then pointing the impalad to the policy file using the flag:
--authorization_policy_file=/path/to/policy/file
Any authorization configuration problems will result in impalad failing to
start.
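Put together, an illustrative startup (the server name and policy path are example
values) looks like:
impalad --server_name=server1 --authorization_policy_file=/user/impala/policy/policy.ini
The cached policy on a specific impalad can be reloaded ahead of the 5-minute interval
by issuing the refresh command, e.g. from the shell (the impala-shell invocation is only
one way to issue it):
impala-shell -i <impalad-host> -q "refresh"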
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.
This patch adds
1. use boost uuid
2. add unit test for HiveServer2 metadata operation
3. add JDBC metadata unit test
4. implement the remaining HiveServer2 operations: GetFunctions and GetTableTypes
5. remove in-process impala server from fe-support
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
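For reference, a manual beeline session against an impalad's HiveServer2 endpoint looks
roughly like the following (host, port, and user are placeholders, not values from this
patch):
beeline -u jdbc:hive2://<impalad-host>:<hs2-port>/default -n <user>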
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
With this change the Python tests will now be called as part of buildall and
the corresponding Java tests have been disabled. The new tests can also be
invoked by calling ./tests/run-tests.sh directly.
This includes a fix from Nong for an issue that caused wrong results for limit on
non-IO manager formats.
Added a script that starts an impalad "cluster" (impalad + state store) with
each impalad running on a different port. Also updated QueryTest to enable
running against an external impalad. This enables running all the tests against
a remote cluster or a local cluster set up with the script I added.
By default we run with the in-process impalad - to enable running against a
remote impalad use the flag:
mvn test -Duse_external_impalad=true
The same host/port flags work with this, for example:
mvn test -Duse_external_impalad=true -Dimpalad=hostName -Dfe_port=21000
This change enables running the query tests (and potentially other tests in
the future) against an in-process or external-process test environment. This
means that the tests can be run against a remote distributed cluster with
impalad deployed - or run locally in-process.
To target a remote environment execute the tests using the following two flags:
mvn test -Dimpalad=<hostname of coordinator> -Dfe_port=21000
If these are not specified, the existing (in-process) test environment is
used.
The major parts of this change are:
ImpaladClientExecutor - this is a new client executor class that uses the
beeswax thrift interface to communicate with a target impalad instance.
TestUtilities - This class was updated to add support for running queries
against impalad using the Impalad client executor.
As part of this change I also split the query tests into a few separate files:
JoinQueryTest, InsertQueryTest, HBaseQueryTest, etc... This will make it easier
to pick which subset of tests you want to run. It will also help reduce our max
test log file size in the Jenkins runs.
To enable this I created a new 'BaseQueryTest' class that does much of the work
of choosing which combinations of File format, compression, batch size, etc to
run with.
Current shortcomings:
1) It would be nice for "Executor" and "ImpaladClientExecutor" to share a common
interface. None currently exists, and I wasn't sure what a good one would be; any
thoughts on this would be appreciated. Because of this I had to resort to
passing an "Executor" of type "Object" for the time being.
2) The Beeswax API doesn't currently provide a way to specify things like the number
of execution nodes. For now we just ignore this parameter (it can be set by the
impalad instance).
3) Double and float values are formatted with a larger precision when executed
over the Beeswax interface. This causes results to differ and tests to fail.
A second checkin will update the in-process output to match that of Beeswax.
Depending on the execution mode, we will use a different set of test vectors so
we can help control test execution time. The idea is that for checkins the tests
are run in the 'reduced' input set mode. For nightly builds we will run the
exhaustive set of test combinations.
This is controlled with a new flag specified when running the tests:
mvn -DtestExecutionMode=exhaustive test
or
mvn -DtestExecutionMode=reduced test
Note: If -DtestExecutionMode is not specified, it will default to reduced.
As part of this change a bunch of the test files had to be updated to be
parameterized. If they are not parameterized, they will not benefit from the new
coverage that has been added.
This change currently is just for the Query Tests. I would like to extract some
of this logic and generalize it for more test suites with a future checkin.
- added option to run with the Derby metastore, based on whether env var METASTORE_IS_DERBY is set (see the example below)
- removed hardwired file locations from planner tests
- switched to linking statically against libthrift.a
Also added script rebuild.sh, which contains the build steps of buildall.sh (against impala sources).
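For example, the Derby metastore option is selected simply by exporting the variable
before running the build or tests:
export METASTORE_IS_DERBY=true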