impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 12:01:11 -05:00

Author	SHA1	Message	Date
Lenni Kuff	8b2acf5c22	IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables. With this change we now detect if a table is read-only and disable INSERT/LOAD operations on these tables. A table is read-only if Impala does not have write permission on the HDFS base directory of the table or any one of the partition directories (if the table is partitioned). Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/713 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:37 -08:00
Nong Li	4800995d44	Add execution for Hive UDFs. Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500 Reviewed-on: http://gerrit.ent.cloudera.com:8080/458 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:25 -08:00
Nong Li	6b9a7de02e	Add symbol resolution during analysis for create function stmts. Before this, we had to specify the entire mangled symbol. This can be quite long and quite tedious (take a look at some of the create UDA test cases that specify all the symbols). This patch adds some code to convert from the user function signature to the mangled name. This means the user can specify the unmangled name and we can do the symbol lookup. The mangling rules are pretty convoluted but if it is messed up, the user can always specify the full symbol. Some other minor cleanup in: - JNI from FE to BE - UDFs/UDAs that are loaded as test data Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/624 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:20 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
Lenni Kuff	79cdeac3d6	Consolidate test cluster under IMPALA_HOME/cluster_logs + store logs during data loading Change-Id: I8f6239e4ccb0515c85bf80193a475788fb18dedb Reviewed-on: http://gerrit.ent.cloudera.com:8080/518 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:56 -08:00
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00
ishaan	6735e3983f	Fix build failure because of hbase data loading. Change-Id: I796656332c3733a1ffdc338d206009efa6c451ac Reviewed-on: http://gerrit.ent.cloudera.com:8080/360 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:37 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Skye Wanderman-Milne	6e7406df8b	IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string) Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296 Reviewed-on: http://gerrit.ent.cloudera.com:8080/127 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:02 -08:00
Skye Wanderman-Milne	3fecdeb793	IMPALA-441: support default values for Avro tables	2014-01-08 10:51:39 -08:00
Skye Wanderman-Milne	c8a8308ece	Avro schema resolution (minus default values)	2014-01-08 10:51:26 -08:00
Lenni Kuff	7ac88e1fa9	IMPALA-400: Add support for SQL statement authorization. This change adds support for SQL statement authorization in Impala. The authorization works by updating the Catalog API to require a User + Privilege when getting Table/Db objects (and in the future can be extended to cover columns as well). If the user doesn't have permission to access the object, an AuthorizationException is thrown. The authorization checks are done during analysis as new Catalog objects are encountered. These changes build on top of the Hive Access code, which handles the actual processing of authorization requests. The authorization is currently based on a "policy file" which will be stored in HDFS. This policy file is read once on startup and then reloaded every 5 minutes. It can also be reloaded on a specific impalad by executing a "refresh" command. Authorization is enabled by setting: --server_name='server1' and then pointing the impalad to the policy file using the flag: --authorization_policy_file=/path/to/policy/file Any authorization configuration problems will result in impalad failing to start.	2014-01-08 10:50:56 -08:00
Skye Wanderman-Milne	1ab189c789	Fix build	2014-01-08 10:50:52 -08:00
Skye Wanderman-Milne	c8fd4f8016	IMPALA-362: impalad hangs when read sequence file without contents	2014-01-08 10:50:49 -08:00
Alan Choi	bd59bbb07a	IMPALA-300/356 Always reload region server info. Clear keyRange.start/stopkey before setting it in setKeyRangeStart/End. Split HBase tables into multiple regions. I had to disable the HBase scanrangelocations planner test because region assignment is non-deterministic. I'll have a follow-up patch to address that.	2014-01-08 10:50:48 -08:00
Lenni Kuff	2f7198292a	Add support for auxiliary workloads, tests, and datasets. This change adds support for auxiliary workloads, tests, and datasets. This is useful for augmenting the regular test runs with additional tests that do not belong in the main Impala repo.	2014-01-08 10:50:32 -08:00
Skye Wanderman-Milne	223b1a8e47	IMPALA-293: Impala is unable to query RCFile tables which describe less columns than the file's header.	2014-01-08 10:50:17 -08:00
Skye Wanderman-Milne	cc6007cf9e	IMPALA-262: Querying text/lzo table that is not indexed causes an impalad segfault	2014-01-08 10:49:52 -08:00
Lenni Kuff	36e9fe1c1a	Run compute table stats statements using Hive CLI. This works around a problem with computing table stats via the Hive Meta Store client API. When executing these statements via the MetaStoreClient, all tables were getting a num_rows=0 value returned from the ANALYZE TABLE query.	2014-01-08 10:49:19 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Lenni Kuff	e0a7b7cb55	Compute column stats on tables used by Planner tests	2014-01-08 10:48:48 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
Lenni Kuff	6e1f8d178a	Update utility script to compute column and table stats for given table(s)	2014-01-08 10:48:23 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	99bb22dcac	Add db name filter to compute stats, run compute stats on functional/text tables	2014-01-08 10:48:08 -08:00
Alan Choi	73c8ee3d96	IMPALA-18 Ignore hidden file prefixed with . or _	2014-01-08 10:48:00 -08:00
Lenni Kuff	d2e4776731	Support passing snapshot file to buildall, add script to run all tests, remove old tests	2014-01-08 10:47:59 -08:00
Lenni Kuff	d5177c3c30	Update run-workload to support specifying table format test vectors from command line	2014-01-08 10:47:20 -08:00
Lenni Kuff	deea7a86b9	Remove Trevni from mixed format table and fix data loading bug	2014-01-08 10:47:10 -08:00
Lenni Kuff	30dbf59ef2	Final changes to enable Python test infrastructure and tests. With this change the Python tests will now be called as part of buildall and the corresponding Java tests have been disabled. The new tests can also be invoked by calling ./tests/run-tests.sh directly. This includes a fix from Nong for a bug that caused wrong results for limit on non-io manager formats.	2014-01-08 10:46:57 -08:00
Lenni Kuff	bed633c1ae	Extract config/metastore creation from buildall + script for loading warehouse snapshot	2014-01-08 10:46:53 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python. This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of options, I decided to go with a Python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means that if you load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. This will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Lenni Kuff	1e25c98fb4	Test data loading framework improvements. This change includes a number of improvements for the test data loading framework: * Named sections for schema template definitions * Removal of unneeded sections from schema template definitions (ex. ANALYZE TABLE) * More granular data loading via table name filters * Improved robustness in detecting failed data loads * Table level constraints for specific file formats * Re-written compute stats script	2014-01-08 10:46:49 -08:00
Michael Ubell	8a5297a526	Add HdfsLzoTextScanner	2014-01-08 10:46:35 -08:00
Michael Ubell	85807f6169	Start a single impalad to avoid data load race	2014-01-08 10:46:18 -08:00
Michael Ubell	37aaf06f79	IMP-390 Get rid of test dependencies on InProcessQE and Runquery	2014-01-08 10:46:18 -08:00
Michael Ubell	0c4f025a5e	Fix loading of nulltable data, remove loading functional-planner data	2014-01-08 10:45:58 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more. This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files; I've just moved them. The one new file is 'hive-benchmark.test', which contains the Hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF3". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with names that are distinct from those of the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new Python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: ./bin/load-data.py -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00
Michael Ubell	02d63d8dc3	Trevni file support	2014-01-08 10:44:19 -08:00
Lenni Kuff	84d91fca4f	Fix sequence file data loading for the alltypesmixedformat table. Moved this out of the data loading framework because it is kind of a special case. I will consider how we can update the framework to address mixed format tables.	2014-01-08 10:44:18 -08:00
Lenni Kuff	bf27a31f98	Move functional data loading to new framework + initial changes for workload directory structure. This change moves (almost) all the functional data loading to the new data loading framework. This removes the need for the create.sql, load.sql, and load-raw-data.sql files. Instead we just have the single schema template file: testdata/datasets/functional/functional_schema_template.sql This template can be used to generate the schema for all file formats and compression variations. It should also make loading data easier. Now you can run: bin/load-impala-data.sh "query-test" "exhaustive" and get all data needed for running the query tests. This change also includes the initial changes for the new dataset/workload directory structure. The new structure looks like: testdata/workload <- Will contain query files and test vectors/dimensions testdata/datasets <- Will contain the data files and schema templates Note: This is the first part of the change to this directory structure; it's not yet complete.	2014-01-08 10:44:18 -08:00
Alan Choi	b17e24d654	A few FE fixes [rewritten by hnr] review issue: 198 Change-Id: I84a2f38b0bce5a6f33dfb974de60c822945834e5	2014-01-08 10:44:15 -08:00
Lenni Kuff	e293164b37	Added TPCH functional query tests and schema generation. This adds most of the Hive TPCH queries into the functional Impala tests. This code review doesn't actually include the TPCH data. The data set is relatively large. Instead I updated scripts to copy the data from a data host. This change has a few parts: 1) Update the benchmark schema generation/test vector generation to be more generic. This way we can use the same schema creation/data loading steps for TPCH as we do for benchmark tests. 2) Add in schema template for the TPCH workload along with test vectors and dimensions which are used for schema generation. 3) Add in a new test file for each TPC-H query. The Hive TPCH work broke down the queries to generate some "temp" tables, then executed joins/selects against these temp tables. Since creating the temp tables does some real work, it is good to execute these via Impala. Each test a) runs all the INSERT statements to generate the temp tables and b) runs the additional TPCH queries. 4) Updated all the TPCH insert statements and queries to be parameterized on $TABLE name. This way we can run the tests across all combinations of file format/compression/etc. 5) Updated data loading. Change-Id: I6891acc4c7464eaf1dc7dbbb532ddbeb6c259bab	2014-01-08 10:44:06 -08:00
Henry Robinson	3ff3559805	Add support for per-partition file formats to front end and backend. At the same time, this patch removes the partitionKeyRegex in favour of explicitly sending a list of literal expressions for each file path from the front end.	2012-06-05 12:00:09 -07:00
Michael Ubell	7b14187bf1	Install snappy library, add create-load-data.sh	2012-05-02 07:31:10 -07:00

45 Commits
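The read-only rule from IMPALA-425 (commit 8b2acf5c22) can be summarized as a single check: a table is read-only if its base directory, or any of its partition directories, is not writable. Below is a minimal Python sketch of that decision rule, not Impala's implementation. It relies on two assumptions: local paths with `os.access` stand in for HDFS permission checks, and the function names `is_table_read_only` and `check_insert_allowed` are hypothetical.

```python
import os


def is_table_read_only(base_dir, partition_dirs=()):
    """Return True if the table should be treated as read-only.

    Hypothetical sketch of the IMPALA-425 rule: the table is read-only when
    the process lacks write permission on the base directory or on any
    partition directory. os.access on local paths stands in for HDFS
    permission checks (an assumption, not Impala's real code path).
    A missing directory also counts as not writable here.
    """
    for directory in (base_dir, *partition_dirs):
        if not os.access(directory, os.W_OK):
            return True
    return False


def check_insert_allowed(table_name, base_dir, partition_dirs=()):
    """Raise PermissionError if INSERT/LOAD should be rejected for the table."""
    if is_table_read_only(base_dir, partition_dirs):
        raise PermissionError(
            f"Table '{table_name}' is read-only; INSERT/LOAD is disabled")
```

The sketch only illustrates the rule described in the commit message. A single unwritable partition is enough to make the whole table read-only, which is why the check short-circuits on the first failure.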
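IMPALA-18 (commit 73c8ee3d96) makes Impala ignore hidden files whose names start with `.` or `_`. Examples include Hadoop checksum files and job-marker files such as `_SUCCESS`. The Python sketch below shows how such a filter could work on file basenames. The names `is_hidden_file` and `visible_data_files` are hypothetical and do not come from the Impala code base.

```python
import os

HIDDEN_PREFIXES = (".", "_")  # prefixes named in IMPALA-18


def is_hidden_file(path):
    """Return True if the file's basename starts with '.' or '_'."""
    return os.path.basename(path).startswith(HIDDEN_PREFIXES)


def visible_data_files(paths):
    """Keep only the files a scan should read, preserving input order."""
    return [p for p in paths if not is_hidden_file(p)]


if __name__ == "__main__":
    files = ["t/part-00000", "t/_SUCCESS", "t/.part-00000.crc", "t/part-00001"]
    print(visible_data_files(files))  # ['t/part-00000', 't/part-00001']
```

The check runs against the basename, not the full path, so a leading `.` or `_` in a parent directory name does not hide the file.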
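The py.test framework from commit ef48f65e76 runs each combination of table format and query exec options as its own test case. That expansion amounts to a Cartesian product, sketched below in Python. The format strings and `batch_size` values are illustrative only. `expand_test_vectors` and `case_id` are hypothetical helpers, not part of Impala's test code.

```python
from itertools import product


def expand_test_vectors(table_formats, exec_option_sets):
    """Return one (table_format, exec_options) test case per combination.

    Hypothetical helper: each element of the result would become an
    individually reported test case, so a failure points at one exact
    format/options pair.
    """
    return [(fmt, dict(opts))
            for fmt, opts in product(table_formats, exec_option_sets)]


def case_id(fmt, opts):
    """Build a readable test id such as 'text/none-batch_size=0'."""
    option_str = ",".join(f"{k}={v}" for k, v in sorted(opts.items()))
    return f"{fmt}-{option_str}"


if __name__ == "__main__":
    formats = ["text/none", "seq/snap"]               # illustrative values
    options = [{"batch_size": 0}, {"batch_size": 1}]  # illustrative values
    for fmt, opts in expand_test_vectors(formats, options):
        print(case_id(fmt, opts))
```

With py.test, a list like this could be passed to `pytest.mark.parametrize`, using the `ids=` argument to build readable names. Each format/options pair would then be reported separately, matching the debugging benefit described in the commit message.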