impala

mirror of https://github.com/apache/impala.git synced 2026-01-04 09:00:56 -05:00

Author	SHA1	Message	Date
Lenni Kuff	76fa3b2ded	Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax. This change updates our DDL syntax support to allow for using 'STORED AS PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax, but continue to support the old one. I made the same change for 'AVROFILE', but since we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax. Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Nong Li	c1a64d6863	Add kill-mini-llama to CDH4 branch. This makes it easier to switch between our branches and is a no-op for those of us staying on CDH4. Change-Id: Ic07eb8a7ba7e48db118c06c221aabe5e124f3bfb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1033 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:54:17 -08:00
ishaan	fcdcf1a9d8	Parallelize data loaded through Impala to speed up data loading. Currently, we execute all the queries involved in data loading serially. This change creates a separate .sql file for each file format, compression codec and compression scheme combination, and executes all the files in parallel. Additionally, we now store all the .sql files (independent of workload) in $IMPALA_HOME/data_load_files/<dataset_name>. Note that only data loaded through Impala is parallelized; data loaded through Hive and HBase remains serial. On our build machines, the time taken to load all the data from snapshot was on the order of 15 minutes. Change-Id: If8a862c43f0e75b506ca05d83eacdc05621cbbf8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/804 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:53 -08:00
Lenni Kuff	498c2529d4	Test CR: Change spacing in run-all.sh Change-Id: I2362799213a7faca3892e38fb874bfbbd0c1718f Reviewed-on: http://gerrit.ent.cloudera.com:8080/803 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:50 -08:00
Lenni Kuff	8b2acf5c22	IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables. With this change we now detect if a table is read-only and disable INSERT/LOAD operations on these tables. A table is read-only if Impala does not have write permission on the HDFS base directory of the table or any one of the partition directories (if the table is partitioned). Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/713 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:37 -08:00
Alex Behm	51e914e911	Use hive-exec instead of hive-builtin because hive-builtin does not exist in CDH5 Hive. Change-Id: I11993c7eebc9f5f07f112810d7e81d07ce157193 Reviewed-on: http://gerrit.ent.cloudera.com:8080/715 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:33 -08:00
Lenni Kuff	72e211ca4a	Use Hive Metastore Service instead of HiveServer 1 in test infrastructure Change-Id: I4e2ba02b2101bae95d196ab13f9453e1b3a9d7be Reviewed-on: http://gerrit.ent.cloudera.com:8080/689 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:26 -08:00
Nong Li	4800995d44	Add execution for Hive UDFs. Change-Id: I6a5ad96fed77e2b8a2701f21a917a8eb7a11d500 Reviewed-on: http://gerrit.ent.cloudera.com:8080/458 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:25 -08:00
Nong Li	6b9a7de02e	Add symbol resolution during analysis for create function stmts. Before this, we had to specify the entire mangled symbol. This can be quite long and quite tedious (take a look at some of the create UDA test cases that specify all the symbols). This patch adds some code to convert from the user function signature to the mangled name. This means the user can specify the unmangled name and we can do the symbol lookup. The mangling rules are pretty convoluted but if it is messed up, the user can always specify the full symbol. Some other minor cleanup in: - JNI from FE to BE - UDFs/UDAs that are loaded as test data Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/624 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:20 -08:00
ishaan	8a43426879	Sleep after starting the hiveserver2 service to guard against it not starting in time. Change-Id: I9a0de1cc63089cba2f9b59942ee45abc44b8662e Reviewed-on: http://gerrit.ent.cloudera.com:8080/643 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:53:17 -08:00
Lenni Kuff	b07a3ccfd6	Use an external Hive Metastore Service for local test runs. Using an external Hive Metastore Service for local test runs has a number of benefits. Some of the benefits are that it helps separate the metastore logs from the impala logs, and that it is more representative of what is on real cluster environments. It also may help with some of the concurrency issues that we have been seeing when running directly against the backend database since we no longer spin up an in-process metastore server for each client connection. The metastore is started by running "run-hive-server.sh" which is invoked as part of "run-all.sh". Change-Id: If60fa97aa38e4ad5cf578b9b409eeea1e0e29375 Reviewed-on: http://gerrit.ent.cloudera.com:8080/628 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:15 -08:00
Skye Wanderman-Milne	b7f83bcd73	Add support for LLVM IR UDFs. This patch also adds a number of improvements to NativeUdfExpr. Highlights include: * Correctly handling the lowering of AnyVal struct types (required for ABI compatibility) * A rudimentary library cache for reusing handles produced by dlopen * More complicated test cases Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195 Reviewed-on: http://gerrit.ent.cloudera.com:8080/540 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:03 -08:00
Nong Li	e5ed8e4105	Move minicluster_xml_conf to HADOOP_CONF_DIR. The current location gets deleted if you rebuild, forcing you to restart mini DFS. Change-Id: If71b144534255fa8df2bfa187c0814ffdf28463e Reviewed-on: http://gerrit.ent.cloudera.com:8080/550 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:03 -08:00
Lenni Kuff	79cdeac3d6	Consolidate test cluster under IMPALA_HOME/cluster_logs + store logs during data loading Change-Id: I8f6239e4ccb0515c85bf80193a475788fb18dedb Reviewed-on: http://gerrit.ent.cloudera.com:8080/518 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:56 -08:00
Skye Wanderman-Milne	fd99db0300	First pass at UdfExpr. Change-Id: I517bf56541749b5c2459554821c7bf838239fdf0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/439 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:50 -08:00
Henry Robinson	a46276325c	IMPALA-415: Don't delete hidden files in the root directory for INSERT OVERWRITE. INSERT OVERWRITE into an unpartitioned table is supposed to remove all data files from the table's root directory. This should not include hidden files or directories. This patch excludes hidden files from deletion, and adds a test case. Partition directories are still removed in their entirety: the cost of statting a large number of files and directories rather than issuing a single "rm -rf" outweighs the benefits of preserving hidden files for now. Hive does not preserve hidden files in either configuration. Change-Id: Ia73e55e011c26c88f14745075210cf359764e3c1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/418 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:50 -08:00
Lenni Kuff	a1f2f72f49	Add Impala DDL support for creation of AVRO tables + support for CREATE/ALTER SERDEPROPERTIES. This change adds Impala DDL support for creation of AVRO tables. Additionally, it adds Impala support for CREATE and ALTER SERDEPROPERTIES, which are used when creating Avro-backed tables. This syntax is not exactly the same as Hive's, since it introduces a new file format (AVROFILE) that implies the needed serialization library, input format, and output format. Change-Id: I5047e419198a89599e9d014fdedfee1a20437a7d Reviewed-on: http://gerrit.ent.cloudera.com:8080/464 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:48 -08:00
Lenni Kuff	d6d1557fe7	Capture cluster logs with each test run / don't use mvn for starting cluster services Change-Id: I708b547e49d035c5f029ea86119cc844ccbc5643 Reviewed-on: http://gerrit.ent.cloudera.com:8080/404 Tested-by: jenkins Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:40 -08:00
Lenni Kuff	9f54242941	Add retry loop around split-hbase to fix build breaks Change-Id: I539407ce05d705b6b4e88d0791fc4ec236c79c80 Reviewed-on: http://gerrit.ent.cloudera.com:8080/399 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:39 -08:00
ishaan	6735e3983f	Fix build failure because of hbase data loading. Change-Id: I796656332c3733a1ffdc338d206009efa6c451ac Reviewed-on: http://gerrit.ent.cloudera.com:8080/360 Tested-by: jenkins Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:37 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Lenni Kuff	f264db1647	Automatically force load partitioned tables to ensure valid partition metadata Change-Id: Ief91102f30d4669503d473299256a74a50d8fe3c Reviewed-on: http://gerrit.ent.cloudera.com:8080/261 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:17 -08:00
Lenni Kuff	17ed6ea177	Partition TPC-DS dataset and add additional TPC-DS workload queries Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300 Reviewed-on: http://gerrit.ent.cloudera.com:8080/99 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:13 -08:00
Skye Wanderman-Milne	6e7406df8b	IMPALA-502: Impala does not return NULL for case where table has extra string column and data does not (it returns an empty string) Change-Id: I0cfe5ce5fc279d46610a3cc191a501ccbc335296 Reviewed-on: http://gerrit.ent.cloudera.com:8080/127 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:52:02 -08:00
Skye Wanderman-Milne	3fecdeb793	IMPALA-441: support default values for Avro tables	2014-01-08 10:51:39 -08:00
Skye Wanderman-Milne	c8a8308ece	Avro schema resolution (minus default values)	2014-01-08 10:51:26 -08:00
ishaan	e7c6d57f9c	IMP-773: Add better logging/error detection to start-impala-cluster.py	2014-01-08 10:51:25 -08:00
Alan Choi	254ee6ef89	IMPALA-434 Support binary hbase encoding	2014-01-08 10:51:18 -08:00
Lenni Kuff	abdfae5b24	Update DESCRIBE FORMATTED results to match the Hive HS2 output	2014-01-08 10:51:14 -08:00
Lenni Kuff	7ac88e1fa9	IMPALA-400: Add support for SQL statement authorization. This change adds support for SQL statement authorization in Impala. The authorization works by updating the Catalog API to require a User + Privilege when getting Table/Db objects (and in the future can be extended to cover columns as well). If the user doesn't have permission to access the object, an AuthorizationException is thrown. The authorization checks are done during analysis as new Catalog objects are encountered. These changes build on top of the Hive Access code, which handles the actual processing of authorization requests. The authorization is currently based on a "policy file" which will be stored in HDFS. This policy file is read once on startup and then reloaded every 5 minutes. It can also be reloaded on a specific impalad by executing a "refresh" command. Authorization is enabled by setting --server_name='server1' and then pointing the impalad to the policy file using the flag --authorization_policy_file=/path/to/policy/file. Any authorization configuration problems will result in impalad failing to start.	2014-01-08 10:50:56 -08:00
Alan Choi	2bdba77f61	Perform deterministic HBase region assignment and enable HBase scan range location test in the planner test	2014-01-08 10:50:54 -08:00
Skye Wanderman-Milne	1ab189c789	Fix build	2014-01-08 10:50:52 -08:00
Skye Wanderman-Milne	c8fd4f8016	IMPALA-362: impalad hangs when read sequence file without contents	2014-01-08 10:50:49 -08:00
Alan Choi	bd59bbb07a	IMPALA-300/356: Always reload region server info. Clear keyRange.start/stopkey before setting it in setKeyRangeStart/End. Split HBase tables into multiple regions. I had to disable the HBase scanrangelocations planner test because region assignment is non-deterministic. I'll have a follow-up patch to address that.	2014-01-08 10:50:48 -08:00
Lenni Kuff	2f7198292a	Add support for auxiliary workloads, tests, and datasets. This change adds support for auxiliary workloads, tests, and datasets. This is useful to augment the regular test runs with additional tests that do not belong in the main Impala repo.	2014-01-08 10:50:32 -08:00
ishaan	f026354721	IMP-912 Make force killing an option. Update run-all-tests to pre-emptively force kill.	2014-01-08 10:50:29 -08:00
Skye Wanderman-Milne	223b1a8e47	IMPALA-293: Impala is unable to query RCFile tables which describe fewer columns than the file's header.	2014-01-08 10:50:17 -08:00
Skye Wanderman-Milne	cc6007cf9e	IMPALA-262: Querying text/lzo table that is not indexed causes an impalad segfault	2014-01-08 10:49:52 -08:00
Nong Li	563cbfa3a8	Enable parquet testing	2014-01-08 10:49:40 -08:00
Lenni Kuff	cba9cd00dd	Fix full data load build break due to constructing incorrect HDFS paths	2014-01-08 10:49:34 -08:00
Lenni Kuff	558d5ce755	Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists	2014-01-08 10:49:28 -08:00
Nong Li	ebab23841a	Add back pressure in hdfs-scan-node to prevent excessive buffer queueing.	2014-01-08 10:49:25 -08:00
Lenni Kuff	36e9fe1c1a	Run compute table stats statements using Hive CLI. This works around a problem with computing table stats via the Hive Metastore client API. When executing these statements via the MetaStoreClient, all tables were getting a num_rows=0 value returned from the ANALYZE TABLE query.	2014-01-08 10:49:19 -08:00
Elliott Clark	0e0c02b6bd	Add the ability to Select into HBase table. * Changed frontend analysis for HBase tables * Changed Thrift messages to allow HBase as a sink type. * JNI Wrapper around htable * Create hbase-table-sink * Create hbase-table-writer * Static init lots of JNI related code for HBase. * Cleaned up some cpplint issues. * Changed junit analysis tests * Create a new HBase test table. * Added functional tests for HBase inserts.	2014-01-08 10:49:06 -08:00
Lenni Kuff	993da8fcba	Fix bug in how insert tables are generated	2014-01-08 10:49:05 -08:00
Lenni Kuff	5f81becd84	Create tables used by insert tests in a supported insert format	2014-01-08 10:49:00 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Lenni Kuff	7584312540	IMPALA-167: Impala should gracefully handle unsupported Hive table types	2014-01-08 10:48:56 -08:00
Skye Wanderman-Milne	811d5dd00b	Create Avro schema directory in test warehouse	2014-01-08 10:48:50 -08:00
Nong Li	0df9476be1	Parquet data loading.	2014-01-08 10:48:48 -08:00


150 Commits