impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 21:02:41 -05:00

Author	SHA1	Message	Date
Matthew Jacobs	65c1a6f21e	Remove SOURCE keyword by parsing as an identifier and checking the value Reverts "IMPALA-1033: Remove SOURCE keyword; very common identifier" Change-Id: I3fcf6d02786e00287b564cff0a823d0c19504e7a	2014-06-30 16:47:47 -07:00
Dimitris Tsirogiannis	6a795915d6	Fix loading data from snapshot for alltypesagg table. The alltypesagg table was not loaded correctly from a snapshot file due to a missing ALTER TABLE statement, thereby causing some tests to fail. Change-Id: I74066a99529f24fc268bb5779d3fb64fbd4f66b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3248 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3270 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-06-25 21:52:11 -07:00
Victor Bittorf	2d7f2e19b2	IMPALA 938: Infer schema from Parquet file Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'". Supports all options that CREATE TABLE does. Currently only PARQUET is supported. Run testdata/bin/create-load-data.sh after pulling this patch. Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158	2014-06-20 17:38:01 -07:00
ishaan	99602fb8c2	Force load data if the current HEAD has a schema change. This patch checks the test-warehouse's stored githash (if it exists) to determine if the current patch has changed the schema of a table. If a change is detected, we force load all the data. Change-Id: I314f9f3364d3e6b2d66de38a9e6d9f57c4e279a7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3049 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-06-19 02:25:50 -07:00
Matthew Jacobs	f5da019555	IMPALA-1025: Use converse of data source predicate operators if expr has val before slot Change-Id: I31790c037e2fa9af7b80c01014f7507ba5053e63 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2925 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins	2014-06-09 23:54:09 -07:00
Matthew Jacobs	89ec6b3d7a	IMPALA-1033: Remove SOURCE keyword; very common identifier The SOURCE keyword was introduced for DATA SOURCE ddl commands, but it is also a very common identifier. This removes the SOURCE and SOURCES keywords and instead uses DATASOURCE and DATASOURCES. Change-Id: Ic6c2897d1e23efa169aa8787752fe4aa2bb125d5 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2895 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 267c13f9b46d249bfd1b8711fd3fadf6853dc1ef)	2014-06-09 17:17:14 -07:00
Nong Li	5d80942d42	[CDH5] IMPALA-1019: Fix cancellation path in io mgr for cached reads. Change-Id: I11efd65d1efa900f79afe88b781262a44ac5006a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2703 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-30 19:14:39 -07:00
Lenni Kuff	c45e9a70d9	[CDH5] Add DDL support for HDFS caching This change adds DDL support for HDFS caching. The DDL allows the user to indicate a table or partition should be cached and which pool to cache the data into: * Create a cached table: CREATE TABLE ... CACHED IN 'poolName' * Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName' * Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED When a table/partition is marked as cached, a new HDFS caching request is submitted to cache the location (HDFS path) of the table/partition and the ID of that request is stored within the table metadata (in the table properties). This is stored as: 'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS and persisted across HDFS restarts. When a cached table or partition is dropped it is important to uncache the cached data (drop the associated cache request). For partitioned tables, this means dropping all cache requests from all cached partitions in the table. Likewise, if a partitioned table is created as cached, new partitions should be marked as cached by default. It is desirable to know which cache pools exist early on (in analysis) so the query will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To support this, a new cache pool catalog object type was introduced. The catalog server caches the known pools (periodically refreshing the cache) and sends the known pools out in catalog updates. This allows impalads to perform analysis checks on cache pool existence without going to HDFS. It would be easy to use this to add basic cache pool management in the future (ADD/DROP/SHOW CACHE POOL). Waiting for the table/partition to become cached may take a long time. Instead of blocking the user from accessing the table during this period we will wait for the cache requests to complete in the background and once they have finished the table metadata will be automatically refreshed. 
Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-05-27 16:47:15 -07:00
Skye Wanderman-Milne	1dff1686aa	Add option to build UDF test libs in copy-udfs-udas.sh The option is off by default, but useful for running this script without building the world. Change-Id: I82d8251cf9bb2763ce69094da1995a4d6ceff167 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2647 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit a7f77643820dcbfbab231a9260c94450564bd2df) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2659 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-05-22 18:01:55 -07:00
Matthew Jacobs	6ccd56bc1f	Enforce slot equivalences at data source scan nodes Change-Id: I2ed606ba398990ab05afa3301b6356c6a636e2bb Reviewed-on: http://gerrit.ent.cloudera.com:8080/2521 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit 55061f6953956f45d433fe227ded539a648e3f9c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2536	2014-05-19 14:37:44 -07:00
Dimitris Tsirogiannis	a7a9cde86f	CDH-18969: Incorrect query result in Impala This commit fixes issue CDH-18969 where Impala returns wrong results when querying an HBase table. This issue is triggered when a column family sorts lexicographically before ":key", which is the column family of the row key, thereby causing the wrong column to be used as a row key by the backend. The following changes are included: 1. Modified the load function in HBaseTable.java to make sure the catalog object of an HBase table always stores the row key column first. Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602	2014-05-18 16:29:11 -07:00
Skye Wanderman-Milne	edbbe6035e	Decimal: read from Avro Allows reading decimal columns with or without codegen. Includes tests based on a data file posted on HIVE-5823. Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6 (cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-05-16 22:26:11 -07:00
Lenni Kuff	61cbdd4f49	[CDH5] Add Sentry Service to local test environment Adds the ability to start/stop the Sentry Service to our local test environment and load the sentry-site.xml configs. Since the existing Sentry startup scripts don't work I wrote a simple wrapper to handle service startup. Change-Id: I1b77a2e50e51e6e6eae58cfed4d5d7c403dbc0b4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2540 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-05-14 12:02:02 -07:00
Matthew Jacobs	ebc6c5894e	External Data Source: Frontend and catalog changes Initial frontend and catalog changes for external data sources. Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485	2014-05-08 14:56:19 -07:00
ishaan	50caed17d7	[CDH5] Fix the format option in run-all Previously, the -format option was a no-op. Moreover, run-all would not work without the option. This patch fixes both problems. Change-Id: I4726c03452409322fd0cd864cdb6dd395c4e651a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2449 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-05-07 22:56:38 -07:00
Lenni Kuff	13c794db91	[CDH5] Update dependency versions to CDH5.1.0 This just updates the versions, it doesn't touch anything in /thirdparty. Change parquet version to append SNAPSHOT Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4	2014-05-07 15:10:40 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
Skye Wanderman-Milne	60db4d4d82	CDH-18416: Don't inline ReadWriteUtil::ReadZLong() For wide Avro tables, ReadZLong() would get inlined many times into a single function body, causing LLVM to crash. Not inlining doesn't seem to have a performance impact on narrow tables, and helps with wide tables. This change also adds tests over wide (i.e. many-column) tables. The test tables are produced by specifying shell commands to generate test tables in functional_schema_template.sql, which are executed in generate-schema-statements.py. In the SQL templates, sections starting with a ` are treated as shell commands. The output of the shell command is then used as the section text. This is only a starting point; it isn't currently implemented for all sections, and may have to be tweaked if we use this mechanism for all tables. Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353 Tested-by: jenkins	2014-04-28 15:58:15 -07:00
casey	2351266d0e	Replace single process mini-dfs with multiple processes This should allow individual service components, such as a single nodemanager, to be shutdown for failure testing. The mini-cluster bundled with hadoop is a single process that does not expose the ability to control individual roles. Now each role can be controlled and configured independently of the others. Change-Id: Ic1d42e024226c6867e79916464d184fce886d783 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1432 Tested-by: Casey Ching <casey@cloudera.com> Reviewed-by: Casey Ching <casey@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2297 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-04-23 18:24:05 -07:00
Nong Li	87295a4e06	Decimal implementation. This patch implements decimal support for text based formats. Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238 Tested-by: jenkins	2014-04-14 21:07:32 -07:00
Lenni Kuff	aa0b7a35f5	IMPALA-880: COMPUTE STATS should update partitions in batches When updating partition metadata as part of COMPUTE STATS we would previously attempt to update all partitions at once. This could lead to HMS socket timeouts and also could run into issues if there were > 32K partitions. In this change we now update the partitions in batches, with a max size of 500 partitions per batch. We also compare whether the row count has changed and only update partitions that have been modified. Change-Id: If7bfcc30f86fc2fdd79855b981067ac29a47b5e1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1913 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1918	2014-03-14 19:20:12 -07:00
ishaan	9e043e862c	Fix run-hbase.sh to correctly pick up the classpath. We run wat-for-hbase-master.py after starting hbase to account for a race between the master and region server. This script has not been working for some time. It caused no ill effects since the said race was absent. However, the race has manifested itself again, so the script needs to be fixed. Setting the correct classpath does so. Change-Id: I783a7473cfd24a9cb66711f5428f7052ceb96282 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1756 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-03-05 01:04:56 -08:00
ishaan	00724a47da	Prefix the path to the local core-site to the classpath used by minillama With a recent upstream change, a core-site.xml was introduced in a YARN test jar pulled in by thirdparty. This causes MiniLlama to ignore options set in fe/src/test/resources/core-site.xml. The problem manifests itself with the MiniDfsCluster starting on an arbitrary port, but it would have also caused a lot of tests to fail as none of the compression codecs are pulled in. This change prepends the classpath used by minillama with the path to the internal core-site. Change-Id: Iee267fe12e02301baec059a1f7469288c038d6fa Reviewed-on: http://gerrit.ent.cloudera.com:8080/1739 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-03-04 09:59:50 -08:00
Lenni Kuff	bf16b5cd0d	IMPALA-749: Fetch partitions in batches, rather than all at once. This updates how Impala fetches partition metadata from the Hive Metastore to fetch partitions in batches, rather than all at once. This helps reduce the load on the HMS and also lets Impala scale to above 32K partitions. The downside is that it may require additional RPCs to get all the partitions. This is done by first querying the metastore to get all the partition names that exist, then splitting the list of names into separate batches to get the actual partition metadata. Impala uses a default size of 1000 partitions per batch, but it can be configured by setting the 'hive.metastore.batch.retrieve.table.partition.max' parameter in the hive-site.xml config file. Change-Id: Ide0ec30ef8a9e00f79c26551aa8e5e7814c73034 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1662 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1698	2014-02-28 22:30:45 -08:00
Alex Behm	9cabee4a71	Wait for the Metastore to come up before starting HiveServer2. Change-Id: Ic8e29efe63f6745e1ff44248657cbd7882bb16d9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1626 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1670 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-02-25 21:05:33 -08:00
Alex Behm	8223e1e44b	Avoid Hive replication bug (CDH-17414) by 'warming up' HiveServer2 after it starts. The purpose of this patch is to avoid CDH-17414 which causes data files loaded with Hive to incorrectly have a replication factor of 1. When using beeline this problem only appears to occur immediately after creating the first HBase table since starting HiveServer2, i.e., subsequent loads seem to function correctly. This patch adds a new script that creates an external HBase table in Hive to 'warm up' HiveServer2 immediately after it is started. Subsequent loads should assign a correct replication factor. Change-Id: Ic54c9401b67b748a8848d19f82b8e7df9535e845 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1640 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-02-25 17:33:53 -08:00
Lenni Kuff	b4f5c1edcf	Enable lazy loading of table metadata for the CatalogService/Impalad This change adds support for lazy loading of table metadata to the CatalogService/Impalad. The way this works is that the CatalogService initially sends out an update with only the databases and table names (wrapped as IncompleteTables). When an Impalad encounters one of these tables, it will contact the catalog service to get the metadata, possibly triggering a metadata load if the catalog server has not yet loaded this table. With these changes the catalog server starts up in just seconds, even for large metastores since it only needs to call into the metastore to get the list of tables and databases. The performance of "invalidate metadata" also improves for the same reason. I also picked up the catalog cleanup patch I had to make the APIs a bit more consistent and remove the need for using a LoadingCache for databases. This also fixes up the FE tests to run in a more realistic fashion. The FE tests now run against catalog object received from the catalog server. This actually turned up some bugs in our previous test configuration where we were not running with the correct column stats (we were always running with avgSerializedSize = slotSize). This changed some plans so the planner tests needed to be updated. Still TODO: This does not include the changes to perform background metadata loading. I will send that out as a separate patch on top of this. Change-Id: Ied16f8a7f3a3393e89d6bfea78f0ba708d0ddd0e Saving changes Change-Id: I48c34408826b7396004177f5fc61a9523e664acc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1328 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/1338 Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-21 21:43:29 -08:00
Nong Li	04b501d3a1	[CDH5] Collect metadata for cached blocks. Change-Id: I81026de2f9a08553dc15e07090b8297120aa7462 (cherry picked from commit 69414f67b20016e49b739a46d6e2b4b57e1d1a3c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1252 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-15 15:12:20 -08:00
Nong Li	53d7bbb97a	[CDH5] Impala changes for updated thirdparty components. Changes include: - version changes in impala-config - version changes in various loading scripts - hbase jars are no longer in hive/lib - mini-llama script changes - updates due to sentry api changes - JDBC tests disabled - unsupported types tests disabled. Change-Id: If8cf1b7ad8e22aa4d23094b9a4b1047f7e9d93ee	2014-01-15 15:12:13 -08:00
Alex Behm	c70905628b	Using MiniLlama's --write-hdfs-conf to dump the MiniDfs conf for our test setup. Change-Id: I238f375bda4ef95fa3d5ae9a29bd1dfc2aa3e401	2014-01-15 15:12:06 -08:00
Alex Behm	760750af27	Enforcing reserved memory resources via mem limits. Fixed codepath with rm disabled. Set enable_rm to false by default. Change-Id: I3bf2d0525d91243ec3c0ea048b0c03680befcda2 Conflicts: be/src/runtime/runtime-state.cc	2014-01-15 15:12:05 -08:00
Alex Behm	dc7b398bd3	Impala reserves resources from YARN via LLama. Impala reserves resources from YARN via Llama and handles resources preemptions by cancelling affected queries. Adds the Impala Resource Broker for interacting with Llama. Refactors scheduler and coordinator to move fragment-to-host assignment logic into scheduler. Local test setup uses MiniLLama. Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1	2014-01-15 15:12:04 -08:00
Alex Behm	fc6ecd39e5	[CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting. Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0	2014-01-15 15:11:32 -08:00
Alex Behm	60003ad211	[CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes. Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106	2014-01-15 15:11:23 -08:00
Skye Wanderman-Milne	561da008c7	IMPALA-729: fix resource management in Parquet scanner for multiple row groups We weren't attaching resources to the row batch when starting a new row group, so it was possible for string data to be overwritten. This patch removes CloseStreams() and merges its functionality with AttachCompletedResources() so it's not possible to destroy streams without transferring the resources first. It also merges and removes ScannerContext::Close(). Also adds test cases for IMPALA-720. Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb	2014-01-08 10:56:26 -08:00
Lenni Kuff	fbe79fc47b	Use separate log files for each of our mini-cluster services Also adds a bit more logging on which individual services are starting. Change-Id: I53f12e1825fbf738e2fb8325874c3126e55f3f44 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1147 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:54:37 -08:00
Alex Behm	c6397ca1e3	Revert "Revert to FROM-clause order if any table is lacking stats." This reverts commit 7e84cbe3bab9bf30a57ac58d9ef525ebc10a7b7a. Change-Id: I89d55ca2bcb8eb6eddc244d3e7b005074d04c26a Reviewed-on: http://gerrit.ent.cloudera.com:8080/1104 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:29 -08:00
Alex Behm	df0b28d163	Revert to FROM-clause order if any table is lacking stats. Change-Id: I7d09c0f393e2bfeefa386845fc6bbba4ab6c8812 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1095 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:28 -08:00
Skye Wanderman-Milne	9e17042185	Allow zero bit width dict/RLE decoders. This allows us to read single-value dictionary-encoded columns generated by parquet-mr. Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	de531e15bd	IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8 parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Lenni Kuff	e63cc59a94	Add partitioned tpcds planner tests (SQL-92 style joins) Adds the TPCDS queries as planner tests and fixes a few small issues with the Planner test file parser. This adds the TPC-DS queries using SQL-92 style joins that have a hand optimized (although not perfect) join order. Change-Id: I2d81e66af740b2d826b8ebd0c5ba8553b5faf0a2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1019 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:26 -08:00
Skye Wanderman-Milne	acdc792355	IMPALA-695: Use the local path of Hive UDF jars in the FE. The FE was creating class loaders with the HDFS locations of Hive UDF libs, rather than the local locations created by the BE. Our tests still passed since we only used UDFs already on the classpath (e.g. Hive builtins). Change-Id: Idbe9c98ad6adb84b70cb44efbf9ad0afc53366ca Reviewed-on: http://gerrit.ent.cloudera.com:8080/1081 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Skye Wanderman-Milne	b54d16dabd	IMPALA-679: Append hash of HDFS path to filename in CopyHdfsFile() to avoid collisions. Change-Id: Ia84fa81fe043a9604248d66ed963ef3f91b0601e Reviewed-on: http://gerrit.ent.cloudera.com:8080/1018 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:22 -08:00
Lenni Kuff	0bae3978c9	Update compute-stats.py to execute using Impala Updates our compute stats script to execute using Impala. This allows us to easily compute stats on all tables in a database or all tables in the metastore. The updated stats caused one of the TPCH plans to change so this also updates the TPCH planner test results. Change-Id: I17e5dcd1036a35e40eb4eb2c8e4a20702db9049c Reviewed-on: http://gerrit.ent.cloudera.com:8080/1024 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Lenni Kuff	76fa3b2ded	Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax This change updates our DDL syntax support to allow for using 'STORED AS PARQUET' as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax, but continue to support the old. I made the same change for 'AVROFILE', but since we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax. Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:18 -08:00
Nong Li	c1a64d6863	Add kill-mini-llama to CDH4 branch. This makes it easier to switch between our branches and is a no-op for those of us staying on CDH4. Change-Id: Ic07eb8a7ba7e48db118c06c221aabe5e124f3bfb Reviewed-on: http://gerrit.ent.cloudera.com:8080/1033 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:54:17 -08:00
ishaan	fcdcf1a9d8	Parallelize data loaded through Impala to speed up data loading. Currently, we execute all the queries involved in data loading serially. This change creates a separate .sql file for each file format, compression codec and compression scheme combination, and executes all the files in parallel. Additionally, we now store all the .sql files (independent of workload) in $IMPALA_HOME/data_load_files/<dataset_name>. Note that only data loaded through Impala is parallelized, data loaded through hive and hbase remains serial. On our build machines, the time taken to load all the data from snapshot was on the order of 15 minutes. Change-Id: If8a862c43f0e75b506ca05d83eacdc05621cbbf8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/804 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:53 -08:00
Lenni Kuff	498c2529d4	Test CR: Change spacing in run-all.sh Change-Id: I2362799213a7faca3892e38fb874bfbbd0c1718f Reviewed-on: http://gerrit.ent.cloudera.com:8080/803 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:50 -08:00
Lenni Kuff	8b2acf5c22	IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables With this change we now detect if a table is read-only and disable INSERT/LOAD operations on these tables. A table is read-only if Impala does not have write permission on the HDFS base directory of the table or any one of the partition directories (if the table is partitioned). Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/713 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:37 -08:00
Alex Behm	51e914e911	Use hive-exec instead of hive-builtin because hive-builtin does not exist in CDH5 Hive. Change-Id: I11993c7eebc9f5f07f112810d7e81d07ce157193 Reviewed-on: http://gerrit.ent.cloudera.com:8080/715 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:53:33 -08:00


194 Commits