impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Tim Armstrong	4b5ad8cbfd	Reduce log output for postgres db operations Various test scripts operating on postgres databases output unhelpful log messages, including "ERROR" messages that aren't actual errors when trying to drop a database that doesn't exist. Send useless output to /dev/null and consistently use \|\| true to ignore errors from dropdb. Change-Id: I95f123a8e8cc083bf4eb81fe1199be74a64180f5 Reviewed-on: http://gerrit.cloudera.org:8080/1753 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-01-13 03:58:50 +00:00
Casey Ching	e2bfb6ae2f	Misc improvements to shell scripts about error reporting Changes: 1) Consistently use "set -euo pipefail". 2) When an error happens, print the file and line. 3) Consolidated some of the kill scripts. 4) Added better error messages to the load data script. 5) Changed use of #!/bin/sh to bash. Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1 Reviewed-on: http://gerrit.cloudera.org:8080/1620 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-12-17 18:25:27 +00:00
casey	c56ba5149c	Infra scripts: Only attempt to kill processes owned by the current user This is for compatibility with docker containers. Before this patch, when the scripts were run on the docker host, the scripts would try to kill the mini-cluster in the docker containers and fail because they didn't have permissions (the user is different). Now the scripts will only try to kill mini-cluster processes that were started by the current user. Also some psutil availability checks were removed because psutil is now provided by the python virtualenv. Change-Id: Ida371797bbaffd0a3bd84ab353cb9f466ca510fd Reviewed-on: http://gerrit.cloudera.org:8080/1541 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-12-17 12:08:33 +00:00
Vlad Berindei	b6c20b2a40	Allow Impala to run against local filesystem. Allow Impala to start only with a running HMS (and no additional services like HDFS, HBase, Hive, YARN) and use the local file system. Skip all tests that need these services, use HDFS caching or assume that multiple impalads are running. To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has permissions since this is the location where the test data will be extracted. Test coverage (with core strategy) in comparison with HDFS and S3: HDFS 1348 tests passed S3 1157 tests passed Local Filesystem 1161 tests passed Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03 Reviewed-on: http://gerrit.cloudera.org:8080/1352 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins Readability: Alex Behm <alex.behm@cloudera.com>	2015-12-05 06:48:32 +00:00
Taras Bobrovytsky	22df1fe1ca	Random nested schema and data generation Change-Id: Ie89f140ed389cd877a84ffe2df892853ac9897f2 Reviewed-on: http://gerrit.cloudera.org:8080/1167 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-11-14 05:19:32 +00:00
Sailesh Mukil	277a92a14a	IMPALA-2479: Failure in TestParquet.test_verify_runtime_profile The test_verify_runtime_profile test failed during C5.5 builds and GVMs because this test relies on the table lineitem_multiblock to have 3 blocks. However, due to the rules to load the data not being followed in the functional_schema_template.sql file, the table ended up being stored with only one block. This change moves the data load to the end of create-load-data.sh file which would load the data even for snapshots. Change-Id: I78030dd390d2453230c4b7b581ae33004dbf71be Reviewed-on: http://gerrit.cloudera.org:8080/1153 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2015-10-08 15:16:35 -07:00
ishaan	1beb8cc36d	Increase Hive's heap size while writing nested tpch. Recently, the full data load started failing because Hive ran out of heap space while writing the nested tpch tables. This patch simply bumps up the heap space, and the query is now successfull. Change-Id: I92d0029659c41417d76a15f703df1d42e5187d5e Reviewed-on: http://gerrit.cloudera.org:8080/776 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-09 10:32:06 +00:00
Dan Hecht	4457eb2df3	IMPALA-2278: fix generate-schema-statements-py --force-reload for text/lzo The combination of --force and text/lzo was broken if the partition directories already contained data. For reasons explained in the comments, the ALTER TABLE ADD PARTITION step is skipped in this case, which causes HIVE to not do a full overwrite with INSERT OVERWRITE. Fix it by manually removing the directories. Testing: Verified the following combinations of load-data.py for text/lzo now work: {--force, ""} x {no partition dirs, partition dirs with files} Change-Id: I3ee34c4d85c58644345eadd8fc0976665c1bbaf5 Reviewed-on: http://gerrit.cloudera.org:8080/752 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-09-05 19:20:22 +00:00
ishaan	4b007666eb	IMPALA-2302: Use a permitted value for parquet.block.size while loading nested tpch. Due to a possible change in behaviour in Hive/MR, it is no longer possible to use arbitrarily large values for parquet.block.size. This breaks the loading of nested tpch data on newer Hive. This patch addresses the problem by using a permissble value. Change-Id: Ib5b14651fb579cec6aa8d45bd2253cecb4346eb9 Reviewed-on: http://gerrit.cloudera.org:8080/755 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-09-05 02:05:11 +00:00
Alex Behm	9d46853fbc	Nested Types: Check un/supported file formats for complex types. Before this patch, we used to accept any query referencing complex types, regardless of the table/partition's file format being scanned. We would ultimately hit a DCHECK in the BE when attempting to scan complex types of a table/partition with an unsupported format. This patch makes queries fail gracefully during planning if a scan would access a table/partition in a format for which we do not support complex types. For mixed-format partitioned Hdfs tables we perform this check at the partition granularity, so such a table can be scanned as long as only partitions with supported formats are accessed. HBase tables with complex-typed columns can be scanned as long as no complex-typed columns are accessed in the query. Change-Id: I2fd2e386c9755faf2cfe326541698a7094fa0ffc Reviewed-on: http://gerrit.cloudera.org:8080/705 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-01 03:26:53 +00:00
Taras Bobrovytsky	b8b7930377	Add nested types support to Create Table Like File Add support for creating a table based on a parquet file which contains arrays, structs and/or maps. Change-Id: I56259d53a3d9b82f318228e864c783b48a03f9ae Reviewed-on: http://gerrit.cloudera.org:8080/582 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-08-22 01:46:26 +00:00
Taras Bobrovytsky	3c9ceb1a2b	Add Parquet nested schemas to testdata A script is added that generates two parquet files with nested data. One file has modern nested types encoding and the other one has legacy encoding. This data will be used for testing nested types support for "create table like file" statement. Change-Id: I8a4f64c9f7b3228583f3cb0af5507a9dd4d152ef Reviewed-on: http://gerrit.cloudera.org:8080/610 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-13 10:25:39 +00:00
Casey Ching	d202d6a967	Use "impala-python" (virtualenv) instead of system python Python tests and infra scripts will now use "python" from the virtualenv via $IMPALA_HOME/bin/impala-python. Some scripts could be simplified now that python 2.6 and a dependable set of third-party libraries are available but that is not done as part of this commit. Change-Id: If1cf96898d6350e78ea107b9026b12ba63a4162f Reviewed-on: http://gerrit.cloudera.org:8080/603 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-08-06 02:09:09 +00:00
Alex Behm	c908ba1b7e	IMPALA-1136: Support loading Avro tables without an explicit Avro schema Hive allows creating Avro tables without an explicit Avro schema since 0.14.0. For such tables, the Avro schema is inferred from the column definitions, and not stored in the metadata at all (no Avro schema literal or Avro schema file). This patch adds support for loading the metadata of such tables, although Impala currently cannot create such tables (expect a follow-on patch). Change-Id: I9e66921ffbeff7ce6db9619bcfb30278b571cd95 Reviewed-on: http://gerrit.cloudera.org:8080/538 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-31 12:13:37 +00:00
Alex Behm	1b6f14ab16	Nested Types: Compute stats for the nested TPCH database. Change-Id: I7b2b77de1a9c25c2a5d9849b62437a58a18bdaae Reviewed-on: http://gerrit.cloudera.org:8080/506 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-07-21 18:48:17 +00:00
Taras Bobrovytsky	704e3fa6bf	Add loading by partitions option to the loaded_nested script When loading a large nested table using the GROUP_CONCAT function, Impala runs out of memory. We prevent this from happening by adding an option to partition the table and load one partition at a time. Change-Id: I8d517f94ef97e98d36eb8ebc8180865023655114 Reviewed-on: http://gerrit.cloudera.org:8080/448 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-07-02 03:34:53 +00:00
ishaan	377214c469	Use Isilon as the default file system when running Isilon tests. This patch enables running Impala tests against Isilon as the default file system. The intention is to run tests against a realistic deployment, i.e, Isilon replacing HDFS as the underlying filesystem. Specifically, it does the following: - Adds a new environment variable DEFAULT_FS, which points to HDFS by default. - Makes the fs.defaultFs property in core-site.xml use the DEFAULT_FS environment variable, such that all clients talk to Isilon implicitly. - Unset FILESYSTEM_PREFIX when the TARGET_FILESYSTEM is Isilon, since path prefixes are no longer needed. - Only starts the Hive Metastore and the Impala service stack when running tests against Isilon. We don't start KMS/HBase because they're not relevant to Isilon. We also don't start YARN, Hive and LLama because hive queries are disabled with Isilon. The scripts that start/stop Hive, YARN and Llama should be modified to point to a filesystem other than HDFS in the future. Change-Id: Id66bfb160fe57f66a64a089b465b536c6c514b63 Reviewed-on: http://gerrit.cloudera.org:8080/449 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-06-11 01:23:11 +00:00
Casey Ching	060f08ef69	Add tpch_nested_parquet database The database will be used for testing in the future. Change-Id: I60b54b36db9493a5bea308151b4027cd47d73047 Reviewed-on: http://gerrit.cloudera.org:8080/400 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-06-04 21:18:36 +00:00
ishaan	dbc78aaa2c	Enable isilon end to end tests for Impala. This patch introduces changes to run tests against Isilon, combined with minor cleanup of the test and client code. For Isilon, it: - Populates the SkipIfIsilon class with appropriate pytest markers. - Introduces a new default for the hdfs client in order to connect to Isilon. - Cleans up a few test files take the underlying filesystem into account. - Cleans up the interface for metadata/test_insert_behaviour, query_test/test_ddl On the client side, we introduce a wrapper around a few pywebhdfs's methods, specifically: - delete_file_dir does not throw an error if the file does not exist. - get_file_dir_status automatically strips the leading '/' Change-Id: Ic630886e253e43b2daaf5adc8dedc0a271b0391f Reviewed-on: http://gerrit.cloudera.org:8080/370 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-05-27 22:25:12 +00:00
Alex Behm	1bd3eca22f	Quietly resolve dependencies in Jenkins runs to avoid log spew. Change-Id: If38a683785f3c6c9d92f762a2dfd86f009ce9d84 Reviewed-on: http://gerrit.cloudera.org:8080/392 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-19 09:12:43 +00:00
Alex Behm	013d6f968f	Clean up FE pom.xml to eliminate console spew. This patch makes the following changes in our pom to reduce the build time and signficantly reduce console spew. 1. Remove jar-with-dependencies from package goal. We have no need for creating an uber jar that contains the FE as well as all its dependencies. Locally, we carefully construct our class path manually (relying on copy-dependencies), and in Impala deployments the FE jar is put together with the other dependencies, so the FE jar does not need to be self-contained. 2. Silence copy-dependencies. Changes the configuration of the maven-dependency-plugin to not log every copied file to the console. Change-Id: If351e4e800fd1ca1108f9a0f4d88f52a53fc211c Reviewed-on: http://gerrit.cloudera.org:8080/378 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-05-18 07:20:07 +00:00
ishaan	058978dccb	Enable using isilon as the underlying filesystem. This patch enables the Impala test suite to run the end to end tests against an isilon namenode. There are a few caveats: - The fe test will currently not work. - Only loading data from both the test-warehouse snapshot and the metadata snapshot is supported. - The test suite cannot be run by multiple people (unless we have access to multiple isilon namenodes) Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237 Reviewed-on: http://gerrit.cloudera.org:8080/356 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-05-12 01:28:19 +00:00
Casey Ching	6f1ce232f4	Use java from JAVA_HOME Various build and test machines have multiple versions of java installed and relying on the default "java" command being compatible isn't practical (a machine may also build an older version of Impala that might require a different java version). Since JAVA_HOME is already required that can/should be used to determine which java binary to use. This also includes a minor change to replace a block of code that was using 4-space indent. Instead of using 2-space indent, that block was replaced with one line. Change-Id: I4b8698b2aa5411b5fa6c5bc06291625999478955 Reviewed-on: http://gerrit.cloudera.org:8080/310 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-04-03 00:13:22 +00:00
ishaan	73d7ab11e1	Compute stats for tpch parquet tables while loading the data. This patch removes the logic from the python test file, it should really live in the code that sets up the test-warehouse. Change-Id: Id04dc90c7ab813af2f347ec79e9e43d76de794a2 Reviewed-on: http://gerrit.cloudera.org:8080/224 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Tested-by: Internal Jenkins	2015-03-12 17:49:55 -07:00
Dan Hecht	2916132283	S3: enable more tests for S3 As needed, fix up file paths and other misc things to get more test cases running against S3. Change-Id: If4eaf9200f2abd17074080a37cd0225d977200ad Reviewed-on: http://gerrit.cloudera.org:8080/167 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-03-11 16:39:39 -07:00
ishaan	4a9adfd685	Fix the full data load build by not hardcoding the lzo index file's name. After the hive/hdfs rebase, the indexed lzo file names changed. This patch uses a wildcard rather than a specific file name to protect against such changes. It's safe because the test simply expects a partition that does not have index files. Change-Id: I6d32609b62df83fe2a8ef935d7ca6506ecff5e0d Reviewed-on: http://gerrit.cloudera.org:8080/150 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Internal Jenkins	2015-03-05 09:52:34 +00:00
ishaan	21d24f5295	Infrastructure changes to enable the hive version change from 0.13.1 to 1.1.0 Specifically: - Hive needs some jars from hadoop/tools/lib - Hive has an dependency on apache.snapshots ( added in fe/pom.xml ) - Beeline has to explicitly told not to use jline. Change-Id: Id38956b748f8f667a39505c92355f0298f308718 Conflicts: testdata/bin/load-hive-builtins.sh	2015-02-23 20:27:13 -08:00
Matthew Jacobs	835d6dbef4	IMPALA-1209: Add KMS service to testdata cluster (pt1) First change for IMPALA-1209 to address Impala limitations when using HDFS encryption. This adds a KMS process to the testdata cluster. This was tested manually by creating a key and an encryption zone. Change-Id: I499154506386f04e71c5371b128c10868b1e1318 Reviewed-on: http://gerrit.cloudera.org:8080/41 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Internal Jenkins	2015-02-13 20:46:14 +00:00
ishaan	b01252267a	Fix the hive environment breakage caused by a malformed environment variable. We build some of the jars that hive needs fe/target/. A recent change resulted in these jars not being loaded, causing a bad hive environment. This patch restores proper behaviour. Change-Id: Icb27ab04f7f77cb4ddab51326eedfd11a6cdf960 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5930 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2015-02-04 02:51:00 -08:00
ishaan	2386fb84a8	Enable the data loading infrastructure to switch the underlying file system. This patch enables loading data to s3 instead of hdfs. It is preliminary in nature, as such, there are a few caveats: - The fe tests do not work. - Only loading from a test-warehouse snapshot and metastore snapshot is enabled. - Until hive works with s3, only a subset of all the tests will work. Change-Id: Ia66a5f836b4245e3b022a49de805eec337a51324 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5851 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2015-02-03 01:02:42 -08:00
ishaan	5ac46af786	Fix the full data load path by explicitly creating the test-warehouse directory in hdfs. Previously, when we started all the services, we created an HBase table from hive to avoid a replication bug. This had the side-effect of creating a test-warehouse directory in hdfs. After that check was removed, we no longer create the test-warehouse, causing the full-data-load build to fail. Change-Id: I75479562d33e08c79ad155c615cecb5b91c0eab6 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5904 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2015-02-03 00:51:49 -08:00
Alex Behm	762cae3fb9	Remove Hive-HBase warmup script because the original CDH-17414 issue has been resolved. At the time when CDH-17414 was filed, the issue could be reproduced very reliably. The issue seems to have been fixed, so our crufty workaround is no longer needed. Change-Id: Ib31ac8f862ab2d06ebfc8656ce49b1b43fe301e8 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5892 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2015-01-30 19:55:54 -08:00
ishaan	07efc0cb17	Add the ability to only reload the metastore snapshot in buildall and misc. changes. This commit adds the ability to only load the metastore snapshot, with the assumption that the hdfs data is already loaded. It also additionally adds the ability to specify some buildall parameters via the environment. Change-Id: I4a07d4cf3a63479c377d4be79c4a2140c2a52fb8 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5665 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2015-01-09 12:40:06 -08:00
ishaan	dee6911b20	Enable loading metadata from the hive metastore snapshot and cleanup build scripts. This patch contains the following changes: - Add a metastore_snapshot_file parameter to build.sh - Enable skipping loading the metadata. - create-load-data.sh is refactored into functions. - A lot of scripts source impala-config, which creates a lot of log spew. This has now been muted. - Unecessary log spew from compute-table-stats has been muted. - build_thirdparty.sh determins its parallelism from the system, it was previously hard coded to 4 - Only force load data of the particular dataset if a schema change is detected. Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-12-19 13:41:00 -08:00
ishaan	0ff8c99068	Fix generate-schema-statements to account for changed formatting in the hdfs client. Change-Id: I4af5863bc0dd6660aef65e0e9b498002fc45edb8	2014-12-16 11:28:08 -08:00
ishaan	09b97f3881	Add the ability to load a metastore snapshot file. This patch includes the following changes: - Modifies buildall to accept a hive metastore snapshot file as an argument. - Adds a script to load the hive metastore snapshot. Change-Id: I7b9fc5b0643afe62fd4739a81eaa3bf9af1630da Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5510 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-12-08 18:16:45 -08:00
casey	4915ea4ac9	IMPALA-1134: Use copyBytes() to get value from o.a.h.io.Text This affects java UDFs. Previously it was possible that the length of the string returned from a java udf didn't match the actual data. Per the Text.getBytes() documentation "... only data up to getLength() is valid.". Impala just needs to use copyBytes() which is a convenience function for this situation. The same should be done for BytesWritable. Before: Query: select length(echo('12345678901234567890')) +-------------------------------------------+ \| length(java.echo('12345678901234567890')) \| +-------------------------------------------+ \| 22 \| +-------------------------------------------+ After: Query: select length(echo('12345678901234567890')) +-------------------------------------------------+ \| length(functional.echo('12345678901234567890')) \| +-------------------------------------------------+ \| 20 \| +-------------------------------------------------+ Change-Id: If9671278df8abf7529d3bc470c5f9d037ac3da1b Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4897 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins	2014-11-17 15:02:24 -08:00
Martin Grund	f58159d431	[CDH5] IMPALA-1141: HBase Planner Performance This patch improves the performance of the planning phase of a query querying HBase tables. It removes an unnecessary second call to compute stats and adds a new version for estimating the row count in a table. This patch adds an incremental version to estimate the number of rows for a set of regions. This incremental version will start querying up to five regions to calculate the average row size and use this value to estimate the row count based on the size of the regions on disk. Only if the standard deviation from the average is larger than 15% query an additional region, it will query additional regions to calculate an average with more confidence. If the data is balanced it will not be necessary to retrieve data from all regions but only from a subset. In the worst case, all regions are queried. Change-Id: Idcb3bea81b11cb08da6d9329ba66c86aca23e170 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5258 Tested-by: jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2014-11-14 13:47:02 -08:00
Skye Wanderman-Milne	4a722980e5	IMPALA-1401: raise MAX_PAGE_HEADER_SIZE and use scanner context to stitch together header buffer Change-Id: I4f33b90e845e9bef1ac929bf4ebb8e98eaff985c Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4961 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins (cherry picked from commit c3a90183b2f03434a9604f3aa2ef6dd08c9ba97c) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4981 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-10-27 16:30:56 -07:00
Lenni Kuff	758ba08bbb	Silence most of data loading spew by redirecting it to log files Change-Id: I256a3970ce52bbcac816178029f703095fec388f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4610 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-10-06 15:09:42 -07:00
Victor Bittorf	a3767c9f2b	Fix data loading to unblock gvm Change-Id: I5e145f1e8497d340cb72a8112c247e63b1c79362 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4537 Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: Victor Bittorf <victor.bittorf@cloudera.com>	2014-09-26 12:26:37 -07:00
Victor Bittorf	af4b2086dc	Char PARQUET, AVRO, and TEXT tests Adds fixes and tests for Hive CHAR & VARCHAR compatibility. Also fixes a bug in tuple materialization for VARCHAR and non in-lined CHAR. Change-Id: I400b089cb8ddba2e264ef9f2e37956b2ceaaf9fb Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4054 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-09-26 12:24:07 -07:00
Henry Robinson	6bc411c890	Add support for HS2 protocol V6 This patch adds support for V6 of the HS2 protocol, which notably includes columnar organisation of result sets. Clients that set their protocol version to < V6 will receive result sets in the traditional row orientation. The performance of fetches over HS2 goes up significantly as a result, since the V1 protocol had some pathologies in its deserialisation performance. Beeswax Row materialisation: 455ms, client processing time: 523ms HS2 V6: Row materialisation: 444ms, client processing time: 1.8s HS2 V1: Row materialisation: 585ms, client processing time: 15.9s (!) TODO: Add support for the CHAR datatype The following patch is also included: Fix wait-for-hiveserver2.py when Impala moves to HS2 V6 Due to HIVE-6050, older versions of Hive are not compatible with newer clients (even those that try to use old protocol versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2 service, but picks up the newer version from V6, and fails. This patch temporarily re-adds cli_service.thrift (renaming the Thrift service as LegacyTCLIService) only for wait-for-hiveserver2.py to use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid of this change. Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2014-09-18 20:17:18 -07:00
Mike Yoder	d1e83f8280	Support for simultaneous LDAP and Kerberos authentication. Prior to this work, the impalad could either authenticate with Kerberos, or authenticate with LDAP. This fixes that so that both can co-exist in the same daemon. Prior code had both a KerberosAuthProvider and an LdapAuthProvider; this is refactored into a single SaslAuthProvider that potentially contains both LDAP and Kerberos. The terminology of "client facing" and "server facing" has been replaced with "external" and "internal". External is for clients like the impala shell, odbc, jdbc, etc. Internal is for daemon <-> daemon communication. The notion of the "auxprop" plugin is removed, as that was dead code. The Thrift code is enhanced to pass the Realm information from the SaslAuthProvider down to the underlying SASL library. Change-Id: I0a0b968a107c0b25610ca37295c3fee345ecdd6d Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4051 Reviewed-by: Michael Yoder <myoder@cloudera.com> Tested-by: jenkins	2014-09-18 12:54:45 -07:00
Lenni Kuff	4ecaf036ba	[CDH5] Updates to support running Sentry Service via its service start scripts in /thirdparty We previously had a wrapper script that started Sentry Service up in our test environment. This ran in to some issue with the upgrade to Sentry v1.4 due to classpath conflicts with other components. The fix is to add sentry to /thirdparty and use Sentry's startup scripts to actually run the service. This is also a more realistic test environment. The actual addition of Sentry to /thirdparty is not included in this change. Change-Id: I4c5998cde4fc900b8a34037550459265298da4c4	2014-09-12 22:48:51 -07:00
Lenni Kuff	4d44dac8cf	[CDH5] Fix Zookeeper classpath issue that caused failures in wait-for-hbase-master.py Zookeeper was not executing properly in wait-for-hbase-master due to issue with the classpath that is generated by set-classpath.sh. To fix this problem, set the classpath using "hadoop classpath". This allows us to properly wait for hbase-master to come online. Change-Id: Ic39be1f5ab7997e74092471042ca88f335115be0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4255 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-09-10 17:47:07 -07:00
Mike Yoder	75a97d3d7e	[CDH5] Kerberize mini-cluster and Impala daemons This is the first iteration of a kerberized development environment. All the daemons start and use kerberos, with the sole exception of the hive metastore. This is sufficient to test impala authentication. When buildall.sh is run using '-kerberize', it will stop before loading data or attempting to run tests. Loading data into the cluster is known to not work at this time, the root causes being that Beeline -> HiveServer2 -> MapReduce throws errors, and Beeline -> HiveServer2 -> HBase has problems. These are left for later work. However, the impala daemons will happily authenticate using kerberos both from clients (like the impala shell) and amongst each other. This means that if you can get data into the mini-cluster, you could query it. Usage: * Supply a '-kerberize' option to buildall.sh, or * Supply a '-kerberize' option to create-test-configuration.sh, then 'run-all.sh -format', re-source impala-config.sh, and then start impala daemons as usual. You must reformat the cluster because kerberizing it will change all the ownership of all files in HDFS. Notable changes: * Added clean start/stop script for the llama-minikdc * Creation of Kerberized HDFS - namenode and datanodes * Kerberized HBase (and Zookeeper) * Kerberized Hive (minus the MetaStore) * Kerberized Impala * Loading of data very nearly working Still to go: * Kerberize the MetaStore * Get data loading working * Run all tests * The unknown unknowns * Extensive testing Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019 Reviewed-by: Michael Yoder <myoder@cloudera.com> Tested-by: jenkins	2014-09-05 12:36:21 -07:00
Dan Hecht	af956e24ef	IMPALA-1143: Tests don't succeed when just running buildall.sh. There are at least two problems: 1) generate-schema-statements.py wasn't putting a newline on the very last insert stmt, and beeline apparently was then ignoring it. 2) If HBaseTestDataRegionAssignment fails, then we reload a couple tables, but were not recomputing stats for those tables. And some query-tests expect those tables to have stats. Tesing: Ran the following commands and see that the tables are now not-empty and include stats: $IMPALA_HOME/bin/load-data.py -w functional-query \ --table_names=alltypesagg,alltypessmall --table_formats=hbase/none --force $IMPALA_HOME/tests/util/compute_table_stats.py --db_names=functional_hbase \ --table_names=alltypesagg,alltypessmall Change-Id: I5183e037d0f5499c81b79f2cc1060b71be2d4873 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3794 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins (cherry picked from commit 306b87b37edbf10fa4b89ed2206484e158cc8e0d) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3802 Reviewed-by: Daniel Hecht <dhecht@cloudera.com>	2014-08-12 01:21:14 -07:00
Lenni Kuff	286e312460	[CDH5] Minor code changes for Hive .13 support Changes include: * Fix compile errors due to new column stats API and other stats related fixes. * Temporarily disable JDBC tests due to new serialization format in Hive .13 * Disable view compatibility tests until we can get them to work in Hive .13 * Test fixes due to Hive's type checking for partition column values Change-Id: I05cc6a95976e0e037be79d91bc330a06d2fdc46c	2014-08-11 09:53:02 -07:00
Ippokratis Pandis	7701c651a4	Reading compressed text. Adding the ability to read compressed text. Reading the compression type from the file descriptors. Trying to homogenize a bit more the interface of the scanners. Removing the LZO_TEXT file format, since it was not actually a file format. Modifying the tests to load and test also text/{snap,gzip,bzip} databases. Note that this patch requires some changes to Impala-lzo as well. Change-Id: Ic0742ba11f106ba545050bdb71795efbff70ef74 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3549 Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ippokratis Pandis <ipandis@cloudera.com> Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3651 Tested-by: jenkins	2014-07-29 20:28:36 -07:00

1 2 3 4 5

244 Commits