This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite, which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior changes only in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affected
exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are left unchanged; it is possible that some of
these also don't handle exploration_strategy() as expected, but
checking those tests individually is out of scope for this patch.
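A minimal sketch of the new default and an override (the subclass here
is hypothetical; ImpalaTestSuite and get_workload() are real):

  class ImpalaTestSuite(object):
    @classmethod
    def get_workload(cls):
      return 'functional-query'

  # A suite that genuinely needs another workload still overrides it:
  class HypotheticalTpchSuite(ImpalaTestSuite):
    @classmethod
    def get_workload(cls):
      return 'tpch'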
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch replaces create_beeswax_client() references with
create_hs2_client() or vector-based client creation to prepare for the
HS2 test migration.
test_session_expiration_with_queued_query is changed to use impala.dbapi
directly from Impyla due to a limitation in ImpylaHS2Connection.
TestAdmissionControllerRawHS2 is migrated to use hs2 as the default test
protocol.
Modify test_query_expiration.py to set query options through the client
instead of a SET query. test_query_expiration is slightly modified due
to a behavior difference in the hs2 ImpylaHS2Connection.
Remove remaining references to BeeswaxConnection.QueryState.
Fix a bug in ImpylaHS2Connection.wait_for_finished_timeout().
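For reference, a minimal sketch of the direct impala.dbapi usage
mentioned above (host and port are placeholders):

  from impala.dbapi import connect

  conn = connect(host='localhost', port=21050)  # HS2 endpoint placeholders
  cursor = conn.cursor()
  cursor.execute('select 1')
  print(cursor.fetchall())
  conn.close()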
Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
| xargs -I {} impala-flake8 {} \
| grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...
Testing:
- Pass exhaustive tests.
Change-Id: I1d84251835d458cc87fb8fedfc20ee15aae18d51
Reviewed-on: http://gerrit.cloudera.org:8080/22700
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updates the Ozone dependency to 1.3.0 to address HDDS-7135 and enables
cache_ozone_file_handles by default for a ~10% improvement on TPC-DS
query time.
Updates the Ozone CDP dependency for HDDS-8095. The fix will be
available in Ozone 1.4.0, so testing with TDE currently requires the CDP
build.
Testing:
- ran backend, e2e, and custom cluster test suites with Ozone
Change-Id: Icc66551f9b87af785a1c30b516ac39f4640638fe
Reviewed-on: http://gerrit.cloudera.org:8080/19573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 changes built-in functions such as range, map, and filter
to be lazy, returning iterator-like objects instead of lists. Code
that expects an immediate list will fail, e.g.:
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
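A minimal illustration of the converted pattern (assumes the future
package is installed on Python 2):

  from builtins import range, filter  # Python-3-style lazy builtins

  evens = list(filter(lambda x: x % 2 == 0, range(10)))
  assert evens == [0, 2, 4, 6, 8]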
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
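A minimal example of the emulated behavior:

  from __future__ import absolute_import, division, print_function

  print(7 / 2)   # 3.5: true division, as in Python 3
  print(7 // 2)  # 3: integer division where an index or count is needed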
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for OSS (Aliyun Object Storage Service).
Using the hadoop-aliyun connector, the implementation is similar to
other remote FileSystems.
Tests:
- Prepare:
Initialize OSS-related environment variables:
OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
Compile and create hdfs test data on an ECS instance. Upload test data
to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
Remove some hdfs caching params. Run CORE tests.
Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables support for caching remote file handles for Ozone. Local file
handles were already cached unintentionally, similar to HDFS. Updates
file handle cache enablement to be more stringent about enabling
caching.
File handle caching is enabled if max_cached_file_handles is non-zero
and any of the following holds (a sketch follows the list):
- HDFS file is local
- HDFS file is remote and cache_remote_file_handles is enabled
- Ozone file is local or remote and cache_ozone_file_handles is enabled
- S3 file is remote and cache_s3_file_handles is enabled
- ABFS file is remote and cache_abfs_file_handles is enabled
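A sketch of that predicate (the function and its parameters are
illustrative; the flag names are the real ones):

  # Illustrative only; mirrors the enablement rules listed above.
  def can_cache_file_handle(fs, is_local, flags):
    if flags.max_cached_file_handles == 0:
      return False
    if fs == 'hdfs':
      return is_local or flags.cache_remote_file_handles
    if fs == 'ozone':
      return flags.cache_ozone_file_handles
    if fs == 's3':
      return flags.cache_s3_file_handles
    if fs == 'abfs':
      return flags.cache_abfs_file_handles
    return False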
Enables testing Ozone in test_hdfs_fd_caching, and adds tests that
remote caching can be disabled using individual flags.
Change-Id: I9df13208999c6d3b14f4c005a91ee2a92a05bdf9
Reviewed-on: http://gerrit.cloudera.org:8080/18853
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.
Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using the
hadoop-cos connector, the implementation is similar to other remote
FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_no_fd_caching_on_cached_data has been flaky because not all of
the data was fully cached in the warm-up phase. There is a limit on
concurrency when writing to the cache, so we may fail to cache data
the first time we read it. This patch fixes the test by repeating the
warm-up query 5 times. It also adds proper start_args to the test so
that each impalad writes its data cache file to its own directory.
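A sketch of the repeated warm-up (the helper and query are
illustrative):

  WARMUP_ATTEMPTS = 5  # repeat so the throttled cache writes eventually land

  def warm_up_data_cache(suite, query):
    # 'suite' is an ImpalaTestSuite instance; names here are illustrative.
    for _ in range(WARMUP_ATTEMPTS):
      suite.execute_query(query)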
Testing:
- Loop the test manually 100 times and see no more failures.
Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Reviewed-on: http://gerrit.cloudera.org:8080/17054
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When reading from the data cache, the disk IO thread first gets a file
handle, then checks the data cache for a hit. The file handle is only
used on a data cache miss; on a hit it goes unused and becomes pure
overhead. This patch moves the file handle retrieval to happen only
after a data cache miss.
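A sketch of the reordered read path (the real code is C++ in the disk
IO manager; these interfaces are illustrative):

  def read_range(data_cache, handle_cache, rng):
    data = data_cache.lookup(rng.file, rng.offset, rng.length)
    if data is not None:
      return data  # cache hit: no file handle is touched
    handle = handle_cache.get_handle(rng.file)  # fetched only on a miss
    return handle.pread(rng.offset, rng.length)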
Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.
Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Reviewed-on: http://gerrit.cloudera.org:8080/16963
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is essentially a revert of IMPALA-8178. HDFS-14308 added
CanUnbuffer support to the EC input stream APIs in the HDFS client lib.
This patch enables file handle caching for EC files.
Testing:
* Ran core tests against an EC build (ERASURE_CODING=true)
Change-Id: Ieb455eeed02a229a4559d3972dfdac7df32cdb99
Reviewed-on: http://gerrit.cloudera.org:8080/16567
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Like IMPALA-8428, but for ABFS, instead of S3A. Adds support for adding
ABFS file handles to the file handle cache. Support for ABFSInputStream
unbuffer operations was added in HADOOP-16859.
Ran a full table scan of a 1GB store_sales table on ABFS, made
sure the file handles were cached (validated via the runtime
profile); did this multiple times, against several different
copies of the store_sales table, in order to increase the number
of file handles cached by an impalad.
Tested:
* Tested against an ABFS storage account I have access to
Change-Id: I64f12f832980f4e0207af78368402dd09e370fc3
Reviewed-on: http://gerrit.cloudera.org:8080/16532
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch is based on work done by Joe McDonnell. This change adds
support for caching file handles from S3. It adds a new configuration
flag 'cache_s3_file_handles' (set to true by default) which controls
whether caching of S3 file handles is enabled.
The S3 file handle cache is dependent on HADOOP-14747 (S3AInputStream to
implement CanUnbuffer). HADOOP-14747 adds support for hdfsUnbufferFile
to S3A streams. The call to unbuffer closes the underlying S3 object
stream. Without this change the S3 file handle cache would quickly cause
an impalad to crash because all S3 file handles in the cache would have
a dangling HTTP(S) connection open to S3.
Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as
remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle
cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and
validated that the S3 file handle cache works as expected
Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
Reviewed-on: http://gerrit.cloudera.org:8080/13221
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.
Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch schedules HDFS EC files without considering locality. Failed
tests are disabled and a Jenkins build should succeed with export
ERASURE_CODING=true.
Testing: It passes core tests.
Cherry-picks: not for 2.x.
Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, a file handle in the file handle cache will
only be evicted if the cache reaches its capacity. This
means that file handles can be retained for an indefinite
amount of time. This is true even for files that have
been deleted, replaced, or modified. Since a file handle
maintains a file descriptor for local files, this can
prevent the disk space from being freed. Additionally,
unused file handles are wasted memory.
This adds code to evict file handles that have been
unused for longer than a specified threshold. A thread
periodically checks the file handle cache to see if
any file handle should be evicted. The threshold is
specified by 'unused_file_handle_timeout_sec'; it
defaults to 6 hours.
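A sketch of the periodic check (the cache interface is illustrative;
the flag name is the real one):

  import time

  # Run periodically by the eviction thread.
  def evict_expired_handles(cache, unused_file_handle_timeout_sec):
    now = time.time()
    for entry in cache.entries():
      if (not entry.in_use and
          now - entry.last_use_time > unused_file_handle_timeout_sec):
        cache.evict(entry)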
This adds a test to custom_cluster/test_hdfs_fd_caching.py
to verify the eviction behavior.
Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e
Reviewed-on: http://gerrit.cloudera.org:8080/7640
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes three issues:
1. File handle caching is expected to be disabled for
remote files (using exclusive HDFS file handles),
however the file handles are still being cached.
2. The retry logic for exclusive file handles is broken,
causing the number of open files to be incorrect.
3. There is no test coverage for disabling the file
handle cache.
To fix issue #1, when a scan range is requesting an
exclusive file handle from the cache, it will always
request a newly opened file handle. It also will destroy
the file handle when the scan range is closed.
To fix issue #2, exclusive file handles will no longer
retry IOs. Since the exclusive file handle is always
a fresh file handle, it will never have a bad file
handle from the cache. This returns the logic to
its state before IMPALA-4623 in these cases. If a
file handle is borrowed from the cache, then the
code will continue to retry once with a fresh handle.
To fix issue #3, custom_cluster/test_hdfs_fd_caching.py
now does both positive and negative tests for the file
handle cache. It verifies that setting
max_cached_file_handles to zero disables caching. It
also verifies that caching is disabled on remote
files. (This change will resolve IMPALA-5390.)
Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Reviewed-on: http://gerrit.cloudera.org:8080/7181
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This fixes three issues with the file handle cache.
The first issue is that ReopenCachedHdfsFileHandle can
destroy the passed in file handle without removing the
reference to it. The old file handle then refers to
a piece of memory that is not a handle in the cache,
so future use of the handle fails with an assert. The
fix is to always overwrite the reference to the file
handle when it has been destroyed.
The second issue is that query_test/test_hdfs_fd_caching.py
should run on anything that supports the hdfs command line
and should tolerate query failures. Its logic is not specific to
file handle caching, so it has been renamed to
query_test/test_hdfs_file_mods.py.
Finally, custom_cluster/test_hdfs_fd_caching.py should not
be running on remote files (S3, ADLS, Isilon, remote
clusters). The file handle cache semantics won't apply on
those platforms.
Change-Id: Iee982fa5e964f6c8969b2eb7e5f3eca89e793b3a
Reviewed-on: http://gerrit.cloudera.org:8080/7020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.
There are two parts to this change:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file
Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.
To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.
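A sketch of the partitioning scheme (the real cache is C++; this data
structure is illustrative):

  import threading

  class PartitionedHandleCache(object):
    def __init__(self, num_partitions, total_capacity):
      # Capacity and locking are split evenly across partitions.
      self.partitions = [dict() for _ in range(num_partitions)]
      self.locks = [threading.Lock() for _ in range(num_partitions)]
      self.capacity_per_partition = total_capacity // num_partitions

    def get(self, key):
      idx = hash(key) % len(self.partitions)
      with self.locks[idx]:
        return self.partitions[idx].get(key)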
If max_cached_file_handles is set to 0, or the
scan range is accessing data cached by Hdfs or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.
If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.
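The retry contract, sketched (the exception type and helper names are
illustrative):

  def read_with_retry(cache, filename, offset, length):
    handle = cache.get_handle(filename)
    try:
      return handle.pread(offset, length)
    except IOError:
      cache.destroy_handle(handle)  # drop the possibly stale cached handle
      fresh = cache.get_handle(filename, require_fresh=True)
      return fresh.pread(offset, length)  # a second failure propagates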
Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Reviewed-on: http://gerrit.cloudera.org:8080/6478
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the Python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the Python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
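For example (the module path is illustrative):

  # Before: a wildcard import pulls in unknown names.
  #   from tests.common.impala_test_suite import *
  # After: name exactly what is used.
  from tests.common.impala_test_suite import ImpalaTestSuite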
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
This change whitelists the supported filesystems that can be set
as the default FS for Impala to run on.
This patch configures Impala to use S3 as the default filesystem, rather
than a secondary filesystem as before.
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start with only a running HMS (and no additional services like HDFS,
HBase, Hive, or YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that assume
multiple impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
This patch modifies the Impala command line flag:
--max_cached_file_handles=VAL
to disable caching of HDFS file handles if VAL is 0.
In addition, it moves the existing functional tests to a custom cluster
test and keeps a sanity check for no caching in the original
place. Furthermore, it will check that no file handles are leaked.
Change-Id: Ic36168bba52346674f57639e1ac216fd531b0fad
Reviewed-on: http://gerrit.cloudera.org:8080/691
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins