This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite, which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior changes only in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affected
exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are left unchanged; it is possible that some of
these also don't handle exploration_strategy() as expected, but
checking those tests individually is out of scope for this patch.
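A minimal sketch of the new default and an override (the subclass here
is hypothetical; ImpalaTestSuite and get_workload() are real):

  class ImpalaTestSuite(object):
    @classmethod
    def get_workload(cls):
      return 'functional-query'

  # A suite that genuinely needs another workload still overrides it:
  class HypotheticalTpchSuite(ImpalaTestSuite):
    @classmethod
    def get_workload(cls):
      return 'tpch'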
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch replaces create_beeswax_client() references with
create_hs2_client() or vector-based client creation to prepare for the
HS2 test migration.
test_session_expiration_with_queued_query is changed to use impala.dbapi
directly from Impyla due to a limitation in ImpylaHS2Connection.
TestAdmissionControllerRawHS2 is migrated to use hs2 as the default test
protocol.
Modify test_query_expiration.py to set query options through the client
instead of a SET query. test_query_expiration is slightly modified due
to a behavior difference in the hs2 ImpylaHS2Connection.
Remove remaining references to BeeswaxConnection.QueryState.
Fix a bug in ImpylaHS2Connection.wait_for_finished_timeout().
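For reference, a minimal sketch of the direct impala.dbapi usage
mentioned above (host and port are placeholders):

  from impala.dbapi import connect

  conn = connect(host='localhost', port=21050)  # HS2 endpoint placeholders
  cursor = conn.cursor()
  cursor.execute('select 1')
  print(cursor.fetchall())
  conn.close()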
Fix some easy flake8 issues caught through this command:
git show HEAD --name-only | grep '^tests.*py' \
| xargs -I {} impala-flake8 {} \
| grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F...
Testing:
- Pass exhaustive tests.
Change-Id: I1d84251835d458cc87fb8fedfc20ee15aae18d51
Reviewed-on: http://gerrit.cloudera.org:8080/22700
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updates the Ozone dependency to 1.3.0 to address HDDS-7135 and enables
cache_ozone_file_handles by default for a ~10% improvement on TPC-DS
query time.
Updates the Ozone CDP dependency for HDDS-8095. The fix will be
available in Ozone 1.4.0, so testing with TDE currently requires the CDP
build.
Testing:
- ran backend, e2e, and custom cluster test suites with Ozone
Change-Id: Icc66551f9b87af785a1c30b516ac39f4640638fe
Reviewed-on: http://gerrit.cloudera.org:8080/19573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 changes built-in functions such as range, map, and filter
to be lazy, returning iterator-like objects instead of lists. Code
that expects an immediate list will fail, e.g.:
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
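A minimal illustration of the converted pattern (assumes the future
package is installed on Python 2):

  from builtins import range, filter  # Python-3-style lazy builtins

  evens = list(filter(lambda x: x % 2 == 0, range(10)))
  assert evens == [0, 2, 4, 6, 8]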
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
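A minimal example of the emulated behavior:

  from __future__ import absolute_import, division, print_function

  print(7 / 2)   # 3.5: true division, as in Python 3
  print(7 // 2)  # 3: integer division where an index or count is needed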
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for OSS (Aliyun Object Storage Service).
Using the hadoop-aliyun connector, the implementation is similar to
other remote FileSystems.
Tests:
- Prepare:
Initialize OSS-related environment variables:
OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ACCESS_ENDPOINT.
Compile and create hdfs test data on an ECS instance. Upload test data
to an OSS bucket.
- Modify all locations in HMS DB to point to the OSS bucket.
Remove some hdfs caching params. Run CORE tests.
Change-Id: I267e6531da58e3ac97029fea4c5e075724587910
Reviewed-on: http://gerrit.cloudera.org:8080/19165
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables support for caching remote file handles for Ozone. Local file
handles were already cached unintentionally, similar to HDFS. Updates
file handle cache enablement to be more stringent about enabling
caching.
File handle caching is enabled if max_cached_file_handles is non-zero
and any of the following holds (a sketch follows the list):
- HDFS file is local
- HDFS file is remote and cache_remote_file_handles is enabled
- Ozone file is local or remote and cache_ozone_file_handles is enabled
- S3 file is remote and cache_s3_file_handles is enabled
- ABFS file is remote and cache_abfs_file_handles is enabled
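A sketch of that predicate (the function and its parameters are
illustrative; the flag names are the real ones):

  # Illustrative only; mirrors the enablement rules listed above.
  def can_cache_file_handle(fs, is_local, flags):
    if flags.max_cached_file_handles == 0:
      return False
    if fs == 'hdfs':
      return is_local or flags.cache_remote_file_handles
    if fs == 'ozone':
      return flags.cache_ozone_file_handles
    if fs == 's3':
      return flags.cache_s3_file_handles
    if fs == 'abfs':
      return flags.cache_abfs_file_handles
    return False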
Enables testing Ozone in test_hdfs_fd_caching, and adds tests that
remote caching can be disabled using individual flags.
Change-Id: I9df13208999c6d3b14f4c005a91ee2a92a05bdf9
Reviewed-on: http://gerrit.cloudera.org:8080/18853
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.
Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using the
hadoop-cos connector, the implementation is similar to other remote
FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_no_fd_caching_on_cached_data has been flaky because not all of
the data was fully cached in the warm-up phase. There is a limit on
concurrency when writing to the cache, so we may fail to cache data
the first time we read it. This patch fixes the test by repeating the
warm-up query 5 times. It also adds proper start_args to the test so
that each impalad writes its data cache file to its own directory.
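A sketch of the repeated warm-up (the helper and query are
illustrative):

  WARMUP_ATTEMPTS = 5  # repeat so the throttled cache writes eventually land

  def warm_up_data_cache(suite, query):
    # 'suite' is an ImpalaTestSuite instance; names here are illustrative.
    for _ in range(WARMUP_ATTEMPTS):
      suite.execute_query(query)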
Testing:
- Loop the test manually 100 times and see no more failures.
Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Reviewed-on: http://gerrit.cloudera.org:8080/17054
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When reading from the data cache, the disk IO thread first gets a file
handle, then checks the data cache for a hit. The file handle is only
used on a data cache miss; on a hit it goes unused and becomes pure
overhead. This patch moves the file handle retrieval to happen only
after a data cache miss.
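A sketch of the reordered read path (the real code is C++ in the disk
IO manager; these interfaces are illustrative):

  def read_range(data_cache, handle_cache, rng):
    data = data_cache.lookup(rng.file, rng.offset, rng.length)
    if data is not None:
      return data  # cache hit: no file handle is touched
    handle = handle_cache.get_handle(rng.file)  # fetched only on a miss
    return handle.pread(rng.offset, rng.length)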
Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.
Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Reviewed-on: http://gerrit.cloudera.org:8080/16963
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is essentially a revert of IMPALA-8178. HDFS-14308 added
CanUnbuffer support to the EC input stream APIs in the HDFS client lib.
This patch enables file handle caching for EC files.
Testing:
* Ran core tests against an EC build (ERASURE_CODING=true)
Change-Id: Ieb455eeed02a229a4559d3972dfdac7df32cdb99
Reviewed-on: http://gerrit.cloudera.org:8080/16567
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Like IMPALA-8428, but for ABFS, instead of S3A. Adds support for adding
ABFS file handles to the file handle cache. Support for ABFSInputStream
unbuffer operations was added in HADOOP-16859.
Ran a full table scan of a 1GB store_sales table on ABFS, made
sure the file handles were cached (validated via the runtime
profile); did this multiple times, against several different
copies of the store_sales table, in order to increase the number
of file handles cached by an impalad.
Tested:
* Tested against an ABFS storage account I have access to
Change-Id: I64f12f832980f4e0207af78368402dd09e370fc3
Reviewed-on: http://gerrit.cloudera.org:8080/16532
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch is based on work done by Joe McDonnell. This change adds
support for caching file handles from S3. It adds a new configuration
flag 'cache_s3_file_handles' (set to true by default) which controls
whether caching of S3 file handles is enabled.
The S3 file handle cache is dependent on HADOOP-14747 (S3AInputStream to
implement CanUnbuffer). HADOOP-14747 adds support for hdfsUnbufferFile
to S3A streams. The call to unbuffer closes the underlying S3 object
stream. Without this change the S3 file handle cache would quickly cause
an impalad to crash because all S3 file handles in the cache would have
a dangling HTTP(S) connection open to S3.
Testing:
* Modified test_hdfs_fd_caching.py so it is enabled for S3 as well as
remote HDFS
* Ran core tests
* Ran TPC-DS on a real cluster and validated that the S3 file handle
cache works as expected
* Ran several test queries on a real cluster with S3Guard enabled and
validated that the S3 file handle cache works as expected
Change-Id: I5b304d37bc724377fbe7955441cce0cec6fb7f19
Reviewed-on: http://gerrit.cloudera.org:8080/13221
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.
Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch schedules HDFS EC files without considering locality. Failed
tests are disabled and a Jenkins build should succeed with export
ERASURE_CODING=true.
Testing: It passes core tests.
Cherry-picks: not for 2.x.
Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, a file handle in the file handle cache will
only be evicted if the cache reaches its capacity. This
means that file handles can be retained for an indefinite
amount of time. This is true even for files that have
been deleted, replaced, or modified. Since a file handle
maintains a file descriptor for local files, this can
prevent the disk space from being freed. Additionally,
unused file handles are wasted memory.
This adds code to evict file handles that have been
unused for longer than a specified threshold. A thread
periodically checks the file handle cache to see if
any file handle should be evicted. The threshold is
specified by 'unused_file_handle_timeout_sec'; it
defaults to 6 hours.
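A sketch of the periodic check (the cache interface is illustrative;
the flag name is the real one):

  import time

  # Run periodically by the eviction thread.
  def evict_expired_handles(cache, unused_file_handle_timeout_sec):
    now = time.time()
    for entry in cache.entries():
      if (not entry.in_use and
          now - entry.last_use_time > unused_file_handle_timeout_sec):
        cache.evict(entry)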
This adds a test to custom_cluster/test_hdfs_fd_caching.py
to verify the eviction behavior.
Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e
Reviewed-on: http://gerrit.cloudera.org:8080/7640
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes three issues:
1. File handle caching is expected to be disabled for
remote files (using exclusive HDFS file handles),
however the file handles are still being cached.
2. The retry logic for exclusive file handles is broken,
causing the number of open files to be incorrect.
3. There is no test coverage for disabling the file
handle cache.
To fix issue #1, when a scan range is requesting an
exclusive file handle from the cache, it will always
request a newly opened file handle. It also will destroy
the file handle when the scan range is closed.
To fix issue #2, exclusive file handles will no longer
retry IOs. Since the exclusive file handle is always
a fresh file handle, it will never have a bad file
handle from the cache. This returns the logic to
its state before IMPALA-4623 in these cases. If a
file handle is borrowed from the cache, then the
code will continue to retry once with a fresh handle.
To fix issue #3, custom_cluster/test_hdfs_fd_caching.py
now does both positive and negative tests for the file
handle cache. It verifies that setting
max_cached_file_handles to zero disables caching. It
also verifies that caching is disabled on remote
files. (This change will resolve IMPALA-5390.)
Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Reviewed-on: http://gerrit.cloudera.org:8080/7181
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This fixes three issues with the file handle cache.
The first issue is that ReopenCachedHdfsFileHandle can
destroy the passed in file handle without removing the
reference to it. The old file handle then refers to
a piece of memory that is not a handle in the cache,
so future use of the handle fails with an assert. The
fix is to always overwrite the reference to the file
handle when it has been destroyed.
The second issue is that query_test/test_hdfs_fd_caching.py
should run on anything that supports the hdfs command line
and should tolerate query failures. Its logic is not specific to
file handle caching, so it has been renamed to
query_test/test_hdfs_file_mods.py.
Finally, custom_cluster/test_hdfs_fd_caching.py should not
be running on remote files (S3, ADLS, Isilon, remote
clusters). The file handle cache semantics won't apply on
those platforms.
Change-Id: Iee982fa5e964f6c8969b2eb7e5f3eca89e793b3a
Reviewed-on: http://gerrit.cloudera.org:8080/7020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.
There are two parts to this change:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file
Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.
To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.
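A sketch of the partitioning scheme (the real cache is C++; this data
structure is illustrative):

  import threading

  class PartitionedHandleCache(object):
    def __init__(self, num_partitions, total_capacity):
      # Capacity and locking are split evenly across partitions.
      self.partitions = [dict() for _ in range(num_partitions)]
      self.locks = [threading.Lock() for _ in range(num_partitions)]
      self.capacity_per_partition = total_capacity // num_partitions

    def get(self, key):
      idx = hash(key) % len(self.partitions)
      with self.locks[idx]:
        return self.partitions[idx].get(key)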
If max_cached_file_handles is set to 0, or the
scan range is accessing data cached by Hdfs or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.
If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.
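The retry contract, sketched (the exception type and helper names are
illustrative):

  def read_with_retry(cache, filename, offset, length):
    handle = cache.get_handle(filename)
    try:
      return handle.pread(offset, length)
    except IOError:
      cache.destroy_handle(handle)  # drop the possibly stale cached handle
      fresh = cache.get_handle(filename, require_fresh=True)
      return fresh.pread(offset, length)  # a second failure propagates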
Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Reviewed-on: http://gerrit.cloudera.org:8080/6478
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the Python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the Python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
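For example (the module path is illustrative):

  # Before: a wildcard import pulls in unknown names.
  #   from tests.common.impala_test_suite import *
  # After: name exactly what is used.
  from tests.common.impala_test_suite import ImpalaTestSuite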
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
This change whitelists the supported filesystems that can be set
as the default FS for Impala to run on.
This patch configures Impala to use S3 as the default filesystem, rather
than a secondary filesystem as before.
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start with only a running HMS (and no additional services like HDFS,
HBase, Hive, or YARN) and use the local file system.
Skip all tests that need these services, that use HDFS caching, or that assume
multiple impalads are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
This patch modifies the Impala command line flag:
--max_cached_file_handles=VAL
to disable caching of HDFS file handles if VAL is 0.
In addition, it moves the existing functional tests to a custom cluster
test and keeps a sanity check for no caching in the original
place. Furthermore, it will check that no file handles are leaked.
Change-Id: Ic36168bba52346674f57639e1ac216fd531b0fad
Reviewed-on: http://gerrit.cloudera.org:8080/691
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins