impala

mirror of https://github.com/apache/impala.git synced 2025-12-23 11:55:25 -05:00

Author	SHA1	Message	Date
Lars Volker	4fbe4cb208	IMPALA-6697: Downgrade setuptools to be compatible with Python 2.6 Change-Id: I0d4727b7a5911269b82287ed9ce759f1e211f386 Reviewed-on: http://gerrit.cloudera.org:8080/9713 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2018-03-18 23:31:17 +00:00
Lars Volker	b1ef7de0e7	IMPALA-6695: Fix PyPi regex, update setuptools version pytest-runner, which is required by kudu-python, requires a more recent version of setuptools. Adding an explicit dependency required an update to the regular expression to parse PyPi URLs. Change-Id: Ia67189f81a31a9a5a0ed80cd4d6661762ef427b2 Reviewed-on: http://gerrit.cloudera.org:8080/9711 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2018-03-18 16:39:32 +00:00
Tianyi Wang	af6769d95a	IMPALA-6690: Fix pip_download.py on python 2.6 IMPALA-6682 used set literal syntax in pip_download.py, which is introduced in python 2.7. This patch changes it to set constructor. It's tested on python 2.6.9. Change-Id: I82b4116ee056f605c8aadf39a8b92b78313cb8bf Reviewed-on: http://gerrit.cloudera.org:8080/9694 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-17 01:32:22 +00:00
Tianyi Wang	8dde41e802	IMPALA-6682: Remove MD5 assumption from pypi download script pip_download.py assumes the python repository to use md5 as the hash algorithm, which is not required by PEP-503 and not always true in reality. This patch removes this assumption and enables support of all hash algorithms in python hashlib. Testing: buildall.sh works with 2 repos. One uses md5 and another uses sha-256. Change-Id: Ie78f851490cbab10daa654aece36dab6e6c4329b Reviewed-on: http://gerrit.cloudera.org:8080/9683 Reviewed-by: Tianyi Wang <twang@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-16 03:39:33 +00:00
Tim Armstrong	dc1282fbc9	IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Reviewed-on: http://gerrit.cloudera.org:8080/8652 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:28:22 +00:00
Sailesh Mukil	4e5497995b	IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies Builds on CentOS 6.4 fail due to dependencies not met for the new 'cryptography' python package. The ADLS commit states that the new packages are only required for ADLS and that ADLS on a dev environment is only supported from CentOS 6.7. This patch moves the compiled requirements for ADLS from compiled-requirements.txt to adls-requirements.txt and passing a compiler to the Pip environment while installing the ADLS requirements. Testing: Tested it on a machine that with TARGET_FILESYSTEM='adls' and also tested it on a CentOS 6.4 machine with the default configuration. Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78 Reviewed-on: http://gerrit.cloudera.org:8080/6998 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-26 07:52:40 +00:00
Sailesh Mukil	50bd015f2d	IMPALA-5333: Add support for Impala to work with ADLS This patch leverages the AdlFileSystem in Hadoop to allow Impala to talk to the Azure Data Lake Store. This patch has functional changes as well as adds test infrastructure for testing Impala over ADLS. We do not support ACLs on ADLS since the Hadoop ADLS connector does not integrate ADLS ACLs with Hadoop users/groups. For testing, we use the azure-data-lake-store-python client from Microsoft. This client seems to have some consistency issues. For example, a drop table through Impala will delete the files in ADLS, however, listing that directory through the python client immediately after the drop, will still show the files. This behavior is unexpected since ADLS claims to be strongly consistent. Some tests have been skipped due to this limitation with the tag SkipIfADLS.slow_client. Tracked by IMPALA-5335. The azure-data-lake-store-python client also only works on CentOS 6.6 and over, so the python dependencies for Azure will not be downloaded when the TARGET_FILESYSTEM is not "adls". While running ADLS tests, the expectation will be that it runs on a machine that is at least running CentOS 6.6. Note: This is only a test limitation, not a functional one. Clusters with older OSes like CentOS 6.4 will still work with ADLS. Added another dependency to bootstrap_build.sh for the ADLS Python client. Testing: Ran core tests with and without TARGET_FILESYSTEM as 'adls' to make sure that all tests pass and that nothing breaks. Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Reviewed-on: http://gerrit.cloudera.org:8080/6910 Tested-by: Impala Public Jenkins Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>	2017-05-25 19:35:24 +00:00
Lars Volker	841fe7f621	IMPALA-5189: Pin version of setuptools-scm A new upstream release of setuptools-scm (1.15.3) broke setting up the python environment. A subsequently released version fixed the breakage. Nonetheless pinning external dependencies seems like a good idea, so this change pins the version of setuptools-scm to the new version (1.15.4) to protect us from similar issues in the future. I tested this by running the following command in a new virtualenv and checking in the output that it installed the correct version of setuptools-scm (1.15.4). pip install --no-binary --no-index --no-cache-dir --find-links infra/python/deps/ -r infra/python/deps/requirements.txt Change-Id: I398972d2cdf3acc9d5d8c598fc5b964b7241f1d2 Reviewed-on: http://gerrit.cloudera.org:8080/6599 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-19 22:03:51 +00:00
Taras Bobrovytsky	4a79c9e7e3	IMPALA-5181: Extract PYPI metadata from a webpage There were some build failures due to a failure to download a JSON file containing package metadata from PYPI. We need to switch to downloading this from a PYPI mirror. In order to be able to download the metadata from a PYPI mirror, we need be able to extract the data from a web page, because PYPI mirrors do not always have a JSON interface. We implement a regex based html parser in this patch. Also, we increase the number of download attempts and randomly vary the amount of time between each attempt. Testing: - Tested locally against PYPI and a PYPI mirror. - Ran a private build that passed (which used a PYPI mirror). Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de Reviewed-on: http://gerrit.cloudera.org:8080/6579 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-08 00:19:08 +00:00
Michael Brown	cc8a119839	IMPALA-5044: test infra: remove backports.tempfile backports.tempfile is not compatible with Python 2.6, so if Python 2.6 is the Python used for end-to-end tests, this test unconditionally fails. Moreover, Py.test provides a builtin tmpdir fixture with equivalent functionality. Remove the requirement and port tests using backports.tempfile.TemporaryDirectory to use tmpdir. Change-Id: I887b62eb1b3425fc8fd62562e28f0c17cb261f6d Reviewed-on: http://gerrit.cloudera.org:8080/6316 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-09 01:57:37 +00:00
Lars Volker	768fc0ea27	IMPALA-4734: Set parquet::RowGroup::sorting_columns This changes the HdfsParquetTableWriter to populate the parquet::RowGroup::sorting_columns list with all columns mentioned in a 'sortby()' hint within INSERT statements. The columns are added to the list in the order in which they appear inside the hint. The change also adds backports.tempfile to the python requirements to provide 'tempfile.TemporaryDirectory' on python 2.7. The change also changes the default ordering for columns mentioned in 'sortby()' hints from descending to ascending. To test this change, we write a table with a 'sortby()' hint and verify, that the sorting_columns get populated correctly. Change-Id: Ib42aab585e9e627796e9510e783652d49d74b56c Reviewed-on: http://gerrit.cloudera.org:8080/6219 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-07 09:07:05 +00:00
Tim Armstrong	c8e15e484c	IMPALA-4593,IMPALA-4635: fix some python build issues Build C/C++ packages with toolchain GCC to avoid ABI compatibility issues. This requires a multi-step bootstrapping process: 1. install basic non-C/C++ packages into the virtualenv 2. use Python 2.7 from the virtualenv to bootstrap the toolchain 3. use toolchain gcc to build C/C++ packages 4. build the kudu-python package with toolchain gcc and Cython To avoid potentially pulling in cached versions of packages built with a different compiler, this patch also disables pip's caching. This should not have a significant effect on performance since we've enabled ccache and cache downloaded packages in infra/python/deps. Improve bootstrapping time significantly by using ccache and by parallelising the numpy build - the most expensive part of the install process. On a system with a warmed-up ccache, bootstrapping after deleting infra/python/env takes 1m16s. Previously it could take over 5m. Testing: Tested manually on Ubuntu 16.04 to confirm that it fixes the ABI problem mentioned in IMPALA-4593. Initially "import kudu" failed in my dev environment. After deleting infra/python/env and re-bootstrapping, "import kudu" succeeded. Also ran the standard test suite on CentOS 6 and built Impala on a range of platforms (CentOS 5,6,7; SLES 11,12; Debian 6,7; Ubuntu12.04,14.04,16.04) to make sure nothing broke. Change-Id: I9e807510eddeb354069e0478363f649a1c1b75cf Reviewed-on: http://gerrit.cloudera.org:8080/6218 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-03-07 02:56:18 +00:00
Matthew Jacobs	ed711330fc	IMPALA-4934: Disable Kudu OpenSSL initialization Bumps the Kudu version to include the change to the client that allows Impala to disable SSL initialization. In authentication.cc, after Impala initializes OpenSSL, Impala then disables Kudu's OpenSSL init. Fixed a python test case that started failing after bumping the Kudu client version. Change-Id: I3f13f3af512c6d771979638da593685524c73086 Reviewed-on: http://gerrit.cloudera.org:8080/6056 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-22 05:06:20 +00:00
David Knupp	e5c098b076	IMPALA-4735: Upgrade pytest in python env to version 2.9.2. The current version of pytest in the Impala python environment is quite old (2.7.2) and there have been bug fixes in later versions that we could benefit from. Also, since the passing of params to pytest.main() as a string will be deprecated in upcoming versions of pytest, edit run-tests.py to instead pass params as a list. (This also means we don't need to worry about esoteric bash limitations re: single quotes in strings.) While working on this file, the filtering of commandline args when running the verifier tests was made a little more robust. Tested by doing a standard (non-exhaustive) test run on centos 6.4 and ubuntu 14.04, plus an exhaustive test run on RHEL7. Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Reviewed-on: http://gerrit.cloudera.org:8080/5640 Tested-by: Impala Public Jenkins Reviewed-by: Jim Apple <jbapple-impala@apache.org>	2017-02-02 21:27:39 +00:00
Taras Bobrovytsky	2159beee89	IMPALA-4467: Add support for DML statements in stress test - Add support for insert, upsert, update and delete statements. - Add support for compute stats with mt_dop query options. - Update impyla version in order to be able to have access to query error text for DML queries. - Made flake8 fixes. flake8 on this file is clean. For every Kudu table in the databases, we make a copy and add a '_original' suffix to the table name. The DML queries will only make modifications to the non original table, the original table will never be modified. The original tables could be used to bring the non-original table to the initial state. Two flags were added for doing this: --reset-databases-before-binary-search and --reset-databases-after-binary-search. The DML queries are generated based on the mod values passed in with the following flag: --dml-mod-values 11 13 17. For each mod value 4 DML queries are generated. The DML operations will touch table rows where primary_key % mod_value = 0. So, the larger the mod value, the more rows would be affected. The DML queries are generated in such a way that the data for the insert, upsert, and update queries is taken from the table with the _original suffix. The stress test generates DML queries for only kudu databases. For example, --tpch-kudu-db=tpch_100_kudu --tpch-db=tpch_100 --generate-dml-queries would only generate queries for the tpch_100_kudu database. 
Here's an example of a full call with the new options that runs the stress test on the local mini cluster: ./concurrent_select.py \ --tpch-kudu-db=tpch_kudu \ --generate-dml-queries \ --dml-mod-values 11 13 17 \ --generate-compute-stats-queries \ --select-probability=0.5 \ --mem-limit-padding-pct=25 \ --mem-limit-padding-abs=50 \ --reset-databases-before-binary-search \ --reset-databases-after-binary-search Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40 Reviewed-on: http://gerrit.cloudera.org:8080/5093 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Impala Public Jenkins	2016-12-20 01:33:01 +00:00
Matthew Jacobs	f3fe2cfe10	Bump Kudu python version to 1.1 Change-Id: I5834b3aa4eeae363eae938f61e473c52a0fe5596 Reviewed-on: http://gerrit.cloudera.org:8080/5307 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-12-01 23:11:49 +00:00
Tim Armstrong	51b1310681	IMPALA-3872: allow providing PyPi mirror for python packages We still rely on the python.org json API, which doesn't seem to be mirrored (instead there's a html-based index format implemented by the mirrors). The mirror can be provided by setting the PYPI_MIRROR environment variable. The default is "https://pypi.python.org". Change-Id: Ibc11f010332c0225121c86c9930e35c7ac01409c Reviewed-on: http://gerrit.cloudera.org:8080/4770 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-11-08 05:34:50 +00:00
Matthew Jacobs	9b507b6ed6	IMPALA-4379: Fix and test Kudu table type checking Creating Kudu tables shouldn't allow types not supported by Kudu (e.g. VARCHAR/CHAR, DECIMAL, TIMESTAMP, collection types). The behavior is inconsistent: for some types it throws in the catalog, for VARCHAR/CHAR these become strings. This changes behavior so that all fail during analysis. Analysis tests were added. Similarly, external tables cannot contain Kudu types that Impala doesn't support (e.g. UNIXTIME_MICROS, BINARY). Tests were added to validate this behavior. Note that this required upgrading the python Kudu client. This also fixes a small corner case with ALTER TABLE: ALTER TABLE shouldn't allow Kudu tables to change the storage descriptor tblproperty, otherwise the table metadata gets in an inconsistent state. Tests were added for all of the above. Change-Id: I475273cbbf4110db8d0f78ddf9a56abfc6221e3e Reviewed-on: http://gerrit.cloudera.org:8080/4857 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-10-31 16:03:54 +00:00
Dimitris Tsirogiannis	041fa6d946	IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overridden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu); the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 
10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Reviewed-on: http://gerrit.cloudera.org:8080/4414 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 10:52:25 +00:00
David Knupp	a42d18dcc3	IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Reviewed-on: http://gerrit.cloudera.org:8080/4348 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-09-15 00:02:22 +00:00
Jim Apple	bd2947329e	IMPALA-4110: Clean up issues found by Apache RAT. Change-Id: I5bfe77f9a871018e7a67553ed270e2df53006962 Reviewed-on: http://gerrit.cloudera.org:8080/4361 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-09-14 22:09:24 +00:00
Zoltan Ivanfi	a60ba6d274	IMPALA-4006: dangerous rm -rf statements in scripts Quoted variable substitutions in rm -rf commands and in many other places. This prevents disasters if those variables contain whitespace. Redirected output of the cd commands to /dev/null. This prevents polluting the target variable with the directory name when the CDPATH environment variable is set. Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe Reviewed-on: http://gerrit.cloudera.org:8080/4078 Tested-by: Internal Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-09-01 21:26:52 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Tim Armstrong	904265ccb5	Update .gitignore files for ninja, coredumps and pypi packages Change-Id: Ie7d34fbd27150ba6c437207611f71bb95a0e4cba Reviewed-on: http://gerrit.cloudera.org:8080/3814 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-07-29 21:42:07 +00:00
Lars Volker	b94f88a697	IMPALA-3886: Improve log of pip_download.py pip_download.py prints the following line for each dependency that is already up-to-date: File with matching md5sum already exists, skipping download. This change adds the filename to the message so it is more useful. Change-Id: Ie3d81743814be37ee8ddbe04c264ed2bf37410f9 Reviewed-on: http://gerrit.cloudera.org:8080/3687 Reviewed-by: Lars Volker <lv@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-07-21 08:30:51 -07:00
Taras Bobrovytsky	baf8fe202c	IMPALA-3778: Fix ASF packaging build The tarballs in IMPALA_HOME/infra/python/deps and the thirdparty directory have been removed in the ASF repository. All Python dependencies and CDH components must now be downloaded as part of every build. This caused the ASF packaging build to fail. Before this patch, we used the system pip to download the Python dependencies, which caused flakiness and inconsistency on different operating systems. This patch fixes the problem by using our own script (which requires Python 2.6+ to be installed on the system), to download all the files in requirements.txt. Also replaced all whl and zip Python packages with tar.gz to make it consistent with the ASF build. Change-Id: Ibe5a743096cda2059bd330805d324983f6730e19 Reviewed-on: http://gerrit.cloudera.org:8080/3647 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>	2016-07-14 19:04:45 +00:00
Tim Armstrong	a070217750	IMPALA-3774: fix download_requirements for older Python versions Pip always runs the setup.py file in downloaded tarballs to get metadata. Impyla's setup.py does not work in some older python installations since find_packages() in setuptools does not support the 'include' argument. As a workaround, use our pip_download.py script to download Impyla instead of pip. Testing: Confirmed that Jenkins build successfully downloaded the pip packages and was able to bootstrap the virtualenv. Change-Id: Id8801493c0f4caab2273383333ffbe2729b8339b Reviewed-on: http://gerrit.cloudera.org:8080/3574 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2016-07-06 14:40:54 -07:00
Michael Ho	a07fc367ee	Revert "IMPALA-1619: Support 64-bit allocations." This reverts commit 1ffb2bd5a2a2faaa759ebdbaf49bf00aa8f86b5e. Unbreak the packaging builds for now. Change-Id: Id079acb83d35b51ba4dfe1c8042e1c5ec891d807 Reviewed-on: http://gerrit.cloudera.org:8080/3543 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Michael Ho <kwho@cloudera.com>	2016-07-05 13:37:26 -07:00
Michael Ho	5f3dfdf6c7	IMPALA-1619: Support 64-bit allocations. This change extends MemPool, FreePool and StringBuffer to support 64-bit allocations, fixes a bug in decompressor and extends various places in the code to support 64-bit allocation sizes. With this change, the text scanner can now decompress compressed files larger than 1GB. Note that the UDF interfaces FunctionContext::Allocate() and FunctionContext::Reallocate() still use 32-bit for the input argument to avoid breaking compatibility. In addition, the byte size of a tuple is still assumed to be within 32-bit. If it needs to be upgraded to 64-bit, it will be done in a separate change. Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20 Reviewed-on: http://gerrit.cloudera.org:8080/2781 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-07-05 13:37:25 -07:00
Jim Apple	a5ae2bfd88	IMPALA-3762: Download Python requirements before they are needed. This is needed for ASF builds. It sounds expensive, but takes less than 10 seconds if the packages are already present. Change-Id: I84103c2fb8f9a93336bf28b644ca045f15651dd6 Reviewed-on: http://gerrit.cloudera.org:8080/3452 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Jim Apple <jbapple@cloudera.com>	2016-06-22 14:38:57 -07:00
Jim Apple	140220323d	IMPALA-3767: bootstrap_virtualenv fails to find cython distribution This patch avoids trying to download a Cython binary 0.23.4, which pip has trouble finding. Change-Id: Ic6733ccb71bcf99196075faa2fb6cf2a1d6276ce Reviewed-on: http://gerrit.cloudera.org:8080/3427 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-06-21 19:38:19 -07:00
Tim Armstrong	fc3ff1c52f	IMPALA-3763: download_requirements fixes * Download to infra/python/deps instead of the current directory. * Download the correct virtualenv version, to match the version on cdh5-trunk * Don't re-download packages repeatedly, instead check the md5sum. Testing: Tested manually on the ASF tree, then made sure that bootstrap_virtualenv completed successfully to make sure we had all of the requirements downloaded successfully. Change-Id: I5a3c42236dddfd8a456c82605dc1fdc199a2bc48 Reviewed-on: http://gerrit.cloudera.org:8080/3416 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-06-21 00:37:54 -07:00
Tim Armstrong	ec3a1c7866	download_requirements should download kudu-python and virtualenv This is required for the ASF migration, since we don't want to include all of the tarballs in the repo and we want to allow developers to build using dependencies obtained from the standard upstream sources. Also remove a workaround for an old issue with building an impyla development version package. Change-Id: Ie9216596db0f37d706ea7f77c129cecd5b070429 Reviewed-on: http://gerrit.cloudera.org:8080/3217 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-06-13 17:32:27 -07:00
Michael Brown	22669e23be	IMPALA-3501: ee tests: detect build type and support different timeouts based on the same Impala compiled with the address sanitizer, or compiled with code coverage, runs through code paths much slower. This can cause end-to-end tests that pass on a non-ASAN or non-code coverage build to fail. Some examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These classes of failures tend always to involve some time-sensitive condition that fails to succeed under such "slow builds". The workaround in the past has been to simply increase the timeout. The problem with this approach is that it relaxes conditions for tests on builds that see the field--i.e., release builds--for builds that never will--i.e., ASAN and code coverage. This patch fixes that problem by allowing test authors to set timeout values based on a specific build type. The author may choose timeouts with a default value, and different timeouts for either or both so-called "slow builds": ASAN and code coverage. We detect the so-called "specific build type" by inspecting the binary expected to be at the path under test. This removes the need to make alterations to Impala itself. The inspection done is to read the DWARF information in the binary, specifically the first compile unit's DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic based on these attributes' values to guess the build type. If we can't determine the build type, we will assume it's a debug build. More information on this is in IMPALA-3501. A quick summary of the changes follows: 1. Move some of the logic in tests.common.skip to tests.common.environ and rework some skip marks to be more precise. 2. Add Pyelftools for convenient deserialization of DWARF 3. Our Pyelftools usage requires collections.OrderedDict, which isn't in python2.6; also add Monkeypatch to handle this. 4. Add ImpalaBuild and specific_build_type_timeout, the core of the new functionality 5. 
Fix the statestore tests that only fail under code coverage (the basis for IMPALA-3501) Testing: The tests that were previously, reliably failing under code coverage now pass. I also ran perfunctory tests of debug, release, and ASAN builds to ensure our detection of build type is working. This patch will not turn the code coverage builds green; there are other tests that fail, and fixing all of them here is out of the scope of this patch. Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47 Reviewed-on: http://gerrit.cloudera.org:8080/3156 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-25 19:41:45 -07:00
Michael Brown	5112e65be2	Revert "Revert "Add Kudu test helpers"" This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The difference between this patch and its original is that I fixed the changes introduced in infra/python/bootstrap_virtualenv.py to be python2.4-compatible: - removed the use of str.format(), preferring a str.join() pattern - removed the call of the exit() builtin to prefer sys.exit() The only testing I did for this patch was to ensure CDH Impala-packaging-on-demand works. Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325 Reviewed-on: http://gerrit.cloudera.org:8080/3160 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-24 16:40:59 -07:00
Shiraz Ali	08eff2bc09	Revert "Add Kudu test helpers" This reverts commit 9248dcb70478b8f93f022893776a0960f45fdc28.	2016-05-20 08:46:00 -07:00
casey	36b524f68c	Add Kudu test helpers Changes: 1) Add the python Kudu module to the virtualenv. Building the virtualenv is much slower now because Cython and numpy are required. To help with the rebuild time --no-cache was removed. That option was added to help when using the dev version of impyla, the version number would be the same but the module contents were different and the cache used the old module contents. 2) Add some py.test fixtures to help create Kudu and Impala connections. Change-Id: I8e5e22b38d5bd09a36238e66a69aa42d1a941de7 Reviewed-on: http://gerrit.cloudera.org:8080/2855 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-05-19 19:45:48 -07:00
Lars Volker	12799fae6c	IMPALA-3489: Add script to extract breakpad symbols from binaries Change-Id: I3ee0972efcb50609407b04cd6f4309b244a84861 Reviewed-on: http://gerrit.cloudera.org:8080/2961 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Internal Jenkins	2016-05-17 01:30:11 -07:00
Sailesh Mukil	ed7f5ebf53	IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems Previously Impala disallowed LOAD DATA and INSERT on S3. This patch functionally enables LOAD DATA and INSERT on S3 without making major changes for the sake of improving performance over S3. This patch also enables both INSERT and LOAD DATA between file systems. S3 does not support the rename operation, so the staged files in S3 are copied instead of renamed, which contributes to the slow performance on S3. The FinalizeSuccessfulInsert() function now does not make any underlying assumptions of the filesystem it is on and works across all supported filesystems. This is done by adding a full URI field to the base directory for a partition in the TInsertPartitionStatus. Also, the HdfsOp class now does not assume a single filesystem and gets connections to the filesystems based on the URI of the file it is operating on. Added a python S3 client called 'boto3' to access S3 from the python tests. A new class called S3Client is introduced which creates wrappers around the boto3 functions and has the same function signatures as PyWebHdfsClient by deriving from a base abstract class BaseFileSystem so that they can be used interchangeably through a 'generic_client'. test_load.py is refactored to use this generic client. The ImpalaTestSuite setup creates a client according to the TARGET_FILESYSTEM environment variable and assigns it to the 'generic_client'. P.S: Currently, the test_load.py runs 4x slower on S3 than on HDFS. Performance needs to be improved in future patches. INSERT performance is slower than on HDFS too. This is mainly because of an extra copy that happens between staging and the final location of a file. However, larger INSERTs come closer to HDFS performance than smaller inserts. ACLs are not taken care of for S3 in this patch. It is something that still needs to be discussed before implementing. Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Reviewed-on: http://gerrit.cloudera.org:8080/2574 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-05-12 14:17:49 -07:00
Taras Bobrovytsky	f3d2f6bd7e	IMPALA-2898: Fix the leopard framework (qgen) Some recent commits broke the query generator leopard framework, for example QueryResultComparator requires a different number of arguments. Additional changes: - Added better support for running the query generator in nested types mode - Keeping track of the number of queries that returned data - Made it easier to control behavior from a central place by adding flags to controller.py Change-Id: I8f47c52097ccd53df4233b88eea887ce5fab1955 Reviewed-on: http://gerrit.cloudera.org:8080/1968 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-01-30 06:02:09 +00:00
Alex Behm	838e773797	Add pytest-random to virtual env. I do not plan on enabling this any time soon, but I do want to privately run the tests in a random order to flush out flakiness. Change-Id: Ib14bde2f35c33566e5cdba8a28a789e089f43be6 Reviewed-on: http://gerrit.cloudera.org:8080/1931 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-01-28 04:38:54 +00:00
Casey Ching	f288867833	Stress test: Various changes The major changes are: 1) Collect backtrace and fatal log on crash. 2) Poll memory usage. The data is only displayed at this time. 3) Support kerberos. 4) Add random queries. 5) Generate random and TPC-H nested data on a remote cluster. The random data generator was converted to use MR for scaling. 6) Add a cluster abstraction to run data loading for #5 on a remote or local cluster. This also moves and consolidates some Cloudera Manager utilities that were in the stress test. 7) Cleanup the wrappers around impyla. That stuff was getting messy. Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7 Reviewed-on: http://gerrit.cloudera.org:8080/1298 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-01-20 23:00:25 +00:00
Casey Ching	ed95351d46	IMPALA-2773: Python virtualenv fails building readline on Linux Building readline on linux requires that a dev package of ncurses is installed but it's typically not installed by default. It turns out that pip's requirements file format has a way to mark modules as OS dependent, we just need to use it. Change-Id: Iacb5289a8406cfb975dd98867450228f4df275eb Reviewed-on: http://gerrit.cloudera.org:8080/1640 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-12-17 05:06:09 +00:00
Martin Grund	31503db4f8	IMPALA-2761: OS X: Add readline dependency for cm_api The cm_api plugin has a readline dependency that is not correctly resolved on OS X. For that reason we want this dependency to be part of the Impala code, even though it might not be used on Linux. Change-Id: I8c386e7b07f8f7e39d03260e96d23aaf4b00180a Reviewed-on: http://gerrit.cloudera.org:8080/1613 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com> Readability: Martin Grund <mgrund@cloudera.com>	2015-12-14 06:35:09 +00:00
Taras Bobrovytsky	f15e4f9033	Add nested types to Query Generator - Parsing of describe statements with nested types - Random query generation that involves nested types - Query flattening (converts a query for a dataset with nested types to an equivalent query for a flattened dataset) Change-Id: If013d104fb90864dcf0934ef92157b95e917e7e8 Reviewed-on: http://gerrit.cloudera.org:8080/1375 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2015-11-26 00:40:29 +00:00
Casey Ching	8c55f9b2f5	Python: Upgrade impyla to bring in bug fix 0.11.2 has a fix for https://github.com/cloudera/impyla/issues/126 This also removes an extra copy of execnet that was somehow in the deps folder. Change-Id: I7b581c7a44be872b95e31b454ab1c42e1f1b8421 Reviewed-on: http://gerrit.cloudera.org:8080/822 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-09-14 13:43:01 -07:00
Casey Ching	2f62d30ccb	Python: Update impyla for Hive changes Impyla now has better support for Hive with commit c83a11. This is useful for testing. Change-Id: I2814cbff6f1fcfce51a0271b68daababca33dc65 Reviewed-on: http://gerrit.cloudera.org:8080/744 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-09 18:53:50 +00:00
Casey Ching	2e4e89c267	Python: Add thrift_sasl to virtualenv Previously thrift_sasl was brought into the virtualenv by building the shell. That meant the shell had to be built before the virtualenv could be used. By including thrift_sasl directly, the virtualenv can be used even if impala/shell is not built. Change-Id: Id1a099036b1ac8add5a314af981789ebf69ce465 Reviewed-on: http://gerrit.cloudera.org:8080/685 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-09-09 03:17:25 +00:00
Jim Apple	927b8a4d39	Python sasl module is needed by tests/util/thrift_util.py. Change-Id: I6769991e8b3de0c05b2236cc8243586113a97368 Reviewed-on: http://gerrit.cloudera.org:8080/686 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-25 03:02:54 +00:00
Casey Ching	ca5856b8f8	Python: Bootstrap a virtualenv and add impala-python command This adds a bootstrap script and an "impala-python" command to $IMPALA_HOME/bin that automatically runs the bootstrap and redirects to the virtualenv python. Existing python scripts will later be updated to use this new "impala-python" command. The bootstrap script will build a virtualenv to ensure a minimum python version (2.6) and a well known set of dependencies. The bootstrap script can be run with python 2.4 but 2.6 must already be installed on the system. The resulting virtualenv will use 2.6 at a minimum. Only dependencies explicitly listed in requirements.txt will be installed and available (no system packages will ever be used). No packages will ever be downloaded when setting up the virtualenv. In the future new dependencies can be added by editing the requirements.txt file. Installation through requirements.txt is a standard pip feature. When requirements.txt is updated, the next run of "impala-python" will rebuild the virtualenv. Change-Id: I150595d7e09a45d5f2e3c30a845bc8d6a761eeed Reviewed-on: http://gerrit.cloudera.org:8080/424 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-01 01:30:12 +00:00

100 Commits