impala

mirror of https://github.com/apache/impala.git synced 2026-01-27 06:10:53 -05:00

Author	SHA1	Message	Date
Tim Armstrong	d05f73f415	IMPALA-7647: Add HS2/Impyla dimension to TestQueries I used some ideas from Alex Leblang's abandoned patch: https://gerrit.cloudera.org/#/c/137/ in order to run .test files through HS2. The advantage of using Impyla is that much of the code will be reusable for any Python client implementing the standard Python dbapi and does not require us implementing yet another thrift client. This gives us better coverage of non-trivial result sets from HS2, including handling of NULLs, error logs and more interesting result sets than the basic HS2 tests. I added HS2 coverage to TestQueries, which has a reasonable variety of queries and covers the data types in alltypes. I also added TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of results sets with NULLs was limited so I added a couple of queries. Places where results differ from Beeswax: * Impyla is a Python dbapi client so must convert timestamps into python datetime objects, which only have microsecond precision. Therefore result timestamps within nanosecond precision are truncated. * The HS2 interface reports the NULL type as BOOLEAN as a workaround for IMPALA-914. * The Beeswax interface reported VARCHAR as STRING, but HS2 reports VARCHAR. I dealt with different results by adding additional result sections so that the expected differences between the clients/protocols were explicit. Limitations: * Not all of the same methods are implemented as for beeswax, so some tests that have more complicated interactions with the client will not work with HS2 yet. * We don't have a way to get the affected row count for inserts. I also simplified the ImpalaConnection API by removing some unnecessary methods and moved some generic methods to the base class. Testing: * Confirmed that it detected IMPALA-7588 by re-applying the buggy patch. * Ran exhaustive and CentOS6 tests. Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e Reviewed-on: http://gerrit.cloudera.org:8080/11546 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-09 00:45:10 +00:00
ishaan	8369c3b13b	Remove explicit references to functional_hbase tables from .test files. Additionally, this patch also disabled the hbase/none test dimension if the TARGET_FILESYSTEM environment variable is set to either s3 of isilon. Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb Reviewed-on: http://gerrit.cloudera.org:8080/74 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2015-02-23 23:32:41 +00:00
Lenni Kuff	cd30246f17	Fix flaky group_concat() query tests The ordering of results returned by the group_concat() tests were not deterministic. This fixes the problem by switching the test cases to use a subquery with an order by. Also fixed a similar problem with the limit and union tests. Change-Id: Ibfe3c1597229cf5156af3a69b26bcce93abe28df Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3822 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-08-12 13:38:12 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Matthew Jacobs	8a55982105	Add OFFSET to skip rows returned with a LIMIT Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive does not support OFFSET so creating a view with an OFFSET will not work in Hive. For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip 5 rows, then return the next 20. OFFSET requires an ORDER BY clause. Note this is not very efficient as we must actually keep (limit+offset) rows in memory in the topn-node, and all child sort nodes must as well. Users should be careful when using this feature. Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d Reviewed-on: http://gerrit.ent.cloudera.com:8080/900 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:02 -08:00
Alex Behm	1497002013	Added SHOW TABLE/COLUMN STATS command. Fixed the following stats-related bugs: - Per-partition row count was not distributed properly via CatalogService - HBase column stats were not loaded and distributed properly Enhancements to test framework: - Allow regex specification of expected row or column values - Fixed expected results of some tests because the test framework did not catch that they were incorrect Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51 Reviewed-on: http://gerrit.ent.cloudera.com:8080/813 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:51 -08:00
Matthew Jacobs	00bc971d34	IMPALA-531: Allow arithmetic expressions for LIMIT Change-Id: Ic1901e9dbaeee5fb0aef72a278b4aa262a2abcd7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/829 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:49 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Alex Behm	5db3f2cdf5	IMPALA-227: SELECT * on partitioned table returns columns in different order than Hive.	2014-01-08 10:49:48 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Lenni Kuff	ef48f65e76	Add test framework for running Impala query tests via Python This is the first set of changes required to start getting our functional test infrastructure moved from JUnit to Python. After investigating a number of option, I decided to go with a python test executor named py.test (http://pytest.org/). It is very flexible, open source (MIT licensed), and will enable us to do some cool things like parallel test execution. As part of this change, we now use our "test vectors" for query test execution. This will be very nice because it means if load the "core" dataset you know you will be able to run the "core" query tests (specified by --exploration_strategy when running the tests). You will see that now each combination of table format + query exec options is treated like an individual test case. this will make it much easier to debug exactly where something failed. These new tests can be run using the script at tests/run-tests.sh	2014-01-08 10:46:50 -08:00
Alan Choi	a5a9ccf8c2	IMP-550 short-circuit queries with limit 0 Impala server would examine the plan. If the first fragment's top plan node has a "limit 0", then the query is set to EOS immediately.	2014-01-08 10:46:41 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We lookup the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/* to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with a unique names from the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00

13 Commits