Commit Graph

9 Commits

Author SHA1 Message Date
Fang-Yu Rao
b3b00da1a1 IMPALA-7608: Estimate row count from file size when no stats available
Added the feature that computes an estimated number of rows in the current
hdfs table if the statistics for the cardinality of the current hdfs table is not
available.

Also added an additional query option to revert the change in case of regression.

Testing:
(1) In CardinalityTest.java, replaced the original statement
"verifyCardinality("SELECT a FROM functional.tinytable", -1);" in
the method testBasicsWithoutStats() with
"verifyCardinality("SELECT a FROM functional.tinytable", 2);".
(2) In CarginalityTest.java, added more tests to check the cardinality
of most PlanNode implementations. For each tested PlanNode, the behaviors
before and after we disable the feature are both tested.
(3) In set.test, modified three related test cases to make sure that
the added query option is included after executing "set all" in various
scenarios.
(4) There are 8 JUnit tests in PlannerTest.java that would produce different
distributed query plans when this feature is enabled. Added an additional
JUnit test for 6 of those 8 affected JUnit tests when this feature is
enabled. Specifically, each tested query in a newly added test files involves
at least one hdfs table without available statistics.
We do not add test cases for 2 of the affected JUnit tests when this feature
is enabled since it results in flaky tests. These two JUnit tests are
testResourceRequirements() and testSpillableBufferSizing(). In this patch
we only test them when the feature is disabled.
(5) There are 5 Python end to end tests that consist of queries that would
produce different results. Added an additional query for each affected query
when this feature is disabled.

Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
Reviewed-on: http://gerrit.cloudera.org:8080/12974
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-21 03:28:43 +00:00
Tim Armstrong
d05f73f415 IMPALA-7647: Add HS2/Impyla dimension to TestQueries
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us implementing yet another thrift client.

This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.

I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
results sets with NULLs was limited so I added a couple of queries.

Places where results differ from Beeswax:
* Impyla is a Python dbapi client so must convert timestamps into python datetime
  objects, which only have microsecond precision. Therefore result
  timestamps within nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
  IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
  VARCHAR.

I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.

Limitations:
* Not all of the same methods are implemented as for beeswax, so some
  tests that have more complicated interactions with the client will not
  work with HS2 yet.
* We don't have a way to get the affected row count for inserts.

I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.

Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.

Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-09 00:45:10 +00:00
Alex Behm
c1781b73b3 Move tests related to the old join node.
No tests were added/dropped or modified. They are consolidated into
fewer .test files.

Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Reviewed-on: http://gerrit.cloudera.org:8080/8153
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-28 18:36:17 +00:00
Matthew Jacobs
38bc1c77b8 IMPALA-2375: Disabling/moving tests that don't work with the old HJ
Change-Id: I6d1d0d0edd3b60e854130c4d8b9fcbe765c1aba0
Reviewed-on: http://gerrit.cloudera.org:8080/1173
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-10-07 14:47:40 -07:00
Dimitris Tsirogiannis
2c1f0a4942 IMPALA-1987: Fix TupleIsNullPredicate to return false if no tuples are
nullable.

This commit fixes the issue where an outer join returns wrong results if
the equi-join predicate contains a TupleIssNullPredicate expr.

Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4
Reviewed-on: http://gerrit.cloudera.org:8080/445
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
2015-06-11 03:37:18 +00:00
ishaan
8369c3b13b Remove explicit references to functional_hbase tables from .test files.
Additionally, this patch also disabled the hbase/none test dimension if the
TARGET_FILESYSTEM environment variable is set to either s3 of isilon.

Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb
Reviewed-on: http://gerrit.cloudera.org:8080/74
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-02-23 23:32:41 +00:00
Dimitris Tsirogiannis
30a5d1d452 IMPALA-1526: Invalid tuple idx from IsNullPredicate exprs cause Impala
to crash

This commit fixes the issue where the tuple ids of semi-joined tuples
are falsely added in the list of materialized tuple ids of
IsNullPredicate exprs, causing Impala to crash. The fix is to exclude
semi-joined tuple ids from the list of materialized tuples ids of select
statements.

Change-Id: I93712be9d03dd54dc9172f51a5ba99e85aa05455
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5405
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5434
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-11-26 14:42:54 -08:00
Skye Wanderman-Milne
8ad6ba9f8c IMPALA-1528: TupleIsNullPredicate is never constant
We were treating it as constant before since it has no children and we
didn't override Expr::IsConstant(). However, it's not constant since
it depends on the input tuple, which caused it to blow up when we
tried to evaluate it as a constant expr.

Change-Id: Ic2c3489ba605f03a7644e6ac9107d4310dd0aa7b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5399
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 10db8f1056e8887dc99b4a334283d4d37d5f757c)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5419
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-11-25 18:18:45 -08:00
Dimitris Tsirogiannis
c2abcd6f3d Query transformation of nested queries.
This commit implements nested queries with [NOT] IN, [NOT] EXISTS and
aggregate subquery predicates in Impala. The following cases are
supported:
1. Correlated and uncorrelated [NOT] IN.
2. Correlated [NOT] EXISTS.
3. Correlated and uncorrelated aggregate subqueries.

Change-Id: Ia3f4843c5f07d4e31ef3faedc58a15e623f91a5d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3754
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4109
2014-08-29 15:35:21 -07:00