Added the feature that computes an estimated number of rows in the current
hdfs table if the statistics for the cardinality of the current hdfs table is not
available.
Also added an additional query option to revert the change in case of regression.
Testing:
(1) In CardinalityTest.java, replaced the original statement
"verifyCardinality("SELECT a FROM functional.tinytable", -1);" in
the method testBasicsWithoutStats() with
"verifyCardinality("SELECT a FROM functional.tinytable", 2);".
(2) In CarginalityTest.java, added more tests to check the cardinality
of most PlanNode implementations. For each tested PlanNode, the behaviors
before and after we disable the feature are both tested.
(3) In set.test, modified three related test cases to make sure that
the added query option is included after executing "set all" in various
scenarios.
(4) There are 8 JUnit tests in PlannerTest.java that would produce different
distributed query plans when this feature is enabled. Added an additional
JUnit test for 6 of those 8 affected JUnit tests when this feature is
enabled. Specifically, each tested query in a newly added test files involves
at least one hdfs table without available statistics.
We do not add test cases for 2 of the affected JUnit tests when this feature
is enabled since it results in flaky tests. These two JUnit tests are
testResourceRequirements() and testSpillableBufferSizing(). In this patch
we only test them when the feature is disabled.
(5) There are 5 Python end to end tests that consist of queries that would
produce different results. Added an additional query for each affected query
when this feature is disabled.
Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
Reviewed-on: http://gerrit.cloudera.org:8080/12974
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us implementing yet another thrift client.
This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.
I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
results sets with NULLs was limited so I added a couple of queries.
Places where results differ from Beeswax:
* Impyla is a Python dbapi client so must convert timestamps into python datetime
objects, which only have microsecond precision. Therefore result
timestamps within nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
VARCHAR.
I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.
Limitations:
* Not all of the same methods are implemented as for beeswax, so some
tests that have more complicated interactions with the client will not
work with HS2 yet.
* We don't have a way to get the affected row count for inserts.
I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.
Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.
Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
No tests were added/dropped or modified. They are consolidated into
fewer .test files.
Change-Id: Idda4b34b5e6e9b5012b177a4c00077aa7fec394c
Reviewed-on: http://gerrit.cloudera.org:8080/8153
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
nullable.
This commit fixes the issue where an outer join returns wrong results if
the equi-join predicate contains a TupleIssNullPredicate expr.
Change-Id: I71f05479a442544d578c0d173e2a8412d7bbb3c4
Reviewed-on: http://gerrit.cloudera.org:8080/445
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Additionally, this patch also disabled the hbase/none test dimension if the
TARGET_FILESYSTEM environment variable is set to either s3 of isilon.
Change-Id: I63aecaa478d2ba9eb68de729e9640071359a2eeb
Reviewed-on: http://gerrit.cloudera.org:8080/74
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
to crash
This commit fixes the issue where the tuple ids of semi-joined tuples
are falsely added in the list of materialized tuple ids of
IsNullPredicate exprs, causing Impala to crash. The fix is to exclude
semi-joined tuple ids from the list of materialized tuples ids of select
statements.
Change-Id: I93712be9d03dd54dc9172f51a5ba99e85aa05455
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5405
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5434
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
We were treating it as constant before since it has no children and we
didn't override Expr::IsConstant(). However, it's not constant since
it depends on the input tuple, which caused it to blow up when we
tried to evaluate it as a constant expr.
Change-Id: Ic2c3489ba605f03a7644e6ac9107d4310dd0aa7b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5399
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 10db8f1056e8887dc99b4a334283d4d37d5f757c)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5419
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
This commit implements nested queries with [NOT] IN, [NOT] EXISTS and
aggregate subquery predicates in Impala. The following cases are
supported:
1. Correlated and uncorrelated [NOT] IN.
2. Correlated [NOT] EXISTS.
3. Correlated and uncorrelated aggregate subqueries.
Change-Id: Ia3f4843c5f07d4e31ef3faedc58a15e623f91a5d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3754
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4109