precision.
This commit follows 16d8dd58.
This patch adds a test case that inspects the thrift profile of a
completed query, and verifies that the "Start Time" and
"End Time" of the query have nanosecond precision. We chose to
work with the thrift profile directly, rather than parse the debug
web page, as it is the thrift profile which is consumed by
management API clients of Impala.
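As an illustration, a check along these lines can be applied to the
"Start Time" and "End Time" strings extracted from the thrift profile
(the timestamp format and helper below are assumptions for this sketch,
not the test's actual code):

    import re

    # Nanosecond precision means nine digits after the decimal point,
    # e.g. "2017-12-14 11:31:06.383829000".
    TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.(\d+)$")

    def has_nanosecond_precision(timestamp):
        match = TIMESTAMP_RE.match(timestamp)
        return match is not None and len(match.group(1)) == 9

    assert has_nanosecond_precision("2017-12-14 11:31:06.383829000")
    assert not has_nanosecond_precision("2017-12-14 11:31:06.383")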
Change-Id: Id3421a34cc029ebca551730084c7cbd402d5c109
Reviewed-on: http://gerrit.cloudera.org:8080/8784
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
test_profile_fragment_instances was recently added to verify that the
final runtime profile for a query has the expected fragments and exec
nodes. The test fails on local filesystem builds, though, as it
assumes there will be 3 impalads and therefore 3 fragment instances,
but there is only 1 impalad on local filesystem builds.
The fix is to disable the test on local filesystem builds.
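A minimal sketch of how such a restriction can be expressed with a
pytest skip marker (the environment variable and value are assumptions;
Impala keeps its real skip markers in the shared test infrastructure):

    import os
    import pytest

    # Skip when the build targets the local filesystem, where only one
    # impalad (and therefore one instance per fragment) is running.
    IS_LOCAL_FS = os.environ.get("TARGET_FILESYSTEM") == "local"

    skip_if_local_fs = pytest.mark.skipif(
        IS_LOCAL_FS, reason="assumes 3 impalads; local FS builds run 1")

    @skip_if_local_fs
    def test_profile_fragment_instances():
        pass  # the real test inspects the final runtime profile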
Change-Id: I2c98f160406081626f17709809b8efee9eae1450
Reviewed-on: http://gerrit.cloudera.org:8080/8809
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
test_basic_filters has been occasionally failing due to a line missing
from a runtime profile for a particular query.
The problem is that the query returns all of its results before all of
its fragment instances are finished executing (due to a limit). Then,
when one fragment instance reports its status, the coordinator returns
to it a 'cancelled' status, causing all remaining instances for that
backend to be cancelled.
Sometimes this cancellation happens quickly enough that the relevant
fragment instances have not yet sent a status report when they are
cancelled. They will still send a report in finalize, but as the
coordinator only updates its runtime profile for 'ok' status reports,
not 'cancelled', the final runtime profile doesn't end up with any
data for those fragment instances, which means the test does not find
the line in the runtime profile it's checking for.
The fix is to have the coordinator update its runtime profile with
every status report it receives, regardless of error status.
Testing:
- Ran existing runtime profile tests, which rely on profile output,
in a loop.
- Manually tested some scenarios with failed queries and checked that
the new profile output is reasonable.
- Added a new e2e test that runs the affected query and checks for the
presence of info for all expected exec nodes in the profile (sketched
below). This repros the underlying issue consistently.
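A rough sketch of that profile check (the node labels and the way the
profile text is obtained are assumptions; the real test uses Impala's
e2e framework to run the query and fetch the final profile):

    # Verify the final runtime profile has an entry for every exec node
    # the query plan is expected to contain.
    EXPECTED_NODES = ["HDFS_SCAN_NODE", "HASH_JOIN_NODE", "EXCHANGE_NODE"]

    def assert_all_exec_nodes_present(profile_text):
        missing = [n for n in EXPECTED_NODES if n not in profile_text]
        assert not missing, "exec nodes missing from profile: %s" % missing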
Change-Id: I4f581c7c8039f02a33712515c5bffab942309bba
Reviewed-on: http://gerrit.cloudera.org:8080/8754
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This separation will help the user better understand the query
runtime profile.
Testing:
Modified an existing test case.
Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876
Reviewed-on: http://gerrit.cloudera.org:8080/7721
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Fix to populate the non-default query options set by the planner in the
runtime profile.
Added a corresponding test case.
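A hedged sketch of the kind of check the test case performs (the option
name and profile label below are assumptions, not the exact strings):

    def assert_planner_set_option_in_profile(profile_text, option="mt_dop"):
        # The profile should contain a line listing the non-default query
        # options, including those set by the planner rather than the user.
        option_lines = [l for l in profile_text.splitlines()
                        if "Query Options" in l]
        assert option_lines, "no 'Query Options' line found in the profile"
        assert any(option in l for l in option_lines), \
            "planner-set option %r not listed in the profile" % option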
Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab
Reviewed-on: http://gerrit.cloudera.org:8080/7419
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the Python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
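For reference, a sketch of how the SkipIfADLS.slow_client tag can be
applied (the marker construction is an assumption; Impala defines the
real marker in its shared skip-marker module):

    import os
    import pytest

    IS_ADLS = os.environ.get("TARGET_FILESYSTEM") == "adls"

    class SkipIfADLS(object):
        # Skip tests that would trip over the listing lag described above.
        slow_client = pytest.mark.skipif(
            IS_ADLS, reason="the python ADLS client lags behind ADLS state")

    @SkipIfADLS.slow_client
    def test_drop_table_removes_files():
        pass  # would list the table directory right after DROP TABLE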
The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the Python dependencies for Azure will not be downloaded
when TARGET_FILESYSTEM is not "adls". ADLS tests are expected to be run
on a machine running at least CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Previously, updates to the query state in ClientRequestState were
not immediately reflected in the query profile, potentially
leading to the profile showing an incorrect state for an extended
period during execution.
In particular, queries were being shown in the 'CREATED' state
long after they had started 'RUNNING'.
The fix is to update the profile whenever the state is updated.
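An illustrative check of that behavior (the profile label and state
names are assumptions based on this description):

    def assert_profile_not_stuck_in_created(profile_text):
        # Once the query has started running, the profile's state line
        # should no longer report CREATED.
        state_lines = [l for l in profile_text.splitlines()
                       if "Query State" in l]
        assert state_lines, "no query state found in the profile"
        assert "CREATED" not in state_lines[0], \
            "profile still reports CREATED after the query started running"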
Testing:
- Extended existing hs2 tests and added a beeswax test to check
for expected query states in the profile
Change-Id: I952319b7308a24d4e2dff924199c0c771bce25b3
Reviewed-on: http://gerrit.cloudera.org:8080/6923
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
A test that was recently added, test_observability::test_scan_summary,
uses an HBase table. It needs to be restricted not to run on S3,
localFS or Isilon.
Change-Id: I9863cf3f885eb1d2152186de34e093497af83d99
Reviewed-on: http://gerrit.cloudera.org:8080/6859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
For scan nodes, previously only HDFS tables showed the name
of the table in the 'Detail' section for the scan node. This
change adds the table name for all scan node types (Kudu,
HBase, and DataSource).
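A rough sketch of the check (scan-node labels are assumptions based on
typical profile output):

    def assert_scan_details_name_table(profile_text, table_name):
        # Every scan node's 'Detail' line should now mention the table
        # being scanned, whether it is an HDFS, Kudu, HBase or
        # DataSource scan.
        scan_lines = [l for l in profile_text.splitlines() if "SCAN" in l]
        assert scan_lines, "no scan nodes found in the profile"
        assert any(table_name in l for l in scan_lines), \
            "no scan node detail mentions table %r" % table_name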
Testing:
- Added an e2e test in test_observability.
Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Reviewed-on: http://gerrit.cloudera.org:8080/6832
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-3002:
The shell prints an incorrect value for '#Rows' in the exec
summary for broadcast nodes due to incorrect logic around
whether to use max or agg stats. This patch makes the behavior
consistent with the way the backend treats exec summaries in
summary-util.cc. This incorrect logic was also duplicated in
the impala_beeswax test framework.
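A simplified sketch of the corrected reasoning (the function and field
names are assumptions, not the shell's actual data structures): every
instance of a broadcast receiver sees the full row set, so summing the
per-instance counts overstates '#Rows'; the max across instances is the
right aggregate, matching how the backend builds the exec summary.

    def rows_for_exec_summary(per_instance_rows, is_broadcast):
        # Broadcast: each instance received a full copy, so take the max.
        # Otherwise instances process disjoint data, so take the sum.
        return max(per_instance_rows) if is_broadcast \
            else sum(per_instance_rows)

    # Three receivers of a 100-row broadcast should report 100, not 300.
    assert rows_for_exec_summary([100, 100, 100], is_broadcast=True) == 100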
IMPALA-1473:
When there is a merging exchange with a limit, we may copy rows
into the output batch beyond the limit. In this case, we currently
update the output batch's size to reflect the limit, but we also
need to update ExecNode::num_rows_returned_ or the exec summary
may show that the exchange node returned more rows than it really
did.
Additionally, PlanFragmentExecutor::GetNext does not update
rows_produced_counter_ in some cases, leading the runtime profile
to display an incorrect value for 'RowsProduced'.
Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa
Reviewed-on: http://gerrit.cloudera.org:8080/4679
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins