Commit Graph

3 Commits

Author SHA1 Message Date
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
2b550634d2 IMPALA-11952 (part 2): Fix print function syntax
Python 3 now treats print as a function and requires
the parenthesis in invocation.

print "Hello World!"
is now:
print("Hello World!")

This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.

print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)

To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function

This also fixes random flake8 issues that intersect with
the changes.

Testing:
 - check-python-syntax.sh shows no errors related to print

Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Lars Volker
b5714097e0 IMPALA-7694: Add host resource usage metrics to profile
This change adds a mechanism to collect host resource usage metrics to
profiles. Metric collection can be controlled through a new query option
'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics
collection will be enabled. Collection always happens per query for all
executors that run one or more fragment instances of the query.

This mechanism adds a new time series counter class that collects all
measured values and does not re-sample them. It will re-sample values
when printing them into a string profile, preserving up to 64 values,
but Thrift profiles will contain the full list of values.

We add a new section "Per Node Profiles" to the profile to store and
show these values:

Per Node Profiles:
  lv-desktop:22000:
    CpuIoWaitPercentage (500.000ms): 0, 0
    CpuSysPercentage (500.000ms): 1, 1
    CpuUserPercentage (500.000ms): 4, 0
      - ScratchBytesRead: 0
      - ScratchBytesWritten: 0
      - ScratchFileUsedBytes: 0
      - ScratchReads: 0 (0)
      - ScratchWrites: 0 (0)
      - TotalEncryptionTime: 0.000ns
      - TotalReadBlockTime: 0.000ns

This change also uses the aforementioned mechanism to collect CPU usage
metrics (user, system, and IO wait time).

A future change can then add a tool to decode a Thrift profile and plot
the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a
tool is not included in this change because it will require some
reworking of the python dependencies.

This change also includes a few minor improvements to make the resulting
code more readable:
- Extend the PeriodicCounterUpdater to call functions to update global
  metrics before updating the counters.
- Expose the scratch profile within the per node resource usage section.
- Improve documentation of the profile counter classes.
- Remove synchronization from StreamingSampler.
- Remove a few pieces of dead code that otherwise would have required
  updates.
- Factor some code for profile decoding into the Impala python library

Testing: This change contains a unit test for the system level metrics
collection and e2e tests for the profile changes.

Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Reviewed-on: http://gerrit.cloudera.org:8080/12069
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 20:25:51 +00:00