Python 2.6's ElementTree.write() does not have an xml_declaration
argument, so junitxml_prune_notrun.py fails on Python 2.6.
This fixes junitxml_prune_notrun.py by using minidom to write
the output. This mirrors how bin/generate_junitxml.py outputs
XML.
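A minimal Python 2 sketch of the minidom approach (illustrative only,
not the exact code in junitxml_prune_notrun.py):

import xml.dom.minidom
import xml.etree.ElementTree as ET

def write_with_declaration(root, output_path):
    # Round-trip the ElementTree node through minidom, which always emits
    # the <?xml ...?> declaration, unlike ElementTree.write() on Python 2.6.
    pretty = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml(
        encoding="UTF-8")
    with open(output_path, "w") as f:
        f.write(pretty)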
Verified that the tests now pass on Python 2.6 and that behavior on
Python 2.7 is unchanged.
Change-Id: I9ef8fb77b1ac8c51e3dfb6b04690ae9ccc490d62
Reviewed-on: http://gerrit.cloudera.org:8080/12479
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There are 100+ backend tests, and each requires 400+ MB of disk space
when statically linked (the default). In aggregate this consumes a
large amount of disk space and adds considerable link time.
This introduces a framework to link multiple backend tests into a
single executable. Currently it does this for several tests in
be/src/util. It saves about 10 GB of disk space.
It preserves several properties of the existing tests:
1. "make <testname>" rebuilds that test.
2. It generates an executable shell script with the
same name as the original backend test that runs
the same subset of tests.
3. It generates JUnitXML and log files with names that
match the test name.
4. One can run the shell script with "--gtest_filter"
to run a subset of the tests.
5. ctest commands such as ctest -R continue to function.
It validates at build time that every test linked into
the unified executable is covered by an equivalent test
filter pattern. This means that every test in the unified
executable will run as part of normal testing.
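A rough sketch of the kind of coverage check this implies, assuming we
have the gtest test names and the filter patterns used by the wrapper
scripts (the names below are hypothetical):

import fnmatch
import sys

def uncovered_tests(test_names, filter_patterns):
    # Return every test that no filter pattern matches.
    return [name for name in test_names
            if not any(fnmatch.fnmatch(name, pat) for pat in filter_patterns)]

missing = uncovered_tests(["BitmapTest.SetAndTest", "PrettyPrinterTest.Basic"],
                          ["BitmapTest.*", "PrettyPrinterTest.*"])
if missing:
    sys.exit("Tests not covered by any filter: %s" % ", ".join(missing))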
Introducing the framework along with a limited number
of trial backend tests gives us a chance to evaluate
this change before continuing to convert tests.
Change-Id: Ia03ef38719b1fbc0fe2025e16b7b3d3dd4488842
Reviewed-on: http://gerrit.cloudera.org:8080/12124
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a mechanism to collect host resource usage metrics to
profiles. Metric collection can be controlled through a new query option
'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics
collection will be enabled. Collection always happens per query for all
executors that run one or more fragment instances of the query.
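Conceptually, the per-query decision is a simple probability check,
roughly (illustrative sketch, not the actual implementation):

import random

def should_collect_resource_metrics(resource_trace_ratio):
    # Enable collection for this query with probability RESOURCE_TRACE_RATIO.
    return random.random() < resource_trace_ratio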
This mechanism adds a new time series counter class that retains all
measured values without re-sampling. When printing to a string profile,
the values are down-sampled to at most 64 samples, but Thrift profiles
contain the full list of values.
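One possible way to express the down-sampling done for string profiles
(a sketch; the real counter may bucket values differently):

def downsample_for_display(values, max_samples=64):
    # Keep at most max_samples values by taking every k-th sample; the
    # Thrift profile still carries the complete list.
    if len(values) <= max_samples:
        return values
    stride = (len(values) + max_samples - 1) // max_samples
    return values[::stride]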
We add a new section "Per Node Profiles" to the profile to store and
show these values:
Per Node Profiles:
lv-desktop:22000:
CpuIoWaitPercentage (500.000ms): 0, 0
CpuSysPercentage (500.000ms): 1, 1
CpuUserPercentage (500.000ms): 4, 0
- ScratchBytesRead: 0
- ScratchBytesWritten: 0
- ScratchFileUsedBytes: 0
- ScratchReads: 0 (0)
- ScratchWrites: 0 (0)
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
This change also uses the aforementioned mechanism to collect CPU usage
metrics (user, system, and IO wait time).
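For reference, a minimal sketch of how such percentages can be derived
from two /proc/stat samples (the general technique; the actual counters
live in the backend):

def read_cpu_ticks():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def cpu_percentages(prev, curr):
    # Compute user/system/iowait percentages between two samples.
    deltas = [c - p for c, p in zip(curr, prev)]
    total = sum(deltas) or 1
    user, nice, system, idle, iowait = deltas[:5]
    return {"user": 100.0 * (user + nice) / total,
            "system": 100.0 * system / total,
            "iowait": 100.0 * iowait / total}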
A future change can then add a tool to decode a Thrift profile and plot
the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a
tool is not included in this change because it will require some
reworking of the python dependencies.
This change also includes a few minor improvements to make the resulting
code more readable:
- Extend the PeriodicCounterUpdater to call functions to update global
metrics before updating the counters.
- Expose the scratch profile within the per node resource usage section.
- Improve documentation of the profile counter classes.
- Remove synchronization from StreamingSampler.
- Remove a few pieces of dead code that otherwise would have required
updates.
- Factor some code for profile decoding into the Impala Python library.
Testing: This change contains a unit test for the system level metrics
collection and e2e tests for the profile changes.
Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Reviewed-on: http://gerrit.cloudera.org:8080/12069
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This implements much of IMPALA-6664's goal of tagging all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
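In rough pseudocode (the ids are hi:lo pairs of int64s; the exact
number of instance-index bits below is an assumption for illustration):

def query_id_of(instance_hi, instance_lo, instance_bits=32):
    # Clear the low bits of the second half, which encode the fragment
    # instance index, to recover the query id.
    mask = (1 << instance_bits) - 1
    return (instance_hi, instance_lo & ~mask)

# e.g. lo part 0x402b886800000001 maps back to 0x402b886800000000.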
For example:
$ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b886800000000] Registered query query_id=85420d575b9ff4b9:402b886800000000 session_id=aa45e480434f0516:101ae5ac12679d94
I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b886800000000] Analyzing query: select count(*) from tpcds.web_sales
I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b886800000000] Analysis finished.
I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b886800000000] Schedule for id=85420d575b9ff4b9:402b886800000000 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b886800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0)
I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b886800000000] Admitted query id=85420d575b9ff4b9:402b886800000000
I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b886800000000] Exec() query_id=85420d575b9ff4b9:402b886800000000 stmt=select count(*) from tpcds.web_sales
I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b886800000000] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b886800000000 coord=pannier.sf.cloudera.com:22000 #instances=2
I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b886800000000] started execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b886800000000] Executing instance. instance_id=85420d575b9ff4b9:402b886800000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b886800000001] Executing instance. instance_id=85420d575b9ff4b9:402b886800000001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b886800000001] Instance completed. instance_id=85420d575b9ff4b9:402b886800000001 #in-flight=1 status=OK
I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b886800000000] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b886800000000, node=2
I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b886800000000: first in-progress backend: pannier.sf.cloudera.com:22000
I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b886800000000] get_results_metadata(): query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b886800000000] ExecState: query id=85420d575b9ff4b9:402b886800000000 execution completed
I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b886800000000] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b886800000000] Instance completed. instance_id=85420d575b9ff4b9:402b886800000000 #in-flight=0 status=OK
I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b886800000000
I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b886800000000] Release admission control resources for query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.174815 14821 query-state.cc:604] Cancel: query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.174837 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000001
I0108 10:39:17.174856 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.179621 14752 impala-beeswax-server.cc:239] 85420d575b9ff4b9:402b886800000000] close(): query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.179651 14752 impala-server.cc:1131] 85420d575b9ff4b9:402b886800000000] UnregisterQuery(): query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.179666 14752 impala-server.cc:1238] 85420d575b9ff4b9:402b886800000000] Cancel(): query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:17.179814 14752 coordinator.cc:684] 85420d575b9ff4b9:402b886800000000] CancelBackends() query_id=85420d575b9ff4b9:402b886800000000, tried to cancel 0 backends
I0108 10:39:17.203898 14752 query-exec-mgr.cc:184] 85420d575b9ff4b9:402b886800000000] ReleaseQueryState(): deleted query_id=85420d575b9ff4b9:402b886800000000
I0108 10:39:18.108947 14752 impala-server.cc:1993] 85420d575b9ff4b9:402b886800000000] Connection from client ::ffff:172.16.35.186:52096 closed, closing 1 associated session(s)
I0108 10:39:18.108996 14752 impala-server.cc:1249] 85420d575b9ff4b9:402b886800000000] Closing session: aa45e480434f0516:101ae5ac12679d94
I0108 10:39:18.109035 14752 impala-server.cc:1291] 85420d575b9ff4b9:402b886800000000] Closed session: aa45e480434f0516:101ae5ac12679d94
There are a few caveats here: the thread state isn't "scoped", so the "Closing
session" log statement is technically not part of the query. When that thread
is re-used for another query, it corrects itself. Some threads, like 14821,
aren't using the thread locals. In some cases, we should go back and
add GetThreadDebugInfo()->SetQueryId(...) statements.
I've used this to debug some crashes (of my own doing) while running
parallel tests, and it's been quite helpful.
An alternative would be to use Kudu's be/src/kudu/util/async_logger.h,
and add the "Listener" functionality to it directly. Another alternative
would be to re-write all the *LOG macros, but this is quite painful (and
presumably was rejected when log redaction was introduced).
I changed thread-debug-info to capture TUniqueId (a thrift struct with
two int64s) rather than the string representation. This made it easier
to compare with the "0:0" id, which we treat as "unset". If a developer
needs to analyze it from a debugger, gdb can print out hex just fine.
I added some context to request-context to be able to pipe ids through
to disk IO threads as well.
To test this, I moved "assert_log_contains" up to impala_test_suite, and
had it handle the default log location case. The test needs a sleep for
log buffering, but it seems like a test with a sleep running in
parallel is better than a custom cluster test, which reboots the cluster
(and loads metadata).
Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Reviewed-on: http://gerrit.cloudera.org:8080/12129
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A few unversioned artifacts crept in over time without corresponding
.gitignore entries. These are the updates based on the git status output
on my dev env.
Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a
Reviewed-on: http://gerrit.cloudera.org:8080/11787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
find-fragment-instances didn't show all the threads
that worked on some fragment instance. It missed the
top-level "root threads" that started working on the
fragment instances.
I modified the get_fragment_instances() function to check each
thread's own local ThreadDebugInfo object instead of checking the
parent thread's ThreadDebugInfo.
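A rough sketch of that per-thread check in gdb's Python API (symbol and
member names here are placeholders, not the exact ones in the module):

import gdb

def fragment_instances_by_thread():
    result = {}
    for thread in gdb.selected_inferior().threads():
        thread.switch()  # make this thread's TLS visible to parse_and_eval
        try:
            # "impala::thread_debug_info" stands in for the real thread-local.
            tdi = gdb.parse_and_eval("impala::thread_debug_info")
            result[thread.num] = str(tdi["instance_id_"])
        except gdb.error:
            pass  # thread has no ThreadDebugInfo set
    return result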
I tested it locally on a core file.
Change-Id: I35ae1a6b384b002b343689469f02ceabd84af1b6
Reviewed-on: http://gerrit.cloudera.org:8080/11396
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With Python 2.6, the syntax "{}".format(1) doesn't work:
$docker run centos:6 python -c 'print "{}".format(1)'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ValueError: zero length field name in format
generate_junitxml was using this incantation and failing.
I've updated the syntax to be py2.6-friendly, and tested
it like so:
$docker run -v $(pwd):/mnt centos:6 bash -c "yum install -y python-argparse; /mnt/lib/python/impala_py_lib/jenkins/generate_junitxml.py --phase phase --step step --stdout out --stderr err; cat /extra_junit_xml_logs/*.xml"
[output from yum...]
Installed:
python-argparse.noarch 0:1.2.1-2.1.el6
Complete!
Generated: ./extra_junit_xml_logs/generate_junitxml.phase.step.20180904_18_04_56.xml
<?xml version="1.0" ?>
<testsuites errors="0" failures="0" tests="1" time="0.0">
<testsuite disabled="0" errors="0" failures="0" file="None" log="None" name="generate_junitxml.phase.step" skipped="0" tests="1" time="0" timestamp="2018-09-04 18:04:56+00:00" url="None">
<testcase classname="generate_junitxml.phase" name="step">
<system-out>
out
</system-out>
<system-err>
err
</system-err>
</testcase>
</testsuite>
</testsuites>
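For reference, the py2.6-friendly form uses explicit positional field
indexes (illustrative snippet, not the exact lines changed):

# Python 2.6 requires numbered fields in str.format():
"{0}".format(1)                        # works on both 2.6 and 2.7
"{0}.{1}.xml".format("phase", "step")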
Change-Id: Ic0c1e837a9ed6c2d59906aed1d1098bde6f5d815
Reviewed-on: http://gerrit.cloudera.org:8080/11384
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a new Impala Python GDB module, and implements
a couple of convenience commands to make core dump analysis
easier.
The initial commands let us find queries and fragment instances
currently executing in an impalad at the time the daemon crashed:
(gdb) source impala-gdb.py
(gdb) find-query-ids
f74c863dff66a34d:1d983cc300000000
364525e12495932b:73f5dd0200000000
bc4a3eec25481981:edda04b800000000
(gdb) find-fragment-instances
Fragment Instance Id Thread IDs
364525e12495932b:73f5dd02000000a2 [69]
364525e12495932b:73f5dd0200000171 [196, 136]
bc4a3eec25481981:edda04b8000001a8 [252, 237, 206]
f74c863dff66a34d:1d983cc30000009b [200, 14, 13, 12, 6, 5, 3, 2]
f74c863dff66a34d:1d983cc30000013a [4]
The commands have only been tested with Impala 2.12, and are not
expected to work with older versions since they rely on the
ThreadDebugInfo infrastructure from IMPALA-6416.
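For anyone extending the module, a minimal sketch of how such a command
can be registered through gdb's Python API (class name and body are
illustrative only):

import gdb

class FindQueryIds(gdb.Command):
    """Print the query ids of queries executing in the impalad."""

    def __init__(self):
        super(FindQueryIds, self).__init__("find-query-ids", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        # The real command walks the threads and reads ThreadDebugInfo;
        # this stub only shows the registration boilerplate.
        gdb.write("not implemented in this sketch\n")

FindQueryIds()  # registration happens on instantiation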
It is hoped that people will contribute more commands to the module.
Change-Id: I24e7026f2265954ed592d6f62110cf8cb2c2202a
Reviewed-on: http://gerrit.cloudera.org:8080/11352
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The original patch for this Jira relied on a third-party Python lib
for generating JUnit XML output. That proved to be limiting because
setting up the necessary virtualenv across a variety of dev and test
scenarios (private dev environment, jenkins.impala.io, and others)
was confusing and messy.
This update to the script maintains the same functionality and the
same interface, but uses only the Python standard library. A symlink
has also been added to Impala/bin for convenience.
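A minimal sketch of producing JUnit-style XML with only the standard
library (element and attribute names simplified relative to the real
script):

import xml.dom.minidom
import xml.etree.ElementTree as ET

def make_junitxml(suite_name, case_name, stdout_text):
    suites = ET.Element("testsuites", errors="0", failures="0", tests="1")
    suite = ET.SubElement(suites, "testsuite", name=suite_name, tests="1")
    case = ET.SubElement(suite, "testcase", classname=suite_name, name=case_name)
    ET.SubElement(case, "system-out").text = stdout_text
    # Pretty-print via minidom, which also emits the XML declaration.
    return xml.dom.minidom.parseString(ET.tostring(suites)).toprettyxml()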
Change-Id: I958ee0d8420b6a4197aaf0a7e0538a566332ea97
Reviewed-on: http://gerrit.cloudera.org:8080/11235
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
This patch adds a script to generate JUnit XML reports for arbitrary
build steps. It's also being used to seed the creation of an internal
Python library for Impala development that can be pip-installed into
a development environment.
Change-Id: If6024d74075ea69b8ee20d1fc3cc9c1ff821ba5b
Reviewed-on: http://gerrit.cloudera.org:8080/11128
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>