Instead of materializing empty rows when computing count(*), we use
the data stored in the Parquet RowGroup.num_rows field. The Parquet
scanner tuple is modified to have one slot into which we will write the
num rows statistic. The aggregate function is changed from count to a
special sum function that gets initialized to 0. We also add a rewrite
rule so that count(<literal>) is rewritten to count(*) in order to make
sure that this optimization is applied in all cases.
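The effect can be sketched in Python (a simplified stand-in for the real
Parquet metadata and Impala's actual implementation):

```python
# Simplified stand-in for Parquet footer metadata; the real code reads
# RowGroup.num_rows from the file footer instead of materializing rows.
def count_star_from_metadata(row_groups):
    total = 0  # the special sum aggregate starts at 0, so no rows => 0, not NULL
    for rg in row_groups:
        total += rg["num_rows"]  # one value per row group, no rows scanned
    return total

# count(<literal>) is rewritten to count(*) so the same fast path applies.
print(count_star_from_metadata([{"num_rows": 100}, {"num_rows": 250}]))  # 350
print(count_star_from_metadata([]))  # 0
```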
Testing:
- Added functional and planner tests
Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Reviewed-on: http://gerrit.cloudera.org:8080/6812
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, the fault injection utility would inject a fault on
every third RPC call for ReportExecStatus() RPCs. As shown
in IMPALA-5588, with an unfortunate sequence in which other
RPCs happen between the retry of ReportExecStatus() RPC in
QueryState::ReportExecStatusAux(), ReportExecStatus() can
hit injected faults 3 times in a row, causing the query
to be cancelled in QueryState::ReportExecStatusAux().
This change fixes the problem by reducing the fault injection
frequency to once every 16 RPC calls for ReportExecStatus(),
CancelQueryFInstances() and ExecQueryFInstances() RPCs.
Also incorporated the fix by Michael Brown for a python bug in
test_rpc_exception.py so that tests hitting an unexpected exception
will re-throw it for better diagnosis of test failures.
Change-Id: I0ce4445e8552a22f23371bed1196caf7d0a3f312
Reviewed-on: http://gerrit.cloudera.org:8080/7310
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
1. Report incorrect results count in the console log table. Previously,
the stress test knew about incorrect results but only reported them to
the console log inline. It was on the caller to find this. Now
we have a summed count.
2. Fail the process if there are errors, incorrect results, or timeouts.
Previously, the stress test just counted these, but would not fail its
process. This leads to a much stricter pass criteria for the stress
test. This will allow CI to fail and alert a maintainer that something
went wrong.
Testing:
I modified the result hashes for queries in a local runtime_info.json
and observed the reporting of incorrect results, incremented incorrect
results counts, and ultimately process failure.
Change-Id: I9f2174a527193ae01be45b8ed56315c465883346
Reviewed-on: http://gerrit.cloudera.org:8080/7282
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.
This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.
Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen, and Ec/Ei are constant
factors giving the per-row cost of codegen'd and interpreted execution,
then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).
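Plugging illustrative numbers into the model (these constants are made up
for demonstration, not measured Impala costs) shows the constant threshold:

```python
# Illustrative constants, not measured values: codegen cost C, per-row
# execution costs Ec (codegen'd) and Ei (interpreted), per unit complexity x.
C, Ec, Ei = 100_000.0, 1.0, 3.0

def total_cost(x, n, codegen):
    # codegen'd: C * x + Ec * x * n; interpreted: Ei * x * n
    return C * x + Ec * x * n if codegen else Ei * x * n

# Interpretation wins iff Ei*x*n < C*x + Ec*x*n  <=>  n < C / (Ei - Ec),
# independent of the complexity x -- hence a constant row-count threshold.
threshold = C / (Ei - Ec)
print(threshold)  # 50000.0
assert total_cost(2.0, 40_000, codegen=False) < total_cost(2.0, 40_000, codegen=True)
assert total_cost(2.0, 60_000, codegen=True) < total_cost(2.0, 60_000, codegen=False)
```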
I ran some experiments with TPC-H Q1, varying the input table size, to
determine the cut-over point at which codegen becomes beneficial.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s. To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.
Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't correctly handle the case of a scan node with conjuncts, a limit,
and missing stats (it considered the estimate still valid).
Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.
Added an e2e test to make sure that the optimisation is applied in
the backend.
Added planner tests for various cases where codegen should and shouldn't
be disabled.
Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.
Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Reviewed-on: http://gerrit.cloudera.org:8080/7153
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
library for storing file and block metadata. That change reduced the
effectiveness of the Thrift compact protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.
Fix: LZ4-compress the catalog topic updates before they are sent to the
statestore when --compact_catalog_topic is set to true.
Results: ~4X reduction in catalog update topic size
Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Reviewed-on: http://gerrit.cloudera.org:8080/7268
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, the retry logic in DoRpc() only allows retry to
happen if send() didn't complete successfully and the exception
indicates a closed connection. However, send() returning
successfully doesn't guarantee that the bytes have actually
reached the remote peer. According to the man page of send(),
when the message does not fit into the send buffer of the socket,
send() normally blocks. So the payload of RPC may be buffered in
the kernel if there is room for it. TCP allows a connection to
be half-open. If an Impalad node is restarted, a stale client
connection to that node may still allow send() to appear to succeed
even though the payload wasn't sent. However, upon calling recv()
in the RPC call to fetch the response, the client will get a return
value of 0, in which case Thrift will throw an exception as the
connection to the remote peer is already closed. The existing retry
logic doesn't handle this case. One can consistently reproduce the
problem by warming the client cache and then restarting one of the
Impalad nodes, which results in a series of query failures due to
stale connections.
This change augments the retry logic to also retry the entire RPC
if the exception string contains the messages "No more data to read."
or "SSL_read: Connection reset by peer" to capture the case of stale
connections. Our usage of Thrift doesn't involve half-open TCP connections,
so a broken connection in recv() indicates the remote end has already
closed the socket. The generated Thrift code doesn't knowingly
close the socket before an RPC completes unless the process crashes,
the connection is stale (e.g. the remote node was restarted) or the
remote end fails to read from the client. In all of these cases, the entire
RPC should just be retried with a new connection.
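The stale-connection check amounts to substring matching on the exception
message; a minimal sketch (the helper name is hypothetical, the strings are
the ones quoted above):

```python
# Hypothetical helper, not Impala's actual code: decide whether an exception
# message indicates a stale connection whose RPC is safe to retry in full.
RETRYABLE_MESSAGES = (
    "No more data to read.",
    "SSL_read: Connection reset by peer",
)

def is_stale_connection(exc_message: str) -> bool:
    # The remote end closed the socket (e.g. the node was restarted), so the
    # whole RPC can be retried on a fresh connection.
    return any(msg in exc_message for msg in RETRYABLE_MESSAGES)

print(is_stale_connection("TTransportException: No more data to read."))  # True
print(is_stale_connection("SSL_read: Resource temporarily unavailable"))  # False
```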
This change also fixes QueryState::ReportExecStatusAux() to
unconditionally retry up to 3 times when reporting the exec status of a
fragment instance. Previously, it may break out of the loop early
if RPC fails with 'retry_is_safe' == true (e.g. due to recv() timeout)
or if the connection to coordinator fails (IMPALA-5576). Declaring the
RPC to have failed may cause all fragment instances of a query to be
cancelled locally, triggering query hang due to IMPALA-2990. Similarly,
the cancellation RPC is also idempotent so it should be unconditionally
retried up to 3 times with 100ms sleep time in between.
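The bounded retry described above can be sketched as follows (a simplified
model, not the actual QueryState code; the RPC callable is a placeholder):

```python
import time

def retry_rpc(rpc, attempts=3, sleep_s=0.1):
    """Retry an idempotent RPC up to `attempts` times, sleeping between tries.

    Placeholder sketch: `rpc` is any callable that raises on failure.
    """
    last_exc = None
    for i in range(attempts):
        try:
            return rpc()
        except Exception as e:  # in Impala this would be a Thrift transport error
            last_exc = e
            if i + 1 < attempts:
                time.sleep(sleep_s)  # 100ms between attempts
    raise last_exc
```

Because the status-report handler ignores duplicate reports for finished
fragment instances, retrying the whole RPC this way is safe.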
The status reporting is idempotent as the handler simply ignores the
RPC if it determines that all fragment instances on a given backend
are done, so it is safe to retry the RPC. This change updates
ApplyExecStatusReport() to handle duplicated status reports with
done bit set. Previously we would drop some other fragment instances'
statuses if we received duplicate 'done' statuses from the same
fragment instance(s).
Testing done: Warmed up the client cache by running the stress test,
then restarted some Impalad nodes. Running queries used to fail or hang
consistently; with this patch they work. Also ran CSL endurance
tests and exhaustive builds.
Change-Id: I4d722c8ad3bf0e78e89887b6cb286c69ca61b8f5
Reviewed-on: http://gerrit.cloudera.org:8080/7284
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
The hash join node currently does not apply limits correctly.
This issue is masked most of the time since the planner sticks
an exchange node on top of most joins, but it is exposed when
NUM_NODES=1.
Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Reviewed-on: http://gerrit.cloudera.org:8080/6778
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Kudu recently added the ability to alter a column's default value
and storage attributes (KUDU-861). This patch adds the ability to
modify these from Impala using ALTER.
It also supports altering a column's comment for non-Kudu tables.
It does not support setting a column to be a primary key or
changing a column's nullability, because those are not supported on
the Kudu side yet.
Syntax:
ALTER TABLE <table> ALTER [COLUMN] <column>
SET <attr> <value> [<attr> <value> [<attr> <value>...]]
where <attr> is one of:
- DEFAULT, BLOCK_SIZE, ENCODING, COMPRESSION (Kudu tables)
- COMMENT (non-Kudu tables)
ALTER TABLE <table> ALTER [COLUMN] <column> DROP DEFAULT
This is similar to the existing CHANGE statement:
ALTER TABLE <table> CHANGE <column> <new_col_name> <type>
[COMMENT <comment>]
but the new syntax is more natural for setting column properties
when the column name and type are not being changed. Both ALTER
COLUMN and CHANGE COLUMN operations use AlterTableAlterColStmt and
are sent to the catalog as ALTER_COLUMN operations.
Testing:
- Added FE tests to ParserTest and AnalyzeDDLTest
- Added EE tests to test_kudu.py
Change-Id: Id2e8bd65342b79644a0fdcd925e6f17797e89ad6
Reviewed-on: http://gerrit.cloudera.org:8080/6955
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-3040 was initially fixed by using a timeout with HDFS caching
tests; however, some test executions against slow-running builds such as
ASAN indicate this timeout may not be high enough.
Use the specific_build_type_timeout() method to set a much higher
timeout for slower builds such as ASAN. This allows us to virtually
ignore timeout values on slow builds, but doesn't force us to
unconditionally increase the timeout in a release or debug build.
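The idea behind specific_build_type_timeout() can be sketched like this
(build-type names and multipliers here are illustrative assumptions, not
the test framework's actual values):

```python
def build_type_timeout(base_timeout_s, build_type, slow_build_timeout_s=None):
    # Illustrative sketch: slow builds (ASAN and similar) get a much larger
    # timeout so flakiness from slow execution is effectively ignored, while
    # release/debug builds keep the tight base timeout.
    slow_builds = {"asan", "tsan", "ubsan", "code_coverage"}
    if build_type in slow_builds:
        return slow_build_timeout_s or base_timeout_s * 10
    return base_timeout_s

print(build_type_timeout(20, "release"))  # 20
print(build_type_timeout(20, "asan"))     # 200
```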
Testing:
Ran all tests that use get_num_cache_requests() in a loop 100 times each
under an ASAN build. All test iterations passed.
Change-Id: I80f1c8a0e634a3726c53ef7297c5b162dd57a3a2
Reviewed-on: http://gerrit.cloudera.org:8080/7115
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
After the fix for IMPALA-5388, all TSSLExceptions thrown are
treated as fatal errors and the query fails. It turns out that
this is too strict: in a secure cluster under load, queries
can easily hit a timeout waiting for an RPC response.
When running without SSL, we call RetryRpcRecv() to retry the recv
part of an RPC if the TSocket underlying the RPC gets an EAGAIN
during recv(). This change extends that logic to cover secure
connection. In particular, we pattern match against the exception
string "SSL_read: Resource temporarily unavailable" which corresponds
to EAGAIN error code being thrown in the SSL_read() path.
Similarly, we will handle closed connection in send() path with
secure connection by pattern matching against the exception string
"TTransportException: Transport not open". To verify that the exception
is thrown during the send part of an RPC call, the RPC client interface
has been augmented to take a bool* argument which is set to true after
the send part of the RPC has completed but before the recv part starts.
If DoRpc() catches an exception and the send part isn't done yet, the
entire RPC is retried if the exception string matches certain substrings
which are safe to retry.
The fault injection utility has also been updated to distinguish between
timeouts and lost connections, to exercise different error handling
in the send and recv paths.
Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
Reviewed-on: http://gerrit.cloudera.org:8080/7229
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes three issues:
1. File handle caching is expected to be disabled for
remote files (using exclusive HDFS file handles),
however the file handles are still being cached.
2. The retry logic for exclusive file handles is broken,
causing the number of open files to be incorrect.
3. There is no test coverage for disabling the file
handle cache.
To fix issue #1, when a scan range is requesting an
exclusive file handle from the cache, it will always
request a newly opened file handle. It will also destroy
the file handle when the scan range is closed.
To fix issue #2, exclusive file handles will no longer
retry IOs. Since the exclusive file handle is always
a fresh file handle, it will never have a bad file
handle from the cache. This returns the logic to
its state before IMPALA-4623 in these cases. If a
file handle is borrowed from the cache, then the
code will continue to retry once with a fresh handle.
To fix issue #3, custom_cluster/test_hdfs_fd_caching.py
now does both positive and negative tests for the file
handle cache. It verifies that setting
max_cached_file_handles to zero disables caching. It
also verifies that caching is disabled on remote
files. (This change will resolve IMPALA-5390.)
Change-Id: I4c03696984285cc9ce463edd969c5149cd83a861
Reviewed-on: http://gerrit.cloudera.org:8080/7181
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Impala is case insensitive for column names and generally deals
with them in all lower case. Kudu is case sensitive. This can
lead to problems when a table is created externally in Kudu
with column names containing upper case letters.
This patch solves the problem by having KuduColumn always store
its name in lower case, so that general Impala code that has been
written expecting lower cased column names can use Column.getName()
safely.
It also adds the method KuduColumn.getKuduName(), which returns
the column name in the case that it appears in Kudu. Any code that
passes column names into the Kudu API must call this method first
to get the correct column name.
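The two-name scheme can be sketched as a simplified Python model of the
Java KuduColumn class:

```python
class KuduColumn:
    """Simplified model: store the name lower-cased for Impala's
    case-insensitive code, but keep the exact Kudu-side casing around."""

    def __init__(self, kudu_name: str):
        self._kudu_name = kudu_name     # casing as it appears in Kudu
        self._name = kudu_name.lower()  # what general Impala code sees

    def get_name(self) -> str:
        return self._name

    def get_kudu_name(self) -> str:
        # Required whenever a name is passed back into the Kudu API.
        return self._kudu_name

col = KuduColumn("UserId")
print(col.get_name())       # userid
print(col.get_kudu_name())  # UserId
```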
There are four specific situations fixed by this patch:
- When ordering on a Kudu column, the Analyzer would create
two SlotDescriptors that point to the same column because
registerSlotRef() was being called with inconsistent casing.
It is now always called with the lower cased names.
- 'ADD RANGE PARTITION' would fail to find the range partition
column if it isn't all lower case in Kudu.
- 'ALTER TABLE DROP COLUMN' and 'ALTER TABLE CHANGE' only worked
if the column name was specified in Kudu case.
- 'CREATE EXTERNAL TABLE' called on a Kudu table with column names
that differ only in case now returns an error, since Impala has
no way of handling this situation.
Testing:
- Added e2e tests in test_kudu.py.
- Manually edited functional_kudu to change column names to have
mixed casing and ran the kudu tests.
Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Reviewed-on: http://gerrit.cloudera.org:8080/6902
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
This change avoids printing blank lines when the Impala
shell fetches 0 rows from a statement.
Change-Id: I6e18ce36be07ee90a16b007b1e30d5255ef8a839
Reviewed-on: http://gerrit.cloudera.org:8080/7055
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The null_count in the statistics field is updated each time a null
value is encountered by the Parquet table writer. The value is written
to the Parquet header if the row group has one or more null values.
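The bookkeeping amounts to a per-row-group null counter; a minimal sketch
(not the actual C++ writer code):

```python
def row_group_null_count(values):
    # Count null (None) values as the writer encounters them; the result is
    # recorded in the row group's column statistics when it is one or more.
    null_count = 0
    for v in values:
        if v is None:
            null_count += 1
    return null_count

print(row_group_null_count([1, None, 3, None, None]))  # 3
print(row_group_null_count([1, 2, 3]))                 # 0
```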
Testing: Modified the existing end-to-end test in the
test_insert_parquet.py file to make sure each parquet header has
the appropriate null_count. Verified the correctness of the nulltable
test and added an additional test which populates a parquet file with
the functional_parquet.zipcode_incomes table and ensures that the
expected null_count is populated.
Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Reviewed-on: http://gerrit.cloudera.org:8080/7058
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds the command line option --ca_cert to the common test
infra CLI options for use alongside --use-ssl. This is useful when
testing against a secured Impala cluster in which the SSL certs are
self-signed. This will allow the SSL request to be validated. Using this
option will also suppress noisy console warnings like:
InsecureRequestWarning: Unverified HTTPS request is being made. Adding
certificate verification is strongly advised. See:
https://urllib3.readthedocs.org/en/latest/security.html
We also go further in this patch and use the warnings module to print
these SSL-related warnings once and only once, instead of all over the
place. In the case of the stress test, this greatly reduces the noise in
the console log.
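Python's warnings module supports print-once behavior directly via the
"once" filter, which reports each distinct warning only the first time it
is issued:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    # "once" suppresses repeats of the same warning message/category.
    warnings.simplefilter("once")
    for _ in range(5):
        warnings.warn("Unverified HTTPS request is being made.", UserWarning)

print(len(caught))  # 1
```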
Testing:
- quick concurrent_select.py calls with and without --ca_cert to observe
that connections still get made and the test runs smoothly. Some of
this testing occurred without warning suppression, so that I could be
sure the InsecureRequestWarnings were not occurring when using
--ca_cert anymore.
- ensured warnings are printed once, not multiple times
Change-Id: Ifb9e466e4b7cde704cdc4cf98159c068c0a400a9
Reviewed-on: http://gerrit.cloudera.org:8080/7152
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
This patch includes a change to the framework to permit the passing
of a username to the run_stmt_in_hive() method in the ImpalaTestSuite
class, but retains the same default value as before.
This is to allow a test to issue a 'select count(*) from foo' query
through hive. Hive needs to set up a job to perform this query, and
HDFS write access to do so. In typical cases, the HDFS user is 'hdfs';
however, it may be necessary to change this depending on the cluster.
On a local mini-cluster, the username appears to be irrelevant, so
this won't affect locally run tests.
Tested by running the core set of tests on a local minicluster to
make sure there were no regressions. Also confirmed that the test
in question now passes on a remote physical cluster.
Change-Id: I1cc8824800e4339874b9c4e3a84969baf848d941
Reviewed-on: http://gerrit.cloudera.org:8080/7046
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
Bug:
When Sentry-based authorization is enabled, a user that isn't authorized
to EXPLAIN a statement that uses a view can still access unauthorized
information, such as the view's definition, by running the statement and
asking for the query profile or the execution summary.
Fix:
During query compilation, determine if the user can access the runtime
profile or the execution summary. Upon a user's request for a runtime
profile or execution summary, determine, based on that information and
the requesting user, whether the runtime profile (or execution summary)
or an authorization error is returned.
The authorization rule enforced is the following:
- User A runs statement S, A asks for profile, A has profile access:
Runtime profile is returned
- User A runs statement S, A asks for profile, A doesn't have profile access:
Authorization error
- User A runs statement S, user B asks for profile:
Authorization error.
This patch doesn't enforce access to the runtime profile or execution summary
through the Web UI.
Change-Id: I2255d587367c2d328590ae8534a5406c4b0c9b15
Reviewed-on: http://gerrit.cloudera.org:8080/7064
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
This fixes three issues with the file handle cache.
The first issue is that ReopenCachedHdfsFileHandle can
destroy the passed in file handle without removing the
reference to it. The old file handle then refers to
a piece of memory that is not a handle in the cache,
so future use of the handle fails with an assert. The
fix is to always overwrite the reference to the file
handle when it has been destroyed.
The second issue is that query_test/test_hdfs_fd_caching.py
should run on anything that supports the hdfs commandline
and tolerate query failure. Its logic is not specific to
file handle caching, so it has been renamed to
query_test/test_hdfs_file_mods.py.
Finally, custom_cluster/test_hdfs_fd_caching.py should not
be running on remote files (S3, ADLS, Isilon, remote
clusters). The file handle cache semantics won't apply on
those platforms.
Change-Id: Iee982fa5e964f6c8969b2eb7e5f3eca89e793b3a
Reviewed-on: http://gerrit.cloudera.org:8080/7020
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, DoRpc() blacklisted only a couple of conditions
under which the RPC shouldn't be retried on exception. This is fragile
as the errors could have happened after the payload has been
successfully sent to the destination. Such aggressive retry
behavior can lead to duplicated row batches being sent, causing
wrong results in queries.
This change fixes the problem by whitelisting the conditions
in which the RPC can be retried. Specifically, it pattern-matches
against certain errors in TSocket::write_partial() in the thrift
library and only retries the RPC in those cases. With SSL enabled,
we will never retry. We should investigate whether there are some
cases in which it's safe to retry.
This change also adds fault injection in the TransmitData() RPC
caller's path to emulate different exception cases.
Change-Id: I176975f2aa521d5be8a40de51067b1497923d09b
Reviewed-on: http://gerrit.cloudera.org:8080/7063
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
This patch sets the default --cm-port (for the CM ApiResource
initialization) based on a new flag, --use-tls, which enables test infra
to talk to CM clusters with TLS enabled. It is still possible to set a
port override, but in general it will not be needed.
Reference:
https://cloudera.github.io/cm_api/epydoc/5.4.0/cm_api.api_client.ApiResource-class.html#__init__
Testing:
Connected both to TLS-disabled and TLS-enabled CM instances. Before this
patch, we would fail hard when trying to talk to the TLS-enabled CM
instance.
Change-Id: Ie7dfa6c400687f3c5ccaf578fd4fb17dedd6eded
Reviewed-on: http://gerrit.cloudera.org:8080/7107
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change executes the tests added to subplans.test and removes
a test which incorrectly references subplannull_data.test (a file
which does not exist).
Change-Id: I02b4f47553fb8f5fe3425cde2e0bcb3245c39b91
Reviewed-on: http://gerrit.cloudera.org:8080/7038
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
After a single Impalad is restarted, it is possible that the order in
which it receives roles and privileges from the statestore is incorrect. The
correct order is for the role to appear first in the update, before the
privilege that references it.
If a user updates a role, its catalog version number can become larger
than the catalog numbers of the privileges that reference it. This
causes the role to come after the privilege in the initial metastore
update.
The issue is fixed by doing two passes over the catalog objects in the
Impalad. The first pass updates the top level objects. The second pass
updates the dependent objects.
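The two-pass update can be sketched as follows (a simplified model; the
object kinds and field names are illustrative, not the catalog's actual
types):

```python
def apply_catalog_update(objects, catalog):
    # Pass 1: top-level objects (roles), so later references can resolve.
    for obj in objects:
        if obj["kind"] == "role":
            catalog["roles"][obj["name"]] = obj
    # Pass 2: dependent objects (privileges that reference a role).
    for obj in objects:
        if obj["kind"] == "privilege":
            assert obj["role"] in catalog["roles"], "role must precede privilege"
            catalog["privileges"].append(obj)
    return catalog

# A privilege arriving before its role no longer breaks the update:
update = [{"kind": "privilege", "role": "admin", "priv": "ALL"},
          {"kind": "role", "name": "admin"}]
catalog = apply_catalog_update(update, {"roles": {}, "privileges": []})
print("admin" in catalog["roles"])  # True
```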
Testing:
- Added a test that reproduced the problem.
Change-Id: I7072e95b74952ce5a51ea1b6e2ae3e80fb0940e0
Reviewed-on: http://gerrit.cloudera.org:8080/7004
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
On developer machines it can happen that /tmp/minidumps does not exist
when test_minidump_relative_path is executed. In this case, errors from
rmtree should be ignored.
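shutil.rmtree supports this directly via ignore_errors:

```python
import os
import shutil

path = "/tmp/minidumps-nonexistent-for-demo"
# ignore_errors=True makes rmtree a no-op instead of raising when the
# directory (e.g. /tmp/minidumps) does not exist.
shutil.rmtree(path, ignore_errors=True)
print(os.path.exists(path))  # False
```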
Change-Id: Ifab76a30898805d2df5e7452079a536d8747ac50
Reviewed-on: http://gerrit.cloudera.org:8080/7062
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
While support for TIMESTAMP columns in Kudu tables has been
committed (IMPALA-5137), that change does not support TIMESTAMP
column default values.
This supports CREATE TABLE syntax to specify the default
values, but more importantly this fixes the loading of Kudu
tables that may have had default values set on
UNIXTIME_MICROS columns, e.g. if the table was created via
the python client. This involves fixing KuduColumn to hide
the LiteralExpr representing the default value because it
will be a BIGINT if the column type is TIMESTAMP. The default value is
only needed for toSql() and toStringValue(), so helper
functions are added to KuduColumn to encapsulate the special
logic for TIMESTAMP.
TODO: Add support and tests for ALTER setting the default
value (when IMPALA-4622 is committed).
Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Reviewed-on: http://gerrit.cloudera.org:8080/6936
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This is a migration from an old and broken script from another
repository. Example use:
bin/single_node_perf_run.py --ninja --workloads targeted-perf \
--load --scale 4 --iterations 20 --num_impalads 3 \
--start_minicluster --query_names PERF_AGG-Q3 \
$(git rev-parse HEAD~1) $(git rev-parse HEAD)
The script can load data, run benchmarks, and compare the statistics
of those runs for significant differences in performance. It glues
together buildall.sh, bin/load-data.py, bin/run-workload.py, and
tests/benchmark/report_benchmark_results.py.
Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9
Reviewed-on: http://gerrit.cloudera.org:8080/6818
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
The PARQUET_FILE_SIZE query option doesn't work with ADLS because the
AdlFileSystem doesn't have a notion of block sizes, and Impala depends
on the filesystem remembering the block size, which is then used as the
target Parquet file size (this is done for HDFS so that the Parquet file
size and block size match even if PARQUET_FILE_SIZE isn't a valid
block size).
We special-case ADLS just like we do S3 to bypass the
FileSystem block size and instead use the requested
PARQUET_FILE_SIZE as the output partition's block_size (and consequently
the target Parquet file size).
Testing: Re-enabled test_insert_parquet_verify_size() for ADLS.
Also fixed a miscellaneous bug with the ADLS client listing helper function.
Change-Id: I474a913b0ff9b2709f397702b58cb1c74251c25b
Reviewed-on: http://gerrit.cloudera.org:8080/7018
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPC-DS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of the E2E tests were compared against runs done with
Netezza and Vertica. The divergences were all due to the truncation behavior
of decimal types in DECIMAL_V1.
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.
There are two parts to this change:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file
Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.
To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.
If max_cached_file_handles is set to 0, or the
scan range is accessing data cached by HDFS, or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.
If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.
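The partitioning scheme can be sketched as a simplified Python model (not
the actual DiskIoMgr code; eviction here is a placeholder, and the in-use
protection is omitted):

```python
class PartitionedHandleCache:
    """Simplified model: the cache is split into independently locked
    partitions chosen by a hash of the (file, mtime) key, and the total
    capacity is divided evenly among the partitions."""

    def __init__(self, capacity, num_partitions=16):
        self.partitions = [{} for _ in range(num_partitions)]
        self.per_partition_capacity = max(1, capacity // num_partitions)

    def _partition(self, key):
        return hash(key) % len(self.partitions)

    def get(self, key, open_fn):
        part = self.partitions[self._partition(key)]
        if key not in part:
            # Evict within this partition only, never across partitions.
            if len(part) >= self.per_partition_capacity:
                part.pop(next(iter(part)))  # placeholder eviction policy
            part[key] = open_fn(key)
        return part[key]

cache = PartitionedHandleCache(capacity=32)
h1 = cache.get(("/data/f1", 111), lambda k: "handle:" + k[0])
h2 = cache.get(("/data/f1", 111), lambda k: "handle:" + k[0])
print(h1 is h2)  # True: same file handle reused across scan ranges
```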
Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Reviewed-on: http://gerrit.cloudera.org:8080/6478
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.
Existing behavior:
- Partitions that lack the row count stat are ignored
when estimating the cardinality of HDFS scans. Such
partitions effectively have an estimated row count
of zero.
- We always use the row count stats for partitions that
have one. The row count may be inaccurate if data in
such partitions has changed significantly.
Summary of changes:
- Enhance COMPUTE STATS to also store the total number
of file bytes in the table.
- Use the table-level row count and file bytes stats
to estimate the number of rows in a scan.
- A new impalad startup flag is added to enable/disable
the extrapolation behavior. The feature is disabled by
default. Note that even with the feature disabled,
COMPUTE STATS stores the file bytes so you can enable
the feature without having to run COMPUTE STATS again.
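The extrapolation amounts to a simple proportion between rows and file bytes. A minimal sketch, assuming the stats follow the usual convention of -1 meaning "missing" (names are illustrative, not Impala's actual code):

```python
def extrapolate_num_rows(stats_num_rows, stats_total_bytes, scan_bytes):
    """Estimate the rows in a scan by assuming rows are proportional
    to file bytes: rows_per_byte * bytes_to_scan (sketch)."""
    if stats_num_rows < 0 or stats_total_bytes <= 0:
        return -1  # stats missing; no estimate possible
    rows_per_byte = stats_num_rows / float(stats_total_bytes)
    return int(round(rows_per_byte * scan_bytes))
```

This is why storing the total file bytes alongside the row count is enough: new or modified partitions contribute bytes, and the table-level ratio converts those bytes into an estimated row count.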
Testing:
- Added new FE unit test
- Added new EE test
Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/6840
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS; however, listing that directory through
the python client immediately after the drop will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
The azure-data-lake-store-python client also only works on CentOS 6.6
and newer, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". ADLS tests are expected
to run on a machine running CentOS 6.6 or newer.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.
In this patch, we add a REFRESH FUNCTIONS <db> statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.
Testing:
- Ran a private build which passed.
Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Reviewed-on: http://gerrit.cloudera.org:8080/6878
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading, which
in turn blocks incoming queries by holding client_request_state_map_lock_.
This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.
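A terminal lock is one under which no other lock is ever acquired, so it can never participate in a blocking chain. The discipline can be sketched like this (illustrative Python; the real code is C++ in the impalad):

```python
import threading

class ClientRequestStateMap:
    """Sketch: the map lock is terminal -- hold it only for the
    lookup or insert itself, never across per-query work."""
    def __init__(self):
        self._lock = threading.Lock()  # terminal: no lock taken under it
        self._map = {}

    def register(self, query_id, crs):
        with self._lock:
            self._map[query_id] = crs

    def get(self, query_id):
        # Release the map lock before the caller touches CRS::lock_,
        # so a slow query (e.g. still planning) can't block registration.
        with self._lock:
            return self._map.get(query_id)
```

With this ordering, the web page handler may still wait on a query's own lock, but new queries can always register.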
Testing: Tested it locally by adding a long sleep in
JniFrontend.createExecRequest() and still was able to refresh the web UI
and run parallel queries. Also added a custom cluster test that does the
same sequence of actions by injecting a metadata loading pause.
Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Reviewed-on: http://gerrit.cloudera.org:8080/6707
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins
Syntax:
<tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
The first number specifies the percent of table bytes to sample.
The second number specifies the random seed to use.
The sampling is coarse-grained. Impala keeps randomly adding
files to the sample until at least the desired percentage of
file bytes has been reached.
Examples:
SELECT * FROM t TABLESAMPLE SYSTEM(10)
SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234)
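The coarse-grained selection can be sketched as follows (a sketch, not the planner's actual code; REPEATABLE supplies the seed):

```python
import random

def sample_files(files, percent, seed=None):
    """files: list of (path, size_bytes) tuples. Randomly add whole
    files until at least `percent` of total bytes is covered (sketch)."""
    total_bytes = sum(size for _, size in files)
    target_bytes = total_bytes * percent / 100.0
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # seed makes it repeatable
    selected, covered = [], 0
    for path, size in shuffled:
        if covered >= target_bytes:
            break
        selected.append(path)
        covered += size
    return selected
```

Because whole files are added, the sample can overshoot the requested percentage; it is a lower bound on bytes, not an exact fraction.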
Testing:
- Added parser, analyser, planner, and end-to-end tests
- Private core/hdfs run passed
Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Reviewed-on: http://gerrit.cloudera.org:8080/6868
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
We use the new libHDFS API hdfsGetLastExceptionRootCause() to return
the last seen HDFS error on that thread.
This patch depends on the recent HDFS commit:
fda86ef2a3
Testing: A test has been added which puts HDFS in safe mode and then
verifies that we see a 255 error with the root cause.
Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d
Reviewed-on: http://gerrit.cloudera.org:8080/6894
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
The sortby() hint is superseded by the SORT BY SQL clause, which has
been introduced in IMPALA-4166. This change removes the hint.
Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594
Reviewed-on: http://gerrit.cloudera.org:8080/6885
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, updates to the query state in ClientRequestState were
not immediately reflected in the query profile, potentially
leading to the profile showing an incorrect state for an extended
period during execution.
In particular, queries were being shown in the 'CREATED' state
long after they had started 'RUNNING'.
The fix is to update the profile whenever the state is updated.
Testing:
- Extended existing hs2 tests and added a beeswax test to check
for expected query states in the profile
Change-Id: I952319b7308a24d4e2dff924199c0c771bce25b3
Reviewed-on: http://gerrit.cloudera.org:8080/6923
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
By default, Kudu assumes it has 80% of system memory which
is far too high for the minicluster. This sets a mem limit
of 2gb and lowers the limit of the block cache. These values
were tested on a gerrit-verify-dryrun job as well as an
exhaustive run.
This patch also simplifies TestKuduMemLimits which was
unnecessarily creating a large table during test execution.
Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5
Reviewed-on: http://gerrit.cloudera.org:8080/6844
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This change switches to a new Breakpad version, which includes fixes for
Breakpad bugs #681 and #728. The toolchain change was reviewed here:
https://gerrit.cloudera.org/6866
The change also undoes the workaround introduced in IMPALA-3794.
In addition to running test_breakpad.py in a loop for a while, I
verified that the test fails with the old toolchain version
(88e5b2) and works with the new one (ffe3e4).
To test #728 I added a sleep() call before SendContinueSignalToChild()
and then killed the parent process, manually observing that the child
would die, too.
Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19
Reviewed-on: http://gerrit.cloudera.org:8080/6883
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
A memory intensive UDF test takes a while to completely finish and for
the memory in Impala to be completely freed. This caused a problem in
ASAN builds (and potentially in normal builds) because we would start
the next test right away, before the memory is freed.
We fix the issue by checking that all fragments finish executing before
starting the next test.
Testing:
- Ran a private ASAN build which passed.
Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe
Reviewed-on: http://gerrit.cloudera.org:8080/6893
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
A test that was recently added, test_observability::test_scan_summary,
uses an HBase table. It needs to be restricted not to run on S3,
localFS or Isilon.
Change-Id: I9863cf3f885eb1d2152186de34e093497af83d99
Reviewed-on: http://gerrit.cloudera.org:8080/6859
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
Adds Impala support for TIMESTAMP types stored in Kudu.
Impala stores TIMESTAMP values in 96-bits and has nanosecond
precision. Kudu's timestamp is a 64-bit microsecond delta
from the Unix epoch (called UNIXTIME_MICROS), so a conversion
is necessary.
When writing to Kudu, TIMESTAMP values in nanoseconds are
rounded to the nearest microsecond.
When reading from Kudu, the KuduScanner returns
UNIXTIME_MICROS with 8 bytes of padding so Impala can convert
the value to a TimestampValue in-line and copy the entire
row.
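The write-path conversion can be sketched as integer rounding from nanoseconds to Kudu's UNIXTIME_MICROS (illustrative; the real conversion is done in the C++ Kudu table sink):

```python
def ns_to_unixtime_micros(unix_time_ns):
    """Round a nanosecond Unix timestamp to the nearest microsecond,
    Kudu's UNIXTIME_MICROS representation (sketch)."""
    if unix_time_ns >= 0:
        return (unix_time_ns + 500) // 1000
    # Round negative values (pre-epoch) symmetrically, away from zero.
    return -((-unix_time_ns + 500) // 1000)
```

On the read path the reverse is lossless: every microsecond value is exactly representable in Impala's 96-bit nanosecond-precision TIMESTAMP.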
Testing:
Updated the functional_kudu schema to use TIMESTAMPs instead
of converting to STRING, so this provides some decent
coverage. Some BE tests were added, and some EE tests as
well.
TODO: Support pushing down TIMESTAMP predicates
TODO: Support TIMESTAMPs in range partitioning expressions
Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Reviewed-on: http://gerrit.cloudera.org:8080/6526
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
For scan nodes, previously only HDFS tables showed the name
of the table in the 'Detail' section for the scan node. This
change adds the table name for all scan node types (Kudu,
HBase, and DataSource).
Testing:
- Added an e2e test in test_observability.
Change-Id: If4fd13f893aea4e7df8a2474d7136770660e4324
Reviewed-on: http://gerrit.cloudera.org:8080/6832
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
This change adds functionality to write and read parquet::Statistics for
Decimal, String, and Timestamp values. As an exception, we don't read
statistics for CHAR columns, since CHAR support is broken in Impala
(IMPALA-1652).
This change also switches from using the deprecated fields 'min' and
'max' to populate the new fields 'min_value' and 'max_value' in
parquet::Statistics, which were added in parquet-format pull request #46.
The HdfsParquetScanner preferentially reads the new fields if they are
populated and if the column order 'TypeDefinedOrder' has been used to
compute the statistics. For columns without a column order set or with
only the deprecated fields populated, the scanner will read them only if
they are of simple numeric type, i.e. boolean, integer, or floating
point.
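The reader's choice between the new and deprecated fields can be sketched as follows (illustrative names; the actual logic is C++ in HdfsParquetScanner):

```python
# Types whose old, unspecified sort order happens to match the
# type-defined order, so the deprecated fields are still trustworthy.
SIMPLE_NUMERIC = {"boolean", "int32", "int64", "float", "double"}

def readable_stats(stats, column_order_is_type_defined, physical_type):
    """Return the (min, max) pair to use, or None if the statistics
    cannot be trusted for this column (sketch). `stats` is a dict
    standing in for parquet::Statistics."""
    if (stats.get("min_value") is not None
            and stats.get("max_value") is not None
            and column_order_is_type_defined):
        # New fields plus TypeDefinedOrder: safe for any type.
        return stats["min_value"], stats["max_value"]
    if (stats.get("min") is not None and stats.get("max") is not None
            and physical_type in SIMPLE_NUMERIC):
        # Deprecated fields: only trust simple numeric types.
        return stats["min"], stats["max"]
    return None
```

This is exactly the fallback path the Hive-written test file exercises: only the deprecated fields are populated, so statistics are usable for numeric columns and ignored for everything else.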
This change removes the validation of the Parquet statistics written
by Hive from the tests, since Hive does not write the new fields. Instead
it adds a parquet file written by Hive that uses the deprecated fields
for its statistics. It uses that file to exercise the fallback logic for
supported types in a test.
This change also cleans up the interface of ParquetPlainEncoder in
parquet-common.h.
Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Reviewed-on: http://gerrit.cloudera.org:8080/6563
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>
The Parquet file column reader may fail in the middle
of producing a scratch tuple batch for various reasons
such as exceeding the memory limit or cancellation. In that
case, the scratch tuple batch may not have materialized
all the rows in a row group. We shouldn't erroneously
report that the file is corrupted, since the
column reader didn't completely read the entire row group.
A new test case is added to verify that we won't see this
error message. A new failpoint phase GETNEXT_SCANNER is
also added to differentiate it from the GETNEXT in the
scan node itself.
Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
Reviewed-on: http://gerrit.cloudera.org:8080/6787
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Coordinator:
- FragmentInstanceState -> BackendState, which in turn records
FragmentInstanceStats
QueryState:
- does query-wide setup in a separate thread (which also launches
the instance exec threads)
- has a query-wide 'prepared' state at which point all static setup
is done and all FragmentInstanceStates are accessible
Also renamed QueryExecState to ClientRequestState.
Simplified handling of execution status (in FragmentInstanceState):
- status only transmitted via ReportExecStatus rpc
- in particular, it's not returned anymore from the Cancel rpc
FIS: Fixed bugs related to partially-prepared state (in Close() and ReleaseThreadToken())
Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Reviewed-on: http://gerrit.cloudera.org:8080/6535
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Impala Public Jenkins
This change fixes IMPALA-4873 by adding the capability to supply a dict
'test_file_vars' to run_test_case(). Keys in this dict will be replaced
with their values inside test queries before they are executed.
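The substitution amounts to a simple replace pass over the query text before execution. A minimal sketch of the behavior (function name is illustrative):

```python
def apply_test_file_vars(query, test_file_vars):
    """Replace each key of test_file_vars with its value in the
    query text before the query is executed (sketch)."""
    if test_file_vars:
        for name, value in test_file_vars.items():
            query = query.replace(name, value)
    return query
```

This lets a test parameterize its .test file with placeholders (for example a unique database name) that are resolved per test run.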
Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70
Reviewed-on: http://gerrit.cloudera.org:8080/6817
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This was a poorly written test that relies on assumptions about
the behavior of 'rand' and the order that rows get processed in
a table that Impala doesn't actually guarantee.
The new version is still sensitive to the precise behavior of
'rand()', but shouldn't be flaky unless that behavior is changed.
Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Reviewed-on: http://gerrit.cloudera.org:8080/6775
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.
After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
does not adjust them if the property is absent. No adjustment is
applied to data written by Impala.
New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE <file> will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.
HDFS tables created by Impala using CREATE TABLE LIKE <other table>
will copy the property of the table that is copied.
This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag is taken into account only if the
parquet.mr.int96.write.zone table property is not set, and is
ignored otherwise.
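The resulting decision logic for a Parquet file can be sketched like this (illustrative names; the real check happens in the C++ Parquet scanner):

```python
def int96_adjustment_zone(file_written_by_hive, table_write_zone,
                          convert_legacy_hive_utc_flag, local_zone):
    """Return the time zone used to adjust INT96 timestamps read from
    a Parquet file, or None for no adjustment (sketch)."""
    if not file_written_by_hive:
        return None  # data written by Impala is never adjusted
    if table_write_zone is not None:
        # parquet.mr.int96.write.zone wins over the legacy flag.
        return table_write_zone
    if convert_legacy_hive_utc_flag:
        # --convert_legacy_hive_parquet_utc_timestamps applies only
        # when the table property is absent.
        return local_zone
    return None
```

Note the precedence: the per-table property, when present, completely shadows the cluster-wide legacy flag.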
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins