This patch changes the Parquet scanner to check whether it was able to read
the full footer scan range. A short read indicates that the file has been
overwritten by a shorter file without the table metadata being refreshed.
Previously this case would hit a DCHECK. This patch adds a test for this
case, as well as for the case where the new file is longer than the
metadata states (which fails with an existing error).
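A minimal Python sketch of the footer check (the scanner itself is C++). The `read_at(offset, length)` callback and `FileChangedError` are hypothetical names: the footer range is computed from the file length recorded in the metadata, and a short read is reported as an error instead of hitting a DCHECK.

```python
from typing import Callable


class FileChangedError(Exception):
    """Hypothetical error: the file on disk no longer matches the table metadata."""


def read_footer_range(read_at: Callable[[int, int], bytes],
                      file_length_from_metadata: int,
                      footer_len: int) -> bytes:
    """Read the footer scan range, whose position is derived from the file
    length recorded in (possibly stale) metadata, and fail on a short read."""
    start = max(0, file_length_from_metadata - footer_len)
    expected = file_length_from_metadata - start
    data = read_at(start, expected)
    if len(data) < expected:
        # Fewer bytes than expected: the file was most likely overwritten by a
        # shorter file without the table metadata being refreshed.
        raise FileChangedError(
            f"read {len(data)} of {expected} footer bytes at offset {start}; "
            "the file may have been overwritten, try refreshing the table")
    return data
```

The longer-file case is not covered by this sketch; as noted above, it is caught by an existing error.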
Change-Id: Ie2031ac2dc90e4f2573bd3ca8a3709db60424f07
Reviewed-on: http://gerrit.cloudera.org:8080/1084
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
If a CTAS query fails during its DML phase, Impala
should drop the newly created table.
Change-Id: I39e04a6923a36afa48f3252addd50ddda83d1706
(cherry picked from commit e03ce43585f68590a95038341e74db458f34bf32)
Reviewed-on: http://gerrit.cloudera.org:8080/870
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
The code in resource-broker.cc that makes RPCs to Llama retries
a failed RPC a configurable number of times. If the RPC throws
(which Thrift may do), we try to reset the connection and then
make the RPC again, but this time not guarded by a try/catch
block. If this second RPC throws, the process crashes.
This fixes the issue by removing the try/catch and instead
using the ClientCache DoRpc function which handles this
already. Some additional Llama RPC calling wrappers were
removed as well.
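The retry pattern can be sketched in Python as follows. `do_rpc`, `TransportError` and `reconnect` are hypothetical stand-ins, not the ClientCache API: the point is that the retry made after resetting the connection is guarded exactly like the first attempt.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransportError(Exception):
    """Stand-in for an exception thrown by the RPC layer."""


def do_rpc(call: Callable[[], T], reconnect: Callable[[], None],
           max_attempts: int = 3) -> T:
    """Invoke `call`, resetting the connection before each retry.

    Both the reconnect and every attempt run inside the try block, so an
    exception on a retry becomes an error result instead of escaping."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            if attempt > 0:
                reconnect()  # reset the connection before retrying
            return call()
        except TransportError as e:
            last_error = e
    raise RuntimeError(f"RPC failed after {max_attempts} attempts") from last_error
```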
Change-Id: Iba5add47a77fe9257e73eea5711ef4b948abe76a
Reviewed-on: http://gerrit.cloudera.org:8080/881
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
A disaster recovery application that uses thrift to directly execute
DDL (instead of using SQL) stopped working. A client call to drop a
function ended up down the code path to drop a view. Commit 47d061
broke the enum ordering for old clients by adding TRUNCATE_TABLE in
the middle of the enum list, which shifted the values of all later
entries. The fix is to move TRUNCATE_TABLE to the end. Commit 47d061
was never released, so there is no concern about breaking newer clients.
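Why inserting an entry mid-list breaks old clients, sketched in Python. The entry names are a hypothetical subset of the real enum; the sketch assumes the entries have no explicit values, so Thrift numbers them by position (0 for the first entry, then one greater than the previous entry).

```python
def implicit_values(names):
    """Number enum entries by position, as Thrift does for entries
    without an explicit value."""
    return {name: i for i, name in enumerate(names)}


# Hypothetical subset of the DDL-type enum, for illustration only.
OLD = ["DROP_TABLE", "DROP_VIEW", "DROP_FUNCTION"]
BROKEN = ["DROP_TABLE", "TRUNCATE_TABLE", "DROP_VIEW", "DROP_FUNCTION"]  # inserted mid-list
FIXED = OLD + ["TRUNCATE_TABLE"]                                        # appended at the end

wire_value = implicit_values(OLD)["DROP_FUNCTION"]  # what an old client sends
print(BROKEN[wire_value])  # misread by the broken server as DROP_VIEW
print(FIXED[wire_value])   # still DROP_FUNCTION after the fix
```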
Change-Id: I79ebec65497077471a37e5712061c418403a336a
Reviewed-on: http://gerrit.cloudera.org:8080/899
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
In some cases the planner generated plans with rows with no
materialized tuples. Recent changes to the backend caused these to
hit a DCHECK. This patch addresses one case in the planner where it
was possible to create such plans: when the planner generated an
empty node from a select subquery with no from clause. The fix is to
create a materialized tuple based on the select list expressions, in
the same way as we handle these selects when the planner cannot
statically determine they have no result rows.
An example query is included as a test.
It also adds additional checks to the frontend and backend to catch
these invalid rows earlier.
Change-Id: I851f2fb5d389471d0bb764cb85f3c49031a075e4
Reviewed-on: http://gerrit.cloudera.org:8080/911
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Describe should work if given a path that references a complex-typed
column of a table. It should produce output that lists the names and
types of all valid subpaths of the column (e.g. struct fields, or
key/val for a map).
This changes some error messages in resolving paths, since we can no
longer definitively determine based on the path length whether the first
path element is meant to be the db or the table.
Change-Id: I8a54e83df67141011ff5396c98f9eb0bde0fb04c
Reviewed-on: http://gerrit.cloudera.org:8080/863
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Tmp devices are blacklisted when a write error is encountered for that
device. No more scratch space will be allocated on the blacklisted
device, based on the assumption that the device is likely to be
misconfigured or failing.
This patch does not attempt to recover the query that experienced the
write error. It also does not attempt to remap any existing blocks away
from the temporary device.
This behaviour is unit tested for several failure scenarios.
This patch adds additional test infrastructure required for testing
BufferedBlockMgr behavior in the presence of faults and in
configurations with multiple tmp directories.
Adds the metrics tmp-file-mgr.active-scratch-dirs and
tmp-file-mgr.active-scratch-dirs.list, which track the number and set of
active scratch dirs, and exposes them in the Impala web UI.
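A minimal sketch of the blacklisting behaviour, in Python with a hypothetical `ScratchDirs` class. Round-robin allocation is an assumption made for illustration; as described above, nothing is recovered or remapped after an error.

```python
class ScratchDirs:
    """Allocate scratch space round-robin across tmp dirs, skipping any dir
    that has been blacklisted after a write error."""

    def __init__(self, dirs):
        self._dirs = list(dirs)
        self._blacklisted = set()
        self._next = 0

    def active_dirs(self):
        # Analogous to the set behind tmp-file-mgr.active-scratch-dirs.list.
        return [d for d in self._dirs if d not in self._blacklisted]

    def allocate(self):
        """Return the next non-blacklisted dir, or raise if none remain."""
        for _ in range(len(self._dirs)):
            d = self._dirs[self._next]
            self._next = (self._next + 1) % len(self._dirs)
            if d not in self._blacklisted:
                return d
        raise RuntimeError("no active scratch directories")

    def report_write_error(self, d):
        # Assume the device is misconfigured or failing: allocate nothing more
        # on it. Blocks already written there are not remapped.
        self._blacklisted.add(d)
```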
Change-Id: I9d80ed3a7afad6ff8e5d739b6ea2bc0949f16746
Reviewed-on: http://gerrit.cloudera.org:8080/579
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch modifies the Parquet scanner to resolve nested schemas, and
read and materialize collection types. The high-level modification is
to create a CollectionColumnReader that recursively materializes map-
and array-type slots.
This patch also adds many tests, most of which query a new table
called complextypestbl. This table contains hand-generated data that
is meant to expose edge cases in the scanner. The tests mostly test
the scanner, with a few tests of other functionality (e.g. array
serialization).
I ran a local benchmark comparing this scanner code to the original
scanner code on an expanded version of tpch_parquet.lineitem with
48009720 rows. My benchmark involved selecting different numbers of
columns with a single scanner thread, and I looked at the HDFS scan
node time in the query profiles. This code introduces a 10%-20%
regression in single-threaded scan time.
Change-Id: Id27fb728934e8346444f61752c9278d8010e5f3a
Reviewed-on: http://gerrit.cloudera.org:8080/576
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
See comment in Descriptors.thrift for what the materialized path is.
Change-Id: I64d00cf1bc2edcbbed3b6cdd5e934c55fff70a49
Reviewed-on: http://gerrit.cloudera.org:8080/650
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Add the missing dfloor alias. This should have been added as part of
IMPALA-1660 as an alias for floor(double) but was overlooked.
Also add aliases for the decimal versions of functions where they exist.
Change-Id: Icb790745714882248d365274e95d45eaaf0ba133
Reviewed-on: http://gerrit.cloudera.org:8080/697
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch extends the deduplication of tuples in row batches to work on
non-adjacent tuples. This deduplication requires an additional data
structure (a hash table) and adds additional performance overhead (up to
3x serialization time), so it is only enabled for row batches with
compositions that are likely to blow up due to non-adjacent duplication
of large tuples. This avoids performance regression in typical cases,
while preventing size blow-ups in problematic cases, such as joining
three streams of tuples, some of which may contain large
collections.
A test is included that ensures that adjacent deduplication is enabled.
The row batch serialize benchmark shows that deduplication does not regress
performance of serialization or deserialization.
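A Python sketch of the two deduplication modes, assuming row-batch tuples are represented as Python objects and duplicates are detected by identity. `serialize_tuples` is a hypothetical helper; the hash table (`seen`) is the extra structure that non-adjacent deduplication needs and the reason it is only enabled for risky batch compositions.

```python
def serialize_tuples(tuples, dedup_non_adjacent):
    """Serialize one column of tuple references from a row batch.

    Returns (payloads, refs): each distinct tuple is written once to
    `payloads`, and refs[i] is the payload index used by row i."""
    payloads, refs = [], []
    seen = {}  # id(tuple) -> payload index; only used for non-adjacent dedup
    prev = None
    for t in tuples:
        if prev is not None and t is prev:
            refs.append(refs[-1])            # cheap adjacent check, always on
        elif dedup_non_adjacent and id(t) in seen:
            refs.append(seen[id(t)])         # hash-table hit for an earlier tuple
        else:
            payloads.append(t)
            refs.append(len(payloads) - 1)
            if dedup_non_adjacent:
                seen[id(t)] = len(payloads) - 1
        prev = t
    return payloads, refs
```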
Change-Id: I3c71ad567d1c972a0f417d19919c2b28891fb407
Reviewed-on: http://gerrit.cloudera.org:8080/573
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This commit adds partial support for column-level authorization in
Impala using the Sentry Service. The following changes are included:
* Added support for parsing and analyzing GRANT/REVOKE statements with column-level
privileges. The supported syntax is:
- GRANT SELECT (<col_names>) ON TABLE <table_name>
TO [ROLE] <role_name> [WITH GRANT OPTION]
- REVOKE [GRANT OPTION FROM] SELECT (<col_names>) ON
TABLE <table_name> FROM [ROLE] <role_name>
* Added support for storing column-level privileges in the Catalog Service and updating
the Sentry Service when GRANT/REVOKE statements are executed.
* Modified the SHOW GRANT ROLE statement to include information about
column-level privileges.
Subsequent patches will add support for enforcing column-level
privileges in SQL queries and other statements.
Change-Id: I0fd9daa92cc5147cb6f4b25eb9651aab8bf3049f
Reviewed-on: http://gerrit.cloudera.org:8080/607
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
When the `numRows` parameter stored in the table properties is
erroneously set to 0 while non-empty files are present, the table
statistics are considered to be corrupt.
To hint that there might be a problem, the explain statement will emit
an additional warning if it detects potentially corrupt table stats, as
in the following example:
  Estimated Per-Host Requirements: Memory=42.00MB VCores=1
  WARNING: The following tables have potentially corrupt table and/or
  column statistics.
  compute_stats_db.corrupted
  03:AGGREGATE [FINALIZE]
  | output: count:merge(*)
  |
  02:EXCHANGE [UNPARTITIONED]
  |
  01:AGGREGATE
  | output: count(*)
  |
  00:SCAN HDFS [compute_stats_db.corrupted]
     partitions=1/2 files=1 size=24B
In addition, the small query optimization is disabled for such queries.
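The corruption heuristic described above reduces to a simple predicate. A Python sketch with a hypothetical function name, taking the `numRows` value and the sizes of the table's data files:

```python
def stats_potentially_corrupt(num_rows, file_sizes):
    """True when numRows claims the table is empty although at least one
    non-empty data file exists."""
    return num_rows == 0 and any(size > 0 for size in file_sizes)
```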
Change-Id: I0fa911f5132aa62195b854248663a94dcd8b14de
Reviewed-on: http://gerrit.cloudera.org:8080/689
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This patch allows administrators to configure all Impala daemons with a
password for the private key file used to negotiate connections with
clients which present the corresponding public key. This password is
obtained by running a user-supplied shell command and using its output.
The command is supplied by setting --ssl_private_key_password_cmd. The
output of the command is truncated to a maximum of 1024 bytes (this is a
limitation of RunShellProcess(), but should not be significant for this
use case), and then all trailing whitespace is trimmed (this is to avoid
unexpected trailing newlines etc. from shell output).
If the password is incorrect, clients will be unable to connect to the
server, whether or not they have the correct public key. If the command
exits with an error, the server will not start.
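A Python sketch of how the command output could be turned into a password, assuming the command runs through the shell via `subprocess.run`. `get_key_password` is a hypothetical name; the 1024-byte cap and the trailing-whitespace trim follow the description above.

```python
import subprocess

MAX_OUTPUT_BYTES = 1024  # mirrors the RunShellProcess() output limit described above


def get_key_password(cmd: str) -> str:
    """Run a user-supplied shell command and derive the key password from it.

    check=True raises CalledProcessError on a non-zero exit status, which a
    server would treat as fatal at startup."""
    result = subprocess.run(cmd, shell=True, check=True, capture_output=True)
    output = result.stdout[:MAX_OUTPUT_BYTES]            # truncate in bytes
    return output.decode("utf-8", errors="replace").rstrip()  # drop trailing whitespace
```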
Change-Id: Icc13933fdf50a6170c859989626da5772fe5040d
Reviewed-on: http://gerrit.cloudera.org:8080/623
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch updates AssembleRows() to have fewer exit and error paths,
as well as to explicitly distinguish between the row group being
finished and an error occurring. It functionally changes the behavior
in only two minor ways:
- The entire row group will be read regardless of how many values
the file metadata says there are. Previously it would only read up
to the number stated in the metadata, and then had extra logic for
checking if there were any values remaining.
- If abort_on_error is false and there is an error reading a row
group, subsequent row groups will still be read (except on
OOM). Previously this happened only in some cases.
Change-Id: Id1836cfe2a507e46cb030be32b4c1553f478f639
Reviewed-on: http://gerrit.cloudera.org:8080/624
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
This patch changes HdfsScanNode.init() to collect conjuncts that can be evaluated
while materializing the items (tuples) of collection-typed slots, and assign these
conjuncts to the scan node.
Limitation: Conjuncts that must first be migrated into inline views and that cannot
be captured by slot binding will not be assigned here, but in an UnnestNode.
This limitation applies to conjuncts bound by inline-view slots that are backed by
non-SlotRef exprs in the inline-view's select list. We only capture value transfers
between slots, and not between arbitrary exprs.
Change-Id: I20f2522070b257411c5e5d4ba9430e74b215308f
Reviewed-on: http://gerrit.cloudera.org:8080/665
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.
Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
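For reference, a minimal nested-loop join in Python covering the inner, left outer, left semi and left anti modes (right and full outer joins are omitted). `nested_loop_join` is a hypothetical helper, not the FE or BE class; because it tests every pair of rows, it can evaluate arbitrary join predicates, unlike a hash join.

```python
SUPPORTED_MODES = ("inner", "left_outer", "left_semi", "left_anti")


def nested_loop_join(left, right, predicate, mode="inner"):
    """Join two row lists by evaluating `predicate` on every (left, right) pair."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported join mode: {mode}")
    out = []
    for l_row in left:
        matched = False
        for r_row in right:
            if not predicate(l_row, r_row):
                continue
            matched = True
            if mode in ("inner", "left_outer"):
                out.append((l_row, r_row))
            else:
                break  # semi/anti joins only need to know whether a match exists
        if mode == "left_semi" and matched:
            out.append(l_row)
        elif mode == "left_outer" and not matched:
            out.append((l_row, None))  # unmatched row padded with NULL
        elif mode == "left_anti" and not matched:
            out.append(l_row)
    return out
```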
Change-Id: I238ec7dc0080f661847e5e1b84e30d61c3b0bb5c
Reviewed-on: http://gerrit.cloudera.org:8080/652
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Added the SPLIT_PART and the REGEXP_LIKE builtin functions and tests for both.
REGEXP_LIKE has an optional third parameter; if it is supplied, a different
'prepare' function (RegexpLikePrepare in like-predicate.cc) is used so that the
appropriate options can be set in the RE2 library.
Added a patch for the RE2 library so that the 'dot matches all' option is exposed
via the RE2 class.
Fixed a bug where proper cleanup was not guaranteed in certain edge cases when
the function evaluated for the WHERE clause operates on constants.
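A Python sketch of SPLIT_PART semantics, assuming 1-based field indexes, literal (non-regex) delimiter matching, and an empty result when the index is past the last field, as in PostgreSQL's split_part. Rejecting non-positive indexes is an assumption here.

```python
def split_part(source: str, delimiter: str, index: int) -> str:
    """Return the index-th field (1-based) of `source` split on the literal
    `delimiter`, or "" if there are fewer fields."""
    if index < 1:
        raise ValueError("index must be >= 1")
    fields = source.split(delimiter)  # exact string match, no regex
    return fields[index - 1] if index <= len(fields) else ""
```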
Change-Id: Ia2a8de9eeb2854100a2d949f612cfaba317c5a7b
Reviewed-on: http://gerrit.cloudera.org:8080/501
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
There are a variety of cases where Boost incorrectly adds intervals if
the interval is at (or beyond) an edge-case value. This change defines
a maximum interval and returns NULL if the user supplies an interval
beyond that maximum.
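A Python sketch of the guard, using a day interval for simplicity. `MAX_DAYS_INTERVAL` is a hypothetical bound, not the limit chosen in the patch; `None` plays the role of SQL NULL.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_DAYS_INTERVAL = 3_650_000  # hypothetical bound, for illustration only


def add_days(ts: datetime, days: int) -> Optional[datetime]:
    """Return ts + days, or None (NULL) if the interval exceeds the bound or
    the result is not representable."""
    if abs(days) > MAX_DAYS_INTERVAL:
        return None
    try:
        return ts + timedelta(days=days)
    except OverflowError:  # result outside the supported date range
        return None
```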
Change-Id: I4fb6869be22ab06089b66eeffaea04b0c0880080
Reviewed-on: http://gerrit.cloudera.org:8080/492
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Fixes the comment describing how the runtime profile tree is flattened:
it uses pre-order traversal, not in-order traversal, as the
implementation in RuntimeProfile::ToThrift() shows.
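To illustrate why pre-order is the right description: each node is emitted before its children, so recording each node's child count is enough to rebuild the tree. A Python sketch with a hypothetical `ProfileNode` type (real profile nodes carry counters and more):

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProfileNode:
    name: str
    children: List["ProfileNode"] = field(default_factory=list)


def flatten_preorder(root: ProfileNode) -> List[Tuple[str, int]]:
    """Emit (name, num_children) for each node before its children."""
    out = []

    def visit(node):
        out.append((node.name, len(node.children)))
        for child in node.children:
            visit(child)

    visit(root)
    return out


def rebuild(flat: List[Tuple[str, int]]) -> ProfileNode:
    """Inverse of flatten_preorder, showing the encoding is unambiguous."""
    it = iter(flat)

    def build():
        name, n = next(it)
        return ProfileNode(name, [build() for _ in range(n)])

    return build()
```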
Change-Id: Ib6c3dc7506a14d6b1d467177669b6d701ffedd45
Reviewed-on: http://gerrit.cloudera.org:8080/615
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Implement nested-loop join in Impala with support for multiple join
modes, including inner, outer, semi and anti joins. Null-aware left
anti-join is not currently supported.
Summary of changes:
Introduced the NestedLoopJoinNode class in the FE that represents the nested
loop join. Common functionality between NestedLoopJoinNode and HashJoinNode
(e.g. cardinality estimation) was moved to the JoinNode class.
In the BE, introduced the NestedLoopJoinNode class that implements the nested-loop
join execution strategy.
Change-Id: Id65a1aae84335bba53f06339bdfa64a1b0be079e
Reviewed-on: http://gerrit.cloudera.org:8080/457
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Many Python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which Python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The ClientCache has a set of metrics that are registered by
calling the Init() function. This adds the missing call to the
ResourceBroker's ClientCache Init() and adds the metric
definitions.
Change-Id: I879e8a176021589d28d2276fd7b3e5edc08fefb7
Reviewed-on: http://gerrit.cloudera.org:8080/569
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Bit and byte functions for compatibility with Teradata: bitand, bitor, bitxor, bitnot,
countset, getbit, setbit, shiftleft, shiftright, rotateleft, rotateright.
Interfaces and behavior follow the Teradata documentation.
All bit* functions are compatible with DB2. Only bitand is compatible with Oracle.
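A Python sketch of a few of these functions on an 8-bit (TINYINT-like) two's-complement value. The bit-numbering convention (position 0 is the least significant bit), discarding bits shifted past the width, and the `WIDTH` constant are assumptions for illustration, not taken from the Teradata documentation.

```python
WIDTH = 8  # assumed width: an 8-bit, TINYINT-like value
MASK = (1 << WIDTH) - 1


def to_signed(bits: int) -> int:
    """Interpret the low WIDTH bits as a two's-complement integer."""
    bits &= MASK
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


def getbit(value: int, position: int) -> int:
    return (value >> position) & 1  # position 0 = least significant bit


def setbit(value: int, position: int, bit: int = 1) -> int:
    bits = value & MASK
    bits = bits | (1 << position) if bit else bits & ~(1 << position)
    return to_signed(bits)


def countset(value: int) -> int:
    return bin(value & MASK).count("1")  # number of 1 bits in the WIDTH-bit pattern


def shiftleft(value: int, n: int) -> int:
    return to_signed((value & MASK) << n)  # bits shifted past WIDTH are dropped


def rotateleft(value: int, n: int) -> int:
    n %= WIDTH
    bits = value & MASK
    return to_signed((bits << n) | (bits >> (WIDTH - n)))  # bits wrap around
```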
Change-Id: Idba3fb7beb029de493b602e6279aa68e32688df3
Implements the postfix n! operator for factorial, and a factorial function.
Slightly refactors operators in the FE to share code between unary operators.
Based partially on work by Arthur Peng <arthur.peng@intel.com>.
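A Python sketch of factorial over the BIGINT range. Returning `None` for negative inputs and for results above the signed 64-bit maximum (anything past 20!) is a hypothetical convention; the patch's actual handling of these cases is not described here.

```python
INT64_MAX = 2**63 - 1


def factorial(n: int):
    """n! if it fits in a signed 64-bit integer, else None (hypothetical)."""
    if n < 0:
        return None
    result = 1
    for i in range(2, n + 1):
        result *= i
        if result > INT64_MAX:
            return None  # would overflow a BIGINT
    return result
```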
Change-Id: I71b6c824c59fc5305f16b8c4457805126a1da93b
Reviewed-on: http://gerrit.cloudera.org:8080/531
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Most of this patch is rewriting the schema resolution logic to handle
recursive schemas. The other changes are for reading and codegening
recursive schemas.
Change-Id: I257db05e02ed99c62c8dcfd0136b9e8f392d5933
Reviewed-on: http://gerrit.cloudera.org:8080/86
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
Boost handles a couple of edge cases differently than other databases
such as Postgres and MySQL when adding year/month intervals to
timestamps. This change makes Impala consistent with those databases.
The performance difference was not noticeable (<5% if any).
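One such edge case is adding months to a date near the end of a month. A Python sketch of the PostgreSQL/MySQL behaviour, which keeps the day of month and clamps it to the last valid day of the target month (`add_months` and `add_years` are hypothetical names):

```python
import calendar
from datetime import datetime


def add_months(ts: datetime, months: int) -> datetime:
    """Add a month interval, clamping the day to the target month's length."""
    total = ts.year * 12 + (ts.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def add_years(ts: datetime, years: int) -> datetime:
    return add_months(ts, 12 * years)
```

For example, 2015-02-28 plus one month gives 2015-03-28 here. As we understand Boost's month arithmetic, it instead snaps a last-day-of-month input to the last day of the result month (2015-03-31).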
Change-Id: Icb02a06281b53753938cab88e0d28f20709fee06
Reviewed-on: http://gerrit.cloudera.org:8080/489
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This patch adds dynamic progress reporting to the shell. There are
two new command line flags for the shell:
--live_progress - prints the completed vs. total # of scan ranges
--live_summary - prints an updated exec summary
In addition to the command line flags, these options can be set from
within the shell using:
set LIVE_SUMMARY=True
set LIVE_PROGRESS=True
The new options will be listed under shell options. Both reports are
updated at most once per second; for longer-running queries, the update
interval is adjusted to the time between two RPC calls that fetch the
query status. To provide this information in the ExecSummary, the Thrift
structure for the ExecSummary was extended to contain a progress
indicator. The output is printed to stderr and is only available in
interactive mode.
An example video is available here:
https://asciinema.org/a/5wi7ypckx4ol4ha1hlg3e3q1k
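A Python sketch of the two display rules: a one-line rendering of completed vs. total scan ranges, and the refresh interval. The bar format and both function names are hypothetical; in the shell the output goes to stderr.

```python
def render_progress(completed: int, total: int, width: int = 40) -> str:
    """Render completed vs. total scan ranges as a one-line progress bar."""
    if total <= 0:
        return "[" + " " * width + "] 0%"
    frac = min(completed / total, 1.0)
    filled = int(frac * width)
    return "[" + "#" * filled + " " * (width - filled) + f"] {int(frac * 100)}%"


def next_update_delay(last_rpc_seconds: float) -> float:
    # Never refresh more than once per second; if fetching the query status
    # takes longer, refresh only as often as new status arrives.
    return max(1.0, last_rpc_seconds)
```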
Change-Id: I70b2ab5fa74dc2ba5bc3b338ef13ddc6ccf367d2
Reviewed-on: http://gerrit.cloudera.org:8080/508
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
An upcoming patch will add a function that will not be user visible.
This patch allows a non-visible function to be added in the same way
that visible functions are added (using impala_functions.py).
Change-Id: I70971ced0d595a7aaa975985e589d2676423e221
Reviewed-on: http://gerrit.cloudera.org:8080/528
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The item tuple descriptor is already set in SlotDescriptor.java, this
patch just plumbs it through to the backend.
Change-Id: I4b67ef50ccfde422829d4d2698b04b32666746be
Reviewed-on: http://gerrit.cloudera.org:8080/483
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Internal Jenkins
The MDL file will be consumed by CM, whose developers have asked for
the units to be lower-case.
Change-Id: Iacc583ff2c1680ec02a41feab558fbb2890d95be
Reviewed-on: http://gerrit.cloudera.org:8080/499
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
Previously we only had type information for each SlotDescriptor, rather
than for the entire table. However, in order to do schema resolution
for nested fields, we need to be able to traverse the table schema
starting from the table-level columns.
We could theoretically expose only the paths needed to resolve each
slot, but it's simpler to have the whole table.
Change-Id: I026c1f1f552d1ac5d1b267f876e1c39a258714b5
Reviewed-on: http://gerrit.cloudera.org:8080/404
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Internal Jenkins
The user() builtin always returns the connected user. However, if the
client wants to see which user its queries are actually delegated to,
there was no easy way to do that.
This patch adds effective_user(), which returns the proxy delegated user
for authorization purposes. If no delegated user is set, the effective
user is the same as that returned from user().
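The resolution rule, sketched in Python with a hypothetical signature:

```python
from typing import Optional


def effective_user(connected_user: str, delegated_user: Optional[str]) -> str:
    """The user queries are authorized as: the proxy-delegated user if one is
    set, otherwise the connected user (what user() returns)."""
    return delegated_user if delegated_user else connected_user
```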
The only way to test this is via a new custom cluster test, which sets
impala.doas.user so that the effective user might be different from the
connected one.
Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76
Reviewed-on: http://gerrit.cloudera.org:8080/458
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
In CMake 2.6 dependencies for custom targets could not be expressed
inline. This patch fixes an issue when generating the thrift files with
CMake versions prior to 2.8.
Change-Id: Ie04fbcb45b3efb45a6bbaa806a1630c26357185f
Reviewed-on: http://gerrit.cloudera.org:8080/461
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
If static versions of zlib and bzip2 were picked up, we assumed that they
were compiled with -fPIC. However, this is not always the case. Thus,
in the non-toolchain case, we now explicitly link dynamically against
zlib and bzip2 for the dynamic targets.
In addition, this patch removes static linking of libgcc in the
toolchain case, as LLVM is not able to find the exception handling
symbols even if they are present in the binary. Static linking of libgcc
is postponed.
Next, if Impala is built with -notests, the external data source thrift
files would not be generated. This patch makes sure the dependencies are
expressed correctly.
Finally, if a user had Google perftools installed on the system, we
would accidentally pick up the system libraries together with the
thirdparty headers, which results in linker errors. This patch fixes the
path issues.
Change-Id: Ic000101c33da26d75a0cd733f7ef02f1bd694937
Reviewed-on: http://gerrit.cloudera.org:8080/460
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch makes it possible to optionally enable the new Impala binary
toolchain. For now, there are no major version differences between the
toolchain dependencies and what is currently kept in thirdparty.
To enable the toolchain, set and export the variable IMPALA_TOOLCHAIN to
the folder where the binaries are available.
In addition this patch moves gutil from the thirdparty directory into
the source tree of be/src to allow easy propagation of compiler and
linker flags. Furthermore, the thrift-cpp target was added as a
dependency to all targets that require the generated thrift sources to
be available before the build is started.
What is the new toolchain? The goal of the toolchain is to homogenize
the build environment and to make sure that Impala is built nearly
identically on every platform. To achieve this, we limit the flexibility
of using the system's host libraries and instead rely on a set of
custom-built binaries, including the necessary compiler.
Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b
Reviewed-on: http://gerrit.cloudera.org:8080/427
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Adds support in the script generate_metrics.py for producing
a CM-compatible metric definition (MDL) file.
Also adds missing descriptions to some metrics and changes some
metrics that were created as gauges but are really counters.
TODO: Support histograms, stats, and metric defs with args
Change-Id: I3ebb45145035facab5d4408118150f8c8eb8786a
Reviewed-on: http://gerrit.cloudera.org:8080/423
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This commit will be backported to 5.4.x to improve plans when using
Isilon and S3.
The planner currently estimates the number of backends that an HDFS scan
node will execute on as the number of datanodes holding block replicas
for the corresponding table. This can be a bad estimate for various reasons:
1) It's completely wrong when the scan is remote (e.g. S3 or Isilon).
2) It doesn't account for partition pruning.
3) The set of hosts holding block replicas may be larger than
the number of scan ranges.
Improve the estimate by examining the scan ranges and taking locality into
account. While this new estimate will eventually be used in all cases,
this change uses the new estimate only when there is a remote scan range,
so as not to change plans produced for local ranges (since this commit will
be backported to 5.4.x). Thus this commit deliberately addresses only case
1. A follow-on commit will enable the new logic for all cases.
Also set up the S3PlannerTest so that we can enable it in the nightly
Jenkins S3 run. It was inadvertently never enabled there.
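A rough Python sketch of a locality-aware estimate. The input format (one list of replica hosts per scan range, empty for remote ranges), the greedy host choice and the function name are all assumptions; the planner's actual algorithm may differ.

```python
def estimate_num_hosts(scan_ranges, cluster_size):
    """Estimate how many backends a scan will run on.

    scan_ranges: one list of replica hosts per range; an empty list marks a
    remote range (e.g. S3 or Isilon) that may be scheduled on any backend."""
    if not scan_ranges:
        return 0
    local_hosts = set()
    num_remote = 0
    for replicas in scan_ranges:
        if not replicas:
            num_remote += 1
        elif local_hosts.isdisjoint(replicas):
            # Assume the range is read on one of its replicas; count a new
            # host only if none of its replica hosts is already counted.
            local_hosts.add(replicas[0])
    # Never more hosts than the cluster has, or than there are scan ranges.
    return min(len(local_hosts) + num_remote, cluster_size, len(scan_ranges))
```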
Change-Id: I3fd3f7c5431a535fb044c98c326338c21b8a1898
Reviewed-on: http://gerrit.cloudera.org:8080/425
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Scan the HDFS path of the table to find the partitions
that are missing from the metastore, and add these
partitions to the metastore.
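A Python sketch of the comparison step, assuming the directory listing of the table location has already been obtained and that directories follow the `col=value` layout in partition-column order. The helper name is hypothetical; type conversion and unescaping of partition values are not handled.

```python
def find_missing_partitions(dir_paths, partition_cols, known_partitions):
    """Return partition value tuples found on disk but not in the metastore.

    dir_paths: relative directory paths such as "year=2015/month=3".
    known_partitions: set of value tuples already in the metastore."""
    missing = set()
    for path in dir_paths:
        parts = path.strip("/").split("/")
        if len(parts) != len(partition_cols):
            continue  # not a leaf partition directory
        values = []
        for col, part in zip(partition_cols, parts):
            key, sep, value = part.partition("=")
            if sep != "=" or key != col:
                break  # not a valid col=value component
            values.append(value)
        else:
            if tuple(values) not in known_partitions:
                missing.add(tuple(values))
    return sorted(missing)
```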
Change-Id: I150f114db576bc18d39f3791be7af581ab49dfab
Reviewed-on: http://gerrit.cloudera.org:8080/24
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
This patch enables caching of HDFS file handles to avoid opening the
same file over and over again. When a file is opened for the first time,
an HdfsCachedFileHandle object is created: a small wrapper around the
hdfsFile instance that associates the file's last modified timestamp
with the handle.
When the file handle is no longer needed, it is returned to the
DiskIoMgr, where it is cached under the given path. When the file is
opened again, first a lookup is performed to see if an existing handle
can be reused. If there is an existing handle, the last modified time
of the cached handle is compared with the last modified time of the file
to be opened. If they are equal the handle can be reused, otherwise it
is closed and the file is opened regularly.
The new flag `-max_cached_file_handles` controls the overall size of the
cache by defining an upper bound of cached file handles. Furthermore,
five new metrics were added to report the number of currently cached
file handles in the DiskIoMgr and the hit ratio of the cache (including
hit and miss count).
impala-server.io.mgr.num-cached-file-handles
impala-server.io.mgr.cached-file-handles-hit-ratio
impala-server.io.mgr.cached-file-handles-hit-count
impala-server.io.mgr.cached-file-handles-miss-count
impala-server.io.mgr.num-file-handles-outstanding
Due to the way Impala performs scan operations, the cache may
contain multiple entries for the same file. If the process's open-file
limit is smaller than `max_cached_file_handles`, the lower limit is
used as the cache capacity.
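A Python sketch of the cache semantics described above: lookup by path, validation by last modified time, several handles per path, and a capacity capped by the process's open-file limit. The class, the callbacks and the evict-oldest policy are assumptions; the linear lookup keeps the sketch short.

```python
from collections import OrderedDict


class FileHandleCache:
    def __init__(self, max_cached_file_handles, process_fd_limit, open_fn, close_fn):
        self.capacity = min(max_cached_file_handles, process_fd_limit)
        self._open, self._close = open_fn, close_fn
        self._entries = OrderedDict()  # key -> (path, mtime, handle), oldest first
        self._next_key = 0
        self.hits = self.misses = 0

    def get(self, path, mtime):
        """Reuse a cached handle if its mtime matches, else open the file."""
        for key, (p, m, handle) in self._entries.items():
            if p == path:
                del self._entries[key]  # safe: we leave the loop right away
                if m == mtime:
                    self.hits += 1
                    return handle
                self._close(handle)  # stale: the file changed since caching
                break
        self.misses += 1
        return self._open(path)

    def release(self, path, mtime, handle):
        """Return a handle to the cache instead of closing it."""
        if self.capacity <= 0:
            self._close(handle)
            return
        if len(self._entries) >= self.capacity:
            _, (_, _, oldest) = self._entries.popitem(last=False)
            self._close(oldest)
        self._entries[self._next_key] = (path, mtime, handle)
        self._next_key += 1

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```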
Performance and Memory Evaluation:
The patch was evaluated in three tests:
1) Throughput: parallel scans on a small table with 200 small files.
Throughput increased from ~50 QPS to ~150 QPS with FD caching.
2) Latency: a single table with 300k files. Running select count(*) on
the table took 2792.30s with FD caching and 2764.81s without FD
caching (based on the HEAD~1 commit). No significant overhead.
3) Memory consumption: for the above table, the delta in RSS memory
consumption after running the query is 30MB, which roughly matches the
expected 2-3kB per FD for 10k cached descriptors.
Change-Id: Ifa6560d141188c329d7bc73c2dabcc1352d69cd7
Reviewed-on: http://gerrit.cloudera.org:8080/366
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>