This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that this uses
the impala_thrift_gen package rather than Impyla's impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the python namespace,
and includes code to apply the same patching to the thirdparty
thrift files (hive_metastore.thrift, fb303.thrift).
Putting all the generated python into a package makes it easier
to understand where imports are getting their code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.
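For illustration (the module name below is a hypothetical example and
depends on the generated .thrift files), the import style changes
roughly like this:
# Before, a generated module was imported from the top level:
#   from ImpalaService import ImpalaService
# After, all generated code lives under the impala_thrift_gen package:
from impala_thrift_gen.ImpalaService import ImpalaService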
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This patch added OAuth support with the following functionality:
* Load and parse the OAuth JWKS from a configured JSON file or URL.
* Read the OAuth access token from the HTTP header; it uses the
same format as the JWT Authorization Bearer token (see the example
after this list).
* Verify the OAuth token's signature with the public key in the JWKS.
* Get the username out of the payload of the OAuth access token.
* If Kerberos or LDAP is enabled, then both JWT and OAuth are
supported together; otherwise only one of JWT or OAuth is supported.
This has been the pre-existing policy for JWT, so OAuth follows the
same policy.
* Impala shell side changes: new OAuth options -a and --oauth_cmd
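For illustration, a client supplies the token in the standard Bearer
format. This sketch is Python; the host, port and URL path are
placeholders, and the token would come from the identity provider:
import http.client

access_token = "<oauth-access-token>"  # placeholder
conn = http.client.HTTPConnection("impala-coordinator.example.com", 28000)
# The OAuth access token is sent the same way as a JWT Bearer token.
conn.request("POST", "/cliservice",
             headers={"Authorization": "Bearer " + access_token})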
Testing:
- Added 3 custom cluster BE tests in test_shell_jwt_auth.py:
- test_oauth_auth_valid: authenticate with valid token.
- test_oauth_auth_expired: authentication failure with
expired token.
- test_oauth_auth_invalid_jwk: authentication failure with a
token that does not validate against the configured JWKS.
- Added 1 custom cluster FE test in JwtWebserverTest.java
- testWebserverOAuthAuth: Basic tests for OAuth
- Added 1 custom cluster FE test in LdapHS2Test.java
- testHiveserver2JwtAndOAuthAuth: tests all combinations of
JWT and OAuth token verification with separate JWKS keys.
- Manually tested with valid, invalid and expired OAuth access
tokens.
- Passed core run.
Change-Id: I65dc8db917476b0f0d29b659b9fa51ebaf45b7a6
Reviewed-on: http://gerrit.cloudera.org:8080/21728
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a feature for automated correctness checking of the
tuple cache. The purpose of this feature is to verify the
correctness of the tuple cache by comparing cache entries with the
same key across different queries.
The feature consists of two main components: cache dumping and
runtime correctness validation.
During the cache dumping phase, if a tuple cache entry is detected,
we retrieve it from the global cache and dump it as a reference file
into a subdirectory of the specified debug dumping directory. The
subdirectory is named after the cache key. Additionally, data from
the child is also read and dumped to a separate file in the same
directory. We expect these two files to be identical, assuming the
results are deterministic. Non-deterministic cases like TOP-N or
others may be detected and excluded from dumping later.
Furthermore, the cache data will be transformed into a
human-readable text format on a row-by-row basis before dumping.
This approach allows for easier investigation and later analysis.
The verification process starts by comparing the entire contents of
the files that share the same key. If the contents match, the
verification is considered successful. If they don't match, we enter
a slower mode where we compare all the rows individually. In the
slow mode, we build a hash map from the reference cache file, then
iterate over the current cache file row by row and check that every
row exists in the hash map. A counter is kept in the hash map to
handle scenarios involving duplicated rows. Once verification is
complete, if no discrepancies are found, both files are removed.
If discrepancies are detected, the files are kept and renamed with
a '.bad' suffix.
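A minimal Python sketch of the slow-mode row comparison described
above (the actual implementation lives in the backend):
from collections import Counter

def rows_match(reference_rows, current_rows):
    # Count rows from the reference cache file so duplicated rows are
    # handled, then check that the current cache file drains the counts.
    counts = Counter(reference_rows)
    for row in current_rows:
        if counts[row] <= 0:
            return False  # row not found in the reference cache
        counts[row] -= 1
    return sum(counts.values()) == 0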
New startup flag and query option:
Added a startup flag tuple_cache_debug_dump_dir for specifying
the directory for dumping the result caches. If
tuple_cache_debug_dump_dir is empty, the feature is disabled.
Added a query option enable_tuple_cache_verification to enable
or disable the tuple cache verification. The default is true. It is
only effective when tuple_cache_debug_dump_dir is specified.
Tests:
Ran the testcase test_tuple_cache_tpc_queries and caught known
inconsistencies.
Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Decimal is a primitive data type in Impala. The current code returns
wrong values for columns with the decimal data type in external JDBC
tables.
This patch fixes wrong values returned from JDBC data source, and
supports pushing down decimal type of predicates to remote database
and remote Impala.
The decimal precision and scale of the columns in an external JDBC
table must be no less than the decimal precision and scale of the
corresponding columns in the remote database table. Otherwise,
Impala fails with an error since the mismatch may cause truncation
of decimal data.
Testing:
- Added Planner test for pushing down decimal type of predicates.
- Added end-to-end unit-tests for tables with decimal type of columns
for Postgres, MySQL, and Impala-to-Impala.
- Passed core-tests.
Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Reviewed-on: http://gerrit.cloudera.org:8080/21218
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some Thrift request/response structs in CatalogService were changed to
add new variables in the middle, which caused a cross-version
incompatibility issue for CatalogService.
Impala cluster membership is managed by the statestore. During upgrade
scenarios where different versions of Impala daemons are upgraded one
at a time, the upgraded daemons have incompatible message formats.
Even though protocol version numbers were already defined for the
Statestore and Catalog services, they were not used. The Statestore and
Catalog servers did not check the protocol version in requests, which
allowed incompatible Impala daemons to join one cluster. This causes
unexpected query failures during rolling upgrades.
We need a way to detect this and enforce that some rules are followed:
- Statestore refuses the registration requests from incompatible
subscribers.
- Catalog server refuses the requests from incompatible clients.
- Scheduler assigns tasks to a group of compatible executors.
This patch isolates Impala daemons into separate clusters based on
the protocol version of the Statestore service to prevent
incompatible Impala daemons from communicating with each other. It
covers the Thrift RPC communication between catalogd and coordinators,
and the communication between the statestore and its subscribers
(executors, coordinators, catalogd and admissiond). This change should
also work for future upgrades.
Following changes were made:
- Bump StatestoreServiceVersion and CatalogServiceVersion to V2 for
all requests of Statestore and Catalog services.
- Update the request and response structs in CatalogService to ensure
each Thrift request struct has protocol version and each Thrift
response struct has returned status.
- Update the request and response struct in StatestoreService to
ensure each Thrift request struct has protocol version and each
Thrift response struct has returned status.
- Add a subscriber type so that the statestore can distinguish
different types of subscribers.
- Statestore checks protocol version for registration requests from
subscribers. It refuses the requests with incompatible version.
- Catalog server checks protocol version for Catalog service APIs, and
returns error for requests with incompatible version.
- The catalog daemon sends its address and the protocol version of the
Catalog service when it registers with the statestore; the statestore
forwards the address and the protocol version of the Catalog service
to all subscribers during registration.
- Add UpdateCatalogd API for StatestoreSubscriber service so that the
coordinators could receive the address and the protocol version of
Catalog service from statestore if the coordinators register to
statestore before catalog daemon.
- Add GetProtocolVersion API for Statestore service so that the
subscribers can check the protocol version of statestore before
calling RegisterSubscriber API.
- Add a startup flag tolerate_statestore_startup_delay. It is off by
default. When it is enabled, the subscriber can tolerate a delay in
the statestore's availability: the subscriber's process will not exit
if it cannot register with the specified statestore on startup.
Instead it enters Recovery mode, where it loops, sleeps and retries
until it successfully registers with the statestore (sketched below).
This flag should be enabled during rolling upgrades.
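A rough sketch of the Recovery-mode behavior, with a hypothetical
try_register() callable standing in for the real registration RPC:
import time

def register_with_recovery(try_register, sleep_s=5):
    # Loop, sleep and retry until registration with the statestore
    # succeeds, instead of exiting the process on startup.
    while not try_register():
        time.sleep(sleep_s)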
CatalogServiceVersion is defined in CatalogService.thrift. In the
future, if we make changes to the request or response structures of
CatalogService APIs that are not backward compatible, we need to bump
the protocol version of the Catalog service.
StatestoreServiceVersion is defined in StatestoreService.thrift.
Similarly, if we make changes to the request or response structures
of StatestoreService APIs that are not backward compatible, we need
to bump the protocol version of the Statestore service.
Message formats for KRPC communications between coordinators and
executors, and between admissiond and coordinators are defined
in proto files under common/protobuf. If we make changes to these
structures that are not backward compatible, we need to bump the
protocol version of the Statestore service.
Testing:
- Added end-to-end unit tests.
- Passed the core tests.
- Ran a manual test to verify that old-version executors cannot
register with a new-version statestore, and new-version executors
cannot register with an old-version statestore.
Change-Id: If61506dab38c4d1c50419c1b3f7bc4f9ee3676bc
Reviewed-on: http://gerrit.cloudera.org:8080/19959
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Python 3 treats print as a function and requires
parentheses in the invocation.
print "Hello World!"
is now:
print("Hello World!")
This fixes all locations to use the function
invocation. The conversion is more complicated when the output
is redirected to a file or when the usual newline is suppressed.
print >> sys.stderr, "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function
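For example, with the future import both redirection and newline
suppression are expressed with keyword arguments:
from __future__ import print_function
import sys

print("Hello World!", file=sys.stderr)  # redirect output to a stream
print("no trailing newline", end="")    # avoid the usual newline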
This also fixes random flake8 issues that intersect with
the changes.
Testing:
- check-python-syntax.sh shows no errors related to print
Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This change adds a more generic approach to validating numeric
query options and reporting parse and validation errors.
Supported types: integers, floats, and memory specifications.
Range and bound validator helper functions are added to unify
validation at call sites (see the sketch below).
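A rough Python sketch of the kind of range validation helper this
adds (the real helpers are C++ in the backend; the option name and
bounds below are hypothetical):
def check_range(name, value, lower, upper):
    # Report a validation error when the parsed value is out of range.
    if not (lower <= value <= upper):
        raise ValueError("%s must be between %s and %s but was %s"
                         % (name, lower, upper, value))

check_range("SOME_NUMERIC_OPTION", 1024, 0, 65536)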
Testing:
- Error messages became more generic, so the existing tests
around query options were updated to match them
Change-Id: Ia7757b52393c094d2c661918d73cbfad7214f855
Reviewed-on: http://gerrit.cloudera.org:8080/19096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala returns a "Couldn't skip rows in file" error for old Parquet
files written by an old Impala (e.g. Impala 2.5, 2.6). In DEBUG builds
Impala crashes with a DCHECK:
Check failed: num_buffered_values_ > 0 (-1 vs. 0)
The problem is that in some old Parquet files there can be a mismatch
between 'num_values' in a page and the encoded def/rep levels.
There is usually one more def/rep levels encoded in these files.
In SkipTopLevelRows() we skipped values based on how many def levels
were read, see
92ce6fe48e/be/src/exec/parquet/parquet-column-readers.cc (L1308-L1314).
Since there are more def levels than values in some old files,
num_buffered_values_ could become negative.
This patch takes the value of num_buffered_values_ into account
when calculating 'read_count', so we can deal with such files. With
this patch we also include the column name in the "Couldn't skip rows"
error message, so in the future it'll be easier to identify the
problematic columns.
Testing:
* added Parquet file written by Impala 2.5 and e2e test for it
Change-Id: I568fe59df720ea040be4926812412ba4c1510a26
Reviewed-on: http://gerrit.cloudera.org:8080/18257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The previous patch added checks on illegal decimal schemas of parquet
files. However, it doesn't return a non-ok status in
ParquetMetadataUtils::ValidateColumn if abort_on_error is set to false.
So we continue to use the illegal file schema and hit the DCHECK.
This patch fixes this and adds test coverage for illegal decimal
schemas.
Tests:
- Add a bad parquet file with illegal decimal schemas.
- Add e2e tests on the file.
- Ran test_fuzz_decimal_tbl 100 times and saw that the errors are
caught as expected.
Change-Id: I623f255a7f40be57bfa4ade98827842cee6f1fee
Reviewed-on: http://gerrit.cloudera.org:8080/17748
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch added JWT support with the following functionality:
* Load and parse the JWKS from a pre-installed JSON file.
* Read the JWT token from the HTTP Header.
* Verify the JWT's signature with the public key in the JWKS.
* Get the username out of the payload of JWT token.
* Support following JSON Web Algorithms (JWA):
HS256, HS384, HS512, RS256, RS384, RS512.
We use the third-party library jwt-cpp to verify JWT tokens. jwt-cpp
is a header-only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchain.py to download jwt-cpp from
the toolchain S3 bucket, and modified the makefiles to add
jwt-cpp/include to the include path.
Added BE unit tests for loading the JWKS file and verifying JWT
tokens. Also added an FE custom cluster test for JWT authentication.
Testing:
- Passed core run.
Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Reviewed-on: http://gerrit.cloudera.org:8080/17435
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends blacklist functionality by adding an executor node
to the blacklist if a query fails due to a disk failure during
spill-to-disk. It also classifies disk error codes and defines a
blacklistable error set for non-transient disk errors. The coordinator
blacklists an executor only if the executor hit a blacklistable error
during spill-to-disk.
Adds a new debug action to simulate disk write errors during
spill-to-disk. To use it, specify in the query options:
'debug_action': 'IMPALA_TMP_FILE_WRITE:<hostname>:<port>:<action>'
where <hostname> and <port> identify the impalad which executes the
fragment instances; <port> is the BE KRPC port (default 27000).
Adds new test cases for blacklist and query-retry to cover the code
changes.
Testing:
- Passed new test cases.
- Passed exhaustive tests.
- Manually simulated disk failures in scratch directories on nodes
of a cluster, verified that the nodes were blacklisted as
expected.
Change-Id: I04bfcb7f2e0b1ef24a5b4350f270feecd8c47437
Reviewed-on: http://gerrit.cloudera.org:8080/16949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
The query profile is updated to include query-wide and per-backend
metrics for rows produced (RowsReturned). Example from:
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This work addresses a current limitation in the admission controller
by appending the last known memory consumption statistics about the
set of queries running or waiting on a host or in a pool to the
existing memory exhaustion message. The statistics are logged in
impalad.INFO when a query is queued, or queued and then timed out,
due to memory pressure in the pool or on the host. The statistics can
also be part of the query profile.
The new memory consumption statistics can be either per-host stats or
aggregated pool stats. The per-host stats describe memory consumption
for every pool on a host. The aggregated pool stats describe the
aggregated memory consumption across all hosts for a pool. For each
stats type, information such as the query IDs and memory consumption
of up to the top 5 queries is provided, in addition to the min, max,
average and total memory consumption for the query set.
When a query request is queued due to memory exhaustion, the new
consumption statistics are logged when the BE logging level is set
to 2.
When a query request is timed out due to memory exhaustion, the new
consumption statistics are logged when the BE logging level is set
to 1.
Testing:
1. Added a new test TopNQueryCheck in admission-controller-test.cc to
verify that the topN query memory consumption details are reported
correctly.
2. Added two new tests in test_admission_controller.py to simulate
queries being queued and then timed out due to pool or host memory
pressure.
3. Added a new test TopN in mem-tracker-test.cc to
verify that the topN query memory consumption details are computed
correctly from a mem tracker hierarchy.
4. Ran Core tests successfully.
Change-Id: Id995a9d044082c3b8f044e1ec25bb4c64347f781
Reviewed-on: http://gerrit.cloudera.org:8080/16220
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fix addresses the current limitation that an ill-formatted
Parquet version string is not sanitized before appearing
in an error message or impalad.INFO. With the fix, any such string is
converted to a hex string first. The hex string is a sequence of
hex digit groups separated by spaces, one group per byte, where each
group is one or two hex digits, such as "6c 65 2e a".
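An illustrative Python equivalent of the conversion (the actual fix
is in the backend):
def to_hex_string(value):
    # One space-separated group per byte, each one or two hex digits.
    return ' '.join('%x' % b for b in bytearray(value))

print(to_hex_string(b'le.\n'))  # prints: 6c 65 2e a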
Testing:
Ran "core" tests successfully.
Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1
Reviewed-on: http://gerrit.cloudera.org:8080/16331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
This switches null-aware anti-join (NAAJ) to use shared
join builds with mt_dop > 0. To support this, we
make all access to the join build data structures
from the probe read-only. NAAJ requires iterating
over rows from build partitions at various steps
in the algorithm and before this patch this was not
thread-safe. Previously we avoided that problem by having a
separate builder for each join node and duplicating
the data.
The main challenge was iteration over
null_aware_partition()->build_rows() from the probe
side, because it uses an embedded iterator in the
stream so was not thread-safe (since each thread
would be trying to use the same iterator).
The solution is to extend BufferedTupleStream to
allow multiple read iterators into a pinned,
read-only, stream. Each probe thread can then
iterate over the stream independently with no
thread safety issues.
With BufferedTupleStream changes, I partially abstracted
ReadIterator more from the rest of BufferedTupleStream,
but decided not to completely refactor so that this patchset
didn't cause excessive churn. I.e. much BufferedTupleStream
code still accesses internal fields of ReadIterator.
Fix a pre-existing bug in grouping-aggregator where
Spill() hit a DCHECK because the hash table was
destroyed unnecessarily when it hit an OOM. This was
flushed out by the parameter change in test_spilling.
Testing:
Add test to buffered-tuple-stream-test for multiple readers
to BTS.
Tweaked test_spilling_naaj_no_deny_reservation to have
a smaller minimum reservation, required to keep the
test passing with the new, lower, memory requirement.
Updated a TPC-H planner test where resource requirements
slightly decreased for the NAAJ.
Ran the naaj tests in test_spilling.py with TSAN enabled,
confirmed no data races.
Ran exhaustive tests, which passed after fixing IMPALA-9611.
Ran core tests with ASAN.
Ran backend tests with TSAN.
Perf:
I ran this query that exercises EvaluateNullProbe() heavily.
select l_orderkey, l_partkey, l_suppkey, l_linenumber
from tpch30_parquet.lineitem
where l_suppkey = 4162 and l_shipmode = 'AIR'
and l_returnflag = 'A' and l_shipdate > '1993-01-01'
and if(l_orderkey > 5500000, NULL, l_orderkey) not in (
select if(o_orderkey % 2 = 0, NULL, o_orderkey + 1)
from orders
where l_orderkey = o_orderkey)
order by 1,2,3,4;
It went from ~13s to ~11s running on a single impalad with
this change, because of the inlining of CreateOutputRow() and
EvalConjuncts().
I also ran TPC-H SF 30 on Parquet with mt_dop=4, and there was
no change in performance.
Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
Reviewed-on: http://gerrit.cloudera.org:8080/15612
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We don't support reading UNION columns. Queries on tables containing
UNION types will fail in planning with a metadata loading error.
However, the scanner may need to read an ORC file with UNION types if
the table schema doesn't map to the UNION columns. Though the UNION
values won't be read, the scanner needs to resolve the file schema,
including the UNION types, correctly.
In OrcSchemaResolver::BuildSchemaPath, we create a map from ORC type ids
to Impala SchemaPath representation for all types of the file. We should
deal with UNION types as well.
This patch also includes some refactoring to improve code readability.
Tests:
- Add tests for table schema and file schema mismatching on all complex
types.
Change-Id: I452d27b4e281eada00b62ac58af773a3479163ec
Reviewed-on: http://gerrit.cloudera.org:8080/15103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements the read path for the DATE type in the ORC scanner. The
internal representation of a date is an int32 holding the number of
days since the Unix epoch in the proleptic Gregorian calendar.
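For illustration, Python's datetime.date also uses the proleptic
Gregorian calendar, so the decoding is simply:
from datetime import date, timedelta

def decode_orc_date(days_since_epoch):
    # DATE is stored as the number of days since the Unix epoch.
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)

print(decode_orc_date(0))      # 1970-01-01
print(decode_orc_date(18262))  # 2020-01-01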
Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Reviewed-on: http://gerrit.cloudera.org:8080/14982
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive can write timestamps that are outside Impala's valid
range (Impala: 1400-9999 Hive: 0001-9999). This change adds
validation logic to ORC reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.
The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
sense with ORC's encoding
- instead of the column name, only the column id is added to the warning
Testing:
- added a simple EE test that scans an existing ORC file
Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Reviewed-on: http://gerrit.cloudera.org:8080/14832
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch the supported year range for DATE type started with
year 0. This contradicts the ANSI SQL standard that defines the valid
DATE value range to be 0001-01-01 to 9999-12-31.
Change-Id: Iefdf1c036834763f52d44d0c39a25a1f04e41e07
Reviewed-on: http://gerrit.cloudera.org:8080/14349
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the avro scanner.
Similarly to Parquet, Avro uses the DATE logical type for dates. The
DATE logical type annotates an INT32 that stores the number of days
since the Unix epoch, 1 January 1970.
This representation introduces an Avro interoperability issue between
Impala and older versions of Hive:
- Before version 3.1, Hive used Julian calendar to represent dates
up to 1582-10-05 and Gregorian calendar for dates starting with
1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses proleptic Gregorian calendar, extending the Gregorian
calendar backward to dates preceding its official introduction in
1582-10-15.
This means that pre-1582-10-15 dates written to an avro table by Hive
will be read back incorrectly by Impala.
Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.
Dependency changes:
- BE uses avro 1.7.4-p5 from native-toolchain.
Change-Id: I7a9d5b93a22cf3a00244037e187f8c145cacc959
Reviewed-on: http://gerrit.cloudera.org:8080/13944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is the same.
SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.
Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.
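A sketch of that constraint, assuming the two limits are given in
bytes (the real check is the new ValidateQueryOptions method):
def validate_result_spooling_options(max_pinned_bytes, max_unpinned_bytes):
    # All pinned data must be able to become unpinned, so the pinned
    # limit cannot exceed the unpinned limit.
    if max_pinned_bytes > max_unpinned_bytes:
        raise ValueError("MAX_PINNED_RESULT_SPOOLING_MEMORY must be <= "
                         "MAX_UNPINNED_RESULT_SPOOLING_MEMORY")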
Planner Changes:
PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.
BufferedTupleStream Re-Factoring:
For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').
Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.
Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.
Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.
Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Various BI tools generate and run SQL. When used incorrectly or
misconfigured, the tools can generate extremely large SQLs.
Some of these SQL statements reach 10s of megabytes. Large SQL
statements impose costs throughout execution, including
statement rewrite logic in the frontend and codegen in the
backend. The resource usage of these statements can impact
the stability of the system or the ability to run other SQL
statements.
This implements two new query options that provide controls
to reject large SQL statements.
- The first, MAX_STATEMENT_LENGTH_BYTES is a cap on the
total size of the SQL statement (in bytes). It is
applied before any parsing or analysis. It uses a
default value of 16MB.
- The second, STATEMENT_EXPRESSION_LIMIT, is a limit on
the total number of expressions in a statement or any
views that it references. The limit is applied upon the
first round of analysis, but it is not reapplied when
statement rewrite rules are applied. Certain expressions
such as literals in IN lists or VALUES clauses are not
analyzed and do not count towards the limit. It uses
a default value of 250,000.
The two are complementary. Since enforcing the statement
expression limit requires parsing and analyzing the
statement, the MAX_STATEMENT_LENGTH_BYTES sets an upper
bound on the size of statement that needs to be parsed
and analyzed. Testing confirms that even statements
approaching 16MB get through the first round of analysis
within a few seconds and then are rejected.
This also changes the logging in tests/common/impala_connection.py
to limit the total SQL size that it will print to 128KB. This
prevents the JUnitXML (which includes this logging) from being too
large. Existing tests do not run SQL larger than about 80KB, so
this only applies to tests added in this change that run multi-MB
SQLs to verify limits.
Testing:
- This adds frontend tests that verify the low level
semantics about how expressions are counted and verifies
that the expression limits are enforced.
- This adds end-to-end tests that verify both the
MAX_STATEMENT_LENGTH_BYTES and STATEMENT_EXPRESSION_LIMIT
at their defaults values.
- There is also an end-to-end test that runs in exhaustive
mode that runs a SQL with close to 250,000 expressions.
Change-Id: I5675fb4a08c1dc51ae5bcf467cbb969cc064602c
Reviewed-on: http://gerrit.cloudera.org:8080/14012
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This extends the --scratch_dirs syntax to support specifying a max
capacity per directory, similarly to the --data_cache configuration.
The capacity is delimited from the directory name with ":" and
uses the usual syntax for specifying memory. The following are
valid arguments:
* --scratch_dirs=/dir1,/dir2 (no limits)
* --scratch_dirs=/dir1,/dir2:25G (only a limit on /dir2)
* --scratch_dirs=/dir1:5MB,/dir2 (only a limit on /dir1)
* --scratch_dirs=/dir1:-1,/dir2:0 (alternative ways of
expressing no limit)
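An illustrative Python sketch of the per-directory syntax (the real
parsing lives in the backend's TmpFileMgr):
def parse_scratch_dir(spec):
    # Split an entry such as "/dir2:25G" into (path, capacity); a
    # missing capacity, "0" or "-1" all mean no limit.
    if ':' in spec:
        path, capacity = spec.rsplit(':', 1)
        return path, None if capacity in ('0', '-1') else capacity
    return spec, None

print(parse_scratch_dir('/dir1'))      # ('/dir1', None)
print(parse_scratch_dir('/dir2:25G'))  # ('/dir2', '25G')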
The usage is tracked with a metric per directory. Allocations
from that directory start to fail when the limit is exceeded.
These metrics are exposed as
tmp-file-mgr.scratch-space-bytes-used.dir-0,
tmp-file-mgr.scratch-space-bytes-used.dir-1, etc.
Also add support for parsing terabyte specifiers to a utility
function that is used for parsing many configurations.
Testing:
Added a unit test to exercise TmpFileMgr.
Manually ran a spilling query on an impalad with multiple scratch dirs
configured with different limits. Confirmed via metrics that the
capacities were enforced.
Change-Id: I696146a65dbb97f1ba200ae472358ae2db6eb441
Reviewed-on: http://gerrit.cloudera.org:8080/13986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A new enum value LZ4_BLOCKED was added to the THdfsCompression enum to
distinguish it from the existing LZ4 codec. The LZ4_BLOCKED codec
represents the block compression scheme used by Hadoop. It is similar
to SNAPPY_BLOCKED as far as the block format is concerned, with the
only difference being the codec used for compression and decompression.
Added Lz4BlockCompressor and Lz4BlockDecompressor classes for
compressing and decompressing parquet data using Hadoop's
lz4 block compression scheme.
The Lz4BlockCompressor treats the input as a single block and
generates a compressed block with the following layout:
<4 byte big endian uncompressed size>
<4 byte big endian compressed size>
<lz4 compressed block>
The HDFS parquet table writer should call the Lz4BlockCompressor
using the ideal input size (the unit of compression in parquet is a
page), so the Lz4BlockCompressor does not further break down the input
into smaller blocks.
The Lz4BlockDecompressor, on the other hand, should be compatible with
blocks written by Impala and other engines in the Hadoop ecosystem. It
can decompress data in the following format:
<4 byte big endian uncompressed size>
<4 byte big endian compressed size>
<lz4 compressed block>
...
<4 byte big endian compressed size>
<lz4 compressed block>
...
<repeated until the uncompressed size from the outer block is consumed>
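A Python sketch of walking that framing (decompression of each inner
block is left to an LZ4 library; the real code is the C++
Lz4BlockDecompressor):
import struct

def split_lz4_blocked(buf):
    # Outer header: 4 byte big endian uncompressed size, then repeated
    # <4 byte big endian compressed size><lz4 compressed block> pairs.
    (uncompressed_size,) = struct.unpack_from('>I', buf, 0)
    pos, blocks = 4, []
    while pos < len(buf):
        (compressed_size,) = struct.unpack_from('>I', buf, pos)
        pos += 4
        blocks.append(buf[pos:pos + compressed_size])
        pos += compressed_size
    return uncompressed_size, blocks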
Externally users can now set the lz4 codec for parquet using:
set COMPRESSION_CODEC=lz4
This gets translated into LZ4_BLOCKED codec for the
HdfsParquetTableWriter. Similarly, when reading lz4 compressed parquet
data, the LZ4_BLOCKED codec is used.
Testing:
- Added unit tests for LZ4_BLOCKED in decompress-test.cc
- Added unit tests for Hadoop compatibility in decompress-test.cc,
basically being able to decompress an outer block with multiple inner
blocks (the Lz4BlockDecompressor description above)
- Added interoperability tests for Hive and Impala for all parquet
codecs. New test added to
tests/custom_cluster/test_hive_parquet_codec_interop.py
Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c
Reviewed-on: http://gerrit.cloudera.org:8080/13582
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.
Class ZstandardCompressor/ZstandardDecompressor was added to provide
interfaces for calling ZSTD_compress/ZSTD_decompress functions. Zstd
supports different compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since
negative values represent uncompressed data they won't be supported.
The default
clevel is ZSTD_CLEVEL_DEFAULT.
HdfsParquetTableWriter was updated to support ZSTD codec. The
new codecs can be set using existing query option as follows:
set COMPRESSION_CODEC=ZSTD:<clevel>;
set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
Testing:
- Added unit test in DecompressorTest class with ZSTD_CLEVEL_DEFAULT
clevel and a random clevel. The unit test decompresses compressed
input data and validates the result. It also tests for
expected behavior when passing an over/under sized buffer for
decompressing.
- Added unit tests for valid/invalid values for COMPRESSION_CODEC.
- Added an e2e test in test_insert_parquet.py which tests
writing/reading (null/non-null) data into/from a table (with
different data type columns) using multiple codecs. Other existing
e2e tests were
updated to also use parquet/zstd table format.
- Manual interoperability tests were run between Impala and Hive.
Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds an additional hiveserver2 endpoint for clients to
connect to that uses HTTP. The endpoint can be disabled by setting
--hs2_http_port=0. HTTP(S) also works when external TLS is
enabled using --ssl_server_certificate.
Thrift's http transport is modified to support BASIC authentication
via ldap. For convenience of developing and reviewing, this patch
is based on another that copied THttpServer and THttpTransport into
Impala's codebase. Kerberos authentication is not supported, so the
http endpoint is turned off if Kerberos is enabled and LDAP isn't.
TODO
=====
- Fuzz test the http endpoint
- Add tests for LDAP + HTTPS
Testing
=======
- Parameterized JdbcTest and LdapJdbcTest to work for HS2 + HTTP mode
- Added LdapHS2Test, which directly calls into the Hiveserver2
interface using a thrift http client.
Manual testing with Beeline client (from Apache Hive), which has
builtin support to connect to HTTP(S) based HS2 compatible endpoints.
Example
========
-- HTTP mode:
> start-impala-cluster.py
> JDBC_URL="jdbc:hive2://localhost:<port>/default;transportMode=http"
> beeline -u "$JDBC_URL"
-- HTTPS mode:
> cd $IMPALA_HOME
> SSL_ARGS="--ssl_client_ca_certificate=./be/src/testutil/server-cert.pem \
--ssl_server_certificate=./be/src/testutil/server-cert.pem \
--ssl_private_key=./be/src/testutil/server-key.pem --hostname=localhost"
> start-impala-cluster.py --impalad_args="$SSL_ARGS" \
--catalogd_args="$SSL_ARGS" --state_store_args="$SSL_ARGS"
- Create a local trust store using 'keytool' and import the certificate
from server-cert.pem (./clientkeystore in the example).
> JDBC_URL="jdbc:hive2://localhost:<port>/default;ssl=true;sslTrustStore= \
./clientkeystore;trustStorePassword=password;transportMode=http"
> beeline -u "$JDBC_URL"
-- BASIC Auth with LDAP:
> LDAP_ARGS="--enable_ldap_auth --ldap_uri='ldap://...' \
--ldap_bind_pattern='...' --ldap_passwords_in_clear_ok"
> start-impala-cluster.py --impalad_args="$LDAP_ARGS"
> JDBC_URL="jdbc:hive2://localhost:28000/default;user=...;password=\
...;transportMode=http"
> beeline -u "$JDBC_URL"
-- HTTPS mode with LDAP:
> start-impala-cluster.py --impalad_args="$LDAP_ARGS $SSL_ARGS" \
--catalogd_args="$SSL_ARGS" --state_store_args="$SSL_ARGS"
> JDBC_URL="jdbc:hive2://localhost:28000/default;user=...;password=\
...;ssl=true;sslTrustStore=./clientkeystore;trustStorePassword=\
password;transportMode=http"
> beeline -u "$JDBC_URL"
Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
Reviewed-on: http://gerrit.cloudera.org:8080/13299
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, when a client connection is closed, we always close any
session started over that connection. This is a requirement for
beeswax, which always ties sessions to connections, but it is not
required for hiveserver2, which allows sessions to be used across
connections with a session token.
This patch changes this behavior so that hiveserver2 sessions are no
longer closed when the corresponding connection is closed.
One downside of this change is that clients may inadvertently leave
sessions open indefinitely if they close their connection without
calling CloseSession(), which can waste space on the coordinator.
We already have a flag --idle_session_timeout, but this flag is off
by default and sessions that hit this timeout are expired but not
fully closed.
Rather than changing the default idle session behavior, which could
affect existing users, this patch mitigates this issue by adding a
new flag: --disconnected_session_timeout which is set to 1 hour by
default. When a session has had no open connections for longer than
this time, it will be closed and any associated queries will be
unregistered.
Testing:
- Added e2e tests.
Change-Id: Ia4555cd9b73db5b4dde92cd4fac4f9bfa3664d78
Reviewed-on: http://gerrit.cloudera.org:8080/13306
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the parquet scanner/writer. CREATE TABLE LIKE PARQUET
statements associated with data files that contain dates are also
supported.
Parquet uses DATE logical type for dates. DATE logical type annotates
an INT32 that stores the number of days from the Unix epoch, 1 January
1970.
This representation introduces a parquet interoperability issue
between Impala and older versions of Hive:
- Before version 3.1, Hive used Julian calendar to represent dates
up to 1582-10-05 and Gregorian calendar for dates starting with
1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses proleptic Gregorian calendar, extending the Gregorian
calendar backward to dates preceding its official introduction in
1582-10-15.
This means that pre-1582-10-15 dates written to a parquet table by
Hive will be read back incorrectly by Impala and vice versa.
Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.
Change-Id: I67da03754531660bc8de3b6935580d46deae1814
Reviewed-on: http://gerrit.cloudera.org:8080/13189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The coordinator currently waits indefinitely if it does not receive a
status report from a backend. This could cause a query to hang
indefinitely in certain situations, for example if the backend decides
to cancel itself as a result of failed status report rpcs.
This patch adds a thread to ImpalaServer which periodically iterates
over all queries for which that server is the coordinator and cancels
any that haven't had a report from a backend in a certain amount of
time.
This patch adds two flags:
--status_report_max_retry_s: the maximum number of seconds a backend
will attempt to send status reports before giving up. This is used
in place of --status_report_max_retries which is now deprecated.
--status_report_cancellation_padding: the coordinator will wait
--status_report_max_retry_s *
(1 + --status_report_cancellation_padding / 100)
before concluding a backend is not responding and cancelling the
query.
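For example, with hypothetical flag values:
status_report_max_retry_s = 600           # hypothetical value
status_report_cancellation_padding = 50   # hypothetical value, percent

wait_s = status_report_max_retry_s * (
    1 + status_report_cancellation_padding / 100.0)
print(wait_s)  # 900.0 seconds before the coordinator cancels the query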
Testing:
- Added a functional test that runs a query that is cancelled through
the new mechanism.
- Passed a full set of exhaustive tests.
Ran tests on a 10 node cluster loaded with tpch 500:
- Ran the stress test for 1000 queries with the debug actions:
'REPORT_EXEC_STATUS_DELAY:JITTER@1000'
Prior to this patch, this setup results in hanging queries. With
this patch, no hangs were observed.
- Ran perf tests with 4 concurrent streams, 3 iterations per query.
Found no change in performance.
Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Reviewed-on: http://gerrit.cloudera.org:8080/12299
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Avoided rewrite if the resulting string literal exceeds a defined limit.
Testing:
Added three statements in testFoldConstantsRule() to verify that the
expression rewrite is accepted only when the size of the rewritten
expression is below a specified threshold.
Change-Id: I8b078113ccc1aa49b0cea0c86dff2e02e1dd0e23
Reviewed-on: http://gerrit.cloudera.org:8080/12814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
I recently ran into some queries that failed like so:
WARNINGS: Disk I/O error: Could not open file: /data/...: Error(5): Input/output error
These warnings were in the profile, but I had to cross-reference impalad
logs to figure out which machine had the broken disk.
In this commit, I've sprinkled GetBackendString() into these error
messages so they include the backend.
Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Reviewed-on: http://gerrit.cloudera.org:8080/12402
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch limits the number of rows produced by a query by
tracking it at the PlanRootSink level. When the
NUM_ROWS_PRODUCED_LIMIT is set, it cancels a query when its
execution produces more rows than the specified limit. This limit
only applies when the results are returned to a client, e.g. for a
SELECT query, but not an INSERT query.
Testing:
Added tests to query-resource-limits.test to verify that the rows
produced limit is honored.
Manually tested on various combinations of tables, file formats
and NUM_ROWS_PRODUCED_LIMIT values.
Change-Id: I7b22dbe130a368f4be1f3662a559eb9aae7f0c1d
Reviewed-on: http://gerrit.cloudera.org:8080/12328
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There were two races:
* queries were terminated because of an impalad being detected
as failed by the statestore even if the query had finished
executing on that impalad.
* NUM_FRAGMENTS_IN_FLIGHT was used to detect the backend being
idle, but it was decremented before the final status report
was sent.
The fixes are:
* keep track of the backends that triggered the potential cancellation,
and only proceed with the cancellation if the coordinator has fragments
still executing on the backend.
* add a new metric that keeps track of the number of executing queries,
which isn't decremented until the final status report is sent.
Also do some cleanup/improvements in this code:
* use proper error codes for some errors
* more overloads for Status::Expected()
* also add a metric for the total number of queries executed on the
backend
Testing:
Add a new version of test_shutdown_executor with delays that
trigger both races. This test only runs in exhaustive to avoid
adding ~20s to core build time.
Ran exhaustive tests.
Looped test_restart_services overnight.
Change-Id: I7c1a80304cb6695d228aca8314e2231727ab1998
Reviewed-on: http://gerrit.cloudera.org:8080/12082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds additional context about how much scratch was allocated
by the query and the impalad in total. We sometimes see scratch
allocation failures because a query was spilling heavily and
ate up all the disk. In this case, the high values in the
error should provide an additional clue that the volume
of spilling is the problem (rather than disks being full
for other reasons).
Example error after deleting /tmp/impala-scratch:
[localhost:21000] default> set mem_limit=150m; select distinct * from tpch_parquet.lineitem limit 5;
WARNINGS: Could not create files in any configured scratch directories (--scratch_dirs=/tmp/impala-scratch) on backend 'tarmstrong-box:22000'. 2.00 MB of scratch is currently in use by this Impala Daemon (2.00 MB by this query). See logs for previous errors that may have prevented creating or writing scratch files.
Disk I/O error: open() failed for /tmp/impala-scratch/7d473ea7aef26431:c9105f7900000000_3120108e-475b-4616-9825-8bbdb1dc9cc2. The given path doesn't exist. errno=2
Change-Id: Icbedd586c57ec02e784143927e82b74455f98dc8
Reviewed-on: http://gerrit.cloudera.org:8080/12088
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is part 1 of a push to add timeouts for all HDFS operations.
It adds timeouts for opening an HDFS file handle.
It introduces a new SynchronousThreadPool, which executes
an operation in a thread pool and waits up to a specified
timeout for the operation to complete. This type of thread
pool can accept any subclass of SynchronousWorkItem, and
a single thread pool can process different types of work
items. It is tested by a new test case in thread-pool-test.
This also introduces a new HdfsMonitor which implements
timeouts for HDFS operations, currently limited to
hdfsOpenFile(). This is implemented using a SynchronousThreadPool.
The timeout for hdfs operations is specified by
hdfs_operation_timeout_sec, which defaults to 5 minutes.
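An illustrative Python analogue of the SynchronousThreadPool idea
(the real implementation is C++ in the backend):
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(work_item, timeout_s):
    # Run the operation on a worker thread and wait up to timeout_s
    # seconds for it to complete.
    future = pool.submit(work_item)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        raise RuntimeError("operation timed out after %ss" % timeout_s)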
Testing:
1. Added a test to thread-pool-test for the new
SynchronousThreadPool.
2. Core tests
3. Added a custom cluster test that does "kill -STOP"
for the NameNode and verifies that a subsequent
hdfsOpenFile operation times out.
Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454
Reviewed-on: http://gerrit.cloudera.org:8080/11874
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Move parquet classes into exec/parquet.
Move CollectionColumnReader and ParquetLevelDecoder into separate files.
Remove unnecessary 'encoding_' field from ParquetLevelDecoder.
Switch BOOLEAN decoding to use composition instead of inheritance. This
lets the boolean decoding use the faster batched implementations in
ScalarColumnReader and avoids some confusing aspects of the class
hierarchy, like the ReadValueBatch() implementation on the base class
that was shared between BoolColumnReader and CollectionColumnReader.
Improve compile times by instantiating BitPacking templates in a
separate file (this looks to give a 30s+ speedup for
compiling parquet-column-readers.cc).
Testing:
Ran exhaustive tests.
Change-Id: I0efd5c50b781fe9e3c022b33c66c06cfb529c0b8
Reviewed-on: http://gerrit.cloudera.org:8080/11949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this fix Impala did not check whether a timestamp's time part
is out of the valid [0, 24 hour) range when reading Parquet files,
so these timestamps were memcopied as-is into slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00
Different parts of Impala treat these timestamps differently:
- string conversion leads to invalid representation that cannot be
converted back to timestamp
- timezone conversions handle the overflowing time part and give
a valid timestamp result (at least since CCTZ, I did not check
older versions of Impala)
- Parquet writing inserts these timestamp as they are, so the
resulting Parquet file will also contain corrupt timestamps
The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. A new error code is added for this case. If both the date
and the time part are corrupt, then the error about the corrupt time
part is returned.
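A sketch of the range check on the time-of-day part, expressed in
nanoseconds (the backend representation differs; this only
illustrates the rule):
NANOS_PER_DAY = 24 * 60 * 60 * 10**9

def time_of_day_is_valid(nanos):
    # Valid time parts fall in [0, 24 hours); anything else becomes NULL.
    return 0 <= nanos < NANOS_PER_DAY

print(time_of_day_is_valid(-1))             # False -> NULL
print(time_of_day_is_valid(NANOS_PER_DAY))  # False -> NULL (24:00:00)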
Testing:
- added a new scanner test that reads a corrupted Parquet file
with edge values
Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Reviewed-on: http://gerrit.cloudera.org:8080/11521
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the shut down coordinator. If queries are submitted to
a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on the coordinator; existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I started by converting scan and spill-to-disk because the
cancellation there is always meant to be internal to those
subsystems.
I updated all places that checked for TErrorCode::CANCELLED to
treat CANCELLED_INTERNALLY the same way.
This is to aid triage and debugging of bugs like IMPALA-7418,
where an "internal" cancellation leaks out into the query state:
with a distinct error code it is easier to determine whether an
internal cancellation somehow "leaked" out.
Testing:
Ran exhaustive tests.
Change-Id: If25d5b539d68981359e4d881cae7b08728ba2999
Reviewed-on: http://gerrit.cloudera.org:8080/11464
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).
Details:
* In order to allow future admin commands, this is implemented with
function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
":shutdown('hostname')", so that non-coordinators can be shut down
and for the convenience of the client, which does not have to
connect to the specific impalad. There is no assumption that the
other impalad is registered in the statestore; just that the
coordinator can connect to the other daemon's thrift endpoint.
This simplifies things and allows shutdown in various important
cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
slower shutdown by specifying a deadline in seconds after the
statement is executed.
* If shutting down, a banner is shown on the root debug page.
Workflow:
1. (if a coordinator) clients are prevented from submitting
queries to this coordinator via some out-of-band mechanism,
e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
which stop scheduling fragment instances on this daemon
(if an executor).
4. the query startup grace period (which is ideally set to the AC
queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
long-running queries), after a longer timeout (counted from the start
of the shutdown process) it will shut down anyway.
What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
there is an out-of-band mechanism to prevent submission of more
queries to the coordinator being shut down. If queries are submitted
to a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
slow down but not prevent eventual shutdown.
Limitations:
* The startup grace period needs to be configured to be greater than
the latency of statestore updates + scheduling + admission +
coordinator startup. Otherwise a coordinator may send a
fragment instance to the shutting down impalad. (We could
automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
even if the cluster is idle.
* We depend on the statestore detecting the process going down
if queries are still running on that backend when the timeout
expires. This may still be subject to existing problems,
e.g. IMPALA-2990.
Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
queries are running.
* End-to-end test of shutting down a coordinator
- New queries cannot be started on the coordinator; existing queries continue to run
- Exercises various Beeswax and HS2 operations.
Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Reviewed-on: http://gerrit.cloudera.org:8080/10744
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The error text with AES-GCM enabled looks like:
Error reading 44 bytes from scratch file
'/tmp/impala-scratch/0:0_d43635d0-8f55-485e-8899-907af289ac86' on
backend tarmstrong-box:22000 at offset 0: verification of read data
failed.
OpenSSL error in EVP_DecryptFinal:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
139634997483216:error:0607C083:digital envelope
routines:EVP_CIPHER_CTX_ctrl:no cipher set:evp_enc.c:610:
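For context, a hedged, self-contained sketch (not Impala's code) of where
such a failure surfaces with AES-GCM: integrity verification happens in
EVP_DecryptFinal_ex(), which fails when the authentication tag does not
match, and the caller can then report "verification of read data failed"
rather than only the raw OpenSSL error stack:
  #include <cstdint>
  #include <openssl/evp.h>

  // Returns true if the ciphertext decrypts and the 16-byte GCM tag verifies.
  // key is 32 bytes (AES-256), iv is 12 bytes; plaintext must be at least
  // ciphertext_len bytes.
  static bool GcmDecryptAndVerify(const uint8_t* key, const uint8_t* iv,
                                  const uint8_t* ciphertext, int ciphertext_len,
                                  const uint8_t* tag, uint8_t* plaintext) {
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    if (ctx == nullptr) return false;
    bool ok = false;
    int len = 0;
    if (EVP_DecryptInit_ex(ctx, EVP_aes_256_gcm(), nullptr, key, iv) == 1 &&
        EVP_DecryptUpdate(ctx, plaintext, &len, ciphertext,
                          ciphertext_len) == 1 &&
        EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_TAG, 16,
                            const_cast<uint8_t*>(tag)) == 1 &&
        EVP_DecryptFinal_ex(ctx, plaintext + len, &len) == 1) {
      ok = true;  // tag verified; data is authentic
    }
    EVP_CIPHER_CTX_free(ctx);
    return ok;  // false => "verification of read data failed"
  }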
Testing:
Added a backend test to exercise the code path and verify the error
code.
Change-Id: I0652d6cdfbb4e543dd0ca46b7cc65edc4e41a2d8
Reviewed-on: http://gerrit.cloudera.org:8080/10204
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When an impalad is in executor-only mode, it receives no
catalog updates. As a result, lib-cache entries are never
refreshed. A consequence is that UDF queries can return
incorrect results or may fail to run due to resolution issues.
Both cases are caused by the executor using a stale copy
of the lib file. For incorrect results, an old version of
the method may be used. Resolution issues can come up if
a method is added to a lib file.
The solution in this change is to capture the coordinator's
view of the lib file's last modified time when planning.
This last modified time is then shipped with the plan to
executors. Executors must then use both the lib file path
and the last modified time as a key for the lib-cache.
If the coordinator's last modified time is more recent than
the executor's lib-cache entry, then the entry is refreshed.
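A hedged sketch of the keying idea (illustrative types and names, not the
actual lib-cache code):
  #include <cstdint>
  #include <string>
  #include <tuple>

  // Entries are keyed by (lib file path, last modified time); the mtime is
  // the coordinator's view, shipped with the plan to executors.
  struct LibCacheKey {
    std::string lib_path;
    int64_t last_modified_time;
    bool operator<(const LibCacheKey& o) const {
      return std::tie(lib_path, last_modified_time) <
             std::tie(o.lib_path, o.last_modified_time);
    }
  };

  // An executor refreshes its entry when the plan carries a newer mtime than
  // the one the cached entry was loaded with.
  inline bool NeedsRefresh(int64_t cached_mtime, int64_t plan_mtime) {
    return plan_mtime > cached_mtime;
  }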
Brief discussion of alternatives:
- lib-cache always checks the last modified time
  + easy/local change to lib-cache
  - adds a filesystem lookup on every access; rejected for this reason
- keep the last modified time in the catalog
  - the bound on staleness is too loose. Consider the case where
    functions f1, f2, f3 are created with last modified times of
    t1, t2, t3, and the function's last modified time is treated as a
    low-watermark: if the cache entry has a more recent time, it is
    used. Such a scheme would allow the version at t2 to persist, and
    an old function may keep the state from converging to the latest.
    This could end up with strange cases where different versions of
    the lib are used across executors for a single query.
In contrast, the change in this patch relies on the statestore to
push versions forward at all coordinators, and so pushes all
versions in all caches forward as well.
Testing:
- added an e2e custom cluster test
Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6
Reviewed-on: http://gerrit.cloudera.org:8080/9697
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The serialization format of a row batch relies on
tuple offsets. In its current form, the tuple offsets
are int32s. This means that it is impossible to generate
a valid serialization of a row batch that is larger
than INT_MAX.
This changes RowBatch::SerializeInternal() to return an
error when trying to serialize a row batch larger than INT_MAX bytes.
This prevents hitting a DCHECK in debug builds when creating a row
larger than 2GB.
This also changes the compression logic in RowBatch::Serialize()
to avoid a DCHECK if LZ4 will not be able to compress the
row batch. Instead, it returns an error.
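A hedged sketch of the two limits involved (illustrative; the real checks
live inside RowBatch::Serialize()/SerializeInternal()). LZ4_MAX_INPUT_SIZE is
defined in lz4.h and is smaller than INT_MAX:
  #include <climits>
  #include <cstdint>
  #include <string>

  #include <lz4.h>

  // Returns an empty string on success, otherwise an error message describing
  // which limit was exceeded.
  std::string CheckSerializableSize(int64_t uncompressed_size, bool compress) {
    if (uncompressed_size > INT_MAX) {
      return "Row batch too large: tuple offsets are int32, so a serialized "
             "batch cannot exceed INT_MAX bytes";
    }
    if (compress && uncompressed_size > LZ4_MAX_INPUT_SIZE) {
      return "Row batch too large for LZ4 compression";
    }
    return "";
  }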
This modifies row-batch-serialize-test to verify behavior at
each of the limits. Specifically:
- RowBatches up to size LZ4_MAX_INPUT_SIZE succeed.
- RowBatches with sizes in [LZ4_MAX_INPUT_SIZE+1, INT_MAX]
  fail on LZ4 compression.
- RowBatches with size > INT_MAX fail with a "RowBatch too large" error.
Change-Id: I3b022acdf3bc93912d6d98829b30e44b65890d91
Reviewed-on: http://gerrit.cloudera.org:8080/9367
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
The encoding was added in an early version of the Parquet
spec and deprecated even in the Parquet 1.0 spec.
Parquet-MR switched to generating RLE at the same time as
the spec changed in mid-2013. Impala always wrote RLE:
see commit 6e293090e6.
The Impala implementation of BIT_PACKED was never correct
because it implemented little-endian bit unpacking instead of
the big-endian unpacking required by the spec for levels.
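To make the distinction concrete for 1-bit levels (a hedged illustration,
not Impala's decoder):
  #include <cstdint>

  // Level i (0..7) of one packed byte under each bit order, for bit width 1.
  inline int LevelMsbFirst(uint8_t b, int i) {   // big-endian order: what the
    return (b >> (7 - i)) & 1;                   // spec requires for BIT_PACKED
  }
  inline int LevelLsbFirst(uint8_t b, int i) {   // little-endian order: what
    return (b >> i) & 1;                         // was implemented instead
  }
  // Example: 0x80 encodes levels 1,0,0,0,... under BIT_PACKED, but LSB-first
  // unpacking reads the same byte as 0,0,0,...,1.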
Testing:
Updated tests to reflect expected behaviour for supported
and unsupported def level encodings.
Cherry-picks: not for 2.x.
Change-Id: I12c75b7f162dd7de8e26cf31be142b692e3624ae
Reviewed-on: http://gerrit.cloudera.org:8080/9241
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds the following details to the error message encountered
on failure to get minimum memory reservation:
- which ReservationTracker hit its limit
- top 5 admitted queries that are consuming the most memory under the
ReservationTracker that hit its limit
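A hedged sketch of selecting the top consumers to include in the message
(illustrative names, not the actual ReservationTracker code):
  #include <algorithm>
  #include <cstdint>
  #include <string>
  #include <utility>
  #include <vector>

  // Given (query id, reserved bytes) pairs for queries admitted under the
  // tracker that hit its limit, return the 5 largest consumers to append to
  // the error message.
  std::vector<std::pair<std::string, int64_t>> TopConsumers(
      std::vector<std::pair<std::string, int64_t>> queries) {
    std::sort(queries.begin(), queries.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    if (queries.size() > 5) queries.resize(5);
    return queries;
  }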
Testing:
- added tests to reservation-tracker-test.cc that verify the error
message returned for different cases.
- tested "initial reservation failed" condition manually to verify
the error message returned.
Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74
Reviewed-on: http://gerrit.cloudera.org:8080/8781
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins