Fill the LogicalType field in Parquet schemas for columns
that have an associated logical type. ConvertedType still
has to be filled to remain compatible with older readers.
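For context, the pairing can be sketched as a small lookup (illustrative Python, not Impala's actual schema-writing code; the pairs follow the Parquet format spec):

```python
# Each logical type that has a ConvertedType equivalent is written alongside
# it, so that older readers that only understand ConvertedType still work.
LOGICAL_TO_CONVERTED = {
    "StringType": "UTF8",
    "DecimalType": "DECIMAL",
    "DateType": "DATE",
    "TimestampType(MILLIS)": "TIMESTAMP_MILLIS",
    "TimestampType(MICROS)": "TIMESTAMP_MICROS",
}

def column_annotations(logical_type):
    """Return the (LogicalType, ConvertedType) pair to fill for a column."""
    return logical_type, LOGICAL_TO_CONVERTED.get(logical_type)
```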
Testing:
- added new tests to check both logical and converted types
to test_insert_parquet.py
Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91
Reviewed-on: http://gerrit.cloudera.org:8080/12004
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_min_max_filters and test_decimal_min_max_filters record the aggregated probe rows to
check whether the min-max filter was exercised. In ASAN builds, the probe side
started processing before the filters reached it, because ASAN builds are a
little slower. The resolution is to increase RUNTIME_FILTER_WAIT_TIME_MS to accommodate ASAN.
This issue was also seen earlier in runtime filter tests and was fixed through IMPALA-6201.
This fix mimics that one, by setting RUNTIME_FILTER_WAIT_TIME_MS to $RUNTIME_FILTER_WAIT_TIME_MS.
Change-Id: I111ed15947bd2812753ae68d3bbb8a9871e25b08
Reviewed-on: http://gerrit.cloudera.org:8080/12224
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Cardinality is vital to understanding why a plan has the form it does,
yet the planner normally emits cardinality information only for the
detailed levels. Unfortunately, most query profiles we see are at the
standard level without this information (except in the summary table),
making it hard to understand what happened.
This patch adds cardinality to the standard EXPLAIN output. It also
changes the displayed cardinality value to be in abbreviated "metric"
form: 1.23K instead of 1234, etc.
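The abbreviated form can be sketched as follows (a hedged illustration of the display format described above, not Impala's exact formatting code; the "B" suffix for billions is an assumption):

```python
def abbreviate_cardinality(n):
    """Render a row count in abbreviated "metric" form, e.g. 1234 -> '1.23K'."""
    for threshold, suffix in ((10**9, "B"), (10**6, "M"), (10**3, "K")):
        if n >= threshold:
            return f"{n / threshold:.2f}{suffix}"
    return str(n)

print(abbreviate_cardinality(1234))  # 1.23K
```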
Changing the EXPLAIN output has a huge impact on PlannerTest: all the
"golden" test files must change. To avoid doing this twice, this patch
also includes:
IMPALA-7919: Add predicates line in plan output for partition key
predicates
This is also a good time to include:
IMPALA-8022: Add cardinality checks to PlannerTest
The comparison code was changed to allow a set of validators, one of
which compares cardinality to ensure it is within 5% of the expected
value. This should ensure we don't change estimates unintentionally.
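The tolerance check can be sketched as (a hypothetical helper, not the actual PlannerTest validator code):

```python
def cardinality_ok(expected, actual, tolerance=0.05):
    """True iff the planner's cardinality estimate is within 5% of expected."""
    if expected == 0:
        return actual == 0
    return abs(actual - expected) / expected <= tolerance
```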
While many planner tests are concerned with cardinality, many others are
not. Testing showed that cardinality is actually unstable within some
tests. For such tests, filters were added to ignore cardinality. The filter
is enabled by default (for backward compatibility) but disabled (to
allow cardinality verification) for the critical tests.
Rebasing the tests was complicated by a bug in the error-matching code,
so this patch also fixes:
IMPALA-8023: Fix PlannerTest to handle error lines consistently
Now, the error output written to the output "save results" file matches
that expected in the "golden" file -- no more handling these specially.
Testing:
* Added cardinality verification.
* Reran all FE tests.
* Rebased all PlannerTest .test files.
* Adjusted the metadata/test_explain.py test to handle the changed
EXPLAIN output.
Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Reviewed-on: http://gerrit.cloudera.org:8080/12136
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The code mimics the code written for other min-max filters. Decimal data
can be stored using 4, 8, or 16 bytes, and the code handles all three
storage configurations. The column definition states the precision, and
the precision determines the storage size.
The minimum and maximum values are stored in a union. The precision of
the column comes in as an input; based on the precision the size is
found, and depending on the size the appropriate union member is used.
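The precision-to-size rule can be sketched as follows (this mirrors the fixed decimal storage tiers described above; the helper name is illustrative):

```python
def decimal_storage_bytes(precision):
    """Storage size in bytes for a decimal of the given precision:
    up to 9 digits fit in 4 bytes, up to 18 in 8 bytes, up to 38 in 16 bytes."""
    if not 1 <= precision <= 38:
        raise ValueError("decimal precision must be in [1, 38]")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16
```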
The code in min-max-filter* follows the general convention of the file, hence
uses macros.
The test includes 24 decimal columns (as listed below) with the following joins:
1. Inner Join with broadcast (2 tables)
1a. 1 predicate
1b. 4 predicates - all result in a decimal min-max filter
1c. 4 predicates - 3 result in a decimal min-max filter; 1 doesn't
2. Inner Join with Shuffle (3 tables)
3. Right outer join (2 tables)
4. Left Semi join (2 tables)
5. Right Semi join (2 tables)
Decimal Columns:
4bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)
The test aggregates the count of probe rows. This shows that the min-max filter
is exercised, because the number of probe rows is less than the total number
of rows in the probe-side table. The count of probe rows is considered to be
deterministic, but it will be beneficial to look out for changes in Kudu that
can change the way data is partitioned. Such a change could alter the probe
row count, in which case the test will have to be updated.
impala_test_suite.py and test_result_verifier.py are enhanced to support saving
of aggregation using update_results.
Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
GC is performed when:
* The amount of memory allocated from the system for the buffer pool
exceeds the reservation (i.e. free buffers and clean pages are not
offset by unused reservation).
* The soft or hard process memory limit would otherwise cause an
allocation to fail.
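The two triggers can be sketched as (illustrative names, not the buffer pool's actual API):

```python
def should_gc(system_allocated, total_reservation, consumed, alloc_size, mem_limit):
    """GC free buffers and clean pages when either trigger described above fires:
    the system allocation exceeds the reservation, or the pending allocation
    would push process memory over the limit."""
    exceeds_reservation = system_allocated > total_reservation
    would_fail_allocation = consumed + alloc_size > mem_limit
    return exceeds_reservation or would_fail_allocation
```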
Testing:
Looped the old version of the semi_joins_exhaustive test, which
reliably reproduced the issue. I confirmed that the buffer pool GC was
running and that it prevented the query failures.
Added a backend test that reproed the issue. A large chunk of the code
change is to add infrastructure to use TCMalloc memory metrics
for the process memory tracker in backend tests.
Ran exhaustive tests.
Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21
Reviewed-on: http://gerrit.cloudera.org:8080/12133
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The ClassLoader from which a UDF was loaded needs to be kept open for
executions of the UDF, so that the UDF can load other classes from the
same jar. (A typical scenario might be a utility class.) This was
broken by the fix to IMPALA-7668.
This commit moves closing the ClassLoader to the close() function.
A test for a UDF that imports a static method from another file has been
added. Doing so failed without this change.
Change-Id: Ic02e42fb25a2754ede21fe00312a60f07e0ba8a2
Reviewed-on: http://gerrit.cloudera.org:8080/12125
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Each AST statement node has a toSql() implementation. The code for
ScalarFunction and ToSqlUtils has a number of issues:
* If Location or Symbol are not set, they are shown as 'null'. Better to
omit these clauses if the items are not available. This is mostly an
issue during testing.
* The generated SQL does not follow the CREATE TABLE syntax. For
example, the signature and return value are provided for Java
functions, but should not be.
* Unlike other statements, this one is generated with a trailing newline.
* ToSql.getCreateFunctionSql() fails to separate functions with the
semi-colon statement separator.
These are all minor issues, but we might as well fix the code to work as
intended.
Testing:
* Added new unit tests to verify the behavior (no tests existed
previously).
* Re-ran all FE tests.
Change-Id: Id34d6df97760a11c299092dff8edbdb7033bce1c
Reviewed-on: http://gerrit.cloudera.org:8080/12014
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
PARQUET-1387 added int64 timestamps with nanosecond precision that
store timestamps as nanoseconds since the Unix epoch.
As 64 bits are not enough to represent the whole 1400..9999 range
of Impala timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This means it has the potential to be the
fastest way to store timestamps in Impala + Parquet.
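The quoted range falls directly out of the int64 value range; a quick check (Python's datetime only has microsecond resolution, so the nanosecond tail is truncated):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# int64 nanoseconds since the epoch, truncated to whole microseconds:
lo = EPOCH + timedelta(microseconds=(-2**63) // 1000)
hi = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)
print(lo, hi)  # 1677-09-21 00:12:43.145224+00:00  2262-04-11 23:47:16.854775+00:00
```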
Another way NANO differs from MICRO and MILLI is that NANO can
only be described with the new logical types in Parquet; it has no
converted type equivalent. This made implementing CREATE TABLE
LIKE PARQUET less trivial than it was for MICRO/MILLI: the type
conversion logic in ParquetHelper.java had to be rewritten to
use LogicalTypeAnnotation instead of ConvertedType.
The changes on Java side also made bumping CDH_BUILD_NUMBER
necessary.
Testing:
- added a new testfile with int64 nano timestamps
- ran core tests
Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Reviewed-on: http://gerrit.cloudera.org:8080/11984
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.
This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).
Tests:
=====
- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
estimation.
Perf note:
=========
I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.
Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Reviewed-on: http://gerrit.cloudera.org:8080/11565
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Move parquet classes into exec/parquet.
Move CollectionColumnReader and ParquetLevelDecoder into separate files.
Remove unnecessary 'encoding_' field from ParquetLevelDecoder.
Switch BOOLEAN decoding to use composition instead of inheritance. This
lets the boolean decoding use the faster batched implementations in
ScalarColumnReader and avoids some confusing aspects of the class
hierarchy, like the ReadValueBatch() implementation on the base class
that was shared between BoolColumnReader and CollectionColumnReader.
Improve compile times by instantiating BitPacking templates in a
separate file (this looks to give a 30s+ speedup for
compiling parquet-column-readers.cc).
Testing:
Ran exhaustive tests.
Change-Id: I0efd5c50b781fe9e3c022b33c66c06cfb529c0b8
Reviewed-on: http://gerrit.cloudera.org:8080/11949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-7367 reduced the memory requirement of the query tested in
test_exchange_mem_usage_scaling. This change reduces the mem limit
to ensure that the query runs out of memory as expected.
Testing:
Ran the test 100 times in a loop without any failures.
Change-Id: Ib2f063fb88ebf0c7f994b55ecfc860d81726fdd8
Reviewed-on: http://gerrit.cloudera.org:8080/11965
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As shown in IMPALA-7828, there is some non-determinism in whether the errors
detected in FragmentInstanceState::Close() will show up in the final profile
sent to the coordinator. The reason is that the current code marks a fragment
instance as "done" after ExecInternal() completes but before Close() is called.
There is a window between when the final status report is sent and when Close()
finishes.
This change fixes the problem by not sending the final report until Close()
is called. This has no implication on the first row available time for normal
queries. It may slightly lengthen the first row available time for DML queries.
Testing done: Updated udf-no-expr-rewrite.test to exercise this case
Perf run on an 8-node cluster didn't show any regression:
TPCH-300
+------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 23.94 | -2.05% | 12.55 | -2.62% |
+------------+-----------------------+---------+------------+------------+----------------+
Small concurrency
+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 6.89 | -0.66% | 6.62 | +0.41% |
+-------------------------+-----------------------+---------+------------+------------+----------------+
Medium concurrency
+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 55.57 | -1.04% | 55.27 | -0.98% |
+-------------------------+-----------------------+---------+------------+------------+----------------+
Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01
Reviewed-on: http://gerrit.cloudera.org:8080/11939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change packs StringValue and CollectionValue slots so that
they occupy 12 bytes instead of 16 bytes. This reduces the
memory requirements and improves performance. Since Kudu
tuples are populated using a memcpy, 4 bytes of padding were
added to string slots in Kudu tables.
Testing:
Ran core tests.
Added static asserts to ensure the value sizes are as expected.
Performance tests on TPCH-40 produced 3.96% improvement.
Change-Id: I32f3b06622c087e4aa288e8db1bf4581b10d386a
Reviewed-on: http://gerrit.cloudera.org:8080/11599
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The idea is to optimise the common case where there are long runs of
NULL or non-NULL values (i.e. the def level is repeated). We can
detect this cheaply by keying the decoding loop in the column reader
off the state of the def level RLE decoder - if there's a long run
of repeated levels, we can skip checking the def level for every
value. We still fall back to decoding, caching and reading
value-by-value a batch of def levels whenever the next def level is not
in a repeated run. We still use the old approach for decoding rep
levels. There might be some benefit to using the same approach for rep
levels *if* repeated def and rep level runs line up.
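The fast path can be sketched as follows (a simplified model: the real code also falls back to value-at-a-time decoding when the next levels are not in a repeated run; `decode_values` stands in for a batched kernel like UnpackAndDecode32Values()):

```python
def decode_with_def_levels(rle_runs, decode_values, max_def_level=1):
    """'rle_runs' is a list of (def_level, run_length) repeated runs from the
    RLE decoder. For each repeated run we decide NULL-ness once, instead of
    checking the def level for every single value."""
    out = []
    for def_level, run_length in rle_runs:
        if def_level == max_def_level:
            # Long non-NULL run: decode a whole batch of values at once.
            out.extend(decode_values(run_length))
        else:
            # Long NULL run: emit NULLs without touching the data pages.
            out.extend([None] * run_length)
    return out

# Hypothetical batched value decoder for illustration:
vals = iter(range(100))
decoded = decode_with_def_levels([(1, 3), (0, 2), (1, 1)],
                                 lambda n: [next(vals) for _ in range(n)])
print(decoded)  # [0, 1, 2, None, None, 3]
```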
These changes should unlock further optimizations because more time is
spent in simple kernel functions, e.g. UnpackAndDecode32Values() for
dictionary decompression, which is very optimisable using SIMD etc.
Snappy decompression now seems to be the main CPU bottleneck for
decoding snappy-compressed Parquet.
Perf:
Running TPC-H scale factor 60 on uncompressed and snappy parquet
both showed a ~4% speedup overall.
Microbenchmarks show that scans doing only dictionary decoding
on uncompressed Parquet are ~75% faster:
set mt_dop=1;
select min(l_returnflag) from lineitem;
Testing:
We have alltypesagg with a mix of null and non-null values.
Many tables have long runs of non-null values.
Added new test data and coverage:
* a test table manynulls with long runs of null values.
* a large CHAR test table
* missing coverage for materialising pos slot in flattened nested types
scan.
* Extended dict test to test longer runs.
* A larger version of complextypestbl with interesting collection
shapes - NULL collections, empty collections, etc, particularly runs
of collections with the same shape.
* Test interaction of timestamp validation with conversion
* Ran code coverage build to confirm all code paths are tested
* ASAN and exhaustive runs.
Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Reviewed-on: http://gerrit.cloudera.org:8080/8319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If explain_level is at 'extended' level or higher, then enhance the
output from the explain command. (1) Show the analyzed sql in the
explain header, this is the rewritten sql, which includes implicit
casts, and literals are printed with a cast so that their type is
visible. (2) When predicates are shown in the plan these are shown in
the same format.
The toSql() method can be called on a ParseNode tree to return
the SQL corresponding to the tree. In the past toSql() was
enhanced to print rewritten SQL by partially overloading toSql() [with
toSql(boolean)]. The current change requires changing toSql() in
many places, as a NumericLiteral can appear at different points in a
parse tree. To avoid many new fragile overloads of toSql(), I added
toSql(ToSqlOptions), where ToSqlOptions is an enum which controls the
form of the SQL that is returned. This changes many files but is safer
and means that any future options to toSql() can be added painlessly.
If SHOW_IMPLICIT_CASTS is passed to toSql() then
- in CastExpr print the implicit cast
- in NumericLiteral print the literal with a cast to show the type
Add a PlannerTestOption directive that will force the query text showing
implicit casts to be included in the PLAN section of a .test file.
The analyzed query text is wrapped at 80 characters. Note that the
analyzed query cannot always be executed as queries rewritten to use
LEFT SEMI JOIN are not legal sql. In addition some space characters may
be removed from the query for prettier display.
Documentation of this change will be done as IMPALA-7718
EXAMPLE OUTPUT:
[localhost:21000] default> set explain_level=2;
EXPLAIN_LEVEL set to 2
[localhost:21000] default> explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100;
Query: explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100
Max Per-Host Resource Reservation: Memory=0B Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_kudu.alltypestiny WHERE CAST(bigint_col
AS DOUBLE) < CAST(10 AS DOUBLE)
""
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=4.88MB mem-reservation=0B thread-reservation=2
PLAN-ROOT SINK
| mem-estimate=0B mem-reservation=0B thread-reservation=0
|
00:SCAN KUDU [functional_kudu.alltypestiny]
predicates: CAST(bigint_col AS DOUBLE) < CAST(10 AS DOUBLE)
mem-estimate=4.88MB mem-reservation=0B thread-reservation=1
tuple-ids=0 row-size=97B cardinality=1
in pipelines: 00(GETNEXT)
Fetched 16 row(s) in 0.03s
TESTING:
All end-to-end tests pass.
Added a new test in ExprRewriterTest which prints sql with implicit casts
for some interesting queries.
Add a unit test for the code which wraps text at 60 characters.
The output of some Planner Tests in .test files has been updated to
include the Analyzed sql that is printed when explain_level is
at at least 'extended' level.
Change-Id: I55c3bdacc295137f66b2316a912fc347da30d6b0
Reviewed-on: http://gerrit.cloudera.org:8080/11719
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
test_resource_limits was failing in release build because the queries
used were finishing earlier than expected. This resulted in fragment
instances not being able to send enough updates to the coordinator in
order to hit the limits used for the tests. This patch adds a
deterministic sleep to the queries, which gives the coordinator
enough time to catch up on reports.
Testing:
Checked that tests passed on release builds.
Change-Id: I4a47391e52f3974db554dfc0d38139d3ee18a1b4
Reviewed-on: http://gerrit.cloudera.org:8080/11933
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Changes:
- parquet.thrift is updated to a newer version which contains the
timestamp logical type.
- INT64 columns with converted types TIMESTAMP_MILLIS and
TIMESTAMP_MICROS can be read as TIMESTAMP.
- If the logical type is timestamp, then the type will contain the
information whether the UTC->local conversion is necessary. This
feature is only supported for the new timestamp types, so INT96
timestamps must still use flag
convert_legacy_hive_parquet_utc_timestamps.
- Min/max stat filtering is enabled again for columns that need
UTC->local conversion. This was disabled in IMPALA-7559 because
it could incorrectly drop column chunks.
- CREATE TABLE LIKE PARQUET converts these columns to
TIMESTAMP - before the change, an error was returned instead.
- Bulk of the Parquet column stat logic was moved to a new class
called "ColumnStatsReader".
Testing:
- Added unit tests for timezone conversion (this needed a new public
function in timezone_db.h and adding CET to tzdb_tiny).
- Added parquet files (created with parquet-mr) with int64 timestamp
columns.
Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Reviewed-on: http://gerrit.cloudera.org:8080/11057
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before IMPALA-4063, the error message detected during
FragmentInstanceState::Close() was always lost. After
IMPALA-4063, we may sometimes get the error message in
FragmentInstanceState::Close(). It's non-deterministic
as the fragment instance thread may race with the query
state thread which reports the final status. The UDF test
currently tries to handle this non-determinism by using
"row_regex:.*" in the ERRORS section but it doesn't
always seem to work.
This change works around the issue by commenting out the
ERRORS section in udf-no-expr-rewrite.test for now.
The actual fix will be done in IMPALA-7829.
Change-Id: I6a55d5ad1a5a7278e7390f60854a8df28c1b9f28
Reviewed-on: http://gerrit.cloudera.org:8080/11900
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala assumes that when Reset() is called on an ExecNode, all of the
memory returned from that node by GetNext() has been attached to the
output RowBatch. In a query with a LIMIT on the subplan, such that
some nodes don't reach 'eos', this may not be the case.
The solution is to have Reset() take a RowBatch that any such memory
can be attached to. I examined all ExecNodes for resources being
transferred on 'eos' and added transferring of those resources in
Reset().
Testing:
- Added e2e tests that repro the issue for hash and nested loop joins.
Change-Id: I3968a379fcbb5d30fcec304995d3e44933dbbc77
Reviewed-on: http://gerrit.cloudera.org:8080/11852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We need to carefully check that the intermediate value fits in an
int64_t and the final size fits in an int. If they don't, we
raise an error and fail the query.
Testing:
Added a couple of backend tests to exercise the
overflow check code paths.
Change-Id: I872ce77bc2cb29116881c27ca2a5216f722cdb2a
Reviewed-on: http://gerrit.cloudera.org:8080/11889
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a new query option 'topn_bytes_limit' that places a limit on the
number of estimated bytes that a TopN operator can process. If the
Impala planner estimates that a TopN operator will process more bytes
than this limit, it will replace the TopN operator with a sort operator.
Since the TopN operator cannot spill to disk, it has to buffer everything
in memory. This can cause frequent OOM issues when running with a large
limit + offset. Switching to a sort operator allows Impala to spill to
disk. We prefer to use the TopN operator when possible as it has better
performance than the sort operator for 'order by limit [offset]' queries.
The default limit is set to 512MB and is based on micro-benchmarking the
topn vs. sort operator for various limits (see the JIRA for full details).
The default is set to an intentionally high value in order to avoid
performance regressions.
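The planner decision can be sketched as (hypothetical names; Impala's actual estimate comes from the planner's row-size and cardinality machinery):

```python
TOPN_BYTES_LIMIT = 512 * 1024 * 1024  # default: intentionally high

def pick_operator(limit, offset, est_row_size_bytes, topn_bytes_limit=TOPN_BYTES_LIMIT):
    """TopN must buffer limit+offset rows in memory; fall back to the
    spillable SORT operator when that estimate exceeds the limit."""
    estimated_bytes = (limit + offset) * est_row_size_bytes
    return "TOPN" if estimated_bytes <= topn_bytes_limit else "SORT"
```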
Testing:
* Added a new planner test to functional-planner/ to validate that
'topn_bytes_limit' properly switches between topn and sort operators.
Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Reviewed-on: http://gerrit.cloudera.org:8080/11698
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The constraint imposed by IMPALA-1354 was artificial.
If there are constant "partition by" expressions, simply drop them,
they are no-ops.
Constant "order by" expressions can be ignored as well, though in effect
they should be accounted for as null expressions in the backend, with the
effect of combining all rows in the same window (i.e. no window breaks).
Change-Id: Idf129026c45120e9470df601268863634037908c
Reviewed-on: http://gerrit.cloudera.org:8080/11556
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Previously, each fragment instance executing on an executor would
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.
This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
no longer reports its errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.
Testing done: exhaustive tests
This patch is based on a patch by Sailesh Mukil
Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fixes an arithmetic overflow in ExchangeNode::GetNextMerging. Prior to
this patch, the code read:
int rows_to_keep = num_rows_skipped_ - offset_;
where num_rows_skipped_ and offset_ were of type int64_t. The result was
cast to an int which can lead to an overflow if the result exceeds the
value of 2^31. The value of rows_to_keep would be passed into
row-batch.h::CopyRows which would crash due to a DCHECK_LE error.
This crash arises when the value of the OFFSET is a large number. For
example, the query:
select int_col from functional.alltypes order by 1 limit
1 offset 9223372036854775800;
would crash the impalad executing it.
The fix is to change rows_to_keep to an int64_t to avoid the overflow,
which prevents the DCHECK_LE from failing.
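The narrowing bug can be reproduced outside C++ by simulating the int64-to-int32 conversion (two's complement wraparound):

```python
def as_int32(v):
    """Truncate a 64-bit value to a signed 32-bit int, as the C++ cast did."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

# A difference just past INT_MAX wraps to a large negative row count,
# which is what tripped the DCHECK_LE in row-batch.h::CopyRows:
print(as_int32(2**31))      # -2147483648
print(as_int32(2**31 - 1))  # 2147483647 (still fine)
```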
Change-Id: I8bb8064aae6ad25c8a19f6a8869086be7e70400a
Reviewed-on: http://gerrit.cloudera.org:8080/11844
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Similar to the treatment of NULLs, we want to consider NaN values
as equal when grouping.
- When detecting a NaN in a set of row values, the NaN value must
be converted to a canonical value - so that all NaN values have
the same bit-pattern for hashing purposes.
- When doing equality evaluation, floating point types must have
additional logic to consider NaN values as equal.
- Existing logic for handling NULLs in this way is appropriate for
triggering this behavior for NaN values.
- Relabel "force null equality" as "inclusive equality" to expand
the scope of the concept to a more generic form that includes NaN.
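The canonicalization idea can be sketched as (illustrative Python, not the backend's hashing code):

```python
import math
import struct

CANONICAL_NAN_BITS = struct.pack("<d", float("nan"))

def hash_bits(value):
    """Bit pattern used for hashing a double: every NaN, regardless of its
    payload bits, maps to one canonical pattern so all NaNs group together
    (otherwise NaN != NaN and each NaN would land in its own group)."""
    if math.isnan(value):
        return CANONICAL_NAN_BITS
    return struct.pack("<d", value)

# A NaN with a non-canonical payload still hashes with the canonical NaN:
odd_nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000001))[0]
assert hash_bits(odd_nan) == hash_bits(float("nan"))
```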
Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Reviewed-on: http://gerrit.cloudera.org:8080/11535
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans
There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.
This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.
In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().
Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.
Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The previous approach could lead to hangs or cryptic error messages
because it removed the ORC data type from a lookup table.
Instead check explicitly in the planner for ORC scans and throw a
more helpful error message.
Testing:
Added custom cluster test to exercise code and check error message.
Change-Id: I209e79b18745c48d0182800a916d6566083f4609
Reviewed-on: http://gerrit.cloudera.org:8080/11835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The .test file parser implemented an unconventional method for parsing
single-quoted strings in comma-separated value format. This didn't handle
trailing commas in the string correctly.
This commit switches to using a conventional method for parsing
comma-separated value format:
* Commas enclosed by single quotes are not treated as field separators
* Single quotes can be escaped within a string by doubling them.
I looked into using Python's .csv module for this, but it wouldn't
work without modifying the test file format more because it
automatically discards the quotes during parsing, which are actually
semantically important in .test files. E.g. without the quotes we can't
distinguish between the literal string 'regex:...' and the regex
regex:....
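The two rules can be sketched as a small field splitter (illustrative; the real parser lives in the test framework, and the quotes are deliberately kept in the output because they are semantically important in .test files):

```python
def split_fields(line):
    """Split a comma-separated line: commas inside single quotes are not
    separators, and '' inside a quoted string is a literal escaped quote."""
    fields, buf, in_quotes, i = [], "", False, 0
    while i < len(line):
        c = line[i]
        if c == "'":
            if in_quotes and line[i + 1:i + 2] == "'":
                buf += "''"  # escaped quote: keep both characters
                i += 2
                continue
            in_quotes = not in_quotes
            buf += c
        elif c == "," and not in_quotes:
            fields.append(buf)
            buf = ""
        else:
            buf += c
        i += 1
    fields.append(buf)
    return fields
```

For example, `split_fields("1,'a,b','it''s'")` keeps the quoted comma and the doubled quote intact.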
Testing:
Ran exhaustive tests and fixed .test files that required modifications.
Will rerun before merging.
Added a couple of tests to exercise edge cases in the test file parser.
Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9
Reviewed-on: http://gerrit.cloudera.org:8080/11800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This is very similar to IMPALA-7335, except it happens
when 'progress_' is incremented in the call chain
HdfsScanNode::ProcessSplit
-> HdfsScanNodeBase::CreateAndOpenScanner()
-> HdfsScanner::Close()
The fix restructures the code so that SetDoneInternal() is
called with the error *before* HdfsScanner::Close(). This
required refactoring because HdfsScanNodeBase doesn't
actually know about SetDoneInternal().
My fix is to put the common logic between HdfsScanNode and
HdfsScanNodeMt into a helper in HdfsScanNodeBase, then in
HdfsScanNode, make sure to call SetDoneInternal() before
closing the scanner.
I also reworked HdfsScanNode::ProcessSplit() to handle error propagation
internally. I think the joint responsibility between ProcessSplit() and
its caller for handling errors made things harder than necessary.
Testing:
Added a debug action and test that reproduced the race before the fix.
Change-Id: I45a61210ca7d057b048c77d9f2f2695ec450f19b
Reviewed-on: http://gerrit.cloudera.org:8080/11596
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After inserting all of its input into its Aggregators,
StreamingAggregationNode performs some cleanup, such as calling
InputDone() on each Aggregator.
Previously, StreamingAggregationNode only checked that all of the
child's batches had been fetched before doing this cleanup, which
caused problems if the final child batch wasn't processed fully in a
single GetNext() call. In this case, multiple calls to InputDone()
led to a DCHECK failure.
The solution is to only perform the cleanup once the final child batch
has been fully processed.
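A toy model of the corrected condition (names are illustrative, not Impala's actual members): cleanup runs only once the last child batch is fully processed, not merely fetched.

```python
class StreamingAggSketch:
    """Toy model of the fix, not the real StreamingAggregationNode."""
    def __init__(self, child_batches):
        self.pending = list(child_batches)  # unfetched child batches
        self.current = []                   # rows left in the fetched batch
        self.input_done_calls = 0

    def get_next(self, max_rows):
        if not self.current and self.pending:
            self.current = list(self.pending.pop(0))
        out, self.current = self.current[:max_rows], self.current[max_rows:]
        # Fixed condition: require the final batch to be fully *processed*,
        # not just fetched, before running InputDone() on each Aggregator.
        if not self.pending and not self.current and self.input_done_calls == 0:
            self.input_done_calls += 1
        return out
```

Checking only "all batches fetched" would have triggered the cleanup on every GetNext() call after the last fetch, which is the double-InputDone() bug described above.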
Testing:
- Added an e2e test with a query that hits this condition.
Change-Id: I851007a60472d0e53081c076c863c866c516677c
Reviewed-on: http://gerrit.cloudera.org:8080/11626
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes one of the tests in test_resource_limits that expects a
query to run for more than 2 seconds but currently fails because it
sometimes completes earlier than that.
Change-Id: I2ba7080f62f0af3e16ef6c304463ebf78dec1b0c
Reviewed-on: http://gerrit.cloudera.org:8080/11741
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When run on a test build with HDFS erasure coding turned on, a
spilling test hits an out-of-memory error on the HDFS scan node. This
happens because the test is tuned for a regular 3-node minicluster
without HDFS erasure coding. The fix is to simply increase the memory
limit on the test to accommodate this difference, while keeping it
small enough to achieve the desired spilling on the hash join node.
Testing:
Ran it on an EC enabled minicluster to make sure it works
Change-Id: I207569822ba7388e78936d25e2311fa09c7a1b9a
Reviewed-on: http://gerrit.cloudera.org:8080/11740
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
StringMinMaxFilter uses a MemPool to allocate space for StringBuffers.
Previously, the MemPool was created by RuntimeFilterBank and passed to
each StringMinMaxFilter. In queries with multiple StringMinMaxFilters
being generated by the same fragment instance, this leads to
overlapping use of the MemPool by different threads, which is
incorrect as MemPools are not thread-safe.
The solution is to have each StringMinMaxFilter create its own
MemPool.
This patch also documents MemPool as not thread-safe and introduces a
DFAKE_MUTEX to help enforce correct usage. Doing this requires
modifying our CMakeLists.txt to pass '-DNDEBUG' to clang only in
RELEASE builds, so that the DFAKE_MUTEX will be present in the
compiled IR for DEBUG builds.
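A toy model of the fix and of the DFAKE_MUTEX-style debug check follows. The names are illustrative; Impala's MemPool and DFAKE_MUTEX are C++, and this sketch only captures the single-owner idea:

```python
import threading

class MemPoolSketch:
    # Non-thread-safe pool with a debug-only single-owner check, in the
    # spirit of DFAKE_MUTEX: use from a second thread trips the assert.
    def __init__(self):
        self._owner = None

    def allocate(self, n):
        me = threading.get_ident()
        if self._owner is None:
            self._owner = me
        assert self._owner == me, "MemPool used by more than one thread"
        return bytearray(n)

class StringMinMaxFilterSketch:
    # The fix: each filter owns its own pool, instead of sharing one
    # created by RuntimeFilterBank across all filters in the fragment.
    def __init__(self):
        self.pool = MemPoolSketch()
```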
Testing:
- I have been unable to repro the actual crash despite trying a large
variety of different things. However, with additional logging added,
it's clear that the MemPool is being used concurrently, which is
incorrect.
- Added an e2e test that covers the potential issue. It hits the
DFAKE_MUTEX with a sleep added to MemPool::Allocate.
- Ran a full exhaustive build in both DEBUG and RELEASE.
Change-Id: I751cad7e6b75c9d95d7ad029bbd1e52fe09e8a29
Reviewed-on: http://gerrit.cloudera.org:8080/11650
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds the command line flag enable_parquet_page_index_writing
to the Impala daemon, which toggles Impala's ability to write the
Parquet page index. By default the flag is false, i.e. Impala doesn't
write the page index.
This flag is only temporary; we plan to remove it once Impala is able to
read the page index and has better testing around it.
Because of this change I had to move test_parquet_page_index.py to the
custom_cluster test suite since I need to set this command line flag
in order to test the functionality. I also merged most of the test cases
because we don't want to restart the cluster too many times.
I removed 'num_data_pages_' from BaseColumnWriter since it was rather
confusing and didn't provide any measurable performance improvement.
This commit fixes the ASAN error produced by the first IMPALA-7644
commit which was reverted later.
Change-Id: Ib4a9098a2085a385351477c715ae245d83bf1c72
Reviewed-on: http://gerrit.cloudera.org:8080/11694
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python DB-API,
and does not require us to implement yet another Thrift client.
This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.
I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
result sets with NULLs was limited, so I added a couple of queries.
Places where results differ from Beeswax:
* Impyla is a Python DB-API client, so it must convert timestamps into
Python datetime objects, which only have microsecond precision.
Therefore result timestamps with nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
VARCHAR.
I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.
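The timestamp truncation can be seen directly in Python: datetime stores at most microseconds, so any DB-API client has to drop the nanosecond digits. The values below are illustrative, not Impyla's actual conversion code:

```python
from datetime import datetime

# Sub-second part of a hypothetical server-side timestamp, in nanoseconds.
nanos = 123456789

# datetime can only hold microseconds, so the last three digits are lost.
dt = datetime(2018, 1, 1, 0, 0, 0, nanos // 1000)
```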
Limitations:
* Not all of the same methods are implemented as for beeswax, so some
tests that have more complicated interactions with the client will not
work with HS2 yet.
* We don't have a way to get the affected row count for inserts.
I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.
Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.
Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds the command line flag enable_parquet_page_index_writing
to the Impala daemon, which toggles Impala's ability to write the
Parquet page index. By default the flag is false, i.e. Impala doesn't
write the page index.
This flag is only temporary; we plan to remove it once Impala is able to
read the page index and has better testing around it.
Because of this change I had to move test_parquet_page_index.py to the
custom_cluster test suite since I need to set this command line flag
in order to test the functionality. I also merged most of the test cases
because we don't want to restart the cluster too many times.
Change-Id: If9994882aa59cbaf3ae464100caa8211598287bc
Reviewed-on: http://gerrit.cloudera.org:8080/11563
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the broken SHOW GRANT USER ON <object> statement,
which always showed an empty result due to an incorrect comparison
between the filter's TPrivilege and the actual privilege's TPrivilege;
the comparison should not consider the "grantoption" field.
Testing:
- Added new E2E tests
- Ran all FE tests
- Ran all authorization E2E tests
Change-Id: I7adc403caddd18e5a954cf6affd5d1d555b9f5f0
Reviewed-on: http://gerrit.cloudera.org:8080/11598
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
per host memory limit for a query
With this patch the per host memory limit of a query is automatically
set using the mem_limit set in the query options and the mem_estimate
calculated by the planner based on the following pseudo code:
if mem_limit is set in query options:
    use that, and if 'clamp-mem-limit-query-option' is true:
        enforce the min/max query mem limits defined in the pool config.
else:
    mem_limit = max(mem_estimate,
        min_mem_limit_required_to_accommodate_largest_initial_reservation)
    finally, enforce the min/max query mem limits defined in the pool
    config on this value.
This calculated mem limit will also be used for admission accounting
and consequently for admission control. Moreover, three new pool
configuration options have been added to enable this behaviour:
"min-query-mem-limit" & "max-query-mem-limit" => help
clamp the per host memory limit for a query. If both these limits
are not configured, then the estimates from planning are not used
as a memory limit and only used for making admission decisions.
Moreover the estimates will no longer have a lower bound based
on the largest initial reservation.
"clamp-mem-limit-query-option" => if false, the mem_limit defined in
the query options is used directly and the max/min query mem limits
are not enforced on it.
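The pseudo code above can be restated as a pure function. The names below are hypothetical; the real logic lives in the admission controller:

```python
def effective_mem_limit(mem_limit_option, mem_estimate,
                        largest_initial_reservation,
                        min_query_mem_limit, max_query_mem_limit,
                        clamp_mem_limit_query_option):
    # Sketch of the per-host mem limit computation described above.
    def clamp(v):
        # Enforce the pool's min/max query mem limits, when configured.
        if min_query_mem_limit is not None:
            v = max(v, min_query_mem_limit)
        if max_query_mem_limit is not None:
            v = min(v, max_query_mem_limit)
        return v

    if mem_limit_option is not None:
        # A mem_limit from query options wins; clamping it is optional.
        return clamp(mem_limit_option) if clamp_mem_limit_query_option \
            else mem_limit_option
    # Otherwise use the planner estimate, bounded below by the minimum
    # needed for the largest initial reservation, then clamped.
    return clamp(max(mem_estimate, largest_initial_reservation))
```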
Testing:
Added e2e test cases.
Added frontend tests for changes to RequestPoolService.
Successfully passed exhaustive tests.
Change-Id: Ifec00141651982f5975803c2165b7d7a10ebeaa6
Reviewed-on: http://gerrit.cloudera.org:8080/11157
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds test coverage for some cases where queries are currently
expected to fail with out-of-memory:
* memory limit exceeded in exchange node
* aggregation with large var-len intermediate values
* top N with large limit
* hash join with many duplicates on right side
* analytic with a large window that needs to be buffered in-memory
Note that it's not always totally deterministic where the query hits
'memory limit exceeded' so we don't include the node ID or name in the
expected error message.
Testing:
* ran exhaustive tests
* looped modified tests locally overnight
Change-Id: Icd1a7eb97837b742a967c260cafb5a7f4f45412e
Reviewed-on: http://gerrit.cloudera.org:8080/11564
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this fix Impala did not check whether a timestamp's time part
is out of the valid [0, 24 hour) range when reading Parquet files,
so these timestamps were memcopied as-is into slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00
Different parts of Impala treat these timestamps differently:
- string conversion leads to invalid representation that cannot be
converted back to timestamp
- timezone conversions handle the overflowing time part and give
a valid timestamp result (at least since CCTZ, I did not check
older versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
resulting Parquet file will also contain corrupt timestamps
The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. A new error code is added for this case. If both the date
and the time part are corrupt, then the error about the corrupt time
is returned.
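The range check can be sketched as follows, assuming the time part is stored as nanoseconds since midnight (illustrative, not the actual backend code):

```python
NANOS_PER_DAY = 24 * 60 * 60 * 10**9

def time_part_is_valid(nanos):
    # A timestamp's time part must lie in [0, 24 hours); anything else
    # is treated as corrupt and the whole value becomes NULL.
    return 0 <= nanos < NANOS_PER_DAY
```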
Testing:
- added a new scanner test that reads a corrupted Parquet file
with edge values
Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Reviewed-on: http://gerrit.cloudera.org:8080/11521
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements the same function as Hive UDF get_json_object.
We reuse RapidJson to parse the json string. In order to track the
memory used in RapidJson, we wrap FunctionContext into an allocator.
get_json_object accepts two parameters: a json string and a selector
(json path). We parse the json string into a Document tree and then
perform BFS according to the selector. For example, to process
get_json_object('[{\"a\":1}, {\"a\":2}, {\"a\":3}]', '$[*].a'),
we first perform '$[*]' to extract all the items in the root array.
Then we get a queue consisting of {a:1},{a:2},{a:3} and perform the '.a'
selector on all values in the queue. The final results in the queue are
1, 2, 3. As there are multiple results, they are encapsulated into
an array, so the output is the string '[1,2,3]'.
More examples can be found in expr-test.cc.
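A minimal Python sketch of the BFS evaluation described above, supporting only the '$', '.key' and '[*]' path steps. The real implementation is C++ using RapidJSON with a FunctionContext-backed allocator, and handles the full selector syntax:

```python
import json
import re
from collections import deque

def get_json_object(json_str, selector):
    # Simplified sketch: tokenize the path into '.key' and '[*]' steps.
    steps = re.findall(r"\.(\w+)|\[(\*)\]", selector)
    queue = deque([json.loads(json_str)])
    for key, star in steps:          # BFS: apply each path step to the queue
        nxt = deque()
        for value in queue:
            if star and isinstance(value, list):
                nxt.extend(value)    # '[*]' expands every array element
            elif key and isinstance(value, dict) and key in value:
                nxt.append(value[key])
        queue = nxt
    if not queue:
        return None
    results = list(queue)
    # Multiple matches are wrapped in a JSON array; one match is returned bare.
    out = results if len(results) > 1 else results[0]
    return json.dumps(out, separators=(",", ":"))
```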
Test:
* Add unit tests in expr-test
* Add e2e tests in exprs.test
* Add tests in test_alloc_fail.py to check handling of out of memory
Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Reviewed-on: http://gerrit.cloudera.org:8080/10950
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the SHOW GRANT USER statement to show all privileges
granted to a user, either directly via object ownership, or granted
through a role via a group the user belongs to. The output for SHOW
GRANT USER will have two additional columns for privilege name and
privilege type so the user can know where the privilege comes from.
Truncated sample showing two columns that are different from role:
+----------------+----------------+--------+----------+-...
| principal_type | principal_name | scope | database | ...
+----------------+----------------+--------+----------+-...
| USER | foo | table | foo_db | ...
| ROLE | foo_role | server | | ...
+----------------+----------------+--------+----------+-...
Testing:
- Create new custom cluster test with custom group mapping.
- Ran FE and custom cluster tests.
Change-Id: Ie9f6c88f5569e1c414ceb8a86e7b013eaa3ecde1
Reviewed-on: http://gerrit.cloudera.org:8080/11531
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently in DECIMAL V2 mode, typeof(9.9 % 3) is DECIMAL(2,1) and typeof(mod(9.9, 3)) is
DECIMAL(4,1), while both are expected to be DECIMAL(2,1). This JIRA fixes V2 mode by
replacing "mod" with "%" at the parser stage so that they share the same code path afterwards.
Testing:
Added unit tests and tested on a real cluster.
Change-Id: Ib0067da04083859ffbf662a29007154461bea2ba
Reviewed-on: http://gerrit.cloudera.org:8080/11443
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The estimate for memory consumption for this scan is 9 columns * 384KB
per column = 3.375MB. So if we set the mem_limit to 6.5MB, we should
still not get more than one scanner thread, but we can avoid hitting
out-of-memory.
The issue in the JIRA was queued row batches. With this change, and
num_scanner_threads=2, there should be at most 12 row batches
(10 in the queue, 2 in the scanner threads about to be enqueued),
and based on the column stats I'd estimate that each row batch is
around 200KB, so this change should provide significantly more headroom.
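The arithmetic behind the headroom claim, restated as a quick check (the 200KB batch size is the estimate from the text, not a measured value):

```python
KB, MB = 1024, 1024 * 1024

scan_estimate = 9 * 384 * KB       # planner estimate: 9 columns * 384KB each
mem_limit = 6.5 * MB               # proposed mem_limit for the test
max_row_batches = 10 + 2           # queue capacity plus one per scanner thread
queued_bytes = max_row_batches * 200 * KB  # worst-case queued batches, ~2.34MB
headroom = mem_limit - scan_estimate       # ~3.125MB left over the estimate
```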
Change-Id: I6d992cc076bc8678089f765bdffe92e877e9d229
Reviewed-on: http://gerrit.cloudera.org:8080/11513
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>