Fill the LogicalType field in Parquet schemas for columns
that have an associated logical type. ConvertedType still
has to be filled to remain compatible with older readers.
Testing:
- added new tests to check both logical and converted types
to test_insert_parquet.py
Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91
Reviewed-on: http://gerrit.cloudera.org:8080/12004
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is a fix for the following issue:
1. Some BE tests (e.g. ExprTest.TimestampFunctions) use the system's
local timezone but run against a test timezone db (instead of the
system's timezone db).
2. On some Linux installations /usr/share/zoneinfo contains symlinks
to files in the /usr/share/zoneinfo/SystemV directory
(e.g. /usr/share/zoneinfo/America/Los_Angeles is a symlink to
../SystemV/PST8PDT).
3. The 'SystemV' directory is not part of the test timezone db, since
it is obsolete and excluded by default.
Consequently, if the system's local timezone is set to
America/Los_Angeles, BE tests won't find the corresponding timezone
file in the test timezone db. BE tests will default to UTC, which will
break some of them.
This change sets local timezone explicitly for failing BE tests, so
they don't depend on the system's local timezone.
It also adds 'SystemV' directory to the test timezone db to avoid
similar issues in the future.
Change-Id: I9288cd24c8af0c059e55d47c86bd92eaf0075681
Reviewed-on: http://gerrit.cloudera.org:8080/12199
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_min_max_filters and test_decimal_min_max_filters record the aggregated probe rows to
check whether min-max filter was exercised. In the case of ASAN builds, the probe side
started processing before the filters reached the probe side, because ASAN builds are a
little slower. The resolution is to increase RUNTIME_FILTER_WAIT_TIME_MS to accommodate ASAN.
This issue was also seen earlier in a runtime filter test and fixed through IMPALA-6201. This
fix mimics the same, by setting RUNTIME_FILTER_WAIT_TIME_MS to $RUNTIME_FILTER_WAIT_TIME_MS.
Change-Id: I111ed15947bd2812753ae68d3bbb8a9871e25b08
Reviewed-on: http://gerrit.cloudera.org:8080/12224
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Cardinality is vital to understanding why a plan has the form it does,
yet the planner normally emits cardinality information only for the
detailed levels. Unfortunately, most query profiles we see are at the
standard level without this information (except in the summary table),
making it hard to understand what happened.
This patch adds cardinality to the standard EXPLAIN output. It also
changes the displayed cardinality value to be in abbreviated "metric"
form: 1.23K instead of 1234, etc.
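A minimal sketch of the abbreviated form, in Python for illustration only. The function name and the suffix set (K, M, B) are assumptions; only the "1.23K instead of 1234" example comes from this change description.

```python
def abbreviate_cardinality(value: int) -> str:
    """Render a non-negative row count in abbreviated metric form."""
    suffixes = ["", "K", "M", "B"]  # assumed suffix set, not taken from the patch
    scaled = float(value)
    idx = 0
    # Divide by 1000 until the value fits the current suffix.
    while scaled >= 1000 and idx < len(suffixes) - 1:
        scaled /= 1000
        idx += 1
    if idx == 0:
        return str(value)  # small values are printed unchanged
    return f"{scaled:.2f}{suffixes[idx]}"


print(abbreviate_cardinality(1234))
```

Values just below a suffix boundary can round up to "1000.00K" in this sketch; a production formatter would need to handle that case.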
Changing the EXPLAIN output has a huge impact on PlannerTest: all the
"golden" test files must change. To avoid doing this twice, this patch
also includes:
IMPALA-7919: Add predicates line in plan output for partition key
predicates
This is also the time to include:
IMPALA-8022: Add cardinality checks to PlannerTest
The comparison code was changed to allow a set of validators, one of
which compares cardinality to ensure it is within 5% of the expected
value. This should ensure we don't change estimates unintentionally.
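The 5% check could look roughly like the following Python sketch. The function name, the relative-to-expected interpretation of "within 5%", and the zero handling are assumptions, not details taken from the patch.

```python
def cardinality_within_tolerance(expected: int, actual: int,
                                 tolerance: float = 0.05) -> bool:
    """Return True if 'actual' is within 'tolerance' of 'expected'."""
    if expected == 0:
        # Assumption: an expected cardinality of 0 must match exactly.
        return actual == 0
    return abs(actual - expected) <= tolerance * expected


print(cardinality_within_tolerance(1000, 1049))
print(cardinality_within_tolerance(1000, 1051))
```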
While many planner tests are concerned with cardinality, many others are
not. Testing showed that the cardinality is actually unstable within
tests. For such tests, added filters to ignore cardinality. The filter
is enabled by default (for backward compatibility) but disabled (to
allow cardinality verification) for the critical tests.
Rebasing the tests was complicated by a bug in the error-matching code,
so this patch also fixes:
IMPALA-8023: Fix PlannerTest to handle error lines consistently
Now, the error output written to the output "save results" file matches
that expected in the "golden" file -- no more handling these specially.
Testing:
* Added cardinality verification.
* Reran all FE tests.
* Rebased all PlannerTest .test files.
* Adjusted the metadata/test_explain.py test to handle the changed
EXPLAIN output.
Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Reviewed-on: http://gerrit.cloudera.org:8080/12136
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The code mimics the code written for other min-max filters. Decimal data
can be stored using 4 bytes, 8 bytes and 16 bytes. The code respectively
handles these 3 storage configurations. The column definition states the
precision and the precision determines the storage size.
The minimum and maximum values are stored in a union. The precision from
the column will come in as an input. Based on the precision the size will be
found, and depending on the size appropriate variable will be used.
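As an illustration, the precision-to-size mapping and a min-max filter over unscaled decimal values can be sketched in Python. The thresholds match the column list in this message (precision up to 9 in 4 bytes, up to 18 in 8 bytes, up to 38 in 16 bytes). The class and method names are hypothetical, and the real code uses a union plus macros rather than Python integers.

```python
from dataclasses import dataclass
from typing import Optional


def decimal_byte_size(precision: int) -> int:
    """Storage size in bytes for a decimal of the given precision."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be in [1, 38]")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16


@dataclass
class DecimalMinMaxFilter:
    """Hypothetical filter tracking the min and max unscaled values seen."""
    precision: int
    min_val: Optional[int] = None
    max_val: Optional[int] = None

    def insert(self, unscaled: int) -> None:
        self.min_val = unscaled if self.min_val is None else min(self.min_val, unscaled)
        self.max_val = unscaled if self.max_val is None else max(self.max_val, unscaled)

    def may_contain(self, unscaled: int) -> bool:
        # Assumption: an empty filter rejects every probe value.
        if self.min_val is None or self.max_val is None:
            return False
        return self.min_val <= unscaled <= self.max_val


f = DecimalMinMaxFilter(precision=9)
for v in (5, 12, -3):
    f.insert(v)
print(decimal_byte_size(f.precision), f.may_contain(0), f.may_contain(13))
```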
The code in min-max-filter* follows the general convention of the file, hence
uses macros.
The test includes 24 decimal columns (as listed below) with the following joins:
1. Inner Join with broadcast (2 tables)
1a. 1 predicate
1b. 4 predicates - all results in decimal min-max filter
1c. 4 predicates - 3 results in decimal min-max filter; 1 doesn't
2. Inner Join with Shuffle (3 tables)
3. Right outer join (2 tables)
4. Left Semi join (2 tables)
5. Right Semi join (2 tables)
Decimal Columns:
4 bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)
The test aggregates the count of probe rows. This shows that the min-max filter
is exercised, because the number of probe rows is less than the total number
of rows in the probe side table. The count of probe rows is considered to be
deterministic. But, it will be beneficial to look out for changes in Kudu that can
change the way data is partitioned. Such a change could change the probe row count
and in that case, the test will have to be updated.
impala_test_suite.py and test_result_verifier.py are enhanced to support saving
of aggregation using update_results.
Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
GC is performed when:
* The amount of memory allocated from the system for the buffer pool
exceeds the reservation (i.e. free buffers and clean pages are not
offset by unused reservation).
* The soft or hard process memory limit would otherwise cause an
allocation to fail.
Testing:
Looped the old version of the semi_joins_exhaustive test, which
reliably reproduced the issue. I confirmed that the buffer pool GC was
running and that it prevented the query failures.
Added a backend test that reproed the issue. A large chunk of the code
change is to add infrastructure to use TCMalloc memory metrics
for the process memory tracker in backend tests.
Ran exhaustive tests.
Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21
Reviewed-on: http://gerrit.cloudera.org:8080/12133
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The ClassLoader whence a UDF was loaded needs to be kept open for
executions of the UDF, so that the UDF can load other classes from the
same jar. (A typical scenario might be a utility class.) This was
broken by the fix to IMPALA-7668.
This commit moves closing the ClassLoader to the close() function.
A test for a UDF that imports a static method from another file has been
added. Doing so failed without this change.
Change-Id: Ic02e42fb25a2754ede21fe00312a60f07e0ba8a2
Reviewed-on: http://gerrit.cloudera.org:8080/12125
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Kudu table sink allocates untracked memory which is bounded by
limits that Impala enforces through the Kudu client API. This patch
adds a constant estimate to this table sink which is based on those
limits.
Testing:
Modified planner tests accordingly.
Change-Id: I89a45dce0cfbbe3cc0bc17d55ffdbd41cd7dbfbd
Reviewed-on: http://gerrit.cloudera.org:8080/12077
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Each AST statement node has a toSql() implementation. The code for
ScalarFunction and ToSqlUtils has a number of issues:
* If Location or Symbol are not set, they are shown as 'null'. Better to
omit these clauses if the items are not available. This is mostly an
issue during testing.
* The generated SQL does not follow the CREATE FUNCTION syntax. For
example, the signature and return value are provided for Java
functions, but should not be.
* Unlike other statements, this one is generated with a trailing newline.
* ToSqlUtils.getCreateFunctionSql() fails to separate functions with the
semi-colon statement separator.
These are all minor issues, but we might as well fix the code to work as
intended.
Testing:
* Added new unit tests to verify the behavior (no tests existed
previously).
* Re-ran all FE tests.
Change-Id: Id34d6df97760a11c299092dff8edbdb7033bce1c
Reviewed-on: http://gerrit.cloudera.org:8080/12014
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
PARQUET-1387 added int64 timestamps with nanosecond precision that
store timestamps as nanoseconds since the Unix epoch.
As 64 bits are not enough to represent the whole 1400..9999 range
of Impala timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This may mean that this has the potential to be
the fastest way to store timestamps in Impala + Parquet.
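The range quoted above follows directly from the int64 limits. The Python sketch below (helper name is hypothetical) converts the smallest and largest 64-bit values, interpreted as nanoseconds since the Unix epoch, into UTC timestamps using integer arithmetic so that no nanosecond digits are lost.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_nanos(nanos: int) -> str:
    """Format nanoseconds since the Unix epoch as a UTC timestamp string."""
    # Floor division keeps the fractional part in [0, 1e9) for negative input.
    seconds, frac = divmod(nanos, 1_000_000_000)
    ts = EPOCH + timedelta(seconds=seconds)
    return (f"{ts.year:04d}-{ts.month:02d}-{ts.day:02d} "
            f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}.{frac:09d}")


print(format_nanos(INT64_MIN))
print(format_nanos(INT64_MAX))
```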
Another way NANO differs from MICRO and MILLI is that NANO can
be only described with new logical types in Parquet, it has no
converted type equivalent. This made implementing CREATE TABLE
LIKE PARQUET less trivial than it was for MICRO/MILLI: the type
conversion logic in ParquetHelper.java had to be rewritten to
use LogicalTypeAnnotation instead of ConvertedType.
The changes on Java side also made bumping CDH_BUILD_NUMBER
necessary.
Testing:
- added a new testfile with int64 nano timestamps
- ran core tests
Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Reviewed-on: http://gerrit.cloudera.org:8080/11984
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.
This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).
Tests:
=====
- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
estimation.
Perf note:
=========
I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.
Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Reviewed-on: http://gerrit.cloudera.org:8080/11565
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When bin/clean.sh runs, it removes some configuration
files in fe/src/test/resources that are required for
the minicluster to function.
This changes testdata/bin/run-all.sh to detect when
these configurations are missing and regenerate them.
It detects issues due to running clean.sh, which is
the most common case. It is not intended to fix
anything else.
Change-Id: I9f5955574d304b343e904851cfb9081648e350f8
Reviewed-on: http://gerrit.cloudera.org:8080/12006
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Expected results from spillable-buffer-sizing.test and max-row-size.test
included incorrect expressions: the unanalyzed GROUP BY expression with
ordinals represented as (invalid) casts.
SelectStmt is a bit of a mess. There are two copies of the grouping
expressions. Here we want to use the analyzed version with the ordinals
replaced.
Testing:
* Problem found when running PlannerTest. PlannerTest now passes,
with correct results, after this change.
* Turns out there is another path used when generating SQL for a view
which does toSql() on an unanalyzed query. Added unit tests for this
case.
Change-Id: I413bded920e27fe9f41f0ea989696a0c8f92fe4a
Reviewed-on: http://gerrit.cloudera.org:8080/11993
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Move parquet classes into exec/parquet.
Move CollectionColumnReader and ParquetLevelDecoder into separate files.
Remove unnecessary 'encoding_' field from ParquetLevelDecoder.
Switch BOOLEAN decoding to use composition instead of inheritance. This
lets the boolean decoding use the faster batched implementations in
ScalarColumnReader and avoids some confusing aspects of the class
hierarchy, like the ReadValueBatch() implementation on the base class
that was shared between BoolColumnReader and CollectionColumnReader.
Improve compile times by instantiating BitPacking templates in a
separate file (this looks to give a 30s+ speedup for
compiling parquet-column-readers.cc).
Testing:
Ran exhaustive tests.
Change-Id: I0efd5c50b781fe9e3c022b33c66c06cfb529c0b8
Reviewed-on: http://gerrit.cloudera.org:8080/11949
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-7367 reduced the memory requirement of the query tested in
test_exchange_mem_usage_scaling. This change reduces the mem limit
to ensure that the query runs out of memory as expected.
Testing:
Ran the test 100 times in a loop without any failures.
Change-Id: Ib2f063fb88ebf0c7f994b55ecfc860d81726fdd8
Reviewed-on: http://gerrit.cloudera.org:8080/11965
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As shown in IMPALA-7828, there is some non-determinism on whether the errors
detected in FragmentInstanceState::Close() will show up in the final profile
sent to the coordinator. The reason is that the current code marks a fragment
instance as "done" after ExecInternal() completes but before Close() is called.
There is a window between when the final status report is sent and when Close()
finishes.
This change fixes the problem by not sending the final report until Close()
is called. This has no implication on the first row available time for normal
queries. It may slightly lengthen the first row available time for DML queries.
Testing done: Updated udf-no-expr-rewrite.test to exercise this test
Perf run on an 8-node cluster didn't show any regression:
TPCH-300
+------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------+-----------------------+---------+------------+------------+----------------+
| TPCH(_300) | parquet / none / none | 23.94 | -2.05% | 12.55 | -2.62% |
+------------+-----------------------+---------+------------+------------+----------------+
Small concurrency
+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 6.89 | -0.66% | 6.62 | +0.41% |
+-------------------------+-----------------------+---------+------------+------------+----------------+
Medium concurrency
+-------------------------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------------+-----------------------+---------+------------+------------+----------------+
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 55.57 | -1.04% | 55.27 | -0.98% |
+-------------------------+-----------------------+---------+------------+------------+----------------+
Change-Id: I61618854ae3f4e7ef20028dcb0ff5cbcfa8adb01
Reviewed-on: http://gerrit.cloudera.org:8080/11939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Dataload has a step of "Loading Hive builtins" that
loads a bunch of jars into HDFS/S3/etc. Despite
its name, nothing seems to be using these.
Dataload and core tests succeed without this step.
This removes the Hive builtins step and associated
scripts.
Change-Id: Iaca5ffdaca4b5506e9401b17a7806d37fd7b1844
Reviewed-on: http://gerrit.cloudera.org:8080/11944
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change packs StringValue and CollectionValue slots to ensure
they now occupy 12 bytes instead of 16 bytes. This reduces the
memory requirements and improves the performance. Since Kudu
tuples are populated using a memcpy, 4 bytes of padding was
added to StringSlots in Kudu tables.
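The 16-to-12-byte difference can be reproduced with Python's struct module. The layout used here (an 8-byte pointer followed by a 4-byte length) is an assumption made for illustration and is not quoted from the patch.

```python
import struct

# Natively aligned layout: 8-byte pointer + 4-byte length. The trailing "0Q"
# pads the end of the struct to 8-byte alignment, as a C compiler would.
padded_size = struct.calcsize("@Qi0Q")

# Packed layout: "=" selects standard sizes with no alignment padding.
packed_size = struct.calcsize("=Qi")

print(padded_size, packed_size)
```

On a typical 64-bit platform this prints 16 and 12, so packing saves 4 bytes per slot.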
Testing:
Ran core tests.
Added static asserts to ensure the value sizes are as expected.
Performance tests on TPCH-40 produced 3.96% improvement.
Change-Id: I32f3b06622c087e4aa288e8db1bf4581b10d386a
Reviewed-on: http://gerrit.cloudera.org:8080/11599
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The idea is to optimise the common case where there are long runs of
NULL or non-NULL values (i.e. the def level is repeated). We can
detect this cheaply by keying the decoding loop in the column reader
off the state of the def level RLE decoder - if there's a long run
of repeated levels, we can skip checking the def level for every
value. We still fall back to decoding, caching and reading
value-by-value a batch of def levels whenever the next def level is not
in a repeated run. We still use the old approach for decoding rep
levels. There might be some benefit to using the same approach for rep
levels *if* repeated def and rep level runs line up.
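A simplified Python sketch of the idea: repeated runs of def levels are handled in bulk, and only literal (non-repeated) runs are inspected value by value. The run classes and the function name are hypothetical, and the real column reader works on encoded buffers rather than Python lists.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class RepeatedRun:
    level: int  # the def level repeated for the whole run
    count: int  # number of values covered by the run


@dataclass
class LiteralRun:
    levels: List[int]  # def levels that must be checked one by one


def materialize(runs: List[Union[RepeatedRun, LiteralRun]],
                values: List[int], max_def_level: int) -> List[Optional[int]]:
    out: List[Optional[int]] = []
    pos = 0  # index of the next non-NULL value to consume
    for run in runs:
        if isinstance(run, RepeatedRun):
            if run.level == max_def_level:
                # Fast path: the whole run is non-NULL, copy it in one step.
                out.extend(values[pos:pos + run.count])
                pos += run.count
            else:
                # Fast path: the whole run is NULL.
                out.extend([None] * run.count)
        else:
            # Slow path: check the def level of every value.
            for level in run.levels:
                if level == max_def_level:
                    out.append(values[pos])
                    pos += 1
                else:
                    out.append(None)
    return out


runs = [RepeatedRun(1, 3), LiteralRun([0, 1, 0]), RepeatedRun(0, 2)]
print(materialize(runs, [10, 20, 30, 40], max_def_level=1))
```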
These changes should unlock further optimizations because more time is
spent in simple kernel functions, e.g. UnpackAndDecode32Values() for
dictionary decompression, which is very optimisable using SIMD etc.
Snappy decompression now seems to be the main CPU bottleneck for
decoding snappy-compressed Parquet.
Perf:
Running TPC-H scale factor 60 on uncompressed and snappy parquet
both showed a ~4% speedup overall.
Microbenchmarks show that scans doing only dictionary decoding on
uncompressed Parquet are ~75% faster:
set mt_dop=1;
select min(l_returnflag) from lineitem;
Testing:
We have alltypesagg with a mix of null and non-null values.
Many tables have long runs of non-null values.
Added new test data and coverage:
* a test table manynulls with long runs of null values.
* a large CHAR test table
* missing coverage for materialising pos slot in flattened nested types
scan.
* Extended dict test to test longer runs.
* A larger version of complextypestbl with interesting collection
shapes - NULL collections, empty collections, etc, particularly runs
of collections with the same shape.
* Test interaction of timestamp validation with conversion
* Ran code coverage build to confirm all code paths are tested
* ASAN and exhaustive runs.
Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Reviewed-on: http://gerrit.cloudera.org:8080/8319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If explain_level is at 'extended' level or higher, then enhance the
output from the explain command. (1) Show the analyzed sql in the
explain header, this is the rewritten sql, which includes implicit
casts, and literals are printed with a cast so that their type is
visible. (2) When predicates are shown in the plan these are shown in
the same format.
The toSql() method can be called on a ParseNode tree to return
the sql corresponding to the tree. In the past toSql() has been
enhanced to print rewritten sql by partially overloading toSql() [with
toSql(boolean)]. This current change requires changing toSql() in
many places as NumericLiteral can appear at different points in a
parse tree. To avoid many new fragile overloads of toSql() I added
toSql(ToSqlOptions), where ToSqlOptions is an enum which controls the
form of the Sql that is returned. This changes many files but is safer
and means that any future options to toSql() can be added painlessly.
If SHOW_IMPLICIT_CASTS is passed to toSql() then
- in CastExpr print the implicit cast
- in NumericLiteral print the literal with a cast to show the type
Add a PlannerTestOption directive that will force the query text showing
implicit casts to be included in the PLAN section of a .test file.
The analyzed query text is wrapped at 80 characters. Note that the
analyzed query cannot always be executed as queries rewritten to use
LEFT SEMI JOIN are not legal sql. In addition some space characters may
be removed from the query for prettier display.
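For illustration, wrapping at 80 characters can be sketched with Python's textwrap module. This is not the wrapping code from the patch; the function name and the choice to include the "Analyzed query: " prefix in the wrapped width are assumptions.

```python
import textwrap


def wrap_analyzed_query(sql: str, width: int = 80) -> str:
    """Wrap the analyzed query text so that no line exceeds 'width'."""
    return "\n".join(textwrap.wrap("Analyzed query: " + sql, width=width))


sql = ("SELECT * FROM functional_kudu.alltypestiny WHERE "
       "CAST(bigint_col AS DOUBLE) < CAST(10 AS DOUBLE)")
print(wrap_analyzed_query(sql))
```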
Documentation of this change will be done as IMPALA-7718
EXAMPLE OUTPUT:
[localhost:21000] default> set explain_level=2;
EXPLAIN_LEVEL set to 2
[localhost:21000] default> explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100;
Query: explain select * from functional_kudu.alltypestiny where bigint_col < 1000 / 100
Max Per-Host Resource Reservation: Memory=0B Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_kudu.alltypestiny WHERE CAST(bigint_col
AS DOUBLE) < CAST(10 AS DOUBLE)
""
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=4.88MB mem-reservation=0B thread-reservation=2
PLAN-ROOT SINK
| mem-estimate=0B mem-reservation=0B thread-reservation=0
|
00:SCAN KUDU [functional_kudu.alltypestiny]
predicates: CAST(bigint_col AS DOUBLE) < CAST(10 AS DOUBLE)
mem-estimate=4.88MB mem-reservation=0B thread-reservation=1
tuple-ids=0 row-size=97B cardinality=1
in pipelines: 00(GETNEXT)
Fetched 16 row(s) in 0.03s
TESTING:
All end-to-end tests pass.
Added a new test in ExprRewriterTest which prints sql with implicit casts
for some interesting queries.
Add a unit test for the code which wraps text at 60 characters.
The output of some Planner Tests in .test files has been updated to
include the Analyzed sql that is printed when explain_level is
at least at 'extended' level.
Change-Id: I55c3bdacc295137f66b2316a912fc347da30d6b0
Reviewed-on: http://gerrit.cloudera.org:8080/11719
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
test_resource_limits was failing in release build because the queries
used were finishing earlier than expected. This resulted in fragment
instances not being able to send enough updates to the coordinator in
order to hit the limits used for the tests. This patch adds a
deterministic sleep to the queries which gives enough time to the
coordinator to catch up on reports.
Testing:
Checked that tests passed on release builds.
Change-Id: I4a47391e52f3974db554dfc0d38139d3ee18a1b4
Reviewed-on: http://gerrit.cloudera.org:8080/11933
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Changes:
- parquet.thrift is updated to a newer version which contains the
timestamp logical type.
- INT64 columns with converted types TIMESTAMP_MILLIS and
TIMESTAMP_MICROS can be read as TIMESTAMP.
- If the logical type is timestamp, then the type will contain the
information whether the UTC->local conversion is necessary. This
feature is only supported for the new timestamp types, so INT96
timestamps must still use flag
convert_legacy_hive_parquet_utc_timestamps.
- Min/max stat filtering is enabled again for columns that need
UTC->local conversion. This was disabled in IMPALA-7559 because
it could incorrectly drop column chunks.
- CREATE TABLE LIKE PARQUET converts these columns to
TIMESTAMP - before the change, an error was returned instead.
- Bulk of the Parquet column stat logic was moved to a new class
called "ColumnStatsReader".
Testing:
- Added unit tests for timezone conversion (this needed a new public
function in timezone_db.h and adding CET to tzdb_tiny).
- Added parquet files (created with parquet-mr) with int64 timestamp
columns.
Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Reviewed-on: http://gerrit.cloudera.org:8080/11057
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It turns out that Impala has a somewhat baroque way to represent the
value of a numeric 0. NumericLiteral.toSql() uses the Java
BigDecimal class to convert a numeric value to a string for use in
explained plans and in verifying expression rewrites.
The default Java behavior is to consider scale when rendering numbers,
including 0. Thus, depending on precision and scale, you may get:
0
0.0
0.00
0.000
...
0E-38
However, mathematically, zero is zero. Plans attach no special meaning
to the extra decimal points or trailing zeros.
To make testing easier, changed the behavior to always emit "0" when the
value is zero, regardless of precision or scale.
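The behavior can be illustrated with Python's decimal.Decimal, which renders scaled zeros much like the BigDecimal output listed above. This is an analogy, not the Java code from the patch, and the function name is hypothetical.

```python
from decimal import Decimal


def literal_to_sql(value: Decimal) -> str:
    """Render a numeric literal, always emitting "0" for zero."""
    if value == 0:
        return "0"  # ignore scale: 0.00 and 0E-38 both print as 0
    return str(value)


for text in ("0", "0.00", "0E-38", "1.50"):
    print(str(Decimal(text)), "->", literal_to_sql(Decimal(text)))
```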
Testing: Reran the planner tests and modified captured plans that had
the 0.0, 0.00 variations of zero.
Since this change affects only EXPLAIN output, it cannot affect the
operation of queries. It may impact other tests that compare EXPLAIN
output to a "golden" copy.
Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1
Reviewed-on: http://gerrit.cloudera.org:8080/11878
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-110 added support for queries with multiple DISTINCT aggregates
in a single select list. This patch adds queries to test this
functionality to our targeted-perf workloads and fixes some incorrect
return types in another targeted-perf aggregation query.
It also adds some targeted queries to the stress test by extending the
regex for stress test files to accept files of the form
'tpch-stress-*' and to allow for multiple tests per file.
Testing:
- Added an e2e test that runs the stress test file.
Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9
Reviewed-on: http://gerrit.cloudera.org:8080/11805
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before IMPALA-4063, the error message detected during
FragmentInstanceState::Close() was always lost. After
IMPALA-4063, we may sometimes get the error message in
FragmentInstanceState::Close(). It's non-deterministic
as the fragment instance thread may race with the query
state thread which reports the final status. The UDF test
currently tries to handle this non-determinism by using
"row_regex:.*" in the ERRORS section but it doesn't
always seem to work.
This change works around the issue by commenting out the
ERRORS section in udf-no-expr-rewrite.test for now.
The actual fix will be done in IMPALA-7829.
Change-Id: I6a55d5ad1a5a7278e7390f60854a8df28c1b9f28
Reviewed-on: http://gerrit.cloudera.org:8080/11900
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala assumes that when Reset() is called on an ExecNode, all of the
memory returned from that node by GetNext() has been attached to the
output RowBatch. In a query with a LIMIT on the subplan, such that
some nodes don't reach 'eos', this may not be the case.
The solution is to have Reset() take a RowBatch that any such memory
can be attached to. I examined all ExecNodes for resources being
transferred on 'eos' and added transferring of those resources in
Reset().
Testing:
- Added e2e tests that repro the issue for hash and nested loop joins.
Change-Id: I3968a379fcbb5d30fcec304995d3e44933dbbc77
Reviewed-on: http://gerrit.cloudera.org:8080/11852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Previously, the AggregationNode calculated the estimated number
of rows based on input cardinality without accounting for the
division of input data across multiple fragment instances. This
bloated up the memory estimates for the node. After this change,
the AggregationNode accounts for the number of fragment instances
while estimating the number of rows per instance. A skew factor of
1.5 was added to account for data skew among multiple fragment
instances. This number was derived using empirical analysis of
real-world and benchmark (tpch, tpcds) queries.
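A rough Python sketch of the per-instance estimate. Only the skew factor of 1.5 and the division by the number of fragment instances come from the description above. The exact formula, the rounding, and the cap at the total input cardinality are assumptions.

```python
import math

SKEW_FACTOR = 1.5  # from the description above


def rows_per_instance(input_cardinality: int, num_instances: int) -> int:
    """Estimate the rows one fragment instance of the aggregation processes."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    skewed = math.ceil(input_cardinality / num_instances * SKEW_FACTOR)
    # Assumption: one instance never sees more rows than the whole input.
    return min(input_cardinality, skewed)


print(rows_per_instance(1000, 10))
print(rows_per_instance(1000, 1))
```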
Testing:
Tested queries with changed estimates to avoid cases of
significant underestimation of memory.
Ran front-end and end-to-end tests affected by this change.
Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Reviewed-on: http://gerrit.cloudera.org:8080/11854
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We need to carefully check that the intermediate value fits in an
int64_t and the final size fits in an int. If they don't we
raise an error and fail the query.
Testing:
Added a couple of backend tests to exercise the
overflow check code paths.
Change-Id: I872ce77bc2cb29116881c27ca2a5216f722cdb2a
Reviewed-on: http://gerrit.cloudera.org:8080/11889
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a new query option 'topn_bytes_limit' that places a limit on the
number of estimated bytes that a TopN operator can process. If the
Impala planner estimates that a TopN operator will process more bytes
than this limit, it will replace the TopN operator with a sort operator.
Since the TopN operator cannot spill to disk, it has to buffer everything
in memory. This can cause frequent OOM issues when running with a large
limit + offset. Switching to a sort operator allows Impala to spill to
disk. We prefer to use the TopN operator when possible as it has better
performance than the sort operator for 'order by limit [offset]' queries.
The default limit is set to 512MB and is based on micro-benchmarking the
topn vs. sort operator for various limits (see the JIRA for full details).
The default is set to an intentionally high value in order to avoid
performance regressions.
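The planner decision described above reduces to a single comparison, sketched here in Python. The function name and the returned labels are hypothetical, and how the planner derives the byte estimate is not shown.

```python
DEFAULT_TOPN_BYTES_LIMIT = 512 * 1024 * 1024  # 512MB default from the text


def choose_operator(estimated_bytes: int,
                    topn_bytes_limit: int = DEFAULT_TOPN_BYTES_LIMIT) -> str:
    """Pick the operator for an 'order by limit [offset]' query."""
    # TopN buffers everything in memory; sort can spill to disk.
    if estimated_bytes > topn_bytes_limit:
        return "SORT"
    return "TOP-N"


print(choose_operator(10 * 1024 * 1024))
print(choose_operator(1024 * 1024 * 1024))
```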
Testing:
* Added a new planner test to functional-planner/ to validate that
'topn_bytes_limit' properly switches between topn and sort operators.
Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Reviewed-on: http://gerrit.cloudera.org:8080/11698
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
The constraint imposed by IMPALA-1354 was artificial.
If there are constant "partition by" expressions, simply drop them;
they are no-ops.
Constant "order by" expressions can be ignored as well, though in effect
they should be accounted for as null expressions in the backend, so that
all rows are combined in the same window (i.e. no window breaks).
Change-Id: Idf129026c45120e9470df601268863634037908c
Reviewed-on: http://gerrit.cloudera.org:8080/11556
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Previously, each fragment instance executing on an executor
independently reported its status to the coordinator periodically.
This created a huge number of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.
This change fixes the problems above by making one thread per QueryState
(started by QueryExecMgr) responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, query fragment instances
no longer report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.
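The "first error wins" behaviour of the cumulative status can be sketched as below. The class and method names are hypothetical and only model the idea; they are not the QueryState interface.

```python
import threading


class QueryStatus:
    """Hypothetical stand-in for the cumulative per-query status."""

    def __init__(self):
        self._lock = threading.Lock()
        self._error = None
        self._profiles = {}

    def set_error(self, error):
        # Only the first reported error is kept; later ones are ignored.
        with self._lock:
            if self._error is None:
                self._error = error

    def update_profile(self, instance_id, profile):
        with self._lock:
            self._profiles[instance_id] = profile

    def build_report(self):
        # One report covering every fragment instance of the query,
        # intended to be sent by a single reporting thread.
        with self._lock:
            return {"error": self._error, "profiles": dict(self._profiles)}
```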
Testing done: exhaustive tests
This patch is based on a patch by Sailesh Mukil
Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fixes an arithmetic overflow in ExchangeNode::GetNextMerging. Prior to
this patch, the code read:
int rows_to_keep = num_rows_skipped_ - offset_;
where num_rows_skipped_ and offset_ were of type int64_t. The result was
cast to an int, which can overflow if the result does not fit in
32 bits. The value of rows_to_keep would be passed into
row-batch.h::CopyRows which would crash due to a DCHECK_LE error.
This crash arises when the value of the OFFSET is a large number. For
example, the query:
select int_col from functional.alltypes order by 1 limit 1 offset 9223372036854775800;
would crash the Impalad executor for this query.
The fix is to change rows_to_keep to an int64_t to avoid the overflow,
which prevents the DCHECK_LE from failing.
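The narrowing problem can be reproduced in a few lines. The sketch below simulates a two's-complement conversion from 64 bits to 32 bits in Python; the value 1024 for num_rows_skipped_ is an assumed batch size chosen for illustration, and to_int32 is a helper written for this example.

```python
def to_int32(value):
    """Simulate narrowing an int64_t to a 32-bit int (two's complement)."""
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    return low - 2**32 if low >= 2**31 else low


offset = 9223372036854775800      # OFFSET from the query above
num_rows_skipped = 1024           # assumed: rows skipped so far

exact = num_rows_skipped - offset  # what an int64_t would hold
narrowed = to_int32(exact)         # what the 32-bit int ended up holding

print(exact)     # large negative number: nothing should be kept
print(narrowed)  # small positive number: looks like rows to keep
```

With these inputs the 64-bit result is negative while the narrowed value is positive, which is the kind of inconsistent row count that a bounds check such as DCHECK_LE would reject.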
Change-Id: I8bb8064aae6ad25c8a19f6a8869086be7e70400a
Reviewed-on: http://gerrit.cloudera.org:8080/11844
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Similar to the treatment of NULLs, we want to consider NaN values
as equal when grouping.
- When detecting a NaN in a set of row values, the NaN value must
be converted to a canonical value - so that all NaN values have
the same bit-pattern for hashing purposes.
- When doing equality evaluation, floating point types must have
additional logic to consider NaN values as equal.
- Existing logic for handling NULLs in this way is appropriate for
triggering this behavior for NaN values.
- Relabel "force null equality" as "inclusive equality" to expand
the scope of the concept to a more generic form that includes NaN.
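The points above can be sketched as follows. This is an illustration in Python rather than the backend code; canonicalize, inclusive_equals and group_counts are hypothetical names, and the canonical bit pattern 0x7FF8000000000000 is an assumed choice.

```python
import math
import struct

# Assumed canonical NaN: a quiet NaN with an all-zero payload.
CANONICAL_NAN = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]


def canonicalize(value):
    """Replace any NaN by the canonical NaN so that hashing sees one bit pattern."""
    if isinstance(value, float) and math.isnan(value):
        return CANONICAL_NAN
    return value


def inclusive_equals(a, b):
    """Equality that treats NULL == NULL and NaN == NaN as true."""
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def group_counts(values):
    """Group values using inclusive equality; returns (representative, count) pairs."""
    groups = []
    for value in map(canonicalize, values):
        for i, (rep, count) in enumerate(groups):
            if inclusive_equals(rep, value):
                groups[i] = (rep, count + 1)
                break
        else:
            groups.append((value, 1))
    return groups
```

Under ordinary floating point comparison NaN != NaN, so without the extra branch in inclusive_equals every NaN row would form its own group.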
Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Reviewed-on: http://gerrit.cloudera.org:8080/11535
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans
There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.
This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.
In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().
Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.
Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The query text (for Q48) and substitution parameters didn't match the
TPC-DS standard for qualification queries. After fixing that, the
queries return the results expected by the TPC-DS standards.
Note that this may affect the performance of perf workloads running
tpcds-unmodified.
Change-Id: Ic7c737f68adf616738d6eb6e5a02593af25bcbaf
Reviewed-on: http://gerrit.cloudera.org:8080/11833
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The previous approach could lead to hangs or cryptic error messages
because it removed the ORC data type from a lookup table.
Instead check explicitly in the planner for ORC scans and throw a
more helpful error message.
Testing:
Added custom cluster test to exercise code and check error message.
Change-Id: I209e79b18745c48d0182800a916d6566083f4609
Reviewed-on: http://gerrit.cloudera.org:8080/11835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch updates the HMS and Sentry run scripts to start HMS and
Sentry in debug mode in the minicluster to make it easier to debug
issues related to HMS and Sentry.
HMS debug port: 30010
Sentry debug port: 30020
Testing:
- Connected the debugger to both HMS and Sentry.
Change-Id: I29b025cbde36ef398ea36fbe69eff26e27d93e48
Reviewed-on: http://gerrit.cloudera.org:8080/11826
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The .test file parser implemented an unconventional method for parsing
single-quoted strings in comma-separated value format. This didn't handle
trailing commas in the string correctly.
This commit switches to using a conventional method for parsing
comma-separated value format:
* Commas enclosed by single quotes are not treated as field separators
* Single quotes can be escaped within a string by doubling them.
I looked into using Python's csv module for this, but it wouldn't
work without further modifying the test file format because it
automatically discards the quotes during parsing, which are actually
semantically important in .test files. E.g. without the quotes we can't
distinguish between the literal string 'regex:...' and the regex
regex:....
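A minimal sketch of the two parsing rules, keeping the quotes in the output so that quoted literals stay distinguishable from unquoted values. The function name split_row is hypothetical, and leaving doubled quotes as-is in the result is an assumption; this is not the test framework's actual parser.

```python
def split_row(line):
    """Split a comma-separated row, honouring single-quoted strings.

    Commas inside single quotes do not separate fields, and a doubled
    single quote inside a quoted string is an escaped quote. Quotes are
    preserved in the returned fields.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "'":
            if in_quotes and i + 1 < len(line) and line[i + 1] == "'":
                # Escaped quote: keep both characters, stay inside the string.
                current.append("''")
                i += 2
                continue
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```

For example, split_row("'regex:.*',regex:.*") keeps the quotes on the first field only, so the literal string and the regex remain distinct.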
Testing:
Ran exhaustive tests and fixed .test files that required modifications.
Will rerun before merging.
Added a couple of tests to exercise edge cases in the test file parser.
Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9
Reviewed-on: http://gerrit.cloudera.org:8080/11800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Prior to this change, the AggregationNode's perInstanceCardinality
was influenced by the node's selectivity and limit. This was
incorrect because the hash table is constructed over the entire
input stream before any row batches are produced. This change
ensures that the input cardinality is used to determine the
perInstanceCardinality.
Testing:
Added a planner test which ensures that an AggregationNode with a
limit estimates memory based on the input cardinality.
Ran front-end and end-to-end tests affected by this change.
Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc
Reviewed-on: http://gerrit.cloudera.org:8080/11806
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is very similar to IMPALA-7335, except that it happens
when 'progress_' is incremented in the call chain
HdfsScanNode::ProcessSplit
-> HdfsScanNodeBase::CreateAndOpenScanner()
-> HdfsScanner::Close()
The fix required restructuring the code so that
SetDoneInternal() is called with the error *before*
HdfsScanner::Close(). This required a refactoring because
HdfsScanNodeBase doesn't actually know about SetDoneInternal().
My fix is to put the common logic between HdfsScanNode and
HdfsScanNodeMt into a helper in HdfsScanNodeBase, then in
HdfsScanNode, make sure to call SetDoneInternal() before
closing the scanner.
I also reworked HdfsScanNode::ProcessSplit() to handle error propagation
internally. I think the joint responsibility between ProcessSplit() and
its caller for handling errors made things harder than necessary.
Testing:
Added a debug action and test that reproduced the race before the fix.
Change-Id: I45a61210ca7d057b048c77d9f2f2695ec450f19b
Reviewed-on: http://gerrit.cloudera.org:8080/11596
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The current location resolves to /user/hive/warehouse/chars_formats_*.
Impala's test data actually lives at /test-warehouse/chars_formats_*.
Tested this by reloading data from scratch and running the core tests.
Change-Id: I781b484e7a15ccaa5de590563d68b3dca6a658e5
Reviewed-on: http://gerrit.cloudera.org:8080/11789
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A few unversioned artifacts crept in over time without corresponding
.gitignore entries. These are the updates based on the git status output
on my dev env.
Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a
Reviewed-on: http://gerrit.cloudera.org:8080/11787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After inserting all of its input into its Aggregators,
StreamingAggregationNode performs some cleanup, such as calling
InputDone() on each Aggregator.
Previously, StreamingAggregationNode only checked that all of the
child's batches had been fetched before doing this cleanup, which
caused problems if the final child batch wasn't processed fully in a
single GetNext() call. In this case, multiple calls to InputDone()
led to a DCHECK failure.
The solution is to only perform the cleanup once the final child batch
has been fully processed.
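The condition can be modelled with a small sketch. The classes below are hypothetical simplifications (one aggregator, rows as list items, a fixed number of rows consumed per call) and not the StreamingAggregationNode code.

```python
class Aggregator:
    """Hypothetical aggregator that must see input_done() exactly once."""

    def __init__(self):
        self.rows = []
        self.input_done_calls = 0

    def add_rows(self, rows):
        self.rows.extend(rows)

    def input_done(self):
        # Mimics the DCHECK: a second call is a bug.
        if self.input_done_calls > 0:
            raise AssertionError("input_done() called more than once")
        self.input_done_calls += 1


class StreamingNode:
    def __init__(self, child_batches, aggregator, rows_per_call):
        self._batches = list(child_batches)
        self._next = 0
        self._pending = []          # unprocessed rows of the current batch
        self._child_eos = len(self._batches) == 0
        self._agg = aggregator
        self._rows_per_call = rows_per_call

    def get_next(self):
        """Process part of the input; return True once all input is done."""
        if not self._pending and not self._child_eos:
            self._pending = list(self._batches[self._next])
            self._next += 1
            self._child_eos = self._next == len(self._batches)
        chunk = self._pending[:self._rows_per_call]
        self._pending = self._pending[self._rows_per_call:]
        self._agg.add_rows(chunk)
        # Checking only self._child_eos here would run the cleanup on every
        # call while the final batch is still partially processed.
        if self._child_eos and not self._pending:
            self._agg.input_done()
            return True
        return False
```

In this model, a final batch of three rows consumed two rows at a time needs two get_next() calls, and the cleanup runs only on the second.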
Testing:
- Added an e2e test with a query that hits this condition.
Change-Id: I851007a60472d0e53081c076c863c866c516677c
Reviewed-on: http://gerrit.cloudera.org:8080/11626
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>