We found that test_iceberg_query and test_iceberg_profile fail after
the patch for IMPALA-9741 was merged, and that this is due to the
default timezone of Impala not being UTC. This patch fixes the issue
by adding "SET TIMEZONE=UTC;" before those test queries are run.
Testing:
- Verified in a local development environment that test_iceberg_query
and test_iceberg_profile pass after applying this patch.
Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3
Reviewed-on: http://gerrit.cloudera.org:8080/16425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for querying Iceberg tables through Impala.
We can use the following SQL to create an external Iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or with just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format of the Iceberg table;
currently only PARQUET is supported, and other formats will be
supported in the future. If you don't specify this property in your
SQL, the default file format is PARQUET.
We implement this by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, we push
down partition column predicates to Iceberg to decide which data
files need to be scanned, and then transfer this information to the
BE to do the real scan operation.
Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py
Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implementing codegen for HiveUdfCall.
Testing:
Verified that java udf tests pass locally.
Benchmarks:
Used a UDF from TestUdf.java that adds three integers:
create function tpch15_parquet.sum3(int, int, int) returns int
location '/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestUdf';
Used the following query on the master branch and the change's branch:
set num_nodes=1; set mt_dop=1;
select min(tpch15_parquet.sum3(cast(l_orderkey as int),
cast(l_partkey as int), cast(l_suppkey as int)))
from tpch15_parquet.lineitem;
Results averaged over 100 runs after warmup:
Master: 20.6346s, stddev: 0.3132411856765332
This change: 19.0256s, stddev: 0.42039019873436
This is a ~7.8% improvement.
Change-Id: I2f994dac550f297ed3c88491816403f237d4d747
Reviewed-on: http://gerrit.cloudera.org:8080/16314
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:
Before constant propagation:
WHERE date_col = CAST(timestamp_col as DATE)
AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
After constant propagation:
WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
AND date_col = CAST(timestamp_col as DATE)
As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.
Existing code for equality based constant propagation was refactored
and consolidated into a new class which handles both equality and
range based constant propagation. Range based propagation is only
applied to date and timestamp columns.
Testing:
- Added new range constant propagation tests to PlannerTest.
- Added e2e test for range constant propagation based on a newly
added date partitioned table.
- Ran precommit tests.
Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When Impala TRUNCATEs an ACID table, it creates a new base directory
with the hidden file "_empty" in it. Newer Hive versions ignore files
starting with an underscore, and therefore they ignore the whole base
directory.
To resolve this issue we can simply rename the empty file to "empty".
Testing:
* update acid-truncate.test accordingly
Change-Id: Ia0557b9944624bc123c540752bbe3877312a7ac9
Reviewed-on: http://gerrit.cloudera.org:8080/16396
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently Impala checks the file metadata field 'hive.acid.version' to
decide on the full ACID schema. There are cases when Hive forgets to
set this value for full ACID files, e.g. query-based compactions.
So it's more robust to check the schema elements instead of the metadata
field. Also, sometimes Hive writes the schema with different character
cases, e.g. originalTransaction vs originaltransaction, so we should
rather compare the column names in a case-insensitive way.
Testing:
* added test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test full
ACID file without metadata 'hive.acid.version'
Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Reviewed-on: http://gerrit.cloudera.org:8080/16383
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fix handles the case where a column's cardinality is zero but the
column is nullable and we have null stats indicating there are null
values; in that case we adjust the cardinality from 0 to 1.
The cardinality of zero was especially problematic when calculating
cardinalities for multiple predicates with multiplication. The 0 would
propagate up the plan tree and result in poor plan choices such as
always using broadcast joins where shuffle would've been more optimal.
Testing:
* 26 Node TPC-DS 30TB run had better plans for Q4 and Q11
- Q4 172s -> 80s
- Q11 103s -> 77s
* CardinalityTest
* TpcdsPlannerTest
Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032
Reviewed-on: http://gerrit.cloudera.org:8080/16349
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
INTERSECT/EXCEPT are not duplicate-preserving operations. The distinct
aggregation can happen in each operand, in the leftmost operand only,
or after all the operands in a separate aggregation step. Except for a
couple of special cases, we previously used the last strategy most often.
This change pushes the distinct aggregation down to the leftmost operand
in cases where there are no analytic functions, or when a distinct or
grouping operation already eliminates duplicates.
In general, DISTINCT placement such as in this case should be done
throughout the entire plan tree in a cost-based manner, as described in
IMPALA-5260.
Testing:
* TpcdsPlannerTest
* PlannerTest
* TPC-DS 30TB Perf run for any affected queries
- Q14-1 180s -> 150s
- Q14-2 109s -> 90s
- Q8 no significant change
* SetOperation Planner Tests
* Analyzer tests
* Tpcds Functional Workload
Change-Id: Ia248f1595df2ab48fbe70c778c7c32bde5c518a5
Reviewed-on: http://gerrit.cloudera.org:8080/16350
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This adds a BLOOM_FILTER_ERROR_RATE option that takes a
value between 0 and 1 (exclusive) and can override
the default target false positive probability (fpp)
value of 0.75 used for selecting the filter size.
It does not affect whether filters are disabled
at runtime.
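For illustration, the option could be set like this before a query that
builds a bloom filter (the value 0.1 is an arbitrary example, not a
recommendation):
set bloom_filter_error_rate=0.1;
select count(*) from customer join nation on n_nationkey = c_nationkey;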
Adds estimated FPP and bloom size to the routing
table so we have some observability. Here is an
example:
tpch_kudu> select count(*) from customer join nation on n_nationkey = c_nationkey;
ID Src. Node Tgt. Node(s) Target type Partition filter Pending (Expected) First arrived Completed Enabled Bloom Size Est fpp
-----------------------------------------------------------------------------------------------------------------------------------------
1 2 0 LOCAL false 0 (3) N/A N/A true MIN_MAX
0 2 0 LOCAL false 0 (3) N/A N/A true 1.00 MB 1.04e-37
Testing:
Added a test that shows the query option affecting filter size.
Ran core tests.
Change-Id: Ifb123a0ea1e0e95d95df9837c1f0222fd60361f3
Reviewed-on: http://gerrit.cloudera.org:8080/16377
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Added TpcdsPlannerTest to include each TPC-DS query as a separate plan
test file. Removed the previous tpcds-all test file.
This means that when running only PlannerTest no TPC-DS plans are
checked; however, as part of a full frontend test run TpcdsPlannerTest
will be included.
It runs with cardinality and resource checks, and uses Parquet tables
so that predicate pushdowns are included.
Change-Id: Ibaf40d8b783be1dc7b62ba3269feb034cb8047da
Reviewed-on: http://gerrit.cloudera.org:8080/16345
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This is the support for Cumulative Distribution Function (CDF) from
Apache DataSketches KLL algorithm collection. It receives a serialized
KLL sketch and one or more float values to represent ranges in the
sketched values.
E.g. [1, 5, 10] will mean the following ranges:
(-inf, 1), (-inf, 5), (-inf, 10), (-inf, +inf)
Returns a comma separated string where each value in the string is a
number in the range of [0,1] and shows what portion of the data is in
the particular range.
Note, ds_kll_cdf() should return an Array of doubles as the result, but
for that we have to wait for complex type support. Until then, we
provide ds_kll_cdf_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.
Example:
select ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10)
from alltypes;
+----------------------------------------------------------+
| ds_kll_cdf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.2,0.401644,1,1 |
+----------------------------------------------------------+
Change-Id: I77e6afc4556ad05a295b89f6d06c2e4a6bb2cf82
Reviewed-on: http://gerrit.cloudera.org:8080/16359
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the support for Probabilistic Mass Function (PMF) from Apache
DataSketches KLL algorithm collection. It receives a serialized KLL
sketch and one or more float values to represent ranges in the
sketched values.
E.g. [1, 5, 10] will mean the following ranges:
(-inf, 1), [1, 5), [5, 10), [10, +inf)
Returns a comma separated string where each value in the string is a
number in the range of [0,1] and shows what portion of the data is in
the particular range.
Note, ds_kll_pmf() should return an Array of doubles as the result, but
for that we have to wait for complex type support. Until then, we
provide ds_kll_pmf_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.
Example:
select ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10)
from alltypes;
+----------------------------------------------------------+
| ds_kll_pmf_as_string(ds_kll_sketch(float_col), 2, 4, 10) |
+----------------------------------------------------------+
| 0.202192,0.199452,0.598356,0 |
+----------------------------------------------------------+
Change-Id: I222402f2dce2f49ab2b3f6e81a709da5539293ba
Reviewed-on: http://gerrit.cloudera.org:8080/16336
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The bug was that the statement rewriter converted NOT IN <subquery>
predicates to != <subquery> predicates when the subquery could
be an empty set. This was invalid, because NOT IN (<empty set>)
is true, but != (<empty set>) is false.
Testing:
Added targeted planner and end-to-end tests.
Ran exhaustive tests.
Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc
Reviewed-on: http://gerrit.cloudera.org:8080/16338
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This function is very similar to ds_kll_quantile() but this one can
receive any number of rank parameters and returns a comma separated
string that holds the results for all of the given ranks.
For more details about ds_kll_quantile() see IMPALA-9959.
Note, ds_kll_quantiles() should return an Array of floats as the result,
but for that we have to wait for complex type support. Until then, we
provide ds_kll_quantiles_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.
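A hypothetical usage, following the pattern of the related ds_kll
functions (the rank values 0.25, 0.5 and 0.75 are arbitrary):
select ds_kll_quantiles_as_string(ds_kll_sketch(float_col), 0.25, 0.5, 0.75)
from alltypes;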
Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f
Reviewed-on: http://gerrit.cloudera.org:8080/16324
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The minimum requirement for a spillable operator is ((min_buffers - 2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully, so it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.
To be specific, how we manage the large write page reservation depends
on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
spilling a partition, we have to save a large page worth of
reservation for the serialize stream, in case it needs to write large
rows. This space can be restored when all the partitions are spilled
so the serialize stream is not needed until we build/repartition a
spilled partition and thus have pinned partitions again. If the large
write page reservation is used, we save it back whenever possible
after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
restore the large write page reservation whenever we fail to add a
large row, before spilling any partitions. Reclaim it whenever
possible after we spill or close a partition.
A special case is when we are processing a large row that is the last
row in building/repartitioning a spilled partition: the large write page
reservation can be restored for it no matter whether we need the
serialize stream, because the partitions will be read out after this,
so there is no need for spilling.
For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
because the stream is read in attach_on_read mode, the large page will
be attached to the row batch's buffers and only gets freed when the row
batch is reset.
Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
patch. Revise the query to make the udf allocate larger strings so it
can consistently pass.
- Run CORE tests.
Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This fix addresses the current limitation in that an ill-formatted
Parquet version string is not properly formatted before appearing
in an error message or impalad.INFO. With the fix, any such string is
converted to a hex string first. The hex string is a sequence of
four hex digit groups separated by spaces and each group is one or
two hex digits, such as "6c 65 2e a".
Testing:
Ran "core" tests successfully.
Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1
Reviewed-on: http://gerrit.cloudera.org:8080/16331
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Include remaining TPC-DS queries to the testdata workload definition.
Q8 and Q38 were using non-standard variants; those have been
replaced by the official query versions. Q35 is using an official
variant. Had to escape a table alias in Q90 as we treat 'AT' as a
reserved keyword.
Change-Id: Id5436689390f149694f14e6da1df624de4f5f7ad
Reviewed-on: http://gerrit.cloudera.org:8080/16280
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
In this test there is no need to check for "Error parsing row"
since "Error converting column" is enough to be sure we are
no longer able to read dateless timestamps.
Change-Id: Ia97490288dae81561969d260739a07ec42571f48
Reviewed-on: http://gerrit.cloudera.org:8080/16334
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This work addresses the current limitation in computing the total row
count for a Hive table in a scan. The row count can be incorrectly
computed as 0, even though there exists data in the Hive table. This
is stats corruption at the table level. Similar stats corruption
exists for a partition. The row count of a table or a partition can
sometimes also be -1, which indicates a missing stats situation.
In the fix, as long as no partition in a Hive table exhibits any
missing or corrupt stats, the total row count for the table is computed
from the row counts in all partitions. Otherwise, Impala looks at
the table level stats particularly the table row count.
In addition, if the table stats are missing or corrupt, Impala
estimates a row count for the table, if feasible. This row count is
the sum of the row counts from the partitions with good stats and
an estimate of the number of rows in the partitions with missing or
corrupt stats. Such estimation also applies when some partition
has corrupt stats.
One way to observe the fix is through the explain of queries scanning
Hive tables with missing or corrupted stats. The cardinality for any
full scan should be a positive value (i.e. the estimated row count),
instead of 'unavailable'. At the beginning of the explain output,
that table is still listed in the WARNING section for potentially
corrupt table statistics.
Testing:
1. Ran unit tests with queries documented in the case against Hive
tables with the following configurations:
a. No stats corruption in any partitions
b. Stats corruption in some partitions
c. Stats corruption in all partitions
2. Added two new tests in test_compute_stats.py:
a. test_corrupted_stats_in_partitioned_Hive_tables
b. test_corrupted_stats_in_unpartitioned_Hive_tables
3. Fixed failures in corrupt-stats.test
4. Ran "core" test
Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Reviewed-on: http://gerrit.cloudera.org:8080/16098
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For example, base64decode('YWxwaGE%') will return
'alpha\377' on a newer OS which has a newer SASL library.
I tested it on the Ubuntu 18.04 aarch64 version.
Change-Id: Ib9bd9e03d5f744c18c957cdaf2064fa918086004
Reviewed-on: http://gerrit.cloudera.org:8080/16175
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This implements scanning of full ACID tables that contain complex
types. The same technique that we use for primitive types works here
as well, i.e. we add a LEFT ANTI JOIN on top of the HDFS scan node in
order to subtract the deleted rows from the inserted rows.
However, there were some types of queries where we couldn't do that.
These are the queries that scan the nested collection items directly.
E.g.: SELECT item FROM complextypestbl.int_array;
The above query only creates a single tuple descriptor that holds the
collection items. Since this tuple descriptor is not at the table level,
we cannot add slot references to the hidden ACID columns, which are at
the top level of the table schema.
To resolve this I added a statement rewriter that rewrites the above
statement to the following:
SELECT item FROM complextypestbl $a$1, $a$1.int_array;
Now in this example we'll have two tuple descriptors, one for the
table-level, and one for the collection item. So we can add the ACID
slot refs to the table-level tuple descriptor. The rewrite is
implemented by the new AcidRewriter class.
Performance
I executed the following query with num_nodes=1 on a non-transactional
table (without the rewrite), and on an ACID table (with the rewrite):
select count(*) from customer_nested.c_orders.o_lineitems;
Without the rewrite:
Fetched 1 row(s) in 0.41s
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| F00:ROOT | 1 | 1 | 13.61us | 13.61us | | | 0 B | 0 B | |
| 01:AGGREGATE | 1 | 1 | 3.68ms | 3.68ms | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE |
| 00:SCAN HDFS | 1 | 1 | 280.47ms | 280.47ms | 6.00M | 15.00M | 56.98 MB | 8.00 MB | tpch_nested_orc_def.customer.c_orders.o_lineitems |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
With the rewrite:
Fetched 1 row(s) in 0.42s
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| F00:ROOT | 1 | 1 | 25.16us | 25.16us | | | 0 B | 0 B | |
| 05:AGGREGATE | 1 | 1 | 3.44ms | 3.44ms | 1 | 1 | 63.00 KB | 10.00 MB | FINALIZE |
| 01:SUBPLAN | 1 | 1 | 16.52ms | 16.52ms | 6.00M | 125.92M | 47.00 KB | 0 B | |
| |--04:NESTED LOOP JOIN | 1 | 1 | 188.47ms | 188.47ms | 0 | 10 | 24.00 KB | 12 B | CROSS JOIN |
| | |--02:SINGULAR ROW SRC | 1 | 1 | 0ns | 0ns | 0 | 1 | 0 B | 0 B | |
| | 03:UNNEST | 1 | 1 | 25.37ms | 25.37ms | 0 | 10 | 0 B | 0 B | $a$1.c_orders.o_lineitems o_lineitems |
| 00:SCAN HDFS | 1 | 1 | 96.26ms | 96.26ms | 100.00K | 12.59M | 38.19 MB | 72.00 MB | default.customer_nested $a$1 |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
So the overhead is very small.
Testing
* Added planner tests to PlannerTest/acid-scans.test
* E2E query tests to QueryTest/full-acid-complex-type-scans.test
* E2E tests for rowid-generation: QueryTest/full-acid-rowid.test
Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Reviewed-on: http://gerrit.cloudera.org:8080/16228
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.
An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table; then, depending on which
partitions the user is interested in, the relevant sketches can be
unioned together to get an estimate. E.g.:
SELECT
ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
FROM sketch_tbl
WHERE partition_col=1 OR partition_col=5;
Testing:
- Apart from the automated tests I added to this patch I also
tested ds_kll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Reviewed-on: http://gerrit.cloudera.org:8080/16267
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a marker to runtime profiles and explain plans indicating if custom
(i.e. non-built-in) user-defined functions are being used. For explain
plans, a SQL-style comment is added after any function call. For runtime
profiles, a new Frontend entry called "User Defined Functions (UDFs)"
lists out all UDFs analyzed during planning.
Take the following example:
create function hive_lower(string) returns string location
'/test-warehouse/hive-exec.jar'
symbol='org.apache.hadoop.hive.ql.udf.UDFLower';
set explain_level=3;
explain select * from functional.alltypes order by hive_lower(string_col);
...
01:SORT
order by: default.hive_lower(string_col) /* JAVA UDF */ ASC
materialized: default.hive_lower(string_col) /* JAVA UDF */
...
This shows up in the runtime profile as well.
When the above query is actually run, the runtime profile includes the
following entry:
Frontend
User Defined Functions (UDFs): default.hive_lower
Error messages will also include SQL-style comments about any UDFs used.
For example:
select aggfn(int_col) over (partition by int_col) from
functional.alltypesagg
Throws:
Aggregate function 'default.aggfn(int_col) /* NATIVE UDF */' not
supported with OVER clause.
Testing:
* Added tests to test_udfs.py
* Ran core tests
Change-Id: I79122e6cc74fd5a62c76962289a1615fbac2f345
Reviewed-on: http://gerrit.cloudera.org:8080/16188
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ds_kll_rank() receives two parameters: a STRING that represents a
serialized DataSketches KLL sketch and a float to provide a probing
value in the sketch.
Returns a DOUBLE that is the rank of the given probing value in the
range of [0,1]. E.g. a return value of 0.2 means that the probing value
given as parameter is greater than 20% of all the values in the
sketch. Note, this is an approximate calculation.
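A hypothetical call, in the same style as the other ds_kll examples in
this series (the probing value 4 and the table are only illustrative):
select ds_kll_rank(ds_kll_sketch(float_col), 4)
from alltypes;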
Change-Id: I95857886dfbb8c84aeeaf718c0e610012fda4be0
Reviewed-on: http://gerrit.cloudera.org:8080/16283
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a new query option MAX_FS_WRITERS that limits the
number of HDFS writer instances.
Highlights:
- Depending on the plan, it either restricts the num of instances of
the root fragment or adds an exchange and then limits the num of
instances of that.
- Assigns instances evenly across available backends.
- "no-shuffle" query hint is ignored when using query option.
- Change in behavior of plans is only when this query option is used.
- The only exception to the previous point is that the optimization
logic that decides to add an exchange now looks at the num of
instances instead of the number of nodes.
Limitation:
A mismatch of cluster state between query planning and scheduling can
result in more or fewer fragment instances being scheduled than
expected. E.g. if max_fs_writers is 2 and the planner sees only 2
executors, then it might not add an exchange between a scan node and
the table sink, but during scheduling, if there are 3 nodes, that
scan+tablesink instance will be scheduled on 3 backends.
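For illustration, a hypothetical insert that uses the option (the table
names are made up):
set max_fs_writers=2;
insert into store_sales_copy select * from store_sales;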
Testing:
- Added planner tests to cover all cases where this enforcement kicks
in and to highlight the behavior.
- Added e2e tests to confirm that the scheduler is enforcing the limit
and distributing the instance evenly across backends for different
plan shapes.
Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Reviewed-on: http://gerrit.cloudera.org:8080/16204
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. There are several qualifying checks that are done. The
optimization is done at the time of creating the top level
Sort in the single node planner. When the pushdown is
applicable, the analytic sort is converted to a TopN sort.
Further, this is split into a bottom TopN and an upper
TopN separated by a hash partition exchange. This
ensures that the limit is applied as early as possible
before hash partitioning.
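As a hypothetical example, a TPC-DS-flavored query shape that this
optimization targets, subject to the qualifying checks (table and
columns are illustrative only):
select * from (
  select ss_item_sk, ss_store_sk,
         rank() over (partition by ss_store_sk order by ss_sales_price desc) rk
  from store_sales) v
order by rk
limit 100;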
Fixed a couple of additional related issues uncovered as a
result of the limit pushdown:
- Changed the analytic sort's partition-by expr sort
semantic from NULLS FIRST to NULLS LAST to ensure
correctness in the presence of a limit.
- The LIMIT on the analytic sort node was causing it to
be treated as a merging point in the distributed planner.
Fixed it by introducing an API allowPartitioned() in
PlanNode.
Testing:
- Ran PlannerTest and updated several EXPLAIN plans.
- Added Planner tests for both positive and negative cases of
limit pushdown.
- Ran end-to-end TPC-DS queries. Specifically tested
TPC-DS q67 for limit pushdown and result correctness.
- Added targeted end-to-end tests using TPC-H dataset.
Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Reviewed-on: http://gerrit.cloudera.org:8080/16219
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.
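For illustration, hypothetical queries using the new operators (table
names are made up):
select id from t1 intersect select id from t2;
select id from t1 except select id from t2;
select id from t1 minus select id from t2;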
We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.
A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.
Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt
Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Reviewed-on: http://gerrit.cloudera.org:8080/16123
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
ds_kll_sketch() is an aggregate function that receives a float
parameter (e.g. a float column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set wrapped into
STRING type. This sketch can be saved into a table or view and later
used for quantile approximations. ds_kll_quantile() receives two
parameters: a STRING parameter that contains a serialized KLL sketch
and a DOUBLE that represents the rank of the quantile in the range of
[0,1]. E.g. rank=0.1 means the approximate value in the sketch where
10% of the sketched items are less than or equal to this value.
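A hypothetical end-to-end usage combining the two functions (column and
table names are examples only):
select ds_kll_quantile(ds_kll_sketch(float_col), 0.5) from alltypes;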
Testing:
- Added automated tests on small data sets to check the basic
functionality of sketching and getting a quantile approximate.
- Tested on TPCH25_parquet.lineitem to check that sketching and
approximating works on bigger scale as well where serialize/merge
phases are also required. On this scale the error range of the
quantile approximation is within 1-1.5%
Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Reviewed-on: http://gerrit.cloudera.org:8080/16235
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In Hive, empty strings don't count as separate values when querying
count(distinct) estimates using the Apache DataSketches HLL algorithm
on strings and varchars.
For compatibility's sake, Impala should not count them either.
Tests:
- added extra tests for HLL with empty strings
Change-Id: Ie7648217bbe2f66b817788f131c062f349b1e9ad
Reviewed-on: http://gerrit.cloudera.org:8080/16226
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.
An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table; then, depending on which
partitions the user is interested in, the relevant sketches can be
unioned together to get an estimate. E.g.:
SELECT
ds_hll_estimate(ds_hll_union(sketch_col))
FROM sketch_tbl
WHERE partition_col=1 OR partition_col=5;
Note, currently there is a known limitation of unioning string types
where some input sketches come from Impala and some from Hive. In
this case, if there is an overlap in the input data used by Impala and
by Hive, this overlapping data is still counted twice due to a string
representation difference between Impala and Hive.
For more details see:
https://issues.apache.org/jira/browse/IMPALA-9939
Testing:
- Apart from the automated tests I added to this patch I also
tested ds_hll_union() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_hll_union() on those sketches.
Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Reviewed-on: http://gerrit.cloudera.org:8080/16095
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change bumps up the CDP_BUILD_NUMBER to 4493826. This is needed
to fix a failing test.
Hive started to assign bucket ids to files differently. Because of
that I had to modify the test_full_acid_rowid test that had an
assumption about how bucket ids are assigned to files.
If you have problems restarting the Hive Metastore, try the following:
buildall.sh <your usual flags> -upgrade_metastore_db
If you have problems restarting Kudu, try the following:
Unset LD_LIBRARY_PATH in your shell, and stop setting it in
impala-config-local.sh
Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Reviewed-on: http://gerrit.cloudera.org:8080/16195
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
When generating plans with left semi/anti joins (typically
resulting from subquery rewrites), the planner now
considers inserting a distinct aggregation on the inner
side of the join. The decision is based on whether that
aggregation would reduce the number of rows by more than
75%. This is fairly conservative and the optimization
might be beneficial for smaller reductions, but the
conservative threshold is chosen to reduce the number
of potential plan regressions.
The aggregation can both reduce the # of rows and the
width of the rows, by projecting out unneeded slots.
ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option is
added to allow toggling the optimization.
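For illustration, the new option could be toggled like this (the query
below is a made-up example of a subquery that rewrites to a semi join):
set enable_distinct_semi_join_optimization=false;
select * from t1 where id in (select id from t2);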
Tests:
* Add positive and negative planner tests for various
cases - including semi/anti joins, missing stats,
broadcast/shuffle, different numbers of join predicates.
* Add some end-to-end tests to verify plans execute correctly.
Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Reviewed-on: http://gerrit.cloudera.org:8080/16180
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.
(UPDATE operations are implemented via DELETE+INSERT.)
In the filesystem it looks like e.g.:
* full_acid/delta_0000001_0000001_0000/0000_0
* full_acid/delta_0000002_0000002_0000/0000_0
* full_acid/delete_delta_0000003_0000003_0000/0000_0
During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans a full ACID table (that also has deleted rows) is converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.
Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.
The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.
Testing:
* added planner test
* added e2e tests
Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements the grouping() and grouping_id() builtins.
grouping_id() has both a no-arg version, which returns
a bit vector of all grouping exprs, and a varargs version,
which returns a bit vector of the provided arguments.
GROUPING is a keyword, so it needs special handling in the
parser to be accepted as a function name.
These functions are implemented in the transpose agg
with a CASE expression similar to other aggregate functions,
but returning the grouping() or grouping_id() value for that
aggregation class instead of an aggregated value.
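A hypothetical query shape using the new builtins (table and columns
are examples only):
select int_col, string_col, count(*), grouping(int_col), grouping_id()
from alltypes
group by rollup(int_col, string_col);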
Testing:
* Added parser test for grouping keyword.
* Added analysis tests for the functions.
* Added basic planner test to show expressions generated
* Added some TPC-DS queries that use grouping() - queries
80, 70 and 86 using reference .test files from Fang-Yu
Rao. 27 and 36 were added with reference results from
https://github.com/cwida/tpcds-result-reproduction
* Add targeted end-to-end tests.
* Added view compatibility test with Hive.
Change-Id: If0b1640d606256c0fe9204d2a21a8f6d06abcdb6
Reviewed-on: http://gerrit.cloudera.org:8080/16140
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Integrates the parsing and analysis with plan generation.
Testing:
* Add analysis test to make sure we reject unsupported queries.
* Added targeted planner tests to ensure we generate the correct
aggregation classes for a variety of cases.
* Add targeted end-to-end functional tests.
Added five TPC-DS queries that use ROLLUP, building on some work done
by Fang-Yu Rao. Some tweaks were required for these tests.
* Add an extra ORDER BY clause to q77 to make fully deterministic.
* Add backticks around `returns` to avoid reserved word.
* Add INTERVAL keyword to date/timestamp arithmetic.
We can run q80, too, but I haven't added or verified results yet -
that can be done in a follow-up.
Change-Id: Ie454c5bf7aee266321dee615548d7f2b71380197
Reviewed-on: http://gerrit.cloudera.org:8080/16128
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
The patch for IMPALA-8954 failed to account for subqueries
that could produce < 1 row. SelectStmt.returnsSingleRow()
is confusing because it actually returns true if it
returns *at most* one row.
As a fix I split it into returnsExactlyOneRow() and
returnsAtMostOneRow(), then used returnsExactlyOneRow()
to determine if the subquery should instead be rewritten
into a LEFT OUTER JOIN, which produces the correct result.
CROSS JOIN is still preferred because it can be more freely
reordered during planning.
Testing:
* Added planner tests for a range of scenarios where it can
be rewritten as a CROSS JOIN and where it needs to be a LEFT
OUTER JOIN for correctness.
* Added some targeted end-to-end tests where the results were
previously incorrect. Checked the behaviour against Hive and
postgres.
Ran exhaustive tests.
Change-Id: I6034aedac776783bdc8cdb3a2df344e2b3662da6
Reviewed-on: http://gerrit.cloudera.org:8080/16171
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch supports a subset of cases of subqueries
inside OR inside WHERE and HAVING clauses.
The approach used is to rewrite the subquery into
a many-to-one LEFT OUTER JOIN with the subquery and
then replace the subquery in the expression with a
reference to the single select list expressions of
the subquery. This works because:
* A many-to-one LEFT OUTER JOIN returns one output row
for each left input row, meaning that for every row
in the original query before the rewrite, we get
the same row plus a single matched row from the subquery
* Expressions can be rewritten to refer to a slotref from
the right side of the LEFT OUTER JOIN without affecting
semantics. E.g. an IN subquery becomes <slot> IS NOT NULL
or <operator> (<subquery>) becomes <operator> <slot>.
This does not affect SELECT list subqueries, which are
rewritten using a different mechanism that can already
support some subqueries in disjuncts.
Correlated and uncorrelated subqueries are both supported, but
various limitations are present.
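For illustration, a hypothetical query of the kind this patch enables
(table names are made up):
select * from t1
where c1 < 10 or c2 in (select c2 from t2);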
Limitations:
* Only one subquery per predicate is supported. The rewriting approach
should generalize to multiple subqueries but other code needs
refactoring to handle this case.
* EXISTS and NOT EXISTS subqueries are not supported. The rewriting
approach can generalise to that, but we need to add or pick a
select list item from the subquery to check for NULL/IS NOT NULL
and a little more work is required to do that correctly.
* NOT IN is not supported because of the special NULL semantics.
* Subqueries with aggregates + grouping by are not supported because
we rely on adding distinct to select list and we don't
support distinct + aggregations because of IMPALA-5098.
Tests:
* Positive analysis tests for IN and binary predicate operators.
* Negative analysis tests for unsupported subquery operators.
* Negative analysis tests for multiple subqueries.
* Negative analysis tests for runtime scalar subqueries.
* Positive and negative analysis tests for aggregations in subquery.
* TPC-DS Query 45 planner and query tests
* Targeted planner tests for various supported queries.
* Targeted functional tests to confirm plans are executable and
return correct result. These exercise a mix of the supported
features - correlated/uncorrelated, aggregate functions,
EXISTS/comparator, etc.
* Tests for BETWEEN predicate, which is supported as a side-effect
of being rewritten during analysis.
Change-Id: I64588992901afd7cd885419a0b7f949b0b174976
Reviewed-on: http://gerrit.cloudera.org:8080/16152
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Removed the support for dateless timestamps.
During dateless timestamp casts, if the format doesn't contain a
date part we get an error during tokenization of the format.
If the input string doesn't contain a date part then we get a NULL
result.
Examples:
select cast('01:02:59' as timestamp);
This will come back as NULL value.
select to_timestamp('01:01:01', 'HH:mm:ss');
select cast('01:02:59' as timestamp format 'HH12:MI:SS');
select cast('12 AM' as timestamp FORMAT 'AM.HH12');
These will come back with parsing errors.
Casting from a table will generate similar results.
Testing:
Modified the previous tests related to dateless timestamps.
Added tests to read from tables which still contain dateless
timestamps and covered the timestamp-to-string path when no date
tokens are requested in the output string.
Change-Id: I48c49bf027cc4b917849b3d58518facba372b322
Reviewed-on: http://gerrit.cloudera.org:8080/15866
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
These functions can be used to get cardinality estimates of data
using HLL algorithm from Apache DataSketches. ds_hll_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized HLL sketch in string format. This can be written to a
table or be fed directly to ds_hll_estimate() that returns the
cardinality estimate for that sketch.
Compared to ndv() these functions bring more flexibility, as once we
have fed data into the sketch it can be written to a table, and next
time we can save scanning through the dataset and simply return the
estimate using the sketch. This doesn't come for free, however, as
performance measurements show that ndv() is 2x-3.5x faster than
sketching. On the other hand, if we query the estimate from an
existing sketch then the runtime is negligible.
Another flexibility with these sketches is that they can be merged
together so e.g. if we had saved a sketch for each of the partitions
of a table then they can be combined with each other based on the
query without touching the actual data.
DataSketches HLL is sensitive to the order of the data fed to the
sketch, and as a result running these algorithms in Impala gives
non-deterministic results within the error bounds of the algorithm.
In terms of correctness, DataSketches HLL is most of the time within
a 2% range of the correct result, but there are occasional spikes
where the difference is bigger; it never goes out of the range of 5%.
Even though the DataSketches HLL algorithm could be parameterized,
this implementation currently hard-codes these parameters and uses
HLL_4 and lg_k=12.
For more details about Apache DataSketches' HLL implementation see:
https://datasketches.apache.org/docs/HLL/HLL.html
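A hypothetical usage of the two functions together (the table and
column are only examples):
select ds_hll_estimate(ds_hll_sketch(l_orderkey))
from tpch_parquet.lineitem;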
Testing:
- Added some tests running estimates for small datasets where the
amount of data is small enough to get the correct results.
- Ran manual tests on TPCH25.lineitem to compare performance with
ndv(). Depending on data characteristics ndv() appears 2x-3.5x
faster. The lower the cardinality of the dataset the bigger the
difference between the 2 algorithms is.
- Ran manual tests on TPCH25.lineitem and
functional_parquet.alltypes to compare correctness with ndv(). See
results above.
Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Reviewed-on: http://gerrit.cloudera.org:8080/16000
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Support rewriting subqueries in the HAVING clause by nesting the
aggregation query and pulling up the subquery predicates into the outer
WHERE clause.
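A hypothetical query of the kind this enables (tables are made up):
select c_nationkey, count(*)
from customer
group by c_nationkey
having count(*) > (select min(n_nationkey) from nation);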
Testing:
* New analyzer tests
* New functional subquery tests
* Added Q23, Q24 and Q44 to the tpcds workload
* Ran subquery rewrite tests
Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Reviewed-on: http://gerrit.cloudera.org:8080/16052
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Extend StmtRewriter with the ability to rewrite scalar subqueries in the
select list into cross joins. Currently the subquery must pass plan-time
checks to determine that it returns a single row, which may miss cases
that would be valid at runtime or with more complex evaluation of the
predicate expressions in the planner. Support for correlated subqueries
will be a follow-on change.
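A hypothetical uncorrelated example of the kind that can now be
rewritten (tables are made up):
select l_orderkey,
       (select max(o_totalprice) from orders) max_price
from lineitem;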
Testing:
* Added new analyzer tests, updated previous subquery tests
* test_queries.py::TestQueries::test_subquery
* Added test_tpcds_q9 to e2e and planner tests
Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Reviewed-on: http://gerrit.cloudera.org:8080/16007
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
When the null count is 0, the IsNullPredicate's selectivity
was not being computed since the code did not distinguish
between a -1 (no stats) and a 0 null count. This caused
a default selectivity estimate to be applied. This
patch fixes it by explicitly checking whether the null
count stat is present and, if so, using it regardless of
whether it is 0 or more.
Testing:
- Added cardinality tests for IS NULL and IS NOT NULL.
- Ran PlannerTest and updated baseline plans.
- Updated expected selectivity for null predicate tests
in ExprCardinalityTest.
- Ran precommit tests through gerrit-verify-dryrun
Change-Id: I46c084be780b8f5aead9e2b9656fbab6cc8c8874
Reviewed-on: http://gerrit.cloudera.org:8080/16131
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds the following 12 TPCDS queries to the class of
TestTpcdsDecimalV2Query: Q26, Q30, Q31, Q47, Q48, Q57, Q58, Q59, Q63,
Q83, Q85, and Q89. All the queries except for Q31 are added to the class
of TestTpcdsQuery as well because Impala returns one fewer row than
expected for TestTpcdsQuery::test_tpcds_q31(), which requires further
investigation.
To verify whether or not the returned result set from Impala for a given
query is correct, we compare the result set with that produced by the
HiveServer2 (HS2) in Impala's mini-cluster. We could execute SQL
statements in HS2 via Beeline, HS2's command line shell, which could be
launched by the following command.
beeline -u "jdbc:hive2://localhost:11050/default"
We note that among these 12 queries, the execution of Q31, Q58, and Q83
results in the error "Counters limit exceeded" from TEZ. To work around
this problem, for these 3 queries we have to execute the following
statement before running them to increase the default number of
counters, which is set to 120.
set tez.counters.max=1200
On the other hand, the 'reason' table is referenced by Q85. This
table was not referenced by any TPCDS query before this patch and thus
was not created. Therefore, in this patch we also modify
tpcds_schema_template.sql to create this additional table along with its
data.
Testing:
- Verified that this patch passes the exhaustive tests in the DEBUG
build.
Change-Id: Ib5f260e75a3803aabe9ccef271ba94036f96e5cf
Reviewed-on: http://gerrit.cloudera.org:8080/16119
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
"Original files" are files that don't have full ACID schema. We can see
such files if we upgrade a non-ACID table to full ACID. Also, the LOAD
DATA statement can load non-ACID files into full ACID tables. So such
files don't store special ACID columns, that means we need
to auto-generate their values. These are (operation,
originalTransaction, bucket, rowid, and currentTransaction).
With the exception of 'rowid', all of them can be calculated based on
the file path, so I add their values to the scanner's template tuple.
'rowid' is the ordinal number of the row inside a bucket inside a
directory. For now Impala only allows one file per bucket per
directory. Therefore we can generate row ids for each file
independently.
Multiple files in a single bucket in a directory can only be present if
the table was non-transactional earlier and we upgraded it to full ACID
table. After the first compaction we should only see one original file
per bucket per directory.
In HdfsOrcScanner we calculate the first row id for our split then
the OrcStructReader fills the rowid slot with the proper values.
Testing:
* added e2e tests to check if the generated values are correct
* added e2e test to reject tables that have multiple files per bucket
* added unit tests to the new auxiliary functions
Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Reviewed-on: http://gerrit.cloudera.org:8080/16001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>