During serialization of a row batch header, a tuple_data_ is created
that holds the compressed tuple data for an outbound row batch. We
would like this tuple data to be trackable, as it is responsible for
a significant portion of the untracked memory of the KRPC data stream
sender. By using MemTrackerAllocator, we can allocate the tuple data
and compression scratch and account for them in the memory tracker of
the KrpcDataStreamSender. This solution changes the type of the tuple
data and compression scratch from std::string to TrackedString, an
std::basic_string with MemTrackerAllocator as the custom allocator.
This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC was removed in the
prior commit.
Testing:
- Passed core tests.
- Ran a single node benchmark which shows no regression.
- Updated row-batch-serialize-test and row-batch-serialize-benchmark
  to test the row-batch serialization used by KRPC.
- Manually collected query profiles, heap growth, and memory usage logs
  showing that untracked memory decreased by half.
- Added test_datastream_sender.py to verify the peak memory of the
  EXCHANGE SENDER node.
- Raised mem_limit in two of the test_spilling_large_rows test cases.
- Added printing of the test line number in PlannerTestBase.java.
New row-batch serialization benchmark:
Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:             10%     50%     90%  10%(rel)  50%(rel)  90%(rel)
------------------------------------------------------------------------
ser_no_dups_base      18.6    18.8    18.9        1X        1X        1X
ser_no_dups           18.5    18.5    18.8    0.998X    0.988X    0.991X
ser_no_dups_full      14.7    14.8    14.8    0.793X     0.79X    0.783X
ser_adj_dups_base     28.2    28.4    28.8        1X        1X        1X
ser_adj_dups          68.9    69.1    69.8     2.44X     2.43X     2.43X
ser_adj_dups_full     56.2    56.7    57.1     1.99X        2X     1.99X
ser_dups_base         20.7    20.9    20.9        1X        1X        1X
ser_dups              20.6    20.8    20.9    0.994X    0.995X        1X
ser_dups_full         39.8    40      40.5     1.93X     1.92X     1.94X

deserialize:           10%     50%     90%  10%(rel)  50%(rel)  90%(rel)
------------------------------------------------------------------------
deser_no_dups_base    75.9    76.6    77          1X        1X        1X
deser_no_dups         74.9    75.6    76      0.987X    0.987X    0.987X
deser_adj_dups_base    127     128     129        1X        1X        1X
deser_adj_dups         179     193     195     1.41X     1.51X     1.51X
deser_dups_base        128     128     129        1X        1X        1X
deser_dups             165     190     193     1.29X     1.48X     1.49X
Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Reviewed-on: http://gerrit.cloudera.org:8080/18798
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Similarly to HIVE-26596, Impala should set the merge-on-read write
mode for V2 tables, unless otherwise specified:
* during table creation with 'format-version'='2'
* during ALTER TABLE SET TBLPROPERTIES 'format-version'='2'
We do so because in the foreseeable future Impala will only support
merge-on-read on the write side (on the read side copy-on-write is
also supported). Also, currently Hive only supports merge-on-read.
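For illustration, upgrading a table to V2 as below also selects the
merge-on-read write mode (the table name is hypothetical, and the
exact set of 'write.*.mode' properties listed in the comment is an
assumption, not confirmed by this commit message):
  ALTER TABLE ice_tbl SET TBLPROPERTIES ('format-version'='2');
  -- presumably equivalent to also setting, unless the user specified
  -- otherwise:
  --   'write.delete.mode'='merge-on-read'
  --   'write.update.mode'='merge-on-read'
  --   'write.merge.mode'='merge-on-read'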
Testing:
* e2e tests added
Change-Id: Icaa32472cde98e21fb23f5461175db1bf401db3d
Reviewed-on: http://gerrit.cloudera.org:8080/19138
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In IMPALA-11492, ExprTest.Utf8MaskTest was failing on some
configurations because the en_US.UTF-8 was missing. Since the
Docker images don't contain en_US.UTF-8, they are subject
to the same bug. This was confirmed by adding test cases
to the test_utf8_strings.py end-to-end test and running it
in the dockerized tests.
This adds the appropriate language pack to the list of packages
installed for the Docker build.
Testing:
- This adds end-to-end tests to test_utf8_strings.py covering the
same cases that were failing in ExprTest.Utf8MaskTest. They
failed without the added language packs, and now succeed.
Change-Id: I353f257b3cb6d45f7d0a28f7d5319fdb457e6e3d
Reviewed-on: http://gerrit.cloudera.org:8080/19080
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reverts support for o3fs as a default filesystem added in IMPALA-9442.
Updates test setup to use ofs instead.
Munges absolute paths in Iceberg metadata to match the new location
required for ofs. Ozone has strict requirements on volume and bucket
names, so all tables must be created within a bucket (e.g. inside
/impala/test-warehouse/).
Change-Id: I45e90d30b2e68876dec0db3c43ac15ee510b17bd
Reviewed-on: http://gerrit.cloudera.org:8080/19001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A query that returns at most one row can run more efficiently without
result spooling. If result spooling is enabled, it sets the minimum
memory reservation in PlanRootSink, e.g. for 'select 1' the minimum
memory reservation is 4MB.
This optimization can reduce the statement's resource reservation and
prevent the 'Failed to get minimum memory reservation' exception when
the required memory is not available on the host.
Testing:
- Add tests in result-spooling.test
Change-Id: Icd4d73c21106048df68a270cf03d4abd56bd3aac
Reviewed-on: http://gerrit.cloudera.org:8080/18711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add TBLPROPERTIES support to views; here are some examples:
CREATE VIEW [IF NOT EXISTS] [database_name.]view_name
[(column_name [COMMENT 'column_comment'][, ...])]
[COMMENT 'view_comment']
[TBLPROPERTIES (property_name = property_value, ...)]
AS select_statement;
ALTER VIEW [database_name.]view_name SET TBLPROPERTIES
(property_name = property_value, ...);
ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES
(property_name, ...);
Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
Reviewed-on: http://gerrit.cloudera.org:8080/18940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds table sampling functionality for Iceberg tables.
From now on it is possible to execute SELECT and COMPUTE STATS
statements with table sampling.
Predicates in the WHERE clause affect the results of table sampling
similarly to how legacy tables work (sampling is applied after static
partition and file pruning).
Sampling is repeatable via the REPEATABLE clause.
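For example, sampling roughly 10% of the table (the table name, the
percentage and the seed are illustrative; the TABLESAMPLE syntax is
assumed to follow the one used for legacy tables):
  SELECT count(*) FROM ice_tbl TABLESAMPLE SYSTEM(10) REPEATABLE(1234);
  COMPUTE STATS ice_tbl TABLESAMPLE SYSTEM(10) REPEATABLE(1234);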
Testing
* planner tests
* e2e tests for V1 and V2 tables
Change-Id: I5de151747c0e9d9379a4051252175fccf42efd7d
Reviewed-on: http://gerrit.cloudera.org:8080/18989
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch we used HMS API alter_table() to update an Iceberg
table's statistics. 'alter_table()' API calls are unsafe for Iceberg
tables as they overwrite the whole HMS table, including the table
property 'metadata_location' which must always point to the latest
snapshot. Hence concurrent modification to the same table could be
reverted by COMPUTE STATS.
In this patch we use the Iceberg API to update Iceberg tables.
Also, table-level stats (e.g. numRows, totalSize, totalFiles) are not
set as Iceberg keeps them up-to-date.
COMPUTE INCREMENTAL STATS without partition clause is the same as
plain COMPUTE STATS for Iceberg tables. This behavior is aligned
with current behavior on non-partitioned tables:
https://impala.apache.org/docs/build/html/topics/impala_compute_stats.html
COMPUTE INCREMENTAL STATS .. PARTITION raises an error.
DROP STATS has also been modified to not drop table-level stats for
HMS-integrated Iceberg tables.
Testing:
* added e2e tests for COMPUTE STATS
* added e2e tests for DROP STATS
* manually tested concurrent Hive INSERT and Impala COMPUTE STATS
using latest Hive
* opened IMPALA-11590 to add automated interop tests with Hive
Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Reviewed-on: http://gerrit.cloudera.org:8080/18995
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds the EXPAND_COMPLEX_TYPES query option to support the
display of complex types in SELECT statements where a star (*)
expression is in the select list. By default, the query option is
disabled. When it's enabled, it changes the behaviour of star expansion
to list all top-level column fields including ones with complex types,
instead of listing only the scalar column fields. Nested complex type
expansion is also supported, e.g. struct.* will enumerate the members
of the struct. Array, map and struct types are supported.
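For example, assuming a hypothetical table tbl with a scalar column id
and a struct column s:
  SET EXPAND_COMPLEX_TYPES=1;
  SELECT * FROM tbl;    -- now also lists the complex-typed column s
  SELECT s.* FROM tbl;  -- enumerates the members of the struct s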
Testing:
- Analyzer tests check select statements when the query option is
enabled or disabled.
- EE tests check the proper complex type deserialization when the query
option is enabled, and the original behaviour when the option is
disabled.
Change-Id: I84b5e5703f9e0ce0f4f8bff83941677dd7489974
Reviewed-on: http://gerrit.cloudera.org:8080/18863
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Non-matching rows from the left side will null out all slots from the
right side in left outer joins. If the right side is a subquery, it
is possible that some returned expressions will be non-NULL even if all
slots are NULL (e.g. constants) - these expressions are wrapped as
IF(TupleIsNull(tids), NULL, expr) to null them in the non-matching
case.
The logic above used to hit a Precondition failure for complex types.
We can safely ignore complex types for now, as currently the only
possible expression that returns a complex type is SlotRef, which
doesn't need to be wrapped. We will have to revisit this once functions
that return complex types are added.
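A minimal sketch of the kind of query involved (table and column names
are hypothetical):
  SELECT t1.id, v.c, v.arr
  FROM t1 LEFT OUTER JOIN
       (SELECT id, 1 AS c, arr FROM t2) v ON t1.id = v.id;
  -- The constant 1 AS c is wrapped as IF(TupleIsNull(...), NULL, 1) so
  -- it becomes NULL for non-matching rows; the complex-typed v.arr is a
  -- SlotRef and needs no wrapping.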
Testing:
- added a regression test and ran it
Change-Id: Iaa8991cd4448d5c7ef7f44f73ee07e2a2b6f37ce
Reviewed-on: http://gerrit.cloudera.org:8080/18954
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Added support for multiple file formats. Previously Impala created a
Scanner class based on the partition's file format; now, in the case of
an Iceberg table, it reads the file format from the file-level
metadata instead.
IcebergScanNode will aggregate file formats as well instead of relying
on partitions, so it can be used for planning.
Testing:
Created a mixed file format table with Hive and added a test for it.
Change-Id: Ifc816595724e8fd2c885c6664f790af61ddf5c07
Reviewed-on: http://gerrit.cloudera.org:8080/18935
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adding support for MAP types in the select list.
An example of how maps are printed:
{"k1":2,"k2":null}
Nested collection types (maps and arrays) are supported in any
combination. However, structs in collections and collections in structs
are not supported.
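For example, a query like the following returns maps in the format
shown above (the table and column names are illustrative):
  SELECT id, int_map FROM complextypestbl;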
Limitations (other than map support) as described in the commit for
IMPALA-9498 still apply, the following are to be implemented later:
- Unify HS2 / Beeswax logic with the way STRUCTs are handled.
This could be done in a "final" logic that can handle
STRUCTs/ARRAYs nested into each other.
- Implement "deep copy" and "deep serialize" for collections in BE.
This would enable all operators, e.g. ORDER BY and UNION.
Testing:
- modified the FE tests that checked that maps were not allowed in the
  select list - now the tests expect maps to be allowed there
- added FE and EE tests involving maps based on the array tests
Change-Id: I921c647f1779add36e7f5df4ce6ca237dcfaf001
Reviewed-on: http://gerrit.cloudera.org:8080/18736
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For Iceberg tables, when one of the following properties is used, the
table is considered to possibly have data outside the table location
directory:
- 'write.object-storage.enabled' is true
- 'write.data.path' is not empty
- 'write.location-provider.impl' is configured
- 'write.object-storage.path'(Deprecated) is not empty
- 'write.folder-storage.path'(Deprecated) is not empty
We should tolerate the situation where the relative path of a data file
cannot be derived from the table location path, and use the absolute
path in that case. E.g. an ETL program may write a table whose Iceberg
metadata is placed in
'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl/metadata',
whose recent data files are in
'hdfs://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data', and
whose data files from half a year ago are in
's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data'; such a
table should still be queryable by Impala.
Testing:
- added e2e tests
Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
Reviewed-on: http://gerrit.cloudera.org:8080/18894
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.
When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').
This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:
        UNION
       /     \
  SCAN data   ANTI JOIN
             /         \
       SCAN data    SCAN deletes
Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.
Testing:
* added planner test
* added e2e tests
TODO in follow-up Jiras:
* better cardinality estimates (IMPALA-11516)
* support unrelative collection columns (select item from t.int_array)
(IMPALA-11517)
Currently such queries return an error during analysis
Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Reviewed-on: http://gerrit.cloudera.org:8080/18847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11350 implemented the FILE__POSITION virtual column for Parquet
files. This ticket does the same but for ORC files. Note that for full
ACID ORC tables there was already an implementation of row__id that
could simply be reused for this ticket.
Testing:
- TestScannersVirtualColumns.test_virtual_column_file_position_generic
  is changed to run on ORC as well. I don't think further testing
  is required, as this functionality was already there for row__id;
  we just reused it for FILE__POSITION.
Change-Id: Ie8e951f73ceb910d64cd149192853a4a2131f79b
Reviewed-on: http://gerrit.cloudera.org:8080/18909
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-9482 added support for the remaining Hive types and removed the
functional.unsupported_types table. There was a remaining reference in
a misc test. test_misc is not marked as exhaustive, but it only runs in
exhaustive builds.
Change-Id: I65b6ea5ac742fbcc427ad41741d347558cb7d110
Reviewed-on: http://gerrit.cloudera.org:8080/18896
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fix an impalad crash in the method ParquetBoolDecoder::SkipValues when
the parameter 'num_values' is 0. The function should tolerate
'num_values' being 0.
Testing:
- Add e2e tests
Change-Id: I8c4c5a4dff9e9e75913c7b524b4ae70967febb37
Reviewed-on: http://gerrit.cloudera.org:8080/18854
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for BINARY columns for all table formats with
the exception of Kudu.
In Hive the main difference between STRING and BINARY is that STRING is
assumed to be UTF8 encoded, while BINARY can be any byte array.
Some other differences in Hive:
- BINARY can be only cast from/to STRING
- Only a small subset of built-in STRING functions support BINARY.
- In several file formats (e.g. text) BINARY is base64 encoded.
- No NDV is calculated during COMPUTE STATISTICS.
As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly
identical, especially from the backend's perspective. For this reason,
BINARY is implemented a bit differently compared to other types:
while the frontend treats STRING and BINARY as two separate types, most
of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g.
in SlotDesc. Only the following parts of backend need to differentiate
between STRING and BINARY:
- table scanners
- table writers
- HS2/Beeswax service
These parts have access to the column metadata, which allows adding
special handling for BINARY.
Only a few built-ins are allowed for BINARY at the moment:
- length
- min/max/count
- coalesce and similar "selector" functions
Other STRING functions can be only used by casting to STRING first.
Adding support for more of these functions is very easy, as the BINARY
type simply has to be "connected" to the already existing STRING
function's signature. Functions whose result depends on utf8_mode need
to ensure that with BINARY they always work as if utf8_mode=0 (for
example, length() is mapped to bytes(), as length() counts UTF-8
characters if utf8_mode=1).
All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY,
though in the case of legacy Hive UDFs it is only supported if the
argument and return types are set explicitly to ensure backward
compatibility.
See IMPALA-11340 for details.
The original plan was to behave as closely to Hive as possible, but I
realized that Hive has more relaxed casting rules than Impala, which
leads to STRING<->BINARY casts being necessary in more cases in Impala.
This was needed to disallow passing a BINARY to functions that expect
a STRING argument. An example of the difference is that in
INSERT ... VALUES () string literals need to be explicitly cast to
BINARY, while this is not needed in Hive.
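For illustration (the table and column names are hypothetical):
  INSERT INTO binary_tbl VALUES (1, CAST('abc' AS BINARY));
  -- length() is directly supported for BINARY; other STRING functions
  -- need an explicit cast to STRING first:
  SELECT length(bin_col), upper(CAST(bin_col AS STRING))
  FROM binary_tbl;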
Testing:
- Added functional.binary_tbl for all file formats (except Kudu)
to test scanning.
- Removed functional.unsupported_types and related tests, as now
Impala supports all (non-complex) types that Hive does.
- Added FE/EE tests mainly based on the ones added to the DATE type
Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Reviewed-on: http://gerrit.cloudera.org:8080/16066
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The virtual column FILE__POSITION returns the ordinal position of the
row in the data file. It will be useful for adding support for
Iceberg's position-based delete files.
This patch only adds FILE__POSITION to Parquet tables. It works
similarly to the handling of collection position slots. I.e. we
add the responsibility of dealing with the file position slot to
an existing column reader. Because of page-filtering and late
materialization we already tracked the file position in member
'current_row_' during scanning.
Querying the FILE__POSITION in other file formats raises an error.
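For example (the table name is illustrative):
  SELECT file__position, id FROM parquet_tbl;
  -- returns, for each row, its ordinal position within its data file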
Testing:
* added e2e tests
Change-Id: I4ef72c683d0d5ae2898bca36fa87e74b663671f7
Reviewed-on: http://gerrit.cloudera.org:8080/18704
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In the case of INSERT INTO iceberg_tbl (col_a, col_b, ...), if the
partition columns of the Iceberg table are not in the column
permutation, we fill the missing partition columns with NullLiteral so
that the data is written to the default partition
'__HIVE_DEFAULT_PARTITION__'.
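For example (the table is hypothetical, partitioned on column p):
  INSERT INTO ice_tbl (id) VALUES (1);
  -- p is not in the column permutation, so it is filled with NULL and
  -- the row is written to '__HIVE_DEFAULT_PARTITION__'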
Testing:
- add e2e tests
Change-Id: I40c733755d65e5c81a12ffe09b6d16ed5d115368
Reviewed-on: http://gerrit.cloudera.org:8080/18790
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch bumps up CDP_BUILD_NUMBER to pick up Hive version
3.1.3000.7.2.16.0-127 which contains:
- thrift-0.16.0 upgrade from HIVE-25635.
- Backport of ORC-517.
This patch also contains a fix for IMPALA-11466 by adding jetty-server
as an allowed dependency.
Testing:
- Built locally and confirmed that the CDP components are downloaded.
Change-Id: Iff5297a48865fb2444e8ef7b9881536dc1bbf63c
Reviewed-on: http://gerrit.cloudera.org:8080/18803
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Iceberg tables store partition information in manifest files and not in
the file path. This metadata has already been pushed down to the
scanners and this commit uses this metadata to evaluate runtime filters
on Iceberg files.
Performance measurement:
Used TPC-DS Q10 [1] at scale 10 to measure the query performance.
Min/max filters were disabled and the wait time for runtime filters
was increased to 5 seconds. After pre-warming the Catalog I executed
Q10 5 times on my local machine. The fastest execution times were:
Baseline Parquet tables: 1.08s
Baseline Iceberg tables without this patch: 1.43s
Iceberg tables with this patch: 1.09s
Testing:
* Added e2e tests.
* Initial performance test with TPC-DS Q10.
Ref:
[1] TPC-DS Q10:
select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1,
cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3,
cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5,
cd_dep_college_count, count(*) cnt6
from customer c, customer_address ca, customer_demographics
where c.c_current_addr_sk = ca.ca_address_sk and
ca_county in ('Walker County','Richland County','Gaines County',
'Douglas County','Dona Ana County') and
cd_demo_sk = c.c_current_cdemo_sk and
exists (select *
from store_sales, date_dim
where c.c_customer_sk = ss_customer_sk and
ss_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 4 and 4+3) and
exists (select *
from (select ws_bill_customer_sk as customer_sk, d_year,d_moy
from web_sales, date_dim where ws_sold_date_sk = d_date_sk
and d_year = 2002 and
d_moy between 4 and 4+3
union all
select cs_ship_customer_sk as customer_sk, d_year, d_moy
from catalog_sales, date_dim
where cs_sold_date_sk = d_date_sk and d_year = 2002 and
d_moy between 4 and 4+3
) x
where c.c_customer_sk = customer_sk)
group by cd_gender, cd_marital_status, cd_education_status,
cd_purchase_estimate, cd_credit_rating, cd_dep_count,
cd_dep_employed_count, cd_dep_college_count
order by cd_gender, cd_marital_status, cd_education_status,
cd_purchase_estimate, cd_credit_rating, cd_dep_count,
cd_dep_employed_count, cd_dep_college_count
limit 100;
Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
Reviewed-on: http://gerrit.cloudera.org:8080/18531
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change has been considered only for Iceberg tables mainly for table
maintenance reasons. Iceberg table writes create new snapshots and these
can accumulate over time. This commit allows a simple form of compaction
of these snapshots.
INSERT OVERWRITEs are blocked when partition evolution is in
place, because it would be possible to overwrite a data file with a
newer schema that has fewer columns. This could cause unexpected data
loss.
For bucketed tables, the following syntax is allowed to be executed:
INSERT OVERWRITE ice_tbl SELECT * FROM ice_tbl;
The source and target tables have to be the same and explicitly
specified; only SELECT '*' queries are allowed. These requirements are
also in place to avoid unexpected data loss.
- VALUES clauses are not allowed, because inserting a single record
  could overwrite a whole file in a bucket.
- Only a source table is allowed, because at the time of the insert it
  is unknown which files will be modified, similar to VALUES.
Testing:
- Added e2e tests.
Change-Id: Ibd1bc19d839297246eadeb754cdeeec1e306098a
Reviewed-on: http://gerrit.cloudera.org:8080/18649
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Extended the parser to support the 'STORED BY' keyword for storage
engines, namely Kudu and Iceberg at the moment, to match Hive's
syntax. Furthermore, this lays the groundwork for separating
storage engines from file formats, to be able to specify both with
"STORED BY ... STORED AS ...".
Testing:
Added front-end Parser and Analyzer tests and query tests for table
creation.
Change-Id: Ib677bea8e582bbc01c5fb8c81df57eb60b0ed961
Reviewed-on: http://gerrit.cloudera.org:8080/18743
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Because the column value bounds in the Iceberg metadata are not
necessarily actual min or max values, NOT_IN cannot be answered using
them: for a column with bounds (X, Y), NOT_IN(col, {X, ...}) doesn't
guarantee that X is a value present in col. But the push-down still
works when the column is a partition column, so it's still very
helpful.
Testing:
- add e2e tests
Change-Id: Ib8bdaf6f31a4438e11c4eb27485bb413fe6df9a3
Reviewed-on: http://gerrit.cloudera.org:8080/18760
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Arrays with more than one dimension in the select list tried to
register a CollectionTableRef with the name "item" for the inner
arrays, leading to a name collision if there was more than one such
array.
The logic is changed to always use the full path as the implicit alias
of the CollectionTableRefs backing arrays in the select list.
As a side effect this leads to using the fully qualified names
in expressions in the explain plans of queries that use arrays
from views. This is not an intended change, but I don't consider
it to be critical. Created IMPALA-11452 to deal with more
sophisticated alias handling in collections.
Testing:
- added a new table to testdata and a regression test
Change-Id: I6f2b6cad51fa25a6f6932420eccf1b0a964d5e4e
Reviewed-on: http://gerrit.cloudera.org:8080/18734
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When adding a partition whose location is on a file system different
from the file system of the table location, Impala accepts it. But
when inserting values into the table, catalogd throws an exception.
This patch fixes the issue by using the right FileSystem object.
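For example (table, partition and path are illustrative; the table
location is assumed to be on HDFS):
  ALTER TABLE tbl ADD PARTITION (p=1) LOCATION 's3a://bucket/tbl/p1';
  INSERT INTO tbl PARTITION (p=1) VALUES (1);
  -- the INSERT used to make catalogd throw an exception because the
  -- wrong FileSystem object was used for the partition location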
Testing:
- Added new test case with partitions on different file systems.
Ran the test on S3.
- Did manual tests in cluster with partitions on HDFS and Ozone.
- Passed core test.
Change-Id: I0491ee1bf40c3d5240f9124cef3f3169c44a8267
Reviewed-on: http://gerrit.cloudera.org:8080/18759
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch catalogd always ordered HBase columns
lexicographically by family/qualifier. This is incompatible with other
table formats and the way Hive handles HBase tables, where the order
comes from HMS as defined during CREATE TABLE.
I don't know of any valid reason behind this old behavior; it probably
just made the implementation a bit easier by doing the ordering in the
FE instead of the BE. The BE actually needs this ordering during
scanning, as the HBase API returns results in this order, but this
should have no effect on other parts of Impala.
Added flag use_hms_column_order_for_hbase_tables (used by catalogd)
to decide whether to do this reordering:
- true: keep HMS order
- false: reorder by family/qualifier [default]
The old way is kept as default to avoid breaking existing workloads,
but it would make sense to change it in the next major release.
Note that a query option would be more convenient to use, but it
would be much harder to implement it as the order is decided during
loading in catalogd.
Testing:
- added custom cluster test for
use_hms_column_order_for_hbase_tables = true
Change-Id: Ibc5df8b803f2ae3b93951765326cdaea706e3563
Reviewed-on: http://gerrit.cloudera.org:8080/18635
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The DESCRIBE FORMATTED output shows this even for bucketed Iceberg
tables:
| Num Buckets: | 0 | NULL |
| Bucket Columns: | [] | NULL |
We should remove them, and the user should rely on the information in
the '# Partition Transform Information' block instead.
Testing:
- add e2e tests
- tested in a real cluster
Change-Id: Idc156c932780f0f12c935a1a60ff6606d59bb1da
Reviewed-on: http://gerrit.cloudera.org:8080/18735
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
KUDU-2671 added support for custom hash partition specification at the
range partition level. This patch adds CREATE TABLE and ALTER TABLE
syntax to allow Kudu custom hash schema to be specified through Impala.
In addition, a new SHOW HASH SCHEMA statement has been added to allow
display of the hash schema information for each partition.
HASH syntax within a partition is similar to the table-level syntax
except that HASH clauses must follow the PARTITION clause and commas are
not allowed within a partition. These differences were required to keep
the grammar unambiguous and due to limitations of the Java Cup Parser.
To make the grammar more consistent, commas in the table-level
partition spec and between PARTITION clauses are now optional, but
still allowed for backward compatibility.
Example:
CREATE TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2))
PARTITION BY HASH(id) PARTITIONS 3 HASH(c2) PARTITIONS 4
RANGE (c2)
(
PARTITION 0 <= VALUES < 10
PARTITION 10 <= VALUES < 20
HASH(id) PARTITIONS 2 HASH(c2) PARTITIONS 3
PARTITION 20 <= VALUES < 30
)
STORED AS KUDU;
ALTER TABLE t1 ADD RANGE PARTITION 30 <= VALUES < 40
HASH(id) PARTITIONS 3 HASH(c2) PARTITIONS 4;
This bumps the toolchain to Kudu githash 43ee785b2d to get
the needed Kudu-side changes.
Testing:
Tests added to kudu_partition_ddl.test
Change-Id: I981056e0827f4957580706d6e73742e4e6743c1c
Reviewed-on: http://gerrit.cloudera.org:8080/18676
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
When external tables are converted to Iceberg, the data files remain
intact and thus miss field IDs. Previously, Impala used name-based
column resolution in this case.
Added a feature to traverse the data files before column
resolution and assign field IDs the same way as Iceberg would, to be
able to use field-ID-based column resolution.
Testing:
The default resolution method was changed to field ID for migrated
tables; existing tests use that from now on.
Added new tests to cover edge cases with complex types and schema
evolution.
Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With PARQUET_LATE_MATERIALIZATION we can set the minimum number of
consecutive rows that, if filtered out, let us avoid materializing the
corresponding rows of the other columns in Parquet.
E.g. if PARQUET_LATE_MATERIALIZATION is 10, and in a filtered column we
find at least 10 consecutive rows that don't pass the predicates we
avoid materializing the corresponding rows in the other columns.
But due to an off-by-one error we actually only needed
(PARQUET_LATE_MATERIALIZATION - 1) consecutive elements. This means
that if we set PARQUET_LATE_MATERIALIZATION to one, then we need zero
consecutive filtered-out elements, which leads to a crash/DCHECK. The
bug is in the GetMicroBatches() algorithm, where we produce the micro
batches based on the selected rows.
Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense so it
shouldn't be allowed.
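A sketch of the intended behavior after this patch (option values are
illustrative):
  SET PARQUET_LATE_MATERIALIZATION=1;  -- used to hit the DCHECK, now works
  SET PARQUET_LATE_MATERIALIZATION=0;  -- no longer allowed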
Testing
* e2e test with PARQUET_LATE_MATERIALIZATION=1
* e2e test for checking SET PARQUET_LATE_MATERIALIZATION=N
Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Reviewed-on: http://gerrit.cloudera.org:8080/18700
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit implements cloning between Iceberg tables. Cloning an
Iceberg table from other types of tables is not implemented, because
the data types of Iceberg and Impala do not correspond one to one.
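A minimal sketch, assuming the cloning is expressed with
CREATE TABLE ... LIKE (table names are hypothetical):
  CREATE TABLE new_ice LIKE old_ice;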
Testing:
- e2e tests
Change-Id: I1284b926f51158e221277b18b2e73707e29f86ac
Reviewed-on: http://gerrit.cloudera.org:8080/18658
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Currently, SHOW PARTITIONS on Iceberg tables only outputs the partition
spec, which is not too useful.
Instead it should output the concrete partitions, the number of files,
and the number of rows in each partition. E.g.:
SHOW PARTITIONS ice_ctas_hadoop_tables_part;
'{"d_month":"613"}',4,2
'{"d_month":"614"}',3,1
'{"d_month":"615"}',2,1
Testing:
- Added end-to-end test
Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Reviewed-on: http://gerrit.cloudera.org:8080/18641
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In the FOR SYSTEM_TIME AS OF clause we expect timestamps in the local
timezone, while the error message shows the timestamp in the UTC
timezone. The error message should show the timestamp in the local
timezone.
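For example, a time-travel query of this shape (table name and
timestamp are illustrative):
  SELECT * FROM ice_tbl FOR SYSTEM_TIME AS OF '2022-01-01 10:00:00';
  -- if no snapshot exists at or before this (local) time, the error
  -- message now reports the timestamp in the local timezone as well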
Testing:
- Add e2e test
Change-Id: Iba5d5eb65133f11cc4eb2fc15a19f7b25c14cc46
Reviewed-on: http://gerrit.cloudera.org:8080/18675
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit optimizes plain count(*) queries for Iceberg tables.
When the `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of
the Iceberg table, this kind of query can be very fast. If this
property cannot be retrieved, the query will aggregate the `num_rows`
of the Parquet `file_metadata_` as usual.
Queries that can be optimized need to meet the following requirements:
- SelectStmt does not have WHERE clause
- SelectStmt does not have GROUP BY clause
- SelectStmt does not have HAVING clause
- The TableRefs of FROM clause contains only one BaseTableRef
- Only for the Iceberg table
- SelectList must contain 'count(*)' or 'count(constant)'
- SelectList can contain other agg functions, e.g. min, sum, etc
- SelectList can contain constant
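For example, a query of this shape meets the requirements above and can
be answered from the snapshot summary (the table name is illustrative):
  SELECT count(*) FROM ice_tbl;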
Testing:
- Added end-to-end test
- Existing tests
- Test it in a real cluster
Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Reviewed-on: http://gerrit.cloudera.org:8080/18574
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends the support of struct columns in the select
list to Parquet files as well.
There are some limitations with this patch:
- Dictionary filtering could work when we have conjuncts on a member
  of a struct; however, if this struct is given in the select list
  then dictionary filtering is disabled. The reason is that in
  this case there would be a mismatch between the slot/tuple IDs in
  the conjunct and the ones in the select list, due to the expr
  substitution logic applied when a struct is in the select list.
  Solving this puzzle would be a nice future performance enhancement.
  See IMPALA-11361.
- When structs are read in a batched manner, the struct reader
  delegates the actual reading of the data to the column readers of its
  children; however, it uses the simple ReadValue() on these readers
  instead of the batched version. The reason is that calling the
  batched reader on the member column readers would in fact read in
  batches, but it wouldn't handle the case when the parent struct is
  NULL: it would set only itself to NULL but not the parent struct.
  This might also be a future performance enhancement. See
  IMPALA-11363.
- If there is a struct in the select list then late materialization
  is turned off. The reason is that LM expects the column readers to
  be used through the batched reading interface; however, as said in
  the bullet point above, struct column readers currently use the
  non-batched reading interface of their children. As a result, after
  reading, the column readers are not in the state that SkipRows() of
  LM expects, which results in a query failure because it's not able
  to skip the rows for non-filter readers.
  Once IMPALA-11363 is implemented and the struct also uses the
  ReadValueBatch() interface of its children, late
  materialization could be turned on even if structs are in the
  select list. See IMPALA-11364.
Testing:
- There were a lot of tests already exercising this functionality,
  but they were only run on ORC tables. I changed these to cover
  Parquet tables too.
Change-Id: I3e8b4cbc2c4d1dd5fbefb7c87dad8d4e6ac2f452
Reviewed-on: http://gerrit.cloudera.org:8080/18596
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fix some formatting bugs in the DESCRIBE FORMATTED output of Iceberg
tables:
- 'LINE_DELIM' is missing on '# Partition Transform Information'
- The partition transform columns header should be
'col_name,transform_type,NULL'
Testing:
- Existing tests
Change-Id: I991644cefb34decc843a5542b47eaec11d7b6e42
Reviewed-on: http://gerrit.cloudera.org:8080/18634
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive changed behavior in HIVE-24920 to allow external SFS tables to
point to locations in managed tables. test_sfs.py had a test case for
the previous failure, so it broke with this change.
This changes test_sfs.py to verify the new behavior where an external
SFS table can point to a managed location.
Testing:
- Ran test_sfs.py locally with recent GBN
Change-Id: I426aacdba7afba3f3a747a4e02632d15e38c63c9
Reviewed-on: http://gerrit.cloudera.org:8080/18618
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
and COVAR_POP()
CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
covariance between them.
COVAR_POP() function takes two numeric type columns and returns
population covariance between them.
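For example (table and column names are illustrative):
  SELECT corr(price, demand),
         covar_samp(price, demand),
         covar_pop(price, demand)
  FROM sales;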
These UDAFs are tested with a few query tests written in aggregation.test.
Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Reviewed-on: http://gerrit.cloudera.org:8080/18413
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The expectation for predicates on unnested arrays is that they are
either picked up by the SCAN node or the UNNEST node for evaluation. If
there is only one array being unnested then the SCAN node is
responsible for the evaluation, otherwise the UNNEST node is. However,
if there is a JOIN node involved where the JOIN construction happens
before creating the UNNEST node then the JOIN node incorrectly picks
up the predicates for the unnested arrays as well. This patch is to fix
this behaviour.
Tests:
- Added E2E tests to cover result correctness.
- Added planner tests to verify that the desired node picks up the
predicates for unnested arrays.
Change-Id: I89fed4eef220ca513b259f0e2649cdfbe43c797a
Reviewed-on: http://gerrit.cloudera.org:8080/18614
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11302: Improve error message for CREATE EXTERNAL TABLE iceberg
command
The error message complained about failed table loading. The new
error message is more precise, and also notifies the user to use
plain 'CREATE TABLE' for creating new Iceberg tables.
IMPALA-11303: Exception is not raised for Iceberg DDL that misses
LOCATION clause
We returned the error in the result summary. Instead of that, we now
raise an error, and also notify the user about how to create
new Iceberg tables.
Testing:
* e2e tests
Change-Id: I659115cc97a1a00e1ddf3fbb7dbe1f286bf1edcf
Reviewed-on: http://gerrit.cloudera.org:8080/18563
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Identity-partitioned columns are not necessarily stored in the data
files. E.g. when we migrate a legacy partitioned table to Iceberg
without rewriting the data files, the partition columns won't be
present in the files.
The Parquet scanner does a few optimizations to eliminate row groups,
i.e. filtering based on stats, bloom filters, etc. When a column that
has a predicate on it is not present in the data file, it is
assumed that the whole row group doesn't pass the filtering criteria.
But for Iceberg some files might contain partition columns while
other files don't, so we need to prepare the scanners to handle
such cases.
The ORC scanner doesn't have that many optimizations, so it didn't
run into this issue.
Testing:
* e2e tests
Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Reviewed-on: http://gerrit.cloudera.org:8080/18605
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
Reviewed-by: <lipenglin@sensorsdata.cn>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive has the virtual column INPUT__FILE__NAME, which returns the name
of the data file that stores the actual row. It can be used in several
ways, see the above two Jira tickets for examples. This virtual column
is also needed to support position-based delete files in Iceberg V2
tables.
This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also store the information whether they refer to a virtual
column.
Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.
All kinds of operations are possible on this virtual column, users
can invoke additional functions on it, can filter rows, can group by,
etc.
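For example (the table name is illustrative):
  SELECT input__file__name, count(*)
  FROM tbl
  GROUP BY input__file__name;
  -- counts the rows stored in each data file of the table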
Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.
Testing:
* analyzer tests
* added e2e tests
Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Reviewed-on: http://gerrit.cloudera.org:8080/18514
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-10182 fixed the problem of creating inferred predicates when
both sides of an equality predicate came from the same slot.
Inferred predicates also should not be created when both sides
of an equality predicate are constant values which do not have
scan slots.
Change-Id: If1cd4559dda406d2d38703ed594b70b41ed336fd
Reviewed-on: http://gerrit.cloudera.org:8080/18579
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Added a query option and implementation to be able to resolve columns
by name.
Changed the secondary resolution strategy for Iceberg ORC tables to
name-based resolution.
Testing:
Added a new test dimension for ORC tests; added results to the
now-working Iceberg migrated table test.
Change-Id: I29562a059160c19eb58ccea76aa959d2e408f8de
Reviewed-on: http://gerrit.cloudera.org:8080/18397
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
CollectionItemsRead in the runtime profile counts the total number of
nested collection items read by the scan node. Only created for scans
that support nested types, e.g. Parquet or ORC.
Each scanner thread maintains its local counter and merges it into
HdfsScanNode counter for each row batch. However, the local counter in
orc-scanner is uninitialized, leading to weird values. This patch simply
initializes it to 0 and adds test coverage.
Tests:
Add profile verification for this counter on some existing query tests.
Note that there are some implementation differences between the Parquet
and ORC scanners (e.g. in predicate pushdown), so we will see different
counter results in some queries. I just picked some queries that have
consistent counters.
Change-Id: Id7783d1460ac9b98e94d3a31028b43f5a9884f99
Reviewed-on: http://gerrit.cloudera.org:8080/18528
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>