This patch adds support for querying Iceberg tables through Impala.
The following SQL can be used to create an external Iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or specify just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format used in Iceberg. Currently only
PARQUET is supported; other formats will be supported in the future. If
you don't specify this property in your SQL, the default file format
is PARQUET.
We implement this by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, we push down
partition column predicates to Iceberg to decide which data files
need to be scanned, and then transfer this information to the BE to
perform the actual scan.
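For illustration (assuming 'event_time' is a partition column in the
table's Iceberg partition spec), a query like the one below lets Iceberg
prune data files before the BE scan:
SELECT message FROM default.iceberg_test
WHERE event_time >= '2020-01-01 00:00:00';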
Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in test_scanners.py
Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Reviewed-on: http://gerrit.cloudera.org:8080/16143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:
Before constant propagation:
WHERE date_col = CAST(timestamp_col as DATE)
AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
After constant propagation:
WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
AND date_col = CAST(timestamp_col as DATE)
As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.
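As a minimal sketch (the partitioned table name is hypothetical), a
query of the following shape can now prune date partitions even though
the user only constrained the timestamp column:
-- 'sales_part' is assumed to be partitioned by date_col.
SELECT count(*) FROM sales_part
WHERE date_col = CAST(timestamp_col AS DATE)
AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01';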
Existing code for equality based constant propagation was refactored
and consolidated into a new class which handles both equality and
range based constant propagation. Range based propagation is only
applied to date and timestamp columns.
Testing:
- Added new range constant propagation tests to PlannerTest.
- Added e2e test for range constant propagation based on a newly
added date partitioned table.
- Ran precommit tests.
Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This implements scanning full ACID tables that contain complex types.
The same technique we use for primitive types works here as well: we add
a LEFT ANTI JOIN on top of the HDFS scan node in order to subtract
the deleted rows from the inserted rows.
However, there were some types of queries where we couldn't do that.
These are the queries that scan the nested collection items directly.
E.g.: SELECT item FROM complextypestbl.int_array;
The above query only creates a single tuple descriptor that holds the
collection items. Since this tuple descriptor is not at the table-level,
we cannot add slot references to the hidden ACID columns, which are at
the top level of the table schema.
To resolve this I added a statement rewriter that rewrites the above
statement to the following:
SELECT item FROM complextypestbl $a$1, $a$1.int_array;
Now in this example we'll have two tuple descriptors, one for the
table-level, and one for the collection item. So we can add the ACID
slot refs to the table-level tuple descriptor. The rewrite is
implemented by the new AcidRewriter class.
Performance
I executed the following query with num_nodes=1 on a non-transactional
table (without the rewrite), and on an ACID table (with the rewrite):
select count(*) from customer_nested.c_orders.o_lineitems;
Without the rewrite:
Fetched 1 row(s) in 0.41s
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| F00:ROOT | 1 | 1 | 13.61us | 13.61us | | | 0 B | 0 B | |
| 01:AGGREGATE | 1 | 1 | 3.68ms | 3.68ms | 1 | 1 | 16.00 KB | 10.00 MB | FINALIZE |
| 00:SCAN HDFS | 1 | 1 | 280.47ms | 280.47ms | 6.00M | 15.00M | 56.98 MB | 8.00 MB | tpch_nested_orc_def.customer.c_orders.o_lineitems |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
With the rewrite:
Fetched 1 row(s) in 0.42s
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| F00:ROOT | 1 | 1 | 25.16us | 25.16us | | | 0 B | 0 B | |
| 05:AGGREGATE | 1 | 1 | 3.44ms | 3.44ms | 1 | 1 | 63.00 KB | 10.00 MB | FINALIZE |
| 01:SUBPLAN | 1 | 1 | 16.52ms | 16.52ms | 6.00M | 125.92M | 47.00 KB | 0 B | |
| |--04:NESTED LOOP JOIN | 1 | 1 | 188.47ms | 188.47ms | 0 | 10 | 24.00 KB | 12 B | CROSS JOIN |
| | |--02:SINGULAR ROW SRC | 1 | 1 | 0ns | 0ns | 0 | 1 | 0 B | 0 B | |
| | 03:UNNEST | 1 | 1 | 25.37ms | 25.37ms | 0 | 10 | 0 B | 0 B | $a$1.c_orders.o_lineitems o_lineitems |
| 00:SCAN HDFS | 1 | 1 | 96.26ms | 96.26ms | 100.00K | 12.59M | 38.19 MB | 72.00 MB | default.customer_nested $a$1 |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
So the overhead is minimal.
Testing
* Added planner tests to PlannerTest/acid-scans.test
* E2E query tests to QueryTest/full-acid-complex-type-scans.test
* E2E tests for rowid-generation: QueryTest/full-acid-rowid.test
Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Reviewed-on: http://gerrit.cloudera.org:8080/16228
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves this by assigning a unique row id to each row and
maintaining two sets of files in a table. The first set lives in the
base/delta directories and contains the INSERTed rows. The second set
of files lives in the delete-delta directories and contains the DELETEd
rows.
(UPDATE operations are implemented via DELETE+INSERT.)
In the filesystem it looks like e.g.:
* full_acid/delta_0000001_0000001_0000/0000_0
* full_acid/delta_0000002_0000002_0000/0000_0
* full_acid/delete_delta_0000003_0000003_0000/0000_0
During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements this by creating an ANTI JOIN between the INSERT
and DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans a full ACID table (that also has deleted rows) is converted
into two HDFS SCANs, one for the INSERT deltas and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.
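Conceptually (this is only a sketch of the plan shape, not actual
planner output; 'insert_deltas' and 'delete_deltas' stand in for the two
scans of the same table), the result is equivalent to:
SELECT ins.*
FROM insert_deltas ins
LEFT ANTI JOIN delete_deltas del
ON ins.originalTransaction = del.originalTransaction
AND ins.bucket = del.bucket
AND ins.rowId = del.rowId;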
Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.
The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.
Testing:
* added planner test
* added e2e tests
Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Extend StmtRewriter with the ability to rewrite scalar subqueries in the
select list into cross joins. Currently the subquery must pass plan-time
checks to determine that it returns a single row, which may miss cases
that would be valid at runtime or with more complex evaluation of the
predicate expressions in the planner. Support for correlated subqueries
will be a follow-on change.
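A minimal example of a statement the rewrite now handles (table and
column names are hypothetical); the uncorrelated scalar subquery in the
select list is turned into a cross join with the outer query block:
SELECT t.id, (SELECT max(v) FROM dim) AS max_v
FROM fact t;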
Testing:
* Added new analyzer tests, updated previous subquery tests
* test_queries.py::TestQueries::test_subquery
* Added test_tpcds_q9 to e2e and planner tests
Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Reviewed-on: http://gerrit.cloudera.org:8080/16007
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This patch adds the following 12 TPCDS queries to the class of
TestTpcdsDecimalV2Query: Q26, Q30, Q31, Q47, Q48, Q57, Q58, Q59, Q63,
Q83, Q85, and Q89. All the queries except for Q31 are added to the class
of TestTpcdsQuery as well because Impala returns one fewer row than
expected for TestTpcdsQuery::test_tpcds_q31(), which requires further
investigation.
To verify whether or not the returned result set from Impala for a given
query is correct, we compare the result set with that produced by the
HiveServer2 (HS2) in Impala's mini-cluster. SQL statements can be
executed in HS2 via Beeline, HS2's command line shell, which can be
launched with the following command.
beeline -u "jdbc:hive2://localhost:11050/default"
We note that among these 12 queries, the execution of Q31, Q58, and Q83
results in the "Counters limit exceeded" error from Tez. To work around
this problem, for these 3 queries we have to execute the following
statement before running them to increase the default number of
counters, which is set to 120.
set tez.counters.max=1200
In addition, the 'reason' table is referenced by Q85. This
table was not referenced by any TPCDS query before this patch and thus
was not created. Therefore, this patch also modifies
tpcds_schema_template.sql to create this additional table along with its
data.
Testing:
- Verified that this patch passes the exhaustive tests in the DEBUG
build.
Change-Id: Ib5f260e75a3803aabe9ccef271ba94036f96e5cf
Reviewed-on: http://gerrit.cloudera.org:8080/16119
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
"Original files" are files that don't have full ACID schema. We can see
such files if we upgrade a non-ACID table to full ACID. Also, the LOAD
DATA statement can load non-ACID files into full ACID tables. So such
files don't store the special ACID columns, which means we need
to auto-generate their values. These columns are operation,
originalTransaction, bucket, rowid, and currentTransaction.
With the exception of 'rowid', all of them can be calculated based on
the file path, so I add their values to the scanner's template tuple.
'rowid' is the ordinal number of the row inside a bucket inside a
directory. For now Impala only allows one file per bucket per
directory. Therefore we can generate row ids for each file
independently.
Multiple files in a single bucket in a directory can only be present if
the table was non-transactional earlier and we upgraded it to full ACID
table. After the first compaction we should only see one original file
per bucket per directory.
In HdfsOrcScanner we calculate the first row id for our split, then
the OrcStructReader fills the rowid slot with the proper values.
Testing:
* added e2e tests to check if the generated values are correct
* added e2e test to reject tables that have multiple files per bucket
* added unit tests to the new auxiliary functions
Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Reviewed-on: http://gerrit.cloudera.org:8080/16001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala stores timestamps as the 12-byte TimestampValue structure with
nanosecond precision. Kudu stores timestamps as 8-byte Unix time in
microseconds. To avoid data truncation issues in the bloom filter, add
a FunctionCallExpr with 'utc_to_unix_micros' as the root of the bloom
filter's source expression to convert timestamp values to microseconds
when building a timestamp bloom filter for Kudu.
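As a standalone illustration of the conversion (the planner inserts it
into the filter's source expression internally; it is not required in
user SQL):
SELECT utc_to_unix_micros(CAST('2020-01-01 00:00:00' AS TIMESTAMP));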
Generated functional date_tbl table in Kudu format for unit-test.
Added new test cases for Kudu Timestamp and Date bloom filters.
Testing:
Passed all core tests.
Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Reviewed-on: http://gerrit.cloudera.org:8080/16094
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is no longer built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Updated CDP build to 7.2.1.0-57 to include new Hive features such as
HIVE-22995.
In the minicluster, hive.create.as.acid and hive.create.as.insert.only
default to false, so by default Hive creates external tables located in
the external warehouse directory.
Due to HIVE-22995, DESCRIBE DATABASE returns the external warehouse
directory.
For these reasons, we need to use the external warehouse dir in some
tests.
Also add a new test for "CREATE DATABASE ... LOCATION".
Tested:
Re-run failed test in minicluster.
Run exhaustive tests.
Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f
Reviewed-on: http://gerrit.cloudera.org:8080/15990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This sets hive.optimize.sort.dynamic.partition to true when loading
tpcds.store_sales. This option takes effect during Hive dynamic partitioning
inserts. It introduces a sort into the insert query so that all data is
sorted on the partition key. This allows the reducers to only open a single
file at a time when writing out files.
When this config is set to false, Hive will write to multiple partitions
at the same time. So a single Hive container will have multiple file
handles open at once. This can lead to OOM issues on the Hive side as well
as disk space issues with HDFS. When a file is opened on HDFS, the
Namenode reserves an entire block for each file, even if the resulting
file is less than a block size. If there isn't enough disk space for all
file reservations, inserts will start failing because HDFS says there is
not enough capacity on the cluster.
The change is only necessary when loading tpcds.store_sales. Adding it
to other dynamic partitioning inserts does not seem to be necessary. It
is likely that the issue only shows up when reading from an
unpartitioned table and inserting into a partitioned table. In this
case, loading tpcds.store_sales requires reading from
tpcds_unpartitioned.store_sales. The other dynamic partitioning inserts
all read from a partitioned table and write to a partitioned table.
This patch does not introduce a significant performance regression to
the runtime of data-load generation.
Testing:
* Ran core tests
* Ran core tests for Impala-EC
Change-Id: Ic2b7c0ec40a02da2640fae20cf640517fd1f4fef
Reviewed-on: http://gerrit.cloudera.org:8080/15998
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.
This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.
Testing:
- The only impediment to building with
IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
Impala-lzo. With a custom Impala-lzo, compilation succeeds.
Either Impala-lzo will be fixed or it will be removed.
- Core tests
Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Minor compactions can compact several delta directories into a single
delta directory. The current directory filtering algorithm had to be
modified to handle minor compacted directories and prefer those over
plain delta directories. This happens in the Frontend, mostly in
AcidUtils.java.
Hive Streaming Ingestion writes similar delta directories, but they
might contain rows Impala cannot see based on its valid write id list.
E.g. we can have the following delta directory:
full_acid/delta_0000001_0000010/0000 # minWriteId: 1
# maxWriteId: 10
This delta dir contains rows with write ids between 1 and 10. But maybe
we are only allowed to see write ids less than 5. Therefore we need to
check the ACID write id column (named originalTransaction) to determine
which rows are valid.
Delta directories written by Hive Streaming don't have a visibility txn
id, so we can recognize them based on the directory name. If there's
a visibilityTxnId and it is committed => every row is valid:
full_acid/delta_0000001_0000010_v01234 # has visibilityTxnId
# every row is valid
If there's no visibilityTxnId then it was created via Hive Streaming,
therefore we need to validate rows. Fortunately Hive Streaming writes
rows with different write ids into different ORC stripes, therefore we
don't need to validate the write id per row. If we had statistics,
we could validate per stripe, but since Hive Streaming doesn't write
statistics we validate the write id per ORC row batch (an alternative
could be to do a 2-pass read, first we'd read a single value from each
stripe's 'currentTransaction' field, then we'd read the stripe if the
write id is valid).
Testing
* the frontend logic is tested in AcidUtilsTest
* the backend row validation is tested in test_acid_row_validation
Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
Reviewed-on: http://gerrit.cloudera.org:8080/15818
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- Modified test_insert.py so the tests can run in parallel.
- Every test now creates its own temporary tables for insert testing.
- Swapped out the SETUP tags for TRUNCATE TABLE query statements.
- Because the SETUP tag is not used anymore, the corresponding
  code was removed.
- Fixed a test query in insert.test. The test was incorrect, so it was
  modified to test for the right behavior.
Testing:
- tests/run-tests.py query_test/test_insert.py
- impala-py.test tests/query_test/test_insert.py
- the same for test_insert_permutation.py and test_load.py
Change-Id: I257e936868917a2fcc6c030f6c855b247e8a0eea
Reviewed-on: http://gerrit.cloudera.org:8080/15529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HIVE-22794 disallows ACID tables outside of the 'managed' warehouse
directory. This change updates data loading to make it conform to
the new rules.
The following tests had to be modified to use the new paths:
* AnalyzeDDLTest.TestCreateTableLikeFileOrc()
* create-table-like-file-orc.test
Change-Id: Id3b65f56bf7f225b1d29aa397f987fdd7eb7176c
Reviewed-on: http://gerrit.cloudera.org:8080/15708
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Full ACID row format looks like this:
{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}
User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.
Because of that in the Catalog I create the exact opposite of the above
schema:
{
  "row__id":
  {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1
  },
  "i": 1
}
This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.
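For example (the table name is illustrative), the hidden ACID columns
of a full ACID table can be listed with:
SELECT row__id.* FROM full_acid_tbl;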
We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.
Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.
Newly added Tests:
* specific queries about hidden columns (full-acid-rowid.test)
* SHOW CREATE TABLE (show-create-table-full-acid.test)
* DESCRIBE [FORMATTED] TABLE (describe-path.test)
* INSERT should be forbidden (acid-negative.test)
* added tests for column masking (
ranger_column_masking_complex_types.test)
Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Reviewed-on: http://gerrit.cloudera.org:8080/15395
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
String values from external systems (HDFS, Hive, Kudu, etc.) are already
unescaped, the same as string values in Thrift objects deserialized in
coordinators. We should mark needsUnescaping_ as false when creating
StringLiterals for these values (in LiteralExpr#create()).
When comparing StringLiterals in partition pruning, we should also use
the unescaped values if needsUnescaping_ is true.
Tests:
- Add tests for partition pruning on unescaped strings.
- Add test coverage for all existing code paths using
LiteralExpr#create().
- Run core tests
Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f
Reviewed-on: http://gerrit.cloudera.org:8080/15278
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The schema file allows specifying a commandline command in
several of the sections (LOAD, DEPENDENT_LOAD, etc). These
are executed by testdata/bin/generate-schema-statements.py
when it is creating the SQL files that are later executed
for dataload. A fair number of tables use this flexibility
to execute hdfs mkdir and copy commands via the command line.
Unfortunately, this is very inefficient. HDFS command line
commands require spinning up a JVM and can take over one
second per command. These commands are executed during a
serial part of dataload, and they can be executed multiple
times. In short, these commands are a significant slowdown
for loading the functional tables.
This converts the hdfs command line statements to equivalent
Hive LOAD DATA LOCAL statements. These are doing the copy
from an already running JVM, so they do not need JVM startup.
They also run in the parallel part of dataload, speeding up
the SQL generation part.
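A sketch of the kind of Hive statement that replaces an hdfs shell copy
(the path and table name are illustrative):
LOAD DATA LOCAL INPATH '/tmp/tbl_data.txt' INTO TABLE functional.tbl;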
This speeds up generate-schema-statements.py significantly.
On the functional dataset, it saves 7 minutes.
Before:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real 8m8.068s
user 10m11.218s
sys 0m44.932s
After:
time testdata/bin/generate-schema-statements.py -w functional-query -e exhaustive -f
real 0m35.800s
user 0m42.536s
sys 0m5.210s
This is currently a long-pole in dataload, so it translates directly to
an overall speedup of about 7 minutes.
Testing:
- Ran debug tests
Change-Id: Icf17b85ff85618933716a80f1ccd6701b07f464c
Reviewed-on: http://gerrit.cloudera.org:8080/15228
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A Hudi Read Optimized Table contains multiple versions of parquet files.
In order to load the table correctly, Impala needs to recognize a Hudi
Read Optimized Table as an HdfsTable and load the latest version of the
files using HoodieROTablePathFilter.
Tests
- Unit test for Hudi in FileMetadataLoader
- Create table tests in functional_schema_template.sql
- Query tests in hudi-parquet.test
Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Reviewed-on: http://gerrit.cloudera.org:8080/14711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Add test coverage for randomly corrupted ORC files by adding ORC to the
tests in test_scanners_fuzz.py. Also add two additional queries for nested
types.
Tests:
- Ran test_scanners_fuzz.py 780 rounds (took 43h).
- Ran test_scanners_fuzz.py for orc/def/block 1081 rounds (took 24h).
Change-Id: I3233e5d9f555029d954b5ddd5858ea194afc06bf
Reviewed-on: http://gerrit.cloudera.org:8080/15062
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Implements read path for the date type in ORC scanner. The internal
representation of a date is an int32 holding the number of days since
the Unix epoch using the proleptic Gregorian calendar.
Similarly to the Parquet implementation (IMPALA-7370) this
representation introduces an interoperability issue between Impala
and older versions of Hive (before 3.1). For more details see the
commit message of the mentioned Parquet implementation.
Change-Id: I672a2cdd2452a46b676e0e36942fd310f55c4956
Reviewed-on: http://gerrit.cloudera.org:8080/14982
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The goal is to let JDBC clients get constraint information
from Impala tables. We implement two new metadata operations in
impala-hs2-server, GetPrimaryKeys and GetCrossReference, which are
already implemented in Hive's HS2. The thrift
definitions are copied from Hive's TCLIService.thrift. In FE, these
two operations are implemented to get the information from tables
in the catalog.
Much like GetColumns(), tables need to be loaded in order to be able to get
PK/FK information. We wait for the PK table/FK table to load.
In the implementation, PK/FK information is returned
ONLY if the user has access to ALL the columns involved in the PK/FK
relationship.
Testing:
- Added three test tables to our test datasets since most of our FE tests
relied on dummy tables or testdata. It was difficult to test PK/FK with
these methods. Also, we can build on this testdata in future when we make
optimizer improvements.
- Added unit tests in AuthorizationTest and JDBCtest.
- Added e2e test in test_hs2.py
- This patch modifies AnalyzeDDLTests and ToSqlTests to rely on the newly
added dataset instead of dummy tables for pk/fk tests.
Caveats:
- Ranger needs OWNER user information for authorization. Since this is HMS
metadata that we do not aggressively load, this information is not available
for IncompleteTables. Some foreign key tables (fact tables for example)
might have FK/PK relationships with several PK tables, some of which might
not be loaded in the catalog. Currently we have no way to check column
privileges for tables without owner user information. We do not return keys
involving such columns. Therefore, when Ranger is used, there may be missing
PK/FK relationships for parent tables that are not loaded. This can be
tracked in IMPALA-9172.
- Retrieval of constraints is not yet supported in LocalCatalog mode. See
IMPALA-9158.
Change-Id: I8942dfbbd4a3be244eed1c61ac2ce17069960477
Reviewed-on: http://gerrit.cloudera.org:8080/14720
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the avro scanner.
Similarly to parquet, avro uses DATE logical type for dates. DATE
logical type annotates an INT32 that stores the number of days since
the unix epoch, 1 January 1970.
This representation introduces an avro interoperability issue between
Impala and older versions of Hive:
- Before version 3.1, Hive used Julian calendar to represent dates
up to 1582-10-05 and Gregorian calendar for dates starting with
1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses proleptic Gregorian calendar, extending the Gregorian
calendar backward to dates preceding its official introduction in
1582-10-15.
This means that pre-1582-10-15 dates written to an avro table by Hive
will be read back incorrectly by Impala.
Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.
Dependency changes:
- BE uses avro 1.7.4-p5 from native-toolchain.
Change-Id: I7a9d5b93a22cf3a00244037e187f8c145cacc959
Reviewed-on: http://gerrit.cloudera.org:8080/13944
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_acid_nonacid_insert has been failing lately. HMS became more
strict about checking the capabilities of its clients. It seems that
the Python client doesn't set any capabilities for itself, therefore
HMS rejects its attempts to create and drop tables.
Now instead of using the RESET utility from the e2e test framework
(to drop and re-create tables), the test is using a unique database
and creates the tables through Impala. Different file formats are
exercised with the help of the DEFAULT_FILE_FORMAT query option.
Change-Id: I3a82338a7820d0ee748c961c8656fa3319c3929c
Reviewed-on: http://gerrit.cloudera.org:8080/14064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support to Impala for scanning .DEFLATE files of
tables stored as text. To avoid confusion, it should be noted that
although these files have a compression type of DEFLATE in Impala,
they should be treated as if their compression type is DEFAULT.
Hadoop tools such as Hive and MapReduce support reading and writing
text files compressed using the deflate algorithm, which is the default
compression type. Hadoop uses the zlib library (an implementation of
the DEFLATE algorithm) to compress text files into .DEFLATE files,
which are not in the raw deflate format but rather the zlib format
(the zlib library supports three flavors of deflate, and Hadoop uses
the flavor that wraps the deflate-compressed data in the zlib format
rather than producing raw deflate).
Testing:
There is a pre-existing unit test that validates compressing and
decompressing data with compression type DEFLATE. Also, modified
existing end-to-end testing that simulates querying files of various
formats and compression types. All core and exhaustive tests pass.
Change-Id: I45e41ab5a12637d396fef0812a09d71fa839b27a
Reviewed-on: http://gerrit.cloudera.org:8080/13857
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This commit adds INSERT support for insert-only ACID tables.
The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.
The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set for the HDFS table sinks. The sinks write
the files at their final destination which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTS.
When the sinks have finished writing the data, the Coordinator invokes
updateCatalog() on catalogd, which also commits the transaction if
everything went well; otherwise the Coordinator aborts the transaction.
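A minimal sketch, assuming 'ins_only_acid' is an existing insert-only
ACID table; the frontend opens a transaction and allocates a write id,
the sink writes into a delta directory, and catalogd commits the
transaction during updateCatalog():
INSERT INTO ins_only_acid VALUES (1);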
Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
inserts data with Impala and reads with Hive. (These integration
tests only run with exhaustive exploration strategy)
TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
is handled correctly during SELECT, but it would be better to create a
base directory from the beginning. Hive creates a delta directory
for CTAS.
Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Reviewed-on: http://gerrit.cloudera.org:8080/13559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a method to check if a table is bucketed.
For Hive 3, it integrates with the HMS translation layer for
capability checks.
Implements the methods ensureTableWriteSupported and
ensureTableReadSupported.
Sets default capabilities for tables.
Tests:
Added unit tests to ParserTest and AnalyzerTest.
Added bucketed tables which are required by IMPALA-8439.
Ran core tests(Hive 2 and Hive 3)
ToDo:
Integrate checking bucketed tables capabilities and creating
error messages with HMS translation after Hive provides the
required functions.
Enable capabilities checking for Kudu tables.
When upgrading tables from non-ACID to ACID, the default
capabilities should be changed too. Currently, the workaround is to
explicitly set the OBJCAPABILITIES table property along with
the ACID properties.
Change-Id: Ia08d01168660830b6e0d08b55a95eac129889cec
Reviewed-on: http://gerrit.cloudera.org:8080/13558
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Instead of creating an in-memory instance of View, we were
creating an instance of HdfsTable. Modified the code to create an
instance of View for materialized views.
Testing Done:
- Added tests in AnalyzerTest.
Change-Id: Idcd619303e19b5a2551876a63d67569c76bd22f0
Reviewed-on: http://gerrit.cloudera.org:8080/13503
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Copied some code from Hive to identify whether a table is a
transactional, insert-only table.
Also modified code to prohibit write operations on insert-only tables.
That code will be reverted once we add support for write operations on
insert-only tables.
Testing Done:
- Added a new test in AnalyzerTest
Change-Id: I740dc4ce0dbbc0c2e042b01832e606cc1ac4132a
Reviewed-on: http://gerrit.cloudera.org:8080/13311
Tested-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Sudhanshu Arora <sudhanshu@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
This fixes three issues for functional dataset loading:
- works around HIVE-21675, a bug in which 'CREATE VIEW IF NOT EXISTS'
does not function correctly in our current Hive build. This has been
fixed already, but the workaround is pretty simple, and actually the
'drop and recreate' pattern is used more widely for data-loading than
the 'create if not exists' one.
- Moves the creation of the 'hive_index' table from
load-dependent-tables.sql to a new load-dependent-tables-hive2.sql
file which is only executed on Hive 2.
- Moving from MR to Tez execution changed the behavior of data loading
by disabling the auto-merging of small files. With Hive-on-MR, this
behavior defaulted to true, but with Hive-on-Tez it defaults to false.
The change is likely motivated by the fact that Tez automatically
groups small splits on the _input_ side and thus is less likely to
produce lots of small files. However, that grouping functionality
doesn't work properly in localhost clusters (TEZ-3310) so we aren't
seeing the benefit. So, this patch enables the post-process merging of
small files.
Prior to this change, the 'alltypesaggmultifilesnopart' test table was
getting 40+ files inside it, which broke various planner tests. With
the change, it gets the expected 4 files.
Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd
Reviewed-on: http://gerrit.cloudera.org:8080/13251
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change is a follow-up to IMPALA-7368 and adds support for DATE
type to the parquet scanner/writer. CREATE TABLE LIKE PARQUET
statements associated with data files that contain dates are also
supported.
Parquet uses DATE logical type for dates. DATE logical type annotates
an INT32 that stores the number of days from the Unix epoch, 1 January
1970.
This representation introduces a parquet interoperability issue
between Impala and older versions of Hive:
- Before version 3.1, Hive used Julian calendar to represent dates
up to 1582-10-05 and Gregorian calendar for dates starting with
1582-10-15. Dates between 1582-10-05 and 1582-10-15 were lost.
- Impala uses proleptic Gregorian calendar, extending the Gregorian
calendar backward to dates preceding its official introduction in
1582-10-15.
This means that pre-1582-10-15 dates written to a parquet table by
Hive will be read back incorrectly by Impala and vice versa.
Note that Hive 3.1 switched to proleptic Gregorian calendar too, so
for Hive 3.1+ this is no longer an issue.
Change-Id: I67da03754531660bc8de3b6935580d46deae1814
Reviewed-on: http://gerrit.cloudera.org:8080/13189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This workload was added in 2012 or so, and hasn't actually been able to
run in years:
- fails to load because the name has a '-' in it, and we use unquoted
database names in the load script
- the SQL to load it seems to refer to testdata files which don't
exist
Even if it did load, it looks like it's only a set of ~10 trivial
queries which I doubt have any real benchmark interest in modern days.
Change-Id: I383638ef4acce59792fee287da0016b3d949e693
Reviewed-on: http://gerrit.cloudera.org:8080/13117
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DATE values describe a particular year/month/day in the form
yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is 0000-01-01 to 9999-12-31.
This initial DATE type support covers TEXT and HBASE fileformats only.
'DateValue' is used as the internal type to represent DATE values.
The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types (note that invalid
casts fail with an error just like invalid DECIMAL_V2 casts, while
failed casts to other types do not lead to a warning or error); a
short sketch of these casts follows this list:
- from STRING to DATE. The string value must be formatted as
yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory,
the time component is optional. If the time component is
present, it will be truncated silently.
- from DATE to STRING. The resulting string value is formatted as
yyyy-MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
context where a TIMESTAMP value is expected.
- Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP
implicit conversions are now all possible, the existing function
overload resolution logic is not adequate anymore.
For example, it resolves the
if(false, '2011-01-01', DATE '1499-02-02') function call to the
if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded
function, instead of the if(BOOLEAN, DATE, DATE) version.
This is clearly wrong, so the function overload resolution logic had
to be changed to resolve function calls to the best-fit overloaded
function definition if there are multiple applicable candidates.
An overloaded function definition is an applicable candidate for a
function call if each actual parameter in the function call either
matches the corresponding formal parameter's type (without casting)
or is implicitly castable to that type.
When looking for the best-fit applicable candidate, a parameter
match score (i.e. the number of actual parameters in the function
call that match their corresponding formal parameter's type without
casting) is calculated and the applicable candidate with the highest
parameter match score is chosen.
There's one more issue that the new resolution logic has to address:
if two applicable candidates have the same parameter match score and
the only difference between the two is that the first one requires a
STRING -> TIMESTAMP implicit cast for some of its parameters while
the second one requires a STRING -> DATE implicit cast for the same
parameters then the first candidate has to be chosen not to break
backward compatibility.
E.g: year('2019-02-15') function call must resolve to
year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is not
implemented yet, so this is not an issue at the moment but it will
be in the future.
When the resolution algorithm considers overloaded function
definitions, first it orders them lexicographically by the types in
their parameter lists. To ensure the backward compatible behavior
PrimitiveType.DATE enum value has to come after
PrimitiveType.TIMESTAMP.
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.
These items are tightly coupled and it makes sense to implement them
in one change-set.
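A short sketch of the explicit casts described in the list above
(the values are illustrative):
SELECT CAST('2019-02-15 10:30:00' AS DATE),  -- STRING -> DATE, time truncated
CAST(DATE '2019-02-15' AS STRING),           -- DATE -> STRING, yyyy-MM-dd
CAST(DATE '2019-02-15' AS TIMESTAMP),        -- DATE -> TIMESTAMP, 00:00:00
CAST(now() AS DATE);                         -- TIMESTAMP -> DATE, time dropped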
Testing:
- A new partitioned TEXT table 'functional.date_tbl' (and the
corresponding HBASE table 'functional_hbase.date_tbl') was
introduced for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT and HBASE fileformats
only, most DATE tests were implemented separately in
tests/query_test/test_date_queries.py.
Note, that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.
Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Reviewed-on: http://gerrit.cloudera.org:8080/12481
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We've supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).
In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in
io buffer read from DiskIoMgr). The ORC lib can materialize ORC column
binaries into its representation (orc::ColumnVectorBatch). Then we
transform values in orc::ColumnVectorBatch into impala::Tuples in
hdfs-orc-scanner. We don't need to do anything about decoding/decompression
since they are handled by the ORC lib. Fortunately, the ORC lib already
supports complex types, so we can leverage it for complex types as well.
What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need in the form required by the ORC
lib (Get list of ORC type ids from tuple descriptors)
2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into
Impala's representation (Slots/Tuples/RowBatches)
To perform the materialization, we implement several ORC column readers
in hdfs-orc-scanner. Each kind of reader handles a column type and
transforms outputs of the ORC lib into tuple/slot values.
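An illustrative query that exercises the new collection readers, using
the existing nested-types test table:
SELECT t.id, a.item FROM complextypestbl t, t.int_array a;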
Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.
* Run exhaustive tests in DEBUG and RELEASE builds.
Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Reviewed-on: http://gerrit.cloudera.org:8080/12168
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.
This patch adds detailed unit tests for cardinality:
* Table cardinality, NDV values and null count in metadata retrieved from
HMS.
* Table cardinality, NDV values and null counts in metadata presented to
the query.
* Expression NDV and selectivity values (which derive from table
cardinality and column NDV.)
The tests illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.
Since none of the existing "functional" tables provide the level of
detail needed for these tests, we added a new test table specifically
for this task.
This set of tests was a good opportunity to extend the test "fixture"
framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previous SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.
As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.
Testing: This patch includes only tests: no "production" code was
changed.
Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Reviewed-on: http://gerrit.cloudera.org:8080/12248
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The code mimics the code written for other min-max filters. Decimal data
can be stored using 4, 8, or 16 bytes, and the code handles these three
storage configurations. The column definition states the
precision, and the precision determines the storage size.
The minimum and maximum values are stored in a union. The precision of
the column comes in as an input. Based on the precision, the storage
size is determined, and the appropriate union member is used.
The code in min-max-filter* follows the general convention of the file, hence
uses macros.
The test includes 24 decimal columns (as listed below) with the following joins:
1. Inner Join with broadcast (2 tables)
1a. 1 predicate
1b. 4 predicates - all results in decimal min-max filter
1c. 4 predicates - 3 results in decimal min=max filter; 1 doesn't
2. Inner Join with Shuffle (3 tables)
3. Right outer join (2 tables)
4. Left Semi join (2 tables)
5. Right Semi join (2 tables)
Decimal Columns:
4bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)
The test aggregates the count of probe rows. This shows that the min-max filter
is exercised, because the number of probe rows is less than the total number
of rows in the probe side table. The count of probe rows is considered to be
deterministic. But, it will be beneficial to look out for changes in Kudu that can
change the way data is partitioned. Such a change could change the probe row count
and in that case, the test will have to be updated.
impala_test_suite.py and test_result_verifier.py are enhanced to support saving
of aggregation using update_results.
Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The idea is to optimise the common case where there are long runs of
NULL or non-NULL values (i.e. the def level is repeated). We can
detect this cheaply by keying the decoding loop in the column reader
off the state of the def level RLE decoder - if there's a long run
of repeated levels, we can skip checking the def level for every
value. We still fall back to decoding, caching and reading
value-by-value a batch of def levels whenever the next def level is not
in a repeated run. We still use the old approach for decoding rep
levels. There might be some benefit to using the same approach for rep
levels *if* repeated def and rep level runs line up.
These changes should unlock further optimizations because more time is
spent in simple kernel functions, e.g. UnpackAndDecode32Values() for
dictionary decompression, which is very optimisable using SIMD etc.
Snappy decompression now seems to be the main CPU bottleneck for
decoding snappy-compressed Parquet.
Perf:
Running TPC-H scale factor 60 on uncompressed and snappy parquet
both showed a ~4% speedup overall.
Microbenchmarks on uncompressed parquet show scans only doing
dictionary decoding on uncompressed Parquet is ~75% faster:
set mt_dop=1;
select min(l_returnflag) from lineitem;
Testing:
We have alltypesagg with a mix of null and non-null values.
Many tables have long runs of non-null values.
Added new test data and coverage:
* a test table manynulls with long runs of null values.
* a large CHAR test table
* missing coverage for materialising pos slot in flattened nested types
scan.
* Extended dict test to test longer runs.
* A larger version of complextypestbl with interesting collection
shapes - NULL collections, empty collections, etc, particularly runs
of collections with the same shape.
* Test interaction of timestamp validation with conversion
* Ran code coverage build to confirm all code paths are tested
* ASAN and exhaustive runs.
Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Reviewed-on: http://gerrit.cloudera.org:8080/8319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes a class of bugs where the planner incorrectly uses the raw
string from the parser instead of the unescaped string. This occurs in
several places that push predicates down to the storage layer:
* Kudu scans
* HBase scans
* Data source scans
There are some more complex issues with escapes and the LIKE predicate
that are tracked separately by IMPALA-2422.
This also uncovered a different issue with RCFiles that is tracked by
IMPALA-7778 and is worked around by the tests added.
In order to make bugs like this more obvious in future, I renamed
getValue() to getValueWithOriginalEscapes().
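A hypothetical illustration ('kudu_tbl' and 's' are made-up names): the
value pushed to the storage layer must be the unescaped string (it's),
not the raw parsed text (it\'s):
SELECT * FROM kudu_tbl WHERE s = 'it\'s';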
Testing:
Added regression test that tests handling of backslash escapes on all
file formats. I did not add a regression test for the data source bug
since it seems to require some major modification of the data source
test infrastructure.
Change-Id: I53d6e20dd48ab6837ddd325db8a9d49ee04fed28
Reviewed-on: http://gerrit.cloudera.org:8080/11814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Kudu allows specifying range partitions over multiple columns. Impala
already has support for doing this when the partitions are specified
with '=', but if the partitions are specified with '<' or '<=', the
parser would return an error.
This patch modifies the parser to allow for creating Kudu tables like:
create table kudu_test (a int, b int, primary key(a, b))
partition by range(a, b) (partition (0, 0) <= values < (1, 1));
and similarly to alter partitions like:
alter table kudu_test add range partition (1, 1) <= values < (2, 2);
Testing:
- Modified functional_kudu.jointbl's schema so that we have a table
in functional with a multi-column range partition to test things
against.
- Added FE and E2E tests for CREATE and ALTER.
Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Reviewed-on: http://gerrit.cloudera.org:8080/10441
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some frontend PlannerTests rely on HBase tables being
arranged in a deterministic way. Specifically, the
HBase tables need to be split with specific region
boundaries and those regions need to be assigned to
specific HBase region servers.
Currently, the tables are created without splits and
testdata/bin/split-hbase.sh runs Java code in
HBaseTestDataRegionAssignment to split and assign
the tables. This runs during dataload via
testdata/bin/create-load-data.sh and during tests
with bin/run-all-tests.sh. There are problems with
both parts of this process. The table splitting is
flaky. Since significant time can pass between the
assignments and the tests, rebalancing means the
assignments are not always stable.
This changes the process so that the HBase tables are
created with the splits already specified via the
HBase shell. The splits remain stable over time.
PlannerTestBase runs the assignment code in
HBaseTestDataRegionAssignment at the start of
the PlannerTests. This makes the assignments
deterministic. No other tests depends on the
exact assignments, so this does not regress anything.
Testing:
- Local testing
- Ran gerrit-verify-dryrun-external
- Verified minicluster profile 2 compiles
Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Reviewed-on: http://gerrit.cloudera.org:8080/10447
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes generate-schema-statements.py to produce
separate SQL files for different file formats for Hive.
This changes load-data.py to go parallel on these
separate Hive SQL files. For correctness, the text
version of all tables must be loaded before any
of the other file formats.
load-data.py runs DDLs to create the tables in Impala
and goes parallel. Currently, there are some minor
dependencies so that text tables must be created
prior to creating the other table formats. This
changes the definitions of some tables in
testdata/datasets/functional/functional_schema_template.sql
to remove these dependencies. Now, the DDLs for the
text tables can run in parallel to the other file formats.
To unify the parallelism for Impala and Hive, load-data.py
now uses a single fixed-size pool of processes to run all
SQL files rather than spawning a thread per SQL file.
This also modifies the locations that do invalidate to
use refresh where possible and eliminate global
invalidates.
For debuggability, different SQL executions output to
different log files rather than to standard out. If an
error occurs, this will point out the relevant log
file.
This saves about 10-15 minutes on dataload (including
for GVO).
Change-Id: I34b71e6df3c8f23a5a31451280e35f4dc015a2fd
Reviewed-on: http://gerrit.cloudera.org:8080/8894
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies input needed from the orc-reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.
A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.
Currently, we only support reading primitive types. Writing to ORC
tables is not supported yet either.
Tests
- Most of the end-to-end tests can run on ORC format.
- Add tpcds, tpch tests for ORC.
- Add some ORC specific tests.
- Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
is not robust for corrupt files (ORC-315).
Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before Kudu supported DECIMAL columns the TPCDS and TPCH
columns were adjusted to use DOUBLE in place of DECIMAL. This
patch undoes that change now that Kudu supports DECIMAL.
Testing:
- Updated concurrent_select.py
- Updated test_tpch_queries.py
- Exercised by the Kudu planner tests
Change-Id: I2f7e4464dc6705cadd610a82c459390a9c0dfe4f
Reviewed-on: http://gerrit.cloudera.org:8080/9484
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
IMPALA-5752 added support for Kudu decimal. As a part
of that, it added Kudu versions of decimal_tbl and
decimal_tiny. Kudu tables are created and loaded even
on local tests, so these tables are loaded when they
previously weren't. The LOAD sections for these tables
rely on executing HDFS commands to copy data to
appropriate locations. These HDFS commands cannot work
on local tests, causing this failure.
Untangling when to execute LOAD sections is complicated,
so this simply switches the decimal_tbl and decimal_tiny
to do LOAD DATA LOCAL calls, which do not rely on HDFS
commands.
Change-Id: I1f717917269d116c07a6f17944583f5e8faf2932
Reviewed-on: http://gerrit.cloudera.org:8080/9438
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Adds support for the Kudu DECIMAL type introduced in Kudu 1.7.0.
Note: Adding support for Kudu decimal min/max filters is
tracked in IMPALA-6533.
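A minimal sketch of a Kudu table using DECIMAL (the table name and
partitioning are illustrative):
CREATE TABLE kudu_dec_tbl (
  id INT,
  d DECIMAL(9, 2),
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 3
STORED AS KUDU;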
Tests:
* Added Kudu create with decimal test to AnalyzeDDLTest.java
* Added Kudu table_format to test_decimal_queries.py
** Both decimal.test and decimal-exprs.test workloads
* Added decimal queries to the following Kudu workloads:
** kudu_create.test
** kudu_delete.test
** kudu_insert.test
** kudu_update.test
** kudu_upsert.test
Change-Id: I3a9fe5acadc53ec198585d765a8cfb0abe56e199
Reviewed-on: http://gerrit.cloudera.org:8080/9368
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).
This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.
The frontend builds the mapping of SlotId to conjuncts that
are dictionary filterable. This mapping now includes SlotId's
that reference nested tuples. The backend is adjusted to
use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated with a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.
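An illustrative query shape (table and column names are assumptions):
the conjunct on the nested collection item can be evaluated against the
column's dictionary to skip row groups with no matching value:
SELECT c.c_custkey
FROM customer_nested c, c.c_orders o
WHERE o.o_orderpriority = '1-URGENT';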
Testing:
- e2e test extended to illustrate row-groups that are pruned
by nested collection dictionary filters.
Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Reviewed-on: http://gerrit.cloudera.org:8080/8775
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
I re-created the original patch for IMPALA-6068, but only
performed what I believe to be the limited legal transformation
of data load: DEPENDENT_LOAD -> DEPENDENT_LOAD_HIVE.
Any place that directly uploads via hadoop or hdfs commands
was left alone as changing it can't be proven to be correct.
Change-Id: I6c242cca209a7138b10ad517076707709b5cd204
Testing: Doing a full data load. I mistakenly changed a variable
name causing the first two dry-runs to fail.
Reviewed-on: http://gerrit.cloudera.org:8080/8690
Reviewed-by: Zach Amsden <zamsden@cloudera.com>
Tested-by: Zach Amsden <zamsden@cloudera.com>