impala/tests at 63f5e8ec00d089dee7eee9fd47a931e356e2f985

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-15 15:00:36 -05:00

Zoltan Borok-Nagy f602c3f80f IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves this by assigning a unique row ID to each row and
maintaining two sets of files in a table. The first set is in the
base/delta directories, which contain the INSERTed rows. The second set
is in the delete-delta directories, which contain the DELETEd rows.

(UPDATE operations are implemented via DELETE+INSERT.)

For example, the filesystem layout looks like this:
 * full_acid/delta_0000001_0000001_0000/0000_0
 * full_acid/delta_0000002_0000002_0000/0000_0
 * full_acid/delete_delta_0000003_0000003_0000/0000_0
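The split into an INSERT set and a DELETE set can be illustrated with a small Python sketch. It assumes each data file sits directly under a directory whose name starts with `base_`, `delta_` or `delete_delta_`, as in the layout above; the function name `split_acid_files` is hypothetical and not part of Impala.

```python
from typing import Iterable, List, Tuple


def split_acid_files(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split the data files of a full ACID table into INSERT and DELETE sets.

    Hypothetical helper. Assumes every path looks like
    <table>/<acid_dir>/<file>, where <acid_dir> starts with "base_",
    "delta_" or "delete_delta_".
    """
    inserts: List[str] = []
    deletes: List[str] = []
    for path in paths:
        acid_dir = path.split("/")[-2]  # directory holding the data file
        if acid_dir.startswith("delete_delta_"):
            deletes.append(path)  # files listing DELETEd rows
        elif acid_dir.startswith(("base_", "delta_")):
            inserts.append(path)  # files holding INSERTed rows
        else:
            raise ValueError(f"not an ACID directory: {acid_dir!r}")
    return inserts, deletes
```

Applied to the three example paths, the two `delta_` files end up in the INSERT set and the `delete_delta_` file in the DELETE set.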

During scanning we need to return the INSERTed rows minus the DELETEd rows.
This patch implements this by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans a full ACID table (one that also has deleted rows) is converted
into two HDFS SCANs, one for the INSERT deltas and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.
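The semantics of a LEFT ANTI HASH JOIN with BROADCAST distribution can be sketched in Python. The sketch assumes each row is identified by an (original write ID, bucket, row ID) triple, modeled on Hive ACID's row-id layout. The class and function names are illustrative only and do not reflect Impala's planner or backend code. The DELETE side is hashed once, and every INSERT-side fragment probes the same full copy of that set, which is what broadcasting the build side means.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple


class RowId(NamedTuple):
    # Assumed row-id layout, modeled on Hive ACID's
    # (originalTransaction, bucket, rowId) triple.
    original_write_id: int
    bucket: int
    row_id: int


Row = Tuple[RowId, Dict[str, object]]  # (row id, column values)


def build_delete_set(delete_rows: Iterable[RowId]) -> Set[RowId]:
    """Build side: hash the row id of every DELETEd row once."""
    return set(delete_rows)


def left_anti_join(insert_rows: Iterable[Row], deleted: Set[RowId]) -> Iterator[Row]:
    """Probe side: stream INSERTed rows and keep only those with no match."""
    for row_id, values in insert_rows:
        if row_id not in deleted:
            yield row_id, values


def broadcast_anti_join(insert_fragments: List[List[Row]],
                        delete_rows: Iterable[RowId]) -> List[Row]:
    """Every insert-scan fragment probes the same (broadcast) delete set."""
    deleted = build_delete_set(delete_rows)
    out: List[Row] = []
    for fragment in insert_fragments:
        out.extend(left_anti_join(fragment, deleted))
    return out
```

For instance, if write 3 deletes the second row written by write 1, only that row's `RowId(1, 0, 1)` appears on the build side, and the anti join drops exactly that row from the INSERT side.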

Later we can add support for other distribution modes if performance
requires it. E.g. if we have too many deleted rows, we are probably
better off with the PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.
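One way this idea could be realized is shown in the following Python sketch. It is a simplified heuristic, not Impala's actual cost model. It extrapolates the total number of deleted rows from a random sample of delete delta files. It then compares the rows shipped by BROADCAST (the whole build side sent to every node) with those shipped by PARTITIONED (each row of both sides sent once). The functions `estimate_deleted_rows` and `choose_distribution` and the `count_rows` callback are hypothetical.

```python
import random
from typing import Callable, Sequence


def estimate_deleted_rows(delete_files: Sequence[str],
                          count_rows: Callable[[str], int],
                          sample_size: int,
                          seed: int = 0) -> int:
    """Extrapolate the total deleted-row count from a random file sample.

    count_rows is a hypothetical callback returning the row count of one file.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not delete_files:
        return 0
    k = min(sample_size, len(delete_files))
    sample = random.Random(seed).sample(list(delete_files), k)
    avg_rows = sum(count_rows(f) for f in sample) / k
    return round(avg_rows * len(delete_files))


def choose_distribution(est_deleted_rows: int, est_inserted_rows: int,
                        num_nodes: int) -> str:
    # BROADCAST sends every deleted row to every node; PARTITIONED sends
    # each row of both inputs across the network once.
    broadcast_cost = est_deleted_rows * num_nodes
    partitioned_cost = est_deleted_rows + est_inserted_rows
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"
```

With few deleted rows the broadcast side stays small and BROADCAST wins; once the deleted rows approach the size of the INSERT side, multiplying them by the node count makes PARTITIONED cheaper.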

The current patch only works for primitive types, i.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2020-07-14 12:53:51 +00:00

IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION

2020-06-18 10:07:20 +00:00

IMPALA-9215: report_benchmark_results.py fails with missing key

2019-12-05 17:07:55 +00:00

Fix bug in report_benchmark_results.py

2019-12-19 17:02:40 +00:00

catalog_service

IMPALA-5534: Fix and enable experimental failure tests

2020-07-14 00:03:14 +00:00

IMPALA-5534: Fix and enable experimental failure tests

2020-07-14 00:03:14 +00:00

IMPALA-3695: Remove KUDU_IS_SUPPORTED

2020-06-18 01:11:18 +00:00

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 12:53:51 +00:00

IMPALA-9531: Dropped support for dateless timestamps

2020-07-08 19:32:15 +00:00

IMPALA-5534: Fix and enable experimental failure tests

2020-07-14 00:03:14 +00:00

IMPALA-9611: fix hang when cancelling join builder

2020-04-07 23:26:14 +00:00

IMPALA-9669: Fix wrong types/comments of loaded tables/views for GET_TABLES in LocalCatalog

2020-05-21 05:01:45 +00:00

IMPALA-8207: Fix query loading for perf and stress tests

2019-02-19 22:31:17 +00:00

IMPALA-7538: Support HDFS caching with LocalCatalog

2020-06-18 22:03:24 +00:00

IMPALA-9182: Print the socket address of the client closing a session or cancelling a query from the WebUI

2020-02-13 21:43:07 +00:00

IMPALA-8207: Fix query loading for perf and stress tests

2019-02-19 22:31:17 +00:00

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 12:53:51 +00:00

IMPALA-9540 Test that Impala Shell no longer sends duplicate "Host" headers in http mode.

2020-06-30 21:46:27 +00:00

IMPALA-7995: part 1: fixes for e2e dockerised impala tests

2019-04-13 02:42:32 +00:00

IMPALA-9199: Add support for single query retries on cluster membership changes

2020-05-15 20:11:07 +00:00

IMPALA-8841: Try to fix Tez related dataload flakiness

2019-08-16 23:00:01 +00:00

IMPALA-8572: Log query events before unregister.

2019-09-12 02:02:44 +00:00

IMPALA-9917: grouping() and grouping_id() support

2020-07-14 03:13:18 +00:00

IMPALA-3766: optionally compress spilled data

2020-03-31 01:36:44 +00:00

IMPALA-9077: Remove scalable admission control configs

2020-06-10 04:02:16 +00:00

__init__.py

Initial Impala failure testing library + modularize run-workload

2014-01-08 10:46:16 -08:00

.gitignore

Updates several .gitignore files.

2017-08-31 01:40:47 +00:00

conftest.py

IMPALA-9702: Cleanup unique_database directories

2020-06-02 14:01:23 +00:00

pytest.ini

Test execution should continue even if a test fails.

2015-05-01 03:32:35 +00:00

run-custom-cluster-tests.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

run-process-failure-tests.sh

IMPALA-7399: Emit a junit xml report when trapping errors

2018-08-23 18:33:58 +00:00

run-tests.py

IMPALA-9887: Add support for sharding end-to-end tests

2020-07-13 00:13:19 +00:00