Currently EXPLAIN statements might open ACID transactions and
create locks on ACID tables.
This is not necessary since EXPLAIN does not modify the table. The
real problem, however, is that these transactions and locks are
leaked and stay open indefinitely. They even keep being heartbeated
for as long as the coordinator is running.
The solution is to not consume any ACID resources for EXPLAIN
statements.
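A minimal sketch of the idea, not the actual Frontend code: the
decision to open a transaction and take a lock is skipped whenever the
statement is an EXPLAIN. FakeMetastore and acquire_acid_resources are
hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeMetastore:
    """Hypothetical stand-in for HMS that only records what was opened."""
    next_txn_id: int = 1
    open_txns: List[int] = field(default_factory=list)
    locks: List[str] = field(default_factory=list)

    def open_transaction(self) -> int:
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.open_txns.append(txn_id)
        return txn_id

    def acquire_lock(self, txn_id: int, table: str) -> None:
        self.locks.append(f"{table}@txn{txn_id}")


def acquire_acid_resources(hms: FakeMetastore, table: str,
                           is_transactional: bool,
                           is_explain: bool) -> Optional[int]:
    """Return the opened transaction id, or None if nothing was opened."""
    # EXPLAIN only plans the statement and never writes, so it must not
    # open a transaction or take a lock that nobody would release later.
    if is_explain or not is_transactional:
        return None
    txn_id = hms.open_transaction()
    hms.acquire_lock(txn_id, table)
    return txn_id
```

With this shape, an EXPLAIN against an ACID table leaves both
open_txns and locks empty, so there is nothing to leak or heartbeat.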
Testing:
* Added EXPLAIN INSERT OVERWRITE in front of an actual INSERT OVERWRITE
  in an e2e test
Change-Id: I05113b1fd9a3eb2d0dd6cf723df916457f3fbf39
Reviewed-on: http://gerrit.cloudera.org:8080/16923
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Because of a bug, the INSERT part of a CTAS statement didn't
run in a transaction and just put the new files under the
root directory of the table. This didn't cause too many problems
because there couldn't be any concurrent operations while the table was
under construction. However, this behavior does not work well with
replication, as the notification event needs a transaction id.
With this fix the INSERT operation runs in a transaction and the new
files are created under a delta directory.
Testing:
* Added CTAS statements and SHOW FILES <tbl> queries to acid-insert.test.
  The queries check that the files are created in a delta directory; if
  they are, the INSERT must have run in a transaction.
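The check described above can be sketched as follows. This is an
illustration only, not the code in acid-insert.test: is_in_delta_dir is
a hypothetical helper, and the pattern assumes the usual Hive ACID
naming of delta_<minWriteId>_<maxWriteId> with an optional statement id
suffix.

```python
import re

# Assumed Hive ACID delta directory name: delta_<min>_<max>[_<stmtId>].
_DELTA_DIR = re.compile(r"^delta_\d+_\d+(_\d+)?$")


def is_in_delta_dir(file_path: str) -> bool:
    """True if any parent directory of the file is an ACID delta directory."""
    parts = file_path.rstrip("/").split("/")
    # The last component is the file name, so only inspect its parents.
    return any(_DELTA_DIR.match(part) for part in parts[:-1])
```

A file directly under the table root, or under a base_<writeId>
directory, does not match, which is how the test can tell a
transactional CTAS from the old behavior.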
Change-Id: I6ed96aeadbcead9fdc548da5922a066460ff9f77
Reviewed-on: http://gerrit.cloudera.org:8080/16472
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_acid_nonacid_insert has been failing lately. HMS became
stricter about checking the capabilities of its clients. It seems
the Python client doesn't set any capabilities for itself, so
HMS rejects its attempts to create and drop tables.
Now, instead of using the RESET utility from the e2e test framework
(to drop and re-create tables), the test uses a unique database
and creates the tables through Impala. Different file formats are
exercised with the help of the DEFAULT_FILE_FORMAT query option.
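A rough sketch of how such statements could be generated, assuming a
hypothetical helper create_table_statements and a made-up one-column
schema. The CREATE TABLE has no STORED AS clause, so the file format
comes from the DEFAULT_FILE_FORMAT query option.

```python
from typing import List


def create_table_statements(db: str, tbl: str, file_format: str,
                            transactional: bool) -> List[str]:
    """Build the statements that create one test table through Impala."""
    # The query option decides the format because STORED AS is omitted.
    stmts = [f"SET DEFAULT_FILE_FORMAT={file_format}"]
    create = f"CREATE TABLE {db}.{tbl} (i INT)"
    if transactional:
        # Table properties of an insert-only ACID table.
        create += (" TBLPROPERTIES ('transactional'='true', "
                   "'transactional_properties'='insert_only')")
    stmts.append(create)
    return stmts
```

Calling the helper once per file format, with and without the
transactional flag, gives the ACID and non-ACID table pairs the test
inserts between.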
Change-Id: I3a82338a7820d0ee748c961c8656fa3319c3929c
Reviewed-on: http://gerrit.cloudera.org:8080/14064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds INSERT support for insert-only ACID tables.
The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.
The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set for the HDFS table sinks. The sinks write
the files at their final destination, which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTs.
When the sinks have finished writing the data, the Coordinator invokes
updateCatalog() on catalogd, which also commits the transaction if
everything went well; otherwise the Coordinator aborts the transaction.
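The lifecycle above can be sketched roughly like this. It is a
simplified model under stated assumptions, not Impala code:
FakeTxnManager and run_insert are hypothetical, the plan and
execute_sinks callbacks stand in for analysis/planning and the HDFS
table sinks, and the commit that catalogd performs inside
updateCatalog() is reduced to a single commit() call.

```python
from typing import Callable, Dict


class FakeTxnManager:
    """Hypothetical stand-in for the HMS transaction API."""

    def __init__(self) -> None:
        self._next_txn = 1
        self._next_write_id: Dict[str, int] = {}
        self.state: Dict[int, str] = {}  # txn id -> open/committed/aborted

    def open_txn(self) -> int:
        txn_id = self._next_txn
        self._next_txn += 1
        self.state[txn_id] = "open"
        return txn_id

    def allocate_write_id(self, txn_id: int, table: str) -> int:
        write_id = self._next_write_id.get(table, 1)
        self._next_write_id[table] = write_id + 1
        return write_id

    def commit(self, txn_id: int) -> None:
        self.state[txn_id] = "committed"

    def abort(self, txn_id: int) -> None:
        self.state[txn_id] = "aborted"


def run_insert(txns: FakeTxnManager, table: str,
               plan: Callable[[int], None],
               execute_sinks: Callable[[int, int], None]) -> int:
    """Run one transactional INSERT and return its transaction id."""
    txn_id = txns.open_txn()
    try:
        write_id = txns.allocate_write_id(txn_id, table)
        plan(write_id)
    except Exception:
        # Frontend side: abort if analysis/planning fails.
        txns.abort(txn_id)
        raise
    try:
        # Sinks write straight into the base/delta directory of write_id.
        execute_sinks(txn_id, write_id)
    except Exception:
        # Coordinator side: abort if execution fails.
        txns.abort(txn_id)
        raise
    # Stands in for updateCatalog(), which also commits the transaction.
    txns.commit(txn_id)
    return txn_id
```

Every path out of run_insert leaves the transaction either committed
or aborted, which is the property the error-scenario tests exercise.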
Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads it with Hive. (These integration
  tests only run with the exhaustive exploration strategy.)
TODO in following commits:
* add locks and heartbeats (without heartbeats, long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. This
  is handled correctly during SELECT, but it would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.
Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Reviewed-on: http://gerrit.cloudera.org:8080/13559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>