To support FIPS, we have to avoid MD5 because it is not an approved
algorithm under FIPS. However, the lineage graph currently uses an MD5
hash. This patch replaces MD5 with the non-cryptographic hash function
murmur3_128, which generates a hash value of the same length as MD5
(128 bits).
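
The length claim can be checked with a minimal sketch. Python's
standard library has no murmur3, so a 16-byte BLAKE2b digest stands in
here for any 128-bit hash; it is not the function used by this patch,
and the helper name hex_width is hypothetical.

```python
import hashlib


def hex_width(digest: bytes) -> int:
    """Return the number of hex characters needed to print a digest."""
    return len(digest.hex())


query = b"select * from t"

# MD5 yields a 128-bit (16-byte) digest. usedforsecurity=False needs
# Python 3.9+ and marks this as a non-security use of the algorithm.
md5_digest = hashlib.md5(query, usedforsecurity=False).digest()

# Assumption: BLAKE2b truncated to 16 bytes is only a stand-in for a
# 128-bit hash such as murmur3_128, used to compare output widths.
stand_in_digest = hashlib.blake2b(query, digest_size=16).digest()

# Both digests render as 32 hex characters, so a consumer that only
# cares about the field width sees no difference.
assert hex_width(md5_digest) == hex_width(stand_in_digest) == 32
```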
External dependency on the hash value:
I went through the Apache Atlas source code. The ImpalaQuery.getHash()
function
(https://github.com/apache/atlas/blob/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala/model/ImpalaQuery.java#L60)
is not called anywhere, and I do not see any dependency on the hash
value in Atlas.
Testing:
- Passed test_lineage.py.
- Passed core tests.
Change-Id: I22b1e91cf9d6c89a3c62749ae0fd88528ae69885
Reviewed-on: http://gerrit.cloudera.org:8080/16564
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TABLE' DDL.
Atlas needs the table location to establish lineage between a newly
created external table and its table location.
The table location information is not available until the createTable
catalog op succeeds. After this change, the location information is
sent to the backend in the TDDLExecResponse message, and the backend
adds it to the lineage graph. This information is sent only for create
external table queries.
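
The flow above can be sketched roughly as follows. The class and
function names are hypothetical stand-ins, not Impala's actual types;
only the tableLocation field name comes from this change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DdlExecResponseSketch:
    """Hypothetical stand-in for the part of TDDLExecResponse used here."""
    # Assumed to be set only for create external table queries.
    table_location: Optional[str] = None


def add_table_location(lineage_graph: dict,
                       response: DdlExecResponseSketch) -> dict:
    """Return a copy of the graph with tableLocation added when present."""
    updated = dict(lineage_graph)
    if response.table_location is not None:
        updated["tableLocation"] = response.table_location
    return updated


graph = {"queryText": "create external table t (i int) location '/tmp/t'"}
external = add_table_location(graph, DdlExecResponseSketch("/tmp/t"))
managed = add_table_location(graph, DdlExecResponseSketch())
```

In this sketch the location is attached only when the response carries
one, so lineage for other DDLs is left unchanged.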
Testing:
Added a test to verify the tableLocation field is populated for a
create external table query lineage. Also, modified the
lineage.test file to include location information for all lineages.
Change-Id: If02b0cc16d52c1956298171628f5737cab62ce9f
Reviewed-on: http://gerrit.cloudera.org:8080/14515
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DDLs like 'create table' should generate minimal lineage graphs so
that consumers like Atlas can use information like 'queryText' to
establish lineages.
This change adds a call to the computeLineageGraph() method during the
analysis phase of createTable, which populates the graph with basic
information like queryText. If the statement is a CTAS, this graph is
enhanced with dependencies in the "insert" phase.
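
A minimal sketch of the two phases, assuming a dictionary-shaped graph.
The function names and the "edges" key are hypothetical; only queryText
is taken from this change.

```python
def minimal_lineage_graph(query_text: str) -> dict:
    """Graph built during analysis of a create table: basic info only."""
    return {"queryText": query_text, "edges": []}


def add_ctas_dependencies(graph: dict, edges: list) -> dict:
    """Enhance the same graph with dependencies in the insert phase."""
    graph["edges"].extend(edges)
    return graph


# A plain DDL keeps the minimal graph.
ddl = minimal_lineage_graph("create table t (i int)")

# A CTAS starts from the same minimal graph and gains dependencies later.
ctas = minimal_lineage_graph("create table t2 as select i from t")
add_ctas_dependencies(ctas, [{"sources": ["t.i"], "targets": ["t2.i"]}])
```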
Testing:
Added an end-to-end (EE) test to verify the lineage information and to
check that it is flushed to disk properly.
Change-Id: Ia6c7ed9fe3265fd777fe93590cf4eb2d9ba0dd1e
Reviewed-on: http://gerrit.cloudera.org:8080/14458
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, query events (audits and lineages) are logged as part of
query unregistration. This delays event logging when Unregister() is
delayed by the client for some reason (e.g., Hue does not call
Unregister until the browser tab is closed) or when the client goes
away without calling Unregister and the query timeout kicks in.
This patch moves event logging to an earlier stage in the query
lifecycle. It also moves the event-logging code into
ClientRequestState to make refactoring easier.
The conditions under which the events are logged are slightly
modified by this patch. Without the patch, events are logged for
unsuccessful queries if at least one fetch has been performed. This
patch relaxes this guarantee: events are logged for any query that
reaches the FINISHED state (rows are available for the client to
fetch), without waiting for a fetch to be performed. This simplifies
the coordinator state machine by avoiding unnecessary synchronization.
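
The new logging condition can be illustrated with a small sketch. The
class, the state names other than FINISHED, and the event sink are
hypothetical; the sketch models only the "log on reaching FINISHED"
rule and not the rest of the coordinator.

```python
from enum import Enum, auto


class QueryState(Enum):
    RUNNING = auto()
    FINISHED = auto()  # rows are available for the client to fetch
    ERROR = auto()


class RequestStateSketch:
    """Hypothetical, simplified stand-in for ClientRequestState."""

    def __init__(self, sink: list):
        self.state = QueryState.RUNNING
        self._sink = sink  # collects logged events for illustration
        self._events_logged = False

    def update_state(self, new_state: QueryState) -> None:
        self.state = new_state
        # Log as soon as the query reaches FINISHED, without waiting
        # for the client to fetch or unregister.
        if new_state is QueryState.FINISHED:
            self._log_events_once()

    def unregister(self) -> None:
        # Assumption in this sketch: unregistration logs nothing.
        pass

    def _log_events_once(self) -> None:
        if not self._events_logged:
            self._events_logged = True
            self._sink.append("audit+lineage")


events = []
query = RequestStateSketch(events)
query.update_state(QueryState.FINISHED)
query.unregister()  # may happen much later, or never
```

Because logging is tied to a single state transition, a late or missing
Unregister() call no longer delays the events.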
Added some test coverage for the coordinator-side code paths that write
lineages. The FE-specific lineage tests only verified the correctness
of the lineage created but did not test whether it was flushed
correctly to disk.
Change-Id: I639b9c1acb9806b29292cd85be2863688453ca2e
Reviewed-on: http://gerrit.cloudera.org:8080/14143
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>