If a partition had a location that did not exist in HDFS, Impala would
refuse to load its metadata. This meant a typo could render a table
unloadable. We fix this problem by removing the existence check from the
frontend, and by inheriting access from the first extant parent of the
partition directory.
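
A minimal sketch of the "first extant parent" lookup, assuming a local
filesystem as a stand-in for HDFS. The helper names are hypothetical and are
not the frontend's actual API:

```python
import os
import tempfile


def first_existing_ancestor(path):
    """Return the closest ancestor of `path` (or `path` itself) that exists."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def inherited_write_access(path):
    """Derive write access for a possibly missing directory from its first extant parent."""
    return os.access(first_existing_ancestor(path), os.W_OK)


# A partition location that does not exist yet (hypothetical layout).
base = tempfile.mkdtemp()
missing = os.path.join(base, "year=2014", "month=01")
print(first_existing_ancestor(missing))
print(inherited_write_access(missing))
```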
Fixing this exposed a second issue, where Impala wouldn't create
directories for partitions in the right place after an INSERT if the
partition location had been changed. To get this right we have to plumb
the partition ID through to Coordinator::FinalizeSuccessfulInsert(), so
that the coordinator can look up the partition's location from the
query-wide descriptor table. As a by-product, this patch rationalises
the per-partition, per-fragment statistics gathering a little bit by
putting almost all the per-partition stats into TInsertPartitionStatus.
Change-Id: I9ee0a1a1ef62cf28f55be3249e8142c362083163
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2851
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED
When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.
When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.
It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).
Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we wait for the cache
requests to complete in the background; once they have finished, the table metadata
is automatically refreshed.
Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
These are used as our internal representation of the authorization policy metadata
(as opposed to directly using the Sentry Thrift structs). Versioned/managed in the
same way as other TCatalogObjects.
Change-Id: Ia1ed9bd4e25e9072849edebcae7c2d3a7aed660d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2545
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c89431775fcca19cdbeddba635b83fd121d39b04)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2646
This is a port of 0b9134a from CDH4, but it required some adjustments to work on CDH5 due to
the HDFS caching work. The differences between CDH4 and CDH5 are mainly in
HdfsTable/HdfsPartition. I added a new BlockReplica class to represent a single block
with info on the host index + caching info.
This removes duplicated TNetworkAddresses in the block location metadata of HdfsTable.
Each HdfsTable now contains a list of TNetworkAddress and the BlockLocations
just reference an index in this list to specify the host, rather than duplicating the
TNetworkAddress.
For a table with 100K blocks, this reduces the size of the THdfsTable struct by an additional
~50+% (on top of the duplicate file path changes). This takes the total size of the table
from:
21.1MB -> 9.4MB (file path duplication) -> 4.2MB (host duplication) = ~80% total improvement.
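
The host de-duplication can be sketched roughly as follows. This is an
illustration only: the function name is hypothetical and plain (host, port)
tuples stand in for TNetworkAddress:

```python
def build_host_index(block_hosts):
    """Replace per-block addresses with indexes into one shared host list.

    `block_hosts` is a list of blocks, each a list of (hostname, port)
    tuples standing in for TNetworkAddress.
    """
    hosts = []        # shared list, one entry per distinct address
    index_of = {}     # address -> position in `hosts`
    block_refs = []   # per block: list of indexes into `hosts`
    for replicas in block_hosts:
        refs = []
        for addr in replicas:
            if addr not in index_of:
                index_of[addr] = len(hosts)
                hosts.append(addr)
            refs.append(index_of[addr])
        block_refs.append(refs)
    return hosts, block_refs


blocks = [
    [("host1", 50010), ("host2", 50010)],
    [("host2", 50010), ("host3", 50010)],
    [("host1", 50010), ("host3", 50010)],
]
hosts, block_refs = build_host_index(blocks)
print(hosts)
print(block_refs)
```

Each address is stored once regardless of how many blocks have a replica on
that host, so the saving grows with the block count.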
Change-Id: If7f11764dc0961376f9648779d253829f4cd83a2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1367
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1887
Reviewed-by: Nong Li <nong@cloudera.com>
This fixes a bug that can happen with 'invalidate metadata <table name>' if the following
sequence of events happens:
1) Table is created in Impala (table names are always treated as lower case)
2) Table is dropped and re-created in Hive, using the same name but different casing
3) invalidate metadata <table name> is run in Impala, which will update the existing
table with the version from the Hive metastore.
When building the next statestore update, the catalog server will send an update out
thinking that the table from 1) was dropped and the table from 3) was added because
the topic entry key is case sensitive. This may incorrectly remove the table from
an impalad's catalog. The fix is to always treat db/table names as case insensitive.
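
The effect of case-sensitive keys on the delta can be illustrated with a small
sketch. The key shape and the function below are assumptions made for the
example and do not reflect the real topic entry format:

```python
def compute_delta(previous, current, normalize):
    """Return (added, deleted) keys between two catalog snapshots.

    Each snapshot is a list of (db, table) names. With normalize=True the
    names are lower-cased before comparison, as the fix describes.
    """
    def key(db, table):
        return (db.lower(), table.lower()) if normalize else (db, table)

    prev_keys = {key(db, tbl) for db, tbl in previous}
    cur_keys = {key(db, tbl) for db, tbl in current}
    return sorted(cur_keys - prev_keys), sorted(prev_keys - cur_keys)


before = [("functional", "alltypes")]   # created in Impala (lower case)
after = [("functional", "AllTypes")]    # re-created in Hive with new casing

# Case-sensitive keys report a spurious add plus a delete of the same table.
print(compute_delta(before, after, normalize=False))
# Case-insensitive keys report no membership change.
print(compute_delta(before, after, normalize=True))
```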
Change-Id: Ib59edc403989781bf12e0405c0ccd37b8e41ee41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1634
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1637
Made the following changes:
1) Removed filePath from FileBlock. It wasn't used anywhere.
2) Updated FileDescriptor to use file name rather than file path. The full path to the file
can be found by prepending the parent partition directory to the file name.
3) Removed fileLength from FileBlock. This was the same length used in FileDescriptor.
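
As a rough illustration of change 2), the full path is rebuilt on demand from
the partition directory and the stored file name. The function name and the
example paths are hypothetical:

```python
import posixpath


def full_file_path(partition_dir, file_name):
    """Rebuild a file's full path from its partition directory and file name."""
    return posixpath.join(partition_dir, file_name)


# Only the short file name is stored per file; the directory is stored once
# per partition.
partition_dir = "/warehouse/sales/year=2014"
file_names = ["000000_0", "000001_0"]
paths = [full_file_path(partition_dir, name) for name in file_names]
print(paths)
```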
Tested these changes on two tables on a 20-node cluster. One table had ~100K blocks
and one table had ~15K blocks.
In both cases I saw the following improvements:
* After making change 1) the total serialized size of the table dropped by ~30%
* After making change 1) and 2) the total serialized size of the table dropped by ~52%
* After making change 1), 2), and 3) the total serialized size of the table dropped by ~55%
Change-Id: Ic85b3cbcf775569f69b7303bec4adc52593fc35c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1351
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1480
There was a race when the catalog was invalidated at the same time a table
was being loaded. This is because an uninitialized Table was being returned
unexpectedly to the impalad due to the concurrent invalidate.
This fixes the problem by updating the CatalogObjectCache to load when
a catalog object is uninitialized, rather than load when null. New items can
now be added in an initialized or uninitialized state; uninitialized objects
are loaded on access.
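
A simplified sketch of "load when uninitialized" semantics, assuming a single
coarse lock and a caller-supplied loader. The class and method names are
hypothetical and are not those of CatalogObjectCache:

```python
import threading

_UNINITIALIZED = object()  # sentinel: the name is known, the object is not loaded


class ObjectCacheSketch:
    def __init__(self, loader):
        self._loader = loader          # callable: name -> loaded object
        self._lock = threading.Lock()
        self._entries = {}

    def add(self, name, obj=_UNINITIALIZED):
        """Add an entry in either an initialized or an uninitialized state."""
        with self._lock:
            self._entries[name] = obj

    def invalidate(self, name):
        """Mark a known entry as uninitialized instead of removing it."""
        with self._lock:
            if name in self._entries:
                self._entries[name] = _UNINITIALIZED

    def get(self, name):
        """Return the object, loading it first if it is uninitialized."""
        with self._lock:
            if name not in self._entries:
                return None
            if self._entries[name] is _UNINITIALIZED:
                self._entries[name] = self._loader(name)
            return self._entries[name]


loads = []
cache = ObjectCacheSketch(lambda name: loads.append(name) or {"name": name})
cache.add("t1")
cache.get("t1")          # triggers the first load
cache.get("t1")          # served from the cache
cache.invalidate("t1")
cache.get("t1")          # reloaded after the invalidate
print(loads)
```

Because a concurrent invalidate only resets the entry to the sentinel, a reader
never observes a half-initialized object in this sketch; it either gets the
loaded value or triggers a load.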
Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh.
In addition, it cleans up the locking in the Catalog to make it more
straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog
and this lock is used to protect the catalogVersion_. Operations that need to
perform an atomic bulk catalog operation can use this lock (such as when the
CatalogServer needs to take a snapshot of the catalog to calculate what delta to send
to the statestore). Otherwise, the lock is not needed and objects are protected by the
synchronization at each level in the object hierarchy (Db->[Function/Table]). That is,
Dbs are synchronized by the Db cache, each Db has a Table Cache which is synchronized
independently.
Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418
This change adds support for lazy loading of table metadata to the
CatalogService/Impalad. The way this works is that the CatalogService initially
sends out an update with only the databases and table names (wrapped as
IncompleteTables). When an Impalad encounters one of these tables, it will contact
the catalog service to get the metadata, possibly triggering a metadata load if the
catalog server has not yet loaded this table.
With these changes the catalog server starts up in just seconds, even for large
metastores since it only needs to call into the metastore to get the list of tables
and databases. The performance of "invalidate metadata" also improves for the same reason.
I also picked up the catalog cleanup patch I had to make the APIs a bit more consistent and
remove the need for using a LoadingCache for databases.
This also fixes up the FE tests to run in a more realistic fashion. The FE tests now run
against catalog objects received from the catalog server. This actually turned up some bugs
in our previous test configuration where we were not running with the correct column stats
(we were always running with avgSerializedSize = slotSize). This changed some plans so the
planner tests needed to be updated.
Still TODO:
This does not include the changes to perform background metadata loading. I will send
that out as a separate patch on top of this.
Change-Id: Ied16f8a7f3a3393e89d6bfea78f0ba708d0ddd0e
Saving changes
Change-Id: I48c34408826b7396004177f5fc61a9523e664acc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1328
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1338
Tested-by: Lenni Kuff <lskuff@cloudera.com>
PrimitiveType is an enum and cannot be used for more complex types. The change
touches a lot of files but very mechanically.
A similar change needs to be done in the BE which will be a subsequent patch.
The version as I have it breaks rolling upgrade due to the thrift changes. If
this is not okay, we can work around that but it will be annoying.
Change-Id: If3838bb27377bfc436afd6d90a327de2ead0af54
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1287
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1304
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
When there is an error loading the table metadata, the Catalog Server
includes (among other things) the table loading error message(s) and
the call stack where the error happened. The error messages returned
to the user previously included the call stack, but with this change
the remote (catalog service) call stack is only output to the impalad
log when the table is accessed by the user. This is done by annotating
and appending the remote call stack to the local call stack.
Change-Id: Icf46a3d7fc15d8b0a8e59564722fdb991b074618
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1063
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly
Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
did not catch that they were incorrect
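
A hedged sketch of how a regex-based expected value might be checked. The
"regex:" prefix and the function name are assumptions made for illustration
and may not match the test framework's actual syntax:

```python
import re

REGEX_PREFIX = "regex:"  # assumed marker for a pattern-valued expectation


def value_matches(expected, actual):
    """Compare one expected value with an actual value.

    Expectations starting with the assumed prefix are treated as regular
    expressions that must match the whole value; anything else is compared
    literally.
    """
    if expected.startswith(REGEX_PREFIX):
        pattern = expected[len(REGEX_PREFIX):].strip()
        return re.fullmatch(pattern, actual) is not None
    return expected == actual


print(value_matches(r"regex:\d+", "7300"))
print(value_matches(r"regex:\d+", "73a"))
print(value_matches("abc", "abc"))
```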
Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
The concept of partitioning is specific to HDFS and shouldn't be part of the Table
class or the TTable thrift struct. The more general concept is clustering, so renamed to
'clustering_columns'.
Change-Id: I09475d5f13877e6eddd9d53375e461bc68f26bcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/773
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Unfortunately, the BE does not have the codegen path to execute UDAs.
This puts some restrictions on the UDAs we can run.
- No IR UDAs
- No varargs
- Must have 8 arguments or fewer.
The code to do this is almost all there for UDFs but I'm not sure I'll get to it.
Change-Id: I8a06e635a9138397c8474a5704c3e588bb92347b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
With this change we now detect if a table is read-only and disable INSERT/LOAD operations
on these tables. A table is read-only if Impala does not have write permission on the HDFS
base directory of the table or any one of the partition directories (if
the table is partitioned).
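
The read-only rule can be sketched as below, assuming a pluggable permission
check with local os.access as a stand-in for an HDFS permission lookup. The
function name and paths are hypothetical:

```python
import os


def is_read_only(base_dir, partition_dirs=(), can_write=None):
    """A table is read-only if any of its directories is not writable.

    `can_write` is a predicate on a directory path. It defaults to a local
    os.access check, which only stands in for an HDFS permission lookup.
    """
    if can_write is None:
        can_write = lambda d: os.access(d, os.W_OK)
    return any(not can_write(d) for d in [base_dir, *partition_dirs])


# Fake permissions for the example: one partition directory is not writable.
writable = {"/warehouse/t", "/warehouse/t/p=1"}
check = lambda d: d in writable
print(is_read_only("/warehouse/t", ["/warehouse/t/p=1"], check))
print(is_read_only("/warehouse/t", ["/warehouse/t/p=1", "/warehouse/t/p=2"], check))
```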
Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/713
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).
This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are pretty convoluted but if it is
messed up, the user can always specify the full symbol.
Some other minor cleanup in:
- JNI from FE to BE
- UDFs/UDAs that are loaded as test data
Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Adds impalad and catalogd debug webpage handlers for dumping the thrift
catalog objects structs by object name and type. For example, the following URL
could be used to dump tpch.lineitem:
<host>:25020/catalog_objects?object_type=TABLE&object_name=tpch.lineitem
This will be useful for debugging since it provides visibility into the
current state of the catalog.
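
For scripting, the URL above can be built with a small helper. This is a
sketch: the function name and the "http://" scheme are assumptions, while the
port, path and parameter names come from the example URL:

```python
from urllib.parse import urlencode


def catalog_object_url(host, object_type, object_name, port=25020):
    """Build the debug-page URL for dumping one catalog object."""
    query = urlencode({"object_type": object_type, "object_name": object_name})
    return "http://{}:{}/catalog_objects?{}".format(host, port, query)


url = catalog_object_url("localhost", "TABLE", "tpch.lineitem")
print(url)
```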
Change-Id: I46f1cd675b41d456e9b784c810e6e3053f0cc137
Reviewed-on: http://gerrit.ent.cloudera.com:8080/640
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, Thrift service implementation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.
Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views,
Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes: an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.
What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
contains the change. An impalad will wait for the statestore heartbeat that contains this
version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing
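
The "wait for the catalog version" behaviour listed above can be sketched with
a condition variable. This is an assumption-laden illustration: the class and
method names are hypothetical, and a timer thread stands in for the statestore
heartbeat:

```python
import threading


class CatalogVersionTracker:
    """Tracks the newest catalog version seen in statestore updates."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0

    def on_heartbeat(self, version):
        """Record the version carried by a heartbeat and wake any waiters."""
        with self._cond:
            if version > self._version:
                self._version = version
                self._cond.notify_all()

    def wait_for(self, min_version, timeout=None):
        """Block until `min_version` has been seen; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version >= min_version, timeout)


tracker = CatalogVersionTracker()
# The DDL response says the change is contained in catalog version 5.
timer = threading.Timer(0.05, tracker.on_heartbeat, args=(5,))
timer.start()
seen = tracker.wait_for(5, timeout=5.0)
timer.join()
print(seen)
```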
Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
same JAR.
Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>