Commit Graph

21 Commits

Author SHA1 Message Date
Paden Tomasello
0326f17bb3 Adding Lz4 Codec.
Change-Id: I037d4e0de3b2cd2b8582caea058c8e1f2f880ff3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3027
Reviewed-by: Paden Tomasello <paden.tomasello@cloudera.com>
Tested-by: jenkins
2014-06-16 14:20:34 -07:00
Henry Robinson
60cbe1b0e1 IMPALA-741: Support partitions with non-existent HDFS locations
If a partition had a location that did not exist in HDFS, Impala would
refuse to load its metadata. This meant a typo could render a table
unloadable. We fix this problem by removing the existence check from the
frontend, and by inheriting access from the first extant parent of the
partition directory.
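
The fallback described above can be sketched as follows (illustrative Python, not Impala's actual Java code; the function names are made up):

```python
# Walk up the partition path to the first directory that exists, and use
# that directory's permissions for the missing partition location.

def first_extant_parent(path, existing_dirs):
    """Return the closest existing ancestor of `path` ('/' as a last resort)."""
    while path and path != "/":
        path = path.rsplit("/", 1)[0] or "/"
        if path in existing_dirs:
            return path
    return "/"

def effective_access(partition_path, existing_dirs, perms):
    """Permissions Impala would report for a (possibly missing) partition dir."""
    if partition_path in existing_dirs:
        return perms[partition_path]
    return perms[first_extant_parent(partition_path, existing_dirs)]
```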

Fixing this exposed a second issue, where Impala wouldn't create
directories for partitions in the right place after an INSERT if the
partition location had been changed. To get this right we have to plumb
the partition ID through to Coordinator::FinalizeSuccessfulInsert(), so
that the coordinator can look up the partition's location from the
query-wide descriptor table. As a by-product, this patch rationalises
the per-partition, per-fragment statistics gathering a little bit by
putting almost all the per-partition stats into TInsertPartitionStatus.

Change-Id: I9ee0a1a1ef62cf28f55be3249e8142c362083163
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2851
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
2014-06-08 18:44:45 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate that a
table or partition should be cached, and which pool to cache the data in:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition, and the ID of that request
is stored in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.
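
A minimal sketch of the bookkeeping described above (illustrative Python; `mark_cached` and `cache_directive_id` are made-up names, not Impala's API):

```python
# Record and read back the HDFS cache directive ID in the table properties,
# stored as 'cache_directive_id'='<requestId>'.

def mark_cached(table_props, request_id):
    """Store the ID of the submitted HDFS caching request."""
    table_props["cache_directive_id"] = str(request_id)
    return table_props

def cache_directive_id(table_props):
    """Return the cache directive ID, or None if the table is uncached."""
    val = table_props.get("cache_directive_id")
    return int(val) if val is not None else None
```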

When a cached table or partition is dropped, it is important to uncache the data
(drop the associated cache request). For partitioned tables, this means dropping the
cache requests of all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends them out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool
management in the future (ADD/DROP/SHOW CACHE POOL).
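
The analysis-time check described above amounts to a set-membership test against the pools cached from catalog updates (illustrative Python; the names are made up):

```python
# Fail a query at analysis time if it names a cache pool the catalog has
# never seen, without any round trip to HDFS or the CatalogServer.

class AnalysisException(Exception):
    pass

def check_cache_pool(pool_name, known_pools):
    """Raise if `pool_name` is not among the pools from catalog updates."""
    if pool_name not in known_pools:
        raise AnalysisException(
            "The specified cache pool does not exist: %s" % pool_name)
```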

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user's access during this period, we wait for the cache
requests to complete in the background; once they have finished, the table metadata
is automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Lenni Kuff
83e239723f Add TRole/TPrivilege structs to Thrift CatalogObjects
These are used as our internal representation of the authorization policy metadata
(as opposed to directly using the Sentry Thrift structs). Versioned/managed in the
same way as other TCatalogObjects.

Change-Id: Ia1ed9bd4e25e9072849edebcae7c2d3a7aed660d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2545
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c89431775fcca19cdbeddba635b83fd121d39b04)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2646
2014-05-21 15:51:24 -07:00
Matthew Jacobs
ebc6c5894e External Data Source: Frontend and catalog changes
Initial frontend and catalog changes for external data sources.

Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485
2014-05-08 14:56:19 -07:00
Lenni Kuff
fd174a5e69 [CDH5] Remove duplication of network addresses in HdfsTable (updated for HDFS caching)
This is a port of 0b9134a from CDH4, but required some adjustments to work on CDH5 due to
the HDFS caching work. The differences between CDH4 and CDH5 are mainly in
HdfsTable/HdfsPartition. I added a new BlockReplica class to represent a single block
replica with its host index and caching info.

This removes duplicated TNetworkAddresses in the block location metadata of HdfsTable.
Each HdfsTable now contains a list of TNetworkAddresses, and the BlockLocations
just reference an index into this list to specify the host, rather than duplicating the
TNetworkAddress.

For a table with 100K blocks, this reduces the size of the THdfsTable struct by an additional
~50+% (on top of the duplicate file path changes). This takes the total size of the table
from:
21.1MB -> 9.4MB (file path duplication) -> 4.2MB (host duplication) = ~80% total improvement.
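
The deduplication can be sketched as follows (illustrative Python, not the actual Java BlockReplica code):

```python
# Keep one table-wide list of network addresses; each block stores only
# indexes into that list instead of repeating full TNetworkAddress structs.

def dedup_hosts(block_hosts):
    """block_hosts: per-block lists of host addresses.

    Returns (unique_addresses, per-block index lists into it)."""
    unique, index_of, indexed_blocks = [], {}, []
    for hosts in block_hosts:
        idxs = []
        for addr in hosts:
            if addr not in index_of:
                index_of[addr] = len(unique)
                unique.append(addr)
            idxs.append(index_of[addr])
        indexed_blocks.append(idxs)
    return unique, indexed_blocks
```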

Change-Id: If7f11764dc0961376f9648779d253829f4cd83a2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1367
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1887
Reviewed-by: Nong Li <nong@cloudera.com>
2014-03-14 14:30:08 -07:00
Lenni Kuff
d6cbd3dc44 Ensure db/table names are always case insensitive in catalog topic entry keys
This fixes a bug that can happen with 'invalidate metadata <table name>' if the following
sequence of events happens:

1) Table is created in Impala (table names are always treated as lower case)
2) Table is dropped and re-created in Hive, using the same name but different casing
3) invalidate metadata <table name> is run in Impala, which will update the existing
   table with the version from the Hive metastore.

When building the next statestore update, the catalog server will send an update out
thinking that the table from 1) was dropped and the table from 3) was added because
the topic entry key is case sensitive. This may incorrectly remove the table from
an impalad's catalog. The fix is to always treat db/table names as case insensitive.
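
The fix can be sketched as follows (illustrative Python; the key format is made up):

```python
# Build topic entry keys from lower-cased names so a table re-created with
# different casing maps to the same key and produces no spurious delta.

def topic_key(db, table):
    return ("TABLE:%s.%s" % (db, table)).lower()

def topic_delta(old_keys, new_keys):
    """(deleted, added) topic entries between two catalog snapshots."""
    return sorted(old_keys - new_keys), sorted(new_keys - old_keys)
```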

Change-Id: Ib59edc403989781bf12e0405c0ccd37b8e41ee41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1634
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1637
2014-02-23 00:20:16 -08:00
Lenni Kuff
5646156382 Reduce duplicate/unused string data from FileDescriptor/FileBlock thrift structs
Made the following changes:

1) Removed filePath from FileBlock. It wasn't used anywhere.
2) Updated FileDescriptor to use the file name rather than the full file path. The full
   path to the file can be found by prepending the parent partition directory to the file name.
3) Removed fileLength from FileBlock. It duplicated the length already stored in FileDescriptor.

I tested these changes on two tables on a 20-node cluster. One table had ~100K blocks
and one table had ~15K blocks.

In both cases I saw the following improvements:

* After making change 1) the total serialized size of the table dropped by ~30%
* After making change 1) and 2) the total serialized size of the table dropped by ~52%
* After making change 1), 2), and 3) the total serialized size of the table dropped by ~55%
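
Change 2) above can be sketched as follows (illustrative Python):

```python
# A FileDescriptor keeps only the file name; the full path is rebuilt by
# prepending the parent partition's directory.

def full_path(partition_dir, file_name):
    return partition_dir.rstrip("/") + "/" + file_name

def bytes_saved(partition_dir, file_names):
    """String bytes saved by storing names instead of full paths."""
    return sum(len(full_path(partition_dir, f)) - len(f) for f in file_names)
```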

Change-Id: Ic85b3cbcf775569f69b7303bec4adc52593fc35c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1351
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1480
2014-02-07 00:02:09 -08:00
Lenni Kuff
7a6892dcbe Fix race when invalidating catalog metadata and loading a new table
There was a race when the catalog was invalidated at the same time a table
was being loaded: an uninitialized Table could be returned
unexpectedly to the impalad due to the concurrent invalidate.

This fixes the problem by updating the CatalogObjectCache to load when
a catalog object is uninitialized, rather than load when null. New items can
now be added in an initialized or uninitialized state; uninitialized objects
are loaded on access.
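
The new cache semantics can be sketched as follows (illustrative Python, not the actual Java CatalogObjectCache):

```python
# Entries may be added initialized or uninitialized; an uninitialized entry
# is loaded on first access, so a raw stub is never handed to an impalad.

class CatalogObjectCache:
    def __init__(self, loader):
        self.loader = loader   # name -> fully loaded object
        self.cache = {}        # name -> (object, initialized?)

    def add(self, name, obj=None, initialized=False):
        self.cache[name] = (obj, initialized)

    def get(self, name):
        if name not in self.cache:
            return None
        obj, ready = self.cache[name]
        if not ready:          # load on access rather than load-when-null
            obj = self.loader(name)
            self.cache[name] = (obj, True)
        return obj
```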

Also adds a stress test for invalidate metadata/invalidate metadata <table>/refresh

In addition, it cleans up the locking in the Catalog to make it more
straightforward. The top-level catalogLock_ is now only in CatalogServiceCatalog,
and this lock is used to protect the catalogVersion_. Operations that need to
perform an atomic bulk catalog operation can use this lock (such as when the
CatalogServer needs to take a snapshot of the catalog to calculate what delta to send
to the statestore). Otherwise, the lock is not needed and objects are protected by the
synchronization at each level in the object hierarchy (Db->[Function/Table]). That is,
Dbs are synchronized by the Db cache, and each Db has a table cache that is synchronized
independently.

Change-Id: I9e542cd39cdbef26ddf05499470c0d96bb888765
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1355
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1418
2014-01-31 16:16:32 -08:00
Lenni Kuff
b4f5c1edcf Enable lazy loading of table metadata for the CatalogService/Impalad
This change adds support for lazy loading of table metadata to the
CatalogService/Impalad. The way this works is that the CatalogService initially
sends out an update with only the databases and table names (wrapped as
IncompleteTables). When an Impalad encounters one of these tables, it will contact
the catalog service to get the metadata, possibly triggering a metadata load if the
catalog server has not yet loaded this table.

With these changes the catalog server starts up in just seconds, even for large
metastores, since it only needs to call into the metastore to get the list of tables
and databases. The performance of "invalidate metadata" also improves for the same reason.
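
The flow above can be sketched as follows (illustrative Python; class and method names are made up, not Impala's actual API):

```python
# The catalog initially ships only table names; an impalad that touches an
# incomplete table fetches its metadata from the catalog service, which may
# load it from the metastore on first request.

class CatalogService:
    def __init__(self, metastore_tables):
        self.known = set(metastore_tables)  # names only: fast startup
        self.loaded = {}

    def table_names(self):
        return sorted(self.known)

    def get_table(self, name):              # may trigger a metadata load
        if name not in self.loaded:
            self.loaded[name] = {"name": name, "metadata": "..."}
        return self.loaded[name]

class Impalad:
    def __init__(self, catalogd):
        self.catalogd = catalogd
        # initial update: incomplete entries only (metadata is None)
        self.local = {n: None for n in catalogd.table_names()}

    def resolve(self, name):
        if self.local.get(name) is None:    # incomplete: fetch on demand
            self.local[name] = self.catalogd.get_table(name)
        return self.local[name]
```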

I also picked up the catalog cleanup patch I had to make the APIs a bit more consistent and
remove the need for using a LoadingCache for databases.

This also fixes up the FE tests to run in a more realistic fashion. The FE tests now run
against catalog objects received from the catalog server. This actually turned up some bugs
in our previous test configuration where we were not running with the correct column stats
(we were always running with avgSerializedSize = slotSize). This changed some plans, so the
planner tests needed to be updated.

Still TODO:
This does not include the changes to perform background metadata loading. I will send
that out as a separate patch on top of this.

Change-Id: Ied16f8a7f3a3393e89d6bfea78f0ba708d0ddd0e

Change-Id: I48c34408826b7396004177f5fc61a9523e664acc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1328
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1338
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-21 21:43:29 -08:00
Nong Li
69fe1c6c10 Change FE to use ColumnType instead of PrimitiveType.
PrimitiveType is an enum and cannot be used for more complex types. The change
touches a lot of files, but is very mechanical.

A similar change needs to be done in the BE which will be a subsequent patch.

The version as I have it breaks rolling upgrade due to the thrift changes. If
this is not okay, we can work around that but it will be annoying.

Change-Id: If3838bb27377bfc436afd6d90a327de2ead0af54
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1287
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1304
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:32:55 -08:00
Nong Li
04b501d3a1 [CDH5] Collect metadata for cached blocks.
Change-Id: I81026de2f9a08553dc15e07090b8297120aa7462
(cherry picked from commit 69414f67b20016e49b739a46d6e2b4b57e1d1a3c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1252
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-15 15:12:20 -08:00
Lenni Kuff
d39062f848 Fixup error message returned to user for tables with incomplete metadata
When there is an error loading table metadata, the Catalog Server
includes (among other things) the table loading error message(s) and
the call stack where the error happened. The error messages returned
to the user previously included the call stack, but with this change
the remote (catalog service) call stack is only output to the impalad
log when the table is accessed by the user. This is done by annotating
and appending the remote call stack to the local call stack.

Change-Id: Icf46a3d7fc15d8b0a8e59564722fdb991b074618
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1063
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:23 -08:00
Alex Behm
1497002013 Added SHOW TABLE/COLUMN STATS command.
Fixed the following stats-related bugs:
- Per-partition row count was not distributed properly via CatalogService
- HBase column stats were not loaded and distributed properly

Enhancements to test framework:
- Allow regex specification of expected row or column values
- Fixed expected results of some tests because the test framework
  did not catch that they were incorrect

Change-Id: I1fa8e710bbcf0ddb62b961fdd26ecd9ce7b75d51
Reviewed-on: http://gerrit.ent.cloudera.com:8080/813
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:51 -08:00
Lenni Kuff
1ad26f77f6 Rename TTable field 'partition_columns' to 'clustering_columns'
The concept of partitioning is specific to HDFS and shouldn't be part of the Table
class or the TTable thrift struct. The more general concept is clustering, so the field
is renamed to 'clustering_columns'.

Change-Id: I09475d5f13877e6eddd9d53375e461bc68f26bcf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/773
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:44 -08:00
Lenni Kuff
9d5b94baa5 CatalogServer follow-on code review changes
Changes to address follow-on code review comments. This change consists mainly of:
* Comment cleanup / clarification
* Thrift struct consolidation
* Minor naming changes
* Small code fixes/changes, etc

Change-Id: Idd03cc8adeb9c0d99744688a02f81a08135966de
Reviewed-on: http://gerrit.ent.cloudera.com:8080/667
Tested-by: jenkins
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:42 -08:00
Nong Li
601f24a198 UDA execution loose ends.
Unfortunately, the BE does not have the codegen path to execute UDAs.
This puts some restrictions on the UDAs we can run.

- No IR UDAs
- No varargs
- Must have 8 arguments or fewer.

The code to do this is almost all there for UDFs but I'm not sure I'll get to it.

Change-Id: I8a06e635a9138397c8474a5704c3e588bb92347b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/703
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:38 -08:00
Lenni Kuff
8b2acf5c22 IMPALA-425: Detect read-only tables and disable INSERT/LOAD operations on these tables
With this change, we now detect whether a table is read-only and disable INSERT/LOAD
operations on such tables. A table is read-only if Impala does not have write permission
on the HDFS base directory of the table or any one of the partition directories (if
the table is partitioned).
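
The check above boils down to requiring write permission on every relevant directory (illustrative Python; `writable_dirs` stands in for real HDFS permission checks):

```python
# A table is read-only if Impala lacks write permission on its base
# directory or on any partition directory.

def is_read_only(base_dir, partition_dirs, writable_dirs):
    return any(d not in writable_dirs
               for d in [base_dir] + list(partition_dirs))
```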

Change-Id: I25515b2d0ffb7fe297359437fd937a3d6e0406a0
Reviewed-on: http://gerrit.ent.cloudera.com:8080/713
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:37 -08:00
Nong Li
6b9a7de02e Add symbol resolution during analysis for create function stmts.
Before this, we had to specify the entire mangled symbol. This can be quite
long and quite tedious (take a look at some of the create UDA test cases that
specify all the symbols).

This patch adds some code to convert from the user function signature to the
mangled name. This means the user can specify the unmangled name and we can
do the symbol lookup. The mangling rules are pretty convoluted, but if the
lookup is messed up, the user can always specify the full symbol.
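
The lookup idea can be sketched as follows (illustrative Python with a toy mangler; Impala's real implementation follows the full Itanium C++ mangling rules):

```python
# Derive a mangled candidate from the user-visible signature and look it up
# in the library's symbol table, falling back to an exact match for users
# who typed the full mangled symbol themselves.

TYPE_CODES = {"INT": "i", "BIGINT": "l", "DOUBLE": "d"}

def toy_mangle(fn_name, arg_types):
    # e.g. MyAdd(INT, INT) -> _Z5MyAddii
    return "_Z%d%s%s" % (len(fn_name), fn_name,
                         "".join(TYPE_CODES[t] for t in arg_types))

def resolve_symbol(user_symbol, arg_types, symbol_table):
    if user_symbol in symbol_table:   # already a full mangled symbol
        return user_symbol
    candidate = toy_mangle(user_symbol, arg_types)
    if candidate in symbol_table:
        return candidate
    raise LookupError("no symbol found for " + user_symbol)
```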

Some other minor cleanup in:
  - JNI from FE to BE
  - UDFs/UDAs that are loaded as test data

Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/624
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:53:20 -08:00
Lenni Kuff
b9ac04e3f3 Add debug webpage handlers for dumping catalog objects by type and name
Adds impalad and catalogd debug webpage handlers for dumping the thrift
catalog object structs by object name and type. For example, the following URL
could be used to dump tpch.lineitem:
<host>:25020/catalog_objects?object_type=TABLE&object_name=tpch.lineitem

This will be useful for debugging since it provides visibility into the
current state of the catalog.

Change-Id: I46f1cd675b41d456e9b784c810e6e3053f0cc137
Reviewed-on: http://gerrit.ent.cloudera.com:8080/640
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:19 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components: a C++ server that implements StateStore
integration, the Thrift service implementation, and the export of the debug webpage/metrics;
and the Java Catalog that manages the caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have thrift structs to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes: an
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00