Commit Graph

10 Commits

Author SHA1 Message Date
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Ippokratis Pandis
e1ae5fe95a IMPALA-1068: COMPUTE STATS should place -1 in #NULLs
With IMPALA-1033 we disabled the counting of the number of NULLs in each column,
and that gave a 2x speed-up in the computation. But erroneously the value 0 was
being placed in the number of NULLs, instead of the correct -1 that indicates
'unknown'.

Change-Id: Ib882eb2a87e7e2469f606081cb2881461b441a45
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3377
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3378
2014-07-07 15:13:25 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Alex Behm
70d7ff07af CDH-19856: Disable Hive's stats autogathering.
Change-Id: I04e91f91d29b7863848a750e362c9d94469df7f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3156
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3169
2014-06-19 16:48:34 -07:00
Lenni Kuff
745c091fcc [CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats
Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-28 08:54:54 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored with in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.

It is desirable to know which cache pools exists early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from access the time during this period we will wait for the cache
requests to complete in the background and once they have finished the table metadata
will be automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Lenni Kuff
9e2dd7e049 Add support for SHOW PARTITIONS <table name>
This statement returns info on all partitions for the given table. It is implemented as
an alias for SHOW TABLE STATS, with some extended analysis checks (such as throwing if
the statement targets an unpartitioned table).

Change-Id: I19154a9d90314de18f86ba355aa5dbed808f147f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2145
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2179
Tested-by: jenkins
2014-04-10 12:15:39 -07:00
Alex Behm
74164e8f99 IMPALA-688: Fix column stats computation for HBase row key. Use regex to fix flaky tests.
Change-Id: I1d3fb915921bbc5366da0ee51608fd54aa237777
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1135
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:54:33 -08:00
Alex Behm
e4ad086dee Added max/avg length for string columns in COMPUTE STATS.
Change-Id: I6f61de2323ee12681642684ec633ed4bb7506de2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1079
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:30 -08:00
Lenni Kuff
bfb16ff552 Disable SHOW STATS tests because results are unstable (IMPALA-688)
Change-Id: Ib4b4fe3a29d3bd0e3c7ece8b5b21c4ec4b5eb289
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1060
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:54:22 -08:00