Commit Graph

15 Commits

Author SHA1 Message Date
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Ippokratis Pandis
e34ede292c IMPALA-1016: Return correct number of NULL values when projecting newly added column
This patch handles the case where a query projecting a newly added column
caused the Parquet scanner to return infinite values.

Change-Id: Ie5f4d4a88d5868e8d9e5c39fa9440821776dde3c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2725
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2761
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
2014-06-01 01:28:25 -07:00
Lenni Kuff
745c091fcc [CDH5] Update SHOW TABLE STATS to include per-partition HDFS caching stats
Change-Id: I71b01f84bbd308108d775e78c644e867b48e05be
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2621
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-28 08:54:54 -07:00
Lenni Kuff
c45e9a70d9 [CDH5] Add DDL support for HDFS caching
This change adds DDL support for HDFS caching. The DDL allows the user to indicate a
table or partition should be cached and which pool to cache the data into:
* Create a cached table: CREATE TABLE ... CACHED IN 'poolName'
* Cache a table/partition: ALTER TABLE ... [partitionSpec] SET CACHED IN 'poolName'
* Uncache a table/partition: ALTER TABLE ... [partitionSpec] SET UNCACHED

When a table/partition is marked as cached, a new HDFS caching request is submitted
to cache the location (HDFS path) of the table/partition and the ID of that request
is stored in the table metadata (in the table properties). This is stored as:
'cache_directive_id'='<requestId>'. The cache requests and IDs are managed by HDFS
and persisted across HDFS restarts.

When a cached table or partition is dropped it is important to uncache the cached data
(drop the associated cache request). For partitioned tables, this means dropping all
cache requests from all cached partitions in the table.
Likewise, if a partitioned table is created as cached, new partitions should be marked
as cached by default.
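
The bookkeeping described above can be sketched roughly as follows. This is only an
illustrative model, not the patch's implementation: FakeHdfsCache, mark_cached, uncache
and drop_table are hypothetical names, and only the 'cache_directive_id' property key
comes from the commit message.

```python
CACHE_DIRECTIVE_ID_PROP = "cache_directive_id"  # property key named in the commit message


class FakeHdfsCache:
    """Hypothetical stand-in for the HDFS caching API; not a real client."""

    def __init__(self):
        self._next_id = 1
        self.directives = {}  # directive id -> (path, pool)

    def submit(self, path, pool):
        """Register a cache request and return its ID."""
        directive_id = self._next_id
        self._next_id += 1
        self.directives[directive_id] = (path, pool)
        return directive_id

    def drop(self, directive_id):
        """Remove a cache request; unknown IDs are ignored."""
        self.directives.pop(directive_id, None)


def mark_cached(hdfs, props, path, pool):
    """Submit a cache request and record its ID in the given properties."""
    props[CACHE_DIRECTIVE_ID_PROP] = str(hdfs.submit(path, pool))


def uncache(hdfs, props):
    """Drop the cache request recorded in the properties, if any."""
    directive_id = props.pop(CACHE_DIRECTIVE_ID_PROP, None)
    if directive_id is not None:
        hdfs.drop(int(directive_id))


def drop_table(hdfs, table_props, partition_props_list):
    """Uncache the table and every cached partition before dropping it."""
    uncache(hdfs, table_props)
    for props in partition_props_list:
        uncache(hdfs, props)
```

Partitions that were never cached carry no 'cache_directive_id' entry, so drop_table
skips them without contacting the (fake) HDFS cache.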

It is desirable to know which cache pools exist early on (in analysis) so the query
will fail without hitting HDFS/CatalogServer if a non-existent pool is specified. To
support this, a new cache pool catalog object type was introduced. The catalog server
caches the known pools (periodically refreshing the cache) and sends the known pools out
in catalog updates. This allows impalads to perform analysis checks on cache pool
existence without going to HDFS. It would be easy to use this to add basic cache pool management
in the future (ADD/DROP/SHOW CACHE POOL).
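
As a rough illustration of the analysis-time check, assuming the known pools arrive as a
plain set of names from the most recent catalog update (AnalysisError and
check_cache_pool are hypothetical names, not Impala classes):

```python
class AnalysisError(Exception):
    """Raised when a statement fails analysis (hypothetical error type)."""


def check_cache_pool(pool_name, known_pools):
    """Fail fast if the pool is not in the locally cached set of known pools.

    known_pools is assumed to be the set of pool names delivered by the last
    catalog update, so no call to HDFS or the catalog server is needed here.
    """
    if pool_name not in known_pools:
        raise AnalysisError("The specified cache pool does not exist: " + pool_name)
```

Because the set is refreshed only periodically, a pool created very recently may be
rejected until the next catalog update arrives.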

Waiting for the table/partition to become cached may take a long time. Instead of
blocking the user from accessing the table during this period, we wait for the cache
requests to complete in the background; once they have finished, the table metadata
is automatically refreshed.

Change-Id: I1de9c6e25b2a3bdc09edebda5510206eda3dd89b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2310
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-05-27 16:47:15 -07:00
Lenni Kuff
e97a1b52e0 Remove flaky verification in ALTER/CREATE table tests
This fixes the flaky ALTER/CREATE tests by removing a verification step that
didn't add value and was non-deterministic. The verification step that was
removed verified that CREATE/ALTER set the appropriate file format by
changing the format to something that didn't match the underlying data files,
then attempting to read the data. This is already covered by the positive
test case where the file format is changed to match the underlying data.

Change-Id: I66f485405234f472f3b83f3e776bf7f2c10de874
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1379
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1382
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-28 16:03:02 -08:00
Alex Behm
24662b1941 Allow ALTER TABLE to set per-partition serde and table properties.
The main motivation is to allow users to set the per-partition number
of rows for manual incremental stats maintenance, as well as a means
to 'drop' stats that may have caused undesirable plan changes.

Change-Id: Iff38317a993e5d7952ea4df839947f5ec341e930
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1010
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:22 -08:00
Lenni Kuff
39f77b8b8f Add support for cluster-synchronized catalog operations
This change adds support for cluster-synchronized catalog operations. This provides the
guarantee that after a catalog op completes, all other subscribers to the catalog topic have
also processed that update. This is useful when load balancing, because a common workflow
is to target a different impalad for each statement executed.
For example, if each of the following were executed sequentially, but targeting
a different node:
1) CREATE TABLE Foo
2) INSERT INTO Foo
3) SELECT * FROM Foo
4) INSERT INTO Foo ....

Since both the INSERT and the CREATE update the catalog, it would not work as expected
without this patch. The user might either get a "table not found" error or would be
missing partition information from the INSERT.

The downside is that this approach to DDL takes a bit longer because we need to wait
until all subscribers have processed an update. If all nodes are healthy, this overhead
should not be significantly longer than the current DDL time. However, a single bad node
might slow down or completely block the completion of all DDL operations. By default
this feature is disabled, but it can be enabled using a new query option: SYNCED_DDL=1
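
The waiting behaviour and its downside can be modelled roughly as below. This is a
sketch under assumptions, not the mechanism used in the patch: it assumes each subscriber
reports the last catalog version it has processed, and all function and parameter names
are hypothetical.

```python
def all_subscribers_caught_up(subscriber_versions, ddl_version):
    """True once every subscriber has processed at least ddl_version."""
    return all(v >= ddl_version for v in subscriber_versions.values())


def wait_for_sync(get_versions, ddl_version, poll, max_polls):
    """Poll until all subscribers have processed the DDL's catalog version.

    get_versions returns a dict of subscriber id -> last processed version.
    poll is called between checks (for example, to sleep).
    Returns False if the subscribers have not caught up after max_polls checks.
    """
    for _ in range(max_polls):
        if all_subscribers_caught_up(get_versions(), ddl_version):
            return True
        poll()
    return False
```

A single subscriber whose version never advances keeps wait_for_sync from succeeding,
which mirrors the "single bad node" concern above.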

To test this, the base test suite was updated to support selecting a random impalad
to execute each query section in a query test file. This is currently only enabled
for the insert and DDL tests, but could be leveraged by more tests in the future.

TODO: Add additional failure tests around this functionality.
TODO: Add an explicit "sync" statement so users do not need to run all their DDL
in this mode (since it is slower).

Change-Id: I45e757a931bf2a4740cc0cdd1e76ce49a1e22b83
Reviewed-on: http://gerrit.ent.cloudera.com:8080/899
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:58 -08:00
Lenni Kuff
8bb0010415 IMPALA-597: Do not crash when multiple partitions have the same LOCATION
This patch fixes an issue where Impala would crash if two partitions had
the same HDFS location. This is now fixed in hdfs-scan-node. It also includes some
cleanup and bug fixes to the FE partition related classes and adds tests.

There is still a problem where partition location metadata is not sent
to the BE for INSERT statements, but that will be resolved in a separate
patch.

Change-Id: I0f1c3113d654f7d2b410f00e793ff6b0cae1ae18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/876
Reviewed-by: Alan Choi <alan@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:57 -08:00
Lenni Kuff
e0507e192b Fix unstable alter table test 2014-01-08 10:50:26 -08:00
Henry Robinson
ead69d377f IMPALA-249, IMPALA-252: Fixes for static partition keys. 2014-01-08 10:50:14 -08:00
Alex Behm
673d7b97cf IMPALA-190: Insert with NULL partition keys results in SIGSEGV. 2014-01-08 10:49:22 -08:00
Lenni Kuff
018a72bfe2 IMPALA-189: Properly support NULL partition key values in ALTER .. PARTITION statements 2014-01-08 10:49:21 -08:00
Lenni Kuff
5a0b1270c4 Add support for ALTER ... PARTITION (partitionSpec) SET FILEFORMAT/LOCATION
Adds support for:
* ALTER TABLE <table> PARTITION (partitionSpec) SET FILEFORMAT
* ALTER TABLE <table> PARTITION (partitionSpec) SET LOCATION

This enables setting the location and fileformat of specific partitions.
2014-01-08 10:49:17 -08:00
Lenni Kuff
f4a5c0628f Cleanup HDFS directories before and after running ALTER TABLE tests 2014-01-08 10:49:17 -08:00
Lenni Kuff
1fb72fbc73 IMPALA-156: Support core 'ALTER TABLE' DDL command
This patch adds support for
- ALTER TABLE ADD|REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD/DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
2014-01-08 10:49:14 -08:00