Commit Graph

5 Commits

Author SHA1 Message Date
Henry Robinson
2a69019525 IMPALA-945: Fix column reordering with SELECT expressions
Previously, to produce the correct output expressions for the root plan
fragment before a table sink, InsertStmt would reorder the result
expressions for the query statement at the plan root. This had stopped
working for SelectStmts (and test coverage didn't catch that).

Now InsertStmt produces its own output expressions that can substitute
for the originals from the query statement, and the planner uses those
instead.

All query tests for column reordering have been duplicated to use SELECT
expressions.

Change-Id: Ib909fe35d27416b33ba2e5ac797aa931e1fe43f9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2204
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
(cherry picked from commit d526db7ac6274f35b6affcb7428327100026e14e)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2275
2014-04-18 00:12:12 -07:00
Lenni Kuff
76fa3b2ded Update DDL to support 'STORED AS PARQUET' and 'STORED AS AVRO' syntax
This change updates our DDL syntax support to allow for using 'STORED AS PARQUET'
as well as 'STORED AS PARQUETFILE'. Moving forward we should prefer the new syntax,
but continue to support the old.  I made the same change for 'AVROFILE', but since
we have not yet documented the 'AVROFILE' syntax I left out support for the old syntax.

Change-Id: I10c73a71a94ee488c9ae205485777b58ab8957c9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1053
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:18 -08:00
Alex Behm
b82880738c IMPALA-617: Cast NULLs in INSERT statement due to incomplete permutation list to expected column type.
Inserting NULLs with NULL_TYPE into Parquet tables cases a crash.

Change-Id: I350c7ee2789c017cee5c4b6a1292c9fae36087f1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/696
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-01-08 10:53:29 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata updates request from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
directly connect execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, Thrift service implementiation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views,
Databases, UDFs) have thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been seperated into two seperate sub-classes. An
ImpladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL comment.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
Henry Robinson
79b36a5eb3 IMPALA-375: Add column permutation clause to INSERT statement 2014-01-08 10:50:59 -08:00