Commit Graph

29 Commits

Author SHA1 Message Date
Alex Behm
e9864d5f78 Introduce type hierarchy and add complex types.
This patch replaces ColumnType with a hierarchy of types that models
the existing scalar types as well as the new complex types ARRAY, MAP,
and STRUCT.

Change-Id: Ia895f41153e99febb0c35412acac12689c3c2064
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3491
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3538
2014-07-21 20:00:46 -07:00
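Note on e9864d5f78: the message describes a hierarchy that covers the existing scalar types plus ARRAY, MAP, and STRUCT. The sketch below is only an illustration of that shape, written in Python for brevity. The class names, fields, and to_sql() method are assumptions made for this example and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Tuple


class Type:
    """Hypothetical base class for the hierarchy."""

    def is_complex(self) -> bool:
        return False

    def to_sql(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class ScalarType(Type):
    name: str  # e.g. "INT" or "STRING"

    def to_sql(self) -> str:
        return self.name


@dataclass(frozen=True)
class ArrayType(Type):
    item: Type

    def is_complex(self) -> bool:
        return True

    def to_sql(self) -> str:
        return f"ARRAY<{self.item.to_sql()}>"


@dataclass(frozen=True)
class MapType(Type):
    key: Type
    value: Type

    def is_complex(self) -> bool:
        return True

    def to_sql(self) -> str:
        return f"MAP<{self.key.to_sql()},{self.value.to_sql()}>"


@dataclass(frozen=True)
class StructType(Type):
    fields: Tuple[Tuple[str, Type], ...]  # (field name, field type) pairs

    def is_complex(self) -> bool:
        return True

    def to_sql(self) -> str:
        inner = ",".join(f"{n}:{t.to_sql()}" for n, t in self.fields)
        return f"STRUCT<{inner}>"


# Types nest freely, which a flat enum of primitive types cannot express.
nested = MapType(ScalarType("STRING"), ArrayType(ScalarType("INT")))
print(nested.to_sql())  # MAP<STRING,ARRAY<INT>>
```

Because every node is itself a Type, complex types can contain other complex types to any depth.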
Matthew Jacobs
ebc6c5894e External Data Source: Frontend and catalog changes
Initial frontend and catalog changes for external data sources.

Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485
2014-05-08 14:56:19 -07:00
Nong Li
69fe1c6c10 Change FE to use ColumnType instead of PrimitiveType.
PrimitiveType is an enum and cannot be used for more complex types. The change
touches a lot of files but very mechanically.

A similar change needs to be done in the BE which will be a subsequent patch.

The version as I have it breaks rolling upgrade due to the thrift changes. If
this is not okay, we can work around that but it will be annoying.

Change-Id: If3838bb27377bfc436afd6d90a327de2ead0af54
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1287
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1304
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2014-01-17 14:32:55 -08:00
Lenni Kuff
a2cbd2820e Add Catalog Service and support for automatic metadata refresh
The Impala CatalogService manages the caching and dissemination of cluster-wide metadata.
The CatalogService combines the metadata from the Hive Metastore, the NameNode,
and potentially additional sources in the future. The CatalogService uses the
StateStore to broadcast metadata updates across the cluster.
The CatalogService also directly handles executing metadata update requests from
impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to
connect directly and execute their DDL operations.
The CatalogService has two main components - a C++ server that implements StateStore
integration, Thrift service implementation, and exporting of the debug webpage/metrics.
The other main component is the Java Catalog that manages caching and updating of all
the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast
to the rest of the cluster.

Some Notes On the Changes
---
* The metadata is all sent as thrift structs. To do this, all catalog objects (Tables/Views,
Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore
delta update.
* The existing Catalog class has been separated into two separate sub-classes. An
ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more
details.

What is working:
* New CatalogService created
* Working with statestore delta updates and latest UDF changes
* DDL performed on Node 1 is now visible on all other nodes without a "refresh".
* Each DDL operation against the Catalog Service will return the catalog version that
  contains the change. An impalad will wait for the statestore heartbeat that contains this
  version before returning from the DDL command.
* All table types (Hbase, Hdfs, Views) getting their metadata propagated properly
* Block location information included in CS updates and used by Impalads
* Column and table stats included in CS updates and used by Impalads
* Query tests are all passing

Still TODO:
* Directly return catalog object metadata from DDL requests
* Poll the Hive Metastore to detect new/dropped/modified tables
* Reorganize the FE code for the Catalog Service. I don't think we want everything in the
  same JAR.

Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda
Reviewed-on: http://gerrit.ent.cloudera.com:8080/601
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:11 -08:00
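Note on a2cbd2820e: the message describes two mechanisms. Each heartbeat carries a delta of metadata updates, and an impalad waits for the heartbeat that contains the catalog version returned by its DDL request. The following Python sketch illustrates that flow under stated assumptions. CatalogSketch, ImpaladSketch, and all method names are hypothetical, the real service exchanges thrift structs through the StateStore, and dropped objects are not modelled here.

```python
import threading


class CatalogSketch:
    """Hypothetical stand-in for the catalog service side."""

    def __init__(self):
        self._version = 0
        self._objects = {}  # object name -> (version, payload)

    def execute_ddl(self, name, payload):
        # Every change bumps the catalog version. The new version is
        # returned so the caller knows which update to wait for.
        self._version += 1
        self._objects[name] = (self._version, payload)
        return self._version

    def delta_since(self, last_sent_version):
        # Everything that changed after last_sent_version: one heartbeat's delta.
        return {name: entry for name, entry in self._objects.items()
                if entry[0] > last_sent_version}


class ImpaladSketch:
    """Hypothetical stand-in for one impalad's local metadata cache."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self.cache = {}

    def apply_delta(self, delta):
        # Called when a heartbeat delivers a delta.
        with self._cond:
            for name, (version, payload) in delta.items():
                self.cache[name] = payload
                self._version = max(self._version, version)
            self._cond.notify_all()

    def wait_for_version(self, min_version, timeout=None):
        # Block until the local cache has reached min_version.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version >= min_version, timeout)
```

In this sketch, a node that issued a DDL would call wait_for_version() with the version returned by execute_ddl() and only then report success, so a follow-up query on that node sees the change.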
Skye Wanderman-Milne
c8a8308ece Avro schema resolution (minus default values) 2014-01-08 10:51:26 -08:00
Alan Choi
254ee6ef89 IMPALA-434 Support binary hbase encoding 2014-01-08 10:51:18 -08:00
Alex Behm
9ff09cd3f4 IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL 2014-01-08 10:50:28 -08:00
Nong Li
547c75e3d5 Add gzip compression to parquet. 2014-01-08 10:50:24 -08:00
Nong Li
1f6481382e Fix parquet test setup. 2014-01-08 10:49:41 -08:00
Nong Li
6e293090e6 Parquet writer.
Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec
2014-01-08 10:48:44 -08:00
Skye Wanderman-Milne
57c3072188 Add support for reading Avro files compressed using the deflate codec. 2014-01-08 10:48:36 -08:00
Skye Wanderman-Milne
8b87099998 IMPALA-2: Support for Avro data files
Adds HdfsAvroScanner, as well as modifies the sequence scanners to be more general.
2014-01-08 10:48:05 -08:00
Henry Robinson
7ba437a52e Code changes to build against thrift 0.9.0 in thirdparty/ 2014-01-08 10:47:22 -08:00
Michael Ubell
8a5297a526 Add HdfsLzoTextScanner 2014-01-08 10:46:35 -08:00
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
ishaan
05c65789bb Change Copyrights from 2011 to 2012 2014-01-08 10:46:29 -08:00
Henry Robinson
4b60df6458 IMP-63 and IMP-140: Update metastore after INSERT query 2014-01-08 10:44:22 -08:00
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Alan Choi
bee2736de7 IMP-35: query without any column reference failed
For example,
  select 1 from alltypessmall;

This is because TDescriptorTable.slotDescriptors is empty. The fix is to
make it optional.
2012-07-17 17:50:51 -07:00
Alexander Behm
097616a31d Single node execution of union. 2012-07-11 13:12:43 -07:00
Michael Ubell
f2ea38831d Refactor compress/decompress. 2012-07-09 22:20:29 -07:00
Michael Ubell
c0b384f713 IMP-89: Fix RC and SEQ files if splits read out of order.
We need to skip the header if we read it in a previous split.
2012-06-27 14:57:34 -07:00
Michael Ubell
9a0433eebd Add compression and blocksize to serde parameters for Trevni 2012-06-14 07:37:34 -07:00
Henry Robinson
3ff3559805 Add support for per-partition file formats to front end and backend.
At the same time, this patch removes the partitionKeyRegex in favour
of explicitly sending a list of literal expressions for each file path
from the front end.
2012-06-05 12:00:09 -07:00
Nong Li
344c171c6a Aggregation Node Codegen. 2012-05-21 14:47:57 -07:00
Henry Robinson
2af14392a6 Serial INSERT support 2012-05-03 13:44:32 -07:00
Michael Ubell
62d29ff1c6 Sequence File Scanner 2012-05-01 17:48:24 -07:00
Nong Li
783480d6bf - Cleaned up some TODOs.
- Fix tuple template.  Fixed strcmp
- atoi/atof handle overflows.
- added likely/unlikely compiler directive
- Runquery now reports mean/stddev for profile runs
- removed quoted char
2012-01-18 23:08:29 -08:00
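Note on 783480d6bf: the message says only that atoi/atof handle overflows and does not say how. The Python sketch below shows one common approach for a 32-bit integer parser, checking against the limit before each multiply. The function name, the (value, ok) return convention, and the choice of returning 0 on failure are assumptions made for this example.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def parse_int32(text):
    """Return (value, ok); ok is False on malformed input or overflow."""
    if not text:
        return 0, False
    negative = text[0] == "-"
    digits = text[1:] if text[0] in "+-" else text
    if not digits or not all("0" <= c <= "9" for c in digits):
        return 0, False
    # Largest magnitude allowed for this sign.
    limit = -INT32_MIN if negative else INT32_MAX
    value = 0
    for c in digits:
        d = ord(c) - ord("0")
        # Check before multiplying, so the running value never exceeds the limit.
        if value > (limit - d) // 10:
            return 0, False
        value = value * 10 + d
    return (-value if negative else value), True
```

With this convention, parse_int32("2147483647") succeeds, while parse_int32("2147483648") reports an overflow instead of wrapping around.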
Nong Li
c84fec38d3 - Move thrift out of FE src and into impala/common
- Thrift files now build using cmake instead of mvn
- Added cmake build to impala/ which drives the build process
2011-12-30 19:35:20 -08:00