impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 09:02:19 -05:00

Author	SHA1	Message	Date
Alex Behm	e9864d5f78	Introduce type hierarchy and add complex types. This patch replaces ColumnType with a hierarchy of types that models the existing scalar types as well as the new complex types ARRAY, MAP, and STRUCT. Change-Id: Ia895f41153e99febb0c35412acac12689c3c2064 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3491 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3538	2014-07-21 20:00:46 -07:00
Matthew Jacobs	ebc6c5894e	External Data Source: Frontend and catalog changes Initial frontend and catalog changes for external data sources. Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485	2014-05-08 14:56:19 -07:00
Nong Li	69fe1c6c10	Change FE to use ColumnType instead of PrimitiveType. PrimitiveType is an enum and cannot be used for more complex types. The change touches a lot of files but very mechanically. A similar change needs to be done in the BE which will be a subsequent patch. The version as I have it breaks rolling upgrade due to the thrift changes. If this is not okay, we can work around that but it will be annoying. Change-Id: If3838bb27377bfc436afd6d90a327de2ead0af54 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1287 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1304 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>	2014-01-17 14:32:55 -08:00
Lenni Kuff	a2cbd2820e	Add Catalog Service and support for automatic metadata refresh The Impala CatalogService manages the caching and dissemination of cluster-wide metadata. The CatalogService combines the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogService uses the StateStore to broadcast metadata updates across the cluster. The CatalogService also directly handles executing metadata update requests from impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to directly connect and execute their DDL operations. The CatalogService has two main components - a C++ server that implements StateStore integration, Thrift service implementation, and exporting of the debug webpage/metrics. The other main component is the Java Catalog that manages caching and updating of all the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast to the rest of the cluster. Some Notes On the Changes --- * The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views, Databases, UDFs) have a thrift struct to represent them. These are sent with each statestore delta update. * The existing Catalog class has been separated into two separate sub-classes. An ImpaladCatalog and a CatalogServiceCatalog. See the comments on those classes for more details. What is working: * New CatalogService created * Working with statestore delta updates and latest UDF changes * DDL performed on Node 1 is now visible on all other nodes without a "refresh". * Each DDL operation against the Catalog Service will return the catalog version that contains the change. An impalad will wait for the statestore heartbeat that contains this version before returning from the DDL command. * All table types (Hbase, Hdfs, Views) getting their metadata propagated properly * Block location information included in CS updates and used by Impalads * Column and table stats included in CS updates and used by Impalads * Query tests are all passing Still TODO: * Directly return catalog object metadata from DDL requests * Poll the Hive Metastore to detect new/dropped/modified tables * Reorganize the FE code for the Catalog Service. I don't think we want everything in the same JAR. Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda Reviewed-on: http://gerrit.ent.cloudera.com:8080/601 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:11 -08:00
Skye Wanderman-Milne	c8a8308ece	Avro schema resolution (minus default values)	2014-01-08 10:51:26 -08:00
Alan Choi	254ee6ef89	IMPALA-434 Support binary hbase encoding	2014-01-08 10:51:18 -08:00
Alex Behm	9ff09cd3f4	IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL	2014-01-08 10:50:28 -08:00
Nong Li	547c75e3d5	Add gzip compression to parquet.	2014-01-08 10:50:24 -08:00
Nong Li	1f6481382e	Fix parquet test setup.	2014-01-08 10:49:41 -08:00
Nong Li	6e293090e6	Parquet writer. Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec	2014-01-08 10:48:44 -08:00
Skye Wanderman-Milne	57c3072188	Add support for reading Avro files compressed using the deflate codec.	2014-01-08 10:48:36 -08:00
Skye Wanderman-Milne	8b87099998	IMPALA-2: Support for Avro data files Adds HdfsAvroScanner, as well as modifies the sequence scanners to be more general.	2014-01-08 10:48:05 -08:00
Henry Robinson	7ba437a52e	Code changes to build against thrift 0.9.0 in thirdparty/	2014-01-08 10:47:22 -08:00
Michael Ubell	8a5297a526	Add HdfsLzoTextScanner	2014-01-08 10:46:35 -08:00
Henry Robinson	2f339f2ed8	Add ASL license to all public files	2014-01-08 10:46:32 -08:00
ishaan	05c65789bb	Change Copyrights from 2011 to 2012	2014-01-08 10:46:29 -08:00
Henry Robinson	4b60df6458	IMP-63 and IMP-140: Update metastore after INSERT query	2014-01-08 10:44:22 -08:00
Michael Ubell	02d63d8dc3	Trevni file support	2014-01-08 10:44:19 -08:00
Alan Choi	bee2736de7	IMP-35: query without any column reference failed. For example: select 1 from alltypessmall; This is because TDescriptorTable.slotDescriptors is empty. The fix is to make it optional.	2012-07-17 17:50:51 -07:00
Alexander Behm	097616a31d	Single node execution of union.	2012-07-11 13:12:43 -07:00
Michael Ubell	f2ea38831d	Refactor compress/decompress.	2012-07-09 22:20:29 -07:00
Michael Ubell	c0b384f713	IMP-89: Fix RC and SEQ files if splits read out of order. We need to skip the header if we read it in a previous split.	2012-06-27 14:57:34 -07:00
Michael Ubell	9a0433eebd	Add compression and blocksize to serde parameters for Trevni	2012-06-14 07:37:34 -07:00
Henry Robinson	3ff3559805	Add support for per-partition file formats to front end and backend. At the same time, this patch removes the partitionKeyRegex in favour of explicitly sending a list of literal expressions for each file path from the front end.	2012-06-05 12:00:09 -07:00
Nong Li	344c171c6a	Aggregation Node Codegen.	2012-05-21 14:47:57 -07:00
Henry Robinson	2af14392a6	Serial INSERT support	2012-05-03 13:44:32 -07:00
Michael Ubell	62d29ff1c6	Sequence File Scanner	2012-05-01 17:48:24 -07:00
Nong Li	783480d6bf	- Cleaned up some TODOs. - Fix tuple template. Fixed strcmp - atoi/atof handle overflows. - added likely/unlikely compiler directive - Runquery now reports mean/stddev for profile runs - removed quoted char	2012-01-18 23:08:29 -08:00
Nong Li	c84fec38d3	- Move thrift out of FE src and into impala/common - Thrift files now build using cmake instead of mvn - Added cmake build to impala/ which drives the build process	2011-12-30 19:35:20 -08:00

29 Commits