impala

mirror of https://github.com/apache/impala.git synced 2025-12-31 06:02:51 -05:00

Author	SHA1	Message	Date
Nong Li	5d903efca3	ExecSummary The runtime profile as we present it is not very useful and I think the structure of it makes it hard to consume. This patch adds a new client facing schemed set of counters that are collected from the runtime profiles. For example, with this structure it would be easy to have the shell get the stats of a running query and print a useful progress report or to check the most relevant metrics for diagnosing issues. Here's an example of the output for one of the tpch queries: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------ 09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED 05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B 04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE 08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE 07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority) 03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB 02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST \|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST \| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o 00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l Change-Id: Iaad4b9dd577c375006313f19442bee6d3e27246a Reviewed-on: http://gerrit.ent.cloudera.com:8080/2964 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-11 03:10:11 -07:00
Matthew Jacobs	25c0ebf58c	External Data Source: Public API Adds the thrift structures for the public external data source API and a new maven project containing the Java ExternalDataSource interface and the generated Java thrift classes. The ExternalDataSource.thrift structures can evolve in a backward compatible way. The ExternalDataSource Java interface will always contain a version number in the namespace (e.g. com.cloudera.impala.extdatasource.v1 for V1) so we can potentially make breaking changes to the interface in the future but still support older versions. A trivial implementation of the ExternalDataSource API is also added for testing purposes. TODO: Make the sample data source implementation realistic. Change-Id: I827d6420a87ed7a2bce34e050362ca98ddc5dbcc Reviewed-on: http://gerrit.ent.cloudera.com:8080/2241 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: jenkins (cherry picked from commit f29814e9ede9d4c889f2648606fcf511feeb47ae) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2313	2014-04-22 18:34:48 -07:00
Nong Li	0d2919fe7f	Refactor scalar and aggregate function analysis and execution. This patch cleans up analysis and execution of scalar and aggregate functions so that there is no difference between how builtins and user functions are handled. The only difference is that the catalog is populated with the builtins all the time. The BE always gets a TFunction object and just executes it (builtins will have an empty hdfs file location). This removes the opcode registry and all of the functionality is subsumed by the catalog, most of which was already duplicated there anyway. This also introduces the concept of a system database; databases that the user cannot modify and is populated automatically on startup. Change-Id: Iaa3f84dad0a1a57691f5c7d8df7305faf01d70ed Reviewed-on: http://gerrit.ent.cloudera.com:8080/1386 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1577	2014-02-18 18:40:08 -08:00
Alex Behm	dc7b398bd3	Impala reserves resources from YARN via LLama. Impala reserves resources from YARN via Llama and handles resources preemptions by cancelling affected queries. Adds the Impala Resource Broker for interacting with Llama. Refactors scheduler and coordinator to move fragment-to-host assignment logic into scheduler. Local test setup uses MiniLLama. Change-Id: Ic7b0fe43de52d30f4207b4e65cce7e6a294e54e1	2014-01-15 15:12:04 -08:00
Henry Robinson	51e58e1f3c	Statestore aesthetic cleanup * Statestore is now one word, without camelcase, eveywhere. Previous names included StateStore, state-store and state_store, variously. The only exception is a couple of flags that have 'state_store', and can't be changed for compatibility reasons. * File names are also changed to reflect the standard naming. * Most comments are now 90 chars wide (from 80 before) Change-Id: I83b666c87991537f9b1b80c2f0ea70c2e0c07dcf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1225 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-01-09 09:56:04 -08:00
Lenni Kuff	9d5b94baa5	CatalogServer follow-on code review changes Changes to address follow-on code review comments. This change consists mainly of: * Comment cleanup / clarification * Thrift struct consolidation * Minor naming changes * Small code fixes/changes, etc Change-Id: Idd03cc8adeb9c0d99744688a02f81a08135966de Reviewed-on: http://gerrit.ent.cloudera.com:8080/667 Tested-by: jenkins Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:42 -08:00
Lenni Kuff	bf139d1eba	Update catalogd to forward log4j log messages to glog Change-Id: I4620b77ba731e134a3e48883e8ae7ee3820ed584 Reviewed-on: http://gerrit.ent.cloudera.com:8080/612 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: jenkins	2014-01-08 10:53:12 -08:00
Lenni Kuff	a2cbd2820e	Add Catalog Service and support for automatic metadata refresh The Impala CatalogService manages the caching and dissemination of cluster-wide metadata. The CatalogService combines the metadata from the Hive Metastore, the NameNode, and potentially additional sources in the future. The CatalogService uses the StateStore to broadcast metadata updates across the cluster. The CatalogService also directly handles executing metadata updates request from impalad servers (DDL requests). It exposes a Thrift interface to allow impalads to directly connect execute their DDL operations. The CatalogService has two main components - a C++ server that implements StateStore integration, Thrift service implementiation, and exporting of the debug webpage/metrics. The other main component is the Java Catalog that manages caching and updating of of all the metadata. For each StateStore heartbeat, a delta of all metadata updates is broadcast to the rest of the cluster. Some Notes On the Changes --- * The metadata is all sent as thrift structs. To do this all catalog objects (Tables/Views, Databases, UDFs) have thrift struct to represent them. These are sent with each statestore delta update. * The existing Catalog class has been seperated into two seperate sub-classes. An ImpladCatalog and a CatalogServiceCatalog. See the comments on those classes for more details. What is working: * New CatalogService created * Working with statestore delta updates and latest UDF changes * DDL performed on Node 1 is now visible on all other nodes without a "refresh". * Each DDL operation against the Catalog Service will return the catalog version that contains the change. An impalad will wait for the statestore heartbeat that contains this version before returning from the DDL comment. * All table types (Hbase, Hdfs, Views) getting their metadata propagated properly * Block location information included in CS updates and used by Impalads * Column and table stats included in CS updates and used by Impalads * Query tests are all passing Still TODO: * Directly return catalog object metadata from DDL requests * Poll the Hive Metastore to detect new/dropped/modified tables * Reorganize the FE code for the Catalog Service. I don't think we want everything in the same JAR. Change-Id: I8c61296dac28fb98bcfdc17361f4f141d3977eda Reviewed-on: http://gerrit.ent.cloudera.com:8080/601 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:11 -08:00
Nong Li	af90c8a133	Fix memory usage tracking. Changes MemLimit to MemTracker: - the limit is optional - it also records a label and an optional parent - Consume() and Release() also update the ancestors and there's also a new AnyLimitExceeded(), which also checks the ancestors - the consumption counter is a HighwaterMarkCounter and can optionally be created as part of a profile Each fragment instance now has a MemTracker that is part of a 3-level hierarchy: process, query, fragment instance. Change-Id: I5f580f4956fdf07d70bd9a6531032439aaf0fd07 Reviewed-on: http://gerrit.ent.cloudera.com:8080/339 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:52:36 -08:00
Henry Robinson	90ed9f0ab8	Remove planservice	2014-01-08 10:50:20 -08:00
Henry Robinson	2ae20cbbb7	Statestore-2.0: New state-store implementation * API simplified to deal only with 'topics', not services and objects * Scalability improved: heartbeat loop is now multi-threaded * State-store can store arbitrary objects * State-store may send either deltas or complete topic state (delta computation to come)	2014-01-08 10:49:23 -08:00
Nong Li	6e293090e6	Parquet writer. Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec	2014-01-08 10:48:44 -08:00
Nong Li	868a99135a	Add network benchmark	2014-01-08 10:47:56 -08:00
Alan Choi	be98df19c8	HiveServer2 This patch implements the HiveServer2 API. We have tested it with Lenni's patch against the tpch workload. It has also been tested manually against Hive's beeline with queries and metadata operations. All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax code is refactored to impala-beeswax-server.cc. HiveServer2 has a few more metadata operations. These operations go through impala-hs2-server to ddl-executor and then to FE. The logics are implemented in fe/src/main/java/com/cloudera/impala/service/MetadataOp.java. Because of the Thrift union issue, I have to modify the generated c++ file. Therefore, all the HiveServer2 thrift generated c++ code are checked into be/src/service/hiveserver2/. Once the thrift issue is resolved, I'll remove these files. Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881	2014-01-08 10:47:24 -08:00
Henry Robinson	7ba437a52e	Code changes to build against thrift 0.9.0 in thirdparty/	2014-01-08 10:47:22 -08:00
Henry Robinson	986f3cddf6	Move sparrow/ to statestore/ and remove sparrow namespace	2014-01-08 10:47:12 -08:00
Nong Li	2289906a5a	Fix linker dependencies.	2014-01-08 10:46:56 -08:00
Henry Robinson	2f339f2ed8	Add ASL license to all public files	2014-01-08 10:46:32 -08:00
ishaan	05c65789bb	Change Copyrights from 2011 ti 2012	2014-01-08 10:46:29 -08:00
Michael Ubell	ad46b98366	Add Kerberos authentication.	2014-01-08 10:45:10 -08:00
Marcel Kornacker	c004cdaa1c	Thrift structures for the new planner interface.	2014-01-08 10:44:47 -08:00
Marcel Kornacker	fb32d40b03	Switching to an asynchronous plan fragment exec interface; this entails: - making the coordinator asynchronous - renamed ImpalaBackendService to ImpalaInternalService; - new class ImpalaServer implements ImpalaService and ImpalaInternalService - renaming ImpalaInternalService fields to conform to c++ style - merged impala-service.{cc,h} and backend-service.{cc,h} into impala-server.{cc,h} - added TStatusCode field to Status.ErrorDetail - removed ImpalaInternalService.CloseChannel Also removed JdbcDriverTest.java	2014-01-08 10:44:15 -08:00
Kay Ousterhout	073e38d6c2	Added the StateStore, a centralized repository for soft state. The commit also adds the StateStoreSubscriber, a component that runs alongside each impalad and handles communication with the state store.	2012-07-13 09:26:16 -07:00
Alan Choi	f52286f72c	This completes the Beeswax implementation for ODBC. All the ODBC tests (CDH/hive-odbc-test) passes (except those with "create table" and "show table". We should have nightly regression of the odbc test to run against impalad. There're still a few issues: 1. running with num_node > 0 crashes the coordinator; 2. work around for a few ODBC jiras 3. no test for bool/timestamp because ODBC doesn't support them. review: issue 110	2012-06-18 14:46:46 -07:00
Alan Choi	ef10afa439	This changes the Thrift from 0.6.1 to 0.7.0. Please uninstall the old thrift and download/install Thrift 0.7.0. Beeswax service now depends on Hive metastore; fix buildall.sh to clean generated-source in FE; fix .gitignore to clean generated-source in BE;	2012-06-14 18:21:08 -07:00
Alan Choi	7af87c7dea	Beeswax Service for Impala (partiial implementation) review id: 82	2012-06-06 10:08:06 -07:00
Henry Robinson	3ff3559805	Add support for per-partition file formats to front end and backend. At the same time, this patch removes the partitionKeyRegex in favour of explicitly sending a list of literal expressions for each file path from the front end.	2012-06-05 12:00:09 -07:00
Marcel Kornacker	4a4a07fde7	A number of changes for the Jenkins build: - added option to run with derby metastore, based on whether env var METASTORE_IS_DERBY is set - emoved hardwired file locations from planner tests - switching to linking statically against libthrift.a Also added script rebuild.sh, which contains the build steps of buildall.sh (against impala sources).	2012-03-08 16:19:47 -08:00
Nong Li	b410b62716	Add distributed profile counter for the BE.	2012-03-01 13:59:17 -08:00
Nong Li	88237350f0	Change the build to allow debug and release builds to coexist.	2012-02-17 18:14:04 -08:00
Nong Li	94db70c9fd	Fix build. Dependencies don't propagate right on first build.	2011-12-30 21:28:18 -08:00
Nong Li	c84fec38d3	- Move thrift out of FE src and into impala/common - Thrift files now build using cmake instead of mvn - Added cmake build to impala/ which drives the build process	2011-12-30 19:35:20 -08:00
Marcel Kornacker	c056445612	Added m:n data streams: - DataStreamSender: sender side (1:n) for a single stream - DataStreamMgr: receiver side; singleton class for all incoming streams active at a node Changed ExecNode::GetNext() to return eos indicator explicitly; this allows us to pass incoming TRowBatches (which may not be full) up w/o copying the data. Added data-stream-test.	2012-01-10 18:00:20 -08:00
Alexander Behm	c7f7382c31	Added planner changes and data sinks for INSERT statements.	2011-12-12 15:14:49 -08:00
Nong Li	b1833d4de8	Implmented opcode registry. Added substr() and pi() functions. Added backend testing to buildall.sh	2011-11-20 13:44:41 -08:00
Marcel Kornacker	0914fedea9	Defining Impala backend service, which is exported by backend processes to service plan fragment execution requests. Changing thrift plan-related structs to pull out runtime parameters in preparation for parallel execution.	2011-11-02 14:47:32 -07:00
Marcel Kornacker	c534062c20	fixing build failure introduced by 7d9b7e2	2011-08-03 15:37:58 -07:00
Marcel Kornacker	cc141953de	Adding plan service for be test driver Adding mock implementation of libhdfs (only what's needed for text-scan-node) in order to avoid having to make any jni calls. Some bug fixes.	2011-07-22 12:09:55 -07:00
marcel	08e8a5db4c	fixed jni and linker problems some bug fixes and missing functions some cleanup	2011-07-15 13:17:20 -07:00
Marcel Kornacker	c23616a30c	deserializing plan request in c++ Coordinator.main(): util function to execute single query against test schema removed dead code from TestSchemaUtils	2011-07-13 13:48:54 -07:00
marcel	3286190599	Initial version of backend.	2011-07-07 15:49:46 -07:00

41 Commits