Commit Graph

91 Commits

Author SHA1 Message Date
Michael Ubell
8a5297a526 Add HdfsLzoTextScanner 2014-01-08 10:46:35 -08:00
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
ishaan
ccb020c4a0 Adding copyrights to remaining files. 2014-01-08 10:46:30 -08:00
ishaan
05c65789bb Change Copyrights from 2011 to 2012 2014-01-08 10:46:29 -08:00
Henry Robinson
dd0e9f1180 IMP-265: State-store subscriber recovery mode 2014-01-08 10:46:25 -08:00
Michael Ubell
c1852e2dcf Add from_unixtime and unix_timestamp(string, string) 2014-01-08 10:46:22 -08:00
Marcel Kornacker
ea050a43ad Switching over backend runtime structures to new planner.
Added container-util.h
2014-01-08 10:46:20 -08:00
Nong Li
08968c1d07 Performance improvements for aggregation and hash join nodes with codegen. 2014-01-08 10:46:19 -08:00
Alan Choi
0ce8a044e3 Disable RC/Trevni (with option to allow it); remove file_buffer_size
IMP-336: remove file_buffer_size query option
Add "allow_unsupported_formats" query option to allow RC/Trevni in our tests; disabled by
default
2014-01-08 10:46:02 -08:00
Alan Choi
dbf1074066 Fragments report errors to coordinator.
Enable multi-node DataErrorTest (IMP-250 resolved)
Check fragment/coord errors in DataErrorTest
2014-01-08 10:46:00 -08:00
Henry Robinson
91c3b979ca IMP-370: SHOW TABLES IN support and IMP-363: SHOW DATABASES
Change-Id: Ic41c4b0767a0480f0a18e1e985f25de3bc2ca947
2014-01-08 10:45:59 -08:00
Henry Robinson
540673763f Add session key handling to ThriftServer, and session support to the frontend 2014-01-08 10:45:59 -08:00
Marcel Kornacker
927f4c52f8 Adding the remaining pieces of functionality to the new planner:
- HBaseScanNode.getScanRangeLocations()
- new planner creates INSERT plans
- Frontend.createExecRequest2(), which calls NewPlanner.
2014-01-08 10:45:58 -08:00
Michael Ubell
48c454d319 IMP-267 Add version() function. 2014-01-08 10:45:57 -08:00
Nong Li
a9ff7323f2 Fix our string to numeric casts to use StringParser instead of lexical_cast. 2014-01-08 10:45:57 -08:00
Marcel Kornacker
5984c0be52 First cut of partitioned plan generation:
- created new class PlanFragment, which encapsulates everything having to do with a single
  plan fragment, including its partition, output exprs, destination node, etc.
- created new class DataPartition
- explicit classes for fragment and plan node ids, to avoid getting them mixed up, which is easy to do with ints
- Adding IdGenerator class.
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
  PlanFragment.getExplainString()
- Changed planner interface to return scan ranges with a complete list of server locations,
  instead of making a server assignment.

Also included: cleaned up AggregateInfo:
- the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation.
- moved analysis functionality into AggregateInfo

Removing broken test cases from workload functional-planner (they're being handled correctly in functional-newplanner).
2014-01-08 10:45:56 -08:00
Nong Li
2ea454fcda Updated coordinator to output summary profiles. 2014-01-08 10:45:14 -08:00
Michael Ubell
ad46b98366 Add Kerberos authentication. 2014-01-08 10:45:10 -08:00
Nong Li
073837de79 Change default abort on error to false. 2014-01-08 10:45:08 -08:00
Marcel Kornacker
a0f0064a2a additional Thrift changes for new planner 2014-01-08 10:45:07 -08:00
Nong Li
be33587e10 Added wall based rate counters and other counter cleanup. 2014-01-08 10:45:06 -08:00
Henry Robinson
e7348a209b IMP-232: Parallel INSERT OVERWRITE 2014-01-08 10:45:04 -08:00
Nong Li
126971edbb Update Impala to use CDH4.1 rc3. 2014-01-08 10:45:04 -08:00
Henry Robinson
e3e6ba984b Show / describe 2014-01-08 10:44:49 -08:00
Marcel Kornacker
c004cdaa1c Thrift structures for the new planner interface. 2014-01-08 10:44:47 -08:00
Nong Li
a417099e66 Fix runtime profile aggregated throughput. 2014-01-08 10:44:47 -08:00
Nong Li
689fb7d799 Push throughput counter to io mgr and other counter fixes. 2014-01-08 10:44:46 -08:00
Marcel Kornacker
c18d0970d7 Changed RuntimeProfile::PrettyPrint() and Coordinator::BackendExecState::GetNodeThroughput()
not to hold locks while they make function calls.

Changed Frontend.assignIds() to use UUID.randomUUID() to generate the query id.
2014-01-08 10:44:46 -08:00
Nong Li
e160e09a85 Fix incorrect use of memcpy llvm codegen intrinsic. 2014-01-08 10:44:44 -08:00
Marcel Kornacker
7725f25ff5 This combines changes related to periodic reporting of plan fragment exec profiles:
- executor takes report callback; passed in by ImpalaServer::FragmentExecState
- the PlanFragmentExecutor invokes profile reporting cb in background thread.
- RuntimeProfile is now thread-safe and has a RuntimeProfile::Update() method

Also included:
- a number of bug fixes related to async cancellation of query
  and propagation of errors through PlanFragmentExecutor/Coordinator/ImpalaServer.
- changing COUNTER_SCOPED_TIMER to SCOPED_TIMER
- derived counters: RuntimeProfile now lets you add counters that return a
  value via a function call, which is useful for reporting something like normalized
  ScanNode throughput; retrofitted to ScanNode and all subclasses
- changed coordinator to make cancellation atomic wrt recognition of an error status
  for the overall query.
- Removed InProcessQueryExecutor from data-stream-test.

Added aggregate throughput counters to coordinator:
- all throughput counters are grouped in a sub-profile "AggregateThroughput"
- each scan node gets its own counter
- the value is aggregated across all registered backends which contain that node in
  their plan fragments
2014-01-08 10:44:42 -08:00
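
The "derived counters" item in commit 7725f25ff5 describes counters whose value comes from a function call instead of a stored number. The following is a minimal Python sketch of that idea, not Impala's C++ RuntimeProfile. The class name RuntimeProfileSketch, its methods, and the counter names are all hypothetical and chosen only for illustration.

```python
import threading


class RuntimeProfileSketch:
    """Hypothetical stand-in for a profile; not Impala's RuntimeProfile."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}  # name -> stored numeric value
        self._derived = {}   # name -> zero-argument function

    def add_counter(self, name, value=0):
        with self._lock:
            self._counters[name] = value

    def update(self, name, delta):
        with self._lock:
            self._counters[name] += delta

    def add_derived_counter(self, name, fn):
        # A derived counter stores no value; it is computed on every read.
        with self._lock:
            self._derived[name] = fn

    def value(self, name):
        with self._lock:
            if name in self._counters:
                return self._counters[name]
            fn = self._derived[name]
        # Call the function after releasing the lock so that it can
        # read other counters of the same profile without deadlocking.
        return fn()


profile = RuntimeProfileSketch()
profile.add_counter("BytesRead")
profile.add_counter("ScanTimeSeconds")
# Normalized throughput, computed from two plain counters when read.
profile.add_derived_counter(
    "Throughput",
    lambda: profile.value("BytesRead") / max(profile.value("ScanTimeSeconds"), 1),
)
profile.update("BytesRead", 4096)
profile.update("ScanTimeSeconds", 2)
print(profile.value("Throughput"))
```

Computing the value on read keeps the derived counter consistent with the counters it depends on, at the cost of running the function on each access.
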
Henry Robinson
0fd68e5718 USE stub implementation 2014-01-08 10:44:42 -08:00
Nong Li
81bba16dac Parallel scanners. 2014-01-08 10:44:38 -08:00
Henry Robinson
e5893064b0 Fix build failure 2014-01-08 10:44:37 -08:00
Henry Robinson
fb681fba4e Simple Python shell for Impala 2014-01-08 10:44:37 -08:00
Henry Robinson
c472213eeb Parallel INSERT, sink-per-scan-node plan 2014-01-08 10:44:35 -08:00
Alan Choi
88ae4d748f Fill Disk ID in THdfsFileSplit in the FE 2014-01-08 10:44:33 -08:00
Alexander Behm
ee705e3083 Added timestamp arithmetic expressions. 2014-01-08 10:44:31 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and the beeswax API to submit queries and fetch results,
including INSERT queries.

review issue: 260
2014-01-08 10:44:30 -08:00
Alan Choi
88101bc90e This patch implements the probabilistic counting algorithm as an aggregate
"distinctpc" and "distinctpcsa".

We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400mb, ~100 columns,
int/bigint/string type.

On Hive, it took roughly 64sec.
On this Impala implementation, it took 35sec. By adding inline to hash-util.h (which we don't),
 we can achieve 24~26sec.

Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3
2014-01-08 10:44:27 -08:00
Alan Choi
cbadb4eac4 When a scan range begins at the starting point of a tuple, we miss that tuple. This patch fixes
this problem.

review: 162
2014-01-08 10:44:24 -08:00
Alan Choi
41200fc307 Impalad now accepts Query.Configuration as execution options
issue: 210
2014-01-08 10:44:22 -08:00
Henry Robinson
4b60df6458 IMP-63 and IMP-140: Update metastore after INSERT query 2014-01-08 10:44:22 -08:00
Lenni Kuff
64058cb9b8 Fix some ImpalaServer bugs due to incorrectly assigning fragment ids
[Submitting on behalf of Marcel]
- fragment ids weren't assigned correctly (they need to be unique across all
nodes on which they're executing)
- some of the execution logic that I checked in yesterday was flawed
2014-01-08 10:44:19 -08:00
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Alexander Behm
5a92fee31c Added now() function. 2014-01-08 10:44:19 -08:00
Marcel Kornacker
10bf3e91e3 Cancellation support:
- added DataStreamMgr::Cancel(), which is used to propagate cancellation from the
  coordinator to all (possibly blocked) ExchangeNodes
- all exec nodes now check for cancellation before they do anything that might block for a while
- fixed up logic related to async cancellation

Added support for async query execution via beeswax interface:
- implemented ImpalaServer::query()
- QueryExecState now tracks beeswax's idea of the query state
- ImpalaServer::get_state() now returns the actual state

Fixed handling of ExecNode::Close():
- needs to be called for entire plan tree, regardless of what fails (can't use
  RETURN_IF_ERROR() inside of it)
- needs to be called for every Open() call by coordinator/ImpalaServer
2014-01-08 10:44:18 -08:00
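
The ExecNode::Close() note in commit 10bf3e91e3 says that Close() must reach the entire plan tree regardless of what fails, which rules out returning early on the first error. The following is a minimal Python sketch of that pattern, not Impala's C++ code. PlanNodeSketch and its fields are hypothetical names used only for illustration.

```python
class PlanNodeSketch:
    """Hypothetical plan node; not Impala's ExecNode."""

    def __init__(self, name, children=(), fail_on_close=False):
        self.name = name
        self.children = list(children)
        self.fail_on_close = fail_on_close
        self.closed = False

    def close(self):
        """Close the whole subtree and return the first error seen, or None."""
        first_error = None
        for child in self.children:
            # Do not return early here: remaining children must still be closed.
            error = child.close()
            if first_error is None:
                first_error = error
        self.closed = True
        if self.fail_on_close and first_error is None:
            first_error = "close failed in " + self.name
        return first_error


scan_a = PlanNodeSketch("scan_a", fail_on_close=True)
scan_b = PlanNodeSketch("scan_b")
root = PlanNodeSketch("join", children=[scan_a, scan_b])
print(root.close())
print(scan_a.closed, scan_b.closed, root.closed)
```

Remembering only the first error keeps the reported status tied to the original failure while every node still gets a chance to release its resources.
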
Marcel Kornacker
fb32d40b03 Switching to an asynchronous plan fragment exec interface; this entails:
- making the coordinator asynchronous
- renamed ImpalaBackendService to ImpalaInternalService;
- new class ImpalaServer implements ImpalaService and ImpalaInternalService
- renaming ImpalaInternalService fields to conform to c++ style
- merged impala-service.{cc,h} and backend-service.{cc,h} into impala-server.{cc,h}
- added TStatusCode field to Status.ErrorDetail
- removed ImpalaInternalService.CloseChannel

Also removed JdbcDriverTest.java
2014-01-08 10:44:15 -08:00
Alexander Behm
a1e03b81cd Added BETWEEN and IN predicates. 2014-01-08 10:44:15 -08:00
Kay Ousterhout
ac00649369 Changed SubscriptionId to be a 64-bit integer.
Currently, subscriptions are per-subscriber. However, we don't want
to be stuck with this decision later on, and if subscriptions are
uniquely numbered, a 32-bit integer may not allow as many subscriptions
as we'd like.
2014-01-08 10:44:14 -08:00
Alan Choi
bee2736de7 IMP-35: query without any column reference failed
For example,
  select 1 from alltypessmall;

This is because TDescriptorTable.slotDescriptors is empty. The fix is to
make it optional.
2012-07-17 17:50:51 -07:00