Commit Graph

323 Commits

Lenni Kuff
1fb72fbc73 IMPALA-156: Support core 'ALTER TABLE' DDL command
This patch adds support for
- ALTER TABLE ADD|REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD/DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
2014-01-08 10:49:14 -08:00
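For illustration, a minimal sketch of the ALTER TABLE forms listed above; the table name t1 and partition column year are hypothetical, and the exact syntax accepted by this patch may differ in detail:

    -- add and drop columns
    ALTER TABLE t1 ADD COLUMNS (note STRING);
    ALTER TABLE t1 DROP COLUMN note;
    -- add and drop a partition (assumes t1 is partitioned by year)
    ALTER TABLE t1 ADD PARTITION (year=2012);
    ALTER TABLE t1 DROP PARTITION (year=2012);
    -- change storage properties and rename the table
    ALTER TABLE t1 SET FILEFORMAT SEQUENCEFILE;
    ALTER TABLE t1 SET LOCATION '/user/impala/t1';
    ALTER TABLE t1 RENAME TO t1_renamed;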
Skye Wanderman-Milne
c5afb11558 Compress serialized RowBatches 2014-01-08 10:49:13 -08:00
Alan Choi
051c56073a IMPALA-158: query options should be optional 2014-01-08 10:49:06 -08:00
Elliott Clark
0e0c02b6bd Add the ability to insert SELECT results into an HBase table.
* Changed frontend analysis for HBase tables.
* Changed Thrift messages to allow HBase as a sink type.
* Added a JNI wrapper around HTable.
* Created hbase-table-sink.
* Created hbase-table-writer.
* Statically initialized much of the JNI-related code for HBase.
* Cleaned up some cpplint issues.
* Changed JUnit analysis tests.
* Created a new HBase test table.
* Added functional tests for HBase inserts.
2014-01-08 10:49:06 -08:00
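A sketch of the kind of statement this enables; hbase_t1 (an HBase-backed table) and src (an HDFS-backed table) are hypothetical names:

    -- write the result of a scan over an HDFS table into an HBase-backed table
    INSERT INTO TABLE hbase_t1
    SELECT id, name FROM src WHERE id < 100;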
Alan Choi
991db9001b IMPALA-113: Raise an error when the default ORDER BY limit is exceeded 2014-01-08 10:49:03 -08:00
Marcel Kornacker
0c36c7f327 Partitioned merge aggregation. 2014-01-08 10:48:59 -08:00
Lenni Kuff
ca0d23a844 IMPALA-157: Support CREATE TABLE LIKE DDL 2014-01-08 10:48:55 -08:00
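A sketch of the new statement, with hypothetical table names:

    -- create an empty table whose column definitions are copied from an existing table
    CREATE TABLE t1_copy LIKE t1;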
Alex Behm
be03e6c21c IMPALA-138: Error messages for unknown column types are particularly bad. 2014-01-08 10:48:53 -08:00
Nong Li
6e293090e6 Parquet writer.
Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec
2014-01-08 10:48:44 -08:00
Lenni Kuff
0bcb54fcf8 Add GetRuntimeProfile RPC and enable printing runtime profile from impala-shell 2014-01-08 10:48:44 -08:00
Marcel Kornacker
d7bfe6c68d IMPALA-144: partition pruning for arbitrary predicates that are fully bound by partition columns
This makes partition pruning more effective by extending it to any predicate that is fully bound by the partition columns.
For example, '<col> IN (1, 2, 3)' is now used to prune partitions, in addition to equality and binary comparison predicates.
2014-01-08 10:48:41 -08:00
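For illustration (hypothetical table t1 partitioned by year), both of the following WHERE clauses can now prune partitions:

    -- equality predicates on partition columns were already used for pruning
    SELECT count(*) FROM t1 WHERE year = 2012;
    -- predicates such as IN lists that are fully bound by partition columns are now used as well
    SELECT count(*) FROM t1 WHERE year IN (2010, 2011, 2012);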
Lenni Kuff
d57440e87d Allow column comments for CREATE TABLE and DESCRIBE <table> statements 2014-01-08 10:48:37 -08:00
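A sketch of the syntax, with hypothetical names:

    CREATE TABLE t1 (
      id INT COMMENT 'row identifier',
      name STRING COMMENT 'display name'
    );
    -- DESCRIBE t1 now includes the column comments in its output
    DESCRIBE t1;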
Skye Wanderman-Milne
57c3072188 Add support for reading Avro files compressed using the deflate codec. 2014-01-08 10:48:36 -08:00
Lenni Kuff
9f71374875 IMPALA-102: Add support for CREATE TABLE ... PARTITIONED BY (col1, col2) 2014-01-08 10:48:35 -08:00
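A sketch of the new clause, with hypothetical names; partition columns are declared separately from the data columns:

    CREATE TABLE t1 (id INT, name STRING)
    PARTITIONED BY (year INT, month INT);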
Henry Robinson
71e6d81d1b IMP-261: Clean up network address handling 2014-01-08 10:48:33 -08:00
Marcel Kornacker
77f4fc8cf9 Adding memory limits
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit

Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
2014-01-08 10:48:33 -08:00
Alan Choi
4b6ce8ecb3 This patch changes the clock to CLOCK_MONOTONIC.
Rdtsc is not accurate because of changes in CPU frequency. Very often, the time
reported in the profile is even longer than the time reported by the shell.

This patch replaces Rdtsc with CLOCK_MONOTONIC, which is as fast as Rdtsc but
accurate: it is affected neither by CPU frequency changes nor by users setting
the system clock.

Note that the new profile always reports time rather than clock cycles.
Here's the new profile:

  Averaged Fragment 1:(68.241ms 0.00%)
    completion times: min:69ms  max:69ms  mean: 69ms  stddev:0
    execution rates: min:91.60 KB/sec  max:91.60 KB/sec  mean:91.60 KB/sec  stddev:0.00 /sec
    split sizes:  min: 6.32 KB, max: 6.32 KB, avg: 6.32 KB, stddev: 0.00
     - RowsProduced: 1
    CodeGen:
       - CodegenTime: 566.104us    <-- now reported in microseconds instead of clock cycles
       - CompileTime: 33.202ms
       - LoadTime: 2.671ms
       - ModuleFileSize: 44.61 KB
    DataStreamSender:
       - BytesSent: 16.00 B
       - DataSinkTime: 50.719us
       - SerializeBatchTime: 18.365us
       - ThriftTransmitTime: 145.945us
    AGGREGATION_NODE (id=1):(68.384ms 15.50%)
       - BuildBuckets: 1.02K
       - BuildTime: 13.734us
       - GetResultsTime: 6.650us
       - MemoryUsed: 32.01 KB
       - RowsReturned: 1
       - RowsReturnedRate: 14.00 /sec
    HDFS_SCAN_NODE (id=0):(57.808ms 84.71%)
       - BytesRead: 6.32 KB
       - DelimiterParseTime: 62.370us
       - MaterializeTupleTime: 767ns
       - MemoryUsed: 0.00
       - PerDiskReadThroughput: 9.32 MB/sec
       - RowsReturned: 100
       - RowsReturnedRate: 1.73 K/sec
       - ScanRangesComplete: 4
       - ScannerThreadsInvoluntaryContextSwitches: 0
       - ScannerThreadsReadTime: 662.431us
       - ScannerThreadsSysTime: 0
       - ScannerThreadsTotalWallClockTime: 25ms
       - ScannerThreadsUserTime: 0
       - ScannerThreadsVoluntaryContextSwitches: 4
       - TotalReadThroughput: 0.00 /sec
2014-01-08 10:48:32 -08:00
Lenni Kuff
87d8f79efe Add support for CREATE TABLE ... STORED AS PARQUETFILE 2014-01-08 10:48:32 -08:00
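A sketch with hypothetical names (PARQUETFILE is the keyword named in the commit):

    CREATE TABLE t1 (id INT, name STRING) STORED AS PARQUETFILE;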
Lenni Kuff
1cd847c856 IMPALA-81: Add support for CREATE/DROP DATABASE/TABLE
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored as text, sequence, and RCFile formats.
It currently only supports creating unpartitioned tables and tables stored in HDFS.
2014-01-08 10:48:30 -08:00
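A sketch of the statements this enables, with hypothetical names; per the commit, only unpartitioned HDFS tables in text, sequence, or RCFile format are supported at this point:

    CREATE DATABASE db1;
    CREATE TABLE db1.t1 (id INT, s STRING) STORED AS SEQUENCEFILE;
    DROP TABLE db1.t1;
    DROP DATABASE db1;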
Marcel Kornacker
c02d25baa8 IMPALA-20: Limit clause in inline view not handled correctly by planner
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
  by the merging ExchangeNode
- all limits with ORDER BY are now distributed: enforced both by the child plan fragment and
  by the merging TopN node
2014-01-08 10:48:29 -08:00
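The kind of query affected, sketched with hypothetical names; the LIMIT inside the inline view must be enforced before the outer aggregation sees the rows:

    SELECT count(*)
    FROM (SELECT id FROM t1 LIMIT 10) v;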
Alan Choi
9c11c0ce2d HiveServer2 cleanup
This patch:

1. uses boost UUIDs
2. adds a unit test for the HiveServer2 metadata operations
3. adds a JDBC metadata unit test
4. implements the remaining HiveServer2 operations: GetFunctions and GetTableTypes
5. removes the in-process Impala server from fe-support
2014-01-08 10:48:06 -08:00
Skye Wanderman-Milne
8b87099998 IMPALA-2: Support for Avro data files
Adds HdfsAvroScanner and modifies the sequence scanners to be more general.
2014-01-08 10:48:05 -08:00
Nong Li
868a99135a Add network benchmark 2014-01-08 10:47:56 -08:00
Alan Choi
073de3e02e Generate HiveServer2 files from thrift 2014-01-08 10:47:52 -08:00
Marcel Kornacker
63e3cd0279 Adding query option DEBUG_ACTION 2014-01-08 10:47:37 -08:00
Alan Choi
be98df19c8 HiveServer2
This patch implements the HiveServer2 API.

We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.

All of the HiveServer2 code is implemented in impala-hs2-server.cc. The Beeswax
code is refactored into impala-beeswax-server.cc.

HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to the ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.

Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.

Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
2014-01-08 10:47:24 -08:00
Henry Robinson
7ba437a52e Code changes to build against thrift 0.9.0 in thirdparty/ 2014-01-08 10:47:22 -08:00
Alan Choi
ff704ce586 IMP-690: impala-shell calls the PingImpalaService Thrift API to verify
that the connected server is an impalad.
2014-01-08 10:47:13 -08:00
Henry Robinson
b7f937577d Add missing thrift file 2014-01-08 10:47:12 -08:00
Henry Robinson
986f3cddf6 Move sparrow/ to statestore/ and remove sparrow namespace 2014-01-08 10:47:12 -08:00
Skye Wanderman-Milne
982747c856 IMP-653: add CURRENT_TIMESTAMP() function as synonym for now() 2014-01-08 10:47:09 -08:00
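A trivial illustration; CURRENT_TIMESTAMP() is interchangeable with now():

    SELECT now();
    SELECT CURRENT_TIMESTAMP();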
Marcel Kornacker
bf56c21c1b IMP-618
Adding DEFAULT_ORDER_BY_LIMIT query option.
Also removing deprecated PARTITION_AGG query option.
2014-01-08 10:47:04 -08:00
Michael Ubell
8a5297a526 Add HdfsLzoTextScanner 2014-01-08 10:46:35 -08:00
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
ishaan
ccb020c4a0 Adding copyrights to remaining files. 2014-01-08 10:46:30 -08:00
ishaan
05c65789bb Change copyrights from 2011 to 2012 2014-01-08 10:46:29 -08:00
Henry Robinson
dd0e9f1180 IMP-265: State-store subscriber recovery mode 2014-01-08 10:46:25 -08:00
Michael Ubell
c1852e2dcf Add from_unixtime and unix_timestamp(string, string) 2014-01-08 10:46:22 -08:00
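A sketch of the two functions; the values and format pattern shown are illustrative:

    -- epoch seconds to a formatted string
    SELECT from_unixtime(1325376000);
    -- formatted string to epoch seconds, using an explicit format pattern
    SELECT unix_timestamp('2012-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss');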
Marcel Kornacker
ea050a43ad Switching over backend runtime structures to new planner.
Added container-util.h
2014-01-08 10:46:20 -08:00
Nong Li
08968c1d07 Performance improvements for aggregation and hash join nodes with codegen. 2014-01-08 10:46:19 -08:00
Alan Choi
0ce8a044e3 Disable RC/Trevni (with an option to allow them); remove file_buffer_size
IMP-336: remove the file_buffer_size query option
Add the "allow_unsupported_formats" query option to allow RC/Trevni in our tests; disabled by
default
2014-01-08 10:46:02 -08:00
Alan Choi
dbf1074066 Fragments report errors to coordinator.
Enable multi-node DataErrorTest (IMP-250 resolved)
Check fragment/coord errors in DataErrorTest
2014-01-08 10:46:00 -08:00
Henry Robinson
91c3b979ca IMP-370: SHOW TABLES IN support and IMP-363: SHOW DATABASES
Change-Id: Ic41c4b0767a0480f0a18e1e985f25de3bc2ca947
2014-01-08 10:45:59 -08:00
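A sketch of the two statements, with a hypothetical database name:

    SHOW DATABASES;
    -- list the tables of a specific database without switching to it
    SHOW TABLES IN db1;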
Henry Robinson
540673763f Add session key handling to ThriftServer, and session support to the frontend 2014-01-08 10:45:59 -08:00
Marcel Kornacker
927f4c52f8 Adding the remaining pieces of functionality to the new planner:
- HBaseScanNode.getScanRangeLocations()
- the new planner creates INSERT plans
- Frontend.createExecRequest2(), which calls NewPlanner.
2014-01-08 10:45:58 -08:00
Michael Ubell
48c454d319 IMP-267: Add version() function. 2014-01-08 10:45:57 -08:00
Nong Li
a9ff7323f2 Fix our string to numeric casts to use StringParser instead of lexical_cast. 2014-01-08 10:45:57 -08:00
Marcel Kornacker
5984c0be52 First cut of partitioned plan generation:
- created the new class PlanFragment, which encapsulates everything having to do with a single
  plan fragment, including its partition, output exprs, destination node, etc.
- created the new class DataPartition
- added explicit classes for fragment and plan node ids, to avoid mixing them up, which is easy to do with ints
- added an IdGenerator class
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
  PlanFragment.getExplainString()
- changed the planner interface to return scan ranges with a complete list of server locations,
  instead of making a server assignment

Also included: cleanup of AggregateInfo:
- the second phase of a DISTINCT aggregation is now captured separately from a merge aggregation
- moved analysis functionality into AggregateInfo

Removed broken test cases from the functional-planner workload (they are handled correctly in functional-newplanner).
2014-01-08 10:45:56 -08:00
Nong Li
2ea454fcda Updated coordinator to output summary profiles. 2014-01-08 10:45:14 -08:00
Michael Ubell
ad46b98366 Add Kerberos authentication. 2014-01-08 10:45:10 -08:00