This patch adds support for
- ALTER TABLE ADD|REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD/DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
* Changed frontend analysis for HBase tables.
* Changed Thrift messages to allow HBase as a sink type.
* Added a JNI wrapper around HTable.
* Created hbase-table-sink.
* Created hbase-table-writer.
* Statically initialized JNI-related code for HBase.
* Cleaned up some cpplint issues.
* Changed JUnit analysis tests.
* Created a new HBase test table.
* Added functional tests for HBase inserts.
This makes partition pruning more effective by extending it to predicates that are fully bound by a partition column;
e.g., '<col> IN (1, 2, 3)' will now also be used to prune partitions, in addition to equality and binary comparison predicates.
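The idea behind IN-list pruning can be sketched as follows. This is an illustrative C++ sketch, not Impala's actual planner code; the `Partition` struct and `PruneByInList` function are hypothetical names:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of IN-list partition pruning: because the predicate is
// fully bound by the partition column, a partition can be discarded whenever
// its partition-key value does not appear in the IN list.
struct Partition {
  std::string id;
  int key_value;  // value of the partition column for this partition
};

std::vector<Partition> PruneByInList(const std::vector<Partition>& parts,
                                     const std::set<int>& in_values) {
  std::vector<Partition> kept;
  for (const Partition& p : parts) {
    if (in_values.count(p.key_value) > 0) kept.push_back(p);
  }
  return kept;
}
```

Equality predicates are just the single-element case of the same mechanism.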
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit
Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
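The missing spec parser could look something like the sketch below. `ParseMemSpec` is an illustrative name and this is not Impala's implementation, just one way to handle specs like "1.25G":

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical sketch of the missing mem-limit spec parser: accepts strings
// like "1024", "256M", or "1.25G" and returns the limit in bytes, or -1 for
// a malformed spec.
int64_t ParseMemSpec(const std::string& spec) {
  if (spec.empty()) return -1;
  size_t pos = 0;
  double number;
  try {
    number = std::stod(spec, &pos);  // parses the numeric prefix
  } catch (...) {
    return -1;
  }
  if (number < 0) return -1;
  int64_t multiplier = 1;
  if (pos < spec.size()) {
    if (pos + 1 != spec.size()) return -1;  // at most one trailing unit char
    switch (std::toupper(spec[pos])) {
      case 'M': multiplier = 1024LL * 1024; break;
      case 'G': multiplier = 1024LL * 1024 * 1024; break;
      default: return -1;  // unknown unit
    }
  }
  return static_cast<int64_t>(number * multiplier);
}
```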
Rdtsc is not accurate due to changes in CPU frequency. Very often, the time
reported in the profile is even longer than the time reported by the shell.
This patch replaces Rdtsc with CLOCK_MONOTONIC, which is as fast as Rdtsc but
accurate: it is not affected by CPU frequency changes, nor by the user setting
the system clock.
Note that the new profile will always report time rather than clock cycles.
Here's the new profile:
Averaged Fragment 1:(68.241ms 0.00%)
completion times: min:69ms max:69ms mean: 69ms stddev:0
execution rates: min:91.60 KB/sec max:91.60 KB/sec mean:91.60 KB/sec
stddev:0.00 /sec
split sizes: min: 6.32 KB, max: 6.32 KB, avg: 6.32 KB, stddev: 0.00
- RowsProduced: 1
CodeGen:
- CodegenTime: 566.104us  <-- reported in microseconds instead of clock cycles
- CompileTime: 33.202ms
- LoadTime: 2.671ms
- ModuleFileSize: 44.61 KB
DataStreamSender:
- BytesSent: 16.00 B
- DataSinkTime: 50.719us
- SerializeBatchTime: 18.365us
- ThriftTransmitTime: 145.945us
AGGREGATION_NODE (id=1):(68.384ms 15.50%)
- BuildBuckets: 1.02K
- BuildTime: 13.734us
- GetResultsTime: 6.650us
- MemoryUsed: 32.01 KB
- RowsReturned: 1
- RowsReturnedRate: 14.00 /sec
HDFS_SCAN_NODE (id=0):(57.808ms 84.71%)
- BytesRead: 6.32 KB
- DelimiterParseTime: 62.370us
- MaterializeTupleTime: 767ns
- MemoryUsed: 0.00
- PerDiskReadThroughput: 9.32 MB/sec
- RowsReturned: 100
- RowsReturnedRate: 1.73 K/sec
- ScanRangesComplete: 4
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsReadTime: 662.431us
- ScannerThreadsSysTime: 0
- ScannerThreadsTotalWallClockTime: 25ms
- ScannerThreadsUserTime: 0
- ScannerThreadsVoluntaryContextSwitches: 4
- TotalReadThroughput: 0.00 /sec
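A timer on top of CLOCK_MONOTONIC is only a few lines; a minimal sketch (the `MonotonicNanos` name is illustrative, not Impala's):

```cpp
#include <cstdint>
#include <ctime>

// Minimal sketch of a CLOCK_MONOTONIC-based timer of the kind this patch
// switches to. Unlike rdtsc, the result is in nanoseconds rather than clock
// cycles, and is immune both to CPU frequency scaling and to changes of the
// system (wall) clock. Requires a POSIX system (clock_gettime).
int64_t MonotonicNanos() {
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Differences of two such readings give elapsed wall-clock time directly, which is why the profile can now report microseconds instead of cycle counts.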
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored in text, SequenceFile, and RCFile formats.
It currently supports only unpartitioned tables and tables stored in HDFS.
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
by the merging ExchangeNode
- all limits with ORDER BY are now distributed: enforced both by the child plan fragment and
by the merging TopN node
This patch:
1. uses Boost UUID
2. adds a unit test for HiveServer2 metadata operations
3. adds a JDBC metadata unit test
4. implements the remaining HiveServer2 metadata operations: GetFunctions and GetTableTypes
5. removes the in-process Impala server from fe-support
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
- created new class PlanFragment, which encapsulates everything having to do with a single
plan fragment, including its partition, output exprs, destination node, etc.
- created new class DataPartition
- explicit classes for fragment and plan node ids, to avoid getting them mixed up, which is easy to do with ints
- added an IdGenerator class
- moved PlanNode.ExplainPlanLevel to Types.thrift, so it can also be used for
PlanFragment.getExplainString()
- Changed planner interface to return scan ranges with a complete list of server locations,
instead of making a server assignment.
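The typed-id idea above (the actual classes live in the Java frontend) can be sketched as follows; the names here are illustrative, not the real class names:

```cpp
#include <cstdint>

// Sketch of explicit id classes: wrapping the integer in a distinct type per
// id kind makes it a compile-time error to pass a fragment id where a plan
// node id is expected, which is easy to get wrong with plain ints.
template <typename Tag>
struct Id {
  int32_t value;
  explicit Id(int32_t v) : value(v) {}
  bool operator==(const Id& other) const { return value == other.value; }
};

struct PlanNodeTag {};
struct FragmentTag {};
using PlanNodeId = Id<PlanNodeTag>;
using FragmentId = Id<FragmentTag>;

// Hands out consecutive ids of a single kind, in the spirit of the new
// IdGenerator class.
template <typename IdType>
class IdGenerator {
 public:
  IdType Next() { return IdType(next_++); }
 private:
  int32_t next_ = 0;
};
```

With separate `IdGenerator<PlanNodeId>` and `IdGenerator<FragmentId>` instances, mixing the two id spaces no longer compiles.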
Also included: cleaned up AggregateInfo:
- the 2nd phase of a DISTINCT aggregation is now captured separately from a merge aggregation.
- moved analysis functionality into AggregateInfo
Removed broken test cases from the functional-planner workload (they are handled correctly in functional-newplanner).