This patch adds support for the following SQL constructs:
- Unary + operator
- The ALL keyword, in SELECT ALL and SELECT aggregate_func(ALL *)
- REAL and INTEGER as type synonyms for DOUBLE and INT respectively
- The AS keyword after a table spec, e.g. SELECT * FROM tbl AS t0
This makes partition pruning more effective by extending it to predicates that are fully bound by the partition column,
e.g., '<col> IN (1, 2, 3)' will also be used to prune partitions, in addition to equality and binary comparisons.
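A minimal sketch of the idea, not the planner's actual code: each partition is modeled by the value of its single partition column, and any predicate that depends only on that column can be evaluated against it to decide whether the partition is kept. The names prune_partitions and partition_values are hypothetical.

```python
from typing import Callable, Iterable, List

def prune_partitions(partition_values: Iterable[int],
                     predicate: Callable[[int], bool]) -> List[int]:
    """Keep only partitions whose partition-column value satisfies the predicate.

    Assumption: the predicate references nothing but the partition column,
    so it can be evaluated without reading any data files.
    """
    return [value for value in partition_values if predicate(value)]

# Hypothetical table with five partitions, keyed by the partition column value.
partition_values = [1, 2, 3, 4, 5]

# Equality and binary comparison predicates.
kept_eq = prune_partitions(partition_values, lambda col: col == 2)
kept_lt = prune_partitions(partition_values, lambda col: col < 3)

# An IN predicate is handled the same way because it is also fully bound
# by the partition column.
kept_in = prune_partitions(partition_values, lambda col: col in (1, 2, 3))
```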
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit
Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
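For illustration only, the missing parsing step could look roughly like the sketch below. It is not part of this patch. The function name parse_mem_spec is hypothetical, and the sketch assumes that "M" and "G" mean 1024^2 and 1024^3 bytes and that a spec without a suffix is a plain byte count.

```python
def parse_mem_spec(spec: str) -> int:
    """Convert a spec such as '1.25G' or '512M' into a number of bytes.

    Assumptions (not taken from the patch): M = 1024**2, G = 1024**3,
    and a spec with no suffix is already in bytes.
    """
    multipliers = {"M": 1024 ** 2, "G": 1024 ** 3}
    spec = spec.strip()
    if not spec:
        raise ValueError("empty mem limit spec")
    suffix = spec[-1].upper()
    if suffix in multipliers:
        number, multiplier = spec[:-1], multipliers[suffix]
    else:
        number, multiplier = spec, 1
    value = float(number)  # raises ValueError on malformed input
    if value < 0:
        raise ValueError("mem limit must be non-negative")
    return int(value * multiplier)

limit_bytes = parse_mem_spec("1.25G")
```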
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored as text, sequence, and rc file format.
It currently only supports creating unpartitioned tables and tables stored in HDFS.
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
by the merging ExchangeNode
- all limits with ORDER BY are now distributed: enforced both by the child plan fragment and
by the merging TopN node
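The two-level enforcement described above can be sketched as follows. This is a simplified model using Python lists, not the actual ExecNode code; the function names are hypothetical. Each child fragment returns at most LIMIT rows, and the merging step applies the same limit again to the combined stream.

```python
import heapq
from itertools import islice

def fragment_limit(rows, limit):
    # Child fragment: stop after producing `limit` rows.
    return list(islice(rows, limit))

def exchange_limit(fragment_outputs, limit):
    # Merging exchange: concatenate child outputs, then enforce the limit again.
    merged = (row for output in fragment_outputs for row in output)
    return list(islice(merged, limit))

def fragment_topn(rows, limit):
    # Child fragment with ORDER BY: return its local top-N in sorted order.
    return heapq.nsmallest(limit, rows)

def merge_topn(fragment_outputs, limit):
    # Merging TopN: merge the sorted child outputs and keep the global top-N.
    return list(islice(heapq.merge(*fragment_outputs), limit))

# Hypothetical data scanned by three fragments.
fragments = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]
limit = 2

plain = exchange_limit([fragment_limit(f, limit) for f in fragments], limit)
ordered = merge_topn([fragment_topn(f, limit) for f in fragments], limit)
```

Because every child sends at most LIMIT rows, the merging node never receives more than LIMIT rows per fragment.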
This patch:
1. uses boost uuid
2. adds a unit test for HiveServer2 metadata operations
3. adds a JDBC metadata unit test
4. implements the remaining HiveServer2 operations: GetFunctions and GetTableTypes
5. removes the in-process impala server from fe-support
5. remove in-process impala server from fe-support
This change modifies the Catalog to create HiveMetaStoreClient connections on a
per-request basis. This resolves an issue when Impala is talking over Thrift to a Hive
Metastore Service. The Hive Thrift client is not thread safe so concurrent metadata loads
would fail.
To reduce the overhead associated with creating a new connection each time, a simple
connection pool was added. The pool is initialized with a fixed number of connections
and new connections are added on an as-needed basis.
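A minimal sketch of such a pool, assuming a caller-supplied factory that creates one client connection. The class and method names are hypothetical and this is not the Catalog's actual implementation. The pool starts with a fixed number of idle connections and creates a new one whenever none is idle.

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, factory, initial_size):
        self._factory = factory
        self._idle = queue.Queue()  # thread-safe container of idle connections
        for _ in range(initial_size):
            self._idle.put(factory())

    def acquire(self):
        # Reuse an idle connection if there is one, otherwise create a new one.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._factory()

    def release(self, conn):
        # Returned connections stay in the pool for later requests.
        self._idle.put(conn)

    @contextmanager
    def connection(self):
        # Each request holds its own connection for the duration of the block.
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)

    def idle_count(self):
        return self._idle.qsize()

# Usage with a stand-in factory that only counts how many connections exist.
created = []
def make_connection():
    created.append(object())
    return created[-1]

pool = ConnectionPool(make_connection, initial_size=2)
first, second, third = pool.acquire(), pool.acquire(), pool.acquire()
for conn in (first, second, third):
    pool.release(conn)
```

Since each request takes its own connection, a client object is never shared by two concurrent metadata loads.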
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ file.
Therefore, all of the Thrift-generated HiveServer2 C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881
- we're now pre-computing and caching the result of HdfsTable.getBlockMetadata() on a per-partition basis
- to make the cache more compact, we're collecting pools of unique strings:
file names are collected per partition and ip/port strings are collected per table
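The string pools can be pictured with the sketch below. It assumes that each cached block entry stores small integer indexes into a pool instead of repeating the strings; that layout and all names here are assumptions for illustration, not the actual HdfsTable code.

```python
class StringPool:
    """Stores each distinct string once and hands out its index."""

    def __init__(self):
        self._index = {}
        self.strings = []

    def intern(self, value):
        idx = self._index.get(value)
        if idx is None:
            idx = len(self.strings)
            self._index[value] = idx
            self.strings.append(value)
        return idx

# One pool of ip/port strings for the whole table (hypothetical data).
host_pool = StringPool()
# One pool of file names per partition.
file_pool = StringPool()

raw_blocks = [
    ("part-0000", "10.0.0.1:50010"),
    ("part-0000", "10.0.0.2:50010"),
    ("part-0001", "10.0.0.1:50010"),
]

# Compact cached form: (file name index, host index) pairs.
cached_blocks = [(file_pool.intern(f), host_pool.intern(h)) for f, h in raw_blocks]
```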