This version is a custom build of Hadoop cdh4b2, including:
1. the BlockReader direct-read API;
2. a forward-port of MiniHadoopClusterManager from cdh3;
3. removal of the local-read security check;
4. larger BlockReaderLocal slow-read / checksum buffers, to handle reads of up to 1MB at a time.
It is built from github.sf.cloudera.com/Henry/hadoop-common, branch cdh4-23-direct-read, commit 8f8c63.
- added flag --backends="host:port,host:port,..." , which TestEnv uses to create clients for the ImpalaBackendServices
running on those nodes; this is just a stopgap that lets runquery drive multi-node execution
- impalad-main.cc: main() of the impala daemon, which will export both ImpalaService and
ImpalaBackendService (currently only the latter; everything related to ImpalaService is commented out)
- com.cloudera.impala.service.Frontend: API to the frontend functionality; invoked by impalad via JNI; ignore for now
- replaced hdfs-text-scan-node boundary strings with std::string
- removed the bzero() call in RowBatch::AddTuple()
Improved the benchmark-running script to compare against previous results when available.
- changed block assignment to plan fragments when numPartitions != all
- added class HdfsFsCache, which caches HDFS connections for the lifetime of the process.
This fixes a bug in the HDFS text scan node, which inadvertently obtained shared connections
via hdfsConnect() and then closed them process-wide via a call to hdfsDisconnect().
- adapted tests to eliminate random output
Also fixed handling of empty tables (or empty scan ranges) (JIRA IMP-28)
* new class ExchangeNode: ExecNode for incoming data stream
* new class Coordinator: coordinates execution of all plan fragments
* reorganized classes PlanExecutor and QueryExecutor
* renamed PlanExecutorAdaptor to JniCoordinator
* backend-service: creates thrift server that exports ImpalaBackendService
* added --num_backends flag for runquery
mvn -Pload_testdata ... from fe/ will run a three-node,
single-process HDFS cluster loaded with data from
AllTypesAgg in /impala-dist-test.
A Hive table over this data is created as AllTypesAggMini.
- Fixed an issue with the SSE file parse.
- Moved build scripts to impala/bin. Rebuilding from just the BE directory does not work.
- Cleaned up a few compiler warnings.
- Added an option to disable automatic counters for profilers.
- DataStreamSender: sender side (1:n) for a single stream
- DataStreamMgr: receiver side; singleton class for all incoming streams active at a node
Changed ExecNode::GetNext() to return an end-of-stream (eos) indicator explicitly; this allows us to pass incoming TRowBatches (which may not be full) upstream without copying the data.
Added data-stream-test.
1) partitioning of scans along hdfs file splits (ScanNode.getScanParams());
2) rudimentary distributed plan generator, which adds a merge phase to what are essentially single-node plans, and partitions the plan based on the partitioning of the leftmost scan in the plan tree
There is no test coverage for 1) yet, because with the current HDFS setup (all paths point to the local fs) there are no file splits: every file is a single block. I will change our test setup in a forthcoming CL to use the Hadoop minicluster environment, which should allow us to create files with splits.
Adding MemPool::GetOffset()/GetDataPtr().
Fixed planner bug (it wouldn't generate TScanParams for more than one scan).
Fixed bug in the Java test harness (which made it ignore the fact that the join tests had been broken for a while).
cases in executor code.
Adding MemPool::Release(), which allows passing data between pools.
Changed the semantics of GetNext() so that tuple data is not overwritten, even beyond the next call;
the previous semantics (data valid only until the next call) would have required joins
to create copies.
Adding mem-pool-test.