Commit Graph

2565 Commits

Author SHA1 Message Date
Henry Robinson
a0d7b3731e Workaround for Hive 0.8.1 bug: load hive-builtins jar into HDFS
TODO: Remove once we understand how to make Hive look in the local FS
for the jar.
2012-02-21 14:32:54 -08:00
Henry Robinson
6260989cd8 Upgrade to hbase-0.92.0-cdh4b1 2012-02-21 14:32:52 -08:00
Henry Robinson
f2a602ea9d Update impala-config.sh and FindHDFS.cmake to adjust to new hadoop
layout. Add core-site.xml to $HADOOP_CONF_DIR for direct read
configuration. Update fe pom.xml to point to new Hadoop.
2012-02-21 14:32:51 -08:00
Henry Robinson
09dd1c2ecd Upgrade Hive to hive-0.8.0-cdh4b1 2012-02-20 23:21:12 -08:00
Henry Robinson
518ebece37 Add hadoop-0.23.0-cdh4b2-SNAPSHOT
This version is a custom build of Hadoop cdh4b2, including:

1. The BlockReader direct read API
2. A forward-port of MiniHadoopClusterManager from cdh3
3. The local-read security check has been removed
4. The sizes of the BlockReaderLocal slow-read / checksum buffers have
been increased to handle reads up to 1MB at a time.

It is built from github.sf.cloudera.com/Henry/hadoop-common, branch
cdh4-23-direct-read, commit 8f8c63
2012-02-20 23:21:08 -08:00
Henry Robinson
36129bca76 Remove Hadoop-0.20.2 2012-02-20 23:21:06 -08:00
Nong Li
88237350f0 Change the build to allow debug and release builds to coexist. 2012-02-17 18:14:04 -08:00
Nong Li
e120233fb4 Update HdfsTestScanNode to be able to handle scan ranges. 2012-02-14 16:18:21 -08:00
Nong Li
a078d0abcf Fix build issue with gflags. 2012-02-10 13:24:33 -08:00
Marcel Kornacker
6a57a1d879 Enabling multi-node distributed execution:
- adding flag --backends="host:port,host:port,..." , which TestEnv uses to create clients for ImpalaBackendServices
  running on those nodes; this is just a hack in order to be able to use runquery for multi-node execution
- impalad-main.cc: main() of impala daemon, which will export both ImpalaService and
  ImpalaBackendService (but at the moment only does the latter; everything related to ImpalaService is commented out)
- com.cloudera.impala.service.Frontend: API to the frontend functionality; invoked by impalad via jni; ignore for now
2012-02-10 10:53:40 -08:00
Marcel Kornacker
aec4f13dda Changing the conversion of TRowBatch to RowBatch to make a copy of the tuple data
and fixing a leak in ExchangeNode::GetNext().
2012-02-08 11:14:55 -08:00
Nong Li
797bd1ee58 Fix a couple of hotspots resulting in ~9% perf improvements on the hive benchmark.
- std::string replacement for hdfs-text-scan-node boundary strings
- remove bzero in RowBatch::AddTuple
Improved the benchmark running script to compare to previous results if available.
2012-01-26 08:40:35 -08:00
Marcel Kornacker
17aa46eb82 fixing Jenkins build failure 2012-02-07 17:13:54 -08:00
Marcel Kornacker
a607c6c69f Partitioned parallel execution in QueryTest:
- changing block assignment to plan fragments if numPartitions != all
- adding class HdfsFsCache, which caches Hdfs connections for the lifetime of the process.
  This fixes a bug in the hdfs text scan node, which used to obtain (inadvertently) shared connections
  via hdfsConnect() and then end up closing them process-wide via a call to hdfsDisconnect().
- adapting tests to eliminate random output

Also fixed handling of empty tables (or empty scan ranges) (jira IMP-28)
2012-02-07 16:05:57 -08:00
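A minimal sketch of the connection-caching idea described in a607c6c69f, assuming a process-wide cache keyed by host and port. The names `FakeConnection` and `FsCache` are hypothetical stand-ins, not the actual HdfsFsCache class or the libhdfs API, and thread safety is left out:

```python
class FakeConnection:
    """Hypothetical stand-in for an HDFS connection handle (not libhdfs)."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.closed = False

    def close(self):
        self.closed = True


class FsCache:
    """Hands out one shared connection per (host, port) and never closes it."""

    def __init__(self, connect=FakeConnection):
        self._connect = connect
        self._connections = {}

    def get(self, host, port):
        key = (host, port)
        # Create the connection on first use, then reuse it for the process lifetime.
        if key not in self._connections:
            self._connections[key] = self._connect(host, port)
        return self._connections[key]


cache = FsCache()
a = cache.get("localhost", 8020)  # host and port are illustrative values
b = cache.get("localhost", 8020)
```

Because callers only ever borrow a handle from the cache and never disconnect it themselves, one scan node finishing cannot invalidate a connection that another is still using.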
Michael Ubell
52d95e90fc Fix typo in impala_functions.py 2012-02-06 16:23:12 -08:00
Michael Ubell
352deb82aa Added the string functions "reverse", "strleft" and "strright"
Fixed some comments in gen_opcodes.py
changed to make -j
2012-02-06 15:55:40 -08:00
Marcel Kornacker
ff7e268c6f disabling tcmalloc in libbackend.so for more meaningful stack traces 2012-02-01 14:30:09 -08:00
Marcel Kornacker
38b6d6286e Added support for single-process distributed query execution:
* new class ExchangeNode: ExecNode for incoming data stream
* new class Coordinator: coordinates execution of all plan fragments
* reorganized classes PlanExecutor and QueryExecutor
* renamed PlanExecutorAdaptor to JniCoordinator
* backend-service: creates thrift server that exports ImpalaBackendService
* added --num_backends flag for runquery
2012-02-01 12:06:55 -08:00
Henry Robinson
32e230b256 Shell scripts to start, load and kill a mini dfs cluster.
mvn -Pload_testdata ... from fe/ will run a three-node,
single-process HDFS cluster loaded with data from
AllTypesAgg in /impala-dist-test.

A Hive table over this data is created as AllTypesAggMini.
2012-01-30 17:32:12 -08:00
Nong Li
783480d6bf - Cleaned up some TODOs.
- Fix tuple template.  Fixed strcmp
- atoi/atof handle overflows.
- added likely/unlikely compiler directive
- Runquery now reports mean/stddev for profile runs
- removed quoted char
2012-01-18 23:08:29 -08:00
Henry Robinson
e3ae3a5823 Remove CMakeCache.txt from root when cleaning with ./buildall.sh 2012-01-16 13:59:40 -08:00
Nong Li
bf74bc25e3 Some cleanup:
- Fixed issue with SSE file parse.
- Moved build scripts to impala/bin.  Rebuilding from just BE does not work.
- Cleaned up a few compiler warnings.
- Add option to disable automatic counters for profilers.
2011-12-31 06:17:28 -08:00
Nong Li
94db70c9fd Fix build. Dependencies don't propagate right on first build. 2011-12-30 21:28:18 -08:00
Nong Li
c84fec38d3 - Move thrift out of FE src and into impala/common
- Thrift files now build using cmake instead of mvn
- Added cmake build to impala/ which drives the build process
2011-12-30 19:35:20 -08:00
Marcel Kornacker
c056445612 Added m:n data streams:
- DataStreamSender: sender side (1:n) for a single stream
- DataStreamMgr: receiver side; singleton class for all incoming streams active at a node

Changed ExecNode::GetNext() to return eos indicator explicitly; this allows us to pass incoming TRowBatches (which may not be full) up w/o copying the data.

Added data-stream-test.
2012-01-10 18:00:20 -08:00
Henry Robinson
01ff5b842f Add length, lower and upper to string functions 2012-01-09 17:26:41 -08:00
Nong Li
2880f54d35 Perf Work:
- Added perf counter utility
- Added google perf tools
- Added html data set
- Added escape char test
- Initial perf tuning
2011-12-30 00:26:27 -08:00
Nong Li
4930632c8d Remove BE testing. Not working on jenkins build machines. 2011-12-19 23:29:59 -08:00
Marcel Kornacker
482e83a396 Removing testdata submodule, we need to have large data files hosted outside of git. 2011-12-16 13:16:28 -08:00
Carl Steinbach
2378417376 IMP-18. Add RCFile support to Impala backend 2011-12-15 15:50:54 -08:00
Alexander Behm
f51ed6a47c Fixed bug in planner tests due to absolute path. 2011-12-12 15:37:25 -08:00
Alexander Behm
c7f7382c31 Added planner changes and data sinks for INSERT statements. 2011-12-12 15:14:49 -08:00
Marcel Kornacker
83d0d90943 This covers:
1) partitioning of scans along hdfs file splits (ScanNode.getScanParams());
2) rudimentary distributed plan generator, which adds a merge phase to what are essentially single-node plans and partitions the plan based on the partitioning of the leftmost scan in the plan tree

There is no test coverage for 1) yet, because with the current hdfs setup (all paths point to the local fs) there are no file splits - every file is a single block. I will change our test setup in a forthcoming CL to use the hadoop minicluster environment, which should allow us to create files with splits.
2011-12-08 15:12:58 -08:00
Nong Li
5ae17ad5f9 Adding grep data. 2011-12-06 03:18:30 -08:00
Carl Steinbach
44fa92a639 IMP-29. Remove mock libhdfs implementation 2011-12-06 13:25:18 -08:00
Nong Li
ea9d4b94f4 Adding opcode cmakelists.txt 2011-11-20 14:27:27 -08:00
Nong Li
b1833d4de8 Implemented opcode registry. Added substr() and pi() functions. Added backend testing to buildall.sh 2011-11-20 13:44:41 -08:00
Marcel Kornacker
a8acd52281 Defining data serialization format (Data.thrift).
Adding MemPool::GetOffset()/GetDataPtr().
Fixed planner bug (wouldn't generate TScanParams for more than one scan).
Fixed bug in java test harness (which made it ignore the fact that the join tests have been broken for a while).
2011-11-22 15:42:32 -08:00
Alexander Behm
62177d4d9c Added parser and analyzer support for INSERT statements. 2011-11-22 15:34:04 -08:00
Nong Li
27b29e4568 Updated HDFS scan node FE and BE to no longer pass all of the partition key values via thrift. Instead, the FE passes a regex which the BE uses to extract the values. 2011-11-16 15:37:23 -08:00
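A rough illustration of the approach in 27b29e4568, where the backend recovers partition key values from a file path using a regex supplied by the frontend. The commit does not give the path layout or the pattern, so the `year=/month=` directory scheme, `PARTITION_REGEX`, and `extract_partition_keys` below are all assumptions made for the sketch:

```python
import re

# Hypothetical pattern a frontend might send for a table partitioned by year and month.
PARTITION_REGEX = r".*/year=(\d+)/month=(\d+)/[^/]+$"


def extract_partition_keys(pattern, file_path):
    """Return the captured partition key values as strings, or None if the path does not match."""
    match = re.match(pattern, file_path)
    return match.groups() if match else None


keys = extract_partition_keys(
    PARTITION_REGEX, "/warehouse/alltypes/year=2009/month=3/data.txt"
)
missing = extract_partition_keys(PARTITION_REGEX, "/warehouse/alltypes/data.txt")
```

Sending one pattern instead of every key value keeps the thrift message size independent of the number of partitions, at the cost of a regex match per file on the backend.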
Nong Li
6eda6d19c6 Implemented TopN. 2011-11-06 17:03:33 -08:00
Nong Li
bfc9824b73 Added doxygen ("build docs") to backend. 2011-11-06 16:27:29 -08:00
Marcel Kornacker
0914fedea9 Defining Impala backend service, which is exported by backend processes to service plan
fragment execution requests.
Changing thrift plan-related structs to pull out runtime parameters in preparation for parallel execution.
2011-11-02 14:47:32 -07:00
Nong Li
84e915bb6e Added support for string aggregation. 2011-11-01 15:36:45 -07:00
Carl Steinbach
3b7d5e3980 IMP-26. Add cpplint to the build 2011-10-20 14:22:06 -07:00
Marcel Kornacker
33166839c7 Changing QueryTests to run queries with different batch sizes, in order to hit more corner
cases in executor code.
Adding MemPool::Release(), which allows passing data between pools.
Changing the semantics of GetNext() not to overwrite tuple data even beyond the next call;
the previous semantics (data only good until the next call) would have required joins
to create copies.
Adding mem-pool-test.
2011-10-17 05:10:21 -07:00
Marcel Kornacker
d5708b272f adding backend implementation for LIMIT clause 2011-09-29 22:34:46 -07:00
Carl Steinbach
b49efb3a87 IMP-25. Add project configuration file for Astyle source code formatter 2011-09-29 15:30:21 -07:00
Marcel Kornacker
0827146a2b adding outer joins plus new tests 2011-09-28 09:02:07 -07:00
Carl Steinbach
6e2c757c5c IMP-23. Generate Cscope index file during build 2011-09-19 11:49:27 -07:00