Commit Graph

12 Commits

Author SHA1 Message Date
Marcel Kornacker
7725f25ff5 This combines changes related to periodic reporting of plan fragment exec profiles:
- the executor takes a report callback, passed in by ImpalaServer::FragmentExecState
- the PlanFragmentExecutor invokes the profile reporting callback in a background thread.
- RuntimeProfile is now thread-safe and has a RuntimeProfile::Update()

Also included:
- a number of bug fixes related to async cancellation of queries
  and propagation of errors through PlanFragmentExecutor/Coordinator/ImpalaServer.
- changed COUNTER_SCOPED_TIMER to SCOPED_TIMER
- derived counters: RuntimeProfile now lets you add counters that return a
  value via a function call, which is useful for reporting something like normalized
  ScanNode throughput; retrofitted to ScanNode and all subclasses (see the sketch after this entry)
- changed the coordinator to make cancellation atomic w.r.t. recognition of an error status
  for the overall query.
- removed InProcessQueryExecutor from data-stream-test.

Added aggregate throughput counters to the coordinator:
- all throughput counters are grouped in a sub-profile "AggregateThroughput"
- each scan node gets its own counter
- the value is aggregated across all registered backends which contain that node in
  their plan fragments
2014-01-08 10:44:42 -08:00
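A minimal illustrative sketch of a function-backed ("derived") counter of the kind described in the entry above, assuming a simplified profile class. The names ProfileSketch, AddDerivedCounter, and Snapshot are placeholders, not Impala's actual RuntimeProfile API:

    // Illustrative sketch only: a profile counter whose value is computed on
    // demand by a caller-supplied function (e.g. normalized scan throughput).
    #include <cstdint>
    #include <functional>
    #include <mutex>
    #include <string>
    #include <unordered_map>

    class ProfileSketch {
     public:
      using ValueFn = std::function<int64_t()>;

      // Registers a counter whose value is produced by 'fn' every time the
      // profile is reported, instead of being updated explicitly.
      void AddDerivedCounter(const std::string& name, ValueFn fn) {
        std::lock_guard<std::mutex> l(lock_);  // profile must be thread-safe
        derived_counters_[name] = std::move(fn);
      }

      // Snapshot of all derived counter values, taken under the lock so a
      // background reporting thread can call this concurrently with updates.
      std::unordered_map<std::string, int64_t> Snapshot() const {
        std::lock_guard<std::mutex> l(lock_);
        std::unordered_map<std::string, int64_t> out;
        for (const auto& kv : derived_counters_) out[kv.first] = kv.second();
        return out;
      }

     private:
      mutable std::mutex lock_;
      std::unordered_map<std::string, ValueFn> derived_counters_;
    };

    // Usage sketch: a throughput counter backed by two hypothetical existing values.
    // profile.AddDerivedCounter("ScanThroughput(bytes/sec)",
    //     [&] { return bytes_read / std::max<int64_t>(1, scan_time_secs); });

A counter like this lets the background reporting thread read a computed value on each report without the scan node having to push explicit updates.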
Nong Li
4c9c82910a Text parser fix for columns off the end. 2014-01-08 10:44:40 -08:00
Nong Li
4d0319d32b Fix null string parsing. 2014-01-08 10:44:40 -08:00
Alan Choi
dd1537d116 IMP-132: collect unique agg expr 2014-01-08 10:44:39 -08:00
Nong Li
81bba16dac Parallel scanners. 2014-01-08 10:44:38 -08:00
Alan Choi
9ac664f1f7 Fix IMP-239: text_converter_->WriteSlot returns true when the conversion succeeds.
QueryTest and HBaseQueryTest set AbortOnError to false except for the expected error cases.
2014-01-08 10:44:37 -08:00
Henry Robinson
c472213eeb Parallel INSERT, sink-per-scan-node plan 2014-01-08 10:44:35 -08:00
Alexander Behm
ee705e3083 Added timestamp arithmetic expressions. 2014-01-08 10:44:31 -08:00
Alan Choi
f15ef994fb "mvn test" now uses impalad and beeswax api to submit query and fetch, including
insert query.

review issue: 260
2014-01-08 10:44:30 -08:00
Alan Choi
88101bc90e This patch implements the probabilistic counting algorithms as the aggregates
"distinctpc" and "distinctpcsa" (a sketch of the underlying PCSA idea follows this entry).

We've gathered statistics on an internal dataset (all columns) which is
part of our regression data. It's roughly 400MB, ~100 columns, of
int/bigint/string types.

On Hive, it took roughly 64 sec.
On this Impala implementation, it took 35 sec. By marking the functions in hash-util.h
inline (which we currently don't), we can achieve 24-26 sec.

Change-Id: Ibcba3c9512b49e8b9eb0c2fec59dfd27f14f84c3
2014-01-08 10:44:27 -08:00
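For reference, a hedged sketch of probabilistic counting with stochastic averaging (PCSA), the textbook algorithm family behind aggregates like "distinctpc"/"distinctpcsa". The hash function, bitmap count, and class name here are sketch assumptions, not Impala's implementation:

    // Illustrative sketch only: PCSA distinct-count estimation.
    #include <cmath>
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    class PcsaSketch {
     public:
      explicit PcsaSketch(int num_bitmaps) : bitmaps_(num_bitmaps, 0) {}

      void Update(const std::string& value) {
        uint64_t h = std::hash<std::string>{}(value);
        size_t bucket = h % bitmaps_.size();
        uint64_t rest = h / bitmaps_.size();
        // rho = index of the lowest set bit of the remaining hash bits.
        int rho = rest == 0 ? 63 : __builtin_ctzll(rest);  // gcc/clang builtin
        bitmaps_[bucket] |= (uint64_t{1} << rho);
      }

      // Flajolet-Martin estimate: (m / phi) * 2^(mean index of lowest unset bit).
      double Estimate() const {
        const double kPhi = 0.77351;  // standard PCSA correction constant
        double sum = 0;
        for (uint64_t bitmap : bitmaps_) {
          int r = 0;
          while (r < 64 && (bitmap & (uint64_t{1} << r))) ++r;  // lowest unset bit
          sum += r;
        }
        const double m = bitmaps_.size();
        return (m / kPhi) * std::pow(2.0, sum / m);
      }

     private:
      std::vector<uint64_t> bitmaps_;
    };

In an aggregate function, Update() would run once per input row and Estimate() would produce the final approximate distinct count.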
Alan Choi
cbadb4eac4 When a scan range begins at the starting point of a tuple, we would miss that tuple. This patch
fixes this problem; see the boundary-handling sketch after this entry.

review: 162
2014-01-08 10:44:24 -08:00
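A hedged sketch of one common way text scanners handle this boundary condition (not Impala's actual scanner code): a range normally skips to the first delimiter after its start, because the straddling tuple belongs to the previous range, but if the range starts exactly on a tuple boundary it must keep that first tuple. FirstTupleOffset and its parameters are illustrative names:

    // Illustrative sketch only: deciding where a scan range starts parsing.
    #include <cstddef>
    #include <cstring>

    // 'buf' holds the range's bytes; 'prev_byte' is the byte just before the
    // range when available. Returns the offset of the first tuple this range owns.
    size_t FirstTupleOffset(const char* buf, size_t len,
                            bool prev_byte_valid, char prev_byte) {
      const char kDelim = '\n';
      // Range starts at the beginning of the file, or right after a delimiter:
      // the tuple beginning here belongs to this range, so don't skip it.
      if (!prev_byte_valid || prev_byte == kDelim) return 0;
      // Otherwise the partial tuple at the start belongs to the previous range;
      // skip to the byte after the first delimiter.
      const void* nl = memchr(buf, kDelim, len);
      if (nl == nullptr) return len;  // no complete tuple starts in this range
      return static_cast<size_t>(static_cast<const char*>(nl) - buf) + 1;
    }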
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We look up the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files; I've just moved
them. The one new file is 'hive-benchmark.test', which contains the hive
benchmark queries.

Also added support for generating schemas for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are unique from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00