Commit Graph

14 Commits

Author SHA1 Message Date
Henry Robinson
2f339f2ed8 Add ASL license to all public files 2014-01-08 10:46:32 -08:00
ishaan
ccb020c4a0 Adding copyrights to remaining files. 2014-01-08 10:46:30 -08:00
Nong Li
bc08241ffb IR cross compile fixes for inlined string-value functions. 2014-01-08 10:46:19 -08:00
ishaan
e84cc0a9eb Enable code coverage on release builds. 2014-01-08 10:44:41 -08:00
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Lenni Kuff
0da77037e3 Updated Impala performance schema and test vector generation
This change updates the Impala performance schema and test vector generation
techniques. It also migrates the existing Ruby benchmark scripts to Python.
The change has a few parts:

1) Conversion of test vector generation and benchmark statement generation from
Ruby to Python. As a result, the benchmark test vector and dimension files are
now written in CSV format (Python doesn't have built-in YAML support).

2) Standardize the naming of benchmark tables (to somewhat match the Query
tests). In general the form is:
* If file_format=text and compression=none, do not use a table suffix
* Abbreviate sequence file as (seq), rc file as (rc), etc.
* If using BLOCK compression, don't append anything to the table name; if using
  'record', append 'record'
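
A minimal sketch of how the naming rules above might be applied. The function
name, the underscore separator, the placement of the codec name in the suffix,
and the example table name are all assumptions made for illustration; the
commit message does not spell them out.

```python
# Hypothetical abbreviations; only "seq" and "rc" are named in the message.
FORMAT_ABBREVIATIONS = {"sequence_file": "seq", "rc_file": "rc"}


def benchmark_table_name(base, file_format, compression="none",
                         compression_type="block"):
    """Build a table name from a base name plus a format/compression suffix."""
    # Rule 1: uncompressed text tables get no suffix at all.
    if file_format == "text" and compression == "none":
        return base

    # Rule 2: abbreviate the file format where an abbreviation is known.
    parts = [base, FORMAT_ABBREVIATIONS.get(file_format, file_format)]

    if compression != "none":
        # Assumption: the codec name is part of the suffix.
        parts.append(compression)
        # Rule 3: BLOCK compression adds nothing; 'record' appends 'record'.
        if compression_type == "record":
            parts.append("record")

    return "_".join(parts)
```

For example, under these assumptions a record-compressed sequence file table
built from a base table named `grep` would be called `grep_seq_snappy_record`,
while the block-compressed variant would be `grep_seq_snappy`.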

3) Created a new way to add new schemas: the benchmark_schema_template.sql
file. The generate_benchmark_statements.py script reads this file in and
breaks it up into sections. The section format is:
====
Data Set Name
---
BASE table name
---
CREATE STATEMENT Template
---
INSERT ... SELECT * format
---
LOAD Base statement
---
LOAD STATEMENT Format

Where BASE Table is a table the other file formats/compression types can be
generated from. This would generally be a local file.
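
The section format above can be split with a short parser. This is only a
sketch under stated assumptions, not the code in
generate_benchmark_statements.py: the class name, field names, and function
name are hypothetical, and it assumes "====" and "---" each appear alone on a
line.

```python
import re
from typing import List, NamedTuple


class SchemaSection(NamedTuple):
    """One '====' section of the template (field names are hypothetical)."""
    data_set: str
    base_table: str
    create_template: str
    insert_template: str
    load_base: str
    load_template: str


def parse_schema_template(text: str) -> List[SchemaSection]:
    """Split template text into sections, then each section into six parts."""
    sections = []
    # Sections start at a line containing only '===='.
    for raw in re.split(r"^====[ \t]*$", text, flags=re.MULTILINE):
        if not raw.strip():
            continue  # skip anything empty before the first delimiter
        # Sub-sections are separated by a line containing only '---'.
        fields = [f.strip()
                  for f in re.split(r"^---[ \t]*$", raw, flags=re.MULTILINE)]
        if len(fields) != 6:
            raise ValueError(
                "expected 6 sub-sections, found %d" % len(fields))
        sections.append(SchemaSection(*fields))
    return sections
```

Rejecting sections that do not have exactly six parts is a design choice of
this sketch; it makes a malformed template fail early rather than produce
misaligned statements.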

The thinking is that if the files already exist in HDFS then we can just load
the file directly rather than issue an INSERT ... SELECT * statement. The
generate_benchmark_statements.py script has been updated to use this new
template as well as query HDFS for each table to determine how it should be
created. It then outputs an ideal file called load-benchmark-*-generated.sql.
Since this file is generated dynamically we can remove the old benchmark
statement files.

4) This has been hooked into load-benchmark-data.sh and run_query has been
updated to use the new format as well
2012-07-12 23:12:20 -07:00
Alan Choi
ad073ef1b2 IMP-78
We want to expose issues in a distributed environment locally.

We already have 3 data nodes running locally in the MiniDFS. However, the
planner does not distinguish between data nodes on the same host, even though
they're running on different ports. So, we're effectively only running a
single node all the time.

First, we make the change in FE to identify data location as "host/port" instead
of just "host". Then, in
TQueryExecRequest, we list the host/port that serves the data, instead of just
using "host".
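
As an illustration of why the key matters (not the FE code itself), the sketch
below counts distinct nodes under both schemes. The function name and the port
numbers are made up for the example.

```python
def distinct_nodes(locations, use_port):
    """Count distinct data nodes, keyed by host or by (host, port)."""
    if use_port:
        return len({(host, port) for host, port in locations})
    return len({host for host, _port in locations})


# Three local data nodes on the same host; the ports are hypothetical.
local_data_nodes = [
    ("localhost", 50010),
    ("localhost", 50011),
    ("localhost", 50012),
]
```

Keyed by host alone, the three local data nodes collapse into one; keyed by
host and port, all three stay distinct.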

The result is that PlannerTest and QueryTest expose distributed planning
issues. Plans are still correct when the number of nodes is 1 or 2. So, to make
all the tests pass, I've forced Planner/Query test to execute with at most 2
nodes.

To see the faulty plan, we simply have to change the number of nodes back to 0
(all nodes).

We've discussed randomizing the SimpleScheduler but I chose not to do it
because we don't need randomization to expose the distributed planning issue.

I also discovered that the exchange node (BE) does not respect the "limit". I
fixed it.

One of the limit tests (QueryTest) is completely unstable. It doesn't really
test much. I removed it.
2012-06-22 14:08:44 -07:00
Lenni Kuff
0e844e7187 Updated make_release script to add flag for controlling whether or not to do PGO build 2012-06-12 17:52:27 -07:00
Michael Ubell
3608b3fb06 RC File rewrite 2012-05-22 20:37:47 -07:00
Lenni Kuff
35951643f5 Fixed benchmark generation scripts and make_release scripts to properly
generate and execute the benchmark queries.

Updated to remove Lzo compression and add coverage of 'DefaultCodec'.

Fixed up make_release to more cleanly list queries.
2012-05-17 17:40:00 -07:00
Michael Ubell
62d29ff1c6 Sequence File Scanner 2012-05-01 17:48:24 -07:00
Nong Li
17f7b16da8 Allow runquery to run multiple queries from the command line. 2012-03-01 10:47:34 -08:00
Nong Li
88237350f0 Change the build to allow debug and release builds to coexist. 2012-02-17 18:14:04 -08:00
Nong Li
bf74bc25e3 Some cleanup:
- Fixed issue with SSE file parse.
- Moved build scripts to impala/bin.  Rebuilding from just BE does not work.
- Cleaned up a few compiler warnings.
- Added option to disable automatic counters for profilers.
2011-12-31 06:17:28 -08:00