impala

mirror of https://github.com/apache/impala.git synced 2026-01-06 06:01:03 -05:00

Author	SHA1	Message	Date
Henry Robinson	2f339f2ed8	Add ASL license to all public files	2014-01-08 10:46:32 -08:00
ishaan	ccb020c4a0	Adding copyrights to remaining files.	2014-01-08 10:46:30 -08:00
Nong Li	bc08241ffb	IR cross compile fixes for inlined string-value functions.	2014-01-08 10:46:19 -08:00
ishaan	e84cc0a9eb	Enable code coverage on release builds.	2014-01-08 10:44:41 -08:00
Michael Ubell	02d63d8dc3	Trevni file support	2014-01-08 10:44:19 -08:00
Lenni Kuff	0da77037e3	Updated Impala performance schema and test vector generation. This change updates the Impala performance schema and test vector generation techniques. It also migrates the existing benchmark scripts from Ruby to Python. The change has a few parts: 1) Conversion of test vector generation and benchmark statement generation from Ruby to Python. As a result, the benchmark test vector and dimension files are now written in CSV format (Python doesn't have built-in YAML support). 2) Standardized the naming for benchmark tables to (somewhat) match Query tests. In general the form is: * If file_format=text and compression=none, do not use a table suffix * Abbreviate sequence file as (seq), RC file as (rc), etc. * If using BLOCK compression, don't append anything to the table name; if using 'record', append 'record' 3) Created a new way to add new schemas: the benchmark_schema_template.sql file. The generate_benchmark_statements.py script reads this file in and breaks up the sections. The section format is: ==== Data Set Name --- BASE table name --- CREATE STATEMENT Template --- INSERT ... SELECT * format --- LOAD Base statement --- LOAD STATEMENT Format Where BASE Table is a table the other file formats/compression types can be generated from. This would generally be a local file. The thinking is that if the files already exist in HDFS, then we can just load the file directly rather than issue an INSERT ... SELECT * statement. The generate_benchmark_statements.py script has been updated to use this new template as well as query HDFS for each table to determine how it should be created. It then outputs an ideal file called load-benchmark-*-generated.sql. Since this file is generated dynamically, we can remove the old benchmark statement files. 4) This has been hooked into load-benchmark-data.sh, and run_query has been updated to use the new format as well.	2012-07-12 23:12:20 -07:00
Alan Choi	ad073ef1b2	IMP-78 We want to expose issues in a distributed environment locally. We already have 3 data nodes running locally in the MiniDFS. However, the planner does not distinguish data nodes on the same host, even though they're running on different ports. So, we're effectively only running a single node all the time. First, we make the change in FE to identify data location as "host/port" instead of just "host". Then, in TQueryExecRequest, we list the host/port that serves the data, instead of just using "host". The result is that PlannerTest and QueryTest expose distributed planning issues. Plans are still correct when the number of nodes is 1 or 2. So, to make all the tests pass, I've forced Planner/Query tests to execute with at most 2 nodes. To see the faulty plan, we simply have to change the number of nodes back to 0 (all nodes). We've discussed randomizing the SimpleScheduler, but I chose not to do it because we don't need randomization to expose the distributed planning issue. I also discovered that the exchange node (BE) does not respect the "limit". I fixed it. One of the limit tests (QueryTest) is completely unstable and doesn't really test much, so I removed it.	2012-06-22 14:08:44 -07:00
Lenni Kuff	0e844e7187	Updated make_release script to add flag for controlling whether or not to do PGO build	2012-06-12 17:52:27 -07:00
Michael Ubell	3608b3fb06	RC File rewrite	2012-05-22 20:37:47 -07:00
Lenni Kuff	35951643f5	Fixed benchmark generation scripts and make_release scripts to properly generate and execute the benchmark queries. Updated to remove Lzo compression and add coverage of 'DefaultCodec'. Fixed up make_release to more cleanly list queries.	2012-05-17 17:40:00 -07:00
Michael Ubell	62d29ff1c6	Sequence File Scanner	2012-05-01 17:48:24 -07:00
Nong Li	17f7b16da8	Allow runquery to run multiple queries from the command line.	2012-03-01 10:47:34 -08:00
Nong Li	88237350f0	Change the build to allow debug and release builds to coexist.	2012-02-17 18:14:04 -08:00
Nong Li	bf74bc25e3	Some cleanup: - Fixed issue with SSE file parse. - Moved build scripts to impala/bin. Rebuilding from just BE does not work. - Cleaned up a few compiler warnings. - Added option to disable automatic counters for profilers.	2011-12-31 06:17:28 -08:00

14 Commits