Commit Graph

43 Commits

Author SHA1 Message Date
Skye Wanderman-Milne
9147cd7518 IMPALA-525: Adjust IO buffer size based on read length and other memory fixes
We were previously wasting memory by always reading into 8MB IO
buffers, even when the data read was much less than 8MB. With this
patch, the IO manager picks a buffer size closer to the actual amount
being read (we don't use the exact size so we can continue to recycle
buffers). The minimum IO buffer size is determined via the
--min_buffer_size flag, and the max IO buffer size via the --read_size
flag.

This technique also helps with IMPALA-652, since short columns will
not use as much memory as before (we will not use considerably more
memory than the size of the table).

This patch also changes StringBuffer to use a doubling strategy so it
doesn't end up allocating many large unused buffers, and has the
scanner context use the requested length as the sync read size if it's
larger than the size produced by read_past_size_cb(). These changes
help prevent the boundary buffer in the scanner context from
allocating excess memory.

Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/938
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:01 -08:00
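A minimal sketch of the buffer-sizing idea in the commit above (9147cd7518). The commit says only that the IO manager picks a size close to, but not exactly, the read length so that buffers can still be recycled. Rounding up to a power-of-two multiple of the minimum is an assumption made here for illustration, as are the function name and the default values standing in for --min_buffer_size and --read_size.

```python
def pick_buffer_size(read_len: int,
                     min_buffer_size: int = 1024,
                     max_buffer_size: int = 8 * 1024 * 1024) -> int:
    """Return a buffer size for a read of read_len bytes.

    Hypothetical helper: sizes are bucketed (min, 2*min, 4*min, ...) so that
    freed buffers can be reused by later reads of similar length. The
    bucketing scheme is an assumption, not taken from the commit.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    size = min_buffer_size
    # Double until the bucket covers the read or the maximum is reached.
    while size < read_len and size < max_buffer_size:
        size *= 2
    return min(size, max_buffer_size)
```

With these illustrative defaults, a 5000-byte read gets an 8 KB buffer rather than an 8 MB one, and reads larger than the maximum are capped at the maximum.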
Lenni Kuff
92829b8400 IMPALA-587: Support implicit hbase column mapping keys
The Hive HBase spec specifies that the key column mapping can either be
defined explicitly (using the :key syntax) or left out completely in
which case a mapping to the first table column is implied. This change
updates Impala to support implicit key mappings and also adds some
checks in our ALTER TABLE DDL to ensure we cannot get into this state by
dropping a column from an HBase table (a restriction similar to the one Hive
puts in place).

Change-Id: I920d642261659ee3e881da2553ffe83300923af8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/554
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:14 -08:00
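The implicit key rule described in the commit above (92829b8400) can be sketched as follows. This is an illustration only, not Impala's frontend code: the function name is hypothetical, and the mapping string is assumed to be a comma-separated list in the style of Hive's hbase.columns.mapping property.

```python
def resolve_hbase_mapping(table_columns, columns_mapping):
    """Pair each table column with its HBase mapping entry.

    Hypothetical helper. If the mapping has no explicit ':key' entry,
    the row key is implied to map to the first table column.
    """
    entries = [e.strip() for e in columns_mapping.split(",") if e.strip()]
    if ":key" not in entries:
        # Implicit key: the first table column is the row key.
        entries.insert(0, ":key")
    if len(entries) != len(table_columns):
        raise ValueError("column count does not match mapping count")
    return list(zip(table_columns, entries))
```

For example, resolve_hbase_mapping(["id", "name"], "d:name") pairs "id" with ":key", while an explicit mapping such as "d:name,:key" is left in the order given.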
Lenni Kuff
f5c9e4d075 Fix comment location in schema generation script
Our data generation doesn't appear to fully support comments in the
base table name section. This fixes the data generation, but we should
follow on by improving our comment support in the framework.

Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229
Reviewed-on: http://gerrit.ent.cloudera.com:8080/465
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:47 -08:00
Lenni Kuff
8e2a313673 IMPALA-590: Impala should more gracefully fail when loading HBase tables with complex types
Change-Id: Ifc3338ee1339ff0544ed14066824f1aa2d9d7c25
Reviewed-on: http://gerrit.ent.cloudera.com:8080/457
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:47 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alex Behm
9a201645cd IMPALA-496: Fix escaping of field delimiter and escape character in inserts
Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/163
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:09 -08:00
Alex Behm
f0e2d539fc IMPALA-495: Views Sometimes Not Utilizing Partition Pruning.
Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:04 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Alan Choi
254ee6ef89 IMPALA-434 Support binary hbase encoding 2014-01-08 10:51:18 -08:00
Alex Behm
9ff09cd3f4 IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL 2014-01-08 10:50:28 -08:00
Alex Behm
558590140c IMPALA-238: Problems inserting into tables with TIMESTAMP partition columns ... 2014-01-08 10:50:05 -08:00
Henry Robinson
018028f8e3 IMPALA-269: Throw exception if serde unrecognised 2014-01-08 10:50:01 -08:00
Lenni Kuff
c74b7e41dd Enable insert tests to run against parquet 2014-01-08 10:49:47 -08:00
Lenni Kuff
cba9cd00dd Fix full data load build break due to constructing incorrect HDFS paths 2014-01-08 10:49:34 -08:00
Lenni Kuff
558d5ce755 Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists 2014-01-08 10:49:28 -08:00
Elliott Clark
0e0c02b6bd Add the ability to Select into HBase table.
* Changed frontend analysis for HBase tables
* Changed Thrift messages to allow HBase as a sink type.
* JNI Wrapper around htable
* Create hbase-table-sink
* Create hbase-table-writer
* Static init lots of JNI related code for HBase.
* Cleaned up some cpplint issues.
* Changed junit analysis tests
* Create a new HBase test table.
* Added functional tests for HBase inserts.
2014-01-08 10:49:06 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Elliott Clark
ade4453a17 Allow Impala HBase serdeproperties to have newlines 2014-01-08 10:48:54 -08:00
Alex Behm
be03e6c21c IMPALA-138: Error messages for unknown column types are particularly bad. 2014-01-08 10:48:53 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
Lenni Kuff
d57440e87d Allow column comments for CREATE TABLE and DESCRIBE <table> statements 2014-01-08 10:48:37 -08:00
ishaan
f1d37646ed Fix creation of the insert_string_partitioned_table 2014-01-08 10:48:20 -08:00
Henry Robinson
222d15c6ca IMPALA-72: String partition keys should be URL encoded 2014-01-08 10:48:20 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of unneeded sections from schema template definitions (ex. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
Nong Li
adf36b81f9 Fix data errors test. 2014-01-08 10:46:45 -08:00
Nong Li
34879a4ddc Fix IMP-297 2014-01-08 10:46:44 -08:00
Michael Ubell
7536510b69 IMP-258 Test writing nulls. 2014-01-08 10:46:31 -08:00
Michael Ubell
37aaf06f79 IMP-390 Get rid of test dependencies on InProcessQE and Runquery 2014-01-08 10:46:18 -08:00
Michael Ubell
0c4f025a5e Fix loading of nulltable data, remove loading functional-planner data 2014-01-08 10:45:58 -08:00
Michael Ubell
bf57ae27a5 IMP-291 Read sequence file to next sync mark when; ragged columns 2014-01-08 10:45:57 -08:00
Michael Ubell
5f951ffc4a Handle missing columns at the end of a row 2014-01-08 10:45:11 -08:00
Henry Robinson
e7348a209b IMP-232: Parallel INSERT OVERWRITE 2014-01-08 10:45:04 -08:00
Nong Li
7c411da86c Fixed schema template. 2014-01-08 10:44:41 -08:00
Nong Li
5b2621a401 Fix null table creation to workaround hive issue. 2014-01-08 10:44:41 -08:00
Nong Li
4c9c82910a Text parser fix for columns off end. 2014-01-08 10:44:40 -08:00
Nong Li
4d0319d32b Fix null string parsing. 2014-01-08 10:44:40 -08:00
Lenni Kuff
6e07e0b8d8 Added support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements during data loading
Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading
workflow. This allows for capturing simple table stats such as number of rows, number of
partitions, and table size in bytes. These are stored into a new MySQL database with the same
name as the metastore except with a '_Stats' suffix. If using Derby, the results are
stored in a new Derby database.
2014-01-08 10:44:34 -08:00
Alan Choi
cbadb4eac4 When a scan range begins at the starting point of a tuple, we miss that tuple. This patch fixes
this problem.

review: 162
2014-01-08 10:44:24 -08:00
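A minimal sketch of the boundary case named in the commit above (cbadb4eac4). The commit does not describe the fix itself, so the approach here is an assumption: before skipping ahead to the next delimiter, check whether the byte just before the range is already a delimiter, in which case the range starts exactly on a tuple and nothing should be skipped. The function name and the in-memory bytes input are hypothetical.

```python
def tuples_in_range(data: bytes, start: int, length: int,
                    delim: bytes = b"\n"):
    """Return the tuples that begin inside [start, start + length).

    Hypothetical helper over an in-memory buffer; a real scanner reads
    from disk and handles multi-byte delimiters, escapes, and so on.
    """
    end = start + length
    pos = start
    # Skip a partial tuple only if the range starts mid-tuple. If the
    # previous byte is a delimiter, a tuple starts exactly at 'start'.
    if start > 0 and data[start - 1:start] != delim:
        nxt = data.find(delim, start)
        if nxt == -1:
            return []
        pos = nxt + 1
    out = []
    while pos < end and pos < len(data):
        nxt = data.find(delim, pos)
        if nxt == -1:
            nxt = len(data)
        out.append(data[pos:nxt])
        pos = nxt + 1
    return out
```

With this check, adjacent ranges that split a file on tuple boundaries each return their own tuples, and no tuple is dropped or read twice.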
Michael Ubell
02d63d8dc3 Trevni file support 2014-01-08 10:44:19 -08:00
Lenni Kuff
84d91fca4f Fix sequence file data loading for the alltypesmixedformat table
Moved this out of the data loading framework because it is kind of a special
case. I will consider how we can update the framework to address mixed format
tables.
2014-01-08 10:44:18 -08:00
Lenni Kuff
bf27a31f98 Move functional data loading to new framework + initial changes for workload directory structure
This change moves (almost) all the functional data loading to the new data
loading framework. This removes the need for the create.sql, load.sql, and
load-raw-data.sql files. Instead we just have the single schema template file:
testdata/datasets/functional/functional_schema_template.sql

This template can be used to generate the schema for all file formats and
compression variations. It also should help make loading data easier. Now you
can run:

bin/load-impala-data.sh "query-test" "exhaustive"

And get all data needed for running the query tests.

This change also includes the initial changes for new dataset/workload directory
structure. The new structure looks like:

testdata/workload  <- Will contain query files and test vectors/dimensions

testdata/datasets <- Will contain the data files and schema templates

Note: This is the first part of the change to this directory structure - it's
not yet complete.
2014-01-08 10:44:18 -08:00