The Hive HBase spec states that the key column mapping can either be
defined explicitly (using the :key syntax) or left out completely, in
which case a mapping to the first table column is implied. This change
updates Impala to support implicit key mappings and also adds checks to
our ALTER TABLE DDL to ensure we cannot get into an invalid state by
dropping a column from an HBase table (a restriction similar to the one
Hive puts in place).
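For illustration, here is roughly what the two forms look like in Hive-style
DDL (the table and column names below are made up for the example):

  -- Explicit key mapping using the :key syntax
  CREATE EXTERNAL TABLE hbase_example (id INT, val STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val");

  -- Implicit key mapping: :key is omitted, so the first table
  -- column (id) is implied to map to the HBase row key
  CREATE EXTERNAL TABLE hbase_example_implicit (id INT, val STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:val");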
Change-Id: I920d642261659ee3e881da2553ffe83300923af8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/554
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
Our data generation doesn't appear to fully support comments in the
base table name section. This change fixes the data generation, but we
should follow up by improving comment support in the framework.
Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229
Reviewed-on: http://gerrit.ent.cloudera.com:8080/465
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
* Changed frontend analysis for HBase tables.
* Changed Thrift messages to allow HBase as a sink type.
* Added a JNI wrapper around HTable.
* Created hbase-table-sink.
* Created hbase-table-writer.
* Static initialization of much of the JNI-related code for HBase.
* Cleaned up some cpplint issues.
* Changed JUnit analysis tests.
* Created a new HBase test table.
* Added functional tests for HBase inserts (example below).
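To give a concrete sense of what the new sink and writer enable, the
functional tests exercise inserts along these lines (the table and column
names here are illustrative, not the actual test tables):

  -- Write rows from an HDFS-backed table into an HBase-backed table
  INSERT INTO TABLE hbase_insert_test
  SELECT id, bool_col, string_col
  FROM src_table;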
This change includes a number of improvements to the test data loading framework:
* Named sections for schema template definitions (see the sketch after this list)
* Removal of unneeded sections from schema template definitions (e.g. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table-level constraints for specific file formats
* Rewritten compute stats script
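As a rough sketch of what a named-section template entry looks like (the
section markers, names, and placeholders here are illustrative rather than
the exact framework syntax):

  ====
  ---- DATASET
  functional
  ---- BASE_TABLE_NAME
  example_table
  ---- CREATE
  CREATE TABLE {table_name} (id INT, string_col STRING)
  STORED AS {file_format};
  ---- LOAD
  -- per file format load/insert statements go here
  ====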
Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading
workflow. This allows for capturing simple table stats such as the number of rows, number of
partitions, and table size in bytes. The results are stored in a new MySQL database with the same
name as the metastore database except with a '_Stats' suffix. If using Derby, the results are
stored in a new Derby database instead.
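For reference, the generated statements look along these lines (the table
name below is just an example):

  -- Basic table-level stats: row count, size, number of partitions
  ANALYZE TABLE functional.alltypes COMPUTE STATISTICS;
  -- For partitioned tables, stats can also be gathered per partition
  ANALYZE TABLE functional.alltypes PARTITION (year, month) COMPUTE STATISTICS;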
Moved this out of the data loading framework because it is something of a special
case. I will consider how we can update the framework to address mixed-format
tables.
This change moves (almost) all the functional data loading to the new data
loading framework. This removes the need for the create.sql, load.sql, and
load-raw-data.sql files. Instead, we just have a single schema template file:
testdata/datasets/functional/functional_schema_template.sql
This template can be used to generate the schema for all file formats and
compression variations. It should also help make loading data easier. Now you
can run:
bin/load-impala-data.sh "query-test" "exhaustive"
And get all data needed for running the query tests.
This change also includes the initial changes for the new dataset/workload directory
structure. The new structure looks like:
testdata/workload <- Will contain query files and test vectors/dimensions
testdata/datasets <- Will contain the data files and schema templates
Note: This is the first part of the change to this directory structure - it's
not yet complete.