impala

mirror of https://github.com/apache/impala.git synced 2026-01-05 21:00:54 -05:00

Author	SHA1	Message	Date
Lenni Kuff	92829b8400	IMPALA-587: Support implicit hbase column mapping keys The Hive HBase spec specifies that the key column mapping can either be defined explicitly (using the :key syntax) or left out completely, in which case a mapping to the first table column is implied. This change updates Impala to support implicit key mappings and also adds some checks in our ALTER TABLE DDL to ensure we cannot get into this state by dropping a column from an HBase table (a similar restriction that Hive puts in place) Change-Id: I920d642261659ee3e881da2553ffe83300923af8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/554 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:14 -08:00
Lenni Kuff	f5c9e4d075	Fix comment location in schema generation script Our data generation doesn't appear to fully support comments in the base table name section. This fixes the data generation, but we should follow on by improving our comment support in the framework. Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229 Reviewed-on: http://gerrit.ent.cloudera.com:8080/465 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:47 -08:00
Lenni Kuff	8e2a313673	IMPALA-590: Impala should more gracefully fail when loading HBase tables with complex types Change-Id: Ifc3338ee1339ff0544ed14066824f1aa2d9d7c25 Reviewed-on: http://gerrit.ent.cloudera.com:8080/457 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:47 -08:00
ishaan	d3a94aa4fe	Don't load tpcds.store_sales_unpartitioned into any file format except text. Change-Id: I398d26ca8e36a45cb3d0a076cdd604ff6eba793d Reviewed-on: http://gerrit.ent.cloudera.com:8080/444 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:43 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Lenni Kuff	17ed6ea177	Partition TPC-DS dataset and add additional TPC-DS workload queries Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300 Reviewed-on: http://gerrit.ent.cloudera.com:8080/99 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:13 -08:00
Alex Behm	9a201645cd	IMPALA-496: Fix escaping of field delimiter and escape character in inserts Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/163 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:09 -08:00
Alex Behm	f0e2d539fc	IMPALA-495: Views Sometimes Not Utilizing Partition Pruning. Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
Alan Choi	254ee6ef89	IMPALA-434 Support binary hbase encoding	2014-01-08 10:51:18 -08:00
Alex Behm	9ff09cd3f4	IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL	2014-01-08 10:50:28 -08:00
Alex Behm	558590140c	IMPALA-238: Problems inserting into tables with TIMESTAMP partition columns ...	2014-01-08 10:50:05 -08:00
Henry Robinson	018028f8e3	IMPALA-269: Throw exception if serde unrecognised	2014-01-08 10:50:01 -08:00
Lenni Kuff	c74b7e41dd	Enable insert tests to run against parquet	2014-01-08 10:49:47 -08:00
Nong Li	563cbfa3a8	Enable parquet testing	2014-01-08 10:49:40 -08:00
Lenni Kuff	cba9cd00dd	Fix full data load build break due to constructing incorrect HDFS paths	2014-01-08 10:49:34 -08:00
Lenni Kuff	558d5ce755	Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists	2014-01-08 10:49:28 -08:00
Elliott Clark	0e0c02b6bd	Add the ability to Select into HBase table. * Changed frontend analysis for HBase tables * Changed Thrift messages to allow HBase as a sink type. * JNI Wrapper around htable * Create hbase-table-sink * Create hbase-table-writer * Static init lots of JNI related code for HBase. * Cleaned up some cpplint issues. * Changed junit analysis tests * Create a new HBase test table. * Added functional tests for HBase inserts.	2014-01-08 10:49:06 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Elliott Clark	ade4453a17	Allow impala hbase serdeproperties to have newlines	2014-01-08 10:48:54 -08:00
Alex Behm	be03e6c21c	IMPALA-138: Error messages for unknown column types are particularly bad.	2014-01-08 10:48:53 -08:00
Skye Wanderman-Milne	461a48df2b	Refactor testing framework to generate Avro tables.	2014-01-08 10:48:45 -08:00
Nong Li	6e293090e6	Parquet writer. Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec	2014-01-08 10:48:44 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
Lenni Kuff	d57440e87d	Allow column comments for CREATE TABLE and DESCRIBE <table> statements	2014-01-08 10:48:37 -08:00
ishaan	f1d37646ed	Fix creation of the insert_string_partitioned_table	2014-01-08 10:48:20 -08:00
Henry Robinson	222d15c6ca	IMPALA-72: String partition keys should be URL encoded	2014-01-08 10:48:20 -08:00
ishaan	09d6d931f4	Change the way data is loaded	2014-01-08 10:48:09 -08:00
Nong Li	a0229cd12e	Update tpch schema to use bigint for keys.	2014-01-08 10:47:54 -08:00
Lenni Kuff	e20720f5d3	Disable loading Trevni for load/select statements that are not union compatible	2014-01-08 10:46:58 -08:00
Lenni Kuff	bed633c1ae	Extract config/metastore creation from buildall + script for loading warehouse snapshot	2014-01-08 10:46:53 -08:00
Lenni Kuff	1b248d067b	Add TPC-DS dataset and workload	2014-01-08 10:46:52 -08:00
Lenni Kuff	1e25c98fb4	Test data loading framework improvements This change includes a number of improvements for the test data loading framework: * Named sections for schema template definitions * Removal of unneeded sections from schema template definitions (ex. ANALYZE TABLE) * More granular data loading via table name filters * Improved robustness in detecting failed data loads * Table level constraints for specific file formats * Re-written compute stats script	2014-01-08 10:46:49 -08:00
Nong Li	adf36b81f9	Fix data errors test.	2014-01-08 10:46:45 -08:00
Nong Li	34879a4ddc	Fix IMP-297	2014-01-08 10:46:44 -08:00
Michael Ubell	7536510b69	IMP-258 Test writing nulls.	2014-01-08 10:46:31 -08:00
Michael Ubell	37aaf06f79	IMP-390 Get rid of test dependencies on InProcessQE and Runquery	2014-01-08 10:46:18 -08:00
Michael Ubell	0c4f025a5e	Fix loading of nulltable data, remove loading functional-planner data	2014-01-08 10:45:58 -08:00
Michael Ubell	bf57ae27a5	IMP-291 Read sequence file to next sync mark when; ragged columns	2014-01-08 10:45:57 -08:00
Michael Ubell	5f951ffc4a	Handle missing columns at the end of a row	2014-01-08 10:45:11 -08:00
Henry Robinson	e7348a209b	IMP-232: Parallel INSERT OVERWRITE	2014-01-08 10:45:04 -08:00
Nong Li	7c411da86c	Fixed schema template.	2014-01-08 10:44:41 -08:00
Nong Li	5b2621a401	Fix null table creation to workaround hive issue.	2014-01-08 10:44:41 -08:00
Nong Li	4c9c82910a	Text parser fix for columns off end.	2014-01-08 10:44:40 -08:00
Nong Li	4d0319d32b	Fix null string parsing.	2014-01-08 10:44:40 -08:00
Lenni Kuff	6e07e0b8d8	Added support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements during data loading Add support for generating ANALYZE TABLE ... COMPUTE STATISTICS statements to the data loading workflow. This allows for capturing simple table stats such as number of rows, number of partitions, and table size in bytes. These are stored into a new mysql database with the same name as the metastore except with a '_Stats' suffix. If using Derby, the results are stored in a new derby database.	2014-01-08 10:44:34 -08:00
Alan Choi	cbadb4eac4	When a scan range begins at the starting point of a tuple, we miss that tuple. This patch fixes this problem. review: 162	2014-01-08 10:44:24 -08:00
Lenni Kuff	04edc8f534	Update benchmark tests to run against generic workload, data loading with scale factor, +more This change updates the run-benchmark script to enable it to target one or more workloads. Now benchmarks can be run like: ./run-benchmark --workloads=hive-benchmark,tpch We look up the workload in the workloads directory, then read the associated query .test files and start executing them. To ensure the queries are not duplicated between benchmark and query tests, I moved all existing queries (under fe/src/test/resources/*) to the workloads directory. You do NOT need to look through all the .test files, I've just moved them. The one new file is the 'hive-benchmark.test' which contains the hive benchmark queries. Also added support for generating schema for different scale factors as well as executing against these scale factors. For example, let's say we have a dataset with a scale factor called "SF1". We would first generate the schema using: ./generate_schema_statements --workload=<workload> --scale_factor="SF3" This will create tables with names distinct from those of the other scale factors. Run the generated .sql file to load the data. Alternatively, the data can be loaded by running a new python script: ./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor] For example: load-data.sh -w tpch -e core -s SF3 Then run against this: ./run-benchmark --workloads=<workload> --scale_factor=SF3 This changeset also includes a few other minor tweaks to some of the test scripts. Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6	2014-01-08 10:44:22 -08:00
Michael Ubell	02d63d8dc3	Trevni file support	2014-01-08 10:44:19 -08:00
Lenni Kuff	84d91fca4f	Fix sequence file data loading for the alltypesmixedformat table Moved this out of the data loading framework because it is kind of a special case. I will consider how we can update the framework to address mixed format tables.	2014-01-08 10:44:18 -08:00

51 Commits