I tried to investigate the Jenkins issue where we weren't returning any rows.
I set up the cluster on that box manually and noticed there weren't any results
because the store_sales table was empty. Refresh did not fix it. This looks like
a data loading issue. Adding this test would make discovering issues like this
much easier.
Change-Id: I8ccddd43892b279d506371b9de717629815c6a08
Reviewed-on: http://gerrit.ent.cloudera.com:8080/260
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Split out the encoder/type for the Parquet reader/writer. I think this puts us
in a better place to support future encodings.
On the TPC-H lineitem table, the results are:
Before:
BytesWritten: 236.45 MB
Per Column Sizes:
l_comment: 75.71 MB
l_commitdate: 8.64 MB
l_discount: 11.19 MB
l_extendedprice: 33.02 MB
l_linenumber: 4.56 MB
l_linestatus: 869.98 KB
l_orderkey: 8.99 MB
l_partkey: 27.02 MB
l_quantity: 11.58 MB
l_receiptdate: 8.65 MB
l_returnflag: 1.40 MB
l_shipdate: 8.65 MB
l_shipinstruct: 1.45 MB
l_shipmode: 2.17 MB
l_suppkey: 21.91 MB
l_tax: 10.68 MB
After:
BytesWritten: 198.63 MB (84%)
Per Column Sizes:
l_comment: 75.71 MB (100%)
l_commitdate: 8.64 MB (100%)
l_discount: 2.89 MB (25.8%)
l_extendedprice: 33.13 MB (100.33%)
l_linenumber: 1.50 MB (32.89%)
l_linestatus: 870.26 KB (100.032%)
l_orderkey: 9.18 MB (102.11%)
l_partkey: 27.10 MB (100.29%)
l_quantity: 4.32 MB (37.31%)
l_receiptdate: 8.65 MB (100%)
l_returnflag: 1.40 MB (100%)
l_shipdate: 8.65 MB (100%)
l_shipinstruct: 1.45 MB (100%)
l_shipmode: 2.17 MB (100%)
l_suppkey: 10.11 MB (46.14%)
l_tax: 2.89 MB (27.06%)
The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger. If the file filled the full 1 GB, I'd expect the overhead to decrease even
more.
The restructuring to use a virtual call doesn't seem to change performance much, and
the virtual call will go away when we codegen the scanner.
Here's what the query times look like with this patch (note this is on the before
data files, so only string columns are dictionary encoded).
Before query times:
Insert Time: 8.5 sec
select *: 2.3 sec
select avg(l_orderkey): 0.33 sec
After query times:
Insert Time: 9.5 sec <-- Longer due to doing dictionary encoding
select *: 2.4 sec <-- kind of noisy, possibly a slight slowdown
select avg(l_orderkey): 0.33 sec
Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
We've always supported the Hive metadata having fewer columns, but this change
will put NULLs in for the missing columns, like the other scanners do.
Change-Id: I92de1decd30357476bbeb27f4248239fe8d0c668
Reviewed-on: http://gerrit.ent.cloudera.com:8080/233
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
This change adds Impala support for LOAD DATA statements. This allows the user
to load one or more files into a table or partition from a given HDFS location. The
load operation only moves files; it does not convert data to match the target
table/partition's file format.
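A minimal sketch of the statement shape (the path, table, and partition names
here are made up):

  LOAD DATA INPATH '/user/impala/staging/sales_2013'
  INTO TABLE sales PARTITION (year=2013);

The files are moved out of the source directory into the table/partition's
directory as-is.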
Hue is moving to HiveServer2, but HiveServer2 does not have an "explain" RPC
call. To support "explain", I added it to the language.
An "explain" statement will return a result set: one row per explain line.
This patch adds support for the following statements (sketched below):
- ALTER TABLE ADD/REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD/DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
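Rough sketches of the new statements (the table, column, and partition names
here are made up):

  ALTER TABLE t ADD COLUMNS (c INT);
  ALTER TABLE t REPLACE COLUMNS (c INT, d STRING);
  ALTER TABLE t DROP COLUMN c;
  ALTER TABLE t ADD PARTITION (year=2013);
  ALTER TABLE t DROP PARTITION (year=2013);
  ALTER TABLE t SET FILEFORMAT SEQUENCEFILE;
  ALTER TABLE t SET LOCATION '/user/impala/t';
  ALTER TABLE t RENAME TO t2;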
* Changed frontend analysis for HBase tables.
* Changed Thrift messages to allow HBase as a sink type.
* Added a JNI wrapper around HTable.
* Created the hbase-table-sink.
* Created the hbase-table-writer.
* Statically initialized lots of the JNI-related code for HBase.
* Cleaned up some cpplint issues.
* Changed JUnit analysis tests.
* Created a new HBase test table.
* Added functional tests for HBase inserts (sketch below).
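A rough sketch of the kind of insert the functional tests exercise (the table
names here are made up):

  INSERT INTO hbase_insert_test
  SELECT id, name FROM source_table;

Rows written this way flow through the new hbase-table-sink and
hbase-table-writer via the JNI HTable wrapper.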
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored as text, sequence, and RCFile formats.
It currently only supports creating unpartitioned tables and tables stored in HDFS.
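A minimal sketch of the supported statements (the database, table, and column
names here are made up):

  CREATE DATABASE sandbox;
  CREATE TABLE sandbox.events (id INT, msg STRING) STORED AS RCFILE;
  DROP TABLE sandbox.events;
  DROP DATABASE sandbox;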
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
by the merging ExchangeNode
- all limits with ORDER BY are now distributed: enforced both by the child plan
fragment and by the merging TopN node (see the example below)
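As an example of the distributed enforcement (using the lineitem table from
above):

  SELECT * FROM lineitem ORDER BY l_extendedprice DESC LIMIT 10;

Each child fragment's TopN node returns at most 10 rows, and the merging TopN
node applies the limit again over the merged streams, so no fragment ships more
rows than needed.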