225 Commits

Author SHA1 Message Date
Nong Li
707a566b5d Add test to tpcds queries to validate table row counts.
I investigated the Jenkins issue where we weren't returning any rows.
I set up the cluster on that box manually and noticed there weren't any results
because the store_sales table was empty. A refresh did not fix it. This looks
like a data loading issue. Adding this test would make discovering issues like
this much easier.

Change-Id: I8ccddd43892b279d506371b9de717629815c6a08
Reviewed-on: http://gerrit.ent.cloudera.com:8080/260
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:17 -08:00
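
A row-count check of the kind commit 707a566b5d describes could look roughly like the sketch below. It is not the test added by that commit: `run_query` is a hypothetical callable that executes a SQL string and returns a single integer, and `expected_counts` is a hypothetical mapping supplied by the caller.

```python
def find_row_count_mismatches(run_query, expected_counts):
    """Compare count(*) for each table against its expected row count.

    run_query is a hypothetical callable (SQL string -> int), and
    expected_counts is a hypothetical {table_name: row_count} mapping.
    Returns {table_name: (expected, actual)} for every table that differs.
    """
    mismatches = {}
    for table, expected in expected_counts.items():
        actual = run_query("select count(*) from {0}".format(table))
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches
```

An empty table, such as the store_sales case above, would show up as an entry whose actual count is 0.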
Lenni Kuff
a3016cc4d4 Add partitioned tpcds insert workload and tests
Change-Id: Iff45853153bf0830be3e423c994392998385a64f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/256
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:16 -08:00
Nong Li
a3bc1ce133 Some parquet encoder/decoder refactoring. Added dictionary to other types.
Split out the encoder/type for parquet reader/writer. I think this puts us
in a better place to support future encodings.

On the tpch lineitem table, the results are:
Before:
  BytesWritten: 236.45 MB
  Per Column Sizes:
    l_comment: 75.71 MB
    l_commitdate: 8.64 MB
    l_discount: 11.19 MB
    l_extendedprice: 33.02 MB
    l_linenumber: 4.56 MB
    l_linestatus: 869.98 KB
    l_orderkey: 8.99 MB
    l_partkey: 27.02 MB
    l_quantity: 11.58 MB
    l_receiptdate: 8.65 MB
    l_returnflag: 1.40 MB
    l_shipdate: 8.65 MB
    l_shipinstruct: 1.45 MB
    l_shipmode: 2.17 MB
    l_suppkey: 21.91 MB
    l_tax: 10.68 MB
After:
  BytesWritten: 198.63 MB           (84%)
  Per Column Sizes:
    l_comment: 75.71 MB             (100%)
    l_commitdate: 8.64 MB           (100%)
    l_discount: 2.89 MB             (25.8%)
    l_extendedprice: 33.13 MB       (100.33%)
    l_linenumber: 1.50 MB           (32.89%)
    l_linestatus: 870.26 KB         (100.032%)
    l_orderkey: 9.18 MB             (102.11%)
    l_partkey: 27.10 MB             (100.29%)
    l_quantity: 4.32 MB             (37.31%)
    l_receiptdate: 8.65 MB          (100%)
    l_returnflag: 1.40 MB           (100%)
    l_shipdate: 8.65 MB             (100%)
    l_shipinstruct: 1.45 MB         (100%)
    l_shipmode: 2.17 MB             (100%)
    l_suppkey: 10.11 MB             (46.14%)
    l_tax: 2.89 MB                  (27.06%)

The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger. If the file filled the 1 GB, I'd expect the overhead to decrease even
more.
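
The percentages quoted above are after/before size ratios. A short sketch that recomputes them for the columns that shrank noticeably, using only the sizes listed in this message (the helper name `size_ratio` is illustrative):

```python
# Sizes in MB, copied from the Before/After listings in this commit message.
before = {"l_discount": 11.19, "l_linenumber": 4.56, "l_quantity": 11.58,
          "l_suppkey": 21.91, "l_tax": 10.68}
after = {"l_discount": 2.89, "l_linenumber": 1.50, "l_quantity": 4.32,
         "l_suppkey": 10.11, "l_tax": 2.89}


def size_ratio(before_mb, after_mb):
    """Return the after size as a percentage of the before size."""
    return round(100.0 * after_mb / before_mb, 2)


# Per-column ratios, plus the ratio for the whole table (BytesWritten).
ratios = {col: size_ratio(before[col], after[col]) for col in before}
total_ratio = size_ratio(236.45, 198.63)
```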

The restructuring to use a virtual call doesn't seem to change things much and
will go away when we codegen the scanner.

Here's what query times look like with this patch (note this is on the before
data files, so only string cols are dictionary encoded).

Before query times:
  Insert Time: 8.5 sec
  select *: 2.3 sec
  select avg(l_orderkey): .33 sec

After query times:
  Insert Time: 9.5 sec                  <-- Longer due to doing dictionary encoding
  select *: 2.4 sec                     <-- kind of noisy, possibly a slight slowdown
  select avg(l_orderkey): .33 sec

Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:16 -08:00
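
For readers unfamiliar with the dictionary encoding that commit a3bc1ce133 extends to more types, the general idea can be sketched as follows. This is a conceptual illustration only, not the Impala encoder: the function names are made up, and the real Parquet format additionally packs the indices (RLE/bit-packing) rather than storing them as a plain list.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary, indices).

    Each distinct value is stored once in the dictionary; the column itself
    becomes a list of small integer indices into that dictionary.
    """
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]
```

Columns with few distinct values, such as l_discount or l_tax in the figures above, benefit the most, because the repeated values collapse into short indices.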
Nong Li
e098b4cd85 IMPALA-499: Allow parquet scanner to read files with mismatched schemas.
We've always supported the Hive metadata having fewer columns, but this change
will put NULLs in for missing columns, like the other scanners.

Change-Id: I92de1decd30357476bbeb27f4248239fe8d0c668
Reviewed-on: http://gerrit.ent.cloudera.com:8080/233
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:14 -08:00
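
The behavior described in commit e098b4cd85 can be pictured with the sketch below. It assumes columns are matched by position, which the commit message does not state; the function name is hypothetical and None stands in for NULL.

```python
def materialize_row(num_table_columns, file_row):
    """Return a row with exactly num_table_columns values.

    Assumption: table and file columns are matched by position.
    Extra columns in the file are ignored; columns the file lacks
    are filled with None (NULL).
    """
    row = list(file_row[:num_table_columns])
    row.extend([None] * (num_table_columns - len(row)))
    return row
```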
Skye Wanderman-Milne
b9ea32e9b7 Fix IMPALA-129, IMPALA-534, and other scanner bugs.
Change-Id: Idbd29af3fcc35b9e1173d08ac55b5780751c5938
Reviewed-on: http://gerrit.ent.cloudera.com:8080/196
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:52:14 -08:00
Lenni Kuff
17ed6ea177 Partition TPC-DS dataset and add additional TPC-DS workload queries
Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300
Reviewed-on: http://gerrit.ent.cloudera.com:8080/99
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:13 -08:00
Alex Behm
9a201645cd IMPALA-496: Fix escaping of field delimiter and escape character in inserts
Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/163
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:09 -08:00
Nong Li
41b1d36a9d Fix IMPALA-122: Lzo scanner with small scan ranges.
Change-Id: I5226fd1a1aa368f5b291b78ad371363057ef574e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/140
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:03 -08:00
Skye Wanderman-Milne
7b5836f3fb IMPALA-510: Cannot query RC file for table that has more columns than the data file
Change-Id: I94d582341ca972675d538623789e445fbcd5cfb8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/132
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
2014-01-08 10:52:02 -08:00
Lenni Kuff
faeb7f5fa3 Add scanner test case for scenario where data and table schema do not match
Change-Id: I16f007ad1cb2caac47506914512c5665fc3d5f56
Reviewed-on: http://gerrit.ent.cloudera.com:8080/98
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:01 -08:00
Lenni Kuff
69836f8ef4 IMPALA-503: DESCRIBE FORMATTED on a text/lzo table fails with ClassNotFoundException
Change-Id: I8c495fb14f0fc983a2cbb4556f6d4bb05d7448b5
Reviewed-on: http://gerrit.ent.cloudera.com:8080/107
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:01 -08:00
Skye Wanderman-Milne
f87878f111 IMPALA-412: Impala might hang when an impalad dies during query execution
Allows queries to be cancelled while fetching rows and fixes some test bugs.
2014-01-08 10:51:51 -08:00
Lenni Kuff
d84a33efa7 Make ABORT_ON_ERROR=true the default query option value 2014-01-08 10:51:46 -08:00
Skye Wanderman-Milne
3fecdeb793 IMPALA-441: support default values for Avro tables 2014-01-08 10:51:39 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Nong Li
c2370c3a2d Remove gzip from parquet testing. 2014-01-08 10:51:17 -08:00
Lenni Kuff
abdfae5b24 Update DESCRIBE FORMATTED results to match the Hive HS2 output 2014-01-08 10:51:14 -08:00
Lenni Kuff
fe9dbb64d7 IMPALA-403: Add support for DESCRIBE FORMATTED <table> 2014-01-08 10:51:05 -08:00
Nong Li
7c6598066c Add testing for different compression codecs with parquet. 2014-01-08 10:51:04 -08:00
Lenni Kuff
c2cfc7e2a3 IMPALA-373: Add support for 'LOAD DATA' statements
This change adds Impala support for LOAD DATA statements. This allows the user
to load one or more files into a table or partition from a given HDFS location. The
load operation only moves files; it does not convert data to match the target
table/partition's file format.
2014-01-08 10:51:02 -08:00
Alex Behm
045038e479 IMPALA-374: Added WITH clause without recursion. 2014-01-08 10:51:00 -08:00
Henry Robinson
79b36a5eb3 IMPALA-375: Add column permutation clause to INSERT statement 2014-01-08 10:50:59 -08:00
Alan Choi
b1de018298 IMPALA-31 Support EXPLAIN <query>
Hue is moving to HiveServer2 but HiveServer2 does not have an "explain" RPC
call. To support "explain", I added it to the language.

An "explain" statement will return a result set: one row per explain line.
2014-01-08 10:50:32 -08:00
Alex Behm
0546d7a08a IMPALA-339: Update lastDdlTime in Hive metastore. 2014-01-08 10:50:31 -08:00
Alex Behm
937a44f9f8 IMPALA-68: Support Values() statement. 2014-01-08 10:50:31 -08:00
Lenni Kuff
ef2a55d17b IMPALA-349: Final query state for some successful DDL operations is EXCEPTION 2014-01-08 10:50:28 -08:00
Lenni Kuff
c74b7e41dd Enable insert tests to run against parquet 2014-01-08 10:49:47 -08:00
Nong Li
1f6481382e Fix parquet test setup. 2014-01-08 10:49:41 -08:00
Nong Li
741599dc2a Move compressed table test out of core. 2014-01-08 10:49:40 -08:00
Alex Behm
1b2e8280d4 Fix NULL issues. 2014-01-08 10:49:32 -08:00
Nong Li
f60f2d3e50 Implement support for grouped scan ranges in io mgr and integration with parquet. 2014-01-08 10:49:18 -08:00
Alex Behm
0821e2f826 IMPALA-66: Support for UNION with constant SELECT clauses. 2014-01-08 10:49:18 -08:00
Lenni Kuff
f4a5c0628f Cleanup HDFS directories before and after running ALTER TABLE tests 2014-01-08 10:49:17 -08:00
Alex Behm
fc310fadab Reenabled -1 to indicate no memory limit since CM uses that. 2014-01-08 10:49:15 -08:00
Lenni Kuff
1fb72fbc73 IMPALA-156: Support core 'ALTER TABLE' DDL command
This patch adds support for
- ALTER TABLE ADD|REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD/DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
2014-01-08 10:49:14 -08:00
Alex Behm
a3a3411dc2 IMPALA-172: Add format options to --mem_limits flag: {M, G, %} 2014-01-08 10:49:14 -08:00
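
A rough sketch of how a value with the {M, G, %} suffixes from commit a3a3411dc2 might be interpreted. The details are assumptions, not taken from the Impala source: units are treated as powers of 1024, a percentage is taken relative to a caller-supplied physical memory size, a bare number is treated as bytes, and lowercase suffixes are accepted. The -1 case follows commit fc310fadab.

```python
def parse_mem_limit(spec, physical_mem_bytes):
    """Convert a memory limit string to bytes (hypothetical helper).

    Assumed formats: "<n>M", "<n>G", "<n>%" of physical_mem_bytes,
    a plain byte count, or "-1" meaning no limit.
    """
    spec = spec.strip()
    if not spec:
        raise ValueError("empty memory limit")
    if spec == "-1":
        return -1  # no memory limit
    suffix = spec[-1].upper()
    if suffix == "M":
        return int(float(spec[:-1]) * 1024 ** 2)
    if suffix == "G":
        return int(float(spec[:-1]) * 1024 ** 3)
    if suffix == "%":
        return int(physical_mem_bytes * float(spec[:-1]) / 100)
    return int(spec)  # assumed to be a plain byte count
```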
Elliott Clark
0e0c02b6bd Add the ability to Select into HBase table.
* Changed frontend analysis for HBase tables
* Changed Thrift messages to allow HBase as a sink type.
* JNI Wrapper around htable
* Create hbase-table-sink
* Create hbase-table-writer
* Static init lots of JNI related code for HBase.
* Cleaned up some cpplint issues.
* Changed junit analysis tests
* Create a new HBase test table.
* Added functional tests for HBase inserts.
2014-01-08 10:49:06 -08:00
ishaan
b8c90d9852 Add a test for simple per-query memory_limits. 2014-01-08 10:49:05 -08:00
Lenni Kuff
03f04518d7 Fix planner test failure due to empty tpch temp tables 2014-01-08 10:49:04 -08:00
Alan Choi
991db9001b IMPALA-113 Raise error when default order by limit is exceeded 2014-01-08 10:49:03 -08:00
Lenni Kuff
8d1674f638 Run only subset of tests with small batch_sizes + a few small fixes 2014-01-08 10:48:58 -08:00
Lenni Kuff
ca0d23a844 IMPALA-157: Support CREATE TABLE LIKE DDL 2014-01-08 10:48:55 -08:00
Nong Li
0df9476be1 Parquet data loading. 2014-01-08 10:48:48 -08:00
ishaan
5ed84d7f65 IMP-739 Results for show queries should check for subset, not equality. 2014-01-08 10:48:46 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
Lenni Kuff
328ceed4e7 Add support for generating lzo compressed text files and running tests against lzo 2014-01-08 10:48:38 -08:00
Lenni Kuff
90d7e085fa Update tests to use num_nodes=0, use external impala cluster, add sanity check run mode 2014-01-08 10:48:38 -08:00
Lenni Kuff
1cd847c856 IMPALA-81: Add support for CREATE/DROP DATABASE/TABLE
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored as text, sequence, and rc file format.
It currently only supports creating unpartitioned tables and tables stored in HDFS.
2014-01-08 10:48:30 -08:00
Marcel Kornacker
c02d25baa8 IMPALA-20: Limit clause in inline view not handled correctly by planner
- this adds a SelectNode that evaluates conjuncts and enforces the limit
- all limits are now distributed: enforced both by the child plan fragment and
  by the merging ExchangeNode
- all limits w/ Order By are now distributed: enforced both by the child plan fragment and
  by the merging TopN node
2014-01-08 10:48:29 -08:00
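
The distributed limit handling described in commit c02d25baa8 can be illustrated with the sketch below: every child fragment applies the limit locally, and the merging step applies it again. The function names are illustrative and plain Python lists stand in for plan fragments; this is not the planner's code.

```python
import heapq
from itertools import islice


def fragment_limit(rows, limit):
    """Child fragment: emit at most `limit` rows."""
    return list(islice(rows, limit))


def exchange_limit(fragment_outputs, limit):
    """Merging exchange: concatenate fragment outputs, then enforce the limit again."""
    merged = (row for output in fragment_outputs for row in output)
    return list(islice(merged, limit))


def fragment_topn(rows, limit):
    """Child fragment with ORDER BY: emit its `limit` smallest rows, sorted."""
    return heapq.nsmallest(limit, rows)


def merge_topn(sorted_outputs, limit):
    """Merging TopN: merge the sorted fragment outputs and keep the first `limit` rows."""
    return list(islice(heapq.merge(*sorted_outputs), limit))
```

Applying the limit in each fragment bounds the rows sent to the merger, and the merger's own limit keeps the final result at the requested size.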
Lenni Kuff
5f9cd044ee Add scanner test suite that runs across all file format/compression permutations 2014-01-08 10:48:25 -08:00