Commit Graph

63 Commits

Author SHA1 Message Date
Alex Behm
19bab59854 Create/alter/describe tables with complex types.
This patch adds parsing of complex types and tests for using complex
types in various exprs and create/alter/describe stmts.

Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594
2014-07-23 17:26:14 -07:00
Dimitris Tsirogiannis
2aedf5fab4 Add missing ALTER TABLE statement in alltypesaggmultifiles table.
The DDL statements for adding the partitions of alltypesaggmultifiles
did not include an ALTER TABLE stmt for one of the partitions, thereby
causing the planner tests to fail when test data were loaded from a
snapshot.

Change-Id: Id4b078cd334d816d6eb8eb15e5856189701a4bca
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3305
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3310
2014-06-27 18:00:09 -07:00
Dimitris Tsirogiannis
6a795915d6 Fix loading data from snapshopt for alltypesagg table.
The alltypesagg table was not loaded correctly from a snapshot file due
to a missing ALTER TABLE statement, thereby causing some tests to fail.

Change-Id: I74066a99529f24fc268bb5779d3fb64fbd4f66b9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3248
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3270
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
2014-06-25 21:52:11 -07:00
Dimitris Tsirogiannis
5a6f53db16 Add partition pruning tests
The following changes are included in this commit:
1. Modified the alltypesagg table to include an additional partition key
that has nulls.
2. Added a number of tests in hdfs.test that exercise the partition
pruning logic (see IMPALA-887).
3. Modified all the tests that are affected by the change in alltypesagg.

Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236
2014-06-24 02:14:27 -07:00
Dimitris Tsirogiannis
7dbd3a5860 IMPALA-1040: Reading a decimal partitioned column with invalid values
This commit fixes IMPALA-1040 in which when an invalid value is inserted
to a decimal partitioned column through hive it results in a non
informative error message and in some cases in the associated table to
disappear from Impala's catalog. The fix results in a more informative
error message to always be thrown by Impala to indicate the insertion of
an invalid partition key value.

Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200
2014-06-20 12:46:52 -07:00
Nong Li
8f4dc0f2f0 IMPALA-974: Switch from FloatLiteral to DecimalLiteral.
Float/Doubles are lossy so using those as the default literal type
is problematic.

Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-05-31 22:19:06 -07:00
Alex Behm
b252921363 IMPALA-994: Handle incorrect column metadata in views created by Hive.
Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-05-19 20:17:23 -07:00
Dimitris Tsirogiannis
a7a9cde86f CDH-18969: Incorrect query result in Impala
This commit fixes issue CDH-18969 where Impala returns wrong results
when querying an HBase table. This issue is triggered when a column family
sorts lexicographically before ":key", which is the column family of the
row key, thereby causing the wrong column to be used as a row key by the
backend.

The following changes are included:
1. Modified the load function in HBaseTable.java to make sure the
catalog object of an HBase table always stores the row key column first.

Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602
2014-05-18 16:29:11 -07:00
Skye Wanderman-Milne
edbbe6035e Decimal: read from Avro
Allows reading decimal columns with or without codegen. Includes tests
based on a data file posted on HIVE-5823.

Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6
(cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-05-16 22:26:11 -07:00
Nong Li
03e5665e56 Decimal: Read/Write to parquet.
This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and
encoding for decimals.

Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432
2014-05-02 16:38:35 -07:00
Skye Wanderman-Milne
60db4d4d82 CDH-18416: Don't inline ReadWriteUtil::ReadZLong()
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.

This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.

Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
2014-04-28 15:58:15 -07:00
Nong Li
87295a4e06 Decimal implementation.
This patch implements decimal support for text based formats.

Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238
Tested-by: jenkins
2014-04-14 21:07:32 -07:00
Skye Wanderman-Milne
e60bf29a96 IMPALA-13: Use SSE string functions that take an explicit length
This patch modifies DelimitedTextParser and StringValue to work with
data containing null characters by using SSE instructions that take a
length, rather than expecting null-terminated strings. It also adds
some other minor changes to correctly handle data with nulls and to
faciliate testing. I checked the execution time of a count(*) and a
select(*) limit 1 query locally, and saw no difference for either text
or sequence files.

Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181
2014-04-11 11:16:24 -07:00
Nong Li
c27bd34075 Revert "Disable decimal in analysis."
This reverts commit 695017410adf6d4f8426c4117798c93f823a4b4b.

Change-Id: I919d965e8e711d588e6c56dcdbd3c8e0d9ec7a05
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2104
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-27 12:45:55 -07:00
Lenni Kuff
cc1c0c61fd IMP-1291: Support "extended" ASCII characters as delimiters in text files
This fixes how we validate delimiters to be in line with Hive. A delimiter must
fit in a single byte and can be specified in the following formats, as far as I can
tell (there isn't documentation):
- A single ASCII or unicode character (ex. '|')
- An escape character in octal format (ex. \001. Stored in the metastore as a
  unicode character: \u0001).
- A signed decimal integer in the range [-128:127]. Used to support delimiters
  for ASCII character values between 128-255 (-2 maps to ASCII 254).

Previously, we were not handling the "signed integer" case so there was no way
to specify a delimiter in the "extended" ASCII range of 128-255.

To support result validation, the test infrastructure had to be updated to support
reading/writing different character encodings.

Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888
2014-03-13 13:00:15 -07:00
Nong Li
b1aeea3f0b Disable decimal in analysis.
Change-Id: I4cff1fc74ef0afeba15bee5b9eb6851abbfddbdf
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1874
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-03-12 18:55:06 -07:00
Nong Li
f0a67153d3 Decimal analysis changes.
Change-Id: Ib7d6a6a7650cc9058ff1486fc7546ab66c698d46
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1734
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-03-03 21:15:00 -08:00
Alex Behm
3d764619f7 Run Hive data loading through beeline instead of the Hive shell.
Fixes our log configuration to put the Hive logs in cluster_logs/hive.

Change-Id: I5d98581e35325f2173e4b3170e36bec42d33f8f3
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1497
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1615
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-02-20 15:43:31 -08:00
Lenni Kuff
c4879340e8 IMPALA-781: Block Impala from reading RC file tables created using the LazyBinaryColumnarSerDe
With Hive .12, the default RC file format can be configured to be ColumnarSerDe or
LazyBinaryColumnarSerDe. Impala does not yet support the LazyBinaryColumnarSerDe. This change
verifies it is properly disabled.

Change-Id: Ia84495868237ce2c89a9706ad75e0f7eb8499057
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1416
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1423
2014-02-01 10:05:19 -08:00
Skye Wanderman-Milne
de531e15bd IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8
parquet-mr had a bug where it didn't include the dictionary page's
header in the total column size. We now compensate for this by
detecting these files and padding the scan range length. This required
changing how the scanner detects when it's finished: it now counts the
number of rows rather than checking eosr (since the scan range may be
longer than the column).

Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne
9147cd7518 IMPALA-525: Adjust IO buffer size based on read length and other memory fixes
We were previously wasting memory by always reading into 8MB IO
buffers, even when the data read was much less than 8MB. With this
patch, the IO manager picks a buffer size closer to the actual amount
being read (we don't use the exact size so we can continue to recycle
buffers). The minimum IO buffer size is determined via the
--min_buffer_size flag, and the max IO buffer size via the --read_size
flag.

This technique also helps with IMPALA-652, since short columns will
not use as much memory as before (we will not use considerably more
memory than the size of the table).

This patch also changes StringBuffer to use a doubling strategy so it
doesn't end up allocating many large unused buffers, and has the
scanner context use the requested length as the sync read size if it's
larger than the size produced by read_past_size_cb(). These changes
help prevent the boundary buffer in the scanner context from
allocating excess memory.

Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50
Reviewed-on: http://gerrit.ent.cloudera.com:8080/938
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:01 -08:00
Lenni Kuff
92829b8400 IMPALA-587: Support implicit hbase column mapping keys
The Hive HBase spec specifies that the key column mapping can either be
defined explicitly (using the :key syntax) or left out completely in
which case a mapping to the first table column is implied. This change
updates Impala to support implicit key mappings and also adds some
checks in our ALTER TABLE DDL to unsure we cannot get into this state by
dropping a column from an Hbase table (a similar restriction that Hive
puts in place)

Change-Id: I920d642261659ee3e881da2553ffe83300923af8
Reviewed-on: http://gerrit.ent.cloudera.com:8080/554
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:53:14 -08:00
Lenni Kuff
f5c9e4d075 Fix comment location in schema generation script
Our data generation doesn't appear to fully support comments in the
base table name section. This fixes the data generation, but we should
follow on by improving our comment support in the framework.

Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229
Reviewed-on: http://gerrit.ent.cloudera.com:8080/465
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:47 -08:00
Lenni Kuff
8e2a313673 IMPALA-590: Impala should more gracefully fail when loading HBase tables with complex types
Change-Id: Ifc3338ee1339ff0544ed14066824f1aa2d9d7c25
Reviewed-on: http://gerrit.ent.cloudera.com:8080/457
Tested-by: jenkins
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
2014-01-08 10:52:47 -08:00
ishaan
53cd9eadab Treat HBase as a file format for functional tests
Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922
Reviewed-on: http://gerrit.ent.cloudera.com:8080/102
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Ishaan Joshi <ishaan@cloudera.com>
2014-01-08 10:52:36 -08:00
Alex Behm
9a201645cd IMPALA-496: Fix escaping of field delimiter and escape character in inserts
Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2
Reviewed-on: http://gerrit.ent.cloudera.com:8080/163
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Nong Li <nong@cloudera.com>
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-01-08 10:52:09 -08:00
Alex Behm
f0e2d539fc IMPALA-495: Views Sometimes Not Utilizing Partition Pruning.
Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7
Reviewed-on: http://gerrit.ent.cloudera.com:8080/128
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:04 -08:00
Alex Behm
8ad15fabcf IMPALA-372: Added CREATE/DROP/ALTER VIEW. 2014-01-08 10:51:35 -08:00
Alan Choi
254ee6ef89 IMPALA-434 Support binary hbase encoding 2014-01-08 10:51:18 -08:00
Alex Behm
9ff09cd3f4 IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL 2014-01-08 10:50:28 -08:00
Alex Behm
558590140c IMPALA-238: Problems inserting into tables with TIMESTAMP partition columns ... 2014-01-08 10:50:05 -08:00
Henry Robinson
018028f8e3 IMPALA-269: Throw exception if serde unrecognised 2014-01-08 10:50:01 -08:00
Lenni Kuff
c74b7e41dd Enable insert tests to run against parquet 2014-01-08 10:49:47 -08:00
Lenni Kuff
cba9cd00dd Fix full data load build break due to constructing incorrect HDFS paths 2014-01-08 10:49:34 -08:00
Lenni Kuff
558d5ce755 Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists 2014-01-08 10:49:28 -08:00
Elliott Clark
0e0c02b6bd Add the ability to Select into HBase table.
* Changed frontend analysis for HBase tables
* Changed Thrift messages to allow HBase as a sink type.
* JNI Wrapper around htable
* Create hbase-table-sink
* Create hbase-table-writer
* Static init lots of JNI related code for HBase.
* Cleaned up some cpplint issues.
* Changed junit analysis tests
* Create a new HBase test table.
* Added functional tests for HBase inserts.
2014-01-08 10:49:06 -08:00
Lenni Kuff
831ee529be Fixed data loading bugs, moved most tables out of load-dependent-tables 2014-01-08 10:48:56 -08:00
Elliott Clark
ade4453a17 Allow impala hbase serdproperties to have newlines 2014-01-08 10:48:54 -08:00
Alex Behm
be03e6c21c IMPALA-138: Error messages for unknown column types are particularly bad. 2014-01-08 10:48:53 -08:00
Skye Wanderman-Milne
461a48df2b Refactor testing framework to generate Avro tables. 2014-01-08 10:48:45 -08:00
Lenni Kuff
d57440e87d Allow column comments for CREATE TABLE and DESCRIBE <table> statements 2014-01-08 10:48:37 -08:00
ishaan
f1d37646ed Fix creation of the insert_string_partitioned_table 2014-01-08 10:48:20 -08:00
Henry Robinson
222d15c6ca IMPALA-72: String partition keys should be URL encoded 2014-01-08 10:48:20 -08:00
ishaan
09d6d931f4 Change the way data is loaded 2014-01-08 10:48:09 -08:00
Lenni Kuff
bed633c1ae Extract config/metastore creation from buildall + script for loading warehouse snapshot 2014-01-08 10:46:53 -08:00
Lenni Kuff
1e25c98fb4 Test data loading framework improvements
This change includes a number of improvements for the test data loading framework:
* Named sections for schema template definitions
* Removal of uneeded sections from schema template definitions (ex. ANALYZE TABLE)
* More granular data loading via table name filters
* Improved robustness in detecting failed data loads
* Table level constraints for specific file formats
* Re-written compute stats script
2014-01-08 10:46:49 -08:00
Nong Li
adf36b81f9 Fix data errors test. 2014-01-08 10:46:45 -08:00
Nong Li
34879a4ddc Fix IMP-297 2014-01-08 10:46:44 -08:00
Michael Ubell
7536510b69 IMP-258 Test writing nulls. 2014-01-08 10:46:31 -08:00
Michael Ubell
37aaf06f79 IMP-390 Get rid of test dependencies on InProcessQE and Runquery 2014-01-08 10:46:18 -08:00