impala

mirror of https://github.com/apache/impala.git synced 2025-12-30 21:02:41 -05:00

Author	SHA1	Message	Date
Alex Behm	19bab59854	Create/alter/describe tables with complex types. This patch adds parsing of complex types and tests for using complex types in various exprs and create/alter/describe stmts. Change-Id: Ibc211a560c889f5ccfb616813700b923c89d8245 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3577 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3594	2014-07-23 17:26:14 -07:00
ishaan	52f223137c	[CDH5] Remove the constraint to not load avro tables for tpcds. Change-Id: I7e29ccb1db34e671c369d480e2ce7a46264c62c4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3440 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-07-15 19:37:38 -07:00
ishaan	2b5df0c6ff	[CDH5] Convert tpch schemas to decimal and change the queries where possible. I used the following document for reference: http://www.tpc.org/tpch/spec/tpch2.1.0.pdf Change-Id: Ic84db0628323c90e89552707f214bbb9fa2f2ae0 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3132 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-07-08 14:51:43 -07:00
Dimitris Tsirogiannis	2aedf5fab4	Add missing ALTER TABLE statement in alltypesaggmultifiles table. The DDL statements for adding the partitions of alltypesaggmultifiles did not include an ALTER TABLE stmt for one of the partitions, thereby causing the planner tests to fail when test data were loaded from a snapshot. Change-Id: Id4b078cd334d816d6eb8eb15e5856189701a4bca Reviewed-on: http://gerrit.ent.cloudera.com:8080/3305 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3310	2014-06-27 18:00:09 -07:00
Dimitris Tsirogiannis	6a795915d6	Fix loading data from snapshot for alltypesagg table. The alltypesagg table was not loaded correctly from a snapshot file due to a missing ALTER TABLE statement, thereby causing some tests to fail. Change-Id: I74066a99529f24fc268bb5779d3fb64fbd4f66b9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3248 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3270 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>	2014-06-25 21:52:11 -07:00
Dimitris Tsirogiannis	5a6f53db16	Add partition pruning tests The following changes are included in this commit: 1. Modified the alltypesagg table to include an additional partition key that has nulls. 2. Added a number of tests in hdfs.test that exercise the partition pruning logic (see IMPALA-887). 3. Modified all the tests that are affected by the change in alltypesagg. Change-Id: I1a769375aaa71273341522eb94490ba5e4c6f00d Reviewed-on: http://gerrit.ent.cloudera.com:8080/2874 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3236	2014-06-24 02:14:27 -07:00
Dimitris Tsirogiannis	7dbd3a5860	IMPALA-1040: Reading a decimal partitioned column with invalid values This commit fixes IMPALA-1040 in which when an invalid value is inserted to a decimal partitioned column through hive it results in a non informative error message and in some cases in the associated table to disappear from Impala's catalog. The fix results in a more informative error message to always be thrown by Impala to indicate the insertion of an invalid partition key value. Change-Id: I2855ea69944e269fb7e02b3825f44e64352151e7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/3062 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3200	2014-06-20 12:46:52 -07:00
ishaan	db97981ab9	[CDH5] Switch the tpcds schemas to use decimal instead of float/double. This patch converts the tpcds schemas to use decimal instead of float/double. Currently, Impala can only r/w decimal in text, therefore, the tables are constrained to text. The schemas were obtained from the official tpc spec: http://www.tpc.org/tpcds/spec/tpcds_1.1.0.pdf Change-Id: I1ef0113dcb48bad52af75ee93b47b08adf9e1a69 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2403 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: jenkins	2014-06-08 11:47:23 -07:00
Nong Li	8f4dc0f2f0	IMPALA-974: Switch from FloatLiteral to DecimalLiteral. Float/Doubles are lossy so using those as the default literal type is problematic. Change-Id: I5a619dd931d576e2e6cd7774139e9bafb9452db9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2758 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-31 22:19:06 -07:00
Alex Behm	b252921363	IMPALA-994: Handle incorrect column metadata in views created by Hive. Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-05-19 20:17:23 -07:00
Dimitris Tsirogiannis	a7a9cde86f	CDH-18969: Incorrect query result in Impala This commit fixes issue CDH-18969 where Impala returns wrong results when querying an HBase table. This issue is triggered when a column family sorts lexicographically before ":key", which is the column family of the row key, thereby causing the wrong column to be used as a row key by the backend. The following changes are included: 1. Modified the load function in HBaseTable.java to make sure the catalog object of an HBase table always stores the row key column first. Change-Id: Icd7ebc973d81672c04d5c7c8bbabd813338d5eac Reviewed-on: http://gerrit.ent.cloudera.com:8080/2513 Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2602	2014-05-18 16:29:11 -07:00
Skye Wanderman-Milne	edbbe6035e	Decimal: read from Avro Allows reading decimal columns with or without codegen. Includes tests based on a data file posted on HIVE-5823. Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6 (cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-05-16 22:26:11 -07:00
Nong Li	03e5665e56	Decimal: Read/Write to parquet. This adds support for the FIXED_LENGTH_BYTE_ARRAY parquet type and encoding for decimals. Change-Id: I9d5780feb4530989b568ec8d168cbdc32b7039bd Reviewed-on: http://gerrit.ent.cloudera.com:8080/1727 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2432	2014-05-02 16:38:35 -07:00
Skye Wanderman-Milne	60db4d4d82	CDH-18416: Don't inline ReadWriteUtil::ReadZLong() For wide Avro tables, ReadZLong() would get inlined many times into a single function body, causing LLVM to crash. Not inlining doesn't seem to have a performance impact on narrow tables, and helps with wide tables. This change also adds tests over wide (i.e. many-column) tables. The test tables are produced by specifying shell commands to generate test tables in functional_schema_template.sql, which are executed in generate-schema-statements.py. In the SQL templates, sections starting with a ` are treated as shell commands. The output of the shell command is then used as the section text. This is only a starting point; it isn't currently implemented for all sections, and may have to be tweaked if we use this mechanism for all tables. Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353 Tested-by: jenkins	2014-04-28 15:58:15 -07:00
Nong Li	87295a4e06	Decimal implementation. This patch implements decimal support for text based formats. Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238 Tested-by: jenkins	2014-04-14 21:07:32 -07:00
Skye Wanderman-Milne	e60bf29a96	IMPALA-13: Use SSE string functions that take an explicit length This patch modifies DelimitedTextParser and StringValue to work with data containing null characters by using SSE instructions that take a length, rather than expecting null-terminated strings. It also adds some other minor changes to correctly handle data with nulls and to facilitate testing. I checked the execution time of a count() and a select() limit 1 query locally, and saw no difference for either text or sequence files. Change-Id: Ia920b35bea7048aa286f39ec83e313c2a39251d1 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2110 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2181	2014-04-11 11:16:24 -07:00
Nong Li	c27bd34075	Revert "Disable decimal in analysis." This reverts commit 695017410adf6d4f8426c4117798c93f823a4b4b. Change-Id: I919d965e8e711d588e6c56dcdbd3c8e0d9ec7a05 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2104 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-27 12:45:55 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Nong Li	b1aeea3f0b	Disable decimal in analysis. Change-Id: I4cff1fc74ef0afeba15bee5b9eb6851abbfddbdf Reviewed-on: http://gerrit.ent.cloudera.com:8080/1874 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-03-12 18:55:06 -07:00
Nong Li	f0a67153d3	Decimal analysis changes. Change-Id: Ib7d6a6a7650cc9058ff1486fc7546ab66c698d46 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1734 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-03-03 21:15:00 -08:00
Alex Behm	3d764619f7	Run Hive data loading through beeline instead of the Hive shell. Fixes our log configuration to put the Hive logs in cluster_logs/hive. Change-Id: I5d98581e35325f2173e4b3170e36bec42d33f8f3 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1497 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1615 Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-02-20 15:43:31 -08:00
Lenni Kuff	c4879340e8	IMPALA-781: Block Impala from reading RC file tables created using the LazyBinaryColumnarSerDe With Hive .12, the default RC file format can be configured to be ColumnarSerDe or LazyBinaryColumnarSerDe. Impala does not yet support the LazyBinaryColumnarSerDe. This change verifies it is properly disabled. Change-Id: Ia84495868237ce2c89a9706ad75e0f7eb8499057 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1416 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1423	2014-02-01 10:05:19 -08:00
Alex Behm	fc6ecd39e5	[CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting. Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0	2014-01-15 15:11:32 -08:00
Skye Wanderman-Milne	de531e15bd	IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8 parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	9147cd7518	IMPALA-525: Adjust IO buffer size based on read length and other memory fixes We were previously wasting memory by always reading into 8MB IO buffers, even when the data read was much less than 8MB. With this patch, the IO manager picks a buffer size closer to the actual amount being read (we don't use the exact size so we can continue to recycle buffers). The minimum IO buffer size is determined via the --min_buffer_size flag, and the max IO buffer size via the --read_size flag. This technique also helps with IMPALA-652, since short columns will not use as much memory as before (we will not use considerably more memory than the size of the table). This patch also changes StringBuffer to use a doubling strategy so it doesn't end up allocating many large unused buffers, and has the scanner context use the requested length as the sync read size if it's larger than the size produced by read_past_size_cb(). These changes help prevent the boundary buffer in the scanner context from allocating excess memory. Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/938 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:01 -08:00
Lenni Kuff	92829b8400	IMPALA-587: Support implicit hbase column mapping keys The Hive HBase spec specifies that the key column mapping can either be defined explicitly (using the :key syntax) or left out completely in which case a mapping to the first table column is implied. This change updates Impala to support implicit key mappings and also adds some checks in our ALTER TABLE DDL to ensure we cannot get into this state by dropping a column from an HBase table (a similar restriction that Hive puts in place) Change-Id: I920d642261659ee3e881da2553ffe83300923af8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/554 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:53:14 -08:00
Lenni Kuff	f5c9e4d075	Fix comment location in schema generation script Our data generation doesn't appear to fully support comments in the base table name section. This fixes the data generation, but we should follow on by improving our comment support in the framework. Change-Id: I68274bd98d5b0d54868d6c80b7137a59e7329229 Reviewed-on: http://gerrit.ent.cloudera.com:8080/465 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:47 -08:00
Lenni Kuff	8e2a313673	IMPALA-590: Impala should more gracefully fail when loading HBase tables with complex types Change-Id: Ifc3338ee1339ff0544ed14066824f1aa2d9d7c25 Reviewed-on: http://gerrit.ent.cloudera.com:8080/457 Tested-by: jenkins Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:47 -08:00
ishaan	d3a94aa4fe	Don't load tpcds.store_sales_unpartitioned into any file format except text. Change-Id: I398d26ca8e36a45cb3d0a076cdd604ff6eba793d Reviewed-on: http://gerrit.ent.cloudera.com:8080/444 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:43 -08:00
ishaan	53cd9eadab	Treat HBase as a file format for functional tests Change-Id: Ia01181a1e10eb108419122d347e9d869a69e8922 Reviewed-on: http://gerrit.ent.cloudera.com:8080/102 Reviewed-by: Ishaan Joshi <ishaan@cloudera.com> Tested-by: Ishaan Joshi <ishaan@cloudera.com>	2014-01-08 10:52:36 -08:00
Lenni Kuff	17ed6ea177	Partition TPC-DS dataset and add additional TPC-DS workload queries Change-Id: I5410e68fdfd818a8287e0974332c3e36c344c300 Reviewed-on: http://gerrit.ent.cloudera.com:8080/99 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Marcel Kornacker <marcel@cloudera.com>	2014-01-08 10:52:13 -08:00
Alex Behm	9a201645cd	IMPALA-496: Fix escaping of field delimiter and escape character in inserts Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/163 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:09 -08:00
Alex Behm	f0e2d539fc	IMPALA-495: Views Sometimes Not Utilizing Partition Pruning. Change-Id: I65daebbe8c4b72b956a409fe28edd3773fda7cb7 Reviewed-on: http://gerrit.ent.cloudera.com:8080/128 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:04 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00
Alan Choi	254ee6ef89	IMPALA-434 Support binary hbase encoding	2014-01-08 10:51:18 -08:00
Alex Behm	9ff09cd3f4	IMPALA-70: Respect tbl properties to allow empty strings to be treated as NULL	2014-01-08 10:50:28 -08:00
Alex Behm	558590140c	IMPALA-238: Problems inserting into tables with TIMESTAMP partition columns ...	2014-01-08 10:50:05 -08:00
Henry Robinson	018028f8e3	IMPALA-269: Throw exception if serde unrecognised	2014-01-08 10:50:01 -08:00
Lenni Kuff	c74b7e41dd	Enable insert tests to run against parquet	2014-01-08 10:49:47 -08:00
Nong Li	563cbfa3a8	Enable parquet testing	2014-01-08 10:49:40 -08:00
Lenni Kuff	cba9cd00dd	Fix full data load build break due to constructing incorrect HDFS paths	2014-01-08 10:49:34 -08:00
Lenni Kuff	558d5ce755	Data loading: Exec DDL statements via Impala and don't recreate metadata if it exists	2014-01-08 10:49:28 -08:00
Elliott Clark	0e0c02b6bd	Add the ability to Select into HBase table. * Changed frontend analysis for HBase tables * Changed Thrift messages to allow HBase as a sink type. * JNI Wrapper around htable * Create hbase-table-sink * Create hbase-table-writer * Static init lots of JNI related code for HBase. * Cleaned up some cpplint issues. * Changed junit analysis tests * Create a new HBase test table. * Added functional tests for HBase inserts.	2014-01-08 10:49:06 -08:00
Lenni Kuff	831ee529be	Fixed data loading bugs, moved most tables out of load-dependent-tables	2014-01-08 10:48:56 -08:00
Elliott Clark	ade4453a17	Allow impala hbase serdeproperties to have newlines	2014-01-08 10:48:54 -08:00
Alex Behm	be03e6c21c	IMPALA-138: Error messages for unknown column types are particularly bad.	2014-01-08 10:48:53 -08:00
Skye Wanderman-Milne	461a48df2b	Refactor testing framework to generate Avro tables.	2014-01-08 10:48:45 -08:00
Nong Li	6e293090e6	Parquet writer. Change-Id: I7117b545e3d3a7803a219234ad992040a6c7c4ec	2014-01-08 10:48:44 -08:00
Lenni Kuff	328ceed4e7	Add support for generating lzo compressed text files and running tests against lzo	2014-01-08 10:48:38 -08:00
Lenni Kuff	d57440e87d	Allow column comments for CREATE TABLE and DESCRIBE <table> statements	2014-01-08 10:48:37 -08:00
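
Commit cc1c0c61fd (IMP-1291) above describes delimiters given as a signed decimal integer in the range [-128:127], used to reach the "extended" ASCII values 128-255 (for example, -2 maps to 254). A minimal sketch of that mapping follows, assuming it is a plain reinterpretation of a signed byte as an unsigned one. The function name `signed_delimiter_to_byte` is hypothetical and is not taken from the Impala or Hive code.

```python
def signed_delimiter_to_byte(value: int) -> int:
    """Map a signed single-byte delimiter value to its unsigned byte value.

    Hypothetical helper for illustration only; it assumes the signed
    integer form is a two's-complement view of one byte.
    """
    if not -128 <= value <= 127:
        # The commit message states a delimiter must fit in a single byte.
        raise ValueError(f"delimiter {value} does not fit in a signed byte")
    # Masking with 0xFF keeps 0..127 unchanged and shifts -128..-1 to 128..255.
    return value & 0xFF


if __name__ == "__main__":
    for v in (-2, -128, -1, 0, 127):
        print(v, "->", signed_delimiter_to_byte(v))
```

Under this assumption, non-negative values are unchanged and negative values land in the 128-255 range, which matches the single example given in the commit message.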

76 Commits