Split out the encoder/type for the Parquet reader/writer. I think this puts us
in a better place to support future encodings.
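
The new structure is roughly one encoder implementation per encoding behind a
common interface that the reader/writer dispatch through. The real code is C++
inside the Parquet reader/writer; this Python sketch, with entirely made-up
names, is just to illustrate the shape of the split (the abstract method is
the virtual call mentioned below):

    # Hypothetical sketch only; all names here are illustrative.
    import struct
    from abc import ABC, abstractmethod

    class Encoder(ABC):
        # One subclass per Parquet encoding. The writer holds a reference to
        # the base class, so encoding a batch goes through a virtual call.
        @abstractmethod
        def encode_all(self, values) -> bytes: ...

    class PlainEncoder(Encoder):
        # Fixed-width values written back to back.
        def __init__(self, fmt):  # e.g. "<d" for an 8-byte DOUBLE
            self.fmt = fmt
        def encode_all(self, values):
            return b"".join(struct.pack(self.fmt, v) for v in values)

    class DictEncoder(Encoder):
        # Distinct values go into a dictionary; the data page then stores
        # only small integer codes, which is the win on low-cardinality
        # columns.
        def encode_all(self, values):
            codes, dictionary = [], {}
            for v in values:
                codes.append(dictionary.setdefault(v, len(dictionary)))
            # Real Parquet bit-packs/RLE-encodes the codes; one byte per code
            # keeps the sketch simple (and assumes < 256 distinct values).
            return bytes(codes)
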
On the TPC-H lineitem table, the results are:
Before:
BytesWritten: 236.45 MB
Per Column Sizes:
l_comment: 75.71 MB
l_commitdate: 8.64 MB
l_discount: 11.19 MB
l_extendedprice: 33.02 MB
l_linenumber: 4.56 MB
l_linestatus: 869.98 KB
l_orderkey: 8.99 MB
l_partkey: 27.02 MB
l_quantity: 11.58 MB
l_receiptdate: 8.65 MB
l_returnflag: 1.40 MB
l_shipdate: 8.65 MB
l_shipinstruct: 1.45 MB
l_shipmode: 2.17 MB
l_suppkey: 21.91 MB
l_tax: 10.68 MB
After:
BytesWritten: 198.63 MB (84%)
Per Column Sizes:
l_comment: 75.71 MB (100%)
l_commitdate: 8.64 MB (100%)
l_discount: 2.89 MB (25.8%)
l_extendedprice: 33.13 MB (100.33%)
l_linenumber: 1.50 MB (32.89%)
l_linestatus: 870.26 KB (100.032%)
l_orderkey: 9.18 MB (102.11%)
l_partkey: 27.10 MB (100.29%)
l_quantity: 4.32 MB (37.31%)
l_receiptdate: 8.65 MB (100%)
l_returnflag: 1.40 MB (100%)
l_shipdate: 8.65 MB (100%)
l_shipinstruct: 1.45 MB (100%)
l_shipmode: 2.17 MB (100%)
l_suppkey: 10.11 MB (46.14%)
l_tax: 2.89 MB (27.06%)
The table is overall 84% as big (i.e. 16% smaller). A few columns got marginally
bigger, presumably the small fixed cost of the dictionary even where it doesn't
pay off. If the file filled the entire 1 GB, I'd expect that overhead to shrink
even more relative to the total. Some rough arithmetic for the big winners
follows below.
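
For intuition on why the low-cardinality columns shrink so much, here is some
back-of-the-envelope arithmetic (mine, not from the patch) for l_discount:
TPC-H only generates 11 distinct discount values (0.00 through 0.10), so each
row needs only a 4-bit code plus a tiny dictionary:

    import math

    rows, distinct = 6_001_215, 11                   # SF1 lineitem
    bits_per_code = math.ceil(math.log2(distinct))   # 4 bits per row
    dict_bytes = distinct * 8                        # dictionary of DOUBLEs
    data_bytes = rows * bits_per_code / 8            # bit-packed codes
    print((dict_bytes + data_bytes) / 2**20)         # -> ~2.86 MB

That lands right around the 2.89 MB measured above.
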
The restructuring to use a virtual call doesn't seem to change performance much,
and it will go away once we codegen the scanner.
Here's what the query times look like with this patch (note this is on the
"before" data files, so only string columns are dictionary encoded).
Before query times:
Insert Time: 8.5 sec
select *: 2.3 sec
select avg(l_orderkey): 0.33 sec
After query times:
Insert Time: 9.5 sec <-- longer due to doing dictionary encoding
select *: 2.4 sec <-- kind of noisy; possibly a slight slowdown
select avg(l_orderkey): 0.33 sec
Change-Id: I213fdca1bb972cc200dc0cd9fb14b77a8d36d9e6
Reviewed-on: http://gerrit.ent.cloudera.com:8080/238
Tested-by: jenkins <kitchen-build@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
This is the first set of changes required to start moving our functional test
infrastructure from JUnit to Python. After investigating a number of options,
I decided to go with a Python test runner named py.test (http://pytest.org/).
It is very flexible, open source (MIT licensed), and will enable us to do some
cool things like parallel test execution.
As part of this change, we now use our "test vectors" for query test execution.
This will be very nice because it means that if you load the "core" dataset,
you know you will be able to run the "core" query tests (specified by
--exploration_strategy when running the tests); the sketch below shows the idea.
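
To make the test-vector idea concrete, here is a hypothetical sketch (the
names are made up, not the real infrastructure): a vector is one point in the
cross product of the test dimensions, and the exploration strategy decides how
much of that cross product gets run:

    import itertools

    # Illustrative dimensions; the real vectors also cover compression
    # codecs, exec options, etc.
    DIMENSIONS = {
        'table_format': ['text', 'seq', 'rc'],
        'batch_size': [0, 1, 16],
    }

    def load_vectors(exploration_strategy):
        combos = [dict(zip(DIMENSIONS, vals))
                  for vals in itertools.product(*DIMENSIONS.values())]
        if exploration_strategy == 'exhaustive':
            return combos
        # 'core' keeps a representative subset, so loading the core dataset
        # is enough to run the core query tests (illustrative pruning rule).
        return [c for c in combos if c['batch_size'] == 0]
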
You will see that each combination of table format + query exec options is now
treated as an individual test case. This will make it much easier to debug
exactly where something failed; see the parametrization sketch below.
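
With py.test this falls out naturally from test parametrization. A minimal
sketch, assuming the load_vectors() helper above (pytest_generate_tests and
metafunc.parametrize are standard py.test hooks; everything else is made up):

    # conftest.py (sketch)
    def pytest_addoption(parser):
        parser.addoption('--exploration_strategy', default='core')

    def pytest_generate_tests(metafunc):
        if 'vector' in metafunc.fixturenames:
            strategy = metafunc.config.getoption('--exploration_strategy')
            vectors = load_vectors(strategy)
            ids = ['-'.join(str(v) for v in vec.values()) for vec in vectors]
            metafunc.parametrize('vector', vectors, ids=ids)

    # Each query test takes a vector, so a failure names the exact
    # combination that broke, e.g. test_aggregation[seq-16].
    def test_aggregation(vector):
        run_query_with_options(vector)   # hypothetical helper
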
These new tests can be run using the script at tests/run-tests.sh.