impala

mirror of https://github.com/apache/impala.git synced 2025-12-31 06:02:51 -05:00

Author	SHA1	Message	Date
Victor Bittorf	2d7f2e19b2	IMPALA 938: Infer schema from Parquet file Syntax is "CREATE TABLE name LIKE fileformat '/path/to/file'". Supports all options that CREATE TABLE does. Currently only PARQUET is supported. Run testdata/bin/create-load-data.sh after pulling this patch. Change-Id: Ibb9fbb89dbde6acceb850b914c48d12f22b33f55 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2720 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3158	2014-06-20 17:38:01 -07:00
Skye Wanderman-Milne	edbbe6035e	Decimal: read from Avro Allows reading decimal columns with or without codegen. Includes tests based on a data file posted on HIVE-5823. Change-Id: Ie541c6b98bd24543691850cb45a434af60b5a5a6 (cherry picked from commit 6983dcefdf70cce14724e17d03bc061ffb8f671c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2596 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-05-16 22:26:11 -07:00
Nong Li	87295a4e06	Decimal implementation. This patch implements decimal support for text based formats. Change-Id: I8e2c9e512ed149fe965216a72cb21fffd4f18e75 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1669 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com> Reviewed-on: http://gerrit.ent.cloudera.com:8080/2238 Tested-by: jenkins	2014-04-14 21:07:32 -07:00
Lenni Kuff	cc1c0c61fd	IMP-1291: Support "extended" ASCII characters as delimiters in text files This fixes how we validate delimiters to be in line with Hive. A delimiter must fit in a single byte and can be specified in the following formats, as far as I can tell (there isn't documentation): - A single ASCII or unicode character (ex. '\|') - An escape character in octal format (ex. \001. Stored in the metastore as a unicode character: \u0001). - A signed decimal integer in the range [-128:127]. Used to support delimiters for ASCII character values between 128-255 (-2 maps to ASCII 254). Previously, we were not handling the "signed integer" case so there was no way to specify a delimiter in the "extended" ASCII range of 128-255. To support result validation, the test infrastructure had to be updated to support reading/writing different character encodings. Change-Id: Ie3c4d444dc9c6e60192093ed0c0f6f151eab16bc Reviewed-on: http://gerrit.ent.cloudera.com:8080/1848 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/1888	2014-03-13 13:00:15 -07:00
Skye Wanderman-Milne	561da008c7	IMPALA-729: fix resource management in Parquet scanner for multiple row groups We weren't attaching resources to the row batch when starting a new row group, so it was possible for string data to be overwritten. This patch removes CloseStreams() and merges its functionality with AttachCompletedResources() so it's not possible to destroy streams without transferring the resources first. It also merges and removes ScannerContext::Close(). Also adds test cases for IMPALA-720. Change-Id: Ia8f40c7d39d8702716f1d337fe797e2696bd0fcb	2014-01-08 10:56:26 -08:00
Skye Wanderman-Milne	9e17042185	Allow zero bit width dict/RLE decoders. This allows us to read single-value dictionary-encoded columns generated by parquet-mr. Change-Id: I80903d910d0cc3a3e4ebf02e34212d868e94feb4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1098 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	de531e15bd	IMPALA-694: Allow Impala to read files produced by parquet-mr version <= 1.2.8 parquet-mr had a bug where it didn't include the dictionary page's header in the total column size. We now compensate for this by detecting these files and padding the scan range length. This required changing how the scanner detects when it's finished: it now counts the number of rows rather than checking eosr (since the scan range may be longer than the column). Change-Id: Id9933808b965003c0c3b3aa78c32fe29a0c4bcbe Reviewed-on: http://gerrit.ent.cloudera.com:8080/1097 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:27 -08:00
Skye Wanderman-Milne	9147cd7518	IMPALA-525: Adjust IO buffer size based on read length and other memory fixes We were previously wasting memory by always reading into 8MB IO buffers, even when the data read was much less than 8MB. With this patch, the IO manager picks a buffer size closer to the actual amount being read (we don't use the exact size so we can continue to recycle buffers). The minimum IO buffer size is determined via the --min_buffer_size flag, and the max IO buffer size via the --read_size flag. This technique also helps with IMPALA-652, since short columns will not use as much memory as before (we will not use considerably more memory than the size of the table). This patch also changes StringBuffer to use a doubling strategy so it doesn't end up allocating many large unused buffers, and has the scanner context use the requested length as the sync read size if it's larger than the size produced by read_past_size_cb(). These changes help prevent the boundary buffer in the scanner context from allocating excess memory. Change-Id: I0efb3b023ddfddb08bca22d5cb5f9511fb4d6c50 Reviewed-on: http://gerrit.ent.cloudera.com:8080/938 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:01 -08:00
Alex Behm	9a201645cd	IMPALA-496: Fix escaping of field delimiter and escape character in inserts Change-Id: I49c36ae9823b35dcb9e92d1a13bef270657e36f2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/163 Tested-by: jenkins <kitchen-build@cloudera.com> Reviewed-by: Nong Li <nong@cloudera.com> Reviewed-by: Lenni Kuff <lskuff@cloudera.com>	2014-01-08 10:52:09 -08:00
Nong Li	0385d14d69	Fix pre-hive 9 rc file scanner.	2014-01-08 10:48:41 -08:00
Nong Li	783480d6bf	- Cleaned up some TODOs. - Fix tuple template. Fixed strcmp - atoi/atof handle overflows. - added likely/unlikely compiler directive - Runquery now reports mean/stddev for profile runs - removed quoted char	2012-01-18 23:08:29 -08:00
Nong Li	c84fec38d3	- Move thrift out of FE src and into impala/common - Thrift files now build using cmake instead of mvn - Added cmake build to impala/ which drives the build process	2011-12-30 19:35:20 -08:00
Nong Li	2880f54d35	Perf Work: - Added perf counter utility - Added google perf tools - Added html data set - Added escape char test - Initial perf tuning	2011-12-30 00:26:27 -08:00
Marcel Kornacker	482e83a396	Removing testdata submodule, we need to have large data files hosted outside of git.	2011-12-16 13:16:28 -08:00
Nong Li	5ae17ad5f9	Adding grep data.	2011-12-06 03:18:30 -08:00
carl	6e47f059cb	Add testdata/data submodule -- maps to CDH/impala-test-data.git	2011-08-04 15:24:53 -07:00

16 Commits
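
Note on cc1c0c61fd (IMP-1291): the commit message says a delimiter may be given as a signed decimal integer in the range [-128:127], and that -2 maps to ASCII 254. That matches reading the signed value as an unsigned byte (two's complement). The sketch below only illustrates that arithmetic; the function name signed_delimiter_to_byte is hypothetical and is not taken from the Impala or Hive source, and the other delimiter formats listed in the commit (single character, octal escape) are deliberately not handled.

```python
def signed_delimiter_to_byte(value: int) -> int:
    """Hypothetical helper: reinterpret a signed delimiter value as a byte.

    Assumes the mapping described in the commit message: a signed integer
    in [-128, 127] stands for the unsigned byte with the same 8-bit
    two's-complement pattern, so negative values land in 128-255.
    """
    if not -128 <= value <= 127:
        raise ValueError(f"delimiter {value} does not fit in a signed byte")
    # Masking with 0xFF keeps the low 8 bits, which turns -2 into 254.
    return value & 0xFF


if __name__ == "__main__":
    for v in (-128, -2, -1, 1, 127):
        print(v, "->", signed_delimiter_to_byte(v))
```

Under this reading, the non-negative values 0 to 127 are unchanged and the negative values -128 to -1 cover the "extended" range 128 to 255.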