impala

mirror of https://github.com/apache/impala.git synced 2026-01-03 15:00:52 -05:00

Author	SHA1	Message	Date
Attila Jeges	bc56d3c48c	IMPALA-5407: Fix crash in HdfsSequenceTableWriter The following use of sequence file writer can lead to a crash: > set compression_codec=gzip; > set seq_compression_mode=record; > set allow_unsupported_formats=1; > create table seq_tbl like tbl stored as sequencefile; > insert into seq_tbl select * from tbl; This fix removes the MemPool::FreeAll() call from HdfsSequenceTableWriter::Flush(). Freeing the memory pool in Flush() is incorrect because a memory pool buffer is cached by the compressor in the table writer which isn't reset across calls to Flush(). If the file that is being written is big enough, HdfsSequenceTableWriter::AppendRows() will call Flush() multiple times causing memory corruption. Change-Id: Ida0b9f189175358ae54149d0e1af7caa06ae3bec Reviewed-on: http://gerrit.cloudera.org:8080/7394 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2017-07-19 06:48:06 +00:00
Michael Ho	f15589573b	IMPALA-5376: Loads all TPC-DS tables This change loads the missing tables in TPC-DS. In addition, it also fixes up the loading of the partitioned table store_sales so all partitions will be loaded. The existing TPC-DS queries are also updated to use the parameters for qualification runs as noted in the TPC-DS specification. Some hard-coded partition filters were also removed. They were there due to the lack of dynamic partitioning in the past. Some missing TPC-DS queries are also added to this change, including query28 which discovered the infamous IMPALA-5251. Having all tables in TPC-DS available paves the way for us to include all supported TPCDS queries in our functional testing. Due to the change in the data, planner tests and the E2E tests have different results than before. The results of E2E tests were compared against the run done with Netezza and Vertica. The divergence were all due to the truncation behavior of decimal types in DECIMAL_V1. Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Reviewed-on: http://gerrit.cloudera.org:8080/6877 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-27 05:19:53 +00:00
Attila Jeges	59b2db6ba7	IMPALA-3079: Fix sequence file writer This change fixes the following issues in the Sequence File Writer: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala created corrupt uncompressed sequence files. 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala. 3. Impala created record-compressed sequence files with empty keys block. As a result, Hive could not read back record-compressed sequence files created by Impala. 4. Impala created block-compressed files with: - empty key-lengths block - empty keys block - empty value-lengths block This resulted in invalid block-compressed sequence files that Hive could not read back. 5. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record- compressed sequence files created by Impala. 6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 7. The calculation of block sizes in SnappyBlockCompressor class was incorrect for odd-length buffers. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Reviewed-on: http://gerrit.cloudera.org:8080/6107 Reviewed-by: Michael Ho <kwho@cloudera.com> Reviewed-by: Attila Jeges <attilaj@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins	2017-04-25 21:07:53 +00:00
Victor Bittorf	4339133887	Adding SEQUENCEFILE compressed record format Currently we do not support per record compression for SEQUENCEFILE; we do support no compression and block compression. Per record compression is typically very slow (since the compressor is invoked per record in the table) and not widely used. We chose to add support for per record compression as part of our effort to use Impala for all of our testdata loading infrastructure. We have per record compressed tables in testdata, so even though there is no customer demand for per record compression, we need it to migrate our data loading off of Hive. Change-Id: I6ea98ae0d31cceff8236b4b006c3a9fc00f64131 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5302 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins (cherry picked from commit f62a76f8d00b8dbc2846deb36ee5f65031ad846e) Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5322	2014-11-19 17:21:36 -08:00
Victor Bittorf	3f75bd6735	Reintroduce SEQUENCEFILE writer tests The sequence writer test had an issue with zlib on certain cluster machines, making this a flaky test. This has passed several times locally and in private builds. This re-enables the test because the failures could not be produced in private builds. Change-Id: I0aeea3a2d000e711e5a84427a7b40592e1eef75b Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5077 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-11-17 11:19:16 -08:00
Victor Bittorf	dbaf718221	IMPALA-1185: Make Avro and Seq writers unsupported Avro and Sequence writers are only available if query option ALLOW_UNSUPPORTED_FORMATS is set to true, prints an error otherwise. Change-Id: I597039f7c68f708fda10f848531eb557d6910f92 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4539 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-09-26 12:28:03 -07:00
Nong Li	d52a620737	Add support for writing compressed text. Change-Id: I314b925594801ae4b5c47248d998801aa0b37270 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4205 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-09-07 22:08:30 -07:00
Victor Bittorf	f2ef06bef1	SEQUENCEFILE: Add support for writing sequence files. This supports both uncompressed and block compressed formats. Row compressed formats are not supported. The type of compression is specified using a query parameter COMPRESSION_CODEC with values NONE, GZIP, BZIP2, and SNAPPY. Note: this patch only has basic testing. More extensive testing will be done when this avro writer is used in data loading. Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541 Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com> Tested-by: jenkins	2014-08-17 12:45:10 -07:00

8 Commits
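
Note on IMPALA-3079, item 1: the commit says ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken, but the log does not show the encoding itself. The sketch below illustrates the variable-length long layout that Hadoop's WritableUtils.writeVLong is generally understood to produce, which sequence file readers such as Hive expect. It is not Impala's code. The function names vlong_required_bytes and put_vlong are hypothetical stand-ins chosen for this example, and the layout is stated as an assumption: values in [-112, 127] occupy one byte; otherwise a marker byte encodes the sign and the payload length, followed by the magnitude in big-endian order.

```python
def vlong_required_bytes(value: int) -> int:
    """Number of bytes the assumed Hadoop-style VLong encoding needs."""
    if -112 <= value <= 127:
        return 1
    if value < 0:
        value = ~value  # one's complement, as in the assumed Hadoop layout
    payload = 0
    while value:
        value >>= 8
        payload += 1
    return payload + 1  # one marker byte plus the payload bytes


def put_vlong(value: int) -> bytes:
    """Encode a signed 64-bit integer as a Hadoop-style VLong (sketch)."""
    if -112 <= value <= 127:
        # Small values are stored directly in a single signed byte.
        return bytes([value & 0xFF])
    marker = -112  # base marker for positive values
    if value < 0:
        value = ~value
        marker = -120  # base marker for negative values
    payload = 0
    tmp = value
    while tmp:
        tmp >>= 8
        payload += 1
    marker -= payload  # the marker also records the payload length
    return bytes([marker & 0xFF]) + value.to_bytes(payload, "big")


if __name__ == "__main__":
    for v in (0, 127, 128, -112, -113, 1 << 40):
        print(v, put_vlong(v).hex(), vlong_required_bytes(v))
```

A writer that computes the required size and the emitted bytes separately has to keep the two in agreement for every value; a mismatch there is one way a writer could end up producing files that other readers reject.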