Mirror of https://github.com/apache/impala.git (synced 2025-12-19 09:58:28 -05:00)
As proposed in Jira, this implements decoding and encoding of text buffers for Impala/Hive text tables. Given a table with the 'serialization.encoding' property set, Impala should, similarly to Hive, encode inserted data into the specified charset before saving it to a text file. The opposite decoding operation is performed when reading data buffers from text files. Both operations use the boost::locale::conv library.

Since Hive doesn't encode line delimiters, charsets that would store delimiters differently from ASCII are not allowed.

One difference from Hive is that Impala implements 'serialization.encoding' only as a per-partition serdeproperty, to avoid the confusion of allowing both serde and table properties. (See related IMPALA-13748.)

Note: Because the patch contains precreated non-UTF-8 files, 'gerrit-code-review-checks' was run locally. (See IMPALA-14100.)

Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Reviewed-on: http://gerrit.cloudera.org:8080/22049
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
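
The delimiter restriction and the encode-on-write / decode-on-read round trip described above can be illustrated with a minimal sketch. This is not the patch's code: it uses Python's standard codecs as a stand-in for boost::locale::conv, and the helper names (`is_delimiter_compatible`, `encode_for_write`, `decode_on_read`) as well as the default delimiter set are hypothetical, chosen only for illustration.

```python
import codecs

# Hypothetical delimiter set for illustration: newline, carriage return,
# tab, comma, and Ctrl-A (Hive's default field delimiter).
DEFAULT_DELIMITERS = "\n\r\t,\x01"


def is_delimiter_compatible(charset: str, delimiters: str = DEFAULT_DELIMITERS) -> bool:
    """Return True if every delimiter encodes to the same bytes as in ASCII.

    Charsets that fail this check would store delimiters differently from
    ASCII, which is the case the commit message says is not allowed.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        return False  # unknown charset name
    for ch in delimiters:
        try:
            if ch.encode(charset) != ch.encode("ascii"):
                return False
        except UnicodeEncodeError:
            return False
    return True


def encode_for_write(text: str, charset: str) -> bytes:
    """Encode data into the target charset, as would happen on insert."""
    return text.encode(charset)


def decode_on_read(buf: bytes, charset: str) -> str:
    """Decode a buffer read from a text file back into Unicode text."""
    return buf.decode(charset)


if __name__ == "__main__":
    for name in ("gbk", "latin-1", "utf-16", "cp037"):
        print(name, is_delimiter_compatible(name))
    row = "你好,world\n"
    stored = encode_for_write(row, "gbk")
    print(decode_on_read(stored, "gbk") == row)
```

In this sketch, single-byte-compatible charsets such as GBK or Latin-1 pass the check, while UTF-16 and the EBCDIC code page cp037 fail because their newline bytes differ from ASCII. Which charsets Impala actually accepts is determined by the patch itself, not by this approximation.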