12 Commits

Author SHA1 Message Date
Tim Armstrong
6311f39cd4 IMPALA-5591: set should handle negative values
The parser didn't account for the possibility of negative
numeric literals.
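
A minimal sketch of the kind of fix described above, assuming a simplified
"SET <name>=<value>" grammar. The regular expression, the parse_set() helper,
and the option names in the usage lines are illustrative assumptions, not
Impala's actual parser. The point is only that the value pattern must allow
an optional leading sign.

```python
import re

# Hypothetical, simplified grammar: SET <name> = <value> [;]
# The value is either a number with an optional sign or a bare word.
_SET_RE = re.compile(r"^\s*set\s+(\w+)\s*=\s*([-+]?\d+|\w+)\s*;?\s*$",
                     re.IGNORECASE)


def parse_set(stmt):
    """Split a SET statement into an upper-cased key and a raw value string."""
    match = _SET_RE.match(stmt)
    if match is None:
        raise ValueError(f"cannot parse SET statement: {stmt!r}")
    return match.group(1).upper(), match.group(2)


# Without the optional sign in the pattern, this statement would be rejected.
negative = parse_set("set mem_limit=-1")
```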

Testing:
Added a test that sets a negative value. Query tests send the whole
"set" statement to the backend for execution, so they exercise the parser.

Ran core tests.

Change-Id: I5c415dbed6ba1122919be75f5811444d88ee03b4
Reviewed-on: http://gerrit.cloudera.org:8080/7316
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-29 05:26:56 +00:00
Tim Armstrong
95ed4434f2 IMPALA-3202,IMPALA-2079: rework scratch file I/O
Refactor BufferedBlockMgr/TmpFileMgr to push more I/O logic into
TmpFileMgr, in anticipation of it being shared with BufferPool.
TmpFileMgr now handles:
* Scratch space allocation and recycling
* Read and write I/O

The interface is also greatly changed so that it is built around Write()
and Read() calls, abstracting away the details of temporary file
allocation from clients. This means the TmpFileMgr::File class can
be hidden from clients.

Write error recovery:
Also implement write error recovery in TmpFileMgr.

If an error occurs while writing to scratch and we have multiple
scratch directories, we will try one of the other directories
before cancelling the query. File-level blacklisting is used to
prevent excessive repeated attempts to resize a scratch file during
a single query. Device-level blacklisting is not implemented because
it is problematic to permanently take a scratch directory out of use.
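
The retry behaviour described above can be sketched roughly as follows. This
is a toy model, not TmpFileMgr: the ScratchWriter class, the scratch.bin file
name, and the synchronous write() call are all assumptions made for
illustration (the patch itself handles I/O errors asynchronously).

```python
import os
import tempfile


class ScratchWriter:
    """Toy model: one scratch file per directory, with file-level blacklisting."""

    def __init__(self, directories):
        # Hypothetical layout: a single file named scratch.bin per directory.
        self.paths = [os.path.join(d, "scratch.bin") for d in directories]
        self.blacklisted = set()

    def write(self, data):
        """Append data to the first usable file; blacklist files that fail."""
        for path in self.paths:
            if path in self.blacklisted:
                continue  # do not retry a file that already failed
            try:
                with open(path, "ab") as f:
                    f.write(data)
                return path
            except OSError:
                self.blacklisted.add(path)
        raise RuntimeError("no usable scratch file left; cancel the query")


good_dir = tempfile.mkdtemp()
missing_dir = os.path.join(good_dir, "does-not-exist")
writer = ScratchWriter([missing_dir, good_dir])
used = writer.write(b"spilled rows")  # first directory fails, second succeeds
```

Only the failing file is blacklisted here, so the directory itself stays
available, which mirrors the decision not to blacklist at the device level.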

To reduce the number of error paths, all I/O errors are now handled
asynchronously. Previously errors creating or extending the file were
returned synchronously from WriteUnpinnedBlock(). This required
modifying DiskIoMgr to create the file if not present when opened.

Also set the default max_errors value in the thrift definition file,
so that it is in effect for backend tests.

Future Work:
* Support for recycling variable-length scratch file ranges. I omitted
  this to avoid making the patch even larger.

Testing:
Updated BufferedBlockMgr unit test to reflect changes in behaviour:
* Scratch space is no longer permanently associated with a block, and
  is remapped every time a new block is written to disk.
* Files are now blacklisted - updated existing tests and enabled the
  disabled blacklisting test.

Added some basic testing of recycling of scratch file ranges in
the TmpFileMgr unit test.

I also manually tested the code in two ways. First by removing permissions
for /tmp/impala-scratch and ensuring that a spilling query fails cleanly.
Second, by creating a tiny ramdisk (16M) and running with two scratch
directories: one on /tmp and one on the tiny ramdisk. When spilling, an
out-of-space error is encountered for the tiny ramdisk and Impala spills
the remaining data (72M) to /tmp.

Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Reviewed-on: http://gerrit.cloudera.org:8080/5141
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-05 02:26:24 +00:00
Matthew Jacobs
f067929f3a IMPALA-3535: Ignore invalid per-pool default query options
In 2.5 we added the ability to set per-pool default query
options. A string of key-value pairs can be specified with a
pool configuration. However, if any options fail to parse,
then all the options are ignored. We want that behavior (and
returning an error) when parsing the process-wide default
query options on startup and when parsing the options sent
from a client (e.g. in beeswax server) because an error can
be returned immediately for the triggering action at that
time (i.e. starting the impalad or submitting a query with
the options set). This behavior is bad for the pool default
query options because (a) the configuration is set by the
administrator and there's nothing we can do until a query is
submitted and (b) one invalid option shouldn't mean that
other valid options aren't set.
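
A minimal sketch of the lenient behaviour described above, assuming the pool
defaults arrive as a comma-separated "key=value" string. The
parse_pool_defaults() helper, the KNOWN table, and the separator are
assumptions for illustration; the commit message only says "a string of
key-value pairs".

```python
def parse_pool_defaults(options_str, known_options):
    """Return (valid, errors); invalid entries do not discard valid ones."""
    valid, errors = {}, []
    entries = [e.strip() for e in options_str.split(",") if e.strip()]
    for entry in entries:
        key, sep, value = entry.partition("=")
        key = key.strip().upper()
        parser = known_options.get(key)
        if not sep or parser is None:
            errors.append(f"ignoring invalid option: {entry!r}")
            continue
        try:
            valid[key] = parser(value.strip())
        except ValueError as exc:
            errors.append(f"ignoring {entry!r}: {exc}")
    return valid, errors


# Hypothetical option table mapping option names to value parsers.
KNOWN = {"MEM_LIMIT": int, "NUM_NODES": int}
valid, errors = parse_pool_defaults("mem_limit=1024,bogus=1,num_nodes=abc", KNOWN)
```

A strict caller (process startup or a client request) could instead fail as
soon as errors is non-empty, which matches the behaviour the message wants to
keep for those two paths.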

Change-Id: If04733b775963091b0314c65286df126fd812358
Reviewed-on: http://gerrit.cloudera.org:8080/3056
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-17 10:09:05 -07:00
Michael Ho
cbcda93dfb IMPALA-3334: Fix some bugs in query options' parsing.
This change fixes two problems:

1. The query options OPTIMIZE_PARTITION_KEY_SCANS and
   DISABLE_STREAMING_PREAGGREGATIONS are both boolean
   so they should accept 'true' and '1' as input values.
   Previously, these two options were treated as int, so a
   value such as 'true' did not work with them.

2. The break statement in the case statement of the option
   SCAN_NODE_CODEGEN_THRESHOLD was 'stolen' by the option
   DISABLE_STREAMING_PREAGGREGATIONS when it was added.
   This change adds the missing break statement back for
   SCAN_NODE_CODEGEN_THRESHOLD.
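
A small sketch of the first fix, assuming case-insensitive matching and that
'false' and '0' are the accepted negative values. The commit message only
names 'true' and '1', so both of those details and the parse_bool_option()
name are assumptions.

```python
def parse_bool_option(value):
    """Parse a boolean query option from its string form."""
    normalized = value.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):  # assumed counterparts of 'true' and '1'
        return False
    raise ValueError(f"invalid boolean option value: {value!r}")


# An int-only parser would reject 'true'; this one accepts both spellings.
flags = [parse_bool_option("true"), parse_bool_option("1")]
```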

Change-Id: I5c74a1e5c49e3bda15a91b40740fc7310303207b
Reviewed-on: http://gerrit.cloudera.org:8080/2776
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:31 -07:00
Michael Ho
968c61c940 IMPALA-2824: Restore query options after each test.
A failed test case inside a test file will leave the rest of
the test cases in the file unexecuted. Some test cases may
modify some query options such as memory limit and then
restore them in the subsequent test cases in the same file.
The failure of those test cases will leave the query options
modified, causing cascading failures to other test cases
which aren't expected to be run with the modified query
options (e.g. lowered memory limit). This problem may lead
to broken builds which are recorded in IMPALA-2724 and
IMPALA-2824.

This change fixes the problem above by checking if a test
case modifies any query option and, if so, restoring those
modified query options to their default values. This change
makes the assumption that a test should not modify an option
specified in its test vector so it's safe to restore the
modified query options to their default values.
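
The restore step can be sketched as a context manager that runs even when the
test body fails. This is an illustration under stated assumptions, not the
test framework's code: restored_options(), the plain dict of session options,
and the DEFAULTS table are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def restored_options(session_options, defaults):
    """Reset any option changed inside the block to its default value."""
    before = dict(session_options)
    try:
        yield session_options
    finally:
        # Runs on success and on failure, so a failed test cannot leak state.
        for key in list(session_options):
            if session_options[key] != before.get(key):
                session_options[key] = defaults[key]


DEFAULTS = {"MEM_LIMIT": 0, "BATCH_SIZE": 0}  # hypothetical defaults
options = dict(DEFAULTS)
try:
    with restored_options(options, DEFAULTS):
        options["MEM_LIMIT"] = 1  # test lowers the memory limit ...
        raise AssertionError("simulated test failure")  # ... and then fails
except AssertionError:
    pass
```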

Change-Id: Ib88d1dcb6a65183e1afc8eef0c764179a9f6a8ce
Reviewed-on: http://gerrit.cloudera.org:8080/1774
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-01-26 03:13:05 +00:00
Alex Behm
ecdd5688b9 Nested Types: Tuple pointers are owned by the containing RowBatch by default.
This patch makes the ownership of the memory backing the tuple pointers of
a RowBatch dependent on whether the legacy joins and aggs are enabled:

By default, the memory is malloc'd and owned by the RowBatch:
If enable_partitioned_hash_join=true and enable_partitioned_aggregation=true
then the memory is owned by the RowBatch and is freed upon its destruction.
This mode is more performant especially with SubplanNodes in the ExecNode tree
because the tuple pointers are not transferred and do not have to be re-created
in every Reset().

Memory is allocated from MemPool:
Otherwise, the memory is allocated from the RowBatch's tuple pool. As a result,
the pointer memory is transferred just like tuple data, and must be re-created
in Reset(). This mode is required for the legacy join and agg which rely on the
tuple pointers being allocated from the RowBatch's tuple pool, so they can
acquire ownership of the tuple pointers.

Performance impact for nested types:
Initial cluster runs and profiling on nested TPCH identified excessive
malloc/frees as a major performance bottleneck. This change paves the way
for further optimizations which yielded a 2x improvement in response time
for most nested TPCH queries.

Change-Id: I4ac58b18058ce46b4db89fbe117b0bcad19e9ee7
Reviewed-on: http://gerrit.cloudera.org:8080/807
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-09-14 13:43:01 -07:00
Vlad Berindei
452ebee59d IMPALA-1906: PARQUET_FILE_SIZE query option overflows for values >= 2GB.
The value of PARQUET_FILE_SIZE overflows when RoundUp() is called because this function
returns an int32. Even with this change, this value will still overflow when calling the
HDFS API since it is passed to hdfsOpenFile() as blocksize, which is an int32 parameter
(see HDFS-8949).

Changes:
- Return an error if PARQUET_FILE_SIZE is set to a value greater than or equal to 2GB.
  - If PARQUET_FILE_SIZE is set in an Impala session to a value greater than or equal to
    2GB, then every query will fail with an error message.
  - If PARQUET_FILE_SIZE is changed to a value greater than or equal to 2GB as an impalad
    argument, impalad will log an error and not start.
- Ceil(), RoundUp(), RoundDown() return int64.
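
The overflow can be illustrated by simulating 32-bit two's-complement
wraparound with ctypes. This is a sketch only: the helper names, the rounding
factor of 4096, and the error text are assumptions, and the real functions
are C++ where the exact overflow behaviour depends on the build.

```python
import ctypes

TWO_GB = 2 * 1024 ** 3


def round_up_int32(value, factor):
    """Round up to a multiple of factor, truncating the result to 32 bits."""
    rounded = (value + factor - 1) // factor * factor
    return ctypes.c_int32(rounded).value  # wraps instead of raising


def round_up_int64(value, factor):
    """Same computation with a 64-bit result, as after the change."""
    rounded = (value + factor - 1) // factor * factor
    return ctypes.c_int64(rounded).value


def check_parquet_file_size(size_bytes):
    """Reject sizes that would still overflow the int32 HDFS block size."""
    if size_bytes >= TWO_GB:
        raise ValueError("PARQUET_FILE_SIZE must be less than 2GB")
    return size_bytes


wrapped = round_up_int32(TWO_GB, 4096)  # negative after wraparound
widened = round_up_int64(TWO_GB, 4096)  # stays at 2GB
```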

Change-Id: Ie4f2551b72954e2a57db5594e4789e3f7434d578
Reviewed-on: http://gerrit.cloudera.org:8080/678
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Vlad Berindei <vlad.berindei@cloudera.com>
Tested-by: Internal Jenkins
2015-08-25 23:28:13 +00:00
Alex Behm
f696861c5c Throw error on unrecognized test sections.
Our .test file parser previously did not abort a test when it
encountered a malformed test or section. This patch changes that
behavior to report an error and treat the test as failed.
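
A rough sketch of strict section handling, assuming test cases are separated
by "====" lines and sections start with "---- NAME". The KNOWN_SECTIONS set
and parse_test_file() are illustrative assumptions, not the real parser.

```python
# Hypothetical set of section names the parser accepts.
KNOWN_SECTIONS = {"QUERY", "RESULTS", "TYPES"}


def parse_test_file(text):
    """Return a list of {section: lines} dicts; fail on unknown sections."""
    tests, current, section = [], {}, None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("===="):
            if current:
                tests.append(current)
            current, section = {}, None
        elif line.startswith("---- "):
            section = line[5:].strip()
            if section not in KNOWN_SECTIONS:
                # Fail loudly rather than silently skipping the test case.
                raise ValueError(
                    f"line {lineno}: unrecognized section {section!r}")
            current[section] = []
        elif section is not None:
            current[section].append(line)
    if current:
        tests.append(current)
    return tests


GOOD = "====\n---- QUERY\nselect 1\n---- RESULTS\n1\n===="
parsed = parse_test_file(GOOD)
```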

Quite a few tests were not well-formed, and were not executed
as a result. This patch fixes those tests.

Arguably, the test file parser should be more flexible in which places
to accept comments, but this patch does not address that problem.

Change-Id: If53358eb0cb958b68e51940b071e64c1d6c3ec6f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5468
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-12-02 18:08:09 -08:00
Nong Li
d52a620737 Add support for writing compressed text.
Change-Id: I314b925594801ae4b5c47248d998801aa0b37270
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4205
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-07 22:08:30 -07:00
Victor Bittorf
f2ef06bef1 SEQUENCEFILE: Add support for writing sequence files.
This supports both uncompressed and block compressed formats. Row compressed formats are
not supported. The type of compression is specified using a query parameter
COMPRESSION_CODEC with values NONE, GZIP, BZIP2, and SNAPPY.

Note: this patch only has basic testing. More extensive testing will be done when this
writer is used in data loading.

Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-08-17 12:45:10 -07:00
Dan Hecht
09bd8b7c27 Fix SetStmt.toSql().
It needs to handle the "SET" case.  Also, add some missing test cases
for "SET".  Also, clean up test_set/set.test.

Change-Id: I34f6005ef17e196d94366e5301251a2987746fbf
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3620
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 41890b5a13f9429f058fb12453c78323df11fc7d)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3655
2014-07-30 11:37:11 -07:00
Dan Hecht
1fee56cb26 IMPALA-1080: Implement "SET <query_option>" as SQL statement.
Also add support for "SET", which returns a table of query options and
their respective values.

The front-end parses the option into a (key, value) pair and then the
existing backend logic is used to set the option, or return the result
sets.
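
The two statement forms can be sketched as follows, assuming the front-end
hands over either nothing (bare "SET") or a (key, value) pair. The
execute_set() helper, the dict of options, and the sorted row order are
assumptions for illustration only.

```python
def execute_set(key_value, options):
    """Apply SET <key>=<value>, or list all options for a bare SET."""
    if key_value is None:
        # Bare SET: return one (option, value) row per query option.
        return sorted(options.items())
    key, value = key_value
    options[key.upper()] = value
    return []


session = {"MEM_LIMIT": "0"}
execute_set(("num_nodes", "1"), session)  # SET num_nodes=1
rows = execute_set(None, session)         # SET
```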

Change-Id: I40dbd98537e2a73bdd5b27d8b2575a2fe6f8295b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3582
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
Tested-by: jenkins
(cherry picked from commit aa0f6a2fc1d3fe21f22cc7bc56887e1fdb02250b)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3614
2014-07-25 10:25:09 -07:00