Commit Graph

68 Commits

Author SHA1 Message Date
Adam Tamas
7295edcc26 IMPALA-9680: Fixed compressed inserts failing
Modified the insert test files so that they determine the database to
use for 'CREATE TABLE LIKE' dynamically.

Tests:
Ran targeted exhaustive test runs for test_insert.py and
test_mt_dop.py, plus a full exhaustive run.

Change-Id: Ib3c7ba02190f57a7ed40311c95a3dd9eca9b474d
Reviewed-on: http://gerrit.cloudera.org:8080/15816
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2020-05-11 19:32:08 +00:00
Adam Tamas
02d84dcf50 IMPALA-9665: Fixed database not found errors in query_test.test_insert
Fixed the usage of the unique_database fixture in test_insert.py so the
tests wait until the database is synced.
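
The waiting pattern is roughly the following hedged sketch; the client
API and polling interval are illustrative, not the exact fix:

  import time

  def wait_for_db(client, db_name, timeout_s=60):
      # Poll until the database is visible before running tests in it.
      deadline = time.time() + timeout_s
      while time.time() < deadline:
          if db_name in client.execute("show databases").get_data():
              return
          time.sleep(0.2)
      raise TimeoutError("database %s never became visible" % db_name)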

Testing:
-tests/run-tests.py query_test/test_insert.py --exploration_strategy=exhaustive

Change-Id: I9b7aa3775dd4375f536d76f2e236ce126f8c78cd
Reviewed-on: http://gerrit.cloudera.org:8080/15766
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-21 06:12:56 +00:00
Adam Tamas
c32849a391 IMPALA-8980: Remove functional*.alltypesinsert from EE tests
-Modified 'test_insert.py' so the tests can run in parallel (see the
sketch below).
  -Every test creates its own temporary tables for insert testing.
-Replaced the SETUP tags with TRUNCATE TABLE statements in the QUERY
  sections.
  -Because the SETUP tag is not used anymore, the corresponding
  code was removed.
-Fixed a test query in 'insert.test'. The test was incorrect, so it was
modified to check the right behavior.
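
A minimal sketch of the per-test table pattern, assuming the standard
unique_database fixture and ImpalaTestSuite-style helpers (names
approximate):

  # Each test works in its own scratch database, so parallel runs never
  # touch a shared table.
  def test_insert_parallel_safe(self, unique_database):
      tbl = "%s.insert_target" % unique_database
      self.execute_query("create table %s (i int)" % tbl)
      self.execute_query("truncate table %s" % tbl)  # replaces SETUP
      self.execute_query("insert into %s values (1)" % tbl)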

Testing:
-tests/run-tests.py query_test/test_insert.py
-impala-py.test tests/query_test/test_insert.py
-the same for test_insert_permutation.py and test_load.py

Change-Id: I257e936868917a2fcc6c030f6c855b247e8a0eea
Reviewed-on: http://gerrit.cloudera.org:8080/15529
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-14 12:18:21 +00:00
Sahil Takiar
8b8a49e617 IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames
Writes to text tables on ABFS are failing because HADOOP-15860 recently
changed the ABFS behavior when writing files / folders that end with a
'.'. ABFS explicitly does not allow files / folders that end with a dot.
From the ABFS docs: "Avoid blob names that end with a dot (.), a forward
slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots
when writing files or folders, but that can lead to various issues
because clients may try to read back a file that should exist on ABFS,
but doesn't. HADOOP-15860 changed the behavior so that any attempt to
write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd
behavior in hdfs-table-sink.cc. The table sink writes files with
a "file extension" which is dependent on the file type. For example,
Parquet files have a file extension of ".parq". For some reason, text
files had no file extension, so Impala would try to write text files of
the following form:
"244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0.".

Several tables created during dataload, such as alltypes, already use
the '.txt' extension for their files. These tables are not created via
Impala's INSERT code path; their files are copied into the table.
However, several other tables created during dataload, such as
alltypesinsert, are populated via Impala. This patch changes the files
in these tables so that they end in '.txt'.

This patch adds the ".txt" extension to all written text files and
modifies the hdfs-table-sink.cc so that it doesn't add a trailing dot to
a filename if there is no file extension.
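
A Python rendering of the naming rule (the actual logic lives in
hdfs-table-sink.cc; the names here are illustrative):

  # Append the extension only when one exists, so filenames never end
  # with a trailing dot.
  def output_file_name(prefix, seq, extension):
      name = "%s_data.%d" % (prefix, seq)  # e.g. "244c5ee8..._data.0"
      return name + "." + extension if extension else name

  output_file_name("q1", 0, "parq")  # -> "q1_data.0.parq"
  output_file_name("q1", 0, "txt")   # -> "q1_data.0.txt"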

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for
Parquet and text tables
* Manually validated that without the addition of the '.txt' file
extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Reviewed-on: http://gerrit.cloudera.org:8080/14621
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-06 05:49:31 +00:00
Sahil Takiar
e8fda1f224 IMPALA-9117, IMPALA-7726: Fixed a few unit tests for ABFS
This patch makes the following changes / fixes when running Impala tests
on ABFS:
* Skips some tests in test_lineage.py that don't work on ABFS / ADLS
(they were already skipped for S3)
* Skips some tests in test_mt_dop.py; the test creates a directory that
ends with a period (and ABFS does not support writing files or
directories that end with a period)
* Removes the ABFS skip flag SkipIfABFS.trash (IMPALA-7726: "Drop with
purge tests fail against ABFS due to trash misbehavior"); I removed
these flags and looped the tests overnight with no failures, so it is
likely that whatever bug was causing this has now been fixed
* Now that HADOOP-15860 has been resolved, the agreed-upon ABFS behavior
is to fail if a client tries to write a file / directory that ends with
a period. I added a new entry to the SkipIfABFS class called
file_or_folder_name_ends_with_period and applied it where necessary
(see the sketch below)
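
A hedged sketch of what such a marker entry might look like; the real
definitions live in the test framework's skip helpers, and the
environment check is an assumption:

  import os
  import pytest

  # Assumption: the target filesystem is advertised via an env var.
  IS_ABFS = os.environ.get("TARGET_FILESYSTEM") == "abfs"

  class SkipIfABFS:
      file_or_folder_name_ends_with_period = pytest.mark.skipif(
          IS_ABFS, reason="ABFS rejects names ending with a period")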

Testing:
* Ran core tests on ABFS

Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285
Reviewed-on: http://gerrit.cloudera.org:8080/14636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-06 05:44:01 +00:00
Tim Armstrong
548106f5e1 IMPALA-8451,IMPALA-8905: enable admission control for dockerised tests
This gives us some additional coverage for using admission
control in a simple but realistic configuration.

What are the implications of this change for test stability and
flakiness?

On one hand, we are adding some more unpredictability
to tests, because they may be queued for an arbitrary amount of
time. On the other, we can prevent queries from contending over
memory. Currently we rely on luck to prevent concurrent queries
from forcing each other out-of-memory.

I think the unpredictability from the queueing is
preferable, because we can generally work around these by
fixing tests that are sensitive to being queued, whereas
contention over memory requires us to use crude workarounds
like forcing tests to execute serially.

Added observability for the configured queue wait time for each pool.
I noticed that I did not have a direct way to observe the effective
value when I set configs. This is IMPALA-8905.

I had to tweak tests in a few ways:
* Tests with large strings needed higher memory limits.
* Hardcoded instances of default-pool had to handle root.default
  as well.
* test_query_mem_limit needed to run without a mem_limit. I
  created a special pool root.no-limits with no memory limits
  to allow that.

Testing:
Ran the dockerised build 5-6 times to flush out flaky tests.

Change-Id: I7517673f9e348780fcf7cd6ce1f12c9c5a55373a
Reviewed-on: http://gerrit.cloudera.org:8080/13942
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-27 01:54:39 +00:00
Zoltan Borok-Nagy
3e9cac0cac IMPALA-8854: fix acid insert tests
test_acid_nonacid_insert has been failing lately. HMS became more
strict about checking the capabilities of its clients, and it seems
the Python client doesn't set any capabilities for itself, so HMS
rejects its attempts to create and drop tables.

Now, instead of using the RESET utility from the e2e test framework
(to drop and re-create tables), the test uses a unique database
and creates the tables through Impala. Different file formats are
exercised with the help of the DEFAULT_FILE_FORMAT query option.
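
An illustrative sketch of the approach (client API names approximate):

  # Exercise several file formats through one statement set by flipping
  # the DEFAULT_FILE_FORMAT query option instead of RESET-ing shared
  # tables.
  for fmt in ["parquet", "textfile"]:
      client.execute("set DEFAULT_FILE_FORMAT=%s" % fmt)
      client.execute("create table %s.t_%s (i int)" % (unique_database, fmt))
      client.execute("insert into %s.t_%s values (1)" % (unique_database, fmt))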

Change-Id: I3a82338a7820d0ee748c961c8656fa3319c3929c
Reviewed-on: http://gerrit.cloudera.org:8080/14064
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-15 13:02:55 +00:00
Tim Armstrong
cef76db392 IMPALA-8854: skip test_acid_insert
The test is failing because of a Hive version change in some
configurations. Disabling for now until it can be fixed.

Change-Id: I3bc5cce8b9c3843b5bb8ac4d29b2219411f671b4
Reviewed-on: http://gerrit.cloudera.org:8080/14056
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2019-08-14 09:27:14 +00:00
Csaba Ringhofer
a0c00e508f Bump CDP_BUILD_NUMBER to 1318335
The main reason for bumping is to include HIVE-21838.
Also skips / fixes some tests.

Change-Id: I432e8c02dbd349a3507bfabfef2727914537652c
Reviewed-on: http://gerrit.cloudera.org:8080/14005
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-08 15:29:13 +00:00
Zoltan Borok-Nagy
48bb93d474 IMPALA-8636: fix flakiness of ACID INSERT tests
I had to add @UniqueDatabase.parametrize(sync_ddl=True) to some e2e
tests because they were broken in exhaustive mode. When the tests run
with sync_ddl=True, the test files are executed against multiple
impalads, which means that each statement in the .test file is executed
against a random impalad.
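
A usage sketch; the decorator comes from the e2e framework, and the
test body shown is illustrative:

  @UniqueDatabase.parametrize(sync_ddl=True)
  def test_acid_insert(self, vector, unique_database):
      self.run_test_case('QueryTest/acid-insert', vector,
                         use_db=unique_database)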

Change-Id: Ic724e77833ed9ea58268e1857de0d33f9577af8b
Reviewed-on: http://gerrit.cloudera.org:8080/13966
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-01 17:44:26 +00:00
Zoltan Borok-Nagy
6360657cb4 IMPALA-8636: Implement INSERT for insert-only ACID tables
This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set for the HDFS table sinks. The sinks write
the files at their final destination, which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTs.

When the sinks have finished writing the data, the Coordinator invokes
updateCatalog() on catalogd, which also commits the transaction if
everything went well; otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but it would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Reviewed-on: http://gerrit.cloudera.org:8080/13559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-27 13:45:51 +00:00
Abhishek
97a6a3c807 IMPALA-8617: Add support for lz4 in parquet
A new enum value LZ4_BLOCKED was added to the THdfsCompression enum to
distinguish it from the existing LZ4 codec. The LZ4_BLOCKED codec
represents the block compression scheme used by Hadoop. It's similar to
SNAPPY_BLOCKED as far as the block format is concerned, with the only
difference being the codec used for compression and decompression.

Added Lz4BlockCompressor and Lz4BlockDecompressor classes for
compressing and decompressing parquet data using Hadoop's
lz4 block compression scheme.

The Lz4BlockCompressor treats the input
as a single block and generates a compressed block with the following
layout:
  <4 byte big endian uncompressed size>
  <4 byte big endian compressed size>
  <lz4 compressed block>
The HDFS Parquet table writer should call the Lz4BlockCompressor
with the ideal input size (the unit of compression in Parquet is a
page), so the Lz4BlockCompressor does not further break down the input
into smaller blocks.

The Lz4BlockDecompressor, on the other hand, should be compatible with
blocks written by Impala and other engines in the Hadoop ecosystem. It
can decompress data in the following format:
  <4 byte big endian uncompressed size>
  <4 byte big endian compressed size>
  <lz4 compressed block>
  ...
  <4 byte big endian compressed size>
  <lz4 compressed block>
  ...
  <repeated until the uncompressed size from the outer block is consumed>
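
A hedged Python sketch of the layout, using the third-party 'lz4'
package in raw block mode (the real implementation is C++ in the
backend; the decoder below handles only the single-inner-block case
that the compressor emits):

  import struct
  import lz4.block

  def lz4_blocked_compress(data):
      # One inner block: outer uncompressed size, then <clen><block>.
      comp = lz4.block.compress(data, store_size=False)
      return struct.pack(">II", len(data), len(comp)) + comp

  def lz4_blocked_decompress(buf):
      total, clen = struct.unpack_from(">II", buf, 0)
      # Real readers loop over inner <clen><block> pairs until 'total'
      # bytes have been produced; one pair suffices for our writer.
      return lz4.block.decompress(buf[8:8 + clen], uncompressed_size=total)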

Externally users can now set the lz4 codec for parquet using:
  set COMPRESSION_CODEC=lz4
This gets translated into LZ4_BLOCKED codec for the
HdfsParquetTableWriter. Similarly, when reading lz4 compressed parquet
data, the LZ4_BLOCKED codec is used.

Testing:
 - Added unit tests for LZ4_BLOCKED in decompress-test.cc
 - Added unit tests for Hadoop compatibility in decompress-test.cc,
   basically being able to decompress an outer block with multiple inner
   blocks (the Lz4BlockDecompressor description above)
 - Added interoperability tests for Hive and Impala for all parquet
   codecs. New test added to
   tests/custom_cluster/test_hive_parquet_codec_interop.py

Change-Id: Ia6850a39ef3f1e0e7ba48e08eef1d4f7cbb74d0c
Reviewed-on: http://gerrit.cloudera.org:8080/13582
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 04:43:43 +00:00
Abhishek
51e8175c62 IMPALA-8450: Add support for zstd in parquet
Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.

Class ZstandardCompressor/ZstandardDecompressor was added to provide
interfaces for calling ZSTD_compress/ZSTD_decompress functions. Zstd
supports different compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since the
negative values represent uncompressed data they won't be supported. The default
clevel is ZSTD_CLEVEL_DEFAULT.

HdfsParquetTableWriter was updated to support the ZSTD codec. The
new codecs can be set using the existing query option as follows:
  set COMPRESSION_CODEC=ZSTD:<clevel>;
  set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
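
A small round-trip illustration with the Python 'zstandard' package
(not Impala's C++ code path; 3 is zstd's documented default level):

  import zstandard as zstd

  def zstd_roundtrip(data, clevel=3):
      comp = zstd.ZstdCompressor(level=clevel).compress(data)
      return zstd.ZstdDecompressor().decompress(comp)

  assert zstd_roundtrip(b"hello" * 100, clevel=10) == b"hello" * 100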

Testing:
  - Added unit test in DecompressorTest class with ZSTD_CLEVEL_DEFAULT
    clevel and a random clevel. The unit test decompresses compressed
    input data and validates the result. It also tests for
    expected behavior when passing an over/under sized buffer for
    decompressing.
  - Added unit tests for valid/invalid values for COMPRESSION_CODEC.
  - Added an e2e test in test_insert_parquet.py which tests writing and
    reading (null/non-null) data into/from a table (with columns of
    different data types) using multiple codecs. Other existing e2e
    tests were updated to also use the parquet/zstd table format.
  - Manual interoperability tests were run between Impala and Hive.

Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-05 11:15:04 +00:00
Michael Ho
ed0a2b6010 IMPALA-7176: Increase wait time in test_insert_mem_limit
test_insert_mem_limit in test_insert.py will wait for all fragments
to exit before proceeding after the test query hits a memory limit.
On some slower EC2 instances with Centos6, the default wait time of
60s may occasionally cause the test to fail due to timing out while
waiting for all fragments to exit.

This change increases the timeout to 180 seconds.

Testing done:
- Looped test_insert_mem_limit for 100 times on Centos6;

Change-Id: I2e14bef79c6c6fb0004270319f1c491194260438
Reviewed-on: http://gerrit.cloudera.org:8080/13292
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-10 00:38:54 +00:00
Tim Armstrong
23d7a6dce6 IMPALA-8492: reenable large string tests in docker
IMPALA-4865 is fixed so these now pass. I noticed
that the IMPALA-4874 test occasionally hit
"Memory Limit Exceeded" when looped, so I reduced
the data size there slightly.

Testing:
Looped the tests locally against a dockerised minicluster
for a while.

Change-Id: I030f4eff2d3fb771fc92b760efb13170e68285dc
Reviewed-on: http://gerrit.cloudera.org:8080/13233
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-05 00:00:31 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Philip Zeyliger
214f61a180 IMPALA-8250: Clean up JNI warnings.
Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem not to be in our code. The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't releasing them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The timeout also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp	Tue Jan 15 10:43:31 2019 +0000
  +++ b/src/share/vm/prims/jniCheck.cpp	Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +    thisone = true;
  +    fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +    frame fr = os::current_frame();
  +    char buf[6000];
  +    tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id());
  +    print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
     tty->print_cr("WARNING in native method: %s", msg);
     thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
         tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s",
           thr->get_pending_jni_exception_check());
         thr->print_stack();
  +      dump_native_stack(thr, true, 10);
       )
       thr->clear_pending_jni_exception_check(); // Just complain once
     }
   }

  +
  +
   /**
    * Add to the planned number of handles. I.e. plus current live & warning threshold
    */
  @@ -254,9 +276,12 @@
         tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
             live_handles, planned_capacity);
         thr->print_stack();
  +      dump_native_stack(thr, true, 0);
       )
       // Complain just the once, reset to current + warn threshold
       add_planned_handle_capacity(handles, 0);
  +  } else {
  +    dump_native_stack(thr, false, 0);
     }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Reviewed-on: http://gerrit.cloudera.org:8080/12660
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-08 03:35:09 +00:00
poojanilangekar
3a3ab7ff8f IMPALA-6544/IMPALA-7070: Disable tests which fail due to S3's eventual consistency
This patch is a temporary fix to disable tests which fail due to
S3's eventually consistent behavior. The permanent fix would
involve running tests with S3Guard enabled.

Change-Id: I676faa191bec8b156e430661c22ee69242eeba9d
Reviewed-on: http://gerrit.cloudera.org:8080/12203
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-12 03:42:48 +00:00
Tim Armstrong
11a48234ec IMPALA-7648: add tests for OOM from scanning big string
This extends test_insert_large_string to exercise the OOM code path.

Change-Id: I0d1e9b2e8cf6e167da2542950ae90717d4865e9b
Reviewed-on: http://gerrit.cloudera.org:8080/12123
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-27 22:23:40 +00:00
Bikramjeet Vig
cf28d8acbd IMPALA-7994: Prevent test_insert_large_string from causing OOM issues
test_insert_large_string uses up to 4GB of untracked memory, which
results in random OOMs during exhaustive testing on release builds.
Queries run faster on release builds, which might result in a different
set of tests running together compared to debug builds. As a result,
queries requiring more memory can run together with
test_insert_large_string and eventually encounter OOM errors.

Testing:
Successfully ran exhaustive tests twice on release build.

Change-Id: I6c950f6860b2f86865dbc5ce60055175e2c0bebc
Reviewed-on: http://gerrit.cloudera.org:8080/12110
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-19 21:28:04 +00:00
Tim Armstrong
58cd69ac48 IMPALA-402: test for random partitioning in insert
This adds a basic regression test for the bug reported in IMPALA-402.

Testing:
Exhaustive build.

Looped the modified test overnight.

Change-Id: I4bbca5c64977cadf79dabd72f0c8876a40fdf410
Reviewed-on: http://gerrit.cloudera.org:8080/11799
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-06 00:00:12 +00:00
Sean Mackrory
7a022cf36a IMPALA-7681. Add Azure Blob File System (ADLS Gen2) support.
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.

Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-20 06:43:00 +00:00
Tim Armstrong
d05f73f415 IMPALA-7647: Add HS2/Impyla dimension to TestQueries
I used some ideas from Alex Leblang's abandoned patch:
https://gerrit.cloudera.org/#/c/137/ in order to run .test files through
HS2. The advantage of using Impyla is that much of the code will be
reusable for any Python client implementing the standard Python dbapi
and does not require us implementing yet another thrift client.

This gives us better coverage of non-trivial result sets from HS2,
including handling of NULLs, error logs and more interesting result
sets than the basic HS2 tests.

I added HS2 coverage to TestQueries, which has a reasonable variety of
queries and covers the data types in alltypes. I also added
TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage
of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of
result sets with NULLs was limited, so I added a couple of queries.

Places where results differ from Beeswax:
* Impyla is a Python dbapi client, so it must convert timestamps into
  Python datetime objects, which only have microsecond precision.
  Therefore result timestamps with nanosecond precision are truncated.
* The HS2 interface reports the NULL type as BOOLEAN as a workaround for
  IMPALA-914.
* The Beeswax interface reported VARCHAR as STRING, but HS2 reports
  VARCHAR.

I dealt with different results by adding additional result sections so
that the expected differences between the clients/protocols were
explicit.

Limitations:
* Not all of the same methods are implemented as for beeswax, so some
  tests that have more complicated interactions with the client will not
  work with HS2 yet.
* We don't have a way to get the affected row count for inserts.

I also simplified the ImpalaConnection API by removing some unnecessary
methods and moved some generic methods to the base class.

Testing:
* Confirmed that it detected IMPALA-7588 by re-applying the buggy patch.
* Ran exhaustive and CentOS6 tests.

Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e
Reviewed-on: http://gerrit.cloudera.org:8080/11546
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-09 00:45:10 +00:00
Tianyi Wang
21d92aacbf IMPALA-7019: Schedule EC as remote & disable failed tests
This patch schedules HDFS EC files without considering locality. Failed
tests are disabled, and a jenkins build should succeed with export
ERASURE_CODING=true.

Testing: It passes core tests.

Cherry-picks: not for 2.x.

Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-22 01:10:14 +00:00
Joe McDonnell
1e6544f7da IMPALA-7023: Wait for fragments to finish for test_insert.py
The arrangement of tests in test_insert.py changed with
IMPALA-7010, splitting out the memory limit tests into
test_insert_mem_limit(). On exhaustive, the combination
of test dimensions means test_insert_mem_limit() executes
11 different combinations. Each of these statements can
use a large amount of memory and this is not cleaned
up immediately. This has been causing
test_insert_overwrite(), which immediately follows
test_insert_mem_limit(), to hit the process memory limit.

This changes test_insert_mem_limit() to make it wait
for its fragments to finish.

Change-Id: I5642e9cb32dd02afd74dde7e0d3b31bddbee3ccd
Reviewed-on: http://gerrit.cloudera.org:8080/10426
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-16 23:33:57 +00:00
Tim Armstrong
25c13bfdd6 IMPALA-7010: don't run memory usage tests on non-HDFS
Moved a number of tests with tuned mem_limits. In some cases
this required separating the tests from non-tuned functional
tests.

TestQueryMemLimit used very high and very low limits only, so seemed
safe to run in all configurations.

Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662
Reviewed-on: http://gerrit.cloudera.org:8080/10370
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-11 22:41:49 +00:00
Michael Ho
ed72910e96 IMPALA-6262: Always initialize runtime profile for DataSink
This change moves the creation of the runtime profile from DataSink::Prepare()
to the ctor of DataSink derived classes. This makes sure that DataSink::Close()
and other functions can access the profile even if the DataSink fails to initialize.

Testing done: Added a test case which triggers failure in the initialization of output
expressions in a HdfsTableSink. Impalad crashed consistently without the fix.

Change-Id: I2a683000ef180027b929dbebe78bc2a530a4767e
Reviewed-on: http://gerrit.cloudera.org:8080/8770
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-07 09:47:09 +00:00
Tim Armstrong
5b670f49b6 IMPALA-5640: re-enable gzip for parquet insert tests
This addresses a gap in test coverage. There are no known bugs here so
we expect this to work.

Testing:
Ran exhaustive build.

Change-Id: I4bea8bac37bb1e72f3ba0b2e162e6fc544aec8a8
Reviewed-on: http://gerrit.cloudera.org:8080/7398
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-12 00:18:44 +00:00
Lars Volker
933f2ce7fd IMPALA-4876: Remove _test suffix from test names
Now that IMPALA-4735 has been fixed, this suffix is no longer needed to
make test names prefix-free.

Change-Id: Ie63d145c94c02ec67e81d0c51a33d20685fba73e
Reviewed-on: http://gerrit.cloudera.org:8080/5886
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-02-03 05:16:09 +00:00
David Knupp
f590bc0da6 IMPALA-4750: Rename test infra classes so they don't mimic test classes.
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prefix
the class names with 'Impala':

git grep -l 'TestDimension' | xargs \
    sed -i 's/TestDimension/ImpalaTestDimension/g'

git grep -l 'TestMatrix' | xargs \
    sed -i 's/TestMatrix/ImpalaTestMatrix/g'

git grep -l 'TestVector' | xargs \
    sed -i 's/TestVector/ImpalaTestVector/g'

The tests all passed in an exhaustive run on the upstream jenkins
server:

http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/

Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
2017-01-26 23:40:22 +00:00
Lars Volker
8ea21d099f IMPALA-2523: Make HdfsTableSink aware of clustered input
IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test_insert took 1778 seconds, compared to 1002 seconds with just the
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.
- It adds tests to the Analyzer test suite.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Reviewed-on: http://gerrit.cloudera.org:8080/4863
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-11-22 02:51:20 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Taras Bobrovytsky
609b80410e Clean up Python test import statements
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.

Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-07-15 23:26:18 +00:00
Tim Armstrong
3810b7c413 IMPALA-3780: avoid many small reads past end of block
The text scanner had some pathological behaviour when reading
significantly past the end of its scan range, e.g. reading a 256MB
string that's split across blocks. ScannerContext defaulted to issuing
1KB reads, even if the scan node requested significantly more data.
E.g. if the Parquet scanner called ReadBytes(16MB), this was chopped up
into 1KB reads, which were reassembled in boundary_buffer_.

Increase the minimum read size in this case to 64kb. Reading that amount
of data should not have any significant overhead even if we only read
a few bytes past the end of the scan range.

ScannerContext implements a saner default algorithm that will work better
if scanners make many small reads: it starts with 64kb reads and doubles
the size of each successive read past the end of the scan range. We
also correctly pass the 'read_past_size' into GetNextBuffer(), so that
we always read the right amount of data.
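
The schedule itself is simple arithmetic; a Python rendering (the real
code is in ScannerContext, and the names are illustrative):

  MIN_READ_PAST_SIZE = 64 * 1024  # 64KB starting point

  def read_past_size(reads_past_end, requested):
      # 64KB, 128KB, 256KB, ... doubled on each successive read past the
      # end of the scan range, but never less than the scanner asked for.
      return max(MIN_READ_PAST_SIZE << reads_past_end, requested)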

Also save some time by pre-sizing the boundary buffer to the correct
size instead of reallocating it multiple times.

Testing:
Add test case that exercises the code paths for very large strings.

Performance:
The queries in the test case are vastly faster than before. E.g. 0.6s
vs ~60s for the count(*) query.

Change-Id: Id90c5dea44f07dba5dd465cf325fbff28be34137
Reviewed-on: http://gerrit.cloudera.org:8080/3518
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-07-08 19:42:18 -07:00
Sailesh Mukil
ed7f5ebf53 IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
Previously Impala disallowed LOAD DATA and INSERT on S3. This patch
functionally enables LOAD DATA and INSERT on S3 without making major
changes for the sake of improving performance over S3. This patch also
enables both INSERT and LOAD DATA between file systems.

S3 does not support the rename operation, so the staged files in S3
are copied instead of renamed, which contributes to the slow
performance on S3.

The FinalizeSuccessfulInsert() function now does not make any
underlying assumptions about the filesystem it is on and works across
all supported filesystems. This is done by adding a full URI field to
the base directory for a partition in the TInsertPartitionStatus.
Also, the HdfsOp class now does not assume a single filesystem and
gets connections to the filesystems based on the URI of the file it
is operating on.

Added a python S3 client based on 'boto3' to access S3 from the python
tests. A new class called S3Client is introduced which creates
wrappers around the boto3 functions and has the same function
signatures as PyWebHdfsClient, deriving from an abstract base class
BaseFileSystem so that the clients can be used interchangeably through
a 'generic_client'. test_load.py is refactored to use this generic
client. The ImpalaTestSuite setup creates a client according to the
TARGET_FILESYSTEM environment variable and assigns it to the
'generic_client'.
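
A hedged sketch of the wrapper shape (method names approximate; the
real classes live in the test utilities):

  import boto3

  class BaseFileSystem(object):
      def create_file(self, path, data):
          raise NotImplementedError

      def delete_file(self, path):
          raise NotImplementedError

  class S3Client(BaseFileSystem):
      def __init__(self, bucket):
          self.s3 = boto3.resource("s3")
          self.bucket = bucket

      def create_file(self, path, data):
          self.s3.Object(self.bucket, path).put(Body=data)

      def delete_file(self, path):
          self.s3.Object(self.bucket, path).delete()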

P.S.: Currently, test_load.py runs 4x slower on S3 than on
HDFS. Performance needs to be improved in future patches. INSERT
performance is slower than on HDFS too. This is mainly because of an
extra copy that happens between staging and the final location of a
file. However, larger INSERTs come closer to HDFS performance than
smaller inserts.

ACLs are not taken care of for S3 in this patch. It is something
that still needs to be discussed before implementing.

Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Reviewed-on: http://gerrit.cloudera.org:8080/2574
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:49 -07:00
Michael Brown
2826df93d2 IMPALA-3427: test_insert_wide_table: use unique_database fixture
When test_insert_wide_table runs in parallel, it attempts to DROP/CREATE
tables with the same name in the same database. A race thus exists in
which one test can stomp on another's tables.

The fix is to use the unique_database fixture, in which each test can
run in parallel within its own database, and thus its own table.

Change-Id: If3b22f1b089d20c6024d6d680af82f102342394e
Reviewed-on: http://gerrit.cloudera.org:8080/2871
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Michael Brown <mikeb@cloudera.com>
2016-05-12 14:17:44 -07:00
Vlad Berindei
b6c20b2a40 Allow Impala to run against local filesystem.
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.

Skip all tests that need these services, use HDFS caching, or assume that multiple
impalads are running.

To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions, since this is where the test data will be extracted.

Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS             1348 tests passed
S3               1157 tests passed
Local Filesystem 1161 tests passed

Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
2015-12-05 06:48:32 +00:00
Casey Ching
074e5b4349 Remove hashbang from non-script python files
Many python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.

Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-04 05:26:07 +00:00
ishaan
09e5eaeda2 Introduce classes for pytest's skipif markers.
This patch encapsulates pytest's skipif markers in classes. It leads to the following
benefits:
  - Provide context and grouping for tests being skipped.
  - As we improve test reporting, annotations will give us a better idea of coverage.

Change-Id: Ib0557fb78c873047c214bb62bb6b045ceabaf0c9
Reviewed-on: http://gerrit.cloudera.org:8080/297
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/343
2015-04-19 03:09:59 +00:00
Dan Hecht
c8fb10f50a S3: Some more work toward enabling additional S3 test coverage
Add skip markers for S3 that can be used to categorize the tests that
are skipped against S3 to help see what coverage is missing.  Soon
we'll be reworking some tests and/or adding new tests to get back the
important gaps.

Also, add a mechanism to parameterize paths in the .test files, and
start using these new variables.  This is a step toward enabling some
more tests against S3.

Finally, a fix for buildall.sh to stop the minicluster before applying
the metastore snapshot. Otherwise, this fails since the ms db is in
use.

Change-Id: I142434ed67bed407e61d7b2c90f825734fc0dce0
Reviewed-on: http://gerrit.cloudera.org:8080/127
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-03 08:29:13 +00:00
ishaan
11cd7d1d46 Blacklist tests that don't work on s3
This patch introduces a new pytest marker that skips tests that currently don't work when
s3 is used as the underlying file system. The set of blacklisted tests is a superset of
the tests that cannot be run with s3. Follow-up patches will remove some of the test files
from the blacklist.

Change-Id: I39a58223d3435f0bd6496ffd00a2d483b751693d
Reviewed-on: http://gerrit.cloudera.org:8080/82
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-02-24 01:43:28 +00:00
Nong Li
d52a620737 Add support for writing compressed text.
Change-Id: I314b925594801ae4b5c47248d998801aa0b37270
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4205
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-09-07 22:08:30 -07:00
Victor Bittorf
820e1c070b Support writing to Avro files
Introduces support for writing tables stored as Avro files. This supports writing all
data types except TIMESTAMP. Supports the following COMPRESSION_CODECs: NONE, DEFLATE,
SNAPPY.

Change-Id: Ica62063a4f172533c30dd1e8b0a11856da452467
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3863
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 15c6066d05d5077bee0d5123d26777b0715eb9c6)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4056
2014-08-27 13:41:42 -07:00
Victor Bittorf
f2ef06bef1 SEQUENCEFILE: Add support for writing sequence files.
This supports both uncompressed and block compressed formats. Row compressed formats are
not supported. The type of compression is specified using a query parameter
COMPRESSION_CODEC with values NONE, GZIP, BZIP2, and SNAPPY.

Note: this patch only has basic testing. More extensive testing will be done when this
writer is used in data loading.

Change-Id: Id284bd4f3a28e27e49d56b1127cdc83c736feb61
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3541
Reviewed-by: Victor Bittorf <victor.bittorf@cloudera.com>
Tested-by: jenkins
2014-08-17 12:45:10 -07:00
Nong Li
fd35cee887 Reorganize/reduce end to end test time.
This patch does a few things:
1) Move the metadata tests into their own folder under tests/. I think it's useful to
loosely categorize them so it's easier to run a subset of the tests that are most
useful for the changes you are making.

2) Reduce the test vectors for query_tests. We should have identical coverage in
the daily exhaustive runs but the normal runs should be much better. In particular,
deemphasizing scanner tests since that code is more stable now.

3) Misc test cleanup/consolidate python test files/etc.

Change-Id: I03c2f34877aed192c2a50665bd5e15fa85e12f1e
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3831
Tested-by: jenkins
Reviewed-by: Nong Li <nong@cloudera.com>
2014-08-17 12:43:57 -07:00
Ippokratis Pandis
15f26f3991 Fixing a bug in test_insert that was inserting dummy records in tinytable for text/codec
Change-Id: I3e7ac5cd97de6ff4cd0eab411ac4ed4ff6c77508
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3751
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-08-05 13:19:50 -07:00
Ippokratis Pandis
99b23f1285 Skipping the TestUnsupportedInsertFormats test for compressed text.
Realized that if we run it with text/codec it will successfully insert a record
("hi", "there") in tinytable, which is undesirable. Skipping this test for
compressed text.

Change-Id: I8076f4fcac17d7f7530ac4993b0d4b3bd89d5fa0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3668
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
2014-08-05 13:19:49 -07:00
Ippokratis Pandis
572a4aed95 Updating the expectation of TestUnsupportedInsertFormats test.
Previously the test expected that inserting with any combination of text/codec where
codec!=none would fail. With the read compressed text patch, such an insert will
succeed by inserting uncompressed text, ignoring the compression type. This is a
temporary fix for this (code coverage) test to succeed, until the text writers are
in.

Change-Id: Ib34bf661ee90e3bcbb76a225df8809ae13e835fa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3659
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2014-08-05 13:19:49 -07:00
Lenni Kuff
b3ebfddadd Allow tests to access query result column values by col alias or col position
For example, you can now do something like:
result_set = execute("select * from tbl")
result_row = result_set[0]
result_row['col_alias'] or result_row[4]

to access column values. If the column alias/position does not exist an exception is
thrown.
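
A hedged sketch of how such dual indexing might be implemented (names
illustrative, not the framework's actual class):

  class ResultRow(object):
      """Allows lookup by integer position or by column alias."""
      def __init__(self, values, aliases):
          self.values = values
          self.by_alias = dict((a, i) for i, a in enumerate(aliases))

      def __getitem__(self, key):
          if isinstance(key, int):
              return self.values[key]  # IndexError if out of range
          return self.values[self.by_alias[key]]  # KeyError if unknown

  row = ResultRow([1, "abc"], ["id", "name"])
  assert row[1] == row["name"] == "abc"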

Change-Id: Ie4b65619ed17fd90bf39e0966a7fc7e1180dbc5c
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2719
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2922
2014-06-09 23:24:26 -07:00
Skye Wanderman-Milne
60db4d4d82 CDH-18416: Don't inline ReadWriteUtil::ReadZLong()
For wide Avro tables, ReadZLong() would get inlined many times into a
single function body, causing LLVM to crash. Not inlining doesn't seem
to have a performance impact on narrow tables, and helps with wide
tables.

This change also adds tests over wide (i.e. many-column) tables. The
test tables are produced by specifying shell commands to generate test
tables in functional_schema_template.sql, which are executed in
generate-schema-statements.py. In the SQL templates, sections starting
with a ` are treated as shell commands. The output of the shell
command is then used as the section text. This is only a starting
point; it isn't currently implemented for all sections, and may have
to be tweaked if we use this mechanism for all tables.
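
A minimal sketch of the backtick convention (function name illustrative):

  import subprocess

  def resolve_section_text(section):
      # Sections starting with ` are shell commands; their output becomes
      # the section text. Anything else is used verbatim.
      if section.startswith("`"):
          return subprocess.check_output(section[1:], shell=True).decode()
      return section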

Change-Id: Ife0d857d19b21534167a34c8bc06bc70bef34910
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2206
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
(cherry picked from commit 1c5951e3cce25a048208ab9bb3a3aed95e41cf67)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2353
Tested-by: jenkins
2014-04-28 15:58:15 -07:00