This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
We've supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).
In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in
io buffer read from DiskIoMgr). The ORC lib can materialize ORC column
binaries into its representation (orc::ColumnVectorBatch). Then we
transform values in orc::ColumnVectorBatch into impala::Tuples in
hdfs-orc-scanner. We don't need to do anything about decoding/decompression
since they are handled by the ORC lib. Fortunately, the ORC lib already
supports complex types, we can still leverage it to support complex types.
What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need in the form required by the ORC
lib (Get list of ORC type ids from tuple descriptors)
2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into
Impala's representation (Slots/Tuples/RowBatches)
To format the materialization, we implement several ORC column readers
in hdfs-orc-scanner. Each kind of reader treats a column type and
transforms outputs of the ORC lib into tuple/slot values.
Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.
* Run exhaustive tests in DEBUG and RELEASE builds.
Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Reviewed-on: http://gerrit.cloudera.org:8080/12168
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The concurrent test driver did not pick them up because the name
prefix did not match the workload dirname. The query test driver
used a hardcoded prefix.
Testing done: Ran tests/stress/concurrent_select.py,
tests/query_test/test_tpch_nested_queries.py locally; latter
passed, former hit IMPALA-5855 after correctly locating all 22
new tpch_nested query files.
Change-Id: Ie067b201ae20b4f4c61a98be7ac1ec5a3f8febd8
Reviewed-on: http://gerrit.cloudera.org:8080/7891
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
This is intended to be merged at the same time as Part 2 but is
separated out to make the change more reviewable. Part 2 assumes
that it does not need special logic to handle this mode (e.g.
because the old aggs and joins don't use reservation).
Disable the --enable_partitioned_{aggregation,hash_join} options
and remove all product and test code associated with them.
Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/7102
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
the class names with Impala-
git grep -l 'TestDimension' | xargs \
sed -i 's/TestDimension/ImpalaTestDimension/g'
git grep -l 'TestMatrix' | xargs \
sed -i 's/TestMatrix/ImpalaTestMatrix/g'
git grep -l 'TestVector' | xargs \
sed -i 's/TestVector/ImpalaTestVector/g'
The tests all passed in an exhaustive run on the upstream jenkins
server:
http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/
Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files_txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
We do not support nested types combined with the old aggs and joins.
Change-Id: I81401dd4d482d46e678091989ac9d178ac771d01
Reviewed-on: http://gerrit.cloudera.org:8080/1078
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins