This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affects
exploration_strategy(). One affected test is TestCatalogHMSFailures,
which was skipped in both core and exhaustive runs before this
change.
get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope for this patch.
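The inheritance change can be pictured with a minimal sketch. The class
bodies below are simplified stand-ins, not the actual code in
impala_test_suite.py or custom_cluster_test_suite.py, and
TestTpchExample is a hypothetical suite name used only for illustration.

```python
class ImpalaTestSuite(object):
  @classmethod
  def get_workload(cls):
    # Default provided by the base class, so suites that use
    # 'functional-query' no longer need their own override.
    return 'functional-query'


class CustomClusterTestSuite(ImpalaTestSuite):
  # No get_workload() override here any more: the suite inherits
  # 'functional-query' instead of returning 'tpch'.
  pass


class TestTpchExample(ImpalaTestSuite):
  # Hypothetical suite that keeps its override because it really
  # targets a different workload.
  @classmethod
  def get_workload(cls):
    return 'tpch'
```

Suites that need another workload keep overriding the method, which is
why those overrides are left untouched.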
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prototype of HdfsJsonScanner, implemented on top of rapidjson, which
supports scanning data from splittable JSON files.
The scanning of JSON data is mainly completed by two parts working
together. The first part is the JsonParser responsible for parsing the
JSON object, which is implemented based on the SAX-style API of
rapidjson. It reads data from the char stream, parses it, and calls the
corresponding callback function when encountering the corresponding JSON
element. See the comments of the JsonParser class for more details.
The other part is the HdfsJsonScanner, which inherits from HdfsScanner
and provides callback functions for the JsonParser. The callback
functions are responsible for providing data buffers to the Parser and
converting and materializing the Parser's parsing results into RowBatch.
It should be noted that the parser returns numeric values as strings to
the scanner. The scanner uses the TextConverter class to convert the
strings to the desired types, similar to how the HdfsTextScanner works.
This is an advantage compared to using the numeric values provided by
rapidjson directly, as it eliminates concerns about inconsistencies in
converting decimals (e.g. losing precision).
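The precision concern can be illustrated with a small Python analogue.
This is not the rapidjson or TextConverter code: it uses the standard
json module's parse_float hook to keep the numeric token as a string,
and decimal.Decimal stands in for the type-specific conversion done by
the scanner.

```python
import json
from decimal import Decimal

line = '{"id": 1, "price": 0.12345678901234567890123}'

# Default parsing converts the number to a binary double, which cannot
# hold all of the digits in the token.
as_double = json.loads(line)["price"]

# Keeping the raw token as a string defers the conversion to the
# consumer, which can then choose the target type (here an exact decimal).
raw = json.loads(line, parse_float=str)["price"]
as_decimal = Decimal(raw)

print(repr(as_double))
print(as_decimal)
```

The string path preserves every digit of the original token, whereas
the double has already been rounded by the time the caller sees it.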
Added a startup flag, enable_json_scanner, to be able to disable this
feature if we hit critical bugs in production.
Limitations
- Multiline JSON objects are not fully supported yet. It is ok when
each file has only one scan range. However, when a file has multiple
scan ranges, there is a small probability of incomplete scanning of
multiline JSON objects that span ScanRange boundaries (in such cases,
parsing errors may be reported). For more details, please refer to
the comments in the 'multiline_json.test'.
- Compressed JSON files are not supported yet.
- Complex types are not supported yet.
Tests
- Most of the existing end-to-end tests can run on JSON format.
- Add TestQueriesJsonTables in test_queries.py for testing multiline,
malformed, and overflow in JSON.
Change-Id: I31309cb8f2d04722a0508b3f9b8f1532ad49a569
Reviewed-on: http://gerrit.cloudera.org:8080/19699
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted it to the
integer division '//' operator where an integer result was needed
(e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
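As a minimal sketch of the division difference (the variable names are
illustrative and not taken from the Impala code base), the following
behaves the same on Python 2 and Python 3 once the __future__ import is
in place:

```python
from __future__ import absolute_import, division, print_function

num_rows = 7
num_batches = 2

# True division: the result is a float, even for two int operands.
avg_rows = num_rows / num_batches

# Floor division: use '//' where an integer is required, e.g. an index
# or a record count.
rows_per_batch = num_rows // num_batches
values = list(range(num_rows))
middle = values[len(values) // 2]

print(avg_rows, rows_per_batch, middle)
```

Without the import, Python 2 would evaluate num_rows / num_batches as
integer division, which is the kind of silent difference this change is
meant to flush out.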
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Convert SkipIf.not_hdfs to SkipIf.not_dfs for tests that require
filesystem semantics, adding more feature test coverage with Ozone.
Creates a separate not_scratch_fs flag for scratch dir tests as they're
not supported with Ozone yet. Filed IMPALA-11730 to address this.
Preserves not_hdfs for a specific test that uses the dfsadmin CLI to put
it in safemode.
Adds sfs_ofs_unsupported for SmallFileSystem tests. This should work for
many of our filesystems based on
ebb1e2fa99/ql/src/java/org/apache/hadoop/hive/ql/io/SingleFileSystem.java (L62-L87).
Makes sfs tests work on S3.
Adds hardcoded_uris for IcebergV2 tests where deletes are implemented as
hardcoded URIs in parquet files. Adding a parquet read/write library for
Python is beyond the scope of this patch.
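The distinction between the markers can be sketched as follows. This is
a simplified stand-in built on unittest.skipIf, not the real SkipIf
class. Reading the filesystem from a TARGET_FILESYSTEM environment
variable, the skip conditions, the skip messages, and the TestExample
class are all assumptions made for illustration.

```python
import os
import unittest

# Assumption: the target filesystem comes from an environment variable.
TARGET_FS = os.environ.get("TARGET_FILESYSTEM", "hdfs")
IS_HDFS = TARGET_FS == "hdfs"
IS_OZONE = TARGET_FS == "ozone"


class SkipIf(object):
  # For tests that need HDFS itself, e.g. the dfsadmin CLI.
  not_hdfs = unittest.skipIf(not IS_HDFS, "HDFS required")
  # For tests that need filesystem semantics: HDFS or Ozone qualify.
  not_dfs = unittest.skipIf(not (IS_HDFS or IS_OZONE),
                            "HDFS or Ozone required")
  # Kept separate because scratch dirs are not supported on Ozone yet.
  not_scratch_fs = unittest.skipIf(not IS_HDFS,
                                   "Scratch dirs require HDFS")


class TestExample(unittest.TestCase):
  @SkipIf.not_dfs
  def test_needs_filesystem_semantics(self):
    self.assertTrue(IS_HDFS or IS_OZONE)
```

With this split, moving a test from not_hdfs to not_dfs is enough to
make it run on Ozone as well, while HDFS-only tests stay untouched.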
Change-Id: Iafc1dac52d013e74a459fdc4336c26891a256ef1
Reviewed-on: http://gerrit.cloudera.org:8080/19254
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Impala has supported reading ORC files by default for quite some time.
Removed the enable_orc_scanner flag and the related code and test;
disabling ORC support is no longer possible.
Removed notes on how to disable ORC support from the docs.
Change-Id: I7ff640afb98cbe3aa46bf03f9bff782574c998a5
Reviewed-on: http://gerrit.cloudera.org:8080/18188
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Based on the discussion of the previous patch, we decided not to allow
ordinals in the HAVING clause as of 4.0. It's a non-standard feature
that was unintentionally supported by Impala 3.x and earlier versions.
This patch disables it by default, and adds a feature flag to turn it on
for users that do depend on it.
Tests:
- Modify existing FE tests to test on the flag.
- Add custom cluster test to verify the flag works.
Change-Id: I0a57b8b65b046fae483e485e8391f8222fa586a5
Reviewed-on: http://gerrit.cloudera.org:8080/17415
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-10113 adds a test that disables the incremental_metadata_updates
flag to verify that metadata propagation still works correctly. The
test invokes two test files which are used in metadata/test_ddl.py. One
of them covers HDFS caching and should only be run on the HDFS file
system, so we mark the test with "SkipIf.not_hdfs".
Tests:
- Run CORE test on S3 build.
Change-Id: I0b922de84cff0a1e0771d5a8470bdd9f153f85f0
Reviewed-on: http://gerrit.cloudera.org:8080/16616
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds a feature flag, enable_incremental_metadata_updates, to
turn off incremental metadata (i.e. partition level metadata)
propagation from catalogd to coordinators. It defaults to true. When
set to false, catalogd sends metadata updates at table granularity
(the legacy behavior).
Also fixes a bug where an empty aggregated partition update log was
written when a DDL changed no partitions.
Tests:
- Run CORE tests with this flag set to true and false.
- Add tests with enable_incremental_metadata_updates=false.
Change-Id: I98676fc8ca886f3d9f550f9b96fa6d6bff178ebb
Reviewed-on: http://gerrit.cloudera.org:8080/16436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The previous approach could lead to hangs or cryptic error messages
because it removed the ORC data type from a lookup table.
Instead check explicitly in the planner for ORC scans and throw a
more helpful error message.
Testing:
Added custom cluster test to exercise code and check error message.
Change-Id: I209e79b18745c48d0182800a916d6566083f4609
Reviewed-on: http://gerrit.cloudera.org:8080/11835
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>