impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
stiga-huang	cbb6fa1cf2	Update GIT_HASH for version 3.4.2 Change-Id: I354685e72a8835abbf2e102b7c8192416324a394 3.4.2-rc2 3.4.2	2024-06-17 06:18:42 +08:00
stiga-huang	dd62dd98b9	IMPALA-11648: validate-java-pom-versions.sh should skip pom.xml in toolchain bin/validate-java-pom-versions.sh validates the pom.xml files have consistent version strings. However, it checks all files in IMPALA_HOME when building from the tarball. There are some pom.xml files in the toolchain directory that should be skipped. This patch modifies the find command used in the script from find ${IMPALA_HOME} -name pom.xml to find ${IMPALA_HOME} -path ${IMPALA_TOOLCHAIN} -prune -o -name pom.xml -print to list pom.xml files excluding the toolchain directory. More examples about how to use `find -prune` can be found in this blog: https://www.theunixschool.com/2012/07/find-command-15-examples-to-exclude.html Tests: - Built from the tarball locally - Modified version strings in some pom.xml files and verified validate-java-pom-versions.sh is still able to find them. Change-Id: I55bbd9c85ab0e4a7c054ee2abd70eae0f55c8a01 Reviewed-on: http://gerrit.cloudera.org:8080/19122 Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-06-17 06:17:10 +08:00
stiga-huang	8c53f574d9	Update GIT_HASH for version 3.4.2 3.4.2-rc1	2024-06-14 15:21:21 +08:00
Xiang Yang	a5e5aa16d8	IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files. Independent linux packaging related content to package/CMakeLists.txt to make it more clearly. This patch also add LICENSE and NOTICE file in the final package. Testing: - Manually deploy package on Ubuntu22.04 and verify it. Backport note for 3.4.x: - Resolved conflicts in CMakeLists.txt and modified package/CMakeLists.txt accordingly. Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b Reviewed-on: http://gerrit.cloudera.org:8080/20263 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21410 Reviewed-by: Xiang Yang <yx91490@126.com> Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-05-21 11:11:25 +00:00
Grant Henke	f507a02b60	IMPALA-9577: [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters given there is no need to require NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers are run at the same node using the same local wallclock. See the Kudu commit here for details: `eb2b70d4b9` While making this change, I removed all ntp related packages and special handling as they should not be needed in a development environment any more. I also added curl and gawk which were missing in my Docker ubuntu environment and broke my testing. Testing: I tested with the steps below using Docker for Mac: docker rm impala-dev docker volume rm impala docker run --privileged --interactive --tty --name impala-dev -v impala:/home -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash apt-get update apt-get install sudo adduser --disabled-password --gecos '' impdev echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers su - impdev cd ~ sudo apt-get --yes install git git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala cd ~/Impala export IMPALA_HOME=`pwd` git remote add fork https://github.com/granthenke/impala.git git fetch fork git checkout kudu-system-time $IMPALA_HOME/bin/bootstrap_development.sh source $IMPALA_HOME/bin/impala-config.sh (pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest) (pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest) $IMPALA_HOME/bin/start-impala-cluster.py ./tests/run-tests.py query_test/test_kudu.py Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Reviewed-on: http://gerrit.cloudera.org:8080/15568 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Alexey Serbin <aserbin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21422 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-05-14 21:10:12 +00:00
stiga-huang	fe7299d2cf	IMPALA-12999: Add log4j.properties to the DEB/RPM packages log4j.properties is required to configure log4j before logs from it are redirected to glog (done in GlogAppender#Install()). This is crucial to show error logs during initialization, especially while launching the JVM. See the JIRA description for an example. This copies log4j.properties from fe/src/test/resources directly since it hasn't changed for years. Change-Id: Iee0b9699ef313aa8e94bd351fa51fad3ea0cdf57 Reviewed-on: http://gerrit.cloudera.org:8080/21293 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21299 Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-15 10:01:24 +00:00
stiga-huang	3df9c0f139	IMPALA-12979: Avoid using wildcard in CLASSPATH Wildcard (*) in the classpath might not be resolved correctly. To be robust, this patch modifies the script to list jars deployed by the package and add them one by one to the CLASSPATH. Tests: - Verified on CentOS 7.9 Change-Id: Ib77d13684dbb6ed0ef8315fbc65fa6ef18ead120 Reviewed-on: http://gerrit.cloudera.org:8080/21265 Reviewed-by: Zihao Ye <eyizoha@163.com> Reviewed-by: Xiang Yang <yx91490@126.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
Xiang Yang	ff56cf9609	IMPALA-12362: (part-2/4) Optimize default configurations for packaging module. To avoid absolute paths and keep it simple, optimize the default configurations for the packaging module by removing or changing some entries. At the same time, add a license header to 'package/conf/-site.xml' and rename them to '-site.xml.template' to force administrators to make configurations appropriate for their cluster. Testing: - Manually deploy packages on Ubuntu22.04 and verify it. Change-Id: Ifda229b779a3d6fca647bb81fe23dd61ad7e5d66 Reviewed-on: http://gerrit.cloudera.org:8080/20928 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21264 Reviewed-by: Xiang Yang <yx91490@126.com> Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
Xiang Yang	63bc975e03	IMPALA-12362: (part-1/4) Refactor service management scripts. Uniform all service management scripts to 'bin/impala.sh', administrator can customize environment variables based on their cluster at 'conf/impala-env.sh', as well as set flags at 'conf/impalad_flags...'. Usually administrator can override the environment variables in 'conf/impala-env.sh' with commandline arguments. The same is true for flags in 'conf/impalad_flags...'. This flexibility can be used in scenarios such as supporting multi-instance deployments. The directory structure has been adjusted as follows: - put java libs to 'lib/jars' directory. - put native libs to 'lib/native' directory. - put impalad binary to 'sbin' directory. Testing: - Manually deploy packages on Ubuntu22.04 and verify it. Backport Notes: - Resolved trivial conflicts in CMakeLists.txt Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147 Reviewed-on: http://gerrit.cloudera.org:8080/20921 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21263 Reviewed-by: Xiang Yang <yx91490@126.com> Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
zhangyifan27	3fe856f8cd	IMPALA-12288: Add BUILD_WITH_NO_TESTS option to remove test targets This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not to generate test targets. In order to be consistent with the previous test workflow, this option is only set ON when building impala using the 'buildall.sh' script with '-notest' and '-package' flags. This is useful for a packaging build which do not need to build all test binaries. Testing: - Ran 'buildall.sh -release -package' with and without '-notests' flag and verified generated executables. Backport notes: resolved conflicts in following files - be/src/codegen/CMakeLists.txt - be/src/exec/parquet/CMakeLists.txt - be/src/testutil/CMakeLists.txt - be/src/util/CMakeLists.txt Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a Reviewed-on: http://gerrit.cloudera.org:8080/20294 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21262 Reviewed-by: Yifan Zhang <chinazhangyifan@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
Bikramjeet Vig	3cf30008cf	IMPALA-10052: Expose daemon health endpoint for statestore and catalog This change exposes the daemon health of statestored and catalogd via an HTTP endpoint '/healthz'. If the server is healthy, this endpoint will return HTTP code 200 (OK). If it is unhealthy, it will return 503 (Service Unavailable). This is consistent with the endpoint added for impalads in IMPALA-8895. Testing: - Extended test in test_web_pages.py Change-Id: I7714734df8e50dabbbebcb77a86a5a00bd13bf7c Reviewed-on: http://gerrit.cloudera.org:8080/16295 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21261 Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
Laszlo Gaal	69ded221ed	IMPALA-9845: Point Maven and Ant downloads to stable locations Ant released a new version in May 2020, which made the URL in bootstrap_system.sh obsolete. At the same time Apache created new rules for the download locations, moving older releases to archive.apache.org. This patch changes the download URLs for Maven and Ant to point to the stable locations at archive.apache.org. These locations don't change when a new version of a project is released, so downloads pulling a specific version will not be affected by a new release. At the same time new releases are stored at the archive site as well, so this location works for all versions. Backport note: Just need to change the URL of ant. Change-Id: I1875f260b931ef096fc91a4723f91310225c55c9 Reviewed-on: http://gerrit.cloudera.org:8080/16062 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21260 Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
stiga-huang	483dc98305	IMPALA-10262: (Addendum) fix incorrect paths of shared libs in building the package This fixes the wrong paths for adding shared libs to the package due to env var IMPALA_TOOLCHAIN_PACKAGES_HOME doesn't exist in the 3.4.x branch. Change-Id: I6469364a87441acdc7c70a8997bcaa21d720dde7 Reviewed-on: http://gerrit.cloudera.org:8080/21259 Reviewed-by: Zihao Ye <eyizoha@163.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-04-12 09:43:19 +00:00
stiga-huang	8e9c5a5d17	IMPALA-10262: RPM/DEB Packaging Support This patch is based on a previous patch contributed by Shant Hovsepian: https://gerrit.cloudera.org/c/16612/ It adds a new option, -package, to buildall.sh for building a package for the current OS type (e.g. CentOS/Ubuntu). You can also use "make/ninja package" to build the package. Scripts for launching the services and the required configuration files are also added. Tests: - Built on Ubuntu 18.04/20.04 and CentOS 7 using ./buildall.sh -noclean -skiptests -release -package - Deployed the RPM package on a CDP cluster. Verified the scripts. - Deployed the DEB package on a docker container. Verified the scripts. Resolved trivial backport conflicts in: - CMakeLists.txt - bin/bootstrap_system.sh - bin/jenkins/build-all-flag-combinations.sh - buildall.sh - docker/install_os_packages.sh Non-trivial backport notes: CMake function cmake_host_system_information does not recognize keys of DISTRIB_ID and DISTRIB_VERSION_ID (required version >= 3.22). Currently version used in branch-3.4 is 3.14.3. Details to remove using them: - One usage of DISTRIB_ID is to skip packaging impala-shell on redhat8. Removes it since redhat8 is not supported on branch-3.4. - Another usage of DISTRIB_ID is to determine the package file type (DEB vs RPM) based on the OS. Replaces it with content of the os-release files. - Removes the usage of OS_DISTRIB_VERSION_ID in the package file name Tests: - Built on Ubuntu 18.04 Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Reviewed-on: http://gerrit.cloudera.org:8080/18939 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/21130 Tested-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Zihao Ye <eyizoha@163.com>	2024-04-07 05:05:26 +00:00
Joe McDonnell	54faec0523	IMPALA-10057: Fix log spew by using jars in the classpath Some tests saw log spew that causes the INFO log files to be filled with output like this: E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown Java exception follows: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) at java.lang.Thread.run(Thread.java:748) ... It turns out that the catalogd/impalad use a CLASSPATH in tests that refers to fe/target/classes. The maven command that runs frontend tests recompiles these classes and causes the files in fe/target/classes to be deleted and recreated. There are race conditions where this causes the symptoms above. This changes the CLASSPATH to use the frontend jars, which are not impacted by the machinations on fe/target/classes. To find the appropriate jar, set-classpath.sh needs to know the Impala version. This adds IMPALA_VERSION in bin/impala-config.sh to provide an easy to use environment variable. To make the versioning more uniform, this modifies bin/save-version.sh to use this environment variable. It also adds a check to make sure that the Java pom.xml files use the same version as the environment variable. It fails the build if the Java pom.xml files do not match. Testing: - Ran core jobs - Checked the log file sizes on jobs - Changed a Java pom.xml's version and verified that bin/validate-java-pom-versions.sh fails Merge conflicts: - Change version string "4.1.0" to "3.4.2". Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb Reviewed-on: http://gerrit.cloudera.org:8080/18415 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-on: http://gerrit.cloudera.org:8080/18879 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-on: http://gerrit.cloudera.org:8080/21129 Reviewed-by: Zihao Ye <eyizoha@163.com>	2024-04-07 05:05:26 +00:00
ttttttz	2963b29a32	IMPALA-12565: Fix crash triggered by calling pmod() UDF When the pmod() UDF is called, if the divisor is 0, it will cause the impalad to crash. In this case, the result of the pmod() UDF should be NULL. Tests: * add a test in exprs.test Change-Id: Idcc274564a4b5b0872eb0c0c882c2f15e3247785 Reviewed-on: http://gerrit.cloudera.org:8080/20709 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-10 08:07:18 +08:00
liuyao	ee9c20f936	IMPALA-5476: Fix catalogd restart brings stale metadata ImpaladCatalog#updateCatalog() doesn't trigger a full topic update request when detecting catalogServiceId changes. It just updates the local catalogServiceId and throws an exception to abort applying the DDL/DML results. This causes a problem when catalogd is restarted and the DDL/DML is executed on the restarted instance. In this case, only the local catalogServiceId is updated to the latest. The local catalog remains stale. Then when dealing with the following updates from statestore, the catalogServiceId always matches, so updates will be applied without exceptions. However, the catalog objects usually won't be updated since they have higher versions (from the old catalogd instance) than those in the update. This brings the local catalog out of sync until the catalog version of the new catalogd grows larger enough. Note that in dealing with the catalog updates from statestore, if the catalogServiceId unmatches, impalad will request a full topic update. See more in ImpalaServer::CatalogUpdateCallback(). This patch fixes this issue by checking the catalogServiceId before invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match the one in the DDL/DML result, wait until it changes. The following update from statestore will change it and unblocks the DDL/DML thread. Testing add several tests in tests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Reviewed-on: http://gerrit.cloudera.org:8080/17645 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-09 21:39:43 +08:00
guojingfeng	4b7fc7773e	IMPALA-10310: Fix couldn't skip rows in parquet file on NextRowGroup In practice we recommend that the hdfs block size should align with the parquet row group size. But in some compute engines such as spark, the default parquet row group size is 128MB, and if the ETL user doesn't change the default property, spark will generate row groups that are smaller than the hdfs block size. The result is that a single hdfs block may contain multiple parquet row groups. In the planner stage, the length of an impala generated scan range may be bigger than the row group size. Thus a single scan range contains multiple row groups. In the current parquet scanner, when moving to the next row group, some internal state in the parquet column readers needs to be reset, e.g. num_buffered_values_, column chunk metadata, and the internal state of the column chunk readers. But the current_row_range_ offset is not reset currently, which causes errors "Couldn't skip rows in file hdfs://xxx" as IMPALA-10310 points out. This patch simply resets current_row_range_ to 0 when moving into the next row group in parquet column readers. Fix the bug IMPALA-10310. Testing: * Add e2e test for parquet multi blocks per file and multi pages per block * Ran all core tests offline. * Manually tested all cases encountered in my production environment. Change-Id: I964695cd53f5d5fdb6485a85cd82e7a72ca6092c Reviewed-on: http://gerrit.cloudera.org:8080/16697 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-09 21:38:46 +08:00
Zoltan Borok-Nagy	96ed0cf8ff	IMPALA-10257: Relax check for page filtering HdfsParquetScanner::CheckPageFiltering() is a bit too strict. It checks that all column readers agree on the top level rows. Column readers have different strategies to read columns. One strategy reads ahead the Parquet def/rep levels, the other strategy reads levels and values simultaneously, i.e. no readahead of levels. We calculate the ordinal of the top level row based on the repetition level. This means when we readahead the rep level, the top level row might point to the value to be processed next. While top level row in the other strategy always points to the row that has been completely processed last. Because of this in CheckPageFiltering() we can allow a difference of one between the 'current_row_' values of the column readers. I also got rid of the DCHECK in CheckPageFiltering() and replaced it with a more informative error report. Testing: * added a test to nested-types-parquet-page-index.test Change-Id: I01a570c09eeeb9580f4aa4f6f0de2fe6c7aeb806 Reviewed-on: http://gerrit.cloudera.org:8080/16619 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-09 21:38:31 +08:00
Csaba Ringhofer	d7209700b9	IMPALA-9572: Fix DCHECK in nested Parquet scanning The issue occurred when there were skipped pages and a column inside a collection was scanned, but its position was not needed. The repetition level still needs to be read in this case, as the skipped ranges are set in top level rows, so collection items need to know which top level row do they belong to. A DCHECK in StrideWriter's constructor was hit, otherwise the code ran correctly in release mode. The DCHECK is moved to functions where the condition would actually cause problems. Testing: - added and ran a regression test Change-Id: I5e8ef514ead71f732c73f910af7fd1aecd37bb81 Reviewed-on: http://gerrit.cloudera.org:8080/15598 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-09 21:38:00 +08:00
Zoltan Borok-Nagy	8210310182	IMPALA-9952: Fix page index filtering for empty pages As IMPALA-4371 and IMPALA-10186 points out, Impala might write empty data pages. It usually does that when it has to write a bigger page than the current page size. If we really need to write empty data pages is a different question, but we need to handle them correctly as there are already such files out there. The corresponding Parquet offset index entries to empty data pages are invalid PageLocation objects with 'compressed_page_size' = 0. Before this commit Impala didn't ignore the empty page locations, but generated a warning. Since invalid page index doesn't fail a scan by default, Impala continued scanning the file with semi-initialized page filtering. This resulted in 'Top level rows aren't in sync' error, or a crash in DEBUG builds. With this commit Impala ignores empty data pages and still able to filter the rest of the pages. Also, if the page index is corrupt for some other reason, Impala correctly resets the page filtering logic and falls back to regular scanning. Testing: * Added unit test for empty data pages * Added e2e test for empty data pages * Added e2e test for invalid page index Change-Id: I4db493fc7c383ed5ef492da29c9b15eeb3d17bb0 Reviewed-on: http://gerrit.cloudera.org:8080/16503 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-09 21:35:19 +08:00
stiga-huang	d491e325fb	IMPALA-12755: Disable snapshots from cdh.rcs.releases.repo This is a 3.x-only change. There is no longer the pom file for cdh-root in version 6.x-SNAPSHOT at https://repository.cloudera.com/artifactory/cdh-releases-rcs This patch disables using snapshots from this repo so the pom file can be downloaded from the native-toolchain S3 bucket. This also bumps the Maven version installed by bootstrap_system.sh and bootstrap_build.sh to v3.9.2 since the older version can't be downloaded. Test: - Run CORE tests Change-Id: I14b6aa49ee6e877b5020348d7bddb00f88a284b9 Reviewed-on: http://gerrit.cloudera.org:8080/20995 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-03-09 13:27:07 +00:00
stiga-huang	bc15ed61a6	IMPALA-11406: Fix incorrect duration log for authorization IMPALA-8443 extends EventSequence.markEvent() to return the duration between the last and the current event. However, the duration is calculated using the start time, not the last time it's invoked, which causes misleading time in logs of "Authorization check took n ms". This fixes the bug and also adds a log for the analysis duration. Change-Id: I8b665f1b4ac86577711598ce9d845cf82fedbcd7 Reviewed-on: http://gerrit.cloudera.org:8080/18682 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-25 15:16:33 +08:00
ttttttz	d1f0595efe	IMPALA-12102: Avoid memory leaks in the handling of JNI exceptions During the processing of JNI Exceptions, some local references were not released in a timely manner, which may lead to memory leaks in the JVM. Testing: - Manually verified that the memory leak doesn't occur in the local development environment. Change-Id: I4843df07dd0f9d3dc237f91db4ec00721ebbd680 Reviewed-on: http://gerrit.cloudera.org:8080/19810 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-25 15:12:10 +08:00
stiga-huang	1292d18ad6	IMPALA-11444: Fix wrong results when reading wide rows from ORC After IMPALA-9228, ORC scanner reads rows into scratch batch where we perform conjuncts and runtime filters. The surviving rows will be picked by the output row batch. We loop this until the output row batch is filled (1024 rows by default) or we finish reading the ORC batch (1024 rows by default). Usually the loop will have only 1 iteration since the scratch batch capacity is also 1024. All rows of the current ORC batch can be materialized into the scratch batch. However, when reading wide rows that have tuple size larger than 4096 bytes, the scratch batch capacity will be reduced to be lower than 1024, i.e. the scratch batch can store less than 1024 rows. In this case, we need more iterations in the loop. The bug is that we didn't commit rows to the output row batch after each iteration. The surviving rows will be overwritten in the second iteration. This is fixed in a later optimization (IMPALA-9469) which is missing in the 3.x branch. This patch only picks the fix from it. Tests: - Add test on wide tables with 2K columns Change-Id: I09f1c23c817ad012587355c16f37f42d5fb41bff Reviewed-on: http://gerrit.cloudera.org:8080/18745 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-25 15:02:28 +08:00
ttttttz	763b378f4f	IMPALA-11296: Fix infinite loop when reading orc files When querying an ORC table, selecting only the missing fields of ORC files causes the query to be executed indefinitely. The corresponding execution node will see some resident threads that occupy CPU abnormally. The problem is caused by this: when OrcComplexColumnReader.children_.empty() is true, OrcComplexColumnReader.row_idx_ will remain constant, causing an infinite loop at HdfsOrcScanner::TransferTuples(). We should allow empty 'children_' for original files. Testing: - Added a test to test_scanners.py that ensures the query can be executed successfully when selecting only the missing fields of ORC files. Change-Id: Ic7ecf5e9c94ffcc02d3ca6c2ec8d55a685ec3968 Reviewed-on: http://gerrit.cloudera.org:8080/18571 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-25 15:02:24 +08:00
stiga-huang	bf5dd6cc03	Prepare 3.4.1 RC3 Change-Id: I78e6df34da6f6d5510b3b917a755455dea84e6a0 3.4.1 3.4.1-rc3	2022-03-24 17:18:28 +08:00
stiga-huang	eb1ed66fa4	Clone the asf-3.4 branch of Impala-lzo Change-Id: I55ffab97e900de7fed70a67a186c14528137f5af	2022-03-24 09:10:35 +08:00
stiga-huang	2c7e69f3b5	Prepare 3.4.1 RC2 Change-Id: Id6e3387ac6c4989f03b456fc12237b84387899f3 3.4.1-rc2	2022-03-23 14:25:22 +08:00
stiga-huang	134b6492ed	Revert "IMPALA-9242: Filter privileges before returning them to Sentry" This reverts commit `e7d10df2ec`.	2022-03-22 21:11:09 +08:00
stiga-huang	b114369892	Prepare 3.4.1 RC1 3.4.1-rc1	2022-03-15 19:51:15 +08:00
stiga-huang	0d55beb121	Depend on master branch of hadoop-lzo to avoid download issues	2022-03-15 19:50:02 +08:00
Zoltan Borok-Nagy	5ecba2d46b	IMPALA-10426: Fix crash when inserting invalid timestamps Insertion of invalid timestamps causes Impala to crash when it uses the INT64 Parquet timestamp types. This patch fixes the error by checking for null values in Int64TimestampColumnWriterBase::ConvertValue(). Testing: * added e2e tests Change-Id: I74fb754580663c99e1d8c3b73f8d62ea3305ac93 Reviewed-on: http://gerrit.cloudera.org:8080/16951 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `696dafed66`)	2021-01-15 15:30:12 +08:00
stiga-huang	e41fc61a28	IMPALA-9921: Change error messages in checking needsQuotes to TRACE level logs Impala planner uses the HiveLexer to check whether an ident needs to be quoted in toSql results. However, HiveLexer will print error messages to stderr which is redirected to impalad.ERROR, so they appear as ERROR level logs. Actually, they just mean HiveLexer can't parse the ident so they are not Hive keywords so don't need to be quoted. These error messages don't mean anything wrong so shouldn't be ERROR level logs. This patch overrides the HiveLexer used in ToSqlUtils to log the error messages to TRACE level logs. Tests * Manually verify the error messages don't appear in impalad.ERROR and are printed to TRACE level logs. Change-Id: I0e1b5d2963285dc9125d8e0b8ed25c4db6821e0b Reviewed-on: http://gerrit.cloudera.org:8080/16146 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `3b820d7774`)	2021-01-11 21:35:58 +08:00
Yongzhi Chen	4796d13f71	IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad In streaming-aggregation-node.cc, when replicate_input_ is true and num_aggs > 1, it will call AddBatchStreaming several times (more than 1), and each time the out_batch will be used. If a row is not cached, the value will be saved in the out_batch, and out_batch's row count will be increased. The row count was not set back to 0 on the next iteration of the while loop. Therefore in out_batch, it is possible that not all the tuples are non-null. (For example the rows added when agg_idx = 1, only tuple with 1 not null; the rows added when agg_idx = 2, only tuple with 2 not null). But in grouping-aggregation-ir.cc, the serialize out code starts from the very beginning of out_batch for an agg_idx, so it has a good chance of hitting a null tuple. Fix the issue by only serializing the tuples added by the current function call. Tests: Manual tests Unit tests Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16 Reviewed-on: http://gerrit.cloudera.org:8080/16019 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `37b5599a7a`)	2021-01-11 21:35:43 +08:00
Tim Armstrong	b71187dbf9	IMPALA-9725: incorrect spilling join results for wide keys The control flow was broken if the join operator hit the end of the expression values cache before the end of the probe batch, immediately after processing a row for a spilled partition. In NextProbeRow(), the problematic code path was: * The last row in the expression values cache was for a spilled partition, so skip_row=true and it falls out of the loop with 'current_probe_row_' pointing to that row. * probe_batch_iterator->AtEnd() is false, because the expression value cache is smaller than the probe batch, so 'current_probe_row_' is not nulled out. Thus we end up in a state where 'current_probe_row_' is set, but 'hash_table_iterator_' is unset. In the case of a left anti join, this was interpreted by ProcessProbeRowLeftSemiJoins() as meaning that there was no hash table match for 'current_probe_row_', and it therefore returned the row. This bug could only occur under specific circumstances: * The join key takes up > 256 bytes in the expression values cache (assuming the default batch size of 1024). * The join spilled. * The join operator returns rows that were unmatched in the right input, i.e. LEFT OUTER JOIN, LEFT ANTI JOIN, FULL OUTER JOIN. The core of the fix is to null out 'current_probe_row_' when falling out the bottom of the loop in NextProbeRow(). Related DCHECKS were fixed and some control flow was slightly simplified. Testing: Added a test query on TPC-H that reproduces the problem reliably. Ran exhaustive tests. Change-Id: I9d7e5871c35a90e8cf24b8dded04775ee1eae9d8 Reviewed-on: http://gerrit.cloudera.org:8080/15904 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `fcf08d1822`)	2021-01-11 21:35:35 +08:00
xiaomeng	f5988190e6	IMPALA-9483 Add logs for debugging builtin functions throw unknown exception randomly In secure env with high concurrency, queries that call builtin function randomly fail when trying to find the function. For example, "AnalysisException: trim() unknown". Adding more info in exception message to help debugging when it happens again. Change-Id: I30d6eb697695da8d2521acb76d8310ec8f1bbda9 Reviewed-on: http://gerrit.cloudera.org:8080/15607 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `7fa43eef80`)	2021-01-11 21:35:27 +08:00
Joe McDonnell	33aba11c5e	IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case If DownloadUnpackTarball::download()'s wget_and_unpack_package call hits an exception, the exception handler cleans up any created directories. Currently, it erroneously cleans up the directory where the tarballs are downloaded even when it is not a temporary directory. This would delete the entire toolchain. This fixes the cleanup to only delete that directory if it is a temporary directory. Testing: - Simulated exception from wget_and_unpack_package and verified behavior. Change-Id: Ia57f56b6717635af94247fce50b955c07a57d113 Reviewed-on: http://gerrit.cloudera.org:8080/16294 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `bbec0443fc`)	2021-01-11 21:35:21 +08:00
Bikramjeet Vig	1ec0bdf41a	IMPALA-9739: Fix data race during impala graceful shutdown When impala does a graceful shutdown, exit() method is called at the end that performs cleanup which interferes with the shutdown signal handling thread spawned during init() and triggers a data race which gets caught by the thread sanitizer build. This patch fixes that by using an _exit() call instead. Testing: Ran the offending test TestGracefulShutdown on a thread sanitizer build and made sure no data race was flagged. Change-Id: I59bb5326791cd877df4711e23979f9bd88e4659a Reviewed-on: http://gerrit.cloudera.org:8080/16074 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `950e51f9a8`)	2021-01-11 21:35:00 +08:00
stiga-huang	fe4de65820	IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile The hits and requests metrics of partitions are overcounted due to using an updated map. This patch fixes it and adds test coverage on partition metrics. Tests - Run CatalogdMetaProviderTest Change-Id: I10cabce2908f1d252b90390978e679d31003e89d Reviewed-on: http://gerrit.cloudera.org:8080/16080 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `ee70df2e90`)	2021-01-11 21:34:54 +08:00
Tim Armstrong	d938d81698	IMPALA-9787: fix spinning thread with memory-based table invalidation Testing: Before this fix, CPU usage for catalogd in my local env with -invalidate_tables_on_memory_pressure was 100%. After this fix, CPU usage for the catalogd was low. Change-Id: I47d0cfc40ed7247e96322f9533fa594a03d7a8a3 Reviewed-on: http://gerrit.cloudera.org:8080/15994 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `a1485177d4`)	2021-01-11 21:34:47 +08:00
Akos Kovacs	e638bc04a2	IMPALA-7833 Audit and fix string builtins for long string handling Some string built-in functions could crash impalad, in case the result was longer than 1 gig max size. Added some overflow checks. Overflow error messages modified not to hard code max size. Testing: * Added some backend tests to cover overflow check * Ran core tests Change-Id: I93a53845f04e61ff446b363c78db1e49cbd5dc49 Reviewed-on: http://gerrit.cloudera.org:8080/15864 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `5e72ca546e`)	2021-01-08 11:12:41 +08:00
Shant Hovsepian	b68e610d7c	IMPALA-9727: Fix HBaseScanNode explain formatting In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string. Instead of: \| \| 13:SCAN HBASE [default.dimension d] \| \| hbase filters: \| \| d:foo EQUAL '1' \| \| d:bar EQUAL '2' \| \| d:baz EQUAL '3' \| \| predicate: This was produced: \| \| 13:SCAN HBASE [default.dimension d] \| \| hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3' \| \| predicate: Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Reviewed-on: http://gerrit.cloudera.org:8080/15749 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `7d260b6028`)	2021-01-08 11:12:34 +08:00
David Knupp	812ad40bac	IMPALA-9721: Fix minor python2/3 syntax regression A minor syntax error slipped past in a recent patch. In python3, the syntax for catching exceptions requires the 'as' keyword. This error was missed in code review. Until automated python3 testing is set up, this kind of error is likely to repeat. See IMPALA-9724. Change-Id: I0d36c609a3600c8084efcce0026537227144b27d Reviewed-on: http://gerrit.cloudera.org:8080/15856 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: David Knupp <dknupp@cloudera.com> (cherry picked from commit `6e0085c220`)	2021-01-08 11:12:22 +08:00
Tamas Mate	3ba1ea1d9a	IMPALA-9398: Fix shell history duplication when cmdloop breaks This change adds a new condition to avoid re-reading the impala-shell history when the cmdloop is broken. The loop can break due to exceptions such as KeyboardInterrupt. Testing: - The change was tested manually on local dev env - Added a new EE shell test to verify the history after SIGINT Change-Id: If4faf46134f44d91e56748642f47d448707db53c Reviewed-on: http://gerrit.cloudera.org:8080/15345 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `1a36a0348b`)	2021-01-08 11:05:20 +08:00
Tim Armstrong	0a1266ffca	IMPALA-9643: fix runtime filter race for mt_dop This patch avoids the race with registration of a consumer filter by registering all filters upfront when the filter bank is constructed. Then registration of producers and consumers hands out references to the pre-constructed filters. A nice bonus of this change is that RegisterConsumer() and RegisterProducer() don't mutate anything and we can avoid lock acquisitions. Also adds test infrastructure and fixes TestRuntimeRowFilters to work with mt_dop=4 (it was accidentally not enabled before). That mostly involved modifying the tests to use aggregates of counters instead of picking out lines with regexes. Testing: Added a regression test that reliably failed before this fix. This relies on extending debug actions to allow longer delays, plus a minor extension to the RUNTIME_PROFILE .test file parser to handle spaces in counter names. Ran exhaustive tests. Change-Id: I194c0d2515b6a0e5474e1c0c8647f0e54dc94397 Reviewed-on: http://gerrit.cloudera.org:8080/15715 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `76e4a17fb3`)	2021-01-08 10:59:59 +08:00
Riza Suminto	c1af049b6b	IMPALA-9650: Fix flakiness in RuntimeFilterTest IMPALA-9612 adds RuntimeFilterTest to the set of backend tests. It adds delay injection code in runtime-filter.cc to reproduce the race condition. However, the delay injection code is stripped out when Impala is built with the release config. This patch removes the NDEBUG macro enclosing the delay injection code so that it is not stripped out in release builds. Testing: - Ran and passed backend tests against release build. Change-Id: Ie3a5e68a128a97524755eeee4f8a993f38a0ed48 Reviewed-on: http://gerrit.cloudera.org:8080/15726 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `8d264321ef`)	2021-01-08 10:57:25 +08:00
Riza Suminto	317c65b82b	IMPALA-9612: Fix race condition in RuntimeFilter::WaitForArrival In function RuntimeFilter::WaitForArrival, there is a race condition where condition variable arrival_cv_ may be signaled right after a thread enters the loop and before it calls arrival_cv_.WaitFor(). This can cause the runtime filter to wait the entire RUNTIME_FILTER_WAIT_TIME_MS even though the filter has arrived or been canceled earlier than that. This commit avoids the race condition by making RuntimeFilter::SetFilter and RuntimeFilter::Cancel acquire arrival_mutex_ first before checking the value of arrival_time_ and release arrival_mutex_ before signaling arrival_cv_. Testing: - Add new be test runtime-filter-test.cc - Pass core tests. Change-Id: I7dffa626103ef0af06ad1e89231b0d2ee54bb94a Reviewed-on: http://gerrit.cloudera.org:8080/15673 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `5e69ae1d7d`)	2021-01-08 10:57:03 +08:00
Tim Armstrong	ed576c9738	IMPALA-9618: fix some usability issues with dev env Automatically assume IMPALA_HOME is the source directory in a couple of places. Delete the cache_tables.py script and MINI_DFS_BASE_DATA_DIR config var, both of which had bit-rotted and were unused. Allow setting IMPALA_CLUSTER_NODES_DIR to put the minicluster nodes, most importantly the data, in a different location, e.g. on a different filesystem. Testing: I set up a dev environment using this code and was able to load data and run some tests. Change-Id: Ibd8b42a6d045d73e3ea29015aa6ccbbde278eec7 Reviewed-on: http://gerrit.cloudera.org:8080/15687 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `5989900ae8`)	2021-01-08 10:42:28 +08:00
Aman Sinha	6364f4eaa1	IMPALA-9602: Fix case-sensitivity for local catalog This patch makes the database and table names lower-case when doing lookups, insertion and invalidations of database objects or table objects in the local catalog cache. The remote catalog already does the right thing by lower-casing these names, so this patch makes the behavior consistent with what the remote catalog does. Testing: - Added unit tests for CatalogdMetaProvider by examining cache hits and misses when loading and invalidating database or tables with upper-case names. - Manually tested as follows: start Impala with local catalog enabled start-impala-cluster.py --catalogd_args="--catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=true" Create database in lower-case: "CREATE DATABASE db1;" Run the following a few times (this errors without the patch): impala-shell.sh -q "DROP TABLE IF EXISTS DB1.ddl_test1 PURGE; CREATE TABLE DB1.ddl_test1 (val string);" Change-Id: I3f368fa9b50e22ec5057d0bf66c3fd51064d4c26 Reviewed-on: http://gerrit.cloudera.org:8080/15653 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> (cherry picked from commit `0f85cbd511`)	2021-01-08 10:41:36 +08:00
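
The IMPALA-9602 entry above describes lower-casing database and table names on lookup, insertion and invalidation in the local catalog cache. The following is only a rough Python sketch of that idea, not the actual Java change in CatalogdMetaProvider; the class `NameNormalizingCache`, its `loader` callback, and the hit/miss counters are hypothetical names introduced for illustration.

```python
class NameNormalizingCache:
    """Toy metadata cache (hypothetical) that lower-cases names on every operation."""

    def __init__(self, loader):
        # loader(db, table) stands in for fetching metadata from a remote catalog.
        self._loader = loader
        self._entries = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(db, table):
        # Normalize once, in one place, so lookups, inserts and
        # invalidations all agree on the same key.
        return (db.lower(), table.lower())

    def get(self, db, table):
        key = self._key(db, table)
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = self._loader(*key)
        return self._entries[key]

    def invalidate(self, db, table):
        # Returns True if an entry was actually removed.
        return self._entries.pop(self._key(db, table), None) is not None


cache = NameNormalizingCache(lambda db, tbl: f"metadata for {db}.{tbl}")
cache.get("db1", "ddl_test1")                      # first access: a miss, entry is loaded
cache.get("DB1", "ddl_test1")                      # different case, same entry: a hit
removed = cache.invalidate("Db1", "DDL_TEST1")     # mixed case still finds the entry
```

Routing every operation through a single key-normalizing helper is the design point here: if invalidation used the caller's original casing while insertion used lower case, a stale entry could survive a `DROP TABLE` issued with upper-case names, which is the kind of inconsistency the commit message's manual test exercises.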

9063 Commits