impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Riza Suminto	ff5cbadec4	IMPALA-12864: Deflake test_query_log_size_in_bytes. test_query_log_size_in_bytes is flaky in exhaustive builds. The expected QueryStateRecord is off by around 100KB per query, indicating that the test query might be too complex and lead to variability in final query profile that is being archived. This patch simplifies the test query and relaxes the assertion. This patch also adds QUERY_LOG_EST_TOTAL_BYTES metric (key name "impala-server.query-log-est-total-bytes") that is validated as well in test_query_log_size_in_bytes. query-state-record-test.cc is modified a bit to resolve segmentation fault issue when building unified-be-test-validated-executable target run_clang_tidy.sh. Testing: - Pass test_query_log_size_in_bytes in exhaustive build. Change-Id: Ic9b543ffa931a01c11d58e45774ea46af78b3a25 Reviewed-on: http://gerrit.cloudera.org:8080/21100 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-05 02:51:30 +00:00
jasonmfehr	97cd30c607	IMPALA-12426: Workload Management Supporting Changes Contains several disparate pieces of functionality that support the overall workload management work. 1. TNetworkAddressComparator An unused existing comparator for the Thrift class TNetworkAddress already existed in the thrift-util.h file. This comparator has been moved to the network-util.h file where it now resides in the same place as other utility functions that operator on TNetworkAddress instances. The existing comparator did not consider the uds address. It only considered hostname and port. The new comparator considers all three. Testing is accomplished by porting the existing ctests and adding additional ctests. 2. StringStreamPop This new class extends a std::basic_stringstream<char> to add a function that enables removing a character from the end. Testing is accomplished using new ctests. 3. Ticker This new header-only class notifies a condition variable at periodic intervals. It is a lightweight that sleeps until the configured duration has passed at which point it wakes up and notifies the condition variable. It also enables consumers to offload spurious wakeup guards to this class. Ctests have been added to test the functionality of this new class. 4. TUniqueId Empty Utility Function A new function UUIDEmpty returns true if a provided TUniqueID does not contain a UUID or false otherwise. Ctests have been added to test this new function. 5. run_clang_tidy.sh Additional informational outputs have been added to this script to enable tracking the status of the script and to easily identify errors found by clang tidy. 6. Summary Util Test A ctest was developed for testing the text table generation code that generates the exec summary portion of the query profile. This ctest was developed as part of an idea that did not ultimately pan out. Rather than throwing away that test code, it has been added as a new ctest. Change-Id: Iee23334ec56a18b192a75d052468bf59159b6424 Reviewed-on: http://gerrit.cloudera.org:8080/21048 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-02-27 17:06:28 +00:00
Joe McDonnell	dd8ddf77c3	IMPALA-12668: Enable clang-tidy checks for implicit fallthrough In switch/case statements, one case can fallthrough to the next case. Sometimes this is intentional, but it is also a common source of bugs (i.e. a missing break/return statement). Clang-Tidy's clang-diagnostic-implicit-fallthrough flags locations where a case falls through to the next case without an explicit fallthrough declaration. This change enables clang-diagnostic-implicit-fallthrough and fixes failing locations. Since Impala uses C++17, this uses C++17's [[fallthrough]] to indicate an explicit fallthrough. This also adjusts clang-tidy's output to suggest [[fallthrough]] as the preferred way to indicate fallthrough. Testing: - Ran core job - Ran clang tidy Change-Id: I6d65c92b442fa0317c3af228997571e124a54092 Reviewed-on: http://gerrit.cloudera.org:8080/20847 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zihao Ye <eyizoha@163.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-01-04 19:36:20 +00:00
gaurav1086	3614a6a776	IMPALA-12390 (part 2): Enable some clang-tidy performance related checks This enables the clang tidy performance check: performance-inefficient-string-concatenation "warning: string concatenation results in allocation of unnecessary temporary strings" Fix: Use StrCat() to concatenate multiple strings Testing: - Ran bin/run_clang_tidy.sh with the new checks - Ran GVO Change-Id: Ibad8bd0f12aab92ad874f5a6b9ec922dce7f3190 Reviewed-on: http://gerrit.cloudera.org:8080/20445 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-09-19 08:55:04 +00:00
Daniel Becker	817ca5920d	IMPALA-10640: Support reading Parquet Bloom filters - most common types This change adds read support for Parquet Bloom filters for types that can reasonably be supported in Impala. Other types, such as CHAR(N), would be very difficult to support because the length may be different in Parquet and in Impala which results in truncation or padding, and that changes the hash which makes using the Bloom filter impossible. Write support will be added in a later change. The supported Parquet type - Impala type pairs are the following: --------------------------------------- \|Parquet type \| Impala type \| \|---------------------------------------\| \|INT32 \| TINYINT, SMALLINT, INT \| \|INT64 \| BIGINT \| \|FLOAT \| FLOAT \| \|DOUBLE \| DOUBLE \| \|BYTE_ARRAY \| STRING \| --------------------------------------- The following types are not supported for the given reasons: ---------------------------------------------------------------- \|Impala type \| Problem \| \|----------------------------------------------------------------\| \|VARCHAR(N) \| truncation can change hash \| \|CHAR(N) \| padding / truncation can change hash \| \|DECIMAL \| multiple encodings supported \| \|TIMESTAMP \| multiple encodings supported, timezone conversion \| \|DATE \| not considered yet \| ---------------------------------------------------------------- Support may be added for these types later, see IMPALA-10641. If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used as the dictionary can give exact results in filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py that tests whether Parquet Bloom filtering works for the supported types and that we do not incorrectly discard row groups for the unsupported type VARCHAR. The Parquet file used in the test was generated with an external tool. - Added unit tests for ParquetBloomFilter in file be/src/util/parquet-bloom-filter-test.cc - A minor, unrelated change was done in be/src/util/bloom-filter-test.cc: the MakeRandom() function had return type uint64_t, the documentation claimed it returned a 64 bit random number, but the actual number of random bits is 32, which is what is intended in the tests. The return type and documentation have been corrected to use 32 bits. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Reviewed-on: http://gerrit.cloudera.org:8080/17026 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2021-06-03 06:32:45 +00:00
Gabor Kaszab	6cb7cecacf	IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions This patch implements Truncate and Bucket partition transforms in Impala BE as built-in functions. The expectation is that these functions give the same result as Iceberg's implementation of the same functions. These built-in functions are invisible so users won't be able to invoke them e.g. from impala-shell. Truncate: - Supported types are IntVal, BigIntVal, StringVal, DecimalVal. - Receives an input from the above types and a width. - Returns the same type as the input. - Expected behaviour is explained here: https://iceberg.apache.org/spec/#truncate-transform-details Bucket: - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal, TimestampVal. - Receives an input from the above types and the number of buckets as IntVal. - Returns IntVal. - Expected behaviour is explained here: https://iceberg.apache.org/spec/#bucket-transform-details Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd Reviewed-on: http://gerrit.cloudera.org:8080/16741 Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-12-19 03:09:06 +00:00
Joe McDonnell	56ee90c598	IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. - Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-30 16:25:37 +00:00
Gabor Kaszab	9a3108709b	IMPALA-9631: Import HLL functionality from DataSketches This patch imports the functionality needed for HLL approximate algorithm from Apache DataSketches. I decided to copy the necessary files into be/src/thirdparty/datasketches. Note, that the original structure of files was changed during this process as originally hll/ and common/ libraries were both affected but I copied these into the same directory so that Impala can compile them without rewriting the include paths in the files themselves. Also note, that not the whole common/ directory was copied just the files needed for HLL. The git hash of the snapshot I used as a source for the files: a6265b307a03085abe26c20413fdbf7d7a5eaf29 Browse the source files here: https://github.com/apache/incubator-datasketches-cpp Change-Id: I8ca8e77dcbb6b6c3b1e3bca7ab57cb7d3c018bbf Reviewed-on: http://gerrit.cloudera.org:8080/15746 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-24 16:26:21 +00:00
Tim Armstrong	aa4c115e58	IMPALA-9312: Better error handling for run_clang_tidy.sh This turns a buildall.sh failure into a job failure and dumps full output of the script. Change-Id: I4a90e5b48f0d75b0a1c52cfe629c22bd37c70b26 Reviewed-on: http://gerrit.cloudera.org:8080/15090 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2020-01-22 17:18:06 +00:00
Joe McDonnell	27c788f826	IMPALA-7132: Filter out useless output from run_clang_tidy.sh Clang's run-clang-tidy.py script produces a lot of output even when there are no warnings or errors. None of this output is useful. This patch has two parts: 1. Bump LLVM to 5.0.1-p1, which has patched run-clang-tidy.py to make it reduce its own output when passed -quiet (along with other enhancements). 2. Pass -quiet to run-clang-tidy.py and pipe the stderr output to a temporary file. Display this output only if run-clang-tidy.py hits an error, as this output is not useful otherwise. Testing with a known clang tidy issue shows that warnings and errors are still in the output, and the output is clean on a clean Impala checkout. Change-Id: I63c46a7d57295eba38fac8ab49c7a15d2802df1d Reviewed-on: http://gerrit.cloudera.org:8080/10615 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-17 04:58:28 +00:00
Jinchul	061287d912	IMPALA-5754: Rollback the exclusion of clang-tidy check for pcg-cpp In the commit `4feb4f3a`, the third party library pcg-cpp was excluded from the clang-tidy check. It could make unexpected side effect, so fixing some warnings from clang-tidy is better than avoidance of the check. Change-Id: I591d30373cb13f0eb89afbe16d81b1d3fb783365 Reviewed-on: http://gerrit.cloudera.org:8080/8829 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-12-18 18:37:45 +00:00
Jinchul	4feb4f3a54	IMPALA-5754: Improve randomness of rand()/random() Currently implementation of rand/random built-in functions use rand_r of C library. We recognized its randomness was poor. pcg32 of third party library shows better randomness than rand_r. Testing: Revise unit test in expr-test Add E2E test to random.test Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684 Reviewed-on: http://gerrit.cloudera.org:8080/8355 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-12-13 10:04:40 +00:00
Alex Behm	0936e32966	IMPALA-5310: Part 2: Add SAMPLED_NDV() function. Adds a new SAMPLED_NDV() aggregate function that is intended to be used in COMPUTE STATS TABLESAMPLE. This patch only adds the function itself. Integration with COMPUTE STATS will come in a separate patch. SAMPLED_NDV() estimates the number of distinct values (NDV) based on a sample of data and the corresponding sampling rate. The main idea is to collect several x/y data points where x is the number of rows and y is the corresponding NDV estimate. These data points are used to fit an objective function to the data such that the true NDV can be extrapolated. The aggregate function maintains a fixed number of HyperLogLog intermediates to compute the x/y points. Several objective functions are fit and the best-fit one is used for extrapolation. Adds the MPFIT C library to perform curve fitting: https://www.physics.wisc.edu/~craigm/idl/cmpfit.html The library is a C port from Fortran. Scipy uses the Fortran version of the library for curve fitting. Testing: - added functional tests - core/hdfs run passed Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820 Reviewed-on: http://gerrit.cloudera.org:8080/8569 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-12-12 22:20:18 +00:00
Henry Robinson	1cbfa423b4	IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Finally, a few changes to allow compilation on RHEL5: * Only use sched_getcpu() if supported * Only include magic.h if available * Workaround for kernels that don't have SOCK_NONBLOCK * Workaround for kernels that don't have O_CLOEXEC (ignore the flag) * Provide non-working implementation of fallocate() * Disable inclusion of linux/fiemap.h - although this exists on RHEL5, it does not compile due to other #includes in env_posix.cc. We disable the path this is used for, since Impala does not call that code. * Use Kudu's implementation of pipe(2), preadv(2) and pwritev(2) where it doesn't exist. In most cases these changes simply force kutil to revert to a different implementation that was already written for OSX support - this patch generalises the logic to provide the implementation whenever the required function doesn't exist. This patch compiles on RHEL5.5 and 6.0, SLES11 and 12, Ubuntu 12.04 and 14.04 and Debian 7.0 and 8.0. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Reviewed-on: http://gerrit.cloudera.org:8080/5715 Tested-by: Impala Public Jenkins Reviewed-by: Henry Robinson <henry@cloudera.com>	2017-07-28 21:47:07 +00:00
Jim Apple	9434e38abb	clang-tidy should tidy tests; fix alignas error in clang builds. run_clang_tidy.sh was mistakenly using -notests, which doesn't even compile the tests, rather than -skiptests, which compiles (but does not run) the backend tests. When I discovered this, I also found that all clang builds (including tidy and asan) had been broken by my previous alignas commit (`10a4c5a2e4`). This patch fixes that as well. Change-Id: Ib7066039f78d7ee039db619b96e25665b4d63503 Reviewed-on: http://gerrit.cloudera.org:8080/5094 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Internal Jenkins	2016-11-15 23:33:45 +00:00
Jim Apple	14891fe004	IMPALA-3676: Use clang as a static analysis tool This patch adds a script to run clang-tidy over the whole code base. It is a first step towards running clang-tidy over patches as a tool to help users spot bugs before code review. Because of the number of clang-tidy checks, this patch only addresses some of them. In particular, only checks starting with 'clang' are considered. Many of them which are flaky or not part of our style are excluded from the analysis. This patch also exlcudes some checks which are part of our current style but which would be too laborious to fix over the entire codebase, like using nullptr rather than NULL. This patch also fixes a number of small bugs found by clang-tidy. Finally, this patch adds the class AlignedNew, the purpose of which is to provide correct alignment on heap-allocated data. The global new operator only guarantees 16-byte alignment. A class that includes a member variable that must be aligned on a k-byte boundary for k>16 can inherit from AlignedNew<k> to ensure correct alignment on the heap, quieting clang's -Wover-aligned warning. (Static and stack allocation are required by the standard to respect the alignment of the type and its member variables, so no extra code is needed for allocation in those places.) Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06 Reviewed-on: http://gerrit.cloudera.org:8080/4758 Reviewed-by: Jim Apple <jbapple@cloudera.com> Tested-by: Internal Jenkins	2016-11-04 00:13:12 +00:00

16 Commits