impala

mirror of https://github.com/apache/impala.git synced 2026-01-23 21:00:25 -05:00

Author	SHA1	Message	Date
Juan Yu	99db6812a9	Fix udf_samples building issue if machine has a very old boost installed. The udf_samples makefile doesn't use ${CLANG_INCLUDE_FLAGS}, so it will use the default boost installation. If the dev environment has a very old boost installed, you could get the following compile error: ../udf/udf.h:143:3: error: unknown type name 'uint8_t' uint8_t* Allocate(int byte_size); ^ Change-Id: I3878b9d73d6022855b0cfbbdbee17eaf4c2557e1 Reviewed-on: http://gerrit.cloudera.org:8080/692 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-08-26 02:59:27 +00:00
Martin Grund	5afd5bc8f6	Toolchain Cleanup and ASAN Improvements. This patch provides the last fixes needed to enable the toolchain: - Removes the static OpenSSL dependency - Fixes inline assembly problems in ASAN - Fixes issues with the non-relocatable LLVM 3.3 - Adds manual system includes to fix issues with hardcoded header paths in clang. When the toolchain is enabled and we build for ASAN, we use a specific toolchain file to build with LLVM-trunk as the main compiler. Even though LLVM-trunk compiles the Impala code, LLVM 3.3 is still used for codegen. In addition, this enables us to follow up with TSAN and LEAKSAN. Change-Id: I0abb914ca3f192cb7edd83ead134bc9e2d02071f Reviewed-on: http://gerrit.cloudera.org:8080/556 Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-08-21 20:14:31 +00:00
Martin Grund	384ae3ab08	Fixes for Toolchain Issues. If a static version of zlib and bzip2 was picked up, we assumed that it had been compiled with -fPIC. However, this is not always the case. Thus, in the non-toolchain case we explicitly link zlib and bzip2 dynamically for the dynamic targets. In addition, this patch removes static linking of libgcc in the toolchain case, as LLVM is not able to find the exception handling symbols even if they are present in the binary. Static linking of libgcc is postponed. Next, if Impala was built with -notests, the external data source thrift files would not be generated. This patch makes sure the dependencies are expressed correctly. Finally, if a user had Google perftools installed on the system, we would accidentally pick up the system libraries together with the thirdparty headers, which resulted in linker errors. This patch fixes the path issues. Change-Id: Ic000101c33da26d75a0cd733f7ef02f1bd694937 Reviewed-on: http://gerrit.cloudera.org:8080/460 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2015-06-15 23:14:32 +00:00
Martin Grund	81f247b171	Optional Impala Toolchain. This patch makes it possible to optionally enable the new Impala binary toolchain. For now there are no major version differences between the toolchain dependencies and what is currently kept in thirdparty. To enable the toolchain, export the variable IMPALA_TOOLCHAIN set to the folder where the binaries are available. In addition, this patch moves gutil from the thirdparty directory into the source tree of be/src to allow easy propagation of compiler and linker flags. Furthermore, the thrift-cpp target was added as a dependency to all targets that require the generated thrift sources to be available before the build is started. What is the new toolchain? The goal of the toolchain is to homogenize the build environment and to make sure that Impala is built nearly identically on every platform. To achieve this, we limit the use of the system's host libraries and instead rely on a set of custom-built binaries, including the necessary compiler. Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b Reviewed-on: http://gerrit.cloudera.org:8080/427 Reviewed-by: Adar Dembo <adar@cloudera.com> Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>	2015-06-13 03:11:44 +00:00
Henry Robinson	75b16d5b8e	Rewrite header comments to use Doxygen-compatible /// comments. The command line used: git ls-files *.h | xargs sed -i '14,$s/^$ \/\/$ /\1\/ /g' ...then some manual fix-up to remove false positives on inlined functions that contain comments. Change-Id: Ia835ae21f189d5a8dc5627fb3983081a0bd1f1e2 Reviewed-on: http://gerrit.cloudera.org:8080/305 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Internal Jenkins	2015-05-07 23:07:57 +00:00
Martin Grund	2eb12e9593	Deprecating namespace directive declarations (std, boost). This patch removes all occurrences of "using namespace std" and "using namespace boost(.*)" from the codebase. However, there are still cases where namespace directives are used (e.g. for rapidjson, thrift, gutil). These have to be tackled in subsequent patches. To reduce the patch size, this patch introduces a new header file called "names.h" that brings many of our most frequently used symbols into scope iff the corresponding header was already included. This means that, for example, this header file will pull in map / string / vector etc. only if the matching header (e.g. <vector> for vector) was already included. This requires "common/names.h" to be the last include. After including `names.h`, a new block contains a sorted list of using declarations (this patch does not fix namespace directives for namespaces other than std / boost). Change-Id: Iebe4c054670d655bc355347e381dae90999cfddf Reviewed-on: http://gerrit.cloudera.org:8080/338 Reviewed-by: Martin Grund <mgrund@cloudera.com> Tested-by: Internal Jenkins	2015-04-18 01:26:47 +00:00
Henry Robinson	44f57e5fb6	IMPALA-1122: Compute stats with partition granularity. This patch adds the ability to compute and drop column and table statistics at partition granularity. The following commands are added; implementation details follow. COMPUTE INCREMENTAL STATS <tbl_name> [PARTITION <partition_spec>]: This variant of COMPUTE STATS ultimately does the same thing as the traditional COMPUTE STATS statement, but does so by caching the intermediate state of the computation for each partition in the Hive MetaStore. If the PARTITION clause is added, the computation is performed for only that partition. If the PARTITION clause is omitted, incremental stats are updated only for those partitions with missing incremental stats (e.g. one column does not have stats, or incremental stats were never computed for this partition). In this patch, incremental stats are only invalidated when a DROP STATS variant is executed. Future patches can automatically invalidate the statistics after REFRESH or INSERT queries, etc. DROP INCREMENTAL STATS <tbl_name> PARTITION <part_spec>: This variant of DROP STATS removes the incremental statistics for the given partition. It does not recalculate the statistics for the whole table, so it should be used only to invalidate the intermediate state for a partition that will shortly be subject to COMPUTE INCREMENTAL STATS. The point of this variant is to allow users to notify Impala when they believe a partition has changed significantly enough to warrant recomputation of its statistics. It is not necessary for new partitions; Impala will detect that they do not have any valid statistics. -------- This is achieved by adapting the existing HLL UDA, swapping its finalize method for a new one that returns the intermediate HLL buckets rather than aggregating and then disposing of them. This intermediate state is then returned to Impala's catalog-op-executor.cc, which passes it back to the frontend to be ultimately stored in the HMS. The intermediate state is computed on a per-partition basis by grouping the input to the UDA by partition. Thus, the incremental computation produces one row for each selected partition (a set that might be quite small if there are few partitions without valid incremental stats: this is the point of the new commands). At the same time, the query coordinator aggregates the output of the UDA to produce table-level statistics. This computation incorporates any existing (and not re-computed) intermediate partition state, which is passed to the coordinator by the frontend. The resulting statistics are saved to the table as normal. Intermediate statistics are serialised to the HMS by writing a Thrift structure's serialised form to the partition's 'parameters' map. There is a schema-imposed limit of 4000 characters on the serialised string, which is exacerbated by the fact that the Thrift representation must first be base-64 encoded to avoid type errors in the HMS. The current patch breaks the encoded structure into 4000-character chunks and recombines them on read. The alltypes table (11 columns) takes about three of these chunks. This may mean that incremental stats are not suitable for particularly wide tables: these structures could be zipped before encoding for some space savings. In the meantime, the NDV estimates are run-length encoded (since they are generally sparse); this can result in substantial space savings. Change-Id: If82cf4753d19eb532265acb556f798b95fbb0f34 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4475 Tested-by: jenkins Reviewed-by: Henry Robinson <henry@cloudera.com> Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5408	2014-11-25 09:13:37 -08:00
Skye Wanderman-Milne	c8b2017093	Add decimal UDF/UDA support. Change-Id: Ie48c1cb8e978c7282593b7f602dd68added6d3fd Reviewed-on: http://gerrit.ent.cloudera.com:8080/2625 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: jenkins (cherry picked from commit 5048f04b332c13b1bff32fb257272b0fea4b8584) Reviewed-on: http://gerrit.ent.cloudera.com:8080/2739	2014-05-29 20:49:53 -07:00
Skye Wanderman-Milne	44125729dc	UDF/UDA memory management improvements: * AggFnEvaluator now uses the UDF mem pool (I'm planning to change this to per-exec-node pools in the expr refactoring) * FunctionContext::TrackAllocation()/Free() now actually use the UDF's mem tracker * Added FunctionContextImpl::Close(), which sets warnings for leaked allocations. Change-Id: I792ffd49102a92b57e34df18d8ff5f5d0fd27370 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1792 Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com> Tested-by: Skye Wanderman-Milne <skye@cloudera.com> (cherry picked from commit 41a5f7cfa718789fa3b2de3a31f085411fb5000c) Reviewed-on: http://gerrit.ent.cloudera.com:8080/1954 Tested-by: jenkins	2014-03-17 20:38:25 -07:00
Nong Li	6b9a7de02e	Add symbol resolution during analysis for create function stmts. Before this, we had to specify the entire mangled symbol, which can be quite long and tedious (take a look at some of the create UDA test cases that specify all the symbols). This patch adds code to convert the user's function signature to the mangled name, so the user can specify the unmangled name and we do the symbol lookup. The mangling rules are pretty convoluted, but if we get them wrong, the user can always specify the full symbol. Some other minor cleanup in: - JNI from FE to BE - UDFs/UDAs that are loaded as test data. Change-Id: I733dbf3a72cb7b06221c27e622d161bcca0d74a8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/624 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:20 -08:00
Nong Li	b93b15f10f	Integrate function context with mempool. Change-Id: I55edb6cb89b67eb2c8031ac3a4f119df92a0896f Reviewed-on: http://gerrit.ent.cloudera.com:8080/565 Tested-by: jenkins Reviewed-by: Nong Li <nong@cloudera.com>	2014-01-08 10:53:05 -08:00
Nong Li	93bece32ae	Rename UdfContext to FunctionContext. Change-Id: I45da3f51a66c3e2cc4580c26733269f30ab9be83 Reviewed-on: http://gerrit.ent.cloudera.com:8080/546 Tested-by: jenkins Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>	2014-01-08 10:53:01 -08:00
Nong Li	052ab52c19	UDF/UDA interface, some examples and developer libs. Change-Id: I9dd97d11ebebd5ba996f8317063eb7eff3bcc6f6 Reviewed-on: http://gerrit.ent.cloudera.com:8080/216 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins	2014-01-08 10:52:45 -08:00

13 Commits
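Commit 75b16d5b8e converted plain // header comments to Doxygen /// comments with a sed one-liner. As a hedged illustration of the same transformation (not the original command), the C++ sketch below rewrites lines that begin with optional whitespace followed by "// " so they begin with "/// ", keeps the indentation, and leaves lines before a given line number untouched, the way the sed address 14,$ skips a license header. The function name ToDoxygenComments and its first_line parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: rewrites "// " line comments to Doxygen "/// " comments.
// Leading whitespace is preserved via the capture group; lines before
// first_line (1-based) are copied unchanged, e.g. to skip a license header.
std::string ToDoxygenComments(const std::string& text, std::size_t first_line) {
  assert(first_line >= 1);
  // Matches optional indentation followed by exactly "// " at the line start.
  // A line that already starts with "///" does not match, so running the
  // rewrite twice is harmless.
  static const std::regex kLineComment(R"(^(\s*)// )");
  std::istringstream in(text);
  std::ostringstream out;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    if (line_no >= first_line) {
      // "$1" re-inserts the captured indentation.
      line = std::regex_replace(line, kLineComment, "$1/// ");
    }
    out << line << '\n';
  }
  return out.str();
}
```

Like the original one-liner, this also rewrites comments inside inline function bodies, which is the kind of false positive the commit says was fixed up by hand.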
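Commit 2eb12e9593 describes a names.h header that emits using declarations only for standard headers that were already included. One way to detect "already included" is to test the standard library's include-guard macros. The sketch below assumes libstdc++, whose <vector>, <string>, and <map> headers define _GLIBCXX_VECTOR, _GLIBCXX_STRING, and _GLIBCXX_MAP; it illustrates the idea and is not a copy of Impala's common/names.h. The TotalLength function is hypothetical.

```cpp
// Ordinary includes first; the names block below must come after all of them.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---- sketch of a names.h-style block (assumes libstdc++ include guards) ----
// Each using declaration is emitted only if the matching standard header has
// already been included, so this block never pulls in headers by itself.
#ifdef _GLIBCXX_VECTOR
using std::vector;
#endif
#ifdef _GLIBCXX_STRING
using std::string;
#endif
#ifdef _GLIBCXX_MAP
using std::map;  // only takes effect if <map> was included (possibly transitively)
#endif
// ---- end of names block ----

// Hypothetical example function: uses the short names without any
// "using namespace std" directive.
inline std::size_t TotalLength(const vector<string>& words) {
  std::size_t total = 0;
  for (const string& w : words) total += w.size();
  return total;
}
```

Because the guards are checked at the point where the block appears, the block has to follow every other include, which matches the commit's rule that common/names.h be the last include.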
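Commit 44f57e5fb6 works around the 4000-character limit on partition parameter values by splitting the base-64-encoded Thrift structure into chunks and concatenating them on read. A minimal C++ sketch of that idea follows, using a std::map as a stand-in for the partition's parameters map. The key names stats_num_chunks and stats_chunk_<i> and the functions WriteChunked and ReadChunked are hypothetical, and base-64 encoding is assumed to have happened already.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical parameter keys; the real metastore keys may differ.
const std::string kNumChunksKey = "stats_num_chunks";
const std::string kChunkPrefix = "stats_chunk_";

// Splits an already base-64-encoded string into pieces of at most max_chunk
// characters and stores them, plus the chunk count, in the parameters map.
void WriteChunked(const std::string& encoded, std::size_t max_chunk,
                  std::map<std::string, std::string>* params) {
  assert(max_chunk > 0);
  std::size_t num_chunks = 0;
  for (std::size_t pos = 0; pos < encoded.size(); pos += max_chunk) {
    (*params)[kChunkPrefix + std::to_string(num_chunks)] =
        encoded.substr(pos, max_chunk);
    ++num_chunks;
  }
  (*params)[kNumChunksKey] = std::to_string(num_chunks);
}

// Concatenates the chunks in index order. Returns false (leaving *encoded
// untouched) if the count entry or any chunk is missing.
bool ReadChunked(const std::map<std::string, std::string>& params,
                 std::string* encoded) {
  const auto count = params.find(kNumChunksKey);
  if (count == params.end()) return false;
  const std::size_t num_chunks = std::stoul(count->second);
  std::string result;
  for (std::size_t i = 0; i < num_chunks; ++i) {
    const auto chunk = params.find(kChunkPrefix + std::to_string(i));
    if (chunk == params.end()) return false;
    result += chunk->second;
  }
  *encoded = result;
  return true;
}
```

With a 4000-character limit, a 9000-character encoded string is stored as three chunks of 4000, 4000, and 1000 characters plus the count entry.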
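Commit 44f57e5fb6 also run-length encodes the NDV (HyperLogLog) state because most of its registers are zero. The sketch below uses one simple byte layout, a sequence of (run length, value) pairs with runs capped at 255 so each length fits in one byte. This layout is an assumption made for illustration and is not necessarily the format Impala writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes a byte string (e.g. HLL registers) as (run length, value) byte pairs.
// Runs are capped at 255 so each length fits in a single byte.
std::string RleEncode(const std::string& registers) {
  std::string out;
  std::size_t i = 0;
  while (i < registers.size()) {
    const char value = registers[i];
    std::size_t run = 1;
    while (i + run < registers.size() && registers[i + run] == value &&
           run < 255) {
      ++run;
    }
    out.push_back(static_cast<char>(static_cast<std::uint8_t>(run)));
    out.push_back(value);
    i += run;
  }
  return out;
}

// Expands (run length, value) pairs back into the original byte string.
std::string RleDecode(const std::string& encoded) {
  assert(encoded.size() % 2 == 0);
  std::string out;
  for (std::size_t i = 0; i + 1 < encoded.size(); i += 2) {
    const std::size_t run = static_cast<std::uint8_t>(encoded[i]);
    out.append(run, encoded[i + 1]);
  }
  return out;
}
```

For a 1024-register array with only two non-zero entries, this layout produces eight pairs (16 bytes), because long zero runs are split at 255.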
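In commit 44f57e5fb6, the coordinator combines freshly computed per-partition state with stored intermediate state to produce table-level statistics. For the HLL part, the standard HyperLogLog union is to take the maximum of each register across the inputs. The sketch below shows only that merge step, assuming every partition state has the same number of registers; it does not compute the final cardinality estimate, and MergeHll is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Registers = std::vector<std::uint8_t>;

// Hypothetical helper: unions HyperLogLog states by taking the maximum of each
// register. All inputs must have the same number of registers.
Registers MergeHll(const std::vector<Registers>& states) {
  Registers merged;
  for (const Registers& regs : states) {
    if (merged.empty()) merged.assign(regs.size(), 0);
    assert(regs.size() == merged.size());
    for (std::size_t i = 0; i < regs.size(); ++i) {
      merged[i] = std::max(merged[i], regs[i]);
    }
  }
  return merged;
}
```

Because a register-wise maximum is commutative, associative, and idempotent, stored partition states can be merged with recomputed ones in any order without changing the result.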
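Commit 6b9a7de02e lets users name a UDF by its unmangled signature and derives the mangled symbol for them. To show what the two forms look like, the sketch below goes in the opposite direction using the Itanium C++ ABI demangler, abi::__cxa_demangle from <cxxabi.h>, which GCC and Clang provide. It is not the conversion code from the patch, and the Demangle wrapper is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Hypothetical wrapper around the Itanium C++ ABI demangler (GCC/Clang only).
// Returns the readable signature, or the input unchanged if it is not a valid
// mangled name.
std::string Demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so release it with free.
  std::unique_ptr<char, void (*)(void*)> readable(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      std::free);
  if (status != 0 || readable == nullptr) return mangled;
  return std::string(readable.get());
}
```

For example, the mangled symbol _Z3addii corresponds to add(int, int); the patch's job is to produce strings like the former from signatures like the latter.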