This patch adds graceful and forced shutdown support to impala.sh (a rough
sketch of the pattern follows below). It also preserves the stdout and
stderr logs from startup. This patch also fixes some bugs in impala.sh,
including:
- a missing check for an empty service name.
- a restart command that did not work.
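As a hedged illustration of the graceful-then-forced shutdown pattern (the
function and file names here are hypothetical, not the actual impala.sh
contents):

  # Hypothetical sketch: try SIGTERM first, escalate to SIGKILL on timeout.
  stop_service() {
    local name="$1" timeout="${2:-30}"
    if [ -z "$name" ]; then
      echo "service name required" >&2
      return 1
    fi
    local pid
    pid=$(cat "/var/run/impala/${name}.pid" 2>/dev/null) || return 0
    kill -TERM "$pid" 2>/dev/null            # request a graceful shutdown
    for _ in $(seq "$timeout"); do
      kill -0 "$pid" 2>/dev/null || return 0 # process exited cleanly
      sleep 1
    done
    kill -KILL "$pid"                        # force shutdown after the timeout
  }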
Testing:
- Manually deployed the package on Ubuntu 22.04 and verified it.
Change-Id: Ib7743234952ba6b12694ecc68a920d59fea0d4ba
Reviewed-on: http://gerrit.cloudera.org:8080/21297
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Moves the Linux packaging related content into a standalone
package/CMakeLists.txt to make it clearer.
This patch also adds the LICENSE and NOTICE files to the final package.
Testing:
- Manually deployed the package on Ubuntu 22.04 and verified it.
Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b
Reviewed-on: http://gerrit.cloudera.org:8080/20263
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updates .gitignore for files generated during bootstrap_development.
Fixes ignore rules that caused tracked files in be/src/thirdparty to be
deleted. Includes ignore rules for past versions of shell dependencies and
updates the ignores for current versions.
Change-Id: I03deba5e7fb151ef8e34039becdcc3fb47684084
Reviewed-on: http://gerrit.cloudera.org:8080/18499
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a plain-text, space-separated image list in docker/docker-images.txt.
It is generated from the images built by CMake, so it stays in sync as
images are added to or removed from the CMake file.
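For example, a downstream script could consume the generated list like this
(a minimal sketch; the registry name is hypothetical):

  # Hypothetical consumer: push every generated image to a private registry.
  for image in $(cat docker/docker-images.txt); do
    docker tag "$image" "registry.example.com/$image"
    docker push "registry.example.com/$image"
  done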
Duplicated logic per image is removed - instead there is a helper
function that is called for each daemon image to be built.
Rips out the timestamp mechanism that was intended to avoid unnecessary
container rebuilds, but has turned out to be brittle. Instead the
containers are rebuilt each time the rule is invoked.
This moves some subdirectories so that the image tag matches the
subdirectory, to simplify the build scripts.
Change-Id: I4d8e215e9b07c6491faa4751969a30f0ed373fe3
Reviewed-on: http://gerrit.cloudera.org:8080/13899
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
Avoids the rewrite if the resulting string literal would exceed a defined
limit (for example, constant folding that would otherwise materialize a
very large string literal in the rewritten expression).
Testing:
Added three statements in testFoldConstantsRule() to verify that the
expression rewrite is accepted only when the size of the rewritten
expression is below a specified threshold.
Change-Id: I8b078113ccc1aa49b0cea0c86dff2e02e1dd0e23
Reviewed-on: http://gerrit.cloudera.org:8080/12814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.
The patch also fixes some TODOs, replacing the rangerPlugin.init() hack
with the rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.
Testing:
- Ran core tests
- Manually verified that there is no regression when starting Hive 3 with
USE_CDP_HIVE=true
Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This builds an impala_base container that has all of the build artifacts
required to run the Impala processes, then builds impalad, catalogd and
statestored containers based on it, with the right ports exposed.
The images are based on the Ubuntu 16.04 image to align with the
most common development environment.
The container build process is integrated with CMake and is designed
to integrate with the rest of the build so that the container build
depends on the artifacts that will go into the container. You can
build the images with the following command, which will create
images called "impala_base", "impalad", "catalogd" and
"statestored":
ninja -j $IMPALA_BUILD_THREADS docker_images
The images need some refinement to be truly useful. The following
will be done in future patches:
* IMPALA-7947 - integrate with start-impala-cluster.py to
automatically create docker network with containers running on it
* Mechanism to pass in command-line flags
* Mechanisms to update the various config files to point to the
docker host rather than "localhost", which doesn't point to
the right thing inside the container.
* Mechanisms to set mem_limit, JVM heap sizes, etc, automatically.
Testing:
Manually started up the containers connected to a user-defined bridge
network, tweaked the configurations to point to the HMS/HDFS/etc
running on my host. I then used "docker ps" to figure out the
port mappings for beeswax and debug webserver.
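The manual setup was roughly like the following (a hedged reconstruction;
the network name is hypothetical, and the configs still needed tweaking as
described above):

  # Hypothetical sketch of the manual container startup.
  docker network create impala-net
  docker run -d --network impala-net --name statestored statestored
  docker run -d --network impala-net --name catalogd catalogd
  docker run -d -P --network impala-net --name impalad impalad
  docker ps   # inspect the published port mappings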
Confirmed that I could run a query and access debug pages:
$ impala-shell.sh -i localhost:32860 -q "select coordinator()"
Starting Impala Shell without Kerberos authentication
Opened TCP connection to localhost:32860
Connected to localhost:32860
Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build
d7870fe03645490f95bd5ffd4a2177f90eb2f3c0)
Query: select coordinator()
Query submitted at: 2018-12-11 15:51:04 (Coordinator:
http://8063e77ce999:25000)
Query progress can be monitored at:
http://8063e77ce999:25000/query_plan?query_id=1b4d03f0f0f1fcfb:b0b37e5000000000
+---------------+
| coordinator() |
+---------------+
| 8063e77ce999 |
+---------------+
Fetched 1 row(s) in 0.11s
Change-Id: Ifea707aa3cc23e4facda8ac374160c6de23ffc4e
Reviewed-on: http://gerrit.cloudera.org:8080/12074
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
A few unversioned artifacts crept in over time without corresponding
.gitignore entries. These updates are based on the git status output in my
dev environment.
Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a
Reviewed-on: http://gerrit.cloudera.org:8080/11787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Switching to a new CDH_BUILD_NUMBER requires downloading new CDH
components as well as forcing Maven to update its local repository.
This patch updates CDH_COMPONENTS_HOME to include the CDH_BUILD_NUMBER,
which automatically downloads the new CDH components after switching to a
new CDH_BUILD_NUMBER. If a build detects that the CDH_BUILD_NUMBER has
changed, it forces an update of the local Maven repository. This helps
prevent build failures caused by a stale local Maven repository, even on a
fresh Git clone.
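A minimal sketch of the change-detection idea (the stamp file name is
hypothetical):

  # Hypothetical sketch: force a Maven update when CDH_BUILD_NUMBER changes.
  stamp="$CDH_COMPONENTS_HOME/cdh-build-number.stamp"
  if [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$CDH_BUILD_NUMBER" ]; then
    MVN_ARGS="-U"                # -U forces Maven to refresh its local repo
    echo "$CDH_BUILD_NUMBER" > "$stamp"
  fi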
Testing:
- Manually tested by running buildall.sh with different CDH_BUILD_NUMBER values
Change-Id: Ib0ad9c2258663d3bd7470e6df921041d1ca0c0be
Reviewed-on: http://gerrit.cloudera.org:8080/11099
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The fix for IMPALA-6241 is to increase the timeout for all slow builds.
While testing that fix, I discovered that the ASAN build detection logic
was failing silently, resulting in it assuming that it was testing a
DEBUG build. The error was:
Unexpected DW_AT_name in first CU:
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc;
choosing DEBUG
The fix for that issue is to remove the build-type detection heuristic and
instead write a file containing the build type as part of the build process
(sketched below).
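A minimal sketch of the approach (the file name and variable names are
assumptions):

  # Hypothetical sketch: record the build type at build time, read it later.
  echo "$CMAKE_BUILD_TYPE" > "$IMPALA_HOME/.cmake_build_type"  # during build
  build_type=$(cat "$IMPALA_HOME/.cmake_build_type")           # in test code
  if [ "$build_type" = "ASAN" ]; then
    timeout_multiplier=3          # sanitizer builds get a longer timeout
  fi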
Testing:
Before this change I was able to reproduce the issue locally every 5-10
test iterations. After this change I haven't seen it reproduce.
Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536
Reviewed-on: http://gerrit.cloudera.org:8080/8652
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Ran into these while compiling (files generated from Kudu), setting up
Eclipse, and testing.
Change-Id: Ife446e40756864f2a19ae4393ac503d17d91996b
Reviewed-on: http://gerrit.cloudera.org:8080/7902
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Previously we could get a developer's shell into a bad state where a
value of a config variable from a previous impala-config.sh version
would override the value from the new impala-config.sh version.
This change adds a new mechanism to override settings locally by adding
them to impala-config-local.sh (see the sketch below). This approach is
more robust because the config variables are reset to their intended
values whenever impala-config.sh is re-sourced.
impala-config-branch.sh can also be used to override settings in a
version-controlled way, e.g. to support having different settings for
different branches.
I did not convert all variables to use this approach, since many people
and Jenkins jobs depend on setting these variables from the environment.
The remaining "sticky" variables are ones where default values should
not change frequently, e.g. source directory locations and build
settings.
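A minimal sketch of the override hooks (assuming they are sourced at the
end of impala-config.sh; the paths are illustrative):

  # Hypothetical sketch: apply overrides after the defaults are set.
  if [ -f "$IMPALA_HOME/bin/impala-config-branch.sh" ]; then
    . "$IMPALA_HOME/bin/impala-config-branch.sh"  # version-controlled overrides
  fi
  if [ -f "$IMPALA_HOME/bin/impala-config-local.sh" ]; then
    . "$IMPALA_HOME/bin/impala-config-local.sh"   # developer-local overrides
  fi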
Change-Id: I930e2ca825142428d17a6981c77534ab0c8e3489
Reviewed-on: http://gerrit.cloudera.org:8080/5545
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
Files in be/ get wiped out by clean.sh if they're listed in .gitignore.
It is easier to just configure this via a project-specific .vimrc file.
Change-Id: I262f7a1ec8daace84a29518ba826c7c3b20fb9e9
Reviewed-on: http://gerrit.cloudera.org:8080/4854
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
This has been working for several months, and it was written mainly by
Casey Ching while he was at Cloudera working on Impala.
Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae
Reviewed-on: http://gerrit.cloudera.org:8080/4820
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This change removes some of the occurrences of the strings 'CDH'/'cdh'
from the Impala repository. References to Cloudera-internal Jiras have
been replaced with upstream Jira issues on issues.cloudera.org.
For several categories of occurrences (e.g. pom.xml files,
DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to
remove the occurrences left after this change.
Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8
Reviewed-on: http://gerrit.cloudera.org:8080/4187
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
By ASF rules, we can't have JARs in releases. The releases are just
tarballs of the repo.
This patch removes the single JAR in the repo, which was a version of a
JAR built during data load, with one string changed. The JAR is used only
for testing.
Instead of building that jar with the different string and saving the
result in git, data loading will now build the jar twice, with one Java
source file slightly changed.
Change-Id: Icee7b8c32b08e064dea4a14624acff6021ef5ce1
Reviewed-on: http://gerrit.cloudera.org:8080/4499
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.
The new structure is as follows:
$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala
$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading
$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests
$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests
$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests
$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
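With everything under a single root, a Jenkins job can archive all of it in
one step, e.g. (hypothetical usage):

  # Hypothetical archiving step for a Jenkins run.
  tar czf "impala-logs-${BUILD_NUMBER}.tar.gz" -C "$IMPALA_HOME" logs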
I tested this change with a full data load which
was successful.
Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch adds logic to automatically download the pre-built toolchain
packages to the local developer machine using the bootstrap_toolchain.py
script in case they are not present. No manual user intervention is
necessary to initiate the download process.
If desired, the script can always be invoked from a correctly sourced
Impala environment to re-download the dependencies.
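A minimal sketch of the automatic hook in the build scripts (the exact
check is an assumption):

  # Hypothetical sketch: fetch the toolchain only when it is missing.
  if [ ! -d "$IMPALA_TOOLCHAIN" ] || [ -z "$(ls -A "$IMPALA_TOOLCHAIN")" ]; then
    python "$IMPALA_HOME/bin/bootstrap_toolchain.py"  # download packages
  fi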
Change-Id: I636160efeadfac4b5c1feb478da5ae5da0c9fd00
Reviewed-on: http://gerrit.cloudera.org:8080/1429
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Adds a static definition of the metric metadata used by Impala. The
metric names, descriptions, and other properties are defined in
common/thrift/metrics.json file, and the generate_metrics.py script
creates a thrift representation. The metric definitions are then
available in a constant map which is used at runtime to instantiate
metrics, looking them up in the map by the metric key.
New metrics should be defined by adding an entry to the list of metrics
in metrics.json with the following properties:
key: The unique string identifying the metric. If the metric can
be templated, e.g. rpc call duration, it may be a format
string (in the format used by strings::Substitute()).
description: A text description of the metric. May also be a format
string.
label: A brief title for the metric, not currently used by
Impala but provided for external tools.
units: The unit of the metric. Must be a valid value of TUnit.
kind: The kind of metric, e.g. GAUGE or COUNTER. Must be a valid
value of TMetricKind.
contexts: The context in which this metric may be instantiated.
Usually "IMPALAD", "STATESTORED", "CATALOGD", but may be
a different kind of 'entity'. Not currently used by
Impala but provided for modeling purposes for external
tools.
For example, adding the counter for the total number of queries run over
the lifetime of the impalad process might look like:
{
  "key": "impala-server.num-queries",
  "description": "The total number of queries processed.",
  "label": "Queries",
  "units": "UNIT",
  "kind": "COUNTER",
  "contexts": [
    "IMPALAD"
  ]
}
TODO: Incorporate 'label' into the metrics debug page.
TODO: Verify the context at runtime, e.g. DCHECK that 'contexts' contains
the current daemon.
After the metric definition is added, the generate_metrics.py script
will generate the TMetricDefs.thrift that contains a TMetricDef for
the metric definition. At runtime, the metric can be instantiated
using the key defined in metrics.json. Gauges, Counters, and
Properties are instantiated using static methods on MetricGroup. Other
metric types are instantiated using static CreateAndRegister methods
on their associated classes.
TODO: Generate a thrift enum used to lookup metric defs.
TODO: Consolidate the instantiation of metrics that are created
outside of metrics.h (i.e. collection metrics, memory metrics).
TODO: Need a better way to verify if metric definitions are missing.
Change-Id: Iba7f94144d0c34f273c502ce6b9a2130ea8fedaa
Reviewed-on: http://gerrit.cloudera.org:8080/330
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages, to reduce the clutter in the shell output and to increase the
usefulness of the messages. By splitting the message string from the
implementation, the string can be edited independently of the code, paving
the way for internationalization.
Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible for
representing an error and capturing the parameters used to format the
error message string.
Error messages are recorded based on the following algorithm:
- If an error message is of type GENERAL, do not aggregate it; simply add
it to the total number of messages.
- If an error message is of a specific type, record the first occurrence
as a sample and increment a count for all further occurrences.
- The coordinator will merge all error messages except the ones of type
GENERAL and display a count.
For example, in the case of the parquet file spanning multiple blocks
the output will look like:
Parquet files should not be split into multiple hdfs-blocks.
file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.
The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.
Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
This change adds support for auxiliary workloads, tests, and datasets. This
is useful for augmenting the regular test runs with some additional tests
that do not belong in the main Impala repo.
This adds initial changes for the Impala failure testing library. It also
refactors run-workload into its own module so it can be used in other tests.
The failure testing has two main components. The first is an object model
on top of the Impala services in a cluster. This allows enumerating the
services in the cluster and executing commands on remote machines. This
initial cut is built on top of the CM service to help with
starting/stopping services. The long-term goal is to let this run on both
CM and non-CM clusters, as well as locally.
The other part of the failure injection change is the failure_injector
module, which uses the Impala service abstraction to select and inject
failures into random Impala services.
This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.
Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
Queries now return rows on both our small (query test) data set and the
10TB data set. This change also fixes a problem with Python not being set
properly and adds support for reporting query results using the geometric
mean.
Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:
./run-benchmark --workloads=hive-benchmark,tpch
We look up the workload in the workloads directory, then read the
associated query .test files and start executing them.
To ensure the queries are not duplicated between benchmark and query tests,
I moved all existing queries (under fe/src/test/resources/*) to the
workloads directory. You do NOT need to look through all the .test files,
I've just moved them. The one new file is 'hive-benchmark.test', which
contains the hive benchmark queries.
Also added support for generating schemas for different scale factors, as
well as executing against those scale factors. For example, let's say we
have a dataset with a scale factor called "SF3". We would first generate
the schema using:
./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are distinct from those of the
other scale factors.
Run the generated .sql file to load the data. Alternatively, the data can
be loaded by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3
Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3
This changeset also includes a few other minor tweaks to some of the test
scripts.
Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
This adds most of the Hive TPCH queries to the functional Impala tests.
This code review doesn't actually include the TPCH data; the data set is
relatively large. Instead, I updated scripts to copy the data from a data
host.
This change has a few parts:
1) Update the benchmark schema generation/test vector generation to be more
generic. This way we can use the same schema creation/data loading steps for
TPCH as we do for benchmark tests.
2) Add in schema template for the TPCH workload along with test vectors and
dimensions which are used for schema generation.
3) Add in a new test file for each TPC-H query. The Hive TPCH work broke
the queries down to generate some "temp" tables, then execute joins/selects
against those temp tables. Since creating the temp tables does some real
work, it is good to execute these via Impala. Each test: a) runs all the
INSERT statements to generate the temp tables, and b) runs the additional
TPCH queries.
4) Updated all the TPCH insert statements and queries to be parameterized on
$TABLE name. This way we can run the tests across all combinations of file
format/compression/etc.
5) Updated data loading
Change-Id: I6891acc4c7464eaf1dc7dbbb532ddbeb6c259bab
This change updates the Impala performance schema and test vector
generation techniques. It also migrates the existing benchmark scripts from
Ruby to Python. The change has a few parts:
1) Conversion of test vector generation and benchmark statement generation
from Ruby to Python. As a result, the benchmark test vector and dimension
files are now written in CSV format (Python doesn't have built-in YAML
support).
2) Standardize the naming of benchmark tables (to somewhat match query
tests). In general the form is:
* If file_format=text and compression=none, do not use a table suffix
* Abbreviate sequence file as (seq), rc file as (rc), etc.
* If using BLOCK compression, don't append anything to the table name; if
using 'record' compression, append 'record'
3) Created a new way of adding new schemas: the
benchmark_schema_template.sql file. The generate_benchmark_statements.py
script reads this in and breaks up the sections. The section format is:
====
Data Set Name
---
BASE table name
---
CREATE STATEMENT Template
---
INSERT ... SELECT * format
---
LOAD Base statement
---
LOAD STATEMENT Format
Where the BASE table name identifies a table from which the other file
format/compression variants can be generated. This would generally be a
local file.
The thinking is that if the files already exist in HDFS then we can just
load the files directly rather than issue an INSERT ... SELECT * statement.
The generate_benchmark_statements.py script has been updated to use this
new template, and to query HDFS for each table to determine how it should
be created. It then outputs an ideal file called
load-benchmark-*-generated.sql. Since this file is generated dynamically,
we can remove the old benchmark statement files.
4) This has been hooked into load-benchmark-data.sh, and run_query has been
updated to use the new format as well.