54 Commits

Author SHA1 Message Date
Xiang Yang
8af0ce8ed6 IMPALA-13001: Support graceful and force shutdown for impala.sh
This patch adds graceful and force shutdown support for impala.sh.

This patch also keeps the stdout and stderr logs at startup.

This patch also fixes some bugs in impala.sh, including:
 - empty service name check.
 - restart command cannot work.

Testing:
 - Manually deploy package on Ubuntu22.04 and verify it.

Change-Id: Ib7743234952ba6b12694ecc68a920d59fea0d4ba
Reviewed-on: http://gerrit.cloudera.org:8080/21297
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-15 12:03:18 +00:00
Xiang Yang
050805d21b IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files.
Move Linux packaging related content into package/CMakeLists.txt
to make it clearer.

This patch also adds LICENSE and NOTICE files to the final package.

Testing:
 - Manually deploy package on Ubuntu22.04 and verify it.

Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b
Reviewed-on: http://gerrit.cloudera.org:8080/20263
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-04-12 14:48:00 +00:00
Michael Smith
e6ed98c22b IMPALA-11201: update gitignore files
Updates gitignore for files generated during bootstrap_development.
Fixes deleting tracked files in be/src/thirdparty. Includes ignore rules
for past versions of shell dependencies and updates ignores for current
versions.

Change-Id: I03deba5e7fb151ef8e34039becdcc3fb47684084
Reviewed-on: http://gerrit.cloudera.org:8080/18499
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-05-10 03:06:59 +00:00
Zoltan Borok-Nagy
dbc69e9cae Update .gitignore with VSCode artifacts
Added .vscode/ directory to .gitignore.

Change-Id: Ifc4083787b132f6455023c9b2f52a82a1b8626a7
Reviewed-on: http://gerrit.cloudera.org:8080/15061
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-17 20:56:57 +00:00
Tim Armstrong
85c9895c11 Update gitignore files
This adds in a handful of files that I had on my local machine

Change-Id: I357441fab00ac031fbc70c40e4574e7a723fdedd
Reviewed-on: http://gerrit.cloudera.org:8080/14858
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-12-06 04:38:09 +00:00
Tim Armstrong
f689daef7f IMPALA-8622,IMPALA-8696: fix docker dependencies, add image list
Adds a plain-text space-separated image list in
docker/docker-images.txt. This is generated based on the images built by
CMake, so is kept in sync with images added to or removed from the
CMake file.

Duplicated logic per image is removed - instead there is a helper
function that is called for each daemon image to be built.

Rips out the timestamp mechanism that was intended to avoid unnecessary
container rebuilds, but has turned out to be brittle. Instead the
containers are rebuilt each time the rule is invoked.

This moves some subdirectories so that the image tag matches the
subdirectory, to simplify the build scripts.

Change-Id: I4d8e215e9b07c6491faa4751969a30f0ed373fe3
Reviewed-on: http://gerrit.cloudera.org:8080/13899
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Lars Volker <lv@cloudera.com>
2019-07-23 23:57:43 +00:00
Fang-Yu Rao
931a8f0ba7 IMPALA-4865: Reject Expr Rewrite When Appropriate
Avoided rewrite if the resulting string literal exceeds a defined limit.

Testing:
Added three statements in testFoldConstantsRule() to verify that the
expression rewrite is accepted only when the size of the rewritten
expression is below a specified threshold.

Change-Id: I8b078113ccc1aa49b0cea0c86dff2e02e1dd0e23
Reviewed-on: http://gerrit.cloudera.org:8080/12814
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-04-28 17:26:23 +00:00
Fredy Wijaya
5fa076e95c IMPALA-8329: Bump CDP_BUILD_NUMBER to 1013201
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.

The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.

Testing:
- Ran core tests
- Manually verified that there is no regression when starting Hive 3 with
  USE_CDP_HIVE=true

Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-17 05:30:33 +00:00
Paul Rogers
29df1d55f6 Additions to .gitignore
Adds entries for Eclipse-created files and for a couple of temporary
files commonly created during front-end debugging.

Change-Id: Ia8ea436f5e108cc08389f43d639a4cb7315271c1
Reviewed-on: http://gerrit.cloudera.org:8080/12430
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-11 10:57:20 +00:00
Tim Armstrong
ea826ca0d9 IMPALA-7948: part 1: initial docker container build
This builds an impala_base container that has all of the build artifacts
required to run the impala processes, then builds impalad, catalogd and
statestore containers based on that with the right ports exposed.
The images are based on the Ubuntu 16.04 image to align with the
most common development environment.

The container build process is integrated with CMake and is designed
to integrate with the rest of the build so that the container build
depends on the artifacts that will go into the container. You can
build the images with the following command, which will create
images called "impala_base", "impalad", "catalogd" and
"statestored":

  ninja -j $IMPALA_BUILD_THREADS docker_images

The images need some refinement to be truly useful.  The following
will be done in future patches:
* IMPALA-7947 - integrate with start-impala-cluster.py to
  automatically create docker network with containers running on it
* Mechanism to pass in command-line flags
* Mechanisms to update the various config files to point to the
  docker host rather than "localhost", which doesn't point to
  the right thing inside the container.
* Mechanisms to set mem_limit, JVM heap sizes, etc, automatically.

Testing:
Manually started up the containers connected to a user-defined bridge
network, tweaked the configurations to point to the HMS/HDFS/etc
running on my host. I then used "docker ps" to figure out the
port mappings for beeswax and debug webserver.

Confirmed that I could run a query and access debug pages:

  $ impala-shell.sh -i localhost:32860 -q "select coordinator()"
  Starting Impala Shell without Kerberos authentication
  Opened TCP connection to localhost:32860
  Connected to localhost:32860
  Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build
  d7870fe03645490f95bd5ffd4a2177f90eb2f3c0)
  Query: select coordinator()
  Query submitted at: 2018-12-11 15:51:04 (Coordinator:
  http://8063e77ce999:25000)
  Query progress can be monitored at:
  http://8063e77ce999:25000/query_plan?query_id=1b4d03f0f0f1fcfb:b0b37e5000000000
  +---------------+
  | coordinator() |
  +---------------+
  | 8063e77ce999  |
  +---------------+
  Fetched 1 row(s) in 0.11s

Change-Id: Ifea707aa3cc23e4facda8ac374160c6de23ffc4e
Reviewed-on: http://gerrit.cloudera.org:8080/12074
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-12-18 04:45:32 +00:00
Tim Armstrong
aa654d4b87 Update .gitignore
A few unversioned artifacts crept in over time without corresponding
.gitignore entries. These are the updates based on the git status output
on my dev env.

Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a
Reviewed-on: http://gerrit.cloudera.org:8080/11787
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-26 22:19:35 +00:00
Fredy Wijaya
d5ada970c6 IMPALA-7381: Prevent build failure after switching to new CDH_BUILD_NUMBER
Switching to a new CDH_BUILD_NUMBER requires downloading new CDH
components as well as forcing Maven to update its local repository.
This patch updates the CDH_COMPONENTS_HOME to include the
CDH_BUILD_NUMBER which will automatically download the new CDH
components after switching to a new CDH_BUILD_NUMBER. If a build
detects that CDH_BUILD_NUMBER has changed, it forces an update to the
local Maven repository. This helps prevent build failures, even on a
fresh Git clone, caused by a stale local Maven repository.

Testing:
- Manually tested by running buildall.sh with different CDH_BUILD_NUMBER

Change-Id: Ib0ad9c2258663d3bd7470e6df921041d1ca0c0be
Reviewed-on: http://gerrit.cloudera.org:8080/11099
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-03 08:23:57 +00:00
Tim Armstrong
dc1282fbc9 IMPALA-6241: timeout in admission control test under ASAN
The fix for IMPALA-6241 is to increase the timeout for all slow builds.

While testing that fix, I discovered that the ASAN build detection logic
was failing silently, resulting in it assuming that it was testing a
DEBUG build. The error was:

  Unexpected DW_AT_name in first CU:
  /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc;
  choosing DEBUG

The fix for that issue is to remove the build type detection heuristic
and instead just write a file with the build type as part of the build process.

Testing:
Before this change I was able to reproduce locally every 5-10 test
iterations. After this change I haven't seen it reproduce.

Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536
Reviewed-on: http://gerrit.cloudera.org:8080/8652
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-29 03:28:22 +00:00
Vuk Ercegovac
8aaf9ec1fb Updates several .gitignore files.
Ran into these when compiling (generated files from kudu),
eclipse setup, and testing.

Change-Id: Ife446e40756864f2a19ae4393ac503d17d91996b
Reviewed-on: http://gerrit.cloudera.org:8080/7902
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-31 01:40:47 +00:00
Tim Armstrong
d5b0c6b936 Update .gitignore files
I noticed a bunch of new things had crept in.

Change-Id: Ie6ef085357a3bf026f2b42689ee642192a7791e7
Reviewed-on: http://gerrit.cloudera.org:8080/7590
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2017-08-04 22:44:59 +00:00
Tim Armstrong
6b90aa3a11 IMPALA-4653: fix sticky config variable problem
Previously we could get a developer's shell into a bad state where a
value of a config variable from a previous impala-config.sh version
would override the value from the new impala-config.sh version.

This change adds a new mechanism to override settings locally by adding
settings to impala-config-local.sh. This alternative approach is more
robust, because the config variables will be reset to the intended
values when impala-config.sh is re-sourced.

impala-config-branch.sh can also be used to override settings in a
version-controlled way, e.g. to support having different settings for
different branches.

I did not convert all variables to use this approach, since many people
and Jenkins jobs depend on setting these variables from the environment.
The remaining "sticky" variables are ones where default values should
not change frequently, e.g. source directory locations and build
settings.

Change-Id: I930e2ca825142428d17a6981c77534ab0c8e3489
Reviewed-on: http://gerrit.cloudera.org:8080/5545
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-05 01:43:36 +00:00
Lars Volker
08ef70bebe Remove vim plugin config file from .gitignore
Files in be/ get wiped out by clean.sh if they're listed in .gitignore.
It is easier to just configure this via a project-specific .vimrc file.

Change-Id: I262f7a1ec8daace84a29518ba826c7c3b20fb9e9
Reviewed-on: http://gerrit.cloudera.org:8080/4854
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-10-31 22:44:54 +00:00
Jim Apple
0eaff805e2 Add distcc infrastructure.
This has been working for several months, and it was written mainly
by Casey Ching while he was at Cloudera working on Impala.

Change-Id: Ia4bc78ad46dda13e4533183195af632f46377cae
Reviewed-on: http://gerrit.cloudera.org:8080/4820
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-10-25 01:15:50 +00:00
Lars Volker
ef4c9958d0 IMPALA-4047: Remove occurrences of 'CDH'/'cdh' from repo
This change removes some of the occurrences of the strings 'CDH'/'cdh'
from the Impala repository. References to Cloudera-internal Jiras have
been replaced with upstream Jira issues on issues.cloudera.org.

For several categories of occurrences (e.g. pom.xml files,
DOWNLOAD_CDH_COMPONENTS) I also created a list of follow-up Jiras to
remove the occurrences left after this change.

Change-Id: Icb37e2ef0cd9fa0e581d359c5dd3db7812b7b2c8
Reviewed-on: http://gerrit.cloudera.org:8080/4187
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-13 00:40:41 +00:00
Lars Volker
e659337871 Add vim-specific files to .gitignore
Change-Id: I1abcd8ca0e18178684c916ef6f7d55c25c0814a4
Reviewed-on: http://gerrit.cloudera.org:8080/4562
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-10-04 18:49:50 +00:00
Jim Apple
57fcbf7a28 IMPALA-4171: Remove JAR from repo.
By ASF rules, we can't have JARs in releases. The releases are just
tarballs of the repo.

This patch removes from the repo the single JAR there, which was a
version of a JAR that is built during data load, with one string
changed. The JAR is used only for testing.

Instead of building that jar with the different string and saving the
result in git, data loading will now build the jar twice, with one Java
source file slightly changed.

Change-Id: Icee7b8c32b08e064dea4a14624acff6021ef5ce1
Reviewed-on: http://gerrit.cloudera.org:8080/4499
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-22 02:00:50 +00:00
Tim Armstrong
904265ccb5 Update .gitignore files for ninja, coredumps and pypi packages
Change-Id: Ie7d34fbd27150ba6c437207611f71bb95a0e4cba
Reviewed-on: http://gerrit.cloudera.org:8080/3814
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-07-29 21:42:07 +00:00
Casey Ching
07bdb6d484 Add .impala_compiler_opts to .gitignore
Change-Id: I164a077a91fcbe2cd445637ce958e91082bd56e0
Reviewed-on: http://gerrit.cloudera.org:8080/3012
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
2016-05-12 14:17:58 -07:00
Alex Behm
7e76e92bef Consolidate test and cluster logs under a single directory.
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.

The new structure is as follows:

$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala

$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading

$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests

$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests

$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests

$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
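
As a rough illustration of the layout above, the following Python sketch creates the six subdirectories under a given Impala home. It is not part of the change itself: `make_log_dirs` and `LOG_SUBDIRS` are hypothetical names, and the real scripts may create these directories differently.

```python
import os

# Subdirectory names taken from the layout listed in this commit message.
LOG_SUBDIRS = (
    "cluster",
    "data_loading",
    "fe_tests",
    "be_tests",
    "ee_tests",
    "custom_cluster_tests",
)


def make_log_dirs(impala_home):
    """Create <impala_home>/logs/<subdir> for each known log category.

    Hypothetical helper: returns the list of directory paths it ensured exist.
    """
    root = os.path.join(impala_home, "logs")
    paths = [os.path.join(root, name) for name in LOG_SUBDIRS]
    for path in paths:
        # exist_ok makes the call safe to repeat on an existing tree.
        os.makedirs(path, exist_ok=True)
    return paths
```

Calling `make_log_dirs(os.environ["IMPALA_HOME"])` would produce one directory to archive per Jenkins run, which is the simplification the commit is after.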

I tested this change with a full data load which
was successful.

Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-03-28 19:23:22 +00:00
Martin Grund
1720409545 Automatically enable toolchain for development
This patch adds logic to automatically download the pre-built toolchain
packages to the local developer machine using the bootstrap_toolchain.py
script in case they are not present. There is no manual user
intervention necessary to initiate the download process.

If desired the script can always be called to re-download the
dependencies from a correctly sourced Impala environment.

Change-Id: I636160efeadfac4b5c1feb478da5ae5da0c9fd00
Reviewed-on: http://gerrit.cloudera.org:8080/1429
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-12-22 22:22:59 +00:00
Matthew Jacobs
fe87bb1563 Add MetricDefs, static definitions of metric metadata generated from json
Adds a static definition of the metric metadata used by Impala. The
metric names, descriptions, and other properties are defined in
common/thrift/metrics.json file, and the generate_metrics.py script
creates a thrift representation. The metric definitions are then
available in a constant map which is used at runtime to instantiate
metrics, looking them up in the map by the metric key.

New metrics should be defined by adding an entry to the list of metrics
in metrics.json with the following properties:

key:         The unique string identifying the metric. If the metric can
             be templated, e.g. rpc call duration, it may be a format
             string (in the format used by strings::Substitute()).
description: A text description of the metric. May also be a format
             string.
label:       A brief title for the metric, not currently used by
             Impala but provided for external tools.
units:       The unit of the metric. Must be a valid value of TUnit.
kind:        The kind of metric, e.g. GAUGE or COUNTER. Must be a valid
             value of TMetricKind.
contexts:    The context in which this metric may be instantiated.
             Usually "IMPALAD", "STATESTORED", "CATALOGD", but may be
             a different kind of 'entity'. Not currently used by
             Impala but provided for modeling purposes for external
             tools.

For example, adding the counter for the total number of queries run over
the lifetime of the impalad process might look like:

  {
    "key": "impala-server.num-queries",
    "description": "The total number of queries processed.",
    "label": "Queries",
    "units": "UNIT",
    "kind": "COUNTER",
    "contexts": [
      "IMPALAD"
    ]
  }

TODO: Incorporate 'label' into the metrics debug page.
TODO: Verify the context at runtime, e.g. verify 'contexts' contains,
      e.g. a DCHECK.

After the metric definition is added, the generate_metrics.py script
will generate the TMetricDefs.thrift that contains a TMetricDef for
the metric definition. At runtime, the metric can be instantiated
using the key defined in metrics.json. Gauges, Counters, and
Properties are instantiated using static methods on MetricGroup. Other
metric types are instantiated using static CreateAndRegister methods
on their associated classes.
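
The key-based lookup described above can be sketched in Python. This is only an illustration, not the generate_metrics.py script: `load_metric_defs`, `REQUIRED`, and `SAMPLE` are hypothetical names, and the real pipeline goes through a generated thrift constant map rather than parsing JSON at runtime.

```python
import json

# Properties every entry must carry, per the list in this commit message.
REQUIRED = ("key", "description", "label", "units", "kind", "contexts")


def load_metric_defs(text):
    """Parse a metrics.json-style list into a dict keyed by metric key.

    Hypothetical helper: rejects entries with missing properties or
    duplicate keys so that a bad definition is caught early.
    """
    defs = {}
    for entry in json.loads(text):
        missing = [name for name in REQUIRED if name not in entry]
        if missing:
            raise ValueError(f"metric {entry.get('key')!r} is missing {missing}")
        if entry["key"] in defs:
            raise ValueError(f"duplicate metric key {entry['key']!r}")
        defs[entry["key"]] = entry
    return defs


# The example entry from the commit message, wrapped in a list.
SAMPLE = """[
  {
    "key": "impala-server.num-queries",
    "description": "The total number of queries processed.",
    "label": "Queries",
    "units": "UNIT",
    "kind": "COUNTER",
    "contexts": ["IMPALAD"]
  }
]"""

metric_defs = load_metric_defs(SAMPLE)
print(metric_defs["impala-server.num-queries"]["kind"])
```

A check of this kind is one possible answer to the TODO about detecting missing metric definitions, though the commit does not say how that will be done.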

TODO: Generate a thrift enum used to lookup metric defs.
TODO: Consolidate the instantiation of metrics that are created
      outside of metrics.h (i.e. collection metrics, memory metrics).
TODO: Need a better way to verify if metric definitions are missing.

Change-Id: Iba7f94144d0c34f273c502ce6b9a2130ea8fedaa
Reviewed-on: http://gerrit.cloudera.org:8080/330
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2015-05-14 21:27:28 +00:00
Matthew Jacobs
9ea43823e5 Add compile_commands.json to .gitignore
Change-Id: Ice093f541798da8fa4aa480b5059c85df2eb84ef
Reviewed-on: http://gerrit.cloudera.org:8080/298
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-03-28 00:22:15 +00:00
Martin Grund
b582cdc22b IMPALA-1598: Adding Error Codes to Log Messages
This patch introduces the concept of error codes for errors that are
recorded in Impala and are going to be presented to the client. These
error codes are used to aggregate and group incoming error / warning
messages to reduce the spill on the shell and increase the usefulness of
the messages. By splitting the message string from the implementation,
it becomes possible to edit the string independently of the code and
pave the way for internationalization.

Error messages are defined as a combination of an enum value and a
string. Both are defined in the Error.thrift file that is automatically
generated using the script in common/thrift/generate_error_codes.py. The
goal of the script is to have a central understandable repository of
error messages. Adding new messages to this file will require rebuilding
the thrift part. The proxy class ErrorMessage is responsible for
representing an error and capturing the parameters that are used to format
the error message string.

When error messages are recorded they are recorded based on the
following algorithm:

- If an error message is of type GENERAL, do not aggregate this message
  and simply add it to the total number of messages
- If an error message is of a specific type, record the first error
  message as a sample and for all other occurrences increment the count.
- The coordinator will merge all error messages except the ones of type
  GENERAL and display a count.

For example, in the case of the parquet file spanning multiple blocks
the output will look like:

    Parquet files should not be split into multiple hdfs-blocks.
    file=hdfs://localhost:20500/fid.parq (1 of 321 similar)
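
The recording and merging rules above can be sketched as follows. This is a simplified Python model, not the C++ implementation: `ErrorLog`, its methods, and the code name used in the usage note are hypothetical, and showing the "(1 of N similar)" suffix only when N is greater than one is an assumption.

```python
from collections import OrderedDict

GENERAL = "GENERAL"  # stand-in for the non-aggregated error type


class ErrorLog:
    """Hypothetical model of per-backend error aggregation."""

    def __init__(self):
        self.general = []              # GENERAL messages, kept verbatim
        self.specific = OrderedDict()  # code -> [first sample, count]

    def record(self, code, message):
        if code == GENERAL:
            # GENERAL messages are never aggregated.
            self.general.append(message)
        elif code in self.specific:
            # Later occurrences only bump the counter.
            self.specific[code][1] += 1
        else:
            # The first occurrence is kept as the sample.
            self.specific[code] = [message, 1]

    def merge(self, other):
        """Fold another backend's log into this one, as a coordinator might."""
        self.general.extend(other.general)
        for code, (sample, count) in other.specific.items():
            if code in self.specific:
                self.specific[code][1] += count
            else:
                self.specific[code] = [sample, count]

    def lines(self):
        out = list(self.general)
        for sample, count in self.specific.values():
            # Assumption: the suffix is only useful when there are repeats.
            suffix = f" (1 of {count} similar)" if count > 1 else ""
            out.append(sample + suffix)
        return out
```

For instance, recording the same hypothetical code `"PARQUET_MULTIPLE_BLOCKS"` on several backends and merging the logs yields a single line with the first sample and the combined count, which keeps the shell output short on large clusters.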

All messages are always logged to VLOG. In the coordinator error
messages are merged across all backends to retain readability in the
case of large clusters.

The current version of this patch adds these new error codes to some of
the most important error messages as a reference implementation.

Change-Id: I1f1811631836d2dd6048035ad33f7194fb71d6b8
Reviewed-on: http://gerrit.cloudera.org:8080/39
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
2015-03-01 03:37:32 +00:00
Dan Hecht
7d504b847f Add thrift, jflex, yacc files to cscope file list.
I find it useful to have these indexed by cscope.  Also, gitignore
files generated with cscope -q (inverted index).

Change-Id: I8d5bcd34706c40357b94337db4b72dccecdecbd9
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3910
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Daniel Hecht <dhecht@cloudera.com>
(cherry picked from commit 66ecea5c0c9212377d314c5312fec07446960325)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3937
Reviewed-by: Daniel Hecht <dhecht@cloudera.com>
2014-08-19 18:00:55 -07:00
Nong Li
b0a7c4567f Add a few directories to .gitignore.
Change-Id: Ifd81c623c69629d58e7dca6aa63c3d7117f5999e
(cherry picked from commit 235d94c4edf039c6ef84f140a4c70ddd1639ba63)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1346
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: Nong Li <nong@cloudera.com>
2014-01-22 16:08:13 -08:00
Nong Li
8ada9b4383 Add cluster_logs/ to gitignore.
Change-Id: I2957f1939355455afbd01aaaf91074ffaf25be41
Reviewed-on: http://gerrit.ent.cloudera.com:8080/450
Tested-by: jenkins
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
2014-01-08 10:52:44 -08:00
Lenni Kuff
2f7198292a Add support for auxiliary workloads, tests, and datasets
This change adds support for auxiliary workloads, tests, and datasets. This is useful
to augment the regular test runs with some additional tests that do not belong in the
main Impala repo.
2014-01-08 10:50:32 -08:00
Alex Behm
861ba05989 IMPALA-197: Outer join on constant expressions returns incorrect results. 2014-01-08 10:50:09 -08:00
Nong Li
0df9476be1 Parquet data loading. 2014-01-08 10:48:48 -08:00
Nong Li
7001fb103e Move Impala to CDH4.2 RC2 2014-01-08 10:47:50 -08:00
Nong Li
fbfef4e22e Fix crash in TopN node with null tuples. 2014-01-08 10:46:54 -08:00
Lenni Kuff
b3fce13b1d Initial Impala failure testing library + modularize run-workload
This adds initial changes for the Impala failure testing library. It also refactors
run workload into its own module so it can be used in other tests.

The failure testing has two main components - the first is an object model on top
of Impala services in a cluster. This allows for enumerating the services in the cluster
and executing commands on remote machines. This initial cut is built on top of the
CM service to help with starting/stopping services. The long term goal is to let this run
on both a CM cluster and non-CM cluster as well as locally.

The other part of the failure injection change is the failure_inctor module, which uses the
Impala service abstraction to select and inject failures into random impala services.

This failure testing framework hasn't been completely validated because the product code
is not yet ready, but it is important to get this checked in so all new changes to
run-workload are based off this refactor.

Change-Id: I73bf44f0ac881ec17bea7cb05d850b45e2ea5be5
2014-01-08 10:46:16 -08:00
Lenni Kuff
231b66f37f A few small fixes
Queries now return rows on both our small (query test) data set as well as the 10TB
data set. This change also fixes a problem with python not being set properly and
adds support for reporting query results using the geometric mean.
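
For reference, a geometric mean over query runtimes can be computed as in the Python sketch below. The function name and the choice to reject non-positive values are assumptions made for illustration; the commit does not show how the reporting code computes it.

```python
import math


def geometric_mean(runtimes):
    """Return the geometric mean of a non-empty list of positive runtimes."""
    if not runtimes:
        raise ValueError("need at least one runtime")
    if any(t <= 0 for t in runtimes):
        raise ValueError("runtimes must be positive")
    # Averaging the logarithms avoids overflow from multiplying many values.
    return math.exp(sum(math.log(t) for t in runtimes) / len(runtimes))
```

Compared with the arithmetic mean, the geometric mean is less dominated by a single very slow query, which is why it is commonly used to summarize benchmark suites.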

Change-Id: Ia432148d96645ecda3f63900b3bfbd29c706d886
2014-01-08 10:46:15 -08:00
Nong Li
c5edb8e3d4 Add version file to gitignore. 2014-01-08 10:46:01 -08:00
Henry Robinson
6bf2b3c74e Add tarball build-step for shell, also shell version number 2014-01-08 10:45:07 -08:00
Henry Robinson
9ca5c88258 .gitignore for shell/gen-py/ 2014-01-08 10:44:38 -08:00
Lenni Kuff
04edc8f534 Update benchmark tests to run against generic workload, data loading with scale factor, +more
This change updates the run-benchmark script to enable it to target one or more
workloads. Now benchmarks can be run like:

./run-benchmark --workloads=hive-benchmark,tpch

We lookup the workload in the workloads directory, then read the associated
query .test files and start executing them.

To ensure the queries are not duplicated between benchmark and query tests, I
moved all existing queries (under fe/src/test/resources/*) to the workloads
directory. You do NOT need to look through all the .test files, I've just moved
them. The one new file is the 'hive-benchmark.test' which contains the hive
benchmark queries.

Also added support for generating schema for different scale factors as well as
executing against these scale factors. For example, let's say we have a dataset
with a scale factor called "SF3". We would first generate the schema using:

./generate_schema_statements --workload=<workload> --scale_factor="SF3"
This will create tables with names that are distinct from those of the other scale factors.

Run the generated .sql file to load the data. Alternatively, the data can be loaded
by running a new python script:
./bin/load-data.py -w <workload1>,<workload2> -e <exploration strategy> -s [scale factor]
For example: ./bin/load-data.py -w tpch -e core -s SF3

Then run against this:
./run-benchmark --workloads=<workload> --scale_factor=SF3

This changeset also includes a few other minor tweaks to some of the test
scripts.

Change-Id: Ife8a8d91567d75c9612be37bec96c1e7780f50d6
2014-01-08 10:44:22 -08:00
Lenni Kuff
e293164b37 Added TPCH functional query tests and schema generation
This adds most of the Hive TPCH queries into the functional Impala tests. This
code review doesn't actually include the TPCH data. The data set is relatively
large. Instead I updated scripts to copy the data from a data host.

This change has a few parts:
1) Update the benchmark schema generation/test vector generation to be more
generic. This way we can use the same schema creation/data loading steps for
TPCH as we do for benchmark tests.

2) Add in schema template for the TPCH workload along with test vectors and
dimensions which are used for schema generation.

3) Add in a new test file for each TPC-H query. The Hive TPCH work broke down
the queries to generate some "temp" tables, then execute using joins/selects
from these temp tables. Since creating the temp tables does some real work
it is good to execute these via Impala. Each test a) Runs all the Insert
statements to generate the temp tables b) runs the additional TPCH queries

4) Updated all the TPCH insert statements and queries to be parameterized on
$TABLE name. This way we can run the tests across all combinations of file
format/compression/etc.

5) Updated data loading

Change-Id: I6891acc4c7464eaf1dc7dbbb532ddbeb6c259bab
2014-01-08 10:44:06 -08:00
Lenni Kuff
0da77037e3 Updated Impala performance schema and test vector generation
This change updates the Impala performance schema and test vector generation
techniques. It also migrates the existing Ruby benchmark scripts
to Python. The change has a few parts:

1) Conversion of test vector generation and benchmark statement generation from
Ruby to Python. A result of this was also to update the benchmark test vector
and dimension files to be written in CSV format (Python doesn't have built-in
YAML support).

2) Standardize the naming of benchmark tables (to somewhat match the query
tests). In general the form is:
* If file_format=text and compression=none, do not use a table suffix
* Abbreviate sequence file as (seq), rc file as (rc), etc.
* If using BLOCK compression don't append anything to table name; if using
 'record' append 'record'

3) Created a new way to add new schemas: the
benchmark_schema_template.sql file. The generate_benchmark_statements.py script
reads this in and breaks up the sections. The section format is:
====
Data Set Name
---
BASE table name
---
CREATE STATEMENT Template
---
INSERT ... SELECT * format
---
LOAD Base statement
---
LOAD STATEMENT Format

Where BASE Table is a table the other file formats/compression types can be
generated from. This would generally be a local file.

The thinking is that if the files already exist in HDFS then we can just load
the file directly rather than issue an INSERT ... SELECT * statement. The
generate_benchmark_statements.py script has been updated to use this new
template as well as query HDFS for each table to determine how it should be
created. It then outputs an ideal file called load-benchmark-*-generated.sql.
Since this file is generated dynamically we can remove the old benchmark
statement files.

4) This has been hooked into load-benchmark-data.sh and run_query has been
updated to use the new format as well.
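
The section format from part 3 can be parsed along the following lines. This Python sketch is not the generate_benchmark_statements.py script: `parse_schema_template` and the field names in `FIELDS` are hypothetical, and it assumes that `====` and `---` always appear alone on their own lines.

```python
import re

# Hypothetical field names for the six parts of one template section,
# in the order given in the commit message.
FIELDS = ("dataset", "base_table", "create", "insert", "load_base", "load")


def parse_schema_template(text):
    """Split a template into sections ('====') and fields ('---')."""
    sections = []
    for raw in re.split(r"^====[ \t]*$", text, flags=re.MULTILINE):
        if not raw.strip():
            continue  # skip the empty chunk before the first '===='
        parts = [p.strip() for p in re.split(r"^---[ \t]*$", raw, flags=re.MULTILINE)]
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, found {len(parts)} in section "
                f"starting with {parts[0]!r}"
            )
        sections.append(dict(zip(FIELDS, parts)))
    return sections
```

Each returned dict gives the generator what it needs to choose between the `load` field, when the files already exist in HDFS, and the `insert` field otherwise.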
2012-07-12 23:12:20 -07:00
Lenni Kuff
462465164d Updated .gitignore to ignore benchmark result files
I accidentally missed this file in the last checkin.
2012-06-29 10:15:50 -07:00
Nong Li
f9efe06649 Move IR cross compile output to a better folder for packaging. 2012-06-01 13:14:18 -07:00
Michael Ubell
7b14187bf1 Install snappy library
add create-load-data.sh
2012-05-02 07:31:10 -07:00
Nong Li
88237350f0 Change the build to allow debug and release builds to coexist. 2012-02-17 18:14:04 -08:00
Nong Li
783480d6bf - Cleaned up some TODOs.
- Fix tuple template.  Fixed strcmp
- atoi/atof handle overflows.
- added likely/unlikely compiler directive
- Runquery now reports mean/stddev for profile runs
- removed quoted char
2012-01-18 23:08:29 -08:00
Nong Li
c84fec38d3 - Move thrift out of FE src and into impala/common
- Thrift files now build using cmake instead of mvn
- Added cmake build to impala/ which drives the build process
2011-12-30 19:35:20 -08:00